Export methods
There are two different ways to export data. Full or Incrementally.
Note
The stage number seen in the documentation are the internal stage id that is used by DBImport. This number is also used in the export_stage and the export_retries_log table.
Full export
Full imports reads the entire Hive table and makes a copy of it available in the Target database.
Setting |
Configuration |
---|---|
Export Type |
full |
Getting Hive tableschemaThis stage connects to Hive and reads all columns, columntypes and comments and saves the to the configuration database. Clear table rowcountRemoves the number of rows that was import in the previous import of the table Update Hive statistics on exported tableUpdates all the statistcs in Hive for the table that is exported. This is needed for correct row count Create Export Temp tableIf required, the export will create an Export Temp Table. Truncate Export Temp tableIf required, will truncate the Export Temp Table Insert data into Export Temp tableIf required, will insert data from the exported Hive table into the Export Temp Table Create Target tableThe Target table will be created on the system we are exporting data to. It will also update the table definition if there is a change in Hive. Truncate Target tableTruncates the table we will export to Sqoop ExportExecutes the sqoop exportIf the sqoop command fails, the next export will restart from stage 106 ValidationsCompare the number of rows in Hive table with the number of rows in the target table.If the validation fails, the next import will restart from stage 101
Incremental export
An incremental export keeps track of how much data have been read from the Hive table and only exports the new data. Incremental exports always requires a Temporary Export Table.
Setting |
Configuration |
---|---|
Export Type |
incr |
Getting Hive tableschemaThis stage connects to Hive and reads all columns, columntypes and comments and saves the to the configuration database. Clear table rowcountRemoves the number of rows that was import in the previous import of the table Update Hive statistics on exported tableUpdates all the statistcs in Hive for the table that is exported. This is needed for correct row count Create Export Temp tableThe Export Temp Table will be created. Truncate Export Temp tableTruncate the Export Temp Table Fetching the Max value from HiveThe max value for the incremental columns are read from Hive. This is used to get the incremental delta that we will load into the Target table. Insert data into Export Temp tableData will be inserted into the Export Temp Table based on the min and max values Create Target tableThe Target table will be created on the system we are exporting data to. It will also update the table definition if there is a change in Hive. Sqoop ExportExecutes the sqoop exportIf the sqoop command fails, the next export will restart from stage 106 ValidationsCompare the number of rows in Hive table with the number of rows in the target table.If the validation fails, the next import will restart from stage 101 Saving pending incremental valuesIn order to start the next incremental export from the last entry that the current export read, we are saving the min and max values into the export_tables table. The next export will then start to read from the next record after the max we read this time.