Release Notes
v0.81 (not released)
New Features
Tracking of the Yarn application name for each DBImport job. Data is available in a new table called yarn_statistics
Support for DBImport on AWS using Glue catalog, EMR and MWAA
Fixed issues
Issue #101: Invalid column in Index
v0.80
New Features
Ability to use Spark as the ETL engine instead of Hive (experimental)
Support for SQLAnywhere databases
Support for Spark3
Support for Iceberg as a file format (requires Spark3 and Spark as ETL)
Impala metadata validation after Import is completed
v0.70
New Features
Support for import and export to Snowflake
Support for writing Hive data to AWS S3 bucket
Airflow email support for failed DAGs
v0.68
Fixed issues
Issue #85: Incremental import of empty database might result in duplicate rows
Issue #86: Time start/stop on jdbc_connections does not follow scheduled timezone
Issue #87: Time start/stop on jdbc_connections does not follow scheduled timezone
Issue #88: Column comments from Oracle are missing
Issue #90: “No common columns to perform merge on” during Deploy
Issue #91: MySQL Connection leak in copy operations
New Features
Source table information such as indexes is fetched from the source database and stored for informational use
Deploy tool to handle deployments of DAGs and affected tables/columns/connections to a remote DBImport instance
Information about started and finished imports, exports and Airflow DAGs can be sent in JSON format to Kafka and REST interfaces
v0.67
New Features
Support for imports with MSSQL Change Tracking functions
v0.66
Fixed issues
Issue #43: incr_mode in import_tables is not an Enum column
Issue #74: Tasks from custom_dags are not using the retries setting
Issue #75: Add alias argument to manage script for encryptCredentials
Issue #76: SQL ‘where’ in export_tables is not working
Issue #77: Default value of etl_phase_type in import_tables is missing
Issue #78: ResetIncrementalExport doesn't truncate the table
Issue #79: Airflow pool will get duplicate error if upper/lower case hostname differ
New Features
Custom SQL validation
Support for timezone
Atlas integration performance from DBImport server has been improved and can now scale with multiple threads
v0.65
Fixed issues
Issue #61: Import from View on MSSQL fails with no columns
Issue #62: DBImport Server connection retries against config database
Issue #63: Control executors for Spark export
Issue #64: Initial incremental export with 0 rows fails
Issue #65: Python error when missing permissions in Atlas
Issue #66: Setting a timeslot where start is larger than stop gives a configuration error
Issue #67: Unneeded column change during export to Oracle
Issue #68: dropExportTable generates error if table does not exist
Issue #69: schedule_interval columns with different size
Issue #70: Support for authentication source for Mongo connections
Issue #71: testConnection against Mongo does not work
Issue #72: Incremental import of empty tables fails because no rows are stored from the source
Changed behavior
Hive connection supports multiple servers.
Poke interval for Multi-cluster sensors is changed from 30 sec to 5 min
You can force DBImport to process all steps for incremental imports even if there is no new data.
Encrypt password for JDBC connections supports '-a' as an argument for the connection name
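For illustration, encrypting the stored credentials for a named connection could then look like the following; the connection name is a placeholder and other arguments may differ in your installation:

    manage --encryptCredentials -a <connection name>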
New Features
Atlas Integration
Import from MongoDB is supported with Spark as the import tool
Export to PostgreSQL
Oracle CDC Import supports History table
Can force a major compaction after an Import that uses Hive Merge
Support for anonymization of columns during import
v0.64
Fixed issues
Issue #59: Export update_table fails because the table is not empty
New Features
Spark is supported for both imports and exports (Tech Preview)
v0.63
Fixed issues
Issue #60: Export tries to alter column types FLOAT(*)
Improvements
Better description of parameters in manage command
Changed behavior
Kerberos ticket is created and handled by DBImport internally. No need to have a valid ticket before starting anymore
New Features
Dedicated copy command
Sqoop column type can be overridden with setting in import_columns
v0.62
New Features
Multi-cluster imports with asynchronous copy mode
DBImport server daemon. This is the service that handles asynchronous copy of data between clusters
v0.61
Fixed issues
Issue #39: Export fails when sqoop times out against Kafka for Atlas info
Issue #40: Creating Airflow Pools fails when the pool table is empty in Airflow
Issue #41: Error when creating DBImport database
Issue #42: Airflow Tasks fail 'In Main' if there is a dependency to a DBImport Task
Issue #44: Importing a table with a column called ‘const’ is not supported
Issue #45: Retries sometimes fail due to Hive connection
Issue #46: Exporting from a Hive table that doesn't exist gives errors
Issue #47: Get rowcount fails if the column for incremental load is a reserved word
Issue #48: Column names containing # fails on column not found
Issue #49: Importing ‘time’ columns from MSSQL fails
Issue #50: SQL Server connection with encryption uses wrong JDBC driver
Issue #51: sqoop_sql_where_statement with validation = query fails with double where statements
Issue #52: Column type 'long' in Oracle gets the wrong column type in Hive
Issue #53: No logging of forced removal of locks
Issue #54: DB2 clob columns are not mapped to String in sqoop
Issue #55: DB2 import with column type time(3) results in null values
Issue #56: timestamp columns from MSSQL will result in NULL values
Issue #58: merge operations only look at mergeonly override for PK
Improvements
Foreign Keys can be disabled per table or connection
v0.60
Fixed issues
Issue #30: manage generates error when no valid Kerberos ticket available
Issue #31: Oracle Flashback imports get Merge cardinality_violation
Issue #32: Airflow sensor never times out
Issue #33: truncate_hive column in import_tables is not used/implemented
Issue #34: pk_column_override and pk_column_override_mergeonly with uppercase columns fail
Issue #35: datalake_source is only created with a new table, not added to an already existing one
Issue #36: sqoop mappers not based on history
Issue #37: changing HDFS_Basedir doesn't trigger an alter of the Import table
Issue #38: Wrong row count on exported tables
Improvements
HDFS basedir is configurable in the configuration table
Changed behavior
Configuration for HDFS is moved to the configuration table in MySQL
Configuration for Sqoop mappers is moved to the configuration table in MySQL
New Features
Multi-cluster imports (synchronous only)
full_insert import method
v0.51
Fixed issues
Issue #29: Duplicate column in statistics when changing import type without reset
Improvements
Possible to specify Java Heap for Export operations
Changed behavior
hive_merge_heap column in import_tables sets Java Heap for the entire Hive session, not just for Merge operations.
New Features
Airflow integration
v0.50
Fixed issues
Issue #26: Schema changes in the configuration database are not handled
Issue #27: String export to MSSQL into varchar gets converted every time
Issue #28: Update column description on exported MSSQL table fails
Improvements
resetIncrementalImport is added to 'manage' in order to clear an incremental import and force the next import to start with an initial import
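A minimal sketch of how this could be invoked; the exact spelling of the option and the arguments that identify the import table are assumptions based on the other manage options, not taken from these release notes:

    manage --resetIncrementalImport <arguments identifying the import table>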
Changed behavior
Configuration for Hive validation test and extended messages is moved to the configuration table in MySQL
New Features
New import type called 'oracle_flashback_merge' is available. It will use the Oracle Flashback Version Query to import changed rows into Hive
v0.42
Fixed issues
Issue #20: Going from Merge to non-merge imports fails because of a missing datalake_import column
Issue #22: Column starting with _ failed if it's part of the Primary Key and a merge operation is running
Issue #23: varchar(-1) from MSSQL generates error in Sqoop
Issue #24: Remove locks by force only in target table
Issue #25: column with the name ‘int’ is not supported
Improvements
Removing locks by force is configurable in the configuration table
Changed behavior
The configuration for the Hive metastore must be changed to a SQLAlchemy connection string stored in the setting hive_metastore_alchemy_conn
New Features
Hive Metastore SQL connection now uses SQLAlchemy. This enables database types other than MySQL for the Hive Metastore
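As an illustration of the new setting, hive_metastore_alchemy_conn holds a standard SQLAlchemy URL of the form dialect+driver://user:password@host:port/database. The values below are placeholders, and mysql+pymysql is just one possible driver choice:

    mysql+pymysql://metastore_user:secret@metastore-host:3306/hive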
v0.41.1
Fixed issues
Issue #17: Oracle Primary Key got columns from Unique key
Issue #18: Error if Merge run on table with only PK columns
Issue #19: Hive Merge implicit cast won't work with X number of columns
Issue #21: _ at the start of the column name generates errors during import
Improvements
Proper error message when a table contains no primary key and a merge operation is running
v0.41
Fixed issues
Issue #16: include_in_import for map-column-java is not affected
Improvements
Issue #15: Move JDBC Driver config to database
New Features
Functions to add import tables by searching for tables in the source that we don't already have
Functions to add export tables by searching for tables in Hive that we don't already have
v0.40
Fixed issues
Issue #14: force_string settings in import_columns were not used
New Features
Exports to MSSQL, Oracle, MySQL and DB2 are fully supported
v0.30
Fixed issues
Issue #13: sqoop_query not respected
Issue #12: Include_in_import not respected
Issue #11: Oracle Number(>10) column having java_column_type = Integer
Issue #10: MySQL decimal columns get created without precision
New Features
Ability to override the name and type of the column in Hive
It's now possible to select where to get the number of rows for the validation from: sqoop or query
Support for Merge operation during ETL Phase, including History Audit tables
Import supports the command options -I, -C and -E for running only the Import, Copy or ETL Phase
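For illustration, running only a single phase could look like the following; the arguments that identify the table are placeholders, not the actual argument list of the import command:

    import <arguments identifying the table> -I    (run only the Import Phase)
    import <arguments identifying the table> -C    (run only the Copy Phase)
    import <arguments identifying the table> -E    (run only the ETL Phase)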
Changed behavior
Stage 1 is renamed to Import Phase. The -1 command option still works against import for compatibility
Stage 2 is renamed to ETL Phase. The -2 command option still works against import for compatibility
The values in the column sqoop_options in import_tables will be converted to lowercase before being added to sqoop
v0.21
Fixed issues
Issue #9: PK with spaces in column name fails on --split-by
Issue #8: Column names with two spaces after each other fail in sqoop
Issue #6: MySQL can't handle " around column names
New Features
You can limit the number of sqoop mappers globally on a database connection by specifying a positive value in the column max_import_sessions
Import statistics are stored in the tables import_statistics and import_statistics_last
v0.20
Fixed issues
Issue #5: Message about ‘split-by-text’ even if the column is an integer
Issue #4: Parquet can't handle SPACE in column name
Issue #3: TimeCheck fails before 10.00
Issue #2: ‘sqoop_sql_where_addition’ assumes ‘where’ is in config
Issue #1: Errors when running without a valid Kerberos ticket
New Features
Incremental Imports are now supported
Encryption of username/password with manage --encryptCredentials
Repair of incremental import with manage --repairIncrementalImport
Repair of all failed incremental imports with manage --repairAllIncrementalImports
It's possible to ignore the timeWindow by adding --ignoreTime to the import command
You can force an import to start from the beginning by adding --resetStage to the import command
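For illustration, typical invocations of the options above could look like the following; the arguments that identify the table are placeholders and may differ in your installation:

    manage --encryptCredentials                                (encrypt the username/password stored for a connection)
    manage --repairAllIncrementalImports                       (repair all failed incremental imports)
    import <arguments identifying the table> --ignoreTime      (ignore the timeWindow)
    import <arguments identifying the table> --resetStage      (force the import to start from the beginning)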