Release Notes
=============

v0.81 (not released)
------------------------------

**New Features**

- Track the YARN application name for each DBImport job. Data is available in a new table called yarn_statistics
- Support for DBImport on AWS using Glue catalog, EMR and MWAA

**Fixed issues**

- Issue #101: Invalid column in Index

v0.80
------------------------------

**New Features**

- Ability to use Spark as the ETL engine instead of Hive (experimental)
- Support for SQLAnywhere databases
- Support for Spark3
- Support for Iceberg as a file format (requires Spark3 and Spark as ETL)
- Impala metadata validation after Import is completed

v0.70
------------------------------

**New Features**

- Support for import and export to Snowflake
- Support for writing Hive data to AWS S3 bucket
- Airflow email support for failed DAGs

v0.68
------------------------------

**Fixed issues**

- Issue #85: Incremental import of empty database might result in duplicate rows
- Issue #86: Time start/stop on jdbc_connections does not follow scheduled timezone
- Issue #87: Time start/stop on jdbc_connections does not follow scheduled timezone
- Issue #88: Column comments from Oracle are missing
- Issue #90: "No common columns to perform merge on" during Deploy
- Issue #91: MySQL Connection leak in copy operations

**New Features**

- Source table information, such as indexes, is fetched from the source database and stored for informational use
- Deploy tool to handle deployments of DAGs and affected tables/columns/connections to a remote DBImport instance
- Information about started and finished imports, exports and Airflow DAGs can be sent in JSON format to Kafka and REST interfaces

v0.67
------------------------------

**New Features**

- Support for imports with MSSQL Change Tracking functions

v0.66
------------------------------

**Fixed issues**

- Issue #43: incr_mode in import_tables is not an Enum column
- Issue #74: Tasks from custom_dags are not using retries setting
- Issue #75: Add alias argument to manage script for encryptCredentials
- Issue #76: SQL 'where' in export_tables is not working
- Issue #77: Default value of etl_phase_type in import_tables is missing
- Issue #78: ResetIncrementalExport doesn't truncate table
- Issue #79: Airflow pool will get duplicate error if upper/lower case hostname differ

**New Features**

- Custom SQL validation
- Support for timezone
- Atlas integration performance from DBImport server has been improved and can now scale with multiple threads

v0.65
------------------------------

**Fixed issues**

- Issue #61: Import from View on MSSQL fails with no columns
- Issue #62: DBImport Server connection retries against config database
- Issue #63: Control executors for Spark export
- Issue #64: Initial incremental export with 0 rows fails
- Issue #65: Python error when missing permissions in Atlas
- Issue #66: Setting a timeslot where start is larger than stop gives a configuration error
- Issue #67: Unneeded column change during export to Oracle
- Issue #68: dropExportTable generates error if table does not exist
- Issue #69: schedule_interval columns with different size
- Issue #70: Support for authentication source for Mongo connections
- Issue #71: testConnection against Mongo does not work
- Issue #72: Incremental import of empty tables fails because no rows are stored from source

**Changed behavior**

- Hive connection supports multiple servers.
- Poke interval for Multi-cluster sensors is changed from 30 sec to 5 min
- You can force DBImport to process all steps for incremental imports even if there is no new data.
- Password encryption for JDBC connections supports '-a' as an argument for the connection name

**New Features**

- Atlas Integration
- Import from MongoDB is supported with Spark as import tool
- Export to PostgreSQL
- Oracle CDC Import supports History table
- Can force a major compaction after an Import that uses Hive Merge
- Support for anonymization of columns during import

v0.64
------------------------------

**Fixed issues**

- Issue #59: Export update_table fails because table is not empty

**New Features**

- Spark supported for both imports and exports (TechPreview)

v0.63
------------------------------

**Fixed issues**

- Issue #60: Export tries to alter column types FLOAT(*)

**Improvements**

- Better description of parameters in *manage* command

**Changed behavior**

- Kerberos ticket is created and handled by DBImport internally. A valid ticket is no longer needed before starting

**New Features**

- Dedicated *copy* command
- Sqoop column type can be overridden with setting in import_columns

v0.62
------------------------------

**New Features**

- Multi-cluster imports with asynchronous copy mode
- DBImport server daemon. This is the service that handles asynchronous copy of data between clusters

v0.61
------------------------------

**Fixed issues**

- Issue #39: Export fails when sqoop times out against Kafka for Atlas info
- Issue #40: Creating Airflow Pools fails when pool table is empty in Airflow
- Issue #41: Error when creating DBImport database
- Issue #42: Airflow Tasks fail 'In Main' if there is a dependency on a DBImport Task
- Issue #44: Importing a table with a column called 'const' is not supported
- Issue #45: Retries sometimes fail due to Hive connection
- Issue #46: Exporting from a Hive table that doesn't exist gives errors
- Issue #47: Get rowcount fails if column for incremental load is a reserved word
- Issue #48: Column names containing # fail on column not found
- Issue #49: Importing 'time' columns from MSSQL fails
- Issue #50: SQL Server connection with encryption uses wrong JDBC driver
- Issue #51: sqoop_sql_where_statement with validation = query fails with double where statements
- Issue #52: column type 'long' in Oracle gets wrong column type in Hive
- Issue #53: No logging of forced removal of locks
- Issue #54: DB2 clob columns are not mapped to String in sqoop
- Issue #55: DB2 import with column type time(3) results in null values
- Issue #56: timestamp columns from MSSQL will result in NULL values
- Issue #58: merge operations only look at mergeonly override for PK

**Improvements**

- Foreign Keys can be disabled per table or connection

v0.60
------------------------------

**Fixed issues**

- Issue #30: manage generates error when no valid Kerberos ticket is available
- Issue #31: Oracle Flashback imports get Merge cardinality_violation
- Issue #32: Airflow sensor never times out
- Issue #33: truncate_hive column in import_tables is not used/implemented
- Issue #34: pk_column_override and pk_column_override_mergeonly with uppercase columns fails
- Issue #35: datalake_source is only created with a new table, not added to an already existing one
- Issue #36: sqoop mappers not based on history
- Issue #37: changing HDFS_Basedir doesn't trigger an alter of the Import table
- Issue #38: Wrong row count on exported tables

**Improvements**

- HDFS basedir is configurable in the configuration table

**Changed behavior**

- Configuration for HDFS is moved to the configuration table in MySQL
- Configuration for Sqoop mappers is moved to the configuration table in MySQL

**New Features**

- Multi-cluster imports (synchronous only)
- *full_insert* import method

v0.51
------------------------------

**Fixed issues**

- Issue #29: Duplicate column in statistics when changing import type without reset

**Improvements**

- Possible to specify Java Heap for Export operations

**Changed behavior**

- *hive_merge_heap* column in *import_tables* sets Java Heap for the entire Hive session, not just for Merge operations.

**New Features**

- Airflow integration

v0.50
------------------------------

**Fixed issues**

- Issue #26: Schema changes in configuration database are not handled
- Issue #27: String export to MSSQL into varchar gets converted every time
- Issue #28: Update column description on exported MSSQL table fails

**Improvements**

- resetIncrementalImport is added to 'manage' in order to clear an incremental import and force the next import to start with an initial import

**Changed behavior**

- Configuration for Hive validation test and extended messages is moved to the configuration table in MySQL

**New Features**

- New import type called 'oracle_flashback_merge' is available. Will use the *Oracle Flashback Version Query* to import changed rows into Hive

v0.42
------------------------------

**Fixed issues**

- Issue #20: Going from Merge to non-merge imports fails because of missing datalake_import column
- Issue #22: Column starting with _ failed if it's part of Primary Key and merge operation is running
- Issue #23: varchar(-1) from MSSQL generates error in Sqoop
- Issue #24: Remove locks by force only in target table
- Issue #25: column with the name 'int' is not supported

**Improvements**

- Removing locks by force is configurable in the configuration table

**Changed behavior**

- Configuration to Hive metastore must be changed to a SQLAlchemy connection string stored in the setting *hive_metastore_alchemy_conn*

**New Features**

- Hive Metastore SQL connection now uses SQLAlchemy. This enables more than MySQL as database type for Hive Metastore

v0.41.1
------------------------------

**Fixed issues**

- Issue #17: Oracle Primary Key got columns from Unique key
- Issue #18: Error if Merge run on table with only PK columns
- Issue #19: Hive Merge implicit cast won't work with X number of columns
- Issue #21: _ at the start of the column name generates errors during import

**Improvements**

- Proper error message when table contains no primary key and a merge operation is running

v0.41
-----

**Fixed issues**

- Issue #16: include_in_import for map-column-java is not affected

**Improvements**

- Issue #15: Move JDBC Driver config to database

**New Features**

- Functions to add import tables by searching for tables in source that we don't already have
- Functions to add export tables by searching for tables in Hive that we don't already have

v0.40
-----

**Fixed issues**

- Issue #14: force_string settings in import_columns was not used

**New Features**

- Exports to MSSQL, Oracle, MySQL and DB2 are fully supported

v0.30
-----

**Fixed issues**

- Issue #13: sqoop_query not respected
- Issue #12: Include_in_import not respected
- Issue #11: Oracle Number(>10) column having java_column_type = Integer
- Issue #10: MySQL decimal columns get created without precision

**New Features**

- Ability to override the name and type of the column in Hive
- It's now possible to select where the number of rows used for validation comes from: sqoop or query
- Support for Merge operation during ETL Phase, including History Audit tables
- Import supports command options -I, -C and -E for running only Import, Copy or ETL Phase

**Changed behavior**

- *Stage 1* is renamed to *Import Phase*. The -1 command option still works against *import* for compatibility
- *Stage 2* is renamed to *ETL Phase*. The -2 command option still works against *import* for compatibility
- The values in the column *sqoop_options* in *import_tables* will be converted to lowercase before being added to sqoop

v0.21
-----

**Fixed issues**

- Issue #9: PK with spaces in column name fails on --split-by
- Issue #8: Column names with two spaces after each other fail in sqoop
- Issue #6: MySQL can't handle " around column names

**New Features**

- You can limit the number of sqoop mappers globally on a database connection by specifying a positive value in the column *max_import_sessions*
- Import statistics are stored in the tables *import_statistics* and *import_statistics_last*

v0.20
-----

**Fixed issues**

- Issue #5: Message about 'split-by-text' even if the column is an integer
- Issue #4: Parquet can't handle SPACE in column name
- Issue #3: TimeCheck fails before 10.00
- Issue #2: 'sqoop_sql_where_addition' assumes 'where' is in config
- Issue #1: Errors when running without a valid Kerberos ticket

**New Features**

- Incremental Imports are now supported
- Encryption of username/password with manage --encryptCredentials
- Repair of incremental import with manage --repairIncrementalImport
- Repair of all failed incremental imports with manage --repairAllIncrementalImports
- It's possible to ignore the timeWindow by adding --ignoreTime to the import command
- You can force an import to start from the beginning by adding --resetStage to the import command