Release Notes

v0.81 (not released)

New Features

  • Add tracking of the YARN application name for each DBImport job. The data is available in a new table called yarn_statistics

  • Support for DBImport on AWS using Glue catalog, EMR and MWAA

Fixed issues

  • Issue #101: Invalid column in Index

v0.80

New Features

  • Ability to use Spark as the ETL engine instead of Hive (experimental)

  • Support for SQLAnywhere databases

  • Support for Spark3

  • Support for Iceberg as a file format (requires Spark3 and Spark as ETL)

  • Impala metadata validation after Import is completed

v0.70

New Features

  • Support for import and export to Snowflake

  • Support for writing Hive data to AWS S3 bucket

  • Airflow email support for failed DAGs

v0.68

Fixed issues

  • Issue #85: Incremental import of empty database might result in duplicate rows

  • Issue #86: Time start/stop on jdbc_connections does not follow scheduled timezone

  • Issue #87: Time start/stop on jdbc_connections does not follow scheduled timezone

  • Issue #88: Column comments from Oracle are missing

  • Issue #90: “No common columns to perform merge on” during Deploy

  • Issue #91: MySQL Connection leak in copy operations

New Features

  • Source table information, such as indexes, is fetched from the source database and stored for informational use

  • Deploy tool to handle deployments of DAGs and affected tables/columns/connections to a remote DBImport instance

  • Information about started and finished imports, exports and Airflow DAGs can be sent in JSON format to Kafka and REST interfaces

v0.67

New Features

  • Support for imports with MSSQL Change Tracking functions

v0.66

Fixed issues

  • Issue #43: incr_mode in import_tables is not an Enum column

  • Issue #74: Tasks from custom_dags are not using the retries setting

  • Issue #75: Add alias argument to manage script for encryptCredentials

  • Issue #76: SQL ‘where’ in export_tables is not working

  • Issue #77: Default value of etl_phase_type in import_tables is missing

  • Issue #78: ResetIncrementalExport doesn’t truncate the table

  • Issue #79: Airflow pool will get a duplicate error if the upper/lower case of the hostname differs

New Features

  • Custom SQL validation

  • Support for timezone

  • Atlas integration performance from the DBImport server has been improved and can now scale with multiple threads

v0.65

Fixed issues

  • Issue #61: Import from View on MSSQL fails with no columns

  • Issue #62: DBImport Server connection retries against config database

  • Issue #63: Control executors for Spark export

  • Issue #64: Initial incremental export with 0 rows fails

  • Issue #65: Python error when missing permissions in Atlas

  • Issue #66: Setting a timeslot where start is larger than stop gives a configuration error

  • Issue #67: Unneeded column change during export to Oracle

  • Issue #68: dropExportTable generates an error if the table does not exist

  • Issue #69: schedule_interval columns with different size

  • Issue #70: Support for authentication source for Mongo connections

  • Issue #71: testConnection against Mongo does not work

  • Issue #72: Incremental import of empty tables fails because no rows are stored from the source

Changed behavior

  • Hive connection supports multiple servers.

  • Poke interval for Multi-cluster sensors is changed from 30 sec to 5 min

  • You can force DBImport to process all steps for incremental imports even if there is no new data.

  • Encrypting passwords for JDBC connections supports ‘-a’ as an argument for the connection name

New Features

  • Atlas Integration

  • Import from MongoDB is supported with Spark as import tool

  • Export to PostgreSQL

  • Oracle CDC Import supports History table

  • Can force a major compaction after an Import that uses Hive Merge

  • Support for anonymization of columns during import

v0.64

Fixed issues

  • Issue #59: Export update_table fails because the table is not empty

New Features

  • Spark is supported for both imports and exports (TechPreview)

v0.63

Fixed issues

  • Issue #60: Export tries to alter column types FLOAT(*)

Improvements

  • Better descriptions of the parameters in the manage command

Changed behavior

  • The Kerberos ticket is created and handled by DBImport internally. There is no longer any need to have a valid ticket before starting

New Features

  • Dedicated copy command

  • Sqoop column type can be overridden with a setting in import_columns

v0.62

New Features

  • Multi-cluster imports with asynchronous copy mode

  • DBImport server daemon. This is the service that handles asynchronous copy of data between clusters

v0.61

Fixed issues

  • Issue #39: Export fails when sqoop times out against Kafka for Atlas info

  • Issue #40: Creating Airflow Pools fails when the pool table is empty in Airflow

  • Issue #41: Error when creating DBImport database

  • Issue #42: Airflow Tasks fail ‘In Main’ if there is a dependency on a DBImport Task

  • Issue #44: Importing a table with a column called ‘const’ is not supported

  • Issue #45: Retries sometimes fail due to the Hive connection

  • Issue #46: Exporting from a Hive table that doesn’t exist gives errors

  • Issue #47: Get rowcount fails if the column for incremental load is a reserved word

  • Issue #48: Column names containing # fail on column not found

  • Issue #49: Importing ‘time’ columns from MSSQL fails

  • Issue #50: SQL Server connection with encryption uses wrong JDBC driver

  • Issue #51: sqoop_sql_where_statement with validation = query fails with double where statements

  • Issue #52: Column type ‘long’ in Oracle gets the wrong column type in Hive

  • Issue #53: No logging of forced removal of locks

  • Issue #54: DB2 clob columns are not mapped to String in sqoop

  • Issue #55: DB2 import with column type time(3) results in null values

  • Issue #56: timestamp columns from MSSQL will result in NULL values

  • Issue #58: merge operations only look at mergeonly override for PK

Improvements

  • Foreign Keys can be disabled per table or connection

v0.60

Fixed issues

  • Issue #30: manage generates an error when no valid Kerberos ticket is available

  • Issue #31: Oracle Flashback imports get Merge cardinality_violation

  • Issue #32: Airflow sensor never times out

  • Issue #33: truncate_hive column in import_tables is not used/implemented

  • Issue #34: pk_column_override and pk_column_override_mergeonly with uppercase columns fail

  • Issue #35: datalake_source is only created with a new table, not added to an already existing one

  • Issue #36: sqoop mappers not based on history

  • Issue #37: changing HDFS_Basedir doesn’t trigger an alter of the Import table

  • Issue #38: Wrong row count on exported tables

Improvements

  • HDFS basedir is configurable in the configuration table

Changed behavior

  • Configuration for HDFS is moved to the configuration table in MySQL

  • Configuration for Sqoop mappers is moved to the configuration table in MySQL

New Features

  • Multi-cluster imports (synchronous only)

  • full_insert import method

v0.51

Fixed issues

  • Issue #29: Duplicate column in statistics when changing import type without reset

Improvements

  • Possible to specify Java Heap for Export operations

Changed behavior

  • hive_merge_heap column in import_tables sets Java Heap for the entire Hive session, not just for Merge operations.

New Features

  • Airflow integration

v0.50

Fixed issues

  • Issue #26: Schema changes in the configuration database are not handled

  • Issue #27: String export to MSSQL into varchar gets converted every time

  • Issue #28: Update column description on exported MSSQL table fails

Improvements

  • resetIncrementalImport is added to ‘manage’ in order to clear an incremental import and force the next import to start with an initial import (see the usage sketch below)
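
  A minimal usage sketch, assuming the new option follows the same long-option style as the other manage commands; the table-selection arguments are only a placeholder and depend on your installation:

      manage --resetIncrementalImport <table selection>   # next import of the table will run as an initial import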

Changed behavior

  • Configuration for the Hive validation test and extended messages are moved to the configuration table in MySQL

New Features

  • A new import type called ‘oracle_flashback_merge’ is available. It uses the Oracle Flashback Version Query to import changed rows into Hive

v0.42

Fixed issues

  • Issue #20: Going from Merge to non-merge imports fails because of a missing datalake_import column

  • Issue #22: Column starting with _ fails if it’s part of the Primary Key and a merge operation is running

  • Issue #23: varchar(-1) from MSSQL generates an error in Sqoop

  • Issue #24: Remove locks by force only in target table

  • Issue #25: column with the name ‘int’ is not supported

Improvements

  • Removing locks by force is configurable in the configuration table

Changed behavior

  • Configuration for the Hive metastore must be changed to a SQLAlchemy connection string stored in the setting hive_metastore_alchemy_conn (an example of the format is sketched below)
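
  As an illustration, a SQLAlchemy connection string for a MySQL-backed Hive metastore has roughly this shape; the dialect, credentials, hostname, port and database name below are placeholders, not values required by DBImport:

      hive_metastore_alchemy_conn = mysql://metastore_user:secret@metastore-host:3306/hive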

New Features

  • The Hive Metastore SQL connection now uses SQLAlchemy. This enables database types other than MySQL for the Hive Metastore

v0.41.1

Fixed issues

  • Issue #17: Oracle Primary Key got columns from Unique key

  • Issue #18: Error if Merge run on table with only PK columns

  • Issue #19: Hive Merge implicit cast won’t work with X number of columns

  • Issue #21: _ at the start of the column name generates errors during import

Improvements

  • Proper error message when a table contains no primary key and a merge operation is running

v0.41

Fixed issues

  • Issue #16: include_in_import for map-column-java is not affected

Improvements

  • Issue #15: Move JDBC Driver config to database

New Features

  • Functions to add import tables by searching for tables in the source database that we don’t already have

  • Functions to add export tables by searching for tables in Hive that we don’t already have

v0.40

Fixed issues

  • Issue #14: The force_string setting in import_columns was not used

New Features

  • Exports to MSSQL, Oracle, MySQL and DB2 are fully supported

v0.30

Fixed issues

  • Issue #13: sqoop_query not respected

  • Issue #12: Include_in_import not respected

  • Issue #11: Oracle Number(>10) column having java_column_type = Integer

  • Issue #10: MySQL decimal columns get created without precision

New Features

  • Ability to override the name and type of the column in Hive

  • It’s now possible to select where to get the number of rows for validation: sqoop or query

  • Support for Merge operation during ETL Phase, including History Audit tables

  • Import supports the command options -I, -C and -E for running only the Import, Copy or ETL Phase (usage is sketched below)
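
  A minimal sketch of the phase options; the table-selection arguments are only a placeholder and depend on your installation:

      import <table selection> -I    # run only the Import Phase
      import <table selection> -C    # run only the Copy Phase
      import <table selection> -E    # run only the ETL Phase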

Changed behavior

  • Stage 1 is renamed to Import Phase. The -1 command option still works against import for compatibility

  • Stage 2 is renamed to ETL Phase. The -2 command option still works against import for compatibility

  • The values in the column sqoop_options in import_tables will be converted to lowercase before being added to sqoop

v0.21

Fixed issues

  • Issue #9: PK with spaces in the column name fails on --split-by

  • Issue #8: Column names with two spaces after each other fail in sqoop

  • Issue #6: MySQL can’t handle " around column names

New Features

  • You can limit the number of sqoop mappers globally on a database connection by specifying a positive value in the column max_import_sessions

  • Import statistics are stored in the tables import_statistics and import_statistics_last

v0.20

Fixed issues

  • Issue #5: Message about ‘split-by-text’ even if the column is an integer

  • Issue #4: Parquet can’t handle SPACE in column names

  • Issue #3: TimeCheck fails before 10.00

  • Issue #2: ‘sqoop_sql_where_addition’ assumes ‘where’ is in config

  • Issue #1: Errors when running without a valid Kerberos ticket

New Features

  • Incremental Imports are now supported

  • Encryption of username/password with manage --encryptCredentials

  • Repair of an incremental import with manage --repairIncrementalImport

  • Repair of all failed incremental imports with manage --repairAllIncrementalImports

  • It’s possible to ignore the timeWindow by adding --ignoreTime to the import command

  • You can force an import to start from the beginning by adding --resetStage to the import command (a combined usage sketch of these options follows below)
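
  A combined usage sketch of the options above; the table-selection arguments are only placeholders and depend on your installation:

      manage --encryptCredentials                          # encrypt a username/password for a connection
      manage --repairIncrementalImport <table selection>   # repair a single failed incremental import
      manage --repairAllIncrementalImports                 # repair all failed incremental imports
      import <table selection> --ignoreTime                # ignore the configured time window
      import <table selection> --resetStage                # force the import to start from the beginning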