Release Notes

v0.80

New Features

  • Ability to use Spark as the ETL engine instead of Hive (experimental)
  • Support for SQLAnywhere databases
  • Support for Spark3
  • Support for Iceberg as a file format (requires Spark3 and Spark as the ETL engine)
  • Impala metadata validation after Import is completed

v0.70

New Features

  • Support for import from and export to Snowflake
  • Support for writing Hive data to an AWS S3 bucket
  • Airflow email support for failed DAGs

v0.68

Fixed issues

  • Issue #85: Incremental import of empty database might result in duplicate rows
  • Issue #86: Time start/stop on jdbc_connections does not follow scheduled timezone
  • Issue #87: Time start/stop on jdbc_connections does not follow scheduled timezone
  • Issue #88: Column comments from Oracle are missing
  • Issue #90: “No common columns to perform merge on” during Deploy
  • Issue #91: MySQL Connection leak in copy operations

New Features

  • Source table information such as indexes is fetched from the source database and stored for informational use
  • Deploy tool to handle deployments of DAGs and affected tables/columns/connections to a remote DBImport instance
  • Information about started and finished imports, exports and Airflow DAGs can be sent in JSON format to Kafka and REST interfaces

v0.67

New Features

  • Support for imports with MSSQL Change Tracking functions

v0.66

Fixed issues

  • Issue #43: incr_mode in import_tables is not an Enum column
  • Issue #74: Tasks from custom_dags are not using the retries setting
  • Issue #75: Add alias argument to manage script for encryptCredentials
  • Issue #76: SQL ‘where’ in export_tables is not working
  • Issue #77: Default value of etl_phase_type in import_tables is missing
  • Issue #78: ResetIncrementalExport does not truncate the table
  • Issue #79: Airflow pool will get a duplicate error if upper/lower case of the hostname differs

New Features

  • Custom SQL validation
  • Support for timezone
  • The performance of the Atlas integration from the DBImport server has been improved and it can now scale across multiple threads

v0.65

Fixed issues

  • Issue #61: Import from a View on MSSQL fails with no columns
  • Issue #62: DBImport Server connection retries against config database
  • Issue #63: Control executors for Spark export
  • Issue #64: Initial incremental export with 0 rows fails
  • Issue #65: Python error when missing permissions in Atlas
  • Issue #66: Setting a timeslot where start is larger than stop gives a configuration error
  • Issue #67: Unneeded column change during export to Oracle
  • Issue #68: dropExportTable generates an error if the table does not exist
  • Issue #69: schedule_interval columns with different sizes
  • Issue #70: Support for authentication source for Mongo connections
  • Issue #71: testConnection against Mongo does not work
  • Issue #72: Incremental import of empty tables fails because no rows are stored from the source

Changed behavior

  • Hive connection supports multiple servers
  • Poke interval for Multi-cluster sensors is changed from 30 sec to 5 min
  • You can force DBImport to process all steps for incremental imports even if they contain no new data
  • Encrypt password for JDBC connections supports ‘-a’ as an argument for the connection name

New Features

  • Atlas Integration
  • Import from MongoDB is supported with Spark as the import tool
  • Export to PostgreSQL
  • Oracle CDC Import supports History table
  • Can force a major compaction after an Import that uses Hive Merge
  • Support for anonymization of columns during import

v0.64

Fixed issues

  • Issue #59: Export update_table fails because the table is not empty

New Features

  • Spark is supported for both imports and exports (Tech Preview)

v0.63

Fixed issues

  • Issue #60: Export tries to alter column types FLOAT(*)

Improvements

  • Better descriptions of the parameters in the manage command

Changed behavior

  • The Kerberos ticket is created and handled internally by DBImport. There is no longer any need to have a valid ticket before starting

New Features

  • Dedicated copy command
  • Sqoop column type can be overridden with a setting in import_columns (see the sketch below)
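
A minimal sketch of this kind of override, done directly against the configuration database. The column names used below (sqoop_column_type and the identifiers in the WHERE clause) are assumptions for illustration and may not match the actual import_columns schema.

    UPDATE import_columns
    SET sqoop_column_type = 'String'   -- assumed name of the override column
    WHERE hive_db = 'mydb'             -- placeholder identifiers for the table and column
      AND hive_table = 'mytable'
      AND column_name = 'mycolumn';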

v0.62

New Features

  • Multi-cluster imports with asynchronous copy mode
  • DBImport server daemon. This is the service that handles asynchronous copy of data between clusters

v0.61

Fixed issues

  • Issue #39: Export fails when sqoop times out against Kafka for Atlas info
  • Issue #40: Creating Airflow Pools fails when the pool table is empty in Airflow
  • Issue #41: Error when creating DBImport database
  • Issue #42: Airflow Tasks fail ‘In Main’ if there is a dependency on a DBImport Task
  • Issue #44: Importing a table with a column called ‘const’ is not supported
  • Issue #45: Retries sometimes fail due to the Hive connection
  • Issue #46: Exporting from a Hive table that doesn’t exist gives errors
  • Issue #47: Get rowcount fails if the column for incremental load is a reserved word
  • Issue #48: Column names containing # fail on column not found
  • Issue #49: Importing ‘time’ columns from MSSQL fails
  • Issue #50: SQL Server connection with encryption uses wrong JDBC driver
  • Issue #51: sqoop_sql_where_statement with validation = query fails with double where statements
  • Issue #52: Column type ‘long’ in Oracle gets the wrong column type in Hive
  • Issue #53: No logging of forced removal of locks
  • Issue #54: DB2 clob columns are not mapped to String in sqoop
  • Issue #55: DB2 import with column type time(3) results in null values
  • Issue #56: timestamp columns from MSSQL will result in NULL values
  • Issue #58: Merge operations only look at the mergeonly override for PK

Improvements

  • Foreign Keys can be disabled per table or connection

v0.60

Fixed issues

  • Issue #30: manage generates an error when no valid Kerberos ticket is available
  • Issue #31: Oracle Flashback imports get Merge cardinality_violation
  • Issue #32: Airflow sensor never times out
  • Issue #33: truncate_hive column in import_tables is not used/implemented
  • Issue #34: pk_column_override and pk_column_override_mergeonly with uppercase columns fail
  • Issue #35: datalake_source is only created with a new table, not added to an already existing one
  • Issue #36: sqoop mappers not based on history
  • Issue #37: changing HDFS_Basedir doesn’t trigger an alter of the Import table
  • Issue #38: Wrong row count on exported tables

Improvements

  • HDFS basedir is configurable in the configuration table

Changed behavior

  • Configuration for HDFS is moved to the configuration table in MySQL
  • Configuration for Sqoop mappers is moved to the configuration table in MySQL

New Features

  • Multi-cluster imports (synchronous only)
  • full_insert import method

v0.51

Fixed issues

  • Issue #29: Duplicate column in statistics when changing import type without reset

Improvements

  • Possible to specify Java Heap for Export operations

Changed behavior

  • hive_merge_heap column in import_tables sets Java Heap for the entire Hive session, not just for Merge operations.

New Features

  • Airflow integration

v0.50

Fixed issues

  • Issue #26: Schema changes in the configuration database are not handled
  • Issue #27: String export to MSSQL into varchar gets converted every time
  • Issue #28: Update column description on exported MSSQL table fails

Improvements

  • resetIncrementalImport is added to ‘manage’ in order to clear an incremental import and force the next import to start with an initial import

Changed behavior

  • Configuration for Hive validation tests and extended messages is moved to the configuration table in MySQL

New Features

  • A new import type called ‘oracle_flashback_merge’ is available. It uses the Oracle Flashback Version Query to import changed rows into Hive

v0.42

Fixed issues

  • Issue #20: Going from Merge to non-merge imports fails because of a missing datalake_import column
  • Issue #22: Columns starting with _ fail if they are part of the Primary Key and a merge operation is running
  • Issue #23: varchar(-1) from MSSQL generates an error in Sqoop
  • Issue #24: Remove locks by force only in the target table
  • Issue #25: A column with the name ‘int’ is not supported

Improvements

  • Removing locks by force is configurable in the configuration table

Changed behavior

  • The configuration for the Hive metastore must be changed to a SQLAlchemy connection string stored in the setting hive_metastore_alchemy_conn (see the sketch below)
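
For illustration only: the setting takes a standard SQLAlchemy URL, shown here for an assumed MySQL-backed Hive metastore. The user, password, host and database names are placeholders.

    dialect+driver://username:password@host:port/database
    mysql://dbimport:secret@metastore-db.example.com:3306/hive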

New Features

  • The Hive Metastore SQL connection now uses SQLAlchemy. This enables database types other than MySQL for the Hive Metastore

v0.41.1

Fixed issues

  • Issue #17: Oracle Primary Key got columns from Unique key
  • Issue #18: Error if Merge is run on a table with only PK columns
  • Issue #19: Hive Merge implicit cast won’t work with X number of columns
  • Issue #21: _ at the start of the column name generates errors during import

Improvements

  • Proper error message when a table contains no primary key and a merge operation is running

v0.41

Fixed issues

  • Issue #16: include_in_import for map-column-java is not affected

Improvements

  • Issue #15: Move JDBC Driver config to database

New Features

  • Functions to add import tables by searching for tables in the source that we don’t already have
  • Functions to add export tables by searching for tables in Hive that we don’t already have

v0.40

Fixed issues

  • Issue #14: The force_string setting in import_columns was not used

New Features

  • Exports to MSSQL, Oracle, MySQL and DB2 are fully supported

v0.30

Fixed issues

  • Issue #13: sqoop_query not respected
  • Issue #12: Include_in_import not respected
  • Issue #11: Oracle Number(>10) column having java_column_type = Integer
  • Issue #10: MySQL decimal columns get created without precision

New Features

  • Ability to override the name and type of the column in Hive
  • It’s now possible to select where to get the number of rows for validation: sqoop or query
  • Support for Merge operation during ETL Phase, including History Audit tables
  • Import supports command options -I, -C and -E for running only Import, Copy or ETL Phase

Changed behavior

  • Stage 1 is renamed to Import Phase. The -1 command option still works against import for compatibility
  • Stage 2 is renamed to ETL Phase. The -2 command option still works against import for compatibility
  • The values in the column sqoop_options in import_tables will be converted to lowercase before being added to sqoop

v0.21

Fixed issues

  • Issue #9: PK with spaces in the column name fails on --split-by
  • Issue #8: Column names with two consecutive spaces fail in sqoop
  • Issue #6: MySQL can’t handle " around column names

New Features

  • You can limit the number of sqoop mappers globally on a database connection by specifying a positive value in the column max_import_sessions (see the sketch below)
  • Import statistics are stored in the tables import_statistics and import_statistics_last
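
A minimal sketch of setting this limit directly in the configuration database. The dbalias column used to identify the connection is an assumption for illustration and may differ in the actual jdbc_connections schema.

    UPDATE jdbc_connections
    SET max_import_sessions = 4       -- limit sqoop to four mappers for this connection
    WHERE dbalias = 'source_system';  -- assumed name of the connection-name column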

v0.20

Fixed issues

  • Issue #5: Message about ‘split-by-text’ even if the column is an integer
  • Issue #4: Parquet can’t handle SPACE in column names
  • Issue #3: TimeCheck fails before 10.00
  • Issue #2: ‘sqoop_sql_where_addition’ assumes ‘where’ is in config
  • Issue #1: Errors when running without a valid Kerberos ticket

New Features

  • Incremental Imports are now supported
  • Encryption of username/password with manage --encryptCredentials
  • Repair of an incremental import with manage --repairIncrementalImport
  • Repair of all failed incremental imports with manage --repairAllIncrementalImports
  • It’s possible to ignore the timeWindow by adding --ignoreTime to the import command
  • You can force an import to start from the beginning by adding --resetStage to the import command