Release Notes
=============

v0.81 (not released)
------------------------------

**New Features**

  - Added tracking of the YARN application name for each DBImport job. The data is available in a new table called *yarn_statistics*
  - Support for DBImport on AWS using Glue catalog, EMR and MWAA

**Fixed issues**

  - Issue #101: Invalid column in Index


v0.80 
------------------------------

**New Features**

  - Ability to use Spark as the ETL engine instead of Hive (experimental)
  - Support for SQLAnywhere databases
  - Support for Spark3
  - Support for Iceberg as a file format (requires Spark3 and Spark as ETL)
  - Impala metadata validation after Import is completed

v0.70 
------------------------------

**New Features**

  - Support for import and export to Snowflake
  - Support for writing Hive data to AWS S3 bucket
  - Airflow email support for failed DAGs

v0.68 
------------------------------

**Fixed issues**

  - Issue #85: Incremental import of empty database might result in duplicate rows
  - Issue #86: Time start/stop on jdbc_connections does not follow scheduled timezone
  - Issue #87: Time start/stop on jdbc_connections does not follow scheduled timezone
  - Issue #88: Column comments from Oracle are missing
  - Issue #90: "No common columns to perform merge on" during Deploy
  - Issue #91: MySQL Connection leak in copy operations

**New Features**

  - Source table information, such as indexes, is fetched from the source database and stored for informational use
  - Deploy tool to handle deployments of DAGs and affected tables/columns/connections to a remote DBImport instance
  - Information about started and finished imports, exports and Airflow DAGs can be sent in JSON format to Kafka and REST interfaces


v0.67 
------------------------------

**New Features**

  - Support for imports with MSSQL Change Tracking functions


v0.66
------------------------------

**Fixed issues**

  - Issue #43: incr_mode in import_tables is not an Enum column
  - Issue #74: Tasks from custom_dags are not using the retries setting
  - Issue #75: Add alias argument to manage script for encryptCredentials
  - Issue #76: SQL 'where' in export_tables is not working
  - Issue #77: Default value of etl_phase_type in import_tables is missing
  - Issue #78: ResetIncrementalExport does not truncate table
  - Issue #79: Airflow pool gets a duplicate error if the upper/lower case of the hostname differs

**New Features**

  - Custom SQL validation
  - Support for timezone
  - Performance of the Atlas integration in the DBImport server has been improved and can now scale with multiple threads


v0.65
------------------------------

**Fixed issues**

  - Issue #61: Import from View on MSSQL fails with no columns
  - Issue #62: DBImport Server connection retries against config database
  - Issue #63: Control executors for Spark export
  - Issue #64: Initial incremental export with 0 rows fails
  - Issue #65: Python error when missing permissions in Atlas
  - Issue #66: Setting a timeslot where start is larger than stop gives a configuration error
  - Issue #67: Unneeded column change during export to Oracle
  - Issue #68: dropExportTable generates error if table does not exist
  - Issue #69: schedule_interval columns with different size
  - Issue #70: Support for authentication source for Mongo connections
  - Issue #71: testConnection against Mongo does not work
  - Issue #72: Incremental import of empty tables fails because no rows are stored from the source

**Changed behavior**

  - Hive connection supports multiple servers.
  - Poke interval for Multi-cluster sensors is changed from 30 sec to 5 min
  - You can force DBImport to process all steps for incremental imports even if there is no new data
  - Encrypting the password for JDBC connections supports '-a' as an argument for the connection name

**New Features**

  - Atlas Integration
  - Import from MongoDB is supported with Spark as the import tool
  - Export to PostgreSQL
  - Oracle CDC Import supports History table
  - Possible to force a major compaction after an import that uses Hive Merge
  - Support for anonymization of columns during import

v0.64
------------------------------

**Fixed issues**

  - Issue #59: Export update_table fails because table is not empty

**New Features**

  - Spark supported for both import and exports (TechPreview)

v0.63
------------------------------

**Fixed issues**

  - Issue #60: Export tries to alter column types FLOAT(*)

**Improvements**

  - Better description of parameters in *manage* command

**Changed behavior**

  - Kerberos ticket is created and handled by DBImport internally. A valid ticket is no longer needed before starting

**New Features**

  - Dedicated *copy* command
  - Sqoop column type can be overridden with a setting in import_columns

v0.62
------------------------------

**New Features**

  - Multi-cluster imports with asynchronous copy mode
  - DBImport server daemon. This is the service that handles asynchronous copy of data between clusters

v0.61
------------------------------

**Fixed issues**

  - Issue #39: Export fails when sqoop times out against Kafka for Atlas info
  - Issue #40: Creating Airflow Pools fails when pool table is empty in Airflow
  - Issue #41: Error when creating DBImport database
  - Issue #42: Airflow Tasks fail 'In Main' if there is a dependency to a DBImport Task
  - Issue #44: Importing a table with a column called 'const' is not supported
  - Issue #45: Retries sometimes fail due to Hive connection
  - Issue #46: Exporting from a Hive table that doesn't exist gives errors
  - Issue #47: Get rowcount fails if column for incremental load is a reserved word
  - Issue #48: Column names containing # fails on column not found
  - Issue #49: Importing ‘time’ columns from MSSQL fails
  - Issue #50: SQL Server connection with encryption uses wrong JDBC driver
  - Issue #51: sqoop_sql_where_statement with validation = query fails with double where statements
  - Issue #52: column type 'long' in oracle gets wrong column type in Hive
  - Issue #53: No logging of forced removal of locks 
  - Issue #54: DB2 clob columns are not mapped to String in sqoop
  - Issue #55: DB2 import with column type time(3) result in null values
  - Issue #56: timestamp columns from MSSQL will result in NULL values
  - Issue #58: merge operations only look at mergeonly override for PK

**Improvements**

  - Foreign Keys can be disabled per table or connection 

v0.60
------------------------------

**Fixed issues**

  - Issue #30: manage generates error when no valid Kerberos ticket available
  - Issue #31: Oracle Flashback imports get Merge cardinality_violation
  - Issue #32: Airflow sensor never times out
  - Issue #33: truncate_hive column in import_tables is not used/implemented
  - Issue #34: pk_column_override and pk_column_override_mergeonly with uppercase columns fails
  - Issue #35: datalake_source is only created with a new table, not added to an already existing one
  - Issue #36: sqoop mappers not based on history
  - Issue #37: changing HDFS_Basedir doesn't trigger an alter of the Import table
  - Issue #38: Wrong row count on exported tables

**Improvements**

  - HDFS basedir is configurable in the configuration table

**Changed behavior**

  - Configuration for HDFS is moved to the configuration table in MySQL
  - Configuration for Sqoop mappers is moved to the configuration table in MySQL

**New Features**

  - Multi-cluster imports (synchronous only)
  - *full_insert* import method

v0.51
------------------------------

**Fixed issues**

  - Issue #29: Duplicate column in statistics when changing import type without reset

**Improvements**

  - Possible to specify Java Heap for Export operations

**Changed behavior**

  - *hive_merge_heap* column in *import_tables* sets Java Heap for the entire Hive session, not just for Merge operations.

**New Features**

  - Airflow integration 

v0.50
------------------------------

**Fixed issues**

  - Issue #26: Schema changes in configuration database is not handled
  - Issue #27: String export to MSSQL into varchar gets converted every time
  - Issue #28: Update column description on exported MSSQL table fails

**Improvements**

  - resetIncrementalImport is added to 'manage' in order to clear an incremental import and force the next import to start with an initial import

**Changed behavior**

  - Configuration for Hive validation test and extended messages is moved to the configuration table in MySQL

**New Features**

  - New import type called 'oracle_flashback_merge' is available. It uses the *Oracle Flashback Version Query* to import changed rows into Hive

v0.42
------------------------------

**Fixed issues**

  - Issue #20: Going from Merge to non-merge imports fails because missing datalake_import column
  - Issue #22: Column starting with _ failed if it's part of Primary Key and merge operation is running
  - Issue #23: varchar(-1) from MSSQL generates error in Sqoop
  - Issue #24: Remove locks by force only in target table
  - Issue #25: column with the name 'int' is not supported

**Improvements**

  - Removing locks by force is configurable in the configuration table

**Changed behavior**

  - Configuration to Hive metastore must be changed to a SQLAlchemy connection string stored in the setting *hive_metastore_alchemy_conn* 

**New Features**

  - Hive Metastore SQL connection now uses SQLAlchemy. This enables database types other than MySQL for the Hive Metastore


v0.41.1
------------------------------

**Fixed issues**

  - Issue #17: Oracle Primary Key got columns from Unique key
  - Issue #18: Error if Merge run on table with only PK columns
  - Issue #19: Hive Merge implicit cast won't work with X number of columns
  - Issue #21: _ at the start of the column name generates errors during import

**Improvements**

  - Proper error message when table contains no primary key and a merge operation is running

v0.41
-----

**Fixed issues**

  - Issue #16: include_in_import for map-column-java is not affected

**Improvements**

  - Issue #15: Move JDBC Driver config to database

**New Features**

  - Functions to add import tables by searching for tables in the source that we don't already have
  - Functions to add export tables by searching for tables in Hive that we don't already have

v0.40
-----

**Fixed issues**

  - Issue #14: force_string settings in import_columns was not used

**New Features**

  - Exports to MSSQL, Oracle, MySQL and DB2 are fully supported


v0.30
-----

**Fixed issues**

  - Issue #13: sqoop_query not respected
  - Issue #12: Include_in_import not respected
  - Issue #11: Oracle Number(>10) column having java_column_type = Integer
  - Issue #10: MySQL decimal columns gets created without precision

**New Features**

  - Ability to override the name and type of the column in Hive
  - It's now possible to select where the number of rows for the validation is taken from: sqoop or query
  - Support for Merge operation during ETL Phase, including History Audit tables
  - Import supports command options -I, -C and -E for running only Import, Copy or ETL Phase

**Changed behavior**

  - *Stage 1* is renamed to *Import Phase*. The -1 command option still works with *import* for compatibility
  - *Stage 2* is renamed to *ETL Phase*. The -2 command option still works with *import* for compatibility
  - The values in the column *sqoop_options* in *import_tables* will be converted to lowercase before being added to sqoop

v0.21
-----

**Fixed issues**

  - Issue #9: PK with spaces in column name fails on --split-by
  - Issue #8: Column names with two spaces after each other fail in sqoop
  - Issue #6: MySQL can't handle " around column names

**New Features**

  - You can limit the number of sqoop mappers globally on a database connection by specifying a positive value in the column *max_import_sessions*
  - Import statistics are stored in the tables *import_statistics* and *import_statistics_last*

v0.20
-----

**Fixed issues**

  - Issue #5: Message about 'split-by-text' even if the column is an integer
  - Issue #4: Parquet can't handle SPACE in column name
  - Issue #3: TimeCheck fails before 10.00
  - Issue #2: 'sqoop_sql_where_addition' assumes 'where' is in config
  - Issue #1: Errors when running without a valid Kerberos ticket

**New Features**

  - Incremental Imports are now supported
  - Encryption of username/password with manage --encryptCredentials
  - Repair of incremental import with manage --repairIncrementalImport
  - Repair of all failed incremental imports with manage --repairAllIncrementalImports
  - It's possible to ignore the timeWindow by adding --ignoreTime to the import command
  - You can force an import to start from the beginning by adding --resetStage to the import command