Iceberg Agent Overview
Data Migrator supports Iceberg, letting you migrate your source Hive metadata to a target Iceberg-supported catalog. Review the following prerequisites and functionality before configuring your agent.
Prerequisites
- Connection to an Apache Iceberg Hive Catalog on watsonx.data or a REST Catalog.
- For Hive Catalog on watsonx.data, the target filesystem must be S3 compatible.
- If your migration includes column addition operations, ensure hive.metastore.disallow.incompatible.col.type.changes is set to false in your target Hive Metastore configuration, either a hive-site.xml or a metastore-site.xml.
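As a rough illustration of the last prerequisite, the following Python sketch checks whether a Hive configuration file explicitly sets the property to false. The function name and the sample XML are hypothetical and are not part of Data Migrator. The sketch assumes the standard Hadoop-style configuration/property/name/value layout used by hive-site.xml and metastore-site.xml.

```python
import xml.etree.ElementTree as ET

# Property named in the prerequisite above.
PROPERTY = "hive.metastore.disallow.incompatible.col.type.changes"


def incompatible_col_changes_allowed(site_xml: str) -> bool:
    """Return True only if the property is explicitly set to false.

    Hypothetical helper: `site_xml` is the text of a hive-site.xml or
    metastore-site.xml file in the usual Hadoop configuration layout.
    """
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == PROPERTY:
            return (prop.findtext("value") or "").strip().lower() == "false"
    # Property absent: the metastore falls back to its own default,
    # so report that it has not been explicitly set to false.
    return False


# Illustrative configuration text, not taken from a real deployment.
sample = """<configuration>
  <property>
    <name>hive.metastore.disallow.incompatible.col.type.changes</name>
    <value>false</value>
  </property>
</configuration>"""

print(incompatible_col_changes_allowed(sample))
```

A check like this could run before starting a migration that is expected to add columns. It only inspects the file contents and does not confirm what a running metastore has actually loaded.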
Limitations
The following source table formats are supported:
- Parquet.
- ORC Hive.
Transaction support: Full ACID transactions are not currently supported. Insert-only transactions are supported.
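To make the transaction distinction concrete, here is a minimal Python sketch that classifies a source table from its Hive table parameters. It assumes the common Hive convention in which transactional tables carry transactional=true and insert-only tables additionally carry transactional_properties=insert_only. The function name is hypothetical, and this is not how Data Migrator itself detects table types.

```python
def classify_transactional(params: dict) -> str:
    """Classify a Hive table from its table parameters.

    Hypothetical helper. Assumes the usual Hive convention:
      transactional=true                       -> transactional table
      transactional_properties=insert_only     -> insert-only variant
    """
    if str(params.get("transactional", "false")).lower() != "true":
        return "non-transactional"
    if str(params.get("transactional_properties", "")).lower() == "insert_only":
        return "insert-only"
    # Transactional but not insert-only: treated here as full ACID,
    # which the text above lists as not currently supported.
    return "full-acid"


# Illustrative parameter sets, not taken from a real metastore.
print(classify_transactional({"transactional": "true",
                              "transactional_properties": "insert_only"}))
print(classify_transactional({"transactional": "true"}))
print(classify_transactional({}))
```

Tables reported as full-acid by a check like this would be candidates to exclude from a migration or to convert beforehand.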
Historical metadata retention limit:
- The default and recommended maximum number of previous metadata versions to retain is 200 snapshots. Increasing beyond this recommended value may cause errors and undesired behaviour.
Hive Compaction:
- When Hive compaction runs on the source, Data Migrator removes the compacted files from the target. Time travel queries will then no longer work correctly on the Iceberg target, because the old files no longer exist and cannot be included in a manifest list for an earlier snapshot.
Unsupported migration functionality
Functionality |
---|
ORC files generated by Hive versions earlier than 2.0.0 |
Hive 3.x ACID transactional tables |
Hive constraints |
Indexes |
Functions |
Views |
Materialized Views |
Schema evolution involving column renames or data type changes, either in the past or while migrating. (Schema evolution involving add, drop or reordering columns is supported if supported on source.) |
TBLPROPERTIES are not migrated from Hive to Iceberg |
Target snapshot expiry and garbage collection are not migrated by Hivemigrator and should be configured on the target if required |
info
For drop-create rename operations, see the related Known issue for more information.
caution
Resetting an Iceberg migration will cause all tables to remigrate.
Supported partition column types
Partition column type |
---|
boolean |
integer |
bigint |
float |
double |
string* (converts to varchar) |
binary |
decimal |
date |
* STRING type columns and partitions are migrated to Iceberg, but are converted to VARCHAR type.
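The table above can be sketched as a simple lookup, for example to pre-check partition columns before a migration. The function below is a hypothetical helper, not part of Data Migrator. It assumes type names arrive as Hive type strings such as decimal(10,2), and it treats int as an alias of integer, which is an assumption made for this example.

```python
# Partition column types listed in the table above.
SUPPORTED_PARTITION_TYPES = {
    "boolean", "integer", "bigint", "float", "double",
    "string", "binary", "decimal", "date",
}


def target_partition_type(hive_type: str):
    """Return the type expected on the target, or None if unsupported.

    Hypothetical helper: strips any parameters such as (10,2) before
    the lookup, and maps string to varchar as noted in the table.
    """
    normalized = hive_type.strip().lower()
    base = normalized.split("(", 1)[0].strip()
    if base == "int":  # assumption: treat int as an alias of integer
        base = "integer"
    if base not in SUPPORTED_PARTITION_TYPES:
        return None
    return "varchar" if base == "string" else normalized


print(target_partition_type("STRING"))
print(target_partition_type("decimal(10,2)"))
print(target_partition_type("timestamp"))
```

Returning None for anything outside the table keeps the check conservative: a type is only reported as migratable if it appears in the supported list.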
Next steps
Configure your Iceberg Metadata Agent for Hive Catalog or Iceberg Metadata Agent for REST Catalog.