Configure Iceberg as a target
With an Iceberg metastore agent, you can migrate your source Hive metadata to an Apache Iceberg catalog within IBM Watsonx.data, where the tables are stored in the Apache Iceberg table format.
Prerequisites
- If your migration includes column addition operations, ensure hive.metastore.disallow.incompatible.col.type.changes is set to false in your Watsonx.data target metastore-site.xml.
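As a minimal sketch, the property from the prerequisite above might appear in metastore-site.xml as follows. This assumes the file uses the same Hadoop XML configuration format as the hive-site.xml examples later on this page; any other properties already present in your file are omitted here.

```xml
<configuration>
  <!-- Assumption: shown in isolation; keep this alongside your existing properties. -->
  <!-- Setting this to false lets the metastore accept column changes it would otherwise reject as incompatible. -->
  <property>
    <name>hive.metastore.disallow.incompatible.col.type.changes</name>
    <value>false</value>
  </property>
</configuration>
```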
Limitations
The following source table formats are supported:
- Parquet.
- ORC Hive.
The following Apache Iceberg catalog types are currently supported: Apache Hive.
The following target filesystems are currently supported: S3-compatible targets.
Transaction support: Full ACID transactions are not currently supported. Insert-only transactions are supported.
Historical metadata retention limit:
- The default and recommended maximum number of previous metadata versions to retain is 200 snapshots. Increasing beyond this recommended value may cause errors and undesired behaviour.
Hive Compaction:
- Using Hive compaction results in Data Migrator removing the files that compaction replaces from the target. As a result, time travel queries no longer work correctly on the Iceberg target, because the old files no longer exist and so cannot be included in a manifest list for an earlier snapshot.
Unsupported migration functionality
Functionality |
---|
ORC files generated by Hive versions pre 2.0.0 |
Hive 3.x ACID transactional tables |
Hive constraints |
Indexes |
Functions |
Views |
Materialized Views |
Schema evolution involving column renames or data type changes, either in the past or while migrating. (Schema evolution involving add, drop or reordering columns is supported if supported on source.) |
TBLPROPERTIES are not migrated from Hive to Iceberg |
Target snapshot expiry and garbage collection are not migrated by Hivemigrator and should be configured on the target if required |
For drop-create rename operations, see the related Known issue for more information.
Supported partition column types
Partition column type |
---|
boolean |
integer |
bigint |
float |
double |
string* (converts to varchar) |
binary |
decimal |
date |
* STRING type columns and partitions are migrated to Iceberg but are converted to the VARCHAR type.
Configure an Iceberg metastore agent
You can add an Apache Iceberg metastore agent with either the UI or the CLI.
In this release, only the Apache Hive Catalog Type is supported.
Add an Iceberg metastore agent
To add an Iceberg agent with the UI:
- From the Dashboard, select an instance under Instances.
- Under Filesystems & Agents, select Metastore Agents.
- Select Connect to Metastore.
- Select the filesystem.
- Select Iceberg as the Metastore Type.
- Enter a Display Name.
- Select/confirm Apache Hive as the Catalog Type.
- Enter the name of your Iceberg catalog under Catalog Name.
- Enter the local path to a hive-site.xml file containing additional Iceberg Hive configuration in the Configuration Path field. Ensure the user running Data Migrator can access this path.
- Enter the name used to connect to the Iceberg Hive metastore under Hive Metastore Username.
- Enter the URI of your Iceberg Hive metastore thrift endpoint under Metastore URI. Include the scheme (for example, thrift://<host>:<port>).
- Enter the location on the target storage where the Iceberg metadata, manifest and snapshot files will reside under Warehouse Directory (for example, /warehouse).
Note: The Warehouse Directory path supplied should not reside under a migrated directory with Target Match enabled, as Target Match will attempt to match the source and target and remove the metadata files.
- (Optional) Enter a filesystem URI into Default Filesystem Override to override the default filesystem URI.
Ensure you use the correct Catalog Name, as your agent may initially appear healthy even when an invalid value is used.
Update an existing Iceberg metastore agent
Use the Update an Iceberg metastore agent with the CLI section to update your existing Iceberg metastore agent.
An Iceberg agent's health check status may be reported incorrectly if the agent is updated repeatedly. See the related Known issue for more information.
Remember to define your target filesystem and add any accompanying data migrations for the tables and databases you need to migrate.
Additional Iceberg Hive configuration
Specify any additional configuration required to connect to your specific Watsonx.data Iceberg Hive Catalog instance using a hive-site.xml file in the Hadoop XML configuration format.
Supply this configuration when adding your agent, using either the Configuration Path field in the UI or the --config-path parameter in the CLI.
See the examples below for some common types of configuration which may be required depending on your specific Watsonx.data instance.
Ensure the user running Data Migrator can access the path and file specified when you supply additional configuration.
Example: Provide target metastore security credentials
The example below uses client configuration to specify the authentication mode, username and password required to connect to the target metastore. The example specifically demonstrates use of a JCEKS credential provider file used to store the security credential.
<configuration>
  <property>
    <name>hive.metastore.client.auth.mode</name>
    <value>PLAIN</value>
  </property>
  <property>
    <name>hive.metastore.client.plain.username</name>
    <value>metastoreuser1</value>
  </property>
  <property>
    <name>hadoop.security.credential.provider.path</name>
    <value>localjceks://file/etc/cirata/hivemigrator/watsonx_truststore/wandisco-watsonx.jceks</value>
  </property>
  ...
  ...
</configuration>
Example: SSL configuration
If your Watsonx.data Hive Catalog metastore provides a certificate, supply additional configuration so that your Iceberg agent trusts this certificate.
<configuration>
  <property>
    <name>hive.metastore.truststore.type</name>
    <value>JKS</value>
  </property>
  <property>
    <name>hive.metastore.truststore.path</name>
    <value>file:///etc/cirata/hivemigrator/watsonx_truststore/cacerts</value>
  </property>
  <property>
    <name>hive.metastore.truststore.password</name>
    <value>changeme</value>
  </property>
  ...
  ...
</configuration>
Add an Iceberg metastore agent with the CLI
To add an Iceberg agent with the CLI, use the hive agent add iceberg CLI command:
hive agent add iceberg --catalog-name catalog_cat1 --config-path /etc/hadoop/watsonx/ --username ibmlhadmin --metastore-uri thrift://my.thrift.host:9083 --file-system-id aws-target --warehouse-dir / --catalog-type HIVE --name SUPERAGENT
Ensure you use the correct --catalog-name value, as your agent may initially appear healthy even when an invalid value is used.
Update an Iceberg metastore agent with the CLI
To update an Iceberg agent with the CLI, use the hive agent configure iceberg CLI command:
hive agent configure iceberg --name ice1 --username admin2
Next steps
If you have already added Metadata Rules, create a Metadata Migration.
You can also add metadata rules with the hive rule add CLI command to define the scope, then create a metadata migration with hive migration add.