Version: 3.0 (latest)

Configure Iceberg as a target

With an Iceberg metastore agent, you can migrate your source Hive metadata to an Apache Iceberg catalog within IBM Watsonx.data, where it is stored in the Apache Iceberg table format.

Prerequisites

  • If your migration includes column addition operations, ensure hive.metastore.disallow.incompatible.col.type.changes is set to false in the metastore-site.xml of your Watsonx.data target.
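
As a minimal sketch, the prerequisite above corresponds to a property entry like the one below, written in the Hadoop XML configuration format. Only the property name and value come from this page. Where the file lives and how a change is applied depend on your Watsonx.data deployment and are not covered here.

```xml
<configuration>
  <!-- Allow column changes that the metastore would otherwise reject as incompatible. -->
  <property>
    <name>hive.metastore.disallow.incompatible.col.type.changes</name>
    <value>false</value>
  </property>
  <!-- Any other properties already present in your metastore-site.xml stay unchanged. -->
</configuration>
```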

Limitations

The following source table formats are supported:

  • Parquet.
  • ORC Hive.

The following Apache Iceberg catalog types are currently supported: Apache Hive.

The following target filesystems are currently supported: S3 compatible targets.

Transaction support: Full ACID transactions are not currently supported. Insert-only transactions are supported.

Historical metadata retention limit:

  • The default and recommended maximum number of previous metadata versions to retain is 200 snapshots. Increasing beyond this recommended value may cause errors and undesired behaviour.

Hive Compaction:

  • Using Hive compaction results in Data Migrator removing the compacted files from the target. As a result, time travel queries no longer work correctly on the Iceberg target, because the old files no longer exist and cannot be included in a manifest list for an earlier snapshot.

Unsupported migration functionality

Functionality
ORC files generated by Hive versions earlier than 2.0.0
Hive 3.x ACID transactional tables
Hive constraints
Indexes
Functions
Views
Materialized views
Schema evolution involving column renames or data type changes, either in the past or while migrating. (Schema evolution involving adding, dropping, or reordering columns is supported if it is supported on the source.)
TBLPROPERTIES are not migrated from Hive to Iceberg
Target snapshot expiry and garbage collection are not migrated by Hivemigrator and should be configured on the target if required
info

For information about drop-create rename operations, see the related Known issue.

Supported partition column types

Partition column type
boolean
integer
bigint
float
double
string* (converts to varchar)
binary
decimal
date

*STRING type columns and partitions are migrated to Iceberg, but are converted to the VARCHAR type.

Configure an Iceberg metastore agent

You can add an Apache Iceberg metastore agent with either the UI or the CLI.

note

In this release, only the Apache Hive Catalog Type is supported.

Add an Iceberg metastore agent

To add an Iceberg agent with the UI:

  1. From the Dashboard, select an instance under Instances.
  2. Under Filesystems & Agents, select Metastore Agents.
  3. Select Connect to Metastore.
  4. Select the filesystem.
  5. Select Iceberg as the Metastore Type.
  6. Enter a Display Name.
  7. Select or confirm Apache Hive as the Catalog Type.
  8. Enter the name of your Iceberg catalog under Catalog Name.
  9. Enter the local path to a hive-site.xml file containing additional Iceberg Hive configuration in the Configuration Path field. Ensure the user running Data Migrator can access this path.
  10. Enter the username used to connect to the Iceberg Hive metastore under Hive Metastore Username.
  11. Enter the URI of your Iceberg Hive metastore thrift endpoint under Metastore URI. Include the scheme, for example: thrift://<host>:<port>.
  12. Enter the location on the target storage where the Iceberg metadata, manifest and snapshot files will reside under Warehouse Directory. For example: /warehouse.
    note

    The Warehouse Directory path supplied should not reside under a migrated directory with Target Match enabled, as Target Match will attempt to match the source and target and remove the metadata files.

  13. (Optional) Enter a filesystem URI into Default Filesystem Override to override the default filesystem URI.
tip

Check that you use the correct Catalog Name, because your agent may initially appear healthy even when an invalid value is used.

Update an existing Iceberg metastore agent

Use the Update an Iceberg metastore agent with the CLI section to update your existing Iceberg metastore agent.

info

An Iceberg agent's health check status may be reported incorrectly if the agent is updated repeatedly. See the related Known issue for more information.

tip

Remember to define your target filesystem and add any accompanying data migrations for the tables and databases you need to migrate.

Additional Iceberg Hive configuration

Specify any additional configuration required to connect to your specific Watsonx.data Iceberg Hive Catalog instance using a hive-site.xml file in the Hadoop XML configuration format. Supply this configuration when adding your agent, using the Configuration Path field in the UI or the --config-path parameter in the CLI. The examples below show some common types of configuration that may be required, depending on your specific Watsonx.data instance.

tip

Ensure the user running Data Migrator can access the path and file specified when you supply additional configuration.

Example: Provide target metastore security credentials

The example below uses client configuration to specify the authentication mode, username and password required to connect to the target metastore. The example specifically demonstrates use of a JCEKS credential provider file used to store the security credential.

Example hive-site.xml
<configuration>
  <property>
    <name>hive.metastore.client.auth.mode</name>
    <value>PLAIN</value>
  </property>
  <property>
    <name>hive.metastore.client.plain.username</name>
    <value>metastoreuser1</value>
  </property>
  <property>
    <name>hadoop.security.credential.provider.path</name>
    <value>localjceks://file/etc/cirata/hivemigrator/watsonx_truststore/wandisco-watsonx.jceks</value>
  </property>
  ...
  ...
</configuration>

Example: SSL configuration

If your Watsonx.data Hive Catalog metastore provides a certificate, provide additional configuration so that your Iceberg agent trusts this certificate.

Example hive-site.xml
<configuration>
  <property>
    <name>hive.metastore.truststore.type</name>
    <value>JKS</value>
  </property>
  <property>
    <name>hive.metastore.truststore.path</name>
    <value>file:///etc/cirata/hivemigrator/watsonx_truststore/cacerts</value>
  </property>
  <property>
    <name>hive.metastore.truststore.password</name>
    <value>changeme</value>
  </property>
  ...
  ...
</configuration>

Add an Iceberg metastore agent with the CLI

To add an Iceberg agent with the CLI, use the hive agent add iceberg CLI command:

Iceberg agent add example
hive agent add iceberg --catalog-name catalog_cat1 --config-path /etc/hadoop/watsonx/ --username ibmlhadmin --metastore-uri thrift://my.thrift.host:9083 --file-system-id aws-target --warehouse-dir / --catalog-type HIVE --name SUPERAGENT
tip

Check that you use the correct --catalog-name, because your agent may initially appear healthy even when an invalid value is used.

Update an Iceberg metastore agent with the CLI

To update an Iceberg agent with the CLI, use the hive agent configure iceberg CLI command:

Example update of an existing Iceberg agent
hive agent configure iceberg --name ice1 --username admin2

Next steps

If you have already added Metadata Rules, create a Metadata Migration. You can also add metadata rules with the hive rule add CLI command to define the scope, then create a metadata migration with hive migration add.