Version: 3.0 (latest)

Supported sources and targets

Sources

Select a source below to learn more about which migration types and event notifications it supports.

When you configure Data Migrator to transfer data from a source, you must select one of the following migration types:

  • One-time
    Data Migrator scans the existing source data once and migrates the data to the target. After the data is transferred, the migration is complete and no further changes are migrated.

    No event stream is required for one-time migrations.

  • Live
    After Data Migrator performs an initial content scan, it moves existing data to the target. Any subsequent changes made to the source filesystem are migrated in real time using the event notification system defined for that source.

    Live migrations require an event stream set up in your environment. The source sections below describe which event streams work with each source and Data Migrator.

  • Recurring
    After existing data is moved, the migration scan is repeated to discover new changes. Changes are then migrated to the target.

    No event stream is required for recurring migrations because Data Migrator discovers changes by rescanning the source.
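The relationship between the three migration types and the event stream requirement can be sketched as a small lookup. This is an illustrative model only: the `MigrationType` enum and both helper functions are hypothetical names, not part of Data Migrator's API.

```python
from enum import Enum


class MigrationType(Enum):
    """Migration types described above (illustrative names, not product identifiers)."""
    ONE_TIME = "one-time"
    LIVE = "live"
    RECURRING = "recurring"


def requires_event_stream(migration_type: MigrationType) -> bool:
    """Only live migrations depend on event notifications from the source."""
    return migration_type is MigrationType.LIVE


def rescans_source(migration_type: MigrationType) -> bool:
    """Recurring migrations repeat the scan to discover new changes."""
    return migration_type is MigrationType.RECURRING


if __name__ == "__main__":
    for migration_type in MigrationType:
        print(migration_type.value, requires_event_stream(migration_type))
```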

Amazon S3

Amazon S3 as a source supports live migrations.

Set up your source bucket to use Simple Queue Service (SQS) to handle event notifications.

See the Amazon documentation on enabling messages to be published to an SQS queue.

We don't support versioning, metadata migrations, or object locks.
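As a rough illustration of the bucket-side setup, the sketch below builds an S3 bucket notification configuration that targets a single SQS queue. The function name, the queue ARN, and the choice of event types are assumptions made for this example; check the Amazon documentation and your Data Migrator configuration for the events you actually need.

```python
import json


def build_sqs_notification_config(queue_arn: str, events=None) -> dict:
    """Return an S3 bucket notification configuration targeting one SQS queue."""
    if events is None:
        # Assumption: object creation and removal are the changes of interest.
        events = ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"]
    return {
        "QueueConfigurations": [
            {"QueueArn": queue_arn, "Events": list(events)}
        ]
    }


if __name__ == "__main__":
    # Hypothetical queue ARN, for illustration only.
    config = build_sqs_notification_config(
        "arn:aws:sqs:us-east-1:123456789012:example-migration-events"
    )
    print(json.dumps(config, indent=2))
```

A dictionary of this shape can be passed as the `NotificationConfiguration` argument of the boto3 S3 client's `put_bucket_notification_configuration` call. The queue's access policy must also allow Amazon S3 to send messages to it.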

*Supported HDFS target versions. (Other HDFS-compatible environments can be targeted without change. For details, contact Support.)

HDP: 2.6.x, 3.1.
CDH: 5.11, 5.13, 5.14, 5.16, 6.2, 6.3.
CDP: 7.1.4, 7.1.6, 7.1.7, 7.1.8, 7.1.9
Target | One-time migration | Live migration | Recurring migration
Alibaba Cloud Object Storage Service
Amazon S3
Azure Data Lake Storage Gen2
Google Cloud Storage
Hadoop Distributed File System*
IBM Cloud Object Storage
Local Filesystem
Oracle Object Storage
S3

Azure Data Lake Storage Gen2

*Live migration: Additional third party requirements are necessary. Learn more and contact Support for more information.

Target | One-time migration | Live migration* | Recurring migration
Alibaba Cloud Object Storage Service
Amazon S3
Azure Data Lake Storage Gen2
Google Cloud Storage
Hadoop Distributed File System*
IBM Cloud Object Storage
Local Filesystem
Oracle Object Storage
S3

Google Cloud Storage

Google Cloud Storage (GCS) as a source supports one-time migrations.

Hadoop Distributed Filesystem

Hadoop Distributed Filesystem (HDFS) supports all migration types, including live migrations.

Data Migrator reads events from an HDFS cluster's NameNode to track changes to data on the filesystem.

For more information, see Configure your HDFS cluster.

Supported HDFS source versions. (All HDFS versions from Hadoop 2.6 onward are likely to function without issue. For details on additional support, contact Support.)

HDP: 2.6.3, 2.6.5, 3.1.0
CDH: 5.13, 5.15, 5.16, 6.2, 6.3.
CDP: 7.1.4, 7.1.6, 7.1.7, 7.1.8, 7.1.9
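The listed source versions can be captured in a simple lookup for pre-flight checks. This is a minimal sketch: the table and function names are hypothetical, and treating patch releases of a listed version (for example, CDH 5.13.1) as a match is an assumption of this example. A `False` result means only that the version is not explicitly listed; as noted above, other HDFS versions from Hadoop 2.6 onward are likely to work.

```python
# Versions copied from the list above.
SUPPORTED_HDFS_SOURCE_VERSIONS = {
    "HDP": ("2.6.3", "2.6.5", "3.1.0"),
    "CDH": ("5.13", "5.15", "5.16", "6.2", "6.3"),
    "CDP": ("7.1.4", "7.1.6", "7.1.7", "7.1.8", "7.1.9"),
}


def is_listed_hdfs_source(distribution: str, version: str) -> bool:
    """True if the version equals a listed version or is a patch release of one."""
    listed = SUPPORTED_HDFS_SOURCE_VERSIONS.get(distribution.upper(), ())
    # Assumption: "5.13.1" counts as a release of the listed "5.13".
    return any(version == v or version.startswith(v + ".") for v in listed)


if __name__ == "__main__":
    print(is_listed_hdfs_source("CDH", "5.13.1"))
    print(is_listed_hdfs_source("HDP", "2.6.4"))
```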

IBM Cloud Object Storage

IBM Cloud Object Storage (IBM COS) supports live migrations.

IBM COS provides Apache Kafka event streaming, which Data Migrator uses to handle event notifications.

See the IBM documentation on event streams. To configure Kafka for Data Migrator, see Configure IBM COS as a source.

One-time migration support can be enabled by configuring IBM COS as a generic S3 source filesystem.

IBM Spectrum Scale

IBM Spectrum Scale/Storage Scale (GPFS) supports live migrations when Apache Kafka event streaming is configured to handle event notifications.

Local Filesystem

A local filesystem is a filesystem mounted on the Linux server on which Data Migrator runs. It can be any filesystem supported by the operating system. A local filesystem source is well suited to, for example, migrating small to midsize business data.

Data Migrator must have sufficient privileges to access the filesystem. This often means running as root.

Data Migrator doesn't migrate file permissions or access control lists (ACLs). For example, migrating a Network File System version 4 (NFSv4) file share with ACLs to S3 migrates the data but not the ACLs, so anyone who has access to the target bucket gains control of the migrated data.

Network-Attached Storage

Add network-attached storage as a local filesystem.

S3

Generic S3 covers all S3 sources other than Amazon S3 and IBM COS. This includes S3-compatible storage from other providers, for example on-premises S3 storage such as MinIO, Dell EMC PowerScale, Scality, and Cloudian.

Live migrations aren't yet supported for these sources.


Metastore Agent support

The following tables list the metastore agents currently available. To configure an agent, see Connect to source and target metastores and the individual agent sections, which also give additional information for each agent.

Source

Supported Hive Migrator source Metastore Agents and platforms.

Supported Hive versions: Hive 1.1.0, 1.2.1, 2.1.1, 3.1.0, and 3.1.3, as distributed with the supported Hadoop distributions shown below.

*Although the AWS Glue agent is listed as a source agent below, it can be used as a source only in limited use cases. Contact Support if you need to use AWS Glue as your source for metadata migration.

Agent | Platform | Hive Version | Listening mode
Hive | CDH5.x | 1.1.0 | No
Hive | CDH6.x | 2.1.1 | No
Hive | HDP3.x | 3.1.0 | Yes*
Hive | CDP7.1.x | 3.1.3 | Yes
*AWS Glue | AWS Glue Data Catalog | N/A | No

*Listening mode is supported for HDP 3.1.5 and CDP 7.1.x. See HVM listener event type limitations for more information. Users of any Hadoop version other than HDP 3.1.5 or CDP 7.1.x must select Scanning mode. For information on other Hadoop versions, contact Support.
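The listening mode rule in the note above can be expressed as a small check. This is a sketch under stated assumptions: the function name is hypothetical, and accepting longer build strings that begin with `3.1.5.` or `7.1.` is a choice made for this example rather than documented behavior.

```python
def supports_listening_mode(platform: str, version: str) -> bool:
    """True for HDP 3.1.5 and CDP 7.1.x; other versions must use Scanning mode."""
    platform = platform.upper()
    if platform == "HDP":
        # Assumption: build strings such as "3.1.5.0" count as HDP 3.1.5.
        return version == "3.1.5" or version.startswith("3.1.5.")
    if platform == "CDP":
        return version == "7.1" or version.startswith("7.1.")
    return False


if __name__ == "__main__":
    for platform, version in [("HDP", "3.1.5"), ("HDP", "3.1.0"), ("CDP", "7.1.9")]:
        mode = "listening available" if supports_listening_mode(platform, version) else "scanning only"
        print(platform, version, mode)
```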

Target

Supported Hive Migrator target Metastore Agents and platforms.

'Local Agent' marked as 'Yes - if same version as source' means a local agent can be used as a target as long as the source and target Hive versions are the same.

Agent Type | Platform | Local Agent | Remote Agent | Associated filesystems
Apache Hive | CDH5.x | Yes - if same version as source | Yes | HDFS, S3
Apache Hive | CDH6.x | Yes - if same version as source | Yes | HDFS, S3
Apache Hive | HDP3.x | Yes - if same version as source | Yes | HDFS, S3
Apache Hive | CDP7.1.x | Yes - if same version as source | Yes | HDFS, S3
AWS Glue | AWS Glue Data Catalog | Yes | Yes | Amazon S3
Azure SQL Database | HDI 4.0 internal DB | Yes | Yes | ADLS Gen2
Azure SQL Database | HDI 4.0 external DB | Yes | Yes | ADLS Gen2
Azure SQL Database | HDI 3.6 internal DB | Yes | Yes | ADLS Gen2
Azure SQL Database | HDI 3.6 external DB | Yes | Yes | ADLS Gen2
Dataproc | Google Dataproc 2.1 (Ubuntu 20.04 LTS) | No | Yes | GCS
Dataproc | Google Dataproc 2.0 (Ubuntu 18.04 LTS) | No | Yes | GCS
Databricks | Databricks | Yes | No | ADLS Gen2, Amazon S3, S3, GCS
Snowflake | Snowflake | Yes | No | ADLS Gen2, Amazon S3, S3, GCS
Iceberg* | Watsonx.data | Yes | No | S3

*The Iceberg metastore agent is supported within Watsonx.data and with Hive sources only. See Configure Iceberg as a target for further information and limitations.
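The 'Local Agent' values in the target table can be read as a rule that depends on the source and target Hive versions. The sketch below is illustrative only: the function name is hypothetical, and the rule strings are copied from the table rather than taken from any Data Migrator API.

```python
def local_agent_allowed(local_agent_rule: str, source_hive: str, target_hive: str) -> bool:
    """Evaluate a 'Local Agent' table value for a given source and target Hive version."""
    rule = local_agent_rule.strip().lower()
    if rule == "yes":
        return True
    if rule == "no":
        return False
    if rule == "yes - if same version as source":
        # A local agent is usable only when both Hive versions match.
        return source_hive == target_hive
    raise ValueError(f"Unrecognized Local Agent value: {local_agent_rule!r}")


if __name__ == "__main__":
    print(local_agent_allowed("Yes - if same version as source", "3.1.3", "3.1.3"))
    print(local_agent_allowed("Yes - if same version as source", "1.1.0", "3.1.3"))
```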