Configure an HDFS target
You can migrate data to a Hadoop Distributed File System (HDFS) by configuring it as a target filesystem for Data Migrator.
Follow these steps to create an HDFS target:
Configure an HDFS target filesystem
Prerequisites
You need the following:
- An HDFS cluster running Hadoop 2.6 or above.
- If Kerberos is enabled on your filesystem, a valid keytab containing a suitable principal for the HDFS superuser must be available on the Data Migrator host machine. See Configure Kerberos.
- Oracle Big Data Service (BDS) - If you're running Oracle's Distribution of Apache Hadoop (ODH), Data Migrator must supply fully qualified hostnames for DNS resolution. Ensure the following configuration property overrides are added for the target filesystem (Configuration Property Overrides is an option under Advanced Configuration on the Target Filesystem Configuration screen):
```
dfs.client.use.datanode.hostname=true
dfs.datanode.use.datanode.hostname=true
```
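If you create the target with the CLI instead, you can pass the same overrides as a comma-separated list with the `--properties` parameter described later on this page. A minimal sketch - the filesystem ID and Default FS value are placeholders:

```
filesystem add hdfs --file-system-id mytarget --default-fs hdfs://nameservice:8020 --properties dfs.client.use.datanode.hostname=true,dfs.datanode.use.datanode.hostname=true
```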
Configure an HDFS target filesystem with the UI
From the Dashboard, select an instance under Instances.
In the Filesystems & Agents menu, select Filesystems.
Select Add target filesystem.
Enter the following details:
- Filesystem Type - The type of filesystem target. Select Hadoop Distributed File System (HDFS).
- Display Name - Enter a name for your target filesystem.
- Default FS - Enter the `fs.defaultFS` value from your HDFS configuration. For example, `hdfs://nameservice:8020`.
- User - If you're not running Kerberos on your target HDFS cluster, enter the name of the filesystem user you want to migrate data with.
- Kerberos Configuration - The details of your Kerberos configuration. You can authenticate with Kerberos using multi-realm Kerberos, cross-realm trust or target-only Kerberos. See Configure Kerberos.
- Kerberos Principal - Enter a principal that will map to an HDFS superuser.
- Kerberos Keytab Location - Enter the path to the Kerberos keytab file containing the Kerberos Principal. The keytab file must be accessible from the edge node where Data Migrator is installed.
- Advanced Configuration
  - Configuration Property File Paths - Enter the directory or directories containing your target filesystem's HDFS configuration (such as the `core-site.xml` and `hdfs-site.xml`) on your Data Migrator host's local filesystem. You need this if you have Kerberos or a High Availability (HA) HDFS.
    note: Data Migrator reads `core-site.xml` and `hdfs-site.xml` once, during filesystem creation, applying any configuration within paths added under Configuration Property File Paths. After creation, these paths are no longer visible in the UI. You can see all filesystem properties using the API.
  - Configuration Property Overrides (Optional) - Enter override properties or additional properties for your HDFS filesystem by adding key/value pairs.
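Before you save, you can confirm the keytab contains the principal you entered and is readable from the Data Migrator host. A quick check with standard Kerberos tooling (the keytab path is an example):

```
# List the principals held in the keytab; the principal entered above should appear.
klist -kt /etc/security/keytabs/hdfs.headless.keytab
```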
Select Save. You can now use your HDFS target in data migrations.
Configure an HDFS target filesystem with the CLI
To create an HDFS target, run the `filesystem add hdfs` command in the Data Migrator CLI:
```
filesystem add hdfs [--file-system-id] string
                    [--default-fs] string
                    [--user] string
                    [--kerberos-principal] string
                    [--kerberos-keytab] string
                    [--properties-files] list
                    [--properties] string
```
Mandatory parameters
- `--file-system-id` - The ID to give the new filesystem resource.
- `--default-fs` - A string that defines how Data Migrator accesses HDFS. You can enter it in the following ways:
  - As a single HDFS URI, such as `hdfs://192.168.1.10:8020` (using an IP address) or `hdfs://myhost.localdomain:8020` (using a hostname).
  - As an HDFS URI that references a nameservice if the NameNodes have high availability. For example, `hdfs://mynameservice`. For more information, see HDFS High Availability.
- `--properties-files` - Reference a list of existing properties files that contain Hadoop configuration properties in the format used by `core-site.xml` or `hdfs-site.xml`.
  note: If you're using an HA HDFS filesystem, you must include this parameter. Define the absolute paths to the `core-site.xml` and `hdfs-site.xml` files. For example, `--properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml`.
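If you're unsure of the nameservice value to use for `--default-fs`, you can read it from the same files you pass to `--properties-files`. A minimal sketch, assuming the Hadoop client is installed on the Data Migrator host and `/etc/targetClusterConfig` holds the target cluster's configuration:

```
# Print the nameservice ID(s) defined in the target cluster configuration.
HADOOP_CONF_DIR=/etc/targetClusterConfig hdfs getconf -confKey dfs.nameservices
```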
Optional parameters
- `--user` - Data Migrator will use this HDFS user to perform operations against the filesystem. If Kerberos is disabled on the filesystem, this user must be the HDFS superuser, such as `hdfs`.
- `--kerberos-principal` - The Kerberos principal to authenticate with and perform migrations as. This principal should map to the HDFS superuser using `auth_to_local` rules.
- `--kerberos-keytab` - The Kerberos keytab that contains the principal defined for the `--kerberos-principal` parameter. This must be accessible to the local system user running the Data Migrator service (default is `hdfs`).
- `--properties` - Enter properties to use in a comma-separated key/value list.
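To confirm a principal and keytab pair works before creating the filesystem, you can request a ticket with it directly. Run this as the system user that runs the Data Migrator service; the keytab path and principal are examples:

```
# Obtain a ticket as the migration principal using the keytab.
kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@SOURCEREALM.COM
# Confirm the ticket was granted.
klist
```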
Other parameters
Exclude these parameters from the command unless you want to create an HDFS source filesystem:
- `--source` - This parameter creates the filesystem as a source.
- `--scan-only` - This parameter creates a static source filesystem for one-time migrations. It requires the `--source` parameter.
- `--success-file` - This parameter takes a file name or glob pattern; Data Migrator migrates matching files last from the directory that contains them. For example, `--success-file /mypath/myfile.txt` or `--success-file /**_SUCCESS`. You can use these files to confirm that their source directory has finished migrating. This parameter only applies to source filesystems.
Examples
If you enter an HDFS filesystem as a target, the property files for the target cluster must exist on the local filesystem and be accessible to the Data Migrator system user.
With Kerberos and an HA nameservice:

```
filesystem add hdfs --file-system-id mytarget --default-fs hdfs://targetnameservice --properties-files /etc/targetClusterConfig/core-site.xml,/etc/targetClusterConfig/hdfs-site.xml --kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab --kerberos-principal hdfs@SOURCEREALM.COM
```

Without Kerberos, using a single NameNode URI:

```
filesystem add hdfs --file-system-id mytarget --default-fs hdfs://namenode.targetdomain:8020 --user hdfs
```
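After running either command, you can check that the new target registered correctly. A hedged sketch - the exact command set varies by Data Migrator version, and `filesystem list` is assumed to be available in yours:

```
# List registered filesystems; the new target should appear with its ID.
filesystem list
```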
Configure Kerberos
Select the Kerberos use case you want to set up and follow the relevant instructions below.
Use Kerberos on the source filesystem only
To set up Kerberos on a source filesystem, enter the Kerberos details for your HDFS source during the HDFS source creation process.
Use Kerberos on the target filesystem only
To migrate data from a source filesystem without Kerberos to a target with Kerberos:
- Copy the `krb5.conf` file with the configuration and keytabs for your target HDFS to your source HDFS.
- Open `/etc/wandisco/livedata-migrator/vars.env` and add the file path for your `krb5.conf` file to `LDM_EXTRA_JVM_ARGS`. For example: `LDM_EXTRA_JVM_ARGS="-Djava.security.krb5.conf=/etc/remote/krb5.conf"`
- Restart Data Migrator.
- Enter the Kerberos parameters for the Kerberos configuration you moved to your source HDFS during target filesystem creation (see above).
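A minimal sketch of the `vars.env` edit and restart on the Data Migrator host, assuming a systemd-managed installation with a service named `livedata-migrator` (an assumption - the service name can vary between installs):

```
# Point the Data Migrator JVM at the copied Kerberos configuration.
echo 'LDM_EXTRA_JVM_ARGS="-Djava.security.krb5.conf=/etc/remote/krb5.conf"' | sudo tee -a /etc/wandisco/livedata-migrator/vars.env

# Restart the service so the new JVM argument takes effect.
sudo systemctl restart livedata-migrator
```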
Use Kerberos on both filesystems with cross-realm trust
Use cross-realm trust to use your source HDFS Kerberos credentials to connect to your target HDFS as well.
To use cross-realm trust to migrate data from a Kerberos-enabled source filesystem to a Kerberos-enabled target filesystem, use a Kerberos configuration with the correct cross-realm trust settings during the creation of an HDFS source filesystem.
For cross-realm trust configuration guidance, refer to the Kerberos documentation for your Hadoop distribution.
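At the KDC level, one common MIT Kerberos pattern is to create matching `krbtgt` principals in both realms so that tickets issued in the source realm are honored by the target realm. A hedged sketch - the realm names are placeholders, and both KDCs must hold the same keys for the shared principal:

```
# On both KDCs: create the cross-realm ticket-granting principal.
# Repeat with the realms reversed if you need trust in both directions.
kadmin.local -q "addprinc krbtgt/TARGETREALM.COM@SOURCEREALM.COM"
```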
Use Kerberos on both filesystems without cross-realm trust
See Configure Kerberos.
Next steps
If you haven't already, configure a source filesystem from which to migrate data. Then, you can create a migration to migrate data to your new HDFS target.