Configure an HDFS target
You can migrate data to a Hadoop Distributed File System (HDFS) by configuring it as a target filesystem for Data Migrator.
Follow these steps to create an HDFS target:
Configure an HDFS target filesystem
Prerequisites
You need the following:
- An HDFS cluster running Hadoop 2.6 or above.
- If Kerberos is enabled on your filesystem, a valid keytab containing a suitable principal for the HDFS superuser must be available on the host machine for Data Migrator. See Configure Kerberos.
- Oracle Big Data Service (BDS) - If you're running Oracle's Distribution of Apache Hadoop (ODH), Data Migrator must use fully qualified hostnames for DNS resolution. Add the following configuration property overrides for the target filesystem. Configuration property overrides are an option under Advanced Configuration on the Target Filesystem Configuration screen.
dfs.client.use.datanode.hostname=true
dfs.datanode.use.datanode.hostname=true
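If you configure the target with the CLI instead of the UI, you can apply the same overrides through the --properties parameter (described later on this page). A minimal sketch, assuming a hypothetical filesystem ID and nameservice; substitute your own values:
filesystem add hdfs --file-system-id mytarget --default-fs hdfs://mynameservice --properties dfs.client.use.datanode.hostname=true,dfs.datanode.use.datanode.hostname=true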
Configure an HDFS target filesystem with the UI
From the Dashboard, select a product under Products.
In the Filesystems & Agents menu, select Filesystems.
Select Add target filesystem.
Enter the following details:
- Filesystem Type - The type of filesystem target. Select Hadoop Distributed File System (HDFS).
- Display Name - Enter a name for your target filesystem.
- Default FS - Enter the fs.defaultFS value from your HDFS configuration. For example, hdfs://nameservice:8020.
- User - If you're not running Kerberos on your target HDFS cluster, enter the name of the filesystem user you want to migrate data with.
- Kerberos Configuration - The details of your Kerberos configuration. You can authenticate with Kerberos using multi-realm Kerberos, cross-realm trust, or target-only Kerberos. See Configure Kerberos.
- Kerberos Principal - Enter a principal that will map to an HDFS superuser.
- Kerberos Keytab Location - Enter the path to the Kerberos keytab file containing the Kerberos Principal. The keytab file must be accessible from the edge node where Data Migrator is installed. You can verify the keytab's contents before you save, as shown in the sketch after this list.
- Advanced Configuration
  - Configuration Property File Paths - Enter the directory or directories containing your target filesystem's HDFS configuration (such as core-site.xml and hdfs-site.xml) on your Data Migrator host's local filesystem. You require this if you have Kerberos or a High Availability (HA) HDFS.
    note: Data Migrator reads core-site.xml and hdfs-site.xml once, during filesystem creation, applying any configuration within paths added under Configuration Property File Paths. After creation, these paths are no longer visible in the UI. You can see all filesystem properties using the API.
  - Configuration Property Overrides (Optional) - Enter override properties or additional properties for your HDFS filesystem by adding key/value pairs.
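Before you save, you can check that the keytab at the Kerberos Keytab Location contains the principal you entered. A minimal sketch, assuming a typical keytab path; substitute your own:
klist -kt /etc/security/keytabs/hdfs.headless.keytab
The output lists every principal in the keytab along with timestamps; the principal you entered under Kerberos Principal should appear in it.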
Select Save. You can now use your HDFS target in data migrations.
Configure an HDFS target filesystem with the CLI
To create an HDFS target, run the filesystem add hdfs command in the WANdisco CLI:
filesystem add hdfs [--file-system-id] string
[--default-fs] string
[--user] string
[--kerberos-principal] string
[--kerberos-keytab] string
[--properties-files] list
[--properties] string
Mandatory parameters
--file-system-id - The ID to give the new filesystem resource.
--default-fs - A string that defines how Data Migrator accesses HDFS. You can enter it in the following ways:
- As a single HDFS URI, such as hdfs://192.168.1.10:8020 (using an IP address) or hdfs://myhost.localdomain:8020 (using a hostname).
- As an HDFS URI that references a nameservice if the NameNodes have high availability. For example, hdfs://mynameservice. For more information, see HDFS High Availability.
--properties-files - Reference a list of existing properties files that contain Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
note: If you're using an HA HDFS filesystem, you must include this parameter. Define the absolute paths to the core-site.xml and hdfs-site.xml files. For example, --properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml.
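Before running the command, you can sanity-check the files you plan to pass to --properties-files. A minimal sketch, assuming example paths and the hdfs system user; substitute your own:
# Confirm the Data Migrator system user can read the target cluster's configuration files.
sudo -u hdfs ls -l /etc/targetClusterConfig/core-site.xml /etc/targetClusterConfig/hdfs-site.xml
# For an HA target, confirm the nameservice used in --default-fs is defined in hdfs-site.xml.
grep -A 1 'dfs.nameservices' /etc/targetClusterConfig/hdfs-site.xml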
Optional parameters
--user - Data Migrator will use this HDFS user to perform operations against the filesystem. If Kerberos is disabled on the filesystem, this user must be the HDFS superuser, such as hdfs.
--kerberos-principal - The Kerberos principal to authenticate with and perform migrations as. This principal should map to the HDFS superuser using auth_to_local rules.
--kerberos-keytab - The Kerberos keytab that contains the principal defined for the --kerberos-principal parameter. This must be accessible to the local system user running the Data Migrator service (default is hdfs).
--properties - Enter properties to use in a comma-separated key/value list.
Other parameters
Exclude these parameters from the command unless you want to create an HDFS source filesystem:
--source - This parameter creates the filesystem as a source.
--scan-only - This parameter creates a static source filesystem for one-time migrations. This parameter needs the --source parameter.
--success-file - This parameter uses a file name or glob pattern for files that Data Migrator will migrate last from the directory they're contained in. For example, --success-file /mypath/myfile.txt or --success-file /**_SUCCESS. You can use these files to confirm the source directory they're in has finished migrating. This parameter only applies to source filesystems.
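For illustration only, here's how these source-only parameters combine. A sketch with an assumed filesystem ID and nameservice, creating a static source rather than a target:
filesystem add hdfs --file-system-id mysource --default-fs hdfs://sourcenameservice --source --scan-only --success-file /**_SUCCESS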
Examples
If you add an HDFS filesystem as a target, the property files for the target cluster must exist on the local filesystem and be accessible to the Data Migrator system user.
filesystem add hdfs --file-system-id mytarget --default-fs hdfs://targetnameservice --properties-files /etc/targetClusterConfig/core-site.xml,/etc/targetClusterConfig/hdfs-site.xml --kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab --kerberos-principal hdfs@SOURCEREALM.COM
filesystem add hdfs --file-system-id mytarget --default-fs hdfs://namenode.targetdomain:8020 --user hdfs
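After adding the filesystem, you can confirm Data Migrator registered it correctly. A sketch, assuming your CLI version includes the filesystem list and filesystem show commands (check your CLI's help if in doubt):
filesystem list
filesystem show --file-system-id mytarget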
Configure Kerberos
Select the Kerberos use case you want to set up from the list below to view the relevant instructions.
- Source only
- Target only
- Source and target with cross-realm trust
- Source and target without cross-realm trust
Use Kerberos on the source filesystem only
To set up Kerberos on a source filesystem, enter the Kerberos details for your HDFS source during the HDFS source creation process.
Use Kerberos on the target filesystem only
To migrate data from a source filesystem without Kerberos to a target with Kerberos:
- Copy the krb5.conf file with the configuration and keytabs for your target HDFS to your source HDFS.
- Open /etc/wandisco/livedata-migrator/vars.env and add the file path for your krb5.conf file to LDM_EXTRA_JVM_ARGS. For example:
  LDM_EXTRA_JVM_ARGS="-Djava.security.krb5.conf=/etc/remote/krb5.conf"
- Restart Data Migrator.
- Enter the Kerberos parameters for the Kerberos configuration you moved to your source HDFS during target filesystem creation (see above).
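As a concrete sketch of the second and third steps above, assuming the default vars.env path and that Data Migrator runs as the livedata-migrator systemd service (service name assumed; check your installation):
# Append the JVM argument pointing at the copied krb5.conf (path assumed).
echo 'LDM_EXTRA_JVM_ARGS="-Djava.security.krb5.conf=/etc/remote/krb5.conf"' | sudo tee -a /etc/wandisco/livedata-migrator/vars.env
# Restart the service so the new JVM argument takes effect.
sudo systemctl restart livedata-migrator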
Use Kerberos on both filesystems with cross-realm trust
Cross-realm trust lets you use your source HDFS Kerberos credentials to connect to your target HDFS as well.
To use cross-realm trust to migrate data from a Kerberos-enabled source filesystem to a Kerberos-enabled target filesystem, use a Kerberos configuration with the correct cross-realm trust settings during the creation of an HDFS source filesystem.
For Kerberos configuration guidance, refer to the documentation for your Hadoop distribution.
Use Kerberos on both filesystems without cross-realm trust
See Configure Kerberos.
Next steps
If you haven't already, configure a source filesystem from which to migrate data. Then, you can create a migration to migrate data to your new HDFS target.