Configure Kerberos
Data Migrator supports a Kerberos-enabled Hadoop Distributed File System (HDFS) environment as a source filesystem. Data Migrator runs as a service and uses a Kerberos keytab for authentication. It doesn't use credential caches or passwords for the principal.
Kerberos use cases
Data Migrator supports the following Kerberos use cases:
- Kerberos is only active on the source filesystem.
- Kerberos is active on both the source and target filesystems, with cross-realm trust enabled.
  - Configure this in your Kerberos configuration files (see the sketch after this list) and see Configuration steps for multi-realm Kerberos.
- Kerberos is only active on the target filesystem.
  - Install Data Migrator on the intended target filesystem and set up the source cluster as an HDFS source filesystem.
  - Set up `krb5.conf` on the source filesystem with the necessary configuration and keytabs to give it access to the target filesystem through Kerberos.
- Kerberos is active on both the source and target filesystems, but cross-realm trust is not enabled.
  - See the next section.
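Cross-realm trust itself is established in Kerberos rather than in Data Migrator. The following is a minimal sketch only, assuming MIT Kerberos: the realm names are placeholders, and the `krbtgt/TARGET_REALM@SOURCE_REALM` principal must be created with the same password (key) in both KDCs.

```text
# Run on each KDC (kadmin.local prompts for the shared password):
#   kadmin.local -q "addprinc krbtgt/TARGET_REALM@SOURCE_REALM"

# Then, in krb5.conf on hosts in SOURCE_REALM, declare a direct path to TARGET_REALM:
[capaths]
    SOURCE_REALM = {
        TARGET_REALM = .
    }
```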
Kerberos is active on both the source and target filesystems, but cross-realm trust is not enabled
Data Migrator supports migration between clusters that run independent Kerberos implementations.
Prerequisites
The Data Migrator service needs access to:
- A copy of the Kerberos configuration file (`krb5.conf`) with both realms defined.
- The domain, Key Distribution Center (KDC), and admin server mappings set.
- A keytab with the principal for the HDFS superuser of the target filesystem.
Network communication must also be available from the Data Migrator service to the KDC and the admin server of the target filesystem.
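To confirm this connectivity before you continue, you can check that the target KDC and admin server ports are reachable from the Data Migrator host. This is an optional check; the host names below are placeholders, and 88 and 749 are the default KDC and kadmin ports:

```bash
# Check that the target KDC (port 88) and admin server (port 749) are reachable
nc -vz <target-kdc-host> 88
nc -vz <target-admin-host> 749
```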
Set up Kerberos without cross-realm trust
Copy the target cluster keytab and sites files, and place them on the source filesystem:
On the source filesystem, create a folder for the keytab and Hadoop sites configuration files:

```bash
mkdir -p /path/keytabs
mkdir -p /path/sites
```
Copy the keytab files from the target filesystem to the source.

For Data Migrator, copy your HDFS keytab:

```bash
scp root@<target_host>:/etc/security/keytabs/hdfs.headless.keytab /path/keytabs/
```

For Hivemigrator, copy your Hive service keytab:

```bash
scp root@<target_host>:/etc/security/keytabs/hive.service.keytab /path/keytabs/
```
Ensure the keytab files have the correct owner and group (for example, your service user).

For Data Migrator:

```bash
chown hdfs:hadoop /path/keytabs/hdfs.headless.keytab
```

For Hivemigrator:

```bash
chown hive:hadoop /path/keytabs/hive.service.keytab
```
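Optionally, confirm that each copied keytab contains the expected principals. The paths below assume the locations used in the previous steps:

```bash
# List the principals stored in the copied keytabs
klist -kt /path/keytabs/hdfs.headless.keytab
klist -kt /path/keytabs/hive.service.keytab
```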
Copy the `core-site.xml` and `hdfs-site.xml` files from the target to the source:

```bash
scp root@<target_host>:/etc/hadoop/conf/core-site.xml /path/sites/
scp root@<target_host>:/etc/hadoop/conf/hdfs-site.xml /path/sites/
```
Create a copy of the Kerberos configuration file on all Data Migrator instances. For example, copy `krb5.conf` using superuser privileges:

```bash
cp -p /etc/krb5.conf /etc/remote/krb5.conf
```
Adjust the service variables to use the Kerberos configuration file.

For Data Migrator, open `/etc/wandisco/livedata-migrator/vars.env` and add the following `LDM_EXTRA_JVM_ARGS`:

```bash
LDM_EXTRA_JVM_ARGS="-Djava.security.krb5.conf=/etc/remote/krb5.conf"
```

For Hivemigrator, open `/etc/wandisco/hivemigrator/vars.sh` and add the following `HVM_EXTRA_JVM_ARGS`:

```bash
HVM_EXTRA_JVM_ARGS="-Djava.security.krb5.conf=/etc/remote/krb5.conf"
```
In `/etc/remote/krb5.conf`, add the realm from the target to the source, and map `[domain_realm]` entries from the target domain to the target realm.

Add the target realm (copied from the target cluster's `krb5.conf`) to the source `/etc/remote/krb5.conf`:

```text
[realms]
    # Source realm
    SOURCE_REALM = {
        kdc = <source-host-domain.com>
        admin_server = <source-host-domain.com>
    }
    # Target realm
    TARGET_REALM = {
        kdc = <target-host-domain.com>
        admin_server = <target-host-domain.com>
    }
```

Insert the following mapping into the `[domain_realm]` section of the source `/etc/remote/krb5.conf`:

```text
[domain_realm]
    .wandisco.hadoop1 = SOURCE_REALM
    wandisco.hadoop1 = SOURCE_REALM
    .host-domain.com = SOURCE_REALM
    host-domain.com = SOURCE_REALM
    target-host2-domain.com = TARGET_REALM
    .target-host2-domain.com = TARGET_REALM
```

The explicit address mapping of `target-host2` stops the target realm from being mapped as the source realm if their domain patterns match.

Restart the Data Migrator services. For example:

```bash
systemctl restart livedata-migrator
systemctl restart hivemigrator
```
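After the restart, you can optionally confirm that the source host can obtain a ticket from the target realm using the copied keytab and the adjusted Kerberos configuration. The principal below is a placeholder; use the principal stored in the keytab (as shown by `klist -kt`):

```bash
# Obtain and list a ticket from the target KDC using the copied keytab
KRB5_CONFIG=/etc/remote/krb5.conf kinit -kt /path/keytabs/hdfs.headless.keytab <target_principal>
klist
```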
Create the source and target filesystems
You can create the source and target filesystems in the WANdisco CLI or the WANdisco UI.
Create the filesystems in the CLI
Run the following commands in the CLI:
Use the local keytab location and local Kerberos principal in the creation of the source filesystem:

```bash
filesystem add hdfs --file-system-id sourceHdfs --kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab --kerberos-principal <source_principal> --source --properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml --user hdfs
```

Use the downloaded keytab and Kerberos principal from the target cluster in the creation of the target filesystem:

```bash
filesystem add hdfs --file-system-id targetHdfs --kerberos-keytab /path/keytabs/hdfs.headless.keytab --kerberos-principal <target_principal> --properties-files /path/sites/core-site.xml,/path/sites/hdfs-site.xml --user hdfs
```
See the Command Reference for more information.
Create a test migration to validate that the Kerberos configuration is working correctly.
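As a quick, illustrative smoke test before running a migration (assuming the Hadoop client is installed on the source host and you hold a ticket for the target principal from the earlier `kinit` step), you can try listing the target filesystem using the downloaded site files and the remote Kerberos configuration:

```bash
# Point the Hadoop client at the target cluster's configuration and the remote krb5.conf
export HADOOP_CONF_DIR=/path/sites
export HADOOP_OPTS="-Djava.security.krb5.conf=/etc/remote/krb5.conf"
hdfs dfs -ls /
```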
Create the filesystems in the UI
For the source filesystem, auto-discovery picks up the Kerberos configuration details automatically. If you manually add a source filesystem, enter the same Kerberos Configuration as shown in the following steps.
Add a target filesystem in the UI and complete the following steps:
Select the applicable Data Migrator instance from the Products panel.
Add an Apache Hadoop target filesystem in the Filesystem Configuration page.
Under Kerberos Configuration, enter the Kerberos Principal and the Kerberos Keytab Location.
Under Advanced Configuration, enter the paths for `core-site.xml` and `hdfs-site.xml` into the Configuration Property Files Paths entry field. The default path, containing the source cluster configuration, is:

```text
/etc/hadoop/conf
```
Configuration steps for multi-realm Kerberos
By default, a Kerberos principal must match against a rule that transforms the principal to a short form, such as a user account name (without '@' or '/'). Otherwise, a principal won't be authorized. The `default_realm` on both the target and source filesystems must match to prevent the following error:
```text
javax.security.auth.login.LoginException:
java.lang.IllegalArgumentException: Illegal principal name hdfs@WANDISCO.SOURCE:
org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule:
No rules applied to hdfs@WANDISCO.SOURCE
```
For example, in the following scenario, the DEFAULT rule won't work because the source `default_realm` (WANDISCO.SOURCE) can't be applied to the target principal (WANDISCO.TARGET):

- Source principal: `hdfs@WANDISCO.SOURCE`
- Target principal: `hdfs@WANDISCO.TARGET`
To make sure the `default_realm` on the target and the source match, apply these steps:
Open the `core-site.xml` file and modify the `hadoop.security.auth_to_local` property as follows:

```xml
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>RULE:[1:$1@$0](.*@\QWANDISCO.TARGET\E$)s/@\QWANDISCO.TARGET\E$//
DEFAULT</value>
</property>
```

Construct the local name using this expression: `[n:string](regexp)s/pattern/replacement/g`.
- `(.*@\QWANDISCO.TARGET\E$)` - If the principal matches principals in the realm "@WANDISCO.TARGET", apply the rule on the next line.
- `s/@\QWANDISCO.TARGET\E$//` - Use a regular expression replacement to match the entire principal (`hdfs@WANDISCO.TARGET`) and replace it with the user "hdfs".
Save the file. You don't need to restart Data Migrator to apply the changes.
Cross-realm Hadoop now functions as intended because the `default_realm` configuration matches both realms.
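To check how the rules map a given principal, you can run Hadoop's KerberosName utility on a cluster host; the principal below is the example from above, and the expected short name is `hdfs`:

```bash
# Print the short name produced by the auth_to_local rules for a principal
hadoop org.apache.hadoop.security.HadoopKerberosName hdfs@WANDISCO.TARGET
```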
For more information, see the Apache documentation: Mapping from Kerberos principals to OS user accounts.
Troubleshooting
See the Kerberos troubleshooting section if you have any problems setting up Kerberos.