Command reference
System service commands
The service scripts control the operation of each individual service. On most supported Linux distributions, you can use the following commands to manage the Data Migrator, Hive Migrator, and UI processes.
Data Migrator
systemd command | Use it to... |
---|---|
systemctl start livedata-migrator | Start a service that isn't currently running. |
systemctl stop livedata-migrator | Stop a running service. |
systemctl restart livedata-migrator | Run a command that performs a stop and then a start. If the service isn't running, this works the same as a start command. |
systemctl status livedata-migrator | Get details of the running service's status. |
Running without systemd
For servers running older versions of Linux that don't include systemd, change the listed commands to use the following format:
service <service name> <command>
service livedata-migrator restart
If you're working in an environment without systemd or a system and service manager, you need to run the start.sh
script located in /opt/wandisco/livedata-migrator
again after running the restart command.
CentOS 6 systems don't support the service
command. Instead, use initctl
with the format:
initctl <command> <service name>
For example:
initctl restart livedata-migrator
Hive Migrator
Service script | Use it to... |
---|---|
systemctl start hivemigrator | Start a service that isn't currently running. |
systemctl stop hivemigrator | Stop a running service. |
systemctl restart hivemigrator | Run a command that performs a stop and then a start. If the service isn't running, this works the same as a start command. |
systemctl status hivemigrator | Get details of the running service's status. |
Always start/restart Hive Migrator services in the following order:
1. Remote agents
2. Hive Migrator service
Not starting services in this order may cause live migrations to fail.
If you're working in an environment without systemd or a system and service manager, you need to run the start.sh
script located in /opt/wandisco/hivemigrator
again after running the restart command.
Running without systemd
For servers running older versions of Linux that don't include systemd, change the listed commands to use the following format:
service <service name> <command>
service hivemigrator status
CentOS 6 systems don't support the service
command. Instead, use initctl
with the format:
initctl <command> <service name>
For example:
initctl start hivemigrator
Hive Migrator remote server
Service script | Use it to... |
---|---|
systemctl start hivemigrator-remote-server | Start a service that isn't currently running. |
systemctl stop hivemigrator-remote-server | Stop a running service. |
systemctl restart hivemigrator-remote-server | Run a command that performs a stop and then a start. If the service isn't running, this works the same as a start command. |
systemctl status hivemigrator-remote-server | Get details of the running service's status. |
Always start/restart Hive Migrator services in the following order:
1. Remote agents
2. Hive Migrator service
Not starting services in this order may cause live migrations to fail.
If you're working in an environment without systemd or a system and service manager, you need to run the start.sh
script located in /opt/wandisco/hivemigrator
again after running the restart command.
Running without systemd
For servers running older versions of Linux that don't include systemd, change the listed commands to use the following format:
service <service name> <command>
service hivemigrator-remote-server status
CentOS 6 systems don't support the service
command. Instead, use initctl
with the format:
initctl <command> <service name>
For example:
initctl start hivemigrator-remote-server
UI
Service script | Use it to... |
---|---|
systemctl start livedata-ui | Start a service that isn't currently running. |
systemctl stop livedata-ui | Stop a running service. |
systemctl restart livedata-ui | Run a command that performs a stop and then a start. If the service isn't running, this works the same as a start command. |
systemctl status livedata-ui | Get details of the running service's status. |
Running without systemd
For servers running older versions of Linux that don't include systemd, change the listed commands to use the following format:
service <service name> <command>
service livedata-ui status
If you're working in an environment without systemd or a system and service manager, you need to run the start.sh
script located in /opt/wandisco/livedata-ui
again after running the restart command.
CentOS 6 systems don't support the service
command. Instead, use initctl
with the format:
initctl <command> <service name>
For example:
initctl start livedata-ui
Data transfer agents
systemd command | Use it to... |
---|---|
systemctl start livedata-migrator-data-agent | Start a service that isn't currently running. |
systemctl stop livedata-migrator-data-agent | Stop a running service. |
systemctl restart livedata-migrator-data-agent | Run a command that performs a stop and then a start. If the service isn't running, this works the same as a start command. |
systemctl status livedata-migrator-data-agent | Get details of the running service's status. |
Running without systemd
For servers running older versions of Linux that don't include systemd, change the listed commands to use the following format:
service <service name> <command>
service livedata-migrator-data-agent restart
If you're working in an environment without systemd or a system and service manager, you need to run the start.sh
scripts located in /opt/wandisco/livedata-migrator-data-agent
again after running the restart commands.
CentOS 6 systems don't support the service
command. Instead, use initctl
with the format:
initctl <command> <service name>
For example:
initctl restart livedata-migrator-data-agent
Connect to the CLI
Open a terminal session on the Data Migrator host machine and enter the following command:
livedata-migrator
When the CLI connects to the Data Migrator and Hive Migrator services, you get the following command prompt:
Cirata LiveData Migrator >>
The CLI is now ready to accept commands.
Optional parameters
- --host: The IP or hostname of the Data Migrator API to connect to. Defaults to localhost when not specified.
- --vm-port: Data Migrator API port. Defaults to 18080 when not specified.
- --hm-port: Hive Migrator API port. Defaults to 6780 when not specified.
- --lm-ssl: Flag to use HTTPS. Defaults to HTTP when not specified.
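For example, a hypothetical connection to a remote Data Migrator host over HTTPS (the hostname is illustrative and the ports shown are the defaults):
livedata-migrator --host migrator-host.example.com --vm-port 18080 --hm-port 6780 --lm-ssl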
Version check
Check the current versions of included components by using the livedata-migrator
command with the --version
parameter. For example:
# livedata-migrator --version
This doesn't start the CLI. You get a list of the current Data Migrator components, along with their version numbers.
CLI features
Feature | How to use it |
---|---|
Review available commands | Use the help command to get details of all commands available. |
Command completion | Hit the <tab> key at any time to get assistance or to complete partially-entered commands. |
Cancel input | Type <Ctrl-C> before entering a command to return to an empty action prompt. |
Syntax indication | Invalid commands are highlighted as you type. |
Clear the display | Type <Ctrl-L> at any time. |
Previous commands | Navigate previous commands using the up and down arrows, and use standard emacs shortcuts. |
Interactive or scripted operation | You can interact with the command line interface directly, or send it commands on standard input to incorporate it into shell scripts. See script for more information and examples. |
CLI commands
You can manage filesystems, migrations, and more in the CLI.
Backup commands
backup add
backup add
backup config show
backup config show
{
"backupsLocation": "/opt/wandisco/livedata-migrator/db/backups",
"lastSuccessfulTs": 0,
"backupSchedule": {
"enabled": true,
"periodMinutes": 10
},
"storedFilePaths": [
"/etc/wandisco/livedata-migrator/application.properties",
"/etc/wandisco/livedata-migrator/vars.env",
"/etc/wandisco/livedata-migrator/logback-spring.xml",
"/etc/wandisco/livedata-migrator/application.properties",
"/etc/wandisco/livedata-migrator/vars.env",
"/etc/wandisco/livedata-migrator/logback-spring.xml"
]
}
backup list
backup list
backup restore
backup restore --name string
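For example, you can list the available backups and then restore one by name (the backup name is a placeholder; use a name returned by backup list):
backup list
backup restore --name <backup-name>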
backup schedule configure
backup schedule configure --period-minutes 10 --enable
{
"enabled": true,
"periodMinutes": 10
}
backup schedule show
backup schedule show
{
"enabled": true,
"periodMinutes": 10
}
backup show
backup show --name string
Bandwidth policy commands
bandwidth policy delete
bandwidth policy delete
bandwidth policy set
bandwidth policy set [--value] long
[--unit] string
[--data-agent] string
Mandatory parameters
- --value: Define the number of byte units.
- --unit: Define the byte unit to be used. Decimal units: B, KB, MB, GB, TB, PB. Binary units: KiB, MiB, GiB, TiB, PiB.
Optional parameters
- --data-agent: Apply the limit to a specified data agent.
Example
bandwidth policy set --value 10 --unit MB
bandwidth policy set --data-agent DTA1 --value 10 --unit MB
bandwidth policy show
bandwidth policy show
Data transfer agent commands
agent add
Add a new agent.
Mandatory parameters
- --agent-name: User-specified agent name.

You must enter a value for either the --agent-token or the --agent-token-file parameter:

- --agent-token: Connection token text provided by the token generator. You can use the content of /opt/wandisco/livedata-migrator-data-agent/connection_token on the node where you're installing the agent.
- --agent-token-file: Path to a file containing the connection token, for example /opt/wandisco/livedata-migrator-data-agent/connection_token. Ensure the token file is accessible on the Data Migrator host.
agent add --agent-name dta1 --agent-token-file /opt/wandisco/livedata-migrator-data-agent/connection_token
To check the agent was added, run:
agent show --agent-name example_name
Register an agent
curl -X POST -H "Content-Type: application/json" -d @/opt/wandisco/livedata-migrator-data-agent/reg_data_agent.json http://migrator-host:18080/scaling/dataagents/
curl -X GET http://migrator-host:18080/scaling/dataagents/example_name
migrator-host
is the host where Data Migrator is installed.
Start an agent
service livedata-migrator-data-agent start
Remove an agent
agent delete --agent-name example_name
agent delete --agent-name agent-example-vm.bdauto.wandisco.com
Mandatory parameters
- --agent-name: The name you give the agent, which can be a string such as agent-example-vm.bdauto.wandisco.com.
View an agent
agent show --agent-name example_name
agent show --agent-name agent-example-vm.bdauto.wandisco.com
{
"name": "agent-example-vm.bdauto.wandisco.com",
"host": "example-vm.bdauto.wandisco.com",
"port": 1433,
"type": "GRPC",
"version": "2.0.0",
"healthy": true,
"health": {
"lastStatusUpdateTime": 1670924489556,
"lastHealthMessage": "Agent agent-example-vm.bdauto.wandisco.com - health check became OK",
"status": "CONNECTED"
Mandatory parameters
- --agent-name: User-specified agent name.
agent list
List all agents.
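For example, to list all registered data transfer agents:
agent list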
Filesystem commands
filesystem add adls2 oauth
Add an Azure Data Lake Storage (ADLS) Gen2 container as a migration target using the filesystem add adls2 oauth
command, which requires a service principal and OAuth 2 credentials.
The service principal that you want to use must have either the Storage Blob Data Owner role assigned to the ADLS Gen2 storage account, or an access control list with RWX permissions for the migration path and all parent paths. For more information, see the Microsoft documentation.
filesystem add adls2 oauth [--container-name] string
[--file-system-id] string
[--insecure]
[--oauth2-client-endpoint] string
[--oauth2-client-id] string
[--oauth2-client-secret] string
[--properties] string
[--properties-files] list
[--scan-only]
[--source]
[--storage-account-name] string
Mandatory parameters
- --container-name: The name of the container in the storage account to which content will be migrated.
- --file-system-id: The ID to give the new filesystem resource.
- --oauth2-client-endpoint: The client endpoint for the Azure service principal. This will often take the form of https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token, where {tenant} is the directory ID for the Azure service principal. You can enter a custom URL (such as a proxy endpoint that manually interfaces with Azure Active Directory (Azure AD)).
- --oauth2-client-id: The client ID (also known as application ID) for your Azure service principal.
- --oauth2-client-secret: The client secret (also known as application secret) for the Azure service principal.
- --storage-account-name: The name of the ADLS Gen2 storage account to target.
Optional parameters
- --insecure: If you enter this parameter, Data Migrator will not use TLS to encrypt communication with ADLS Gen2. This may improve throughput, but should only be used when you have other means of securing communication.
- --properties: Enter properties to use in a comma-separated key/value list.
- --properties-files: Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
- --scan-only: Supply this parameter to create a static source filesystem for use in one-time, non-live migrations. Requires --source.
- --source: Add this filesystem as the source for migrations.
Example
filesystem add adls2 oauth --file-system-id mytarget
--storage-account-name myadls2
--oauth2-client-id b67f67ex-ampl-e2eb-bd6d-client9385id
--oauth2-client-secret 2IPO8*secretk-9OPs8n*TexampleHJ=
--oauth2-client-endpoint https://login.microsoftonline.com/78u098ex-ampl-e498-8bce-ndpoint5f2e5/oauth2/v2.0/token
--container-name lm2target
filesystem add adls2 sharedKey
Add an ADLS Gen2 container as a migration target using the filesystem add adls2 sharedKey
command, which requires credentials in the form of an account key.
filesystem add adls2 sharedKey [--file-system-id] string
[--storage-account-name] string
[--container-name] string
[--insecure]
[--shared-key] string
[--properties-files] list
[--properties] string
[--scan-only]
[--source]
Mandatory parameters
- --file-system-id: The ID to give the new filesystem resource.
- --storage-account-name: The name of the ADLS Gen2 storage account to target.
- --shared-key: The shared account key to use as credentials to write to the storage account.
- --container-name: The name of the container in the storage account to which content will be migrated.
Optional parameters
- --insecure: If you enter this parameter, Data Migrator will not use TLS to encrypt communication with ADLS Gen2. This may improve throughput, but should only be used when you have other means of securing communication.
- --properties-files: Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
- --properties: Enter properties to use in a comma-separated key/value list.
- --scan-only: Supply this parameter to create a static source filesystem for use in one-time, non-live migrations. Requires --source.
- --source: Add this filesystem as the source for migrations.
Example
filesystem add adls2 sharedKey --file-system-id mytarget
--storage-account-name myadls2
--container-name lm2target
--shared-key Yi8NxHGqoQ79DBGLVn+COK/sRDwbNqAEXAMPLEDaMxRkvXt2ijUtASHAREDj/vaS/NbzR5rtjEKEY31eIopUVA==
filesystem add gcs
Add a Google Cloud Storage bucket as a migration target using the filesystem add gcs
command, which requires credentials in the form of an account key file.
filesystem add gcs [--file-system-id] string
[--service-account-json-key-file] string
[--service-account-p12-key-file] string
[--service-account-json-key-file-server-location] string
[--service-account-p12-key-file-server-location] string
[--service-account-email] string
[--bucket-name] string
[--properties-files] list
[--properties] string
[--source]
Mandatory parameters
- --file-system-id: The ID to give the new filesystem resource.
- --bucket-name: The bucket name of a Google Cloud Storage account.

Service account key parameters

Enter your service account key for the Google Cloud Storage bucket by choosing one of the parameters below. You can also upload the service account key directly when using the UI (this isn't supported through the CLI).

- --service-account-json-key-file-server-location: The absolute filesystem path on the Data Migrator server of your service account key file in JSON format. You can either create a Google Cloud Storage service account key or use an existing one. In the UI, this is called Key File and becomes visible when you select Key File Options -> Provide a Path.
- --service-account-p12-key-file-server-location: The absolute filesystem path on the Data Migrator server of your service account key file in P12 format. You can either create a Google Cloud Storage service account key or use an existing one.
- --service-account-json-key-file: The absolute filesystem path on the host running the Data Migrator CLI of your service account key file in JSON format. Use this parameter if you're running the CLI on a different host to your Data Migrator server.
- --service-account-p12-key-file: The absolute filesystem path on the host running the Data Migrator CLI of your service account key file in P12 format. Use this parameter if you're running the CLI on a different host to your Data Migrator server.
Optional parameters
- --service-account-email: The email address linked to your Google Cloud Storage service account.
- --source: Enter this parameter to use the filesystem resource created as a source.
- --properties-files: Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
- --properties: Enter properties to use in a comma-separated key/value list.
Example
filesystem add gcs --file-system-id gcsAgent
--bucket-name myGcsBucket
--service-account-p12-key-file-server-location /user/hdfs/targetStorage/myAccountKey.p12
--service-account-email user@mydomain.com
filesystem add hdfs
Add a Hadoop Distributed File System (HDFS) as either a migration source or target using the filesystem add hdfs
command.
Creating an HDFS resource with this command is normally only needed when migrating to a target HDFS filesystem (rather than another storage service like ADLS Gen2 or S3a). Data Migrator will attempt to auto-discover the source HDFS when started from the command line unless Kerberos is enabled on your source environment.
If Kerberos is enabled on your source environment, use the filesystem auto-discover-source hdfs
command to enter Kerberos credentials and auto-discover your source HDFS configuration.
filesystem add hdfs [--file-system-id] string
[--default-fs] string
[--user] string
[--kerberos-principal] string
[--kerberos-keytab] string
[--source]
[--scan-only]
[--success-file] string
[--properties-files] list
[--properties] string
Mandatory parameters
- --file-system-id: The ID to give the new filesystem resource.
- --default-fs: A string that defines how Data Migrator accesses HDFS. It can be specified in a number of forms:
  - As a single HDFS URI, such as hdfs://192.168.1.10:8020 (using an IP address) or hdfs://myhost.localdomain:8020 (using a hostname).
  - As an HDFS URI that references a nameservice if the NameNodes have high availability, for example, hdfs://mynameservice. For more information, see HDFS High Availability.
- --properties-files: Reference a list of existing properties files that contain Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
Optional parameters
Cross-realm authentication is required in the following scenarios:
- Migration will occur between a source and target HDFS.
- Kerberos is enabled on both clusters.
See the links below for guidance for common Hadoop distributions:
- --user: The name of the HDFS user to be used when performing operations against the filesystem. In environments where Kerberos is disabled, this user must be the HDFS super user, such as hdfs.
- --kerberos-principal: The Kerberos principal to authenticate with and perform migrations as. This principal should map to the HDFS super user using auth_to_local rules.
- --kerberos-keytab: The Kerberos keytab containing the principal defined for the --kerberos-principal parameter. This must be accessible to the local system user running the Data Migrator service (default is hdfs).
- --source: Enter this parameter to use the filesystem resource created as a source.
- --scan-only: Enter this parameter to create a static source filesystem for use in one-time migrations. Requires --source.
- --properties: Enter properties to use in a comma-separated key/value list.
- --success-file: Specify a file name or glob pattern for files that Data Migrator will migrate last from the directory they're contained in. For example, --success-file /mypath/myfile.txt or --success-file /**_SUCCESS. You can use these files to confirm the directory they're in has finished migrating.
Properties files are required for NameNode HA
If your Hadoop cluster has NameNode HA enabled, enter the local filesystem path to the properties files that define the configuration for the nameservice ID.
- Source HDFS filesystem: These configuration files will likely be in a default location depending on the distribution of the Hadoop cluster, for example /etc/hadoop/conf.
- Target HDFS filesystem: Ensure that the target Hadoop cluster configuration is available on your Data Migrator host's local filesystem, for example /etc/targetClusterConfig.

Alternatively, define the absolute filesystem paths to these files:

- /etc/hadoop/conf/core-site.xml
- /etc/hadoop/conf/hdfs-site.xml
- /etc/targetClusterConfig/core-site.xml
- /etc/targetClusterConfig/hdfs-site.xml

For the CLI/API, use the --properties-files parameter and define the absolute paths to the core-site.xml and hdfs-site.xml files (see the Examples section for CLI usage of this parameter).
Examples
HDFS as source
filesystem add hdfs --file-system-id mysource
--default-fs hdfs://sourcenameservice
--properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
filesystem add hdfs --file-system-id mysource
--default-fs hdfs://sourcenameservice
--properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
--kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab
--kerberos-principal hdfs@SOURCEREALM.COM
HDFS as target
If you enter an HDFS filesystem as a target, the property files for the target cluster must exist on the local filesystem and be accessible to the Data Migrator system user.
filesystem add hdfs --file-system-id mytarget --default-fs hdfs://targetnameservice --properties-files /etc/targetClusterConfig/core-site.xml,/etc/targetClusterConfig/hdfs-site.xml --kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab --kerberos-principal hdfs@SOURCEREALM.COM
filesystem add hdfs --file-system-id mytarget --default-fs hdfs://namenode.targetdomain:8020 --user hdfs
filesystem add local
Add a local filesystem as either a migration target or source using the filesystem add local
command.
filesystem add local [--file-system-id] string
[--fs-root] string
[--source]
[--scan-only]
[--properties-files] list
[--properties] string
Mandatory parameters
- --file-system-id: The ID to give the new filesystem resource.
Optional parameters
- --fs-root: The directory in the local filesystem to scan for data or send data to, depending on whether the filesystem is defined as a source or a target. Should be supplied using the full directory path from the root.
- --source: Enter this parameter to use the filesystem resource created as a source.
- --scan-only: Enter this parameter to create a static source filesystem for use in one-time migrations. Requires --source.
- --properties-files: Reference a list of existing properties files.
- --properties: Enter properties to use in a comma-separated key/value list.
If no fs-root
is specified, the file path will default to the root of your system.
Examples
Local filesystem as source
filesystem add local --file-system-id mytarget --fs-root ./tmp --source
Local filesystem as target
filesystem add local --file-system-id mytarget --fs-root ./Users/username/destinationfolder/
filesystem add s3a
Add an S3-compatible filesystem as a source or target for migration.
For details on which platforms support S3, see Supported sources and targets.
As of Data Migrator 2.1.1, hcfs.ssl.channel.mode replaces the use of fs.s3a.ssl.channel.mode and fs.azure.ssl.channel.mode, which are no longer valid.
See SSL implementation for information on the property and values used.
Use the filesystem add s3a
command with the following parameters:
filesystem add s3a [--access-key] string
[--aws-config-file] string
[--aws-profile] string
[--bootstrap.servers] string
[--bucket-name] string
[--credentials-provider] string
[--endpoint] string
[--file-system-id] string
[--properties] string
[--properties-files] list
[--s3type] string
[--scan-only]
[--secret-key] string
[--source]
[--sqs-endpoint] string
[--sqs-queue] string
[--topic] string
For guidance about access, permissions, and security when adding an Amazon S3 bucket as a target filesystem, see Security best practices in IAM.
S3A mandatory parameters
- --file-system-id: The ID for the new filesystem resource.
- --bucket-name: The name of your S3 bucket.
- --credentials-provider: The Java class name of a credentials provider for authenticating with the Amazon S3 endpoint. The provider options available include:
  - org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider: Use this provider to offer credentials as an access key and secret access key with the --access-key and --secret-key parameters.
  - com.amazonaws.auth.InstanceProfileCredentialsProvider: Use this provider when running Data Migrator on an Elastic Compute Cloud (EC2) instance that has been assigned an IAM role with policies that allow it to access the Amazon S3 bucket.
  - com.amazonaws.auth.DefaultAWSCredentialsProviderChain: A commonly-used credentials provider chain that looks for credentials in this order:
    - Environment variables - AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, or AWS_ACCESS_KEY and AWS_SECRET_KEY.
    - Java system properties - aws.accessKeyId and aws.secretKey.
    - Web Identity Token credentials from the environment or container.
    - Credential profiles file at the default location (~/.aws/credentials) shared by all AWS SDKs and the AWS CLI.
    - Credentials delivered through the Amazon EC2 container service if the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable is set and the security manager has permission to access the variable.
    - Instance profile credentials delivered through the Amazon EC2 metadata service.
  - com.wandisco.livemigrator2.fs.ExtendedProfileCredentialsProvider: This provider supports the use of multiple AWS credentials, which are stored in a credentials file. When adding a source filesystem, use the following properties:
    - awsProfile - Name for the AWS profile.
    - awsCredentialsConfigFile - Path to the AWS credentials file. The default path is ~/.aws/credentials.

    For example:

    filesystem add s3a --file-system-id testProfile1Fs --bucket-name profile1-bucket --credentials-provider com.wandisco.livemigrator2.fs.ExtendedProfileCredentialsProvider --properties awsProfile=<profile-name>,awsCredentialsConfigFile=</path/to/the/aws/credentials/file>

    In the CLI, you can also use --aws-profile and --aws-config-file. For example:

    filesystem add s3a --file-system-id testProfile1Fs --bucket-name profile1-bucket --credentials-provider com.wandisco.livemigrator2.fs.ExtendedProfileCredentialsProvider --aws-profile <profile-name> --aws-config-file </path/to/the/aws/credentials/file>

    Learn more about using AWS profiles: Configuration and credential file settings.
S3A optional parameters
- --access-key: When using the org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider credentials provider, enter the access key with this parameter. This is also a required parameter when adding an IBM Cloud Object Storage bucket.
- --secret-key: When using the org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider credentials provider, enter the secret key using this parameter. This is also a required parameter when adding an IBM Cloud Object Storage bucket.
- --endpoint: Enter a specific endpoint to access the S3-compatible bucket, such as an AWS PrivateLink endpoint or an IBM COS public regional endpoint. If you don't enter a value, the filesystem defaults to AWS. Note: using --endpoint supersedes fs.s3a.endpoint if that is used as an additional custom property. Don't use both parameters at the same time.
- --sqs-queue: [Amazon S3 as a source only] Enter an SQS queue name. This field is required if you enter an SQS endpoint.
- --sqs-endpoint: [Amazon S3 as a source only] Enter an SQS endpoint.
- --source: Enter this parameter to add the filesystem as a source. See which platforms are supported as a source.
- --scan-only: Enter this parameter to create a static source filesystem for use in one-time migrations. Requires --source.
- --properties-files: Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
- --properties: Enter properties to use in a comma-separated key/value list.
- --s3type: Specifies what parameters are required, based on the requirements of your selected S3a-compatible filesystem. Leave it blank for S3-compatible storage or select from the following:
  - aws
  - oracle
  - ibmcos

IBM COS as a source only:

- --bootstrap.servers: The Kafka server address.
- --topic: The Kafka topic where S3 object change notifications are provided.
S3a default properties
These properties are defined by default when adding an S3a filesystem.
You don't need to define or adjust many of these properties; use caution when making any changes. If you're unsure, get in touch with Support for more information.
Enter additional properties for S3 filesystems by adding them as key-value pairs in the UI or as a comma-separated key-value pair list with the --properties parameter in the CLI. You can overwrite default property values or add new properties.
- fs.s3a.impl (default org.apache.hadoop.fs.s3a.S3AFileSystem): The implementation class of the S3a filesystem.
- fs.AbstractFileSystem.s3a.impl (default org.apache.hadoop.fs.s3a.S3A): The implementation class of the S3a AbstractFileSystem.
- fs.s3a.user.agent.prefix (default APN/1.0 WANdisco/1.0 LiveDataMigrator/(ldm version)): Sets a custom value that will be prepended to the User-Agent header sent in HTTP requests to the S3 back-end by S3aFileSystem.
- fs.s3a.impl.disable.cache (default true): Disables the S3 filesystem cache when set to 'true'.
- hadoop.tmp.dir (default tmp): The parent directory for other temporary directories.
- fs.s3a.connection.maximum (default 225): Defines the maximum number of simultaneous connections to the S3 filesystem.
- fs.s3a.threads.max (default 150): Defines the total number of threads to make available in the filesystem for data uploads or any other queued filesystem operation.
- fs.s3a.max.total.tasks (default 75): Defines the maximum number of tasks allowed for parallel operations.
- fs.s3a.sqs.init.dir (default /sqs-init-path): SQS initialization path.
- fs.s3a.empty.polls.max.count (default 10): Maximum number of empty listing responses accepted before considering a directory listing operation as finished.
- fs.s3a.sqs.messages.max.number (default 10): Maximum number of messages to pull from an SQS queue in a single request.
- fs.s3a.sqs.wait.time.sec (default 20): Duration in seconds to wait for messages in the SQS queue when polling for notifications.
- fs.s3a.path.events.cache.size (default 0): Number of entries or paths that can be cached.
- fs.s3a.path.events.cache.expiration.time.min (default 60): Time-to-live for entries stored in the events cache.
- s3a.events.poll.max.retries (default 10): Maximum number of retries the connector attempts for polling events.
- fs.s3a.healthcheck (default true): Allows the S3A filesystem health check to be turned off by changing true to false. This option is useful for setting up Data Migrator while cloud services are offline. However, when disabled, errors in S3A configuration may be missed, resulting in hard-to-diagnose migration stalls.
S3a custom properties
These are some of the additional properties that can be added when creating an S3a filesystem.
- fs.s3a.fast.upload.buffer (default disk): Defines how the filesystem will buffer the upload.
- fs.s3a.fast.upload.active.blocks (default 4): Defines how many blocks a single output stream can have uploading or queued at a given time.
- fs.s3a.block.size (default 32M): Defines the maximum size of blocks during file transfer. Use the suffix K, M, G, T or P to scale the value in kilobytes, megabytes, gigabytes, terabytes or petabytes respectively.
- fs.s3a.buffer.dir (default tmp): Defines the directory used by disk buffering.
- fs.s3a.endpoint.region (default: current region): Explicitly sets the bucket region.
To configure an Oracle Cloud Storage bucket that isn't in your default region, specify fs.s3a.endpoint.region=<region> with the --properties flag when adding the filesystem with the CLI.
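For example, a sketch of adding an Oracle Cloud Storage bucket that sits outside your default region (the bucket name, keys, and region are placeholders):
filesystem add s3a --file-system-id oracleTarget --bucket-name mybucket1 --s3type oracle --credentials-provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider --access-key <access-key> --secret-key <secret-key> --properties fs.s3a.endpoint.region=<region>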
Find an additional list of S3a properties in the S3a documentation.
Upload buffering
Migrations using an S3A target destination will buffer all uploads. By default, the buffering will occur on the local disk of the system Data Migrator is running on, in the /tmp
directory.
Data Migrator will automatically delete the temporary buffering files once they are no longer needed.
If you want to use a different type of buffering, you can change the property fs.s3a.fast.upload.buffer
. The following values can be supplied:
Buffering Option | Details | Property Value |
---|---|---|
Array Buffer | Buffers the uploaded data in memory instead of on disk, using the Java heap. | array |
Byte Buffer | Buffers the uploaded data in memory instead of on disk, but doesn't use the Java heap. | bytebuffer |
Disk Buffering | The default option. Buffers the upload to disk. | disk |
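For example, a hypothetical target that buffers uploads in off-heap memory with fewer active blocks (the filesystem ID, bucket name, and property values are illustrative):
filesystem add s3a --file-system-id mytarget --bucket-name mybucket1 --credentials-provider com.amazonaws.auth.DefaultAWSCredentialsProviderChain --properties fs.s3a.fast.upload.buffer=bytebuffer,fs.s3a.fast.upload.active.blocks=2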
Both the array
and bytebuffer
options may lead to the consumption of large amounts of memory. Other properties (such as fs.s3a.fast.upload.active.blocks
) may be used to fine-tune the migration to avoid issues.
If you run out of disk space on which to buffer the migration, the migration will stall with a series of errors. To avoid this, ensure the filesystem containing the directory used for buffering (/tmp
by default) has enough remaining space to facilitate the transfer.
S3a Example
filesystem add s3a --file-system-id mytarget
--bucket-name mybucket1
--credentials-provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
--access-key pkExampleAccessKeyiz
--secret-key OP87ChoExampleSecretKeyI904lT7AaDBGJpp0D
IBM Cloud Object Storage Examples
Add a source IBM Cloud Object Storage filesystem. Note that this doesn't work if SSL is used on the endpoint address.
filesystem add s3a --source --file-system-id cos_s3_source2
--bucket-name container2
--credentials-provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
--access-key pkExampleAccessKeyiz
--secret-key c3vq6vaNtExampleSecretKeyVuqJMIHuV9IF3n9
--s3type ibmcos
--bootstrap.servers=10.0.0.123:9092
--topic newcos-events
--endpoint http://10.0.0.124
Add path mapping.
path mapping add --path-mapping-id testPath
--description description-string
--source-path /
--target targetHdfs2
--target-path /repl_test1
{
"id": "testPath",
"description": "description-string",
"sourceFileSystem": "cos_s3_source2",
"sourcePath": "/",
"targetFileSystem": "targetHdfs2",
"targetPath": "/repl_test1"
}
filesystem auto-discover-source hdfs
Discover your local HDFS filesystem by entering the Kerberos credentials for your source environment.
You can also manually configure the source HDFS filesystem using the filesystem add hdfs
command.
filesystem auto-discover-source hdfs [--kerberos-principal] string
[--kerberos-keytab] string
[--scan-only]
Kerberos parameters
- --kerberos-principal: The Kerberos principal to authenticate with and perform migrations as. This principal should map to the HDFS super user using auth_to_local rules.
- --kerberos-keytab: The Kerberos keytab containing the principal defined for the --kerberos-principal parameter. This must be accessible to the local system user running the Data Migrator service (default is hdfs).
Optional
- --scan-only: Supply this parameter to create a static source filesystem for use in one-time, non-live migrations.
Example
filesystem auto-discover-source hdfs --kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab --kerberos-principal hdfs@REALM.COM
filesystem clear
Delete all target filesystem references with the filesystem clear command. This leaves any migrated content intact in those targets, but removes all resources that act as references to the target filesystems.
filesystem clear
filesystem delete
Delete a specific filesystem resource by ID. This leaves all migrated content intact at that target, but removes the resource that acts as a reference to that filesystem.
filesystem delete [--file-system-id] string
Mandatory parameters
- --file-system-id: The ID of the filesystem resource to delete.
Example
filesystem delete --file-system-id mytarget
filesystem list
List defined filesystem resources.
filesystem list [--detailed]
Optional parameters
- --detailed: Include all properties for each filesystem in the JSON result.
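For example, to list all defined filesystems with their full properties:
filesystem list --detailed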
filesystem show
View details for a filesystem resource.
filesystem show [--file-system-id] string
[--detailed]
Mandatory parameters
- --file-system-id: The ID of the filesystem resource to show.
Example
filesystem show --file-system-id mytarget
filesystem types
View information about the filesystem types available for use with Data Migrator.
filesystem types
filesystem update adls2 oauth
Update an existing ADLS Gen2 container migration target with a specified filesystem ID using the filesystem update adls2 oauth
command. You will be prompted to optionally update the service principal and OAuth 2 credentials.
Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add adls2 oauth
section.
All parameters are optional except --file-system-id
, which specifies the filesystem you want to update.
Example
filesystem update adls2 oauth --file-system-id mytarget --storage-account-name myadls2 --oauth2-client-id b67f67ex-ampl-e2eb-bd6d-client9385id --oauth2-client-secret 2IPO8*secretk-9OPs8n*TexampleHJ= --oauth2-client-endpoint https://login.microsoftonline.com/78u098ex-ampl-e498-8bce-ndpoint5f2e5/oauth2/v2.0/token --container-name lm2target
filesystem update adls2 sharedKey
Update an existing ADLS Gen2 container migration target using the filesystem update adls2 sharedKey
command. You will be prompted to optionally update the secret key.
Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add adls2 sharedKey
section.
All parameters are optional except --file-system-id
, which specifies the filesystem you want to update.
Example
filesystem update adls2 sharedKey --file-system-id mytarget --storage-account-name myadls2 --container-name lm2target --shared-key Yi8NxHGqoQ79DBGLVn+COK/sRDwbNqAEXAMPLEDaMxRkvXt2ijUtASHAREDj/vaS/NbzR5rtjEKEY31eIopUVA==
filesystem update gcs
Update a Google Cloud Storage migration target using the filesystem update gcs
command.
Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add gcs
section.
All parameters are optional except --file-system-id
, which specifies the filesystem you want to update.
Example
filesystem update gcs --file-system-id gcsAgent --bucket-name myGcsBucket --service-account-p12-key-file-server-location /user/hdfs/targetStorage/myAccountKey.p12 --service-account-email user@mydomain.com
filesystem update hdfs
Update either a source or target Hadoop Distributed filesystem using the filesystem update hdfs
command.
Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add hdfs
section.
All parameters are optional except --file-system-id
, which specifies the filesystem you want to update.
Examples
filesystem update hdfs --file-system-id mysource
--default-fs hdfs://sourcenameservice
--properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
filesystem update hdfs --file-system-id mytarget
--default-fs hdfs://sourcenameservice
--properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
--kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab
--kerberos-principal hdfs@SOURCEREALM.COM
filesystem update local
Update a target or source local filesystem using the filesystem update local
command.
Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add local
section.
All parameters are optional except --file-system-id
, which specifies the filesystem you want to update.
Example
filesystem update local --file-system-id mytarget --fs-root ./tmp
filesystem update s3a
Update an S3 bucket target filesystem using the filesystem update s3a
command. This method also supports IBM Cloud Object Storage buckets.
Any optional parameters supplied will update the corresponding details of the existing filesystem. The parameters that can be changed are the same as the ones listed in the filesystem add s3a
section.
All parameters are optional except --file-system-id
, which specifies the filesystem you want to update.
Example
filesystem update s3a --file-system-id mytarget
--bucket-name mybucket1 --credentials-provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider --access-key pkExampleAccessKeyiz --secret-key eSeCreTkeYd8uEDnDHRHuV9IF3n9
Hive agent configuration commands
It's not possible to adjust some TLS parameters for remote metastore agents after creation. Find more information in the following Knowledge base article.
hive agent add azure
Add a local or remote Hive agent to connect to an Azure SQL database using the hive agent add azure
command.
If your Data Migrator host can communicate directly with the Azure SQL database, then a local Hive agent is sufficient. Otherwise, consider using a remote Hive agent.
For a remote Hive agent connection, enter a remote host (Azure VM, HDI cluster node) to communicate with the local Data Migrator server (constrained to a user-defined port).
A small service is deployed on this remote host so that the data can transfer between the Hive agent and the remote metastore.
hive agent add azure [--name] string
[--db-server-name] string
[--database-name] string
[--database-user] string
[--database-password] string
[--auth-method] azure-sqlauthentication-method
[--client-id] string
[--storage-account] string
[--container-name] string
[--insecure] boolean
[--host] string
[--port] integer
[--no-ssl]
[--autodeploy] boolean
[--ssh-user] string
[--ssh-key] file
[--ssh-port] int
[--use-sudo]
[--ignore-host-checking]
[--file-system-id] string
[--keystore-certificate-alias] string
[--keystore-password] string
[--keystore-path] string
[--keystore-trusted-certificate-alias] string
[--keystore-type] string
[--default-fs-override] string
[--certificate-storage-type] string
Mandatory parameters
The Azure Hive agent requires an ADLS Gen2 storage account and container name to generate the correct location for the metadata. The agent doesn't access the container and data isn't written to it.
- --name: The ID for the new Hive agent.
- --db-server-name: The Azure SQL database server name.
- --database-name: The Azure SQL database name. Note: Hive Migrator doesn't support Azure SQL database names containing blank spaces, semicolons (;), open curly braces ({), or close curly braces (}). Additionally, see Microsoft's documentation for a list of special characters that can't be used.
- --storage-account: The name of the ADLS Gen2 storage account.
- --container-name: The name of the container in the ADLS Gen2 storage account.
- --auth-method: The Azure SQL database connection authentication method (SQL_PASSWORD, AD_MSI, AD_INTEGRATED, AD_PASSWORD, ACCESS_TOKEN).
Additionally, use only one of the following parameters:
- --file-system-id: The name of the filesystem that will be associated with this agent (for example: myadls2storage). This will ensure any path mappings are correctly linked between the filesystem and the agent.
- --default-fs-override: Enter an override for the default filesystem URI instead of a filesystem name (for example: abfss://mycontainer@mystorageaccount.dfs.core.windows.net).
Optional parameters
- --client-id: The Azure resource's client ID.
- --insecure: Define an insecure connection (TLS disabled) to the Azure SQL database server (default is false).
Authentication parameters
Select one of the authentication methods listed and include the additional parameters required for the chosen method.
- --auth-method: The authentication method to connect to the Azure SQL server. The following methods can be used:
  - SQL_PASSWORD - Enter a username and password to access the database.
  - AD_MSI - Use a system-assigned or user-assigned managed identity.
Required parameters for SQL_PASSWORD
- --database-user: The username to access the database.
- --database-password: The user password to access the database.
Required parameters for AD_MSI
To use this method, complete the following prerequisites:
- Data Migrator or the remote Azure Hive agent must be installed on an Azure resource with the managed identity assigned to it. The host must also have Azure AD authentication enabled.
- Your Azure SQL server must be enabled for Azure AD authentication.
- You have created a contained user in the Azure SQL database that is mapped to the Azure AD resource (where Data Migrator or the remote Azure Hive agent is installed). The username of the contained user depends on whether you're using a system-assigned or user-assigned identity.

Azure SQL database command for a system-assigned managed identity:

CREATE USER "<azure_resource_name>" FROM EXTERNAL PROVIDER;
ALTER ROLE db_owner ADD MEMBER "<azure_resource_name>";

The <azure_resource_name> is the name of the Azure resource where Data Migrator or the remote Azure Hive agent is installed (for example, myAzureVM).

Azure SQL database command for a user-assigned managed identity:

CREATE USER <managed_identity_name> FROM EXTERNAL PROVIDER;
ALTER ROLE db_owner ADD MEMBER <managed_identity_name>;

The <managed_identity_name> is the name of the user-assigned managed identity (for example, myManagedIdentity).
After you complete the prerequisites, see the system-assigned identity or user-assigned identity parameters.
System-assigned identity
No other parameters are required for a system-managed identity.
User-assigned identity
Specify the --client-id parameter:

- --client-id: The client ID of your Azure managed identity.
Parameters for remote Hive agents only
- --host: The host where the remote Hive agent will be deployed.
- --port: The port for the remote Hive agent to use on the remote host. This port is used to communicate with the local Data Migrator server.
- --no-ssl: TLS encryption and certificate authentication are enabled by default between Data Migrator and the remote agent. Use this parameter to disable them. You can't adjust this parameter after agent creation.
Parameters for TLS/SSL only
- --certificate-storage-type: The certificate storage type. Can be specified as either FILE or KEYSTORE.
- --keystore-certificate-alias: The alias of the certificate stored in the keystore.
- --keystore-password: The password assigned to the target keystore.
- --keystore-path: The path to the target-side keystore file.
- --keystore-trusted-certificate-alias: The alias of the trusted certificate chain stored in the keystore.
- --keystore-type: The type of keystore specified, JKS or PKCS12.
Parameters for automated deployment
- --autodeploy: The remote agent is automatically deployed when this flag is used. If using this, the --ssh-key parameter must also be specified.
- --ssh-user: The SSH user to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter).
- --ssh-key: The absolute path to the SSH private key to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter).
- --ssh-port: The SSH port to use for authentication on the remote host to perform automatic deployment (when using the --autodeploy parameter). Default is port 22.
- --use-sudo: All commands performed by the SSH user use sudo on the remote host when performing automatic deployment (using the --autodeploy parameter).
- --ignore-host-checking: Ignore strict host key checking when performing the automatic deployment (using the --autodeploy parameter).
Steps for manual deployment
If you do not wish to use the --autodeploy function, follow these steps to deploy a remote Hive agent for Azure SQL manually:

1. Transfer the remote server installer to your remote host (Azure VM, HDI cluster node). Example of secure transfer from local to remote host:
   scp /opt/wandisco/hivemigrator/hivemigrator-remote-server-installer.sh myRemoteHost:~
2. On your remote host, make the installer script executable:
   chmod +x hivemigrator-remote-server-installer.sh
3. On your remote host, run the installer as root (or sudo) user in silent mode:
   ./hivemigrator-remote-server-installer.sh -- --silent --config <example config string>
   Find the --config string from the output of the hive agent add azure command.
4. On your remote host, start the remote server service:
   service hivemigrator-remote-server start
5. On your local host, run the hive agent add azure command without using --autodeploy and its related parameters to configure your remote Hive agent. See the remote Azure SQL deployment (manual) example below for further guidance.
Examples
hive agent add azure --name azureAgent --db-server-name mysqlserver.database.windows.net --database-name mydb1 --auth-method SQL_PASSWORD --database-user azureuser --database-password mypassword --storage-account myadls2 --container-name mycontainer --file-system-id myadls2storage
hive agent add azure --name azureRemoteAgent --db-server-name mysqlserver.database.windows.net --database-name mydb1 --auth-method AD_MSI --storage-account myadls2 --container-name mycontainer --file-system-id myadls2storage --autodeploy --ssh-user root --ssh-key /root/.ssh/id_rsa --ssh-port 22 --host myRemoteHost.example.com --port 5052
hive agent add azure --name azureRemoteAgent --db-server-name mysqlserver.database.windows.net --database-name mydb1 --auth-method AD_MSI --client-id b67f67ex-ampl-e2eb-bd6d-client9385id --storage-account myadls2 --container-name mycontainer --file-system-id myadls2storage --host myRemoteHost.example.com --port 5052
For a remote Hive agent connection, enter a remote host (Azure VM instance) to communicate with the local Data Migrator server (constrained to a user-defined port).
A small service is deployed on this remote host so that the data can transfer between the Hive agent and the remote Metastore.
hive agent add filesystem
Add a filesystem Hive agent to migrate your metadata to a specified target filesystem location using the hive agent add filesystem
command.
hive agent add filesystem [--file-system-id] string
[--root-folder] string
[--name] string
- --file-system-id: The filesystem ID to be used.
- --root-folder: The path to use as the root directory for the filesystem agent.
- --name: The ID to give to the new Hive agent.
Example
hive agent add filesystem --file-system-id myfilesystem --root-folder /var/lib/mysql --name fsAgent
hive agent add glue
Add an AWS Glue Hive agent to connect to an AWS Glue data catalog using the hive agent add glue
command.
If your Data Migrator host can communicate directly with the AWS Glue Data Catalog, then a local Hive agent will be sufficient. Otherwise, consider using a remote Hive agent.
For a remote Hive agent connection, enter a remote host (EC2 instance) that will be used to communicate with the local Data Migrator server (constrained to a user-defined port).
A small service will be deployed on this remote host so that the data can transfer between the Hive agent and the remote Metastore.
hive agent add glue [--name] string
[--access-key] string
[--secret-key] string
[--glue-endpoint] string
[--aws-region] string
[--glue-catalog-id] string
[--credentials-provider] string
[--glue-max-retries] integer
[--glue-max-connections] integer
[--glue-max-socket-timeout] integer
[--glue-connection-timeout] integer
[--file-system-id] string
[--default-fs-override] string
[--host] string
[--port] integer
[--no-ssl]
[--keystore-certificate-alias] string
[--keystore-password] string
[--keystore-path] string
[--keystore-trusted-certificate-alias] string
[--keystore-type] string
[--certificate-storage-type] string
Glue parameters
- --name: The ID to give to the new Hive agent.
- --glue-endpoint: The AWS Glue service endpoint for connections to the data catalog. VPC endpoint types are also supported.
- --aws-region: The AWS region that your data catalog is located in (default is us-east-1). If --glue-endpoint is specified, this parameter will be ignored.
Additionally, use only one of the following parameters:
- --file-system-id: The name of the filesystem that will be associated with this agent (for example: mys3bucket). This will ensure any path mappings are correctly linked between the filesystem and the agent.
- --default-fs-override: Enter an override for the default filesystem URI instead of a filesystem name (for example: s3a://mybucket/).
).
Glue credential parameters
- --credentials-provider: The AWS catalog credentials provider factory class.
  - If you don't enter this parameter, the default is DefaultAWSCredentialsProviderChain.
  - If you enter the --access-key and --secret-key parameters, the credentials provider will automatically default to StaticCredentialsProviderFactory.
- --access-key: The AWS access key.
- --secret-key: The AWS secret key.
Glue optional parameters
- --glue-catalog-id: The AWS account ID to access the Data Catalog. This is used if the Data Catalog is owned by a different account to the one provided by the credentials provider and cross-account access has been granted.
- --glue-max-retries: The maximum number of retries the Glue client will perform after an error.
- --glue-max-connections: The maximum number of parallel connections the Glue client will allocate.
- --glue-max-socket-timeout: The maximum time the Glue client will allow for an established connection to timeout.
- --glue-connection-timeout: The maximum time the Glue client will allow to establish a connection.
Parameters for remote Hive agents only
- --host: The host where the remote Hive agent will be deployed.
- --port: The port for the remote Hive agent to use on the remote host. This port is used to communicate with the local Data Migrator server.
- --no-ssl: TLS encryption and certificate authentication are enabled by default between Data Migrator and the remote agent. Use this parameter to disable them. You can't adjust this parameter after agent creation.
Parameters for TLS/SSL only
- --certificate-storage-type: The certificate storage type. Can be specified as either FILE or KEYSTORE.
- --keystore-certificate-alias: The alias of the certificate stored in the keystore.
- --keystore-password: The password assigned to the target keystore.
- --keystore-path: The path to the target-side keystore file.
- --keystore-trusted-certificate-alias: The alias of the trusted certificate chain stored in the keystore.
- --keystore-type: The type of keystore specified, JKS or PKCS12.
Steps for remote agent deployment
Follow these steps to deploy a remote Hive agent for AWS Glue:

1. Transfer the remote server installer to your remote host (Amazon EC2 instance). Example of secure transfer from local to remote host:
   scp /opt/wandisco/hivemigrator/hivemigrator-remote-server-installer.sh myRemoteHost:~
2. On your remote host, run the installer as root (or sudo) user in silent mode:
   ./hivemigrator-remote-server-installer.sh -- --silent
3. On your remote host, start the remote server service:
   service hivemigrator-remote-server start
4. On your local host, run the hive agent add glue command to configure your remote Hive agent. See the remote AWS Glue agent example below for further guidance.
Examples
hive agent add glue --name glueAgent --access-key ACCESS6HCFPAQIVZTKEY --secret-key SECRET1vTMuqKOIuhET0HAI78UIPfSRjcswTKEY --glue-endpoint glue.eu-west-1.amazonaws.com --aws-region eu-west-1 --file-system-id mys3bucket
hive agent add glue --name glueAgent --access-key ACCESS6HCFPAQIVZTKEY --secret-key SECRET1vTMuqKOIuhET0HAI78UIPfSRjcswTKEY --glue-endpoint glue.eu-west-1.amazonaws.com --aws-region eu-west-1 --file-system-id mys3bucket --host myRemoteHost.example.com --port 5052
hive agent add hive
Add a Hive agent to connect to a local or remote Apache Hive Metastore using the hive agent add hive
command.
When connecting to a remote Apache Hive Metastore, enter a host on the remote cluster that will be used to communicate with the local Data Migrator server (constrained to a user-defined port).
A small service will be deployed on this remote host so that the data can transfer between the Hive agent and the remote Metastore.
hive agent add hive
[--autodeploy]
[--certificate-storage-type] string
[--config-files] string
[--config-path] string
[--default-fs-override] string
[--file-system-id] string
[--force-scanning-mode]
[--host] string
[--ignore-host-checking]
[--jdbc-driver-name] string
[--jdbc-password] string
[--jdbc-url] string
[--jdbc-username] string
[--kerberos-keytab] string
[--kerberos-principal] string
[--keystore-certificate-alias] string
[--keystore-password] string
[--keystore-path] string
[--keystore-trusted-certificate-alias] string
[--keystore-type] string
[--name] string
[--no-ssl]
[--port] integer
[--ssh-key] file
[--ssh-port] int
[--ssh-user] string
[--use-sudo]
Mandatory parameters
--name
The ID to give to the new Hive agent.
Additionally, use only one of the following parameters:
- `--file-system-id` - The name of the filesystem that will be associated with this agent (for example: `myhdfs`). This ensures any path mappings are correctly linked between the filesystem and the agent.
- `--default-fs-override` - An override for the default filesystem URI instead of a filesystem name (for example: `hdfs://nameservice01`).
Optional parameters
- `--kerberos-principal` - Not required if Kerberos is disabled. The Kerberos principal to use to access the Hive service (for example: `hive/myhost.example.com@REALM.COM`).
- `--kerberos-keytab` - Not required if Kerberos is disabled. The path to the Kerberos keytab containing the principal to access the Hive service (for example: `/etc/security/keytabs/hive.service.keytab`).
- `--config-path` - For a local agent for a target metastore, or when the Hive configuration isn't located in /etc/hive/conf, supply a path containing the hive-site.xml, core-site.xml, and hdfs-site.xml.
- `--config-files` - If the configuration files aren't located on the same path, use this parameter to enter all the paths as a comma-delimited list. For example, `/path1/core-site.xml,/path2/hive-site.xml,/path3/hdfs-site.xml`.
When configuring a CDP target
- `--jdbc-url` - The JDBC URL for the database.
- `--jdbc-driver-name` - The full class name of the JDBC driver.
- `--jdbc-username` - The username for connecting to the database.
- `--jdbc-password` - The password for connecting to the database.
Don't use the optional parameters `--config-path` and `--config-files` in the same add command. Use `--config-path` when the configuration files are on the same path, or `--config-files` when they are on separate paths.
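For example, a hypothetical agent definition that supplies configuration files from separate paths with `--config-files` (the paths shown are placeholders):
hive agent add hive --name sourceAgent --file-system-id mysourcehdfs --config-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml,/etc/hive/conf/hive-site.xml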
Parameters for remote Hive agents only
- `--host` - The host where the remote Hive agent will be deployed.
- `--port` - The port for the remote Hive agent to use on the remote host. This port is used to communicate with the local Data Migrator server.
- `--no-ssl` - TLS encryption and certificate authentication is enabled by default between Data Migrator and the remote agent. Use this parameter to disable it. You can't adjust this parameter after agent creation.
Parameters for TLS/SSL only
- `--certificate-storage-type` - The certificate storage type, specified as either `FILE` or `KEYSTORE`.
- `--keystore-certificate-alias` - The alias of the certificate stored in the keystore.
- `--keystore-password` - The password assigned to the target keystore.
- `--keystore-path` - The path to the target-side keystore file.
- `--keystore-trusted-certificate-alias` - The alias of the trusted certificate chain stored in the keystore.
- `--keystore-type` - The type of keystore specified, `JKS` or `PKCS12`.
Parameters for automated deployment
Use the following parameters when deploying a remote agent automatically with the `--autodeploy` flag.
- `--autodeploy` - The remote agent will be automatically deployed when this flag is used. If using this, the `--ssh-key` parameter must also be specified.
- `--ssh-user` - The SSH user to use for authentication on the remote host to perform automatic deployment (when using the `--autodeploy` parameter).
- `--ssh-key` - The absolute path to the SSH private key to use for authentication on the remote host to perform automatic deployment (when using the `--autodeploy` parameter).
- `--ssh-port` - The SSH port to use for authentication on the remote host to perform automatic deployment (when using the `--autodeploy` parameter). The default is port `22`.
- `--use-sudo` - All commands performed by the SSH user will use `sudo` on the remote host when performing automatic deployment (when using the `--autodeploy` parameter).
- `--ignore-host-checking` - Ignore strict host key checking when performing the automatic deployment (when using the `--autodeploy` parameter).
Steps for manual remote agent deployment
If you do not wish to use the --autodeploy
function, follow these steps to deploy a remote Hive agent for Apache Hive manually:
Transfer the remote server installer to your remote host:
Example of secure transfer from local to remote host:
scp /opt/wandisco/hivemigrator/hivemigrator-remote-server-installer.sh myRemoteHost:~
On your remote host, make the installer script executable:
chmod +x hivemigrator-remote-server-installer.sh
On your remote host, run the installer as root (or sudo) user in silent mode:
./hivemigrator-remote-server-installer.sh -- --silent
On your remote host, start the remote server service:
service hivemigrator-remote-server start
On your local host, run the `hive agent add hive` command without using `--autodeploy` and its related parameters to configure your remote Hive agent. See the manual remote Apache Hive deployment example below for further guidance.
hive agent add hive --name sourceAgent --kerberos-keytab /etc/security/keytabs/hive.service.keytab --kerberos-principal hive/_HOST@LOCALREALM.COM --file-system-id mysourcehdfs
hive agent add hive --name targetautoAgent --autodeploy --ssh-user root --ssh-key /root/.ssh/id_rsa --ssh-port 22 --host myRemoteHost.example.com --port 5052 --kerberos-keytab /etc/security/keytabs/hive.service.keytab --kerberos-principal hive/_HOST@REMOTEREALM.COM --config-path <example directory path> --file-system-id mytargethdfs
Replace <example directory path> with the path to a directory containing the core-site.xml, hdfs-site.xml, and hive-site.xml.
hive agent add hive --name targetmanualAgent --host myRemoteHost.example.com --port 5052 --kerberos-keytab /etc/security/keytabs/hive.service.keytab --kerberos-principal hive/_HOST@REMOTEREALM.COM --config-path <example directory path> --file-system-id mytargethdfs
Replace <example directory path> with the path to a directory containing the core-site.xml, hdfs-site.xml, and hive-site.xml.
If you enter Kerberos and configuration path information for remote agents, ensure that the directories and Kerberos principal are correct for your chosen remote host (not your local host).
When deploying remote agents with JDBC overrides, install the additional JDBC driver (for example, MySQL or PostgreSQL) within /opt/wandisco/hivemigrator-remote-server/agent/hive/.
When deploying remote agents with keystore details, your keystore password will need to be manually entered within /etc/wandisco/hivemigrator-remote-server/agent.yaml.
See the troubleshooting guide for more information.
hive agent add databricks
Databricks agents are currently available as a preview feature.
The source table must be in one of the following formats to ensure a successful migration to Databricks Delta Lake:
- CSV
- JSON
- Avro
- ORC
- Parquet
- Text
Add a Databricks Hive agent to connect to a Databricks Delta Lake metastore (AWS, Azure, or Google Cloud Platform (GCP)) using the hive agent add databricks
command.
hive agent add databricks [--name] string
[--jdbc-server-hostname] string
[--jdbc-port] int
[--jdbc-http-path] string
[--access-token] string
[--fs-mount-point] string
[--convert-to-delta]
[--delete-after-conversion]
[--file-system-id] string
[--default-fs-override] string
[--host] string
[--port] integer
[--no-ssl]
[--catalog] string
Enable JDBC connections to Databricks
The following steps are required to enable Java Database Connectivity (JDBC) to Databricks Delta Lake:
Download the Databricks JDBC driver.
Unzip the package and upload the SparkJDBC42.jar file to the LiveData Migrator host machine.
Move the SparkJDBC42.jar file to the LiveData Migrator directory below:
/opt/wandisco/hivemigrator/agent/databricks
Change ownership of the jar file to the HiveMigrator system user and group:
Example for hive:hadoop:
chown hive:hadoop /opt/wandisco/hivemigrator/agent/databricks/SparkJDBC42.jar
Databricks mandatory parameters
- `--name` - The ID to give to the new Hive agent.
- `--jdbc-server-hostname` - The server hostname for the Databricks cluster (AWS, Azure, or GCP).
- `--jdbc-port` - The port used for JDBC connections to the Databricks cluster (AWS, Azure, or GCP).
- `--jdbc-http-path` - The HTTP path for the Databricks cluster (AWS, Azure, or GCP).
- `--access-token` - The personal access token to be used for the Databricks cluster (AWS, Azure, or GCP).
Additionally, use only one of the following parameters:
If the `--convert-to-delta` option is used, the `--default-fs-override` parameter must also be provided with the value set to `dbfs:`, or a path inside the Databricks filesystem. For example, `dbfs:/mount/externalStorage`.
- `--file-system-id` - The name of the filesystem that will be associated with this agent (for example: `myadls2` or `mys3bucket`). This ensures any path mappings are correctly linked between the filesystem and the agent.
- `--default-fs-override` - An override for the default filesystem URI instead of a filesystem name (for example: `dbfs:`).
Databricks optional parameters
- `--fs-mount-point` - Define the ADLS/S3/GCP location in the Databricks filesystem for containing migrations (for example: `/mnt/mybucketname`). This parameter is required if `--convert-to-delta` is used. The Databricks agent will copy all associated table data and metadata into this location within the Databricks filesystem during conversion.
- `--convert-to-delta` - All underlying table data and metadata is migrated to the filesystem location defined by the `--fs-mount-point` parameter. Use this option to automatically copy the associated data and metadata to Delta Lake on Databricks (AWS, Azure, or GCP), and convert tables to Delta Lake format.
  The following parameter can only be used if `--convert-to-delta` has been specified:
  - `--delete-after-conversion` - Use this option to delete the underlying table data and metadata from the filesystem location (defined by `--fs-mount-point`) once it has been converted to Delta Lake on Databricks.
  Only use this option if you're performing one-time migrations for the underlying table data. The Databricks agent doesn't support continuous (live) updates of table data when transferring to Delta Lake on Databricks.
  If a migration to Databricks runs without the `--convert-to-delta` option, some migrated data may not be visible from the Databricks side. To avoid this issue, ensure that the value of `--default-fs-override` is set to `dbfs:` with the value of `--fs-mount-point`. Example: `--default-fs-override dbfs:/mnt/mybucketname`
- `--catalog` - Enter the name of your Databricks Unity Catalog. You can't update an agent's Unity Catalog while it's in an active migration.
Example
hive agent add databricks --name databricksAgent --jdbc-server-hostname mydbcluster.cloud.databricks.com --jdbc-port 443 --jdbc-http-path sql/protocolv1/o/8445611123456789/0234-125567-testy978 --access-token daexamplefg123456789t6f0b57dfdtoken4 --file-system-id mys3bucket --default-fs-override dbfs:/mnt/mybucketname --fs-mount-point /mnt/mybucket --convert-to-delta --catalog myUnityCatalog
hive agent add dataproc
Add a Hive agent to connect to a local or remote Google Dataproc Metastore using the hive agent add dataproc
command.
When connecting to a remote Dataproc Metastore, enter a host on the remote cluster that will be used to communicate with the local Data Migrator server (constrained to a user-defined port).
A small service will be deployed on this remote host so that the data can transfer between the Hive agent and the remote Metastore.
hive agent add dataproc [--config-path] string
[--kerberos-principal] string
[--kerberos-keytab] string
[--name] string
[--host] string
[--port] integer
[--no-ssl]
[--autodeploy]
[--ssh-user] string
[--ssh-key] file
[--ssh-port] int
[--use-sudo]
[--ignore-host-checking]
[--keystore-certificate-alias] string
[--keystore-password] string
[--keystore-path] string
[--keystore-trusted-certificate-alias] string
[--keystore-type] string
[--file-system-id] string
[--default-fs-override] string
[--certificate-storage-type] string
Mandatory parameters
- `--kerberos-principal` - Not required if Kerberos is disabled. The Kerberos principal to use to access the Hive service (for example: `hive/myhost.example.com@REALM.COM`).
- `--kerberos-keytab` - Not required if Kerberos is disabled. The path to the Kerberos keytab containing the principal to access the Hive service (for example: `/etc/security/keytabs/hive.service.keytab`).
- `--name` - The ID to give to the new Hive agent.
Additionally, use only one of the following parameters:
- `--file-system-id` - The name of the filesystem that will be associated with this agent (for example: `myhdfs`). This ensures any path mappings are correctly linked between the filesystem and the agent.
Optional parameters
- `--default-fs-override` - An override for the default filesystem URI instead of a filesystem name (for example: `hdfs://nameservice01`).
- `--config-path` - The path to the directory containing the Hive configuration files `core-site.xml`, `hive-site.xml`, and `hdfs-site.xml`. If not specified, Data Migrator will use the default location for the cluster distribution.
Parameters for remote Hive agents only
- `--host` - The host where the remote Hive agent will be deployed.
- `--port` - The port for the remote Hive agent to use on the remote host. This port is used to communicate with the local Data Migrator server.
- `--no-ssl` - TLS encryption and certificate authentication is enabled by default between Data Migrator and the remote agent. Use this parameter to disable it. You can't adjust this parameter after agent creation.
Parameters for TLS/SSL only
- `--certificate-storage-type` - The certificate storage type, specified as either `FILE` or `KEYSTORE`.
- `--keystore-certificate-alias` - The alias of the certificate stored in the keystore.
- `--keystore-password` - The password assigned to the target keystore.
- `--keystore-path` - The path to the target-side keystore file.
- `--keystore-trusted-certificate-alias` - The alias of the trusted certificate chain stored in the keystore.
- `--keystore-type` - The type of keystore specified, `JKS` or `PKCS12`.
Parameters for automated deployment
- `--autodeploy` - The remote agent will be automatically deployed when this flag is used. If using this, the `--ssh-key` parameter must also be specified.
- `--ssh-user` - The SSH user to use for authentication on the remote host to perform automatic deployment (when using the `--autodeploy` parameter).
- `--ssh-key` - The absolute path to the SSH private key to use for authentication on the remote host to perform automatic deployment (when using the `--autodeploy` parameter).
- `--ssh-port` - The SSH port to use for authentication on the remote host to perform automatic deployment (when using the `--autodeploy` parameter). The default is port `22`.
- `--use-sudo` - All commands performed by the SSH user will use `sudo` on the remote host when performing automatic deployment (when using the `--autodeploy` parameter).
- `--ignore-host-checking` - Ignore strict host key checking when performing the automatic deployment (when using the `--autodeploy` parameter).
Steps for manual deployment
If you do not wish to use the --autodeploy function, follow these steps to deploy a remote Hive agent for Dataproc manually:
Transfer the remote server installer to your remote host:
Example of secure transfer from local to remote host:
scp /opt/wandisco/hivemigrator/hivemigrator-remote-server-installer.sh myRemoteHost:~
On your remote host, make the installer script executable:
chmod +x hivemigrator-remote-server-installer.sh
On your remote host, run the installer as root (or sudo) user in silent mode:
./hivemigrator-remote-server-installer.sh -- --silent
On your remote host, start the remote server service:
service hivemigrator-remote-server start
On your local host, run the `hive agent add dataproc` command without using `--autodeploy` and its related parameters to configure your remote Hive agent. See the manual deployment example below for further guidance.
Examples
hive agent add dataproc --name sourceAgent --file-system-id mysourcehdfs
hive agent add dataproc --name targetautoAgent --autodeploy --ssh-user root --ssh-key /root/.ssh/id_rsa --ssh-port 22 --host myRemoteHost.example.com --port 5052 --config-path <example directory path> --file-system-id mytargethdfs
Replace <example directory path> with the path to a directory containing the core-site.xml, hdfs-site.xml, and hive-site.xml.
hive agent add dataproc --name targetmanualAgent --host myRemoteHost.example.com --port 5052 --config-path <example directory path> --file-system-id mytargethdfs
Replace <example directory path> with the path to a directory containing the core-site.xml, hdfs-site.xml, and hive-site.xml.
If you enter Kerberos and configuration path information for remote agents, ensure that the directories and Kerberos principal are correct for your chosen remote host (not your local host).
hive agent add snowflake basic
Add an agent using basic authentication.
hive agent add snowflake basic [--account-identifier] string
[--file-system-id] string
[--name] string
[--password] string
[--stage] string
[--stage-schema] string
[--warehouse] string
[--default-fs-override] string
[--schema] string
[--stage-database] string
[--user] string
[--network-timeout] int
[--query-timeout] int
[--role] string
Mandatory parameters
- `--account-identifier` - The unique ID for your Snowflake account.
- `--name` - A name that will be used to reference the remote agent.
- `--warehouse` - The Snowflake-based cluster of compute resources.
- `--stage` - Storage used to temporarily store data that is being moved into Snowflake. A stage can be internal (to Snowflake) or external, using a cloud storage service from Amazon S3, Azure, or Google Cloud.
- `--user` - Your Snowflake username.
Additionally, use only one of the following parameters:
- `--file-system-id` - The ID of the target filesystem.
- `--default-fs-override` - An override for the default filesystem URI instead of a filesystem name.
Optional parameters
- `--stage-database` - An optional parameter for a Snowflake stage database, with the default value "WANDISCO".
- `--stage-schema` - An optional parameter for a Snowflake stage schema, with the default value "PUBLIC".
- `--schema` - An optional parameter for a Snowflake schema, with the default value "PUBLIC".
- `--role` - A custom role for the JDBC connection used by Hive Migrator.
Timeout parameters
- `--network-timeout` - The number of milliseconds to wait for a response when interacting with the Snowflake service before returning an error.
- `--query-timeout` - The number of seconds to wait for a query to complete before returning an error.
Examples
hive agent add snowflake basic --account-identifier test_adls2 --name snowflakeAgent --stage myAzure --user exampleUser --password examplePassword --warehouse DemoWH2
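A second, hypothetical example that also sets the target filesystem, the optional schema and role, and the timeout parameters (all values are placeholders):
hive agent add snowflake basic --account-identifier myaccountid --name snowflakeAgent --stage myAzure --user exampleUser --password examplePassword --warehouse DemoWH2 --file-system-id myadls2 --schema PUBLIC --role HVM_ROLE --network-timeout 60000 --query-timeout 300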
hive agent add snowflake privatekey
Add an agent using private key authentication.
hive agent add snowflake privatekey [--account-identifier] string
[--file-system-id] string
[--private-key-file] string
[--private-key-file-pwd] string
[--schema] string
[--stage-database] string
[--warehouse] string
[--default-fs-override] string
[--name] string
[--stage] string
[--stage-schema] string
[--user] string
Mandatory parameters
- `--account-identifier` - The unique ID for your Snowflake account.
- `--private-key-file` - The path to your private key file.
- `--private-key-file-pwd` - The password that corresponds with the above private key file.
- `--name` - A name that will be used to reference the remote agent.
- `--warehouse` - The Snowflake-based cluster of compute resources.
- `--stage` - Storage used to temporarily store data that is being moved into Snowflake. A stage can be internal (to Snowflake) or external, using a cloud storage service from Amazon S3, Azure, or Google Cloud.
- `--user` - Your Snowflake username.
Additionally, use only one of the following parameters:
- `--file-system-id` - The ID of the target filesystem.
- `--default-fs-override` - An override for the default filesystem URI instead of a filesystem name.
Optional parameters
- `--stage-database` - An optional parameter for a Snowflake stage database, with the default value "WANDISCO".
- `--stage-schema` - An optional parameter for a Snowflake stage schema, with the default value "PUBLIC".
- `--schema` - An optional parameter for a Snowflake schema, with the default value "PUBLIC".
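Example
A hypothetical example of private key authentication (the key file path, password, and other values are placeholders):
hive agent add snowflake privatekey --account-identifier myaccountid --name snowflakeKeyAgent --private-key-file /path/to/rsa_key.p8 --private-key-file-pwd exampleKeyPassword --user exampleUser --warehouse DemoWH2 --stage myAzure --file-system-id myadls2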
hive agent check
Check the configuration of an existing Hive agent using hive agent check
.
hive agent check [--name] string
Example
hive agent check --name azureAgent
hive agent configure azure
Change the configuration of an existing Azure Hive agent using hive agent configure azure
.
The parameters that can be changed are the same as the ones listed in the hive agent add azure
section.
All parameters are optional except --name, which is required to identify the existing Hive agent that you wish to configure.
Example
hive agent configure azure --name azureAgent --database-password CorrectPassword
hive agent configure filesystem
Change the configuration of an existing filesystem Hive agent using hive agent configure filesystem
.
The parameters that can be changed are the same as the ones listed in the hive agent add filesystem
section.
All parameters are optional except --name, which is required to identify the existing Hive agent that you wish to configure.
Example
hive agent configure filesystem --name fsAgent --root-folder /user/dbuser/databases
hive agent configure glue
Change the configuration of an existing AWS Glue Hive agent using hive agent configure glue
.
The parameters that can be changed are the same as the ones listed in the hive agent add glue
section.
All parameters are optional except --name, which is required to identify the existing Hive agent that you wish to configure.
Example
hive agent configure glue --name glueAgent --aws-region us-east-2
hive agent configure hive
Change the configuration of an existing Apache Hive agent using hive agent configure hive
.
The parameters that can be changed are the same as the ones listed in the hive agent add hive
section.
All parameters are optional except --name, which is required to identify the existing Hive agent that you wish to configure.
Example
hive agent configure hive --name sourceAgent --kerberos-keytab /opt/keytabs/hive.keytab --kerberos-principal hive/myhostname.example.com@REALM.COM
hive agent configure databricks
Change the configuration of an existing Databricks agent using hive agent configure databricks
.
The parameters that can be changed are the same as the ones listed in the hive agent add databricks
section.
All parameters are optional except --name, which is required to identify the existing Hive agent that you wish to configure.
You can't update an agent's Unity Catalog while it's in an active migration.
Example
hive agent configure databricks --name databricksAgent --access-token myexamplefg123456789t6fnew7dfdtoken4
hive agent configure dataproc
Change the configuration of an existing Dataproc agent using hive agent configure dataproc
.
The parameters that can be changed are the same as the ones listed in the hive agent add dataproc
section.
All parameters are optional except --name, which is required to identify the existing Hive agent that you wish to configure.
Example
hive agent configure dataproc --name dataprocAgent --port 9099
hive agent configure snowflake
Configure an existing Snowflake remote agent by using the hive agent configure snowflake
command.
hive agent configure snowflake basic [--account-identifier] string
[--file-system-id] string
[--user] string
[--password] string
[--stage] string
[--stage-schema] string
[--warehouse] string
[--default-fs-override] string
[--name] string
[--schema] string
[--stage-database] string
Example Snowflake remote agent configuration
hive agent configure snowflake basic --user snowflakeAgent --password <password-here> --stage internal
hive agent configure snowflake privatekey [--account-identifier] string
[--file-system-id] string
[--private-key-file] string
[--private-key-file-pwd] string
[--schema] string
[--stage-database] string
[--warehouse] string
[--default-fs-override] string
[--name] string
[--stage] string
[--stage-schema] string
Example Snowflake remote agent configuration
hive agent configure snowflake privatekey --private-key-file-pwd <password> --private-key-file /path/to/keyfiles/ --user snowflakeAgent --schema star-schema
hive agent delete
Delete the specified Hive agent with hive agent delete
.
hive agent delete [--name] string
Example
hive agent delete --name azureAgent
hive agent list
List configured Hive agents with hive agent list
.
hive agent list [--detailed]
Example
hive agent list --detailed
hive agent show
Show the configuration of a Hive agent with hive agent show
.
hive agent show [--name] string
Example
hive agent show --name azureAgent
hive agent types
Print a list of supported Hive agent types with hive agent types
.
hive agent types
Example
hive agent types
Exclusion commands
exclusion add date
Create a date-based exclusion that checks the 'modified date' of any directory or file that Data Migrator encounters during a migration to which the exclusion has been applied. If the path or file being examined by Data Migrator has a 'modified date' earlier than the specified date, it will be excluded from the migration.
Once associated with a migration using migration exclusion add
, files that match the policy will not be migrated.
exclusion add date [--exclusion-id] string
[--description] string
[--before-date] string
Mandatory parameters
- `--exclusion-id` - The ID for the exclusion policy.
- `--description` - A user-friendly description for the policy.
- `--before-date` - An ISO-formatted date and time, which can include an offset for a particular time zone.
Example
exclusion add date --exclusion-id beforeDate --description "Files earlier than 2020-10-01T10:00:00PDT" --before-date 2020-10-01T10:00:00-07:00
exclusion add file-size
Create an exclusion that can be applied to migrations to constrain the files transferred by a policy based on file size. Once associated with a migration using migration exclusion add
, files that match the policy will not be migrated.
exclusion add file-size [--exclusion-id] string
[--description] string
[--value] long
[--unit] string
Mandatory parameters
- `--exclusion-id` - The ID for the exclusion policy.
- `--description` - A user-friendly description for the policy.
- `--value` - The numerical value for the file size, in a unit defined by the `--unit` parameter.
- `--unit` - A string to define the unit used. You can use `B` for bytes, `GB` for gigabytes, `KB` for kilobytes, `MB` for megabytes, `PB` for petabytes, `TB` for terabytes, `GiB` for gibibytes, `KiB` for kibibytes, `MiB` for mebibytes, `PiB` for pebibytes, or `TiB` for tebibytes when creating exclusions with the CLI.
Example
exclusion add file-size --exclusion-id 100mbfiles --description "Files greater than 100 MB" --value 100 --unit MB
exclusion add regex
Create an exclusion using a regular expression to prevent certain files and directories being transferred based on matching file or directory names. Once associated with a migration using migration exclusion add
, files and directories that match the regular expression will not be migrated.
exclusion add regex [--exclusion-id] string
[--description] string
[--regex] string
[--type] string
Mandatory parameters
- `--exclusion-id` - The ID for the exclusion policy.
- `--description` - A user-friendly description for the policy.
- `--regex` - A regular expression in either Java PCRE, Automata, or GLOB syntax.
Optional parameters
- `--type` - Choose the regular expression syntax type. There are three options available: `JAVA_PCRE` (default), `AUTOMATA`, and `GLOB`.
Examples
exclusion add regex --description "No paths or files that start with test" --exclusion-id exclusion1 --type GLOB --regex test*
exclusion add regex --description "No paths of files that start with test" --exclusion-id exclusion1 --regex ^test\.*
Using backslash characters within the --regex parameter
If you wish to use a \
character as part of your regex value, you must escape this character with an additional backslash.
exclusion add regex --description "No paths that start with a backslash followed by test" --exclusion-id exclusion2 --regex ^\\test\.*
The response displayed if running through the CLI will not hide the additional backslash. However, the internal representation will be as expected within Data Migrator (it will read as ^\test.*
).
This workaround isn't required for API inputs, as it only affects the Spring Shell implementation used for the CLI.
exclusion add age
Files younger than the specified age at the time of scanning are excluded; files must be this age or older to be migrated.
Create an age-based exclusion that checks the 'modified date' of any file that Data Migrator encounters during a migration to which the exclusion has been applied. At scan time, the age of a file is determined by the difference between the current scan time and the file's modification time. If the file examined is younger than the age specified, it will be excluded from the migration.
Once associated with a migration using migration exclusion add
, files that match the policy will not be migrated.
exclusion add age [--exclusion-id] string
[--description] string
[--unit] string
[--value] long
Mandatory parameters
- `--exclusion-id` - The ID for the exclusion policy.
- `--description` - A user-friendly description for the policy.
- `--unit` - The time unit of the value supplied: use DAYS, HOURS, or MINUTES.
- `--value` - The number of units.
Example
exclusion add age --exclusion-id ExcludeLessThan10d --description "Exclude files changed in the last 10 days" --unit DAYS --value 10
exclusion check regex
Check if a given GLOB, JAVA_PCRE or AUTOMATA regex pattern will match a given path.
Mandatory parameters
- `--regex` - The regex pattern to be checked.
- `--type` - The regex pattern type: GLOB, JAVA_PCRE, or AUTOMATA.
- `--path` - The path being checked.
Example
exclusion check regex --path /data/1 --regex [1-4] --type JAVA_PCRE
exclusion delete
Delete an exclusion policy so that it is no longer available for migrations.
exclusion delete [--exclusion-id] string
Mandatory parameters
--exclusion-id
The ID for the exclusion policy to delete.
Example
exclusion delete --exclusion-id exclusion1
exclusion list
List all exclusion policies defined.
exclusion list
exclusion show
Get details for an individual exclusion policy by ID.
exclusion show [--exclusion-id] string
Mandatory parameters
--exclusion-id
The ID for the exclusion policy to show.
Example
exclusion show --exclusion-id 100mbfiles
exclusion source list
See the restrictions applied automatically on the source file system.
Mandatory parameters
- `--fs-type` - The file system type: choose one of adls2, gcs, hdfs, local, or s3a.
Example
exclusion source list --fs-type hdfs
exclusion target list
See the restrictions applied automatically on the target file system.
Mandatory parameters
- `--fs-type` - The file system type: choose one of adls2, gcs, hdfs, local, or s3a.
Example
exclusion target list --fs-type adls2
exclusion user-defined list
See a list of all user-defined restrictions.
Example
exclusion user-defined list
Migration commands
migration add
migration add [--name or --migration-id] string
[--path] string
[--target] string
[--exclusions] string
[--priority] string
[--action-policy] string
[--auto-start]
[--source] string
[--scan-only]
[--target-match]
[--verbose]
[--detailed]
[--recurring-migration]
[--recurring-period]
Do not write to target filesystem paths when a migration is underway. This could interfere with Data Migrator functionality and lead to undetermined behavior.
Use different filesystem paths when writing to the target filesystem directly (and not through Data Migrator).
Mandatory parameters
- `--path` - Defines the source filesystem directory that is the scope of the migration. All content (other than that excluded) will be migrated to the target.
  ADLS Gen2 has a filesystem restriction of 60 segments. Make sure your path has less than 60 segments when defining the path string parameter.
- `--target` - Specifies the name of the target filesystem resource to which migration will occur.
Optional parameters
- `--name` or `--migration-id` - Enter a name or ID for the new migration. An ID is auto-generated if you don't enter one.
- `--exclusions` - A comma-separated list of exclusions by name.
- `--auto-start` - Enter this parameter if you want the migration to start immediately. If you don't, the migration will only take effect once you start to run it.
- `--priority` - Enter this parameter with a value of `high`, `normal`, or `low` to assign a priority to your migration. Higher-priority migrations are processed first. If not specified, migration priority defaults to `normal`.
- `--action-policy` - This parameter determines what happens if the migration encounters content in the target path with the same name and size. There are two options available:
  - `com.wandisco.livemigrator2.migration.OverwriteActionPolicy` (default policy) - Every file is replaced, even if the file size is identical on the target storage. This option is incompatible with the `--recurring-migration` option; use `SkipIfSizeMatchActionPolicy` instead.
  - `com.wandisco.livemigrator2.migration.SkipIfSizeMatchActionPolicy` - If the file size is identical between the source and target, the file is skipped. If it's a different size, the whole file is replaced.
- `--source` - Specifies the name of the source filesystem.
- `--scan-only` - Enter this parameter to create a one-time migration.
- `--target-match` - Enable Target Match on this migration to scan the source and target and remove extra files from the target. If not present, Target Match is disabled except for live migrations on ADLS Gen2. See Target Match for more information.
- `--verbose` - Enter this parameter to add additional information to the output for the migration.
- `--detailed` - Alternative name for `--verbose`.
- `--recurring-migration` - Add this parameter to enable periodic rescanning of the migration. See Recurring migration.
- `--recurring-period` - Enter a period to schedule the time between migration scan iterations. For example, 12H (hours) or 30M (minutes).
Example
migration add --path /repl1 --target mytarget --migration-id myNewMigration --exclusions 100mbfiles
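A second, hypothetical example that creates a recurring, high-priority migration, skips files already on the target with a matching size, and starts immediately (names and values are placeholders):
migration add --path /repl2 --target mytarget --migration-id myRecurringMigration --source mysource --recurring-migration --recurring-period 12H --action-policy com.wandisco.livemigrator2.migration.SkipIfSizeMatchActionPolicy --priority high --auto-start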
migration delete
Delete a stopped migration.
migration delete [--name or --migration-id] string
Mandatory parameters
- `--name` or `--migration-id` - The name or ID of the migration to delete.
Example
migration delete --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e
Optional parameters
- `--without-assets` - Leave associated assets in place.
Example
migration delete --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e --without-assets
migration exclusion add
Associate an exclusion resource with a migration so that the exclusion policy applies to items processed for the migration. Exclusions must be associated with a migration before they take effect.
migration exclusion add [--name or --migration-id] string
[--exclusion-id] string
Mandatory parameters
- `--name` or `--migration-id` - The migration name or ID with which to associate the exclusion.
- `--exclusion-id` - The ID of the exclusion to associate with the migration.
Example
migration exclusion add --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e --exclusion-id myexclusion1
migration exclusion delete
Remove an exclusion from association with a migration so that its policy no longer applies to items processed for the migration.
migration exclusion delete [--name or --migration-id] string
[--exclusion-id] string
Mandatory parameters
- `--name` or `--migration-id` - The migration name or ID from which to remove the exclusion.
- `--exclusion-id` - The ID of the exclusion to remove from the migration.
Example
migration exclusion delete --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e --exclusion-id myexclusion1
migration list
Present the list of all migrations defined.
migration list [--detailed or --verbose]
Optional parameters
- `--detailed` or `--verbose` - Returns additional information about each migration.
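Example
For example, to list all migrations with additional detail:
migration list --detailed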
migration path status
View all actions scheduled on a source filesystem in the specified path.
migration path status [--source-path] string
[--source] string
Mandatory parameters
- `--source-path` - The path on the filesystem to review actions for. Supply a full directory.
- `--source` - The filesystem ID of the source system the path is in.
Example
migration path status --source-path /root/mypath/ --source mySource
migration pending-region add
Add a path for rescanning to a migration.
migration pending-region add [--name or --migration-id] string
[--path] string
[--action-policy] string
Mandatory parameters
- `--name` or `--migration-id` - The migration name or ID.
- `--path` - The path string of the region to add for rescan.
Optional parameters
- `--action-policy` - This parameter determines what happens if the migration encounters content in the target path with the same name and size. There are two options available:
  - `com.wandisco.livemigrator2.migration.OverwriteActionPolicy` (default policy) - Every file is replaced, even if the file size is identical on the target storage.
  - `com.wandisco.livemigrator2.migration.SkipIfSizeMatchActionPolicy` - If the file size is identical between the source and target, the file is skipped. If it's a different size, the whole file is replaced.
Example
migration pending-region add --name myMigration --path etc/files --action-policy com.wandisco.livemigrator2.migration.SkipIfSizeMatchActionPolicy
migration reset
Reset a stopped migration to the state it was in before it was started. This deletes and replaces it with a new migration that has the same settings as the old one.
migration reset [--name or --migration-id] string
[--action-policy] string
[--reload-mappings]
[--detailed or --verbose]
Mandatory parameters
- `--name` or `--migration-id` - The name or ID of the migration you want to reset.
Optional parameters
- `--action-policy` - Accepts two string values:
  - `com.wandisco.livemigrator2.migration.OverwriteActionPolicy` causes the new migration to re-migrate all files from scratch, including those already migrated to the target filesystem, regardless of file size.
  - `com.wandisco.livemigrator2.migration.SkipIfSizeMatchActionPolicy` skips migrating files that exist on both the target and source, if the file size is consistent between them.
  Use tab auto-completion with this parameter to view both options and a short description of each.
- `--reload-mappings` - Resets the migration's path mapping configuration, using the newest default path mapping configuration for Data Migrator.
- `--detailed` or `--verbose` - Returns additional information about the reset migration, similarly to `migration show`.
Example
migration reset --name mymigration
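A second, hypothetical example that resets the migration with an action policy that skips files already present on the target with a matching size, and reloads the default path mappings:
migration reset --name mymigration --action-policy com.wandisco.livemigrator2.migration.SkipIfSizeMatchActionPolicy --reload-mappings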
migration resume
Resume a migration that you've stopped from transferring content to its target.
migration resume [--name or --migration-id] string
[--detailed or --verbose]
Mandatory parameters
- `--name` or `--migration-id` - The migration name or ID to resume.
Example
migration resume --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e
migration run, migration start
Start a migration that was created without the --auto-start
parameter.
migration run [--name or --migration-id] string
[--detailed or --verbose]
Mandatory parameters
- `--name` or `--migration-id` - The migration name or ID to run.
Optional parameters
- `--detailed` or `--verbose` - Outputs additional information about the migration.
Example
migration run --migration-id myNewMigration
migration show
Get a JSON description of a specific migration.
migration show [--name or --migration-id] string
[--detailed or --verbose]
Mandatory parameters
- `--name` or `--migration-id` - The migration name or ID to show.
Optional parameters
- `--detailed` or `--verbose` - Outputs additional information about the migration.
Example
migration show --name myNewMigration
migration stop
Stop a migration from transferring content to its target, placing it into the STOPPED
state. Stopped migrations can be resumed.
migration stop [--name or --migration-id] string
Mandatory parameters
- `--name` or `--migration-id` - The migration name or ID to stop.
Example
migration stop --name 4ffa620b6ebb0cd34f2c591220d93830f91ccc7e
migration stop all
Stop all migrations from transferring content to their targets, placing them into the STOPPED
state. Stopped migrations can be resumed.
migration stop all
migration update configuration
Update a migration's recurring scan period.
migration update configuration [--name or --migration-id] string
[--recurring-period] string
[--detailed or --verbose] string
Mandatory parameters
- `--name` or `--migration-id` - Enter the migration name or ID.
- `--recurring-period` - Enter a period to schedule the time between migration scan iterations. For example, `12H` (12 hours).
- `--detailed` or `--verbose` - Include all configuration properties for the source filesystem in the response.
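Example
For example, a hypothetical command that changes a migration's scan interval to 12 hours:
migration update configuration --migration-id myNewMigration --recurring-period 12H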
migration verification start
Trigger a new verification for a migration.
migration verification start [--name or --migration-id] string
[--depth] integer
[--date] string
[--paths] string
Mandatory parameters
- `--name` or `--migration-id` - Enter the migration name or ID.
- `--depth` - Enter a number to specify how deep in the directory you want to run the verification check. This number must be equal to or less than the total number of levels in the directory structure of your migration. The default value is zero, which means there's no limit to the verification depth.
- `--date` - Enter a date and time as a verification cutoff point in `YYYY-MM-DDTHH:MM` format.
- `--paths` - Enter a comma-separated list of paths to verify.
Example
migration verification start --migration-id myNewMigration --depth 0 --date 2022-11-15T16:24 --paths /MigrationPath
The verification status will show the number of missing paths and files on the target filesystem and the number of file size mismatches between the source and target. You can view the verification status using migration verification show
for individual verification jobs or migration verification list
for all verification jobs.
migration verification list
List summaries for all or specified verifications.
migration verification list [--name or --migration-id] string
[--states] string
Optional parameters
- `--name` or `--migration-id` - Enter the migration name or ID. If not specified, the default is to display summaries for all verifications. You can enter multiple migration names or IDs in a comma-separated list.
- `--states` - Enter the migration state(s) (`IN_PROGRESS`, `QUEUED`, `COMPLETED`, or `CANCELLED`) for which you want to list summaries.
Examples
migration verification list
migration verification list --migration-id myNewMigration --states IN_PROGRESS,QUEUED
migration verification show
Show the status of a specific migration verification.
migration verification show [--verification-id] string
Mandatory parameters
--verification-id
Show the status of the verification job for this verification ID (only one verification job can be running per migration).
Example
Cirata LiveData Migrator >> migration verification show --verification-id 91c79b1b-c61f-4c39-be61-18072ac3a086-1676979356465
{
"id": "91c79b1b-c61f-4c39-be61-18072ac3a086-1676979356465",
"migrationId": "ver1",
"migrationInternalId": "91c79b1b-c61f-4c39-be61-18072ac3a086",
"status": "COMPLETE",
"createdTimestamp": 1676979356467,
"startedTimestamp": 1676979356518,
"finishedTimestamp": 1676979356598,
"createdAt": "2023-02-21T11:35:56.467Z",
"startedAt": "2023-02-21T11:35:56.518Z",
"finishedAt": "2023-02-21T11:35:56.598Z",
"paths": [
"/DATA/d1"
],
"ignoreAfterTimestamp": 1676978431233,
"originalPaths": [
"/DATA/d1"
],
"verificationDepth": 0,
"filesOnSource": 1,
"directoriesOnSource": 0,
"bytesOnSource": 842,
"filesExcluded": 0,
"filesExcludedExistsOnTarget": 0,
"filesExcludedNotExistsOnTarget": 0,
"dataExcluded": 0,
"bytesExcluded": 0,
"bytesExcludedExistsOnTarget": 0,
"bytesExcludedNotExistsOnTarget": 0,
"directoriesExcluded": 0,
"directoriesExcludedExistsOnTarget": 0,
"directoriesExcludedNotExistsOnTarget": 0,
"filesOnTarget": 1,
"directoriesOnTarget": 0,
"bytesOnTarget": 842,
"filesMissingOnTarget": 0,
"directoriesMissingOnTarget": 0,
"filesMissingOnSource": 0,
"directoriesMissingOnSource": 0,
"fileSizeMismatches": 0,
"totalDiscrepancies": 0
}
migration verification stop
Stop a queued or in-progress migration verification.
migration verification stop [--verification-id] string
Mandatory parameters
- `--verification-id` - Enter the ID of the verification that has been started, for example `db257c03-697b-48a5-93cc-abc23838d37d-1668593022565`. You can find the verification ID in the output of the `migration verification list` command.
Example
migration verification stop --verification-id ab123c03-697b-48a5-93cc-abc23838d37d-1668593022565
migration verification report
Download a full verification report.
migration verification report [--verification-id] string
[--out-dir] string
Mandatory parameters
- `--verification-id` - Enter the ID of the verification for which you want to download a report. You can find the verification ID in the output of the `migration verification list` command.
- `--out-dir` - Enter your chosen folder for the report download.
Examples
migration verification report --verification-id ab123c03-697b-48a5-93cc-abc23838d37d-1668593022565 --out-dir /user/exampleVerificationDirectory
status
Get a text description of the overall status of migrations. Information is provided on the following:
- Total number of migrations defined.
- Average bandwidth being used over 10s, 60s, and 300s intervals.
- Peak bandwidth observed over 300s interval.
- Average file transfer rate per second over 10s, 60s, and 300s intervals.
- Peak file transfer rate per second over a 300s interval.
- List of migrations, including one-time migrations, with source path and migration id, and with current progress broken down by migration state: completed, live, stopped, running and ready.
status [--diagnostics]
[--migrations]
[--network]
[--transfers]
[--watch]
[--refresh-delay] int
[--full-screen]
Optional parameters
- `--diagnostics` - Returns additional information about your Data Migrator instance and its migrations, useful for troubleshooting.
- `--migrations` - Displays information about each running migration.
- `--network` - Displays file transfer throughput in Gib/s during the last 10 seconds, 1 minute, and 30 minutes.
- `--transfers` - Displays overall performance information about data transfers across the last 10-second, 1-minute, and 30-minute intervals.
- `--watch` - Auto-refresh the output.
- `--refresh-delay` - The auto-refresh interval (in seconds).
- `--full-screen` - Auto-refresh in full screen.
Examples
Cirata LiveMigrator >> status
Network (10s) (1m) (30m)
Average Throughput: 10.4 Gib/s 9.7 Gib/s 10.1 Gib/s
Average Files/s: 425 412 403
11 Migrations dd:hh:mm dd:hh:mm
Complete: 1 Transferred Excluded Duration
/static1 5a93d5 67.1 GiB 2.3 GiB 00:12:34
Live: 3 Transferred Excluded Duration
/repl1 9088aa 143.2 GiB 17.3 GiB 00:00:34
/repl_psm1 a4a7e6 423.6 TiB 9.6 GiB 02:05:29
/repl5 ab140d 118.9 GiB 1.2 GiB 00:00:34
Running: 5 Transferred Excluded Duration Remaining
/repl123 e3727c 30.3/45.2 GiB 67% 9.8 GiB 00:00:34 00:00:17
/repl2 88e4e7 26.2/32.4 GiB 81% 0.2 GiB 00:01:27 00:00:12
/repl3 372056 4.1/12.5 GiB 33% 1.1 GiB 00:00:25 00:01:05
/repl4 6bc813 10.6/81.7 TiB 8% 12.4 GiB 00:04:21 01:02:43
/replxyz dc33cb 2.5/41.1 GiB 6% 6.5 GiB 01:00:12 07:34:23
Ready: 2
/repl7 070910 543.2 GiB
/repltest d05ca0 7.3 GiB
Cirata LiveMigrator >> status --transfers
Files (10s) (1m) (30m)
Average Migrated/s: 362 158 4781
< 1 KB 14 27 3761
< 1 MB 151 82 0
< 1 GB 27 1 2
< 1 PB 0 0 0
< 1 EB 0 0 0
Peak Migrated/s: 505 161 8712
< 1 KB 125 48 7761
< 1 MB 251 95 4
< 1 GB 29 7 3
< 1 PB 0 0 0
< 1 EB 0 0 0
Average Scanned/s: 550 561 467
Average Rescanned/s: 24 45 56
Average Excluded/s: 7 7 6
Cirata LiveMigrator >> status --diagnostics
Uptime: 0 Days 1 Hours 23 Minutes 24 Seconds
SystemCpuLoad: 0.1433 ProcessCpuLoad: 0.0081
JVM GcCount: 192 GcPauseTime: 36 s (36328 ms)
OS Connections: 1, Tx: 0 B, Rx: 0 B, Retransmit: 0
Transfer Bytes (10/30/300s): 0.00 Gib/s, 0.00 Gib/s, 0.00 Gib/s
Transfer Files (10/30/300s): 0.00/s 0.00/s 0.00/s
Active Transfers/pull.threads: 0/100
Migrations: 0 RUNNING, 4 LIVE, 0 STOPPED
Actions Total: 0, Largest: "testmigration" 0, Peak: "MyMigration" 1
PendingRegions Total: 0 Avg: 0, Largest: "MyMigration" 0
FailedPaths Total: 0, Largest: "MyMigration" 0
File Transfer Retries Total: 0, Largest: "MyMigration" 0
Total Excluded Scan files/dirs/bytes: 26, 0, 8.1 MB
Total Iterated Scan files/dirs/bytes: 20082, 9876, 2.7 GB
EventsBehind Current/Avg/Max: 0/0/0, RPC Time Avg/Max: 4/8
EventsQueued: 0, Total Events Added: 504
Transferred File Size Percentiles:
2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B
Transferred File Transfer Rates Percentiles per Second:
2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B, 2 B
Active File Size Percentiles:
0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B
Active File Transfer Rates Percentiles per Second:
0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B, 0 B
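The auto-refresh options can be combined with the views above. For example, a hypothetical invocation that refreshes the migration view every 10 seconds:
status --migrations --watch --refresh-delay 10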
Hive migration commands
hive migration add
Create a new Hive migration to initiate metadata migration from your source Metastore.
Create hive rules before initiating a Hive migration to specify which databases and tables are migrated.
hive migration add [--source] string
[--target] string
[--name] string
[--auto-start]
[--once]
[--rule-names] list
[--databricks-catalog] string
[--databricks-convert-to-delta]
[--databricks-delete-after-conversion]
[--databricks-fs-mount-point] string
[--databricks-default-fs-override] string
Mandatory parameters
- `--source` - The name of the Hive agent for the source of the migration.
- `--target` - The name of the Hive agent for the target of the migration.
- `--rule-names` - The rule name or list of rule names to use with the migration. Multiple rules need to be comma-separated (for example: `rule1,rule2,rule3`).
Metadata rules determine the scope of a migration; you need to add rules before creating your metadata migration.
Optional parameters
- `--name` - The name to identify the migration with.
- `--auto-start` - Enter this parameter to start the migration immediately after creation.
- `--once` - Enter this parameter to perform a one-time migration, and not continuously scan for new or changing metadata.
Databricks agent override properties
If you want to use any Databricks agent override properties, you need to set `--databricks-catalog`, `--databricks-convert-to-delta`, `--databricks-delete-after-conversion`, and `--databricks-fs-mount-point`.
If you use `--databricks-convert-to-delta`, you need to provide a value for `--databricks-default-fs-override`.
- `--databricks-catalog` - Enter the name of your Databricks Unity Catalog.
- `--databricks-convert-to-delta` - All underlying table data and metadata is migrated to the filesystem location defined by the `--fs-mount-point` parameter. Use this option to automatically copy the associated data and metadata to Delta Lake on Databricks (AWS, Azure, or GCP), and convert tables to Delta Lake format.
- `--databricks-delete-after-conversion` - Use this option to delete the underlying table data and metadata from the filesystem location defined by `--fs-mount-point` after it's converted to Delta Lake on Databricks. Only use this option if you're performing one-time migrations for the underlying table data. The Databricks agent doesn't support continuous (live) updates of table data when transferring to Delta Lake on Databricks.
- `--databricks-fs-mount-point` - Define the ADLS/S3/GCP location in the Databricks filesystem for containing migrations (for example: `/mnt/mybucketname`). This parameter is required if `--convert-to-delta` is used. The Databricks agent will copy all associated table data and metadata into this location within the Databricks filesystem during conversion.
- `--databricks-default-fs-override` - Enter an override for the default filesystem URI instead of a filesystem name (for example: `dbfs:/mnt/adls2`).
Example
hive migration add --source sourceAgent --target remoteAgent --rule-names test_dbs,user_dbs --name hive_migration --auto-start
Auto-completion of the --rule-names
parameter will not work correctly if it is added at the end of the Hive migration parameters. See the troubleshooting guide for workarounds.
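For example, a hypothetical one-time metadata migration to a Databricks agent using the override properties (the agent names, catalog, and mount point are placeholders):
hive migration add --source sourceAgent --target databricksAgent --rule-names test_dbs --name databricks_migration --once --databricks-catalog myUnityCatalog --databricks-convert-to-delta --databricks-delete-after-conversion --databricks-fs-mount-point /mnt/mybucketname --databricks-default-fs-override dbfs:/mnt/mybucketname --auto-start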
hive migration delete
Delete a Hive migration.
A Hive migration must be stopped before it can be deleted. This can be achieved by using the --force-stop
parameter with this command.
hive migration delete [--name] string [--force-stop]
Example
hive migration delete --name hive_migration --force-stop
hive migration list
List all Hive migrations.
hive migration list
hive migration pause
Pause a Hive migration. Use the --names
flag with a comma-separated list of migration names to pause multiple Hive migrations.
hive migration pause --names hmig1,hmig2
hive migration pause all
Pause all Hive migrations.
hive migration pause all
hive migration reset
Reset a stopped Hive migration. This returns the migration to a CREATED
state.
hive migration reset [--names] string
[--force-stop]
A Hive migration must be stopped before it can be reset. This can be achieved by using the --force-stop
parameter with this command.
The reset migration will use the latest agent settings.
For example, if the target agent’s Default Filesystem Override setting was updated after the original migration started, the reset migration will use the latest Default Filesystem Override value.
To reset multiple Hive migrations, use a comma-separated list of migration names with the --names
parameter.
Example
hive migration reset --names hive_migration1
hive migration reset --force-stop --names hive_migration1,hive_migration2
hive migration reset all
See the hive migration reset
command. Reset all Hive migrations.
hive migration reset all
hive migration resume
Resume STOPPED
, PAUSED
or FAILED
Hive migrations. Use the --names
flag with a comma-separated list of migration names to resume multiple Hive migrations.
hive migration resume --names Hmig1
hive migration resume all
Resume all STOPPED
, PAUSED
or FAILED
Hive migrations.
hive migration resume all
hive migration show
Display information about a Hive migration.
hive migration show
hive migration start
Start a Hive migration or a list of Hive migrations (comma-separated).
Enter the --once
parameter to perform a one-time migration, and not continuously scan for new or changing metadata.
hive migration start [--names] list [--once]
Example
hive migration start --names hive_migration1,hive_migration2
hive migration start all
Start all Hive migrations.
Enter the --once
parameter to perform a one-time migration, and not continuously scan for new or changing metadata.
hive migration start all [--once]
Example
hive migration start all --once
hive migration status
Show the status of a Hive migration or a list of Hive migrations (comma-separated).
hive migration status [--names] list
Example
hive migration status --names hive_migration1,hive_migration2
hive migration status all
Show the status of all Hive migrations.
hive migration status all
Example
hive migration status all
hive migration stop
Stop a running hive migration or a list of running hive migrations (comma-separated).
hive migration stop [--names] list
Example
hive migration stop --names hive_migration1,hive_migration2
hive migration stop all
Stop all running Hive migrations.
hive migration stop all
Example
hive migration stop all
Path mapping commands
path mapping add
Create a path mapping that allows you to define an alternative target path for a specific target filesystem. These will be automatically applied to new migrations.
When path mapping isn't used, the source path is created on the target filesystem.
Path mappings can't be applied to existing migrations. Delete and recreate a migration if you want a path mapping to apply.
path mapping add [--path-mapping-id] string
[--source-path] string
[--target] string
[--target-path] string
[--description] string
Mandatory parameters
- `--source-path` - The path on the source filesystem.
- `--target` - The target filesystem ID (the value defined for the `--file-system-id` parameter).
- `--target-path` - The path for the target filesystem.
- `--description` - A description of the path mapping, enclosed in quotes (`"text"`).
Optional parameters
- `--path-mapping-id` - An ID for this path mapping. An ID will be auto-generated if you don't enter one.
Example
path mapping add --path-mapping-id hdp-hdi --source-path /apps/hive/warehouse --target mytarget --target-path /hive/warehouse --description "HDP to HDI - Hive warehouse directory"
path mapping delete
Delete a path mapping.
Deleting a path mapping will not affect any existing migrations that have the path mapping applied. Delete and recreate a migration if you no longer want a previous path mapping to apply.
path mapping delete [--path-mapping-id] string
Mandatory parameters
--path-mapping-id
The ID of the path mapping.
Example
path mapping delete --path-mapping-id hdp-hdi
path mapping list
List all path mappings.
path mapping list [--target] string
Optional parameters
--target
List path mappings for the specified target filesystem id.
Example
path mapping list --target hdp-hdi
path mapping show
Show details of a specified path mapping.
path mapping show [--path-mapping-id] string
Optional parameters
--path-mapping-id
The ID of the path mapping.
Example
path mapping show --path-mapping-id hdp-hdi
Built-in commands
clear
Clear the console screen.
clear
echo
Prints whatever text you write to the console. Use this to sanity-check a command before running it (for example: echo migration add --path /repl1 --target mytarget --migration-id myNewMigration --exclusions 100mbfiles).
echo [--message] string
exit, quit
Entering either exit
or quit
will stop operation of Data Migrator when it is run from the command line. All processing will cease, and you will be returned to your system shell.
If your Data Migrator command line is connected to a Data Migrator system service, this command will end your interactive session with that service, which will remain in operation to continue processing migrations.
If this command is encountered during non-interactive processing of input (such as when you pipe input to an instance as part of another shell script), no further commands in that input will be processed.
exit
ALSO KNOWN AS
quit
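For example, a minimal sketch of non-interactive use from a shell, assuming you launch the CLI with the livedata-migrator command shown elsewhere in this reference:
# Piped commands run in order; processing stops at 'exit' and later input is ignored
printf 'source show\nexit\n' | livedata-migrator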
help
Use the help
command to get details of all commands available from the action prompt.
help [-C] string
For longer commands, you can use backslashes (\
) to indicate continuation, or use quotation marks ("
) to enclose the full command. When using quotation marks, you can press Tab on your keyboard to make Data Migrator automatically suggest the remainder of your typed command.
See the examples below for reference.
Example
help connect
connect - Connect to Data Migrator and Hive Migrator.
connect [--host] string [--ssl] [--lm2port] int [--hvm-port] int [--timeout] integer [--user] string
help hive\ migration\ add
hive migration add - Create new migration.
hive migration add [--source] string [--target] string [--name] string [--auto-start] [--once] [--rule-names] list
help "filesystem add local"
filesystem add local - Add a local filesystem.
filesystem add local [--file-system-id] string [--fs-root] string [--source] [--scan-only] [--properties-files] list [--properties] string
history
Enter history
at the action prompt to list all previously entered commands.
Entering history --file <filename>
will save up to the 500 most recently entered commands in text form to the file specified. Use this to record commands that you have executed.
history [--file] file
Optional parameters
--file
The name of the file in which to save the history of commands.
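For example, to save your recent commands to a file (the filename cli-history.txt is illustrative):
history --file cli-history.txt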
script
Load and execute commands from a text file using the script --file <filename>
command. This file should have one command per line, and each will be executed as though they were entered directly at the action prompt in that sequence.
Use scripts outside of the CLI by referencing the script when running the livedata-migrator
command (see examples).
script [--file] file
Mandatory parameters
--file
The name of the file containing script commands.
For example, a script file (such as the myScript file referenced below) might contain the following commands:
hive agent check --name sourceAgent
hive agent check --name azureAgent
Examples
These examples assume that myScript
is inside the working directory.
script --file myScript
livedata-migrator --script=./myScript
Change log level commands
log debug
Set the log level to DEBUG.
log debug
log info
Set the log level to INFO.
log info
log off
Turn logging off.
log off
log trace
Set the log level to TRACE.
log trace
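For example, you might raise verbosity while troubleshooting and then return to a less verbose level afterwards (returning to INFO is an assumption; use whichever level you normally run with):
log trace
log info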
Connect commands
connect
Use the connect
command to connect to both Data Migrator and Hive Migrator on the same host with a single command.
connect [--host] string
[--hvm-port] integer
[--ldm-port] integer
[--ssl]
[--timeout] integer
[--user] string
Mandatory parameters
--host
The hostname or IP address for the Data Migrator and Hive Migrator host.
Optional parameters
--hvm-port
Specify the Hive Migrator port. If not specified, the default port value of 6780 is used to connect.
--ldm-port
Specify the Data Migrator port. If not specified, the default port value of 18080 is used to connect.
--ssl
Enter this parameter if you want to establish a TLS connection to Data Migrator. Enable Server TLS on the Data Migrator service before using this parameter.
--timeout
Define the connection timeout in milliseconds. Set this parameter to override the default connection timeout of 5 minutes (300000 ms).
--user
The username to use for authenticating to both services. Used when instances have basic or LDAP authentication enabled. You will be prompted to enter the user password.
Example
Connect to the Data Migrator and Hive Migrator services on the host with this command.
connect --host localhost --hvm-port 6780 --ldm-port 18080 --user admin
connect livemigrator
Connect to the Data Migrator service on your Data Migrator host with this command.
This is a manual method of connecting to the Data Migrator service as the livedata-migrator
command (shown in CLI - Sign in) will attempt to establish this connection automatically.
connect livemigrator [--host] string
[--ssl]
[--port] int
[--timeout] integer
[--user] string
Mandatory parameters
--host
The hostname or IP address for the Data Migrator host.
Optional parameters
--ssl
Enter this parameter if you want to establish a TLS connection to Data Migrator. Enable Server TLS on the Data Migrator service before using this parameter.
--port
The Data Migrator port to connect on (default is 18080).
--timeout
Define the connection timeout in milliseconds. Set this parameter to override the default connection timeout of 5 minutes (300000 ms).
--user
The username to use for authenticating to the Data Migrator service. Used only when the Data Migrator instance has basic authentication enabled. You will still be prompted to enter the user password.
Example
Connect to the Data Migrator service on your Data Migrator host with this command.
connect livemigrator --host localhost --port 18080
connect hivemigrator
Connect to the Hive Migrator service on your Data Migrator host with this command.
This is a manual method of connecting to the Hive Migrator service as the livedata-migrator
command (shown in CLI - Sign in) will attempt to establish this connection automatically.
connect hivemigrator [--host] string
[--ssl]
[--port] int
[--timeout] long
[--user] string
Mandatory parameters
--host
The hostname or IP address for the Data Migrator host that contains the Hive Migrator service.
Optional parameters
--ssl
Enter this parameter if you want to establish a TLS connection to Hive Migrator.
--port
The Hive Migrator service port to connect on (default is 6780).
--timeout
Define the connection timeout in milliseconds. Set this parameter to override the default connection timeout of 5 minutes (300000 ms).
--user
The username to use for authenticating to the Hive Migrator service. Used only when Hive Migrator has basic authentication enabled. You will still be prompted to enter the user password.
Example
connect hivemigrator --host localhost --port 6780
Email notifications subscription commands
notification email addresses add
Add email addresses to the subscription list for email notifications.
notification email addresses add [--addresses]
Mandatory parameters
--addresses
A comma-separated list of email addresses to add.
Example
notification email addresses add --addresses myemail@company.org,personalemail@gmail.com
notification email addresses remove
Remove email addresses from the subscription list for email notifications.
notification email addresses remove [--addresses]
Mandatory parameters
--addresses
A comma-separated list of email addresses to remove. Use auto-completion to quickly select from subscribed email addresses.
Example
notification email addresses remove --addresses myemail@company.org,personalemail@gmail.com
notification email smtp set
Configure the details of an SMTP server for Data Migrator to connect to.
notification email smtp set [--host] string
[--port] integer
[--security] security-enum
[--email] string
[--login] string
[--password] string
[--subject-prefix] string
Mandatory parameters
--host
The host address of the SMTP server.
--port
The port used to connect to the SMTP server. Many SMTP servers use port 25.
--security
The type of security the server uses. Available options: NONE, SSL, STARTTLS_ENABLED, STARTTLS_REQUIRED, or TLS.
--email
The email address for Data Migrator to use with emails sent through the SMTP server. This address will be the sender of all configured email notifications.
Optional parameters
--login
The username to authenticate with the SMTP server.
--password
The password to authenticate with the SMTP server. Required if you provide a login.
--subject-prefix
Set an email subject prefix to help identify and filter Data Migrator notifications.
Example
notification email smtp set --host my.internal.host --port 587 --security TLS --email livedatamigrator@wandisco.com --login myusername --password mypassword
notification email smtp show
Display the details of the SMTP server Data Migrator is configured to use.
notification email smtp show
notification email subscriptions show
Show a list of currently subscribed emails and notifications.
notification email subscriptions show
notification email types add
Add notification types to the email notification subscription list.
See the output from the command notification email types show
for a list of all currently available notification types.
notification email types add [--types]
Mandatory parameters
--types
A comma-separated list of notification types to subscribe to.
Example
notification email types add MISSING_EVENTS,EVENTS_BEHIND,MIGRATION_AUTO_STOPPED
notification email types remove
Remove notification types from the email notification subscription list.
notification email types remove [--types]
Mandatory parameters
--types
A comma-separated list of notification types to unsubscribe from.
Example
notification email types remove MISSING_EVENTS,EVENTS_BEHIND,MIGRATION_AUTO_STOPPED
notification email types show
Return a list of all available notification types to subscribe to.
notification email types show
Hive backup commands
hive backup add
Immediately create a metadata backup file.
hive backup add
hive backup config show
Show the current metadata backup configuration.
hive backup config show
hive backup list
List all existing metadata backup files.
hive backup list
hive backup restore
Restore from a specified metadata backup file.
hive backup restore --name string
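For example, list the available backups first and then restore from one of them (the backup name is a placeholder for a name returned by hive backup list):
hive backup list
hive backup restore --name <backup-name>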
hive backup schedule configure
Configure a backup schedule for metadata migrations.
hive backup schedule configure --period-minutes 10 --enable
{
"enabled": true,
"periodMinutes": 10
}
hive backup schedule show
Show the current metadata backup schedule.
hive backup schedule show
{
"enabled": true,
"periodMinutes": 10
}
hive backup show
Show a specified metadata backup file.
hive backup show --name string
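For example (the backup name is a placeholder for a name returned by hive backup list):
hive backup show --name <backup-name>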
Hive configuration commands
hive config certificate generate
Generate a certificate to use for TLS connections to remote agents.
hive config certificate generate
hive config certificate upload
Upload certificates to use for TLS connections to remote agents.
hive config certificate upload [--path-mapping-id] string
[--private-key] file
[--certificate] file
[--trusted-certificate] file
Mandatory parameters
--private-key
The client private key used to establish a TLS connection to the remote agent.
--certificate
The client certificate used to establish a TLS connection to the remote agent.
--trusted-certificate
The trusted certificate used to establish a TLS connection to the remote agent.
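For example (the file paths are placeholders for your own client key, client certificate, and trusted CA certificate):
hive config certificate upload --private-key /path/to/client.key --certificate /path/to/client.crt --trusted-certificate /path/to/ca.crt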
Hive rule configuration commands
hive rule add, hive rule create
Create a Hive migration rule that is used to define which databases and tables are migrated.
Enter these rules when starting a new migration to control which databases and tables are migrated.
hive rule add [--database-pattern] string
[--table-pattern] string
[--name] string
ALSO KNOWN AS
hive rule create
Mandatory parameters
--database-pattern
Enter a Hive DDL pattern that will match the database names you want to migrate.
--table-pattern
Enter a Hive DDL pattern that will match the table names you want to migrate.
You can use a single asterisk (*) if you want to match all databases and/or all tables within the Metastore/database.
Optional parameters
--name
The name for the Hive rule.
Example
hive rule add --name test_databases --database-pattern test* --table-pattern *
hive rule configure
Change the parameters of an existing Hive rule.
The parameters that can be changed are the same as the ones listed in the hive rule add, hive rule create section.
All parameters are optional except --name, which is required to identify the existing Hive rule that you want to configure.
Example
hive rule configure --name test_databases --database-pattern test_db*
hive rule delete
Delete a Hive rule.
hive rule delete [--name] string
Example
hive rule delete --name test_databases
hive rule list
List all Hive rules.
hive rule list
hive rule show
Show details of a specified Hive rule.
hive rule show [--name] string
Example
hive rule show --name test_databases
Hive show commands
hive show conf
Show the value of a configuration parameter from a Hive agent.
hive show conf [--parameter] string
[--agent-name] string
Hive show configuration parameters
--agent-name
The name of the agent.
--parameter
The configuration parameter/property that you want to show the value of.
Example
hive show conf --agent-name sourceAgent --parameter hive.metastore.uris
hive show database
Show information about a Hive database.
hive show database [--database] string
[--agent-name] string
Hive show database parameters
--database
The database name. If not specified, the default will be default.
--agent-name
The name of the agent.
Example
hive show database --agent-name sourceAgent --database mydb01
hive show databases
List the databases available to a Hive agent.
hive show databases [--like] string
[--agent-name] string
Hive show databases parameters
--like
The Hive DDL pattern to use to match the database names (for example: testdb* will match any database name that begins with "testdb").
--agent-name
The name of the agent.
Example
hive show databases --agent-name sourceAgent --like testdb*
hive show indexes
List the indexes of a Hive table.
hive show indexes [--database] string
[--table] string
[--agent-name] string
Hive show indexes parameters
--database
The database name.
--table
The table name.
--agent-name
The name of the agent.
Example
hive show indexes --agent-name sourceAgent --database mydb01 --table mytbl01
hive show partitions
List the partitions of a Hive table.
hive show partitions [--database] string
[--table] string
[--agent-name] string
Hive show partitions parameters
--database
The database name.
--table
The table name.
--agent-name
The name of the agent.
Example
hive show partitions --agent-name sourceAgent --database mydb01 --table mytbl01
hive show table
Show details of a Hive table.
hive show table [--database] string
[--table] string
[--agent-name] string
Hive show table parameters
--database
The database name where the table is located.
--table
The table name.
--agent-name
The name of the agent.
Example
hive show table --agent-name sourceAgent --database mydb01 --table mytbl01
hive show tables
List the tables in a Hive database.
hive show tables [--like] string
[--database] string
[--agent-name] string
Hive show tables parameters
--like
The Hive DDL pattern to use to match the table names (for example: testtbl* will match any table name that begins with "testtbl").
--database
The database name. If not specified, the default will be default.
--agent-name
The name of the agent.
Example
hive show tables --agent-name sourceAgent --database mydb01 --like testtbl*
License manipulation commands
license show
Show details of the current license. Enter the --full parameter to include full license details.
license show [--full]
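For example, to show extended license details:
license show --full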
license upload
Upload a license to Data Migrator.
license upload [--path] string
Example
license upload --path /user/hdfs/license.key
Notification commands
notification latest
Show the most recent notification.
notification latest
notification list
List notifications, with optional filters.
notification list [--count] integer
[--since] string
[--type] string
[--exclude-resolved]
[--level] string
Optional parameters
--count
The number of notifications to return.
--since
Return notifications created after this date/time.
--type
The type of notification to return, for example LicenseExceptionNotification.
--exclude-resolved
Exclude resolved notifications.
--level
The level of notification to return.
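For example, to return the five most recent unresolved notifications (the count shown is illustrative):
notification list --count 5 --exclude-resolved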
notification show
Show details of a specified notification.
notification show [--notification-id] string
Mandatory parameters
--notification-id
The ID of the notification to return.
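For example (the notification ID is a placeholder for an ID returned by notification list):
notification show --notification-id <notification-id>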
Source commands
source clear
Clear all information that Data Migrator maintains about the source filesystem by issuing the source clear
command. This allows you to define an alternative source to the one previously defined or detected automatically.
source clear
source delete
Use source delete
to delete information about a specific source by ID. You can obtain the ID for a source filesystem with the output of the source show
command.
source delete [--file-system-id] string
Mandatory parameters
--file-system-id
The ID of the source filesystem resource you want to delete.
Example
source delete --file-system-id auto-discovered-source-hdfs
source show
Get information about the source filesystem configuration.
source show [--detailed]
Optional parameters
--detailed
Include all configuration properties for the source filesystem in the response.
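For example, to include all configuration properties of the source filesystem in the output:
source show --detailed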