Version: 3.0 (latest)

Configure Hive Migrator

Properties are defined in the following files:

/etc/wandisco/hivemigrator/application.properties

/etc/wandisco/hivemigrator/log4j2.yaml

/etc/wandisco/hivemigrator/vars.sh

After adding new properties or changing existing values, always restart the Hive Migrator service.

note

Data Migrator 1.19 Hive Migrator configuration changes

Hive Migrator configuration is now stored in /etc/wandisco/hivemigrator/application.properties.

The following configuration files have been removed:

  • /etc/wandisco/hivemigrator/application.yaml
  • /etc/wandisco/hivemigrator/hive-migrator.yaml

This update is automatically applied when upgrading from earlier product versions. There's no need to make any manual changes.

Application properties

General configuration

These configuration properties adjust general aspects of operation.

micronaut.server.port
  The non-TLS port to access Hive Migrator.
  Default value: 6780
  Allowed values: Integer value of an available port

micronaut.server.dualProtocol
  Whether Hive Migrator allows both non-TLS connections on the non-TLS port and TLS connections on the TLS port.
  Default value: true
  Allowed values: true, false

micronaut.server.host
  The Hive Migrator server host.
  Default value: 127.0.0.1
  Allowed values: Hive Migrator IP address

hivemigrator.migrationWorkerThreads
  The number of migration worker threads.
  Default value: The number of available cores
  Allowed values: An integer value

hivemigrator.migrationBatchSize
  The maximum number of objects to migrate at the same time. See Migration batch size.
  Default value: 1000
  Allowed values: An integer value

hivemigrator.connectionRetryTimeout
  The connection retry timeout in minutes.
  Default value: 20
  Allowed values: An integer value

hivemigrator.delayBetweenScanRounds
  The delay between scans in seconds.
  Default value: 1
  Allowed values: An integer value

hivemigrator.hikariConfig.maximumPoolSize
  The maximum Hikari pool size.
  Default value: Twice the number of available cores
  Allowed values: An integer value

hivemigrator.hikariConfig.connectionTimeout
  The timeout in milliseconds for the Hikari pool.
  Default value: 30000
  Allowed values: An integer value

hivemigrator.delayBetweenErrorNotifications
  The delay in minutes between non-fatal error notifications a migration can report to the UI and CLI. Error information remains recorded in the Hive Migrator logs.
  Default value: 5
  Allowed values: An integer value
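As an illustration, tuning a few of these properties in /etc/wandisco/hivemigrator/application.properties might look like the following. The values shown are placeholders, not recommendations:

```properties
# Illustrative values only - restart the Hive Migrator service after editing
hivemigrator.migrationWorkerThreads=8
hivemigrator.hikariConfig.maximumPoolSize=16
hivemigrator.delayBetweenErrorNotifications=10
```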

Migration batch size

Only adjust this setting when directed by support.

Use the following parameter to set the limit of the number of objects that are sent in a single request to a target metastore:

Batch size parameter:
hivemigrator.migrationBatchSize=1000

The default is 1000. To change the property, for example, to increase the batch size when migrating tables with large schemas:

  1. Open /etc/wandisco/hivemigrator/application.properties in a text editor.
  2. Add the line hivemigrator.migrationBatchSize=<integer>, where <integer> is the maximum number of objects for each request.
  3. Save the change.
  4. Restart the Hive Migrator service. See System service commands - Hive Migrator
info

AWS Glue Data Catalog allows a maximum of 100 objects for each request.
Set hivemigrator.migrationBatchSize to 100 if you have AWS Glue as your target.
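Following the note above, the entry in /etc/wandisco/hivemigrator/application.properties for an AWS Glue target would be:

```properties
# AWS Glue Data Catalog accepts a maximum of 100 objects per request
hivemigrator.migrationBatchSize=100
```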

tip

The Hive Migrator batch size application property is applied to all metadata migrations. To increase the batch size for individual metadata migrations in Data Migrator 3.0, use the /migration/update or /migration Hive Migrator API endpoints. Only adjust this setting when directed by support. See Access Hive Migrator API.

State

Hive Migrator uses an internally-managed database to store migrations and migration status information.

hivemigrator.storagePath
  The Hive Migrator internal database location.
  Default value: /opt/wandisco/hivemigrator/hivemigrator.db
  Allowed values: Any valid file path

Backup configuration

hivemigrator.backups.location
  The default location to store backup files.
  Default value: /opt/wandisco/hivemigrator/backups
  Allowed values: A valid path

hivemigrator.backups.namePrefix
  The prefix for backup filenames.
  Default value: hvmbackup
  Allowed values: Any filename string

hivemigrator.backups.listMaxSize
  The maximum number of backups returned by the /backups endpoint.
  Default value: 1000
  Allowed values: Positive integer

Security

Secure access to the Hive Migrator REST API through configuration by selecting either no security or HTTP basic authentication. To configure secure access, see Configure basic auth.

micronaut.security.enabled
  Enable or disable basic authentication.
  Default value: false
  Allowed values: true, false

hivemigrator.username
  The Hive Migrator API basic authentication username.
  Default value: (none)
  Allowed values: Any string that defines a username (no whitespace)

hivemigrator.password
  The Hive Migrator API basic authentication password.
  Default value: (none)
  Allowed values: A bcrypt-hashed password, generated with a bcrypt implementation that produces the "2a" prefix at the beginning of the hash

hivemigrator.integration.liveDataMigrator.username
  The basic authentication username for Data Migrator.
  Default value: (none)
  Allowed values: The Data Migrator basic authentication username

hivemigrator.integration.liveDataMigrator.password
  The basic authentication password for Data Migrator.
  Default value: (none)
  Allowed values: The Data Migrator basic authentication password encrypted with the encryptor tool
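For example, enabling basic authentication might look like the following. The username and the bcrypt hash shown are placeholders, not working values:

```properties
micronaut.security.enabled=true
hivemigrator.username=admin
# Placeholder hash - generate your own with a bcrypt tool that produces the "2a" prefix
hivemigrator.password=$2a$10$...
```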

Connect to Data Migrator and Hive Migrator with basic authentication

To use basic authentication for Data Migrator and Hive Migrator, follow the steps in Configure basic auth.

Connect to Hive Migrator with basic authentication

Follow these steps if you used different credentials for Data Migrator and Hive Migrator, or if basic authentication isn't enabled on Data Migrator.

This step isn't required if you use the same credentials for both services.

When basic authentication is enabled, enter the username and password when prompted to connect to Hive Migrator with the CLI:

Example
connect hivemigrator localhost: trying to connect...
Username: admin
Password: ***********
Connected to hivemigrator v1.<VERSION-NUMBER> on http://localhost:6780.

The username and password are required for direct access to the Hive Migrator REST API.

info

If you enable basic authentication on Hive Migrator, ensure you update the UI with the credentials to maintain functionality.

Transport Layer Security

When you deploy a remote agent, Hive Migrator automatically generates certificates and establishes a transport layer security (TLS) connection to the agent. You can configure TLS for the Hive Migrator API, for integration with Data Migrator, and for any remote metastore agents.

TLS for Hive Migrator API

info

Due to a limitation of the Micronaut framework, the key you specify for the Hive Migrator API must be the first or only key contained within the keystore.

To configure TLS for Hive Migrator to encrypt Hive Migrator's API, use the following properties in /etc/wandisco/hivemigrator/application.properties:

micronaut.ssl.enabled
  Whether Hive Migrator uses TLS on its own API.
  Default value: true
  Allowed values: true, false

micronaut.ssl.buildSelfSigned
  Whether Hive Migrator generates self-signed certificates. We recommend setting this to false when using a custom truststore and keystore.
  Default value: true
  Allowed values: true, false

micronaut.ssl.port
  The port to access the Hive Migrator API over TLS.
  Default value: 6781
  Allowed values: Integer value of an available port

micronaut.ssl.key-store.path
  The path to the keystore for Hive Migrator, prefixed with "file:". For example: file:/etc/keystore/keystore.jks
  Default value: (none)
  Allowed values: Any valid path to a keystore file, with the "file:" prefix

micronaut.ssl.key-store.password
  The keystore password.
  Default value: (none)
  Allowed values: A valid password string

micronaut.ssl.key-store.type
  The keystore file type.
  Default value: (none)
  Allowed values: PKCS12, JKS

micronaut.ssl.key.alias
  The alias of the certificate in the keystore that Hive Migrator supplies to clients.
  Default value: (none)
  Allowed values: A valid alias string
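A sketch of these properties together, assuming a hypothetical keystore at /etc/keystore/keystore.jks with alias hivemigrator; substitute your own path, password, and alias:

```properties
micronaut.ssl.enabled=true
micronaut.ssl.buildSelfSigned=false
micronaut.ssl.port=6781
micronaut.ssl.key-store.path=file:/etc/keystore/keystore.jks
micronaut.ssl.key-store.password=<keystore-password>
micronaut.ssl.key-store.type=JKS
micronaut.ssl.key.alias=hivemigrator
```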

TLS for remote agents

When you deploy a remote agent (for example, Azure SQL or AWS Glue), Hive Migrator establishes a Transport Layer Security (TLS) connection to the agent.

Certificates and keys are automatically generated for this connection for both Hive Migrator and the remote agent. These are placed in the following directories:

Hive Migrator - Client and Root CA certificates
/etc/wandisco/hivemigrator/client-key.pem
/etc/wandisco/hivemigrator/client-cert.pem
/etc/wandisco/hivemigrator/ca-cert.pem
/etc/wandisco/hivemigrator/ca-key.pem
/etc/wandisco/hivemigrator/ca-cert.srl
Remote agent - Server and Root CA certificates
/etc/wandisco/hivemigrator-remote-server/server-key.pem
/etc/wandisco/hivemigrator-remote-server/server-cert.pem
/etc/wandisco/hivemigrator-remote-server/ca-cert.pem

You can apply existing certificates for remote agents, upload existing certificates, or generate new certificates with the Hive Migrator REST API.

To update the UI with TLS details, remove the product instance and then add it again with the updated TLS connection details.

Apply existing certificates for remote agents

Use the following steps to apply existing certificates for remote agents:

  1. Generate self-signed certificates, assigning Hive Migrator as the client and the Hive Migrator remote agent as the server.
    Use the following file names:

    • ca-cert.pem
    • client-cert.pem
    • client-key.pem
    • server-cert.pem
    • server-key.pem
  2. Copy the following files to the Hive Migrator directory /etc/wandisco/hivemigrator/:

    • ca-cert.pem
    • client-cert.pem
    • client-key.pem
  3. Copy the following files to the Hive Migrator remote server directory /etc/wandisco/hivemigrator-remote-server:

    • ca-cert.pem
    • server-cert.pem
    • server-key.pem
  4. Restart the service for the Hive Migrator remote server by running the command:

    service hivemigrator-remote-server restart
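Step 1 above can be sketched with openssl. This is a minimal illustration only; the subject names, key sizes, and validity periods are placeholders, and production certificates typically need subject alternative names and key usage extensions appropriate to your environment:

```shell
set -e
cd "$(mktemp -d)"

# Root CA (also produces ca-key.pem; ca-cert.srl appears on first signing)
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout ca-key.pem -out ca-cert.pem -subj "/CN=hivemigrator-ca"

# Client certificate - the Hive Migrator side
openssl req -newkey rsa:2048 -nodes \
  -keyout client-key.pem -out client.csr -subj "/CN=hivemigrator-client"
openssl x509 -req -days 365 -in client.csr \
  -CA ca-cert.pem -CAkey ca-key.pem -CAcreateserial -out client-cert.pem

# Server certificate - the remote agent side
openssl req -newkey rsa:2048 -nodes \
  -keyout server-key.pem -out server.csr -subj "/CN=hivemigrator-remote-agent"
openssl x509 -req -days 365 -in server.csr \
  -CA ca-cert.pem -CAkey ca-key.pem -CAcreateserial -out server-cert.pem
```

After generating the files, copy them to the directories described in steps 2 and 3.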

Upload existing certificates

info

Ensure the correct certificates and keys are uploaded for Hive Migrator and all remote agents that are connected.

Existing connections will break if the trust relationship isn't established between Hive Migrator and remote agents.

Upload certificates and keys by using the following Hive Migrator REST API endpoints:

Hive Migrator
POST /config/certificates/upload
Remote agent
POST ​/agents/{name}/certificates/upload

The remote agent service restarts automatically when new certificates are uploaded this way. The Hive Migrator service doesn't require a restart to start using new certificates.

Generate new certificates with the Hive Migrator REST API

info

Generate new certificates for Hive Migrator and all remote agents that are connected.

Generating certificates for just one of these components breaks existing connections.

Generate new certificates and keys by using the following Hive Migrator REST API endpoints:

Hive Migrator
POST /config/certificates/generate
Remote agent
POST ​/agents/{name}/certificates/generate

The remote agent service automatically restarts when you generate new certificates this way. You don't need to restart the Hive Migrator service to use the new certificates.

Configure TLS for integration with Data Migrator

hivemigrator.integration.liveDataMigrator.useSsl
  Whether Hive Migrator uses TLS for Data Migrator.
  Default value: false
  Allowed values: true, false

hivemigrator.integration.liveDataMigrator.trust-store.path
  The path to the truststore Hive Migrator uses to verify the certificate obtained from Data Migrator.
  Default value: /etc/wandisco/hivemigrator/tls/keystore.p12
  Allowed values: A valid path to a truststore

hivemigrator.integration.liveDataMigrator.trust-store.password
  The password for the truststore defined in trust-store.path.
  Default value: (none)
  Allowed values: Truststore password string

hivemigrator.integration.liveDataMigrator.trust-store.type
  The file type of the truststore defined in trust-store.path.
  Default value: (none)
  Allowed values: PKCS12, JKS

hivemigrator.integration.liveDataMigrator.host
  The Data Migrator host.
  Default value: localhost
  Allowed values: A valid Data Migrator hostname or IP

hivemigrator.integration.liveDataMigrator.port
  The Data Migrator port.
  Default value: 18080
  Allowed values: A valid Data Migrator port

hivemigrator.integration.liveDataMigrator.connectionMaxRetries
  The number of retries for the Data Migrator connection.
  Default value: 5
  Allowed values: An integer value

hivemigrator.integration.liveDataMigrator.connectionRetryDelay
  The delay in milliseconds between Data Migrator connection retries.
  Default value: 5000
  Allowed values: An integer value
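For example, enabling TLS to Data Migrator might look like the following. The truststore password is a placeholder, and the host and port assume the defaults listed above:

```properties
hivemigrator.integration.liveDataMigrator.useSsl=true
hivemigrator.integration.liveDataMigrator.host=localhost
hivemigrator.integration.liveDataMigrator.port=18080
hivemigrator.integration.liveDataMigrator.trust-store.path=/etc/wandisco/hivemigrator/tls/keystore.p12
hivemigrator.integration.liveDataMigrator.trust-store.password=<truststore-password>
hivemigrator.integration.liveDataMigrator.trust-store.type=PKCS12
```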

Talkback

Change the following properties to alter how the talkback API outputs data:

talkback.apidata.enabled
  Enables an hourly process to log Hive Migrator configuration from the output of REST endpoints.
  Default value: true

talkback.apidata.backupsCount
  Specifies the maximum number of backups that the talkback process gets from the REST endpoint.
  Default value: 50

Location mapping

Use the locationmapping application properties to specify additional table location properties that Hive Migrator maps to the target location during migration. By default, no additional location mapping is applied.

Uncomment the existing properties and supply comma-separated table property names to adjust the default location mapping.

For example, to map the location value of a SerDe property named 'path', add that property name as a value for hivemigrator.locationmapping.keys.serde and restart the Hive Migrator service for the change to take effect.

caution

Mapping applies to all tables with the specified property names, regardless of whether the property contains a valid source location.

When a mapping applies, the source value is mapped to the target filesystem by replacing everything up to and including the first segment after the two forward slashes ("//"). For example, a source value of todo://our_placeholder/data/ equates to a target value of <target_filesystem>/data.

Additionally, a source value containing no forward slash ("/") is mapped by prepending the target filesystem. For example, a source value of example_value equates to a target value of <target_filesystem>/example_value.

Only adjust or uncomment these properties when directly instructed by Support.

hivemigrator.locationmapping.keys.serde
  Map SerDe table location property values supplied to this property to the target location during migration.
  Default value: none

hivemigrator.locationmapping.keys.tableproperties
  Map table location properties supplied to this property to the target location during migration.
  Default value: none
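For example, the SerDe mapping described above might be enabled like this; 'path' is simply the example property name from the text, and values are comma-separated if you need more than one:

```properties
# Only set when directed by Support; restart Hive Migrator afterwards
hivemigrator.locationmapping.keys.serde=path
```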

Iceberg agent historical metadata retention

The following properties allow the adjustment of Iceberg agent behaviour.

iceberg.write.metadata.previous-versions-max
  Limits the number of previous metadata versions to retain on Iceberg agents.
  Default value: 200
caution

Don't adjust this value unless explicitly directed. See the current recommended value for historical metadata retention.

Secrets store properties

HashiCorp Vault

Add the following properties to integrate with HashiCorp Vault. See the HashiCorp Vault configuration section for steps and examples for each authentication type.

info

The following properties cannot be referenced using application property references. If you are integrating with a HashiCorp Vault server, ensure these properties and values are used in your bootstrap.properties file.

spring.cloud.vault.enabled
  Determines whether the Vault integration is enabled. Accepted values: true or false.
  Default value: none

spring.cloud.vault.uri
  Specifies the URI (including protocol, host, and port) of the HashiCorp Vault server. For example: http://127.0.0.1:8200 or https://127.0.0.1:8222
  Default value: none

spring.cloud.vault.authentication
  Specifies the authentication method for Data Migrator to use when connecting to HashiCorp Vault. Use either TOKEN or APPROLE.
  Default value: none

spring.cloud.vault.token
  Specifies the authentication token used to authenticate with HashiCorp Vault.
  Default value: none

spring.config.import
  Specifies comma-separated Vault location sources of key-value secrets used for application properties. See the reference format.
  Default value: none

spring.cloud.vault.app-role.role-id
  Specifies the role ID for the AppRole authentication method when connecting to HashiCorp Vault.
  Default value: none

spring.cloud.vault.app-role.secret-id
  Specifies the secret ID for the AppRole authentication method when connecting to HashiCorp Vault. The secret ID, along with the role ID, is used to authenticate with Vault when APPROLE authentication is used.
  Default value: none
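A sketch of a bootstrap.properties for TOKEN authentication; the server URI, token, and secret path are placeholders for your own values:

```properties
spring.cloud.vault.enabled=true
spring.cloud.vault.uri=https://127.0.0.1:8200
spring.cloud.vault.authentication=TOKEN
spring.cloud.vault.token=<vault-token>
spring.config.import=vault://secret/hivemigrator
```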
note

To disable HashiCorp Vault integration, set spring.cloud.vault.enabled to false and ensure no secret references remain in use, including any used with the spring.config.import property. Comment out or remove any reference values from the spring.config.import property, then restart the Hive Migrator service for the changes to take effect.

Databricks

Databricks concurrent thread configuration

Add the following properties to /etc/wandisco/hivemigrator/application.properties to optimize migration performance by controlling concurrency and thread utilization.

After adding new properties or changing existing values, restart the Hive Migrator service for changes to take effect.

See Databricks concurrent thread configuration for more information.

hivemigrator.databricks.threadcount
  Specifies the total number of threads to allocate across all Databricks migrations.
  Default value: 10

hivemigrator.databricks.tablethreadcount
  Specifies the number of threads to allocate per Databricks migration.
  Default value: 10
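For example, to double the overall thread pool while halving per-migration parallelism (the values are illustrative only):

```properties
# Illustrative values - restart the Hive Migrator service after editing
hivemigrator.databricks.threadcount=20
hivemigrator.databricks.tablethreadcount=5
```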

Hive Migrator logging

You can find configuration for the Hive Migrator log in /etc/wandisco/hivemigrator/log4j2.yaml.

Hive Migrator debug logging

The following sections show how to enable debug-level logging for the application and additional logging for the Java virtual machine (JVM).

Enable application debug logging

Use the following steps to enable debug level logging for Hive Migrator:

  1. Open /etc/wandisco/hivemigrator/log4j2.yaml
  2. Edit the log.level property value from info to debug.
Example
- name: log.level
value: "debug"
  3. Save the changes.
  4. Restart the Hive Migrator service. See System service commands - Hive Migrator
info

After you've completed your investigation, reverse the changes. The debug mode generates massive log files that may exhaust your available storage.

Enable Java connection debug logging

Use the following steps to investigate authentication/authorization problems when using a HDFS source or target:

  1. Open /etc/wandisco/hivemigrator/vars.sh.
  2. Add the following JVM argument:
    HVM_EXTRA_JVM_ARGS="-Dsun.security.krb5.debug=true -Dsun.security.jgss.debug=true -Dsun.security.spnego.debug=true"
  3. Add the following log location parameter:
    LOG_OUT_FILE=/var/log/wandisco/hivemigrator/hivemigrator.out
  4. Save the changes.
  5. Restart the Hive Migrator service. See System service commands - Hive Migrator
info

After you've completed your investigation, reverse the changes. The debug mode generates massive log files that may exhaust your available storage.

Learn more about adding additional JVM arguments in the knowledge base.

Change Hive Migrator log and heap dump location

The default path for Hive Migrator logs is /var/log/wandisco/hivemigrator. You can change the default path by editing the /etc/wandisco/hivemigrator/log4j2.yaml file. Before making changes, ensure the path exists and is writable by Hive Migrator.

  1. Open /etc/wandisco/hivemigrator/log4j2.yaml
  2. Edit the log.dir property value with your new path.
Example
- name: log.dir
value: "/var/log/newlocation/hivemigrator"
  3. Save the changes.
  4. Restart the Hive Migrator service. See System service commands - Hive Migrator
info

Changes to the default log location won't persist after upgrade. After upgrading to a new version, you will have to reapply these changes.

Change Hive Migrator audit log location

The default path for Hive Migrator audit logs is /var/log/wandisco/audit/hivemigrator. You can change the default path by editing the /etc/wandisco/hivemigrator/log4j2.yaml file. Before making changes, ensure the path exists and is writable by Hive Migrator.

  1. Open /etc/wandisco/hivemigrator/log4j2.yaml
  2. Edit the audit.dir property value with your new path.
Example
- name: audit.dir
value: "/var/log/wandisco/audit/hivemigrator"
  3. Save the changes.
  4. Restart the Hive Migrator service. See System service commands - Hive Migrator
info

Changes to the default log location won't persist after upgrade. After upgrading to a new version, you will have to reapply these changes.

Directory structure

The following directories are used for Hive Migrator:

/var/log/wandisco/hivemigrator - Logs
/etc/wandisco/hivemigrator - Configuration files
/opt/wandisco/hivemigrator - Java archive files
/var/run/hivemigrator - Runtime files

Remote servers

The following directories are used for Hive Migrator remote servers (remote agents):

/var/log/wandisco/hivemigrator-remote-server - Logs
/etc/wandisco/hivemigrator-remote-server - Configuration files
/opt/wandisco/hivemigrator-remote-server - Java archive files
/var/run/hivemigrator-remote-server - Runtime files