Configure Hive Migrator
Properties are defined in the following files:
- `/etc/wandisco/hivemigrator/application.properties`
- `/etc/wandisco/hivemigrator/log4j2.yaml`
- `/etc/wandisco/hivemigrator/vars.sh`
After adding new properties or changing existing values, always restart the Hive Migrator service.
Data Migrator 1.19 Hive Migrator configuration changes
Hive Migrator configuration is now stored in `/etc/wandisco/hivemigrator/application.properties`.
The following configuration files have been removed:
- `/etc/wandisco/hivemigrator/application.yaml`
- `/etc/wandisco/hivemigrator/hive-migrator.yaml`
This update is automatically applied when upgrading from earlier product versions. There's no need to make any manual changes.
Application properties
General configuration
These configuration properties are used to adjust general items of operation.
Name | Details |
---|---|
`micronaut.server.port` | The non-TLS port used to access Hive Migrator. Default value: `6780`. Allowed values: Integer value of an available port |
`micronaut.server.dualProtocol` | Whether Hive Migrator allows non-TLS connections on the non-TLS port as well as TLS connections on the TLS port. Default value: `true`. Allowed values: `true`, `false` |
`micronaut.server.host` | The Hive Migrator server host. Default value: `127.0.0.1`. Allowed values: Hive Migrator IP address |
`hivemigrator.migrationWorkerThreads` | The number of migration worker threads. Default value: the number of available cores. Allowed values: An integer value |
`hivemigrator.migrationBatchSize` | The maximum number of objects to migrate at the same time. See Migration batch size. Default value: `1000`. Allowed values: An integer value |
`hivemigrator.connectionRetryTimeout` | The connection retry timeout in minutes. Default value: `20`. Allowed values: An integer value |
`hivemigrator.delayBetweenScanRounds` | The delay between scans in seconds. Default value: `1`. Allowed values: An integer value |
`hivemigrator.hikariConfig.maximumPoolSize` | The maximum Hikari pool size. Default value: twice the number of available cores. Allowed values: An integer value |
`hivemigrator.hikariConfig.connectionTimeout` | The timeout in milliseconds for the Hikari pool. Default value: `30000`. Allowed values: An integer value |
`hivemigrator.delayBetweenErrorNotifications` | The delay in minutes between non-fatal error notifications a migration can report to the UI and CLI. Error information remains recorded in the Hive Migrator logs. Default value: `5`. Allowed values: An integer value |
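For illustration, an `application.properties` fragment tuning two of these settings might look like the following; the values are examples only, not recommendations:

```properties
# Use 8 migration worker threads instead of one per available core
hivemigrator.migrationWorkerThreads=8
# Extend the connection retry timeout to 30 minutes
hivemigrator.connectionRetryTimeout=30
```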
Migration batch size
Only adjust this setting when directed by support.
Use the following parameter to set the limit of the number of objects that are sent in a single request to a target metastore:

```
hivemigrator.migrationBatchSize=1000
```

The default is 1000. You can change the property, for example, to increase the batch size so that migrations of tables with large schemas can process larger batches:
1. Open `/etc/wandisco/hivemigrator/application.properties` in a text editor.
2. Add the line `hivemigrator.migrationBatchSize=<integer>`, where `<integer>` is the maximum number of objects for each request.
3. Save the change.
4. Restart the Hive Migrator service. See System service commands - Hive Migrator.
AWS Glue Data Catalog allows a maximum of 100 objects for each request. Set `hivemigrator.migrationBatchSize` to `100` if you have AWS Glue as your target.
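For example, the `application.properties` entry for an AWS Glue target would look like this, matching the Glue limit noted above:

```properties
# AWS Glue Data Catalog accepts at most 100 objects per request
hivemigrator.migrationBatchSize=100
```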
The Hive Migrator batch size application property applies to all metadata migrations. To increase the batch size for individual metadata migrations in Data Migrator 3.0, use the `/migration/update` or `/migration` Hive Migrator API endpoints. Only adjust this setting when directed by support. See Access Hive Migrator API.
State
Hive Migrator uses an internally-managed database to store migrations and migration status information.
Name | Details |
---|---|
`hivemigrator.storagePath` | The Hive Migrator internal database location. Default value: `/opt/wandisco/hivemigrator/hivemigrator.db`. Allowed values: Any valid file path |
Backup configuration
Name | Details |
---|---|
`hivemigrator.backups.location` | The default location to store backup files. Default value: `/opt/wandisco/hivemigrator/backups`. Allowed values: A valid path |
`hivemigrator.backups.namePrefix` | The prefix for backup filenames. Default value: `hvmbackup`. Allowed values: Any filename string |
`hivemigrator.backups.listMaxSize` | The maximum number of backups returned by the `/backups` endpoint. Default value: `1000`. Allowed values: Positive integer |
Security
Secure access to the Hive Migrator REST API through configuration. Choose either no security or HTTP basic authentication. To configure secure access, see Configure basic auth.
Name | Details |
---|---|
`micronaut.security.enabled` | Enable or disable basic authentication. Default value: `false`. Allowed values: `true`, `false` |
`hivemigrator.username` | The Hive Migrator API basic authentication username. Default value: (none). Allowed values: Any string that defines a username (no whitespace) |
`hivemigrator.password` | The Hive Migrator API basic authentication password. Default value: (none). Allowed values: A bcrypt-generated password, encrypted using a bcrypt generator that produces a "2a" prefix at the beginning of the encrypted password |
`hivemigrator.integration.liveDataMigrator.username` | The basic authentication username for Data Migrator. Default value: (none). Allowed values: The Data Migrator basic authentication username |
`hivemigrator.integration.liveDataMigrator.password` | The basic authentication password for Data Migrator. Default value: (none). Allowed values: The Data Migrator basic authentication password encrypted with the encryptor tool |
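As an illustration, an `application.properties` fragment enabling basic authentication might look like the following; the username and the bcrypt hash are placeholders, not working values:

```properties
micronaut.security.enabled=true
hivemigrator.username=admin
# Placeholder bcrypt hash with the "2a" prefix; generate your own
hivemigrator.password=$2a$10$EXAMPLEHASHEXAMPLEHASHEXAMPLEHASH
```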
Connect to Data Migrator and Hive Migrator with basic authentication
To use basic authentication for Data Migrator and Hive Migrator, follow the steps in Configure basic auth.
Connect to Hive Migrator with basic authentication
Follow these steps if you used different credentials for Data Migrator and Hive Migrator, or if basic authentication isn't enabled on Data Migrator.
When basic authentication is enabled, enter the username and password when prompted to connect to Hive Migrator with the CLI:
```
connect hivemigrator localhost: trying to connect...
Username: admin
Password: ***********
Connected to hivemigrator v1.<VERSION-NUMBER> on http://localhost:6780.
```
The username and password are required for direct access to the Hive Migrator REST API.
If you enable basic authentication on Hive Migrator, ensure you update the UI with the credentials to maintain functionality.
Transport Layer Security
When you deploy a remote agent, Hive Migrator automatically generates certificates and establishes a transport layer security (TLS) connection to the agent. You can configure TLS for the Hive Migrator API, for integration with Data Migrator, and for any remote metastore agents.
TLS for Hive Migrator API
Due to a limitation of the Micronaut framework, the key you specify for the Hive Migrator API must be the first or only key contained within the keystore. Provide a keystore in which the key you require for the Hive Migrator API is the first or only entry.
To configure TLS for the Hive Migrator API, use the following properties in `/etc/wandisco/hivemigrator/application.properties`:
Name | Details |
---|---|
`micronaut.ssl.enabled` | Whether Hive Migrator uses TLS on its own API. Default value: `true`. Allowed values: `true`, `false` |
`micronaut.ssl.buildSelfSigned` | Whether Hive Migrator generates self-signed certificates. We recommend setting this to `false` when using a custom truststore and keystore. Default value: `true`. Allowed values: `true`, `false` |
`micronaut.ssl.port` | The TLS port used to access the Hive Migrator API. Default value: `6781`. Allowed values: Integer value of an available port |
`micronaut.ssl.key-store.path` | The path to the keystore for Hive Migrator, prefixed with "file:", for example: `file:/etc/keystore/keystore.jks`. Default value: (none). Allowed values: Any valid path to a keystore file with the "file:" prefix |
`micronaut.ssl.key-store.password` | The keystore password. Default value: (none). Allowed values: A valid password string |
`micronaut.ssl.key-store.type` | The keystore file type. Default value: (none). Allowed values: `PKCS12`, `JKS` |
`micronaut.ssl.key.alias` | The alias of the certificate in the keystore that Hive Migrator supplies to clients. Default value: (none). Allowed values: A valid alias string |
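A sketch of these TLS properties combined in `application.properties`, assuming a PKCS12 keystore; the path, password, and alias are placeholders:

```properties
micronaut.ssl.enabled=true
# Use the supplied keystore rather than a self-signed certificate
micronaut.ssl.buildSelfSigned=false
micronaut.ssl.port=6781
# Placeholder keystore path, password, and alias
micronaut.ssl.key-store.path=file:/etc/keystore/keystore.p12
micronaut.ssl.key-store.password=changeit
micronaut.ssl.key-store.type=PKCS12
micronaut.ssl.key.alias=hivemigrator
```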
TLS for remote agents
When you deploy a remote agent (for example, Azure SQL or AWS Glue), Hive Migrator establishes a Transport Layer Security (TLS) connection to the agent.
Certificates and keys are automatically generated for this connection for both Hive Migrator and the remote agent, and are placed in the following locations:
- `/etc/wandisco/hivemigrator/client-key.pem`
- `/etc/wandisco/hivemigrator/client-cert.pem`
- `/etc/wandisco/hivemigrator/ca-cert.pem`
- `/etc/wandisco/hivemigrator/ca-key.pem`
- `/etc/wandisco/hivemigrator/ca-cert.srl`
- `/etc/wandisco/hivemigrator-remote-server/server-key.pem`
- `/etc/wandisco/hivemigrator-remote-server/server-cert.pem`
- `/etc/wandisco/hivemigrator-remote-server/ca-cert.pem`
You can apply existing certificates for remote agents, upload existing certificates, or generate new certificates with the Hive Migrator REST API.
To update the UI with new TLS details, remove the product instance and then add it again with the updated TLS connection details.
Apply existing certificates for remote agents
Use the following steps to apply existing certificates for remote agents:
1. Generate self-signed certificates, assigning Hive Migrator as the client and the Hive Migrator remote agent as the server. Use the following file names: `ca-cert.pem`, `client-cert.pem`, `client-key.pem`, `server-cert.pem`, `server-key.pem`.
2. Copy `ca-cert.pem`, `client-cert.pem`, and `client-key.pem` to the Hive Migrator directory `/etc/wandisco/hivemigrator/`.
3. Copy `ca-cert.pem`, `server-cert.pem`, and `server-key.pem` to the Hive Migrator remote server directory `/etc/wandisco/hivemigrator-remote-server`.
4. Restart the service for the Hive Migrator remote server by running the command:

```
service hivemigrator-remote-server restart
```
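The certificate generation described above can be sketched with openssl. This is one possible approach, not a product requirement; the subject names and validity period here are placeholder assumptions:

```shell
# Create a CA key and self-signed CA certificate (placeholder subject)
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout ca-key.pem -out ca-cert.pem -subj "/CN=hvm-example-ca"

# Client key and certificate for Hive Migrator, signed by the CA
openssl req -newkey rsa:2048 -nodes \
  -keyout client-key.pem -out client.csr -subj "/CN=hivemigrator-client"
openssl x509 -req -in client.csr -CA ca-cert.pem -CAkey ca-key.pem \
  -CAcreateserial -days 365 -out client-cert.pem

# Server key and certificate for the remote agent, signed by the same CA
openssl req -newkey rsa:2048 -nodes \
  -keyout server-key.pem -out server.csr -subj "/CN=hivemigrator-remote-agent"
openssl x509 -req -in server.csr -CA ca-cert.pem -CAkey ca-key.pem \
  -CAcreateserial -days 365 -out server-cert.pem
```

The `-CAcreateserial` flag also produces the `ca-cert.srl` serial file listed in the Hive Migrator directory above.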
Upload existing certificates
Ensure the correct certificates and keys are uploaded for Hive Migrator and all remote agents that are connected.
Existing connections will break if the trust relationship isn't established between Hive Migrator and remote agents.
Upload certificates and keys by using the following Hive Migrator REST API endpoints:
```
POST /config/certificates/upload
POST /agents/{name}/certificates/upload
```
The remote agent service restarts automatically when new certificates are uploaded this way. The Hive Migrator service doesn't require a restart to start using new certificates.
Generate new certificates with the Hive Migrator REST API
Generate new certificates for Hive Migrator and all remote agents that are connected.
Generating certificates for just one of these components breaks existing connections.
Generate new certificates and keys by using the following Hive Migrator REST API endpoints:
```
POST /config/certificates/generate
POST /agents/{name}/certificates/generate
```
The remote agent service automatically restarts when you generate new certificates this way. You don't need to restart the Hive Migrator service to use the new certificates.
Configure TLS for integration with Data Migrator
Name | Details |
---|---|
`hivemigrator.integration.liveDataMigrator.useSsl` | Whether Hive Migrator uses TLS for Data Migrator. Default value: `false`. Allowed values: `true`, `false` |
`hivemigrator.integration.liveDataMigrator.trust-store.path` | The truststore used to determine whether Hive Migrator trusts the certificate obtained from Data Migrator. Default value: `/etc/wandisco/hivemigrator/tls/keystore.p12`. Allowed values: A valid path to a truststore |
`hivemigrator.integration.liveDataMigrator.trust-store.password` | The password for the truststore defined in the `trust-store.path` parameter above. Default value: (none). Allowed values: Truststore password string |
`hivemigrator.integration.liveDataMigrator.trust-store.type` | The file type of the truststore defined in the `trust-store.path` parameter above. Default value: (none). Allowed values: `PKCS12`, `JKS` |
`hivemigrator.integration.liveDataMigrator.host` | The Data Migrator host. Default value: `localhost`. Allowed values: A valid Data Migrator hostname or IP |
`hivemigrator.integration.liveDataMigrator.port` | The Data Migrator port. Default value: `18080`. Allowed values: A valid Data Migrator port |
`hivemigrator.integration.liveDataMigrator.connectionMaxRetries` | The number of retries for the Data Migrator connection. Default value: `5`. Allowed values: An integer value |
`hivemigrator.integration.liveDataMigrator.connectionRetryDelay` | The delay in milliseconds between Data Migrator retries. Default value: `5000`. Allowed values: An integer value |
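A sketch of these properties combined in `application.properties`; the truststore password is a placeholder and the other values are the defaults from the table:

```properties
hivemigrator.integration.liveDataMigrator.useSsl=true
# Placeholder truststore password; path and type assume the default PKCS12 store
hivemigrator.integration.liveDataMigrator.trust-store.path=/etc/wandisco/hivemigrator/tls/keystore.p12
hivemigrator.integration.liveDataMigrator.trust-store.password=changeit
hivemigrator.integration.liveDataMigrator.trust-store.type=PKCS12
hivemigrator.integration.liveDataMigrator.host=localhost
hivemigrator.integration.liveDataMigrator.port=18080
```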
Talkback
Change the following properties to alter how the talkback API outputs data:
Name | Details |
---|---|
talkback.apidata.enabled | Enables an hourly process to log Hive Migrator configuration from the output of REST endpoints. Default value: true |
talkback.apidata.backupsCount | Specifies the maximum number of backups that the talkback process gets from the REST endpoint. Default value: 50 |
Location mapping
Use `locationmapping` application properties to specify additional table location properties that Hive Migrator maps to the target location during migration.
By default, no additional location mapping is applied.
To adjust the default location mapping, uncomment the existing properties and supply comma-separated table property names as values.
For example, to map the location value of a SerDe location property named `path`, add the table property as a value for `hivemigrator.locationmapping.keys.serde` and restart the Hive Migrator service for the change to take effect.
Mapping applies to all tables with the specified property names, regardless of whether the property holds a valid source location.
Any source value with a mapping applied to the property is mapped to the target filesystem by replacing the first element after the two forward slashes ("//"). For example, a source value of `todo://our_placeholder/data/` equates to a target value of `<target_filesystem>/data`.
Additionally, if a mapped source value contains no forward slash ("/"), the target filesystem is prepended to the value. For example, a source value of `example_value` equates to a target value of `<target_filesystem>/example_value`.
Only adjust or uncomment these properties when directly instructed by Support.
Name | Details |
---|---|
`hivemigrator.locationmapping.keys.serde` | Map the values of the SerDe table location properties supplied to this property to the target location during migration. Default value: none |
`hivemigrator.locationmapping.keys.tableproperties` | Map the table location properties supplied to this property to the target location during migration. Default value: none |
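Continuing the SerDe example above, the uncommented property in `application.properties` would look like this; `path` is the example property name, not a required value:

```properties
# Map the SerDe "path" location property to the target location during migration
hivemigrator.locationmapping.keys.serde=path
```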
Iceberg agent historical metadata retention
The following properties allow the adjustment of Iceberg agent behaviour.
Name | Details |
---|---|
iceberg.write.metadata.previous-versions-max | Limits the number of previous metadata versions to retain on Iceberg agents. Default value: 200 |
Don't adjust this value unless explicitly directed. See the current recommended value for historical metadata retention.
Secrets store properties
HashiCorp Vault
Add the following properties to integrate with HashiCorp Vault. See the HashiCorp Vault configuration section for steps and examples for each authentication type.
The following properties can't be referenced using application property references. If you are integrating with a HashiCorp Vault server, ensure these properties and values are set in your `bootstrap.properties` file.
Name | Details |
---|---|
`spring.cloud.vault.enabled` | Determines whether the Vault integration is enabled. Accepted values: `true` or `false`. Default value: none |
`spring.cloud.vault.uri` | Specifies the URI (including protocol, host, and port) of the HashiCorp Vault server. For example: `http://127.0.0.1:8200` or `https://127.0.0.1:8222`. Default value: none |
`spring.cloud.vault.authentication` | Specifies the authentication method for Data Migrator to use when connecting to HashiCorp Vault. Use either `TOKEN` or `APPROLE`. Default value: none |
`spring.cloud.vault.token` | Specifies the authentication token used to authenticate with HashiCorp Vault. Default value: none |
`spring.config.import` | Specifies comma-separated Vault location sources of key-value secrets used for application properties. See the reference format. Default value: none |
`spring.cloud.vault.app-role.role-id` | Specifies the role ID for the AppRole authentication method when connecting to HashiCorp Vault. Default value: none |
`spring.cloud.vault.app-role.secret-id` | Specifies the secret ID for the AppRole authentication method when connecting to HashiCorp Vault. The secret ID, along with the role ID, is used to authenticate with Vault when `APPROLE` authentication is used. Default value: none |
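A sketch of a `bootstrap.properties` fragment for `TOKEN` authentication; the URI, token, and secret path are placeholders for your own Vault details:

```properties
spring.cloud.vault.enabled=true
spring.cloud.vault.uri=https://127.0.0.1:8200
spring.cloud.vault.authentication=TOKEN
# Placeholder token and secret location
spring.cloud.vault.token=<vault-token>
spring.config.import=vault://secret/hivemigrator
```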
When disabling HashiCorp Vault integration, set `spring.cloud.vault.enabled` to `false` and ensure no references are in use, including any used with the `spring.config.import` property.
Comment out or remove any reference values from the `spring.config.import` property and restart the Hive Migrator service for the changes to take effect.
Databricks
Databricks concurrent thread configuration
Add the following properties to `/etc/wandisco/hivemigrator/application.properties` to optimize migration performance by controlling concurrency and thread utilization.
After adding new properties or changing existing values, restart the Hive Migrator service for the changes to take effect.
See Databricks concurrent thread configuration for more information.
Name | Details |
---|---|
hivemigrator.databricks.threadcount | Specifies the total number of threads to allocate across all Databricks migrations. Default value: 10 |
hivemigrator.databricks.tablethreadcount | Specifies the number of threads to allocate per Databricks migration. Default value: 10 |
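For illustration, an `application.properties` fragment raising the global thread pool while capping the threads each migration can use; the values are examples, not recommendations:

```properties
# Allow 20 threads in total across all Databricks migrations
hivemigrator.databricks.threadcount=20
# Cap each individual Databricks migration at 5 threads
hivemigrator.databricks.tablethreadcount=5
```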
Hive Migrator logging
You can find configuration for the Hive Migrator log in `/etc/wandisco/hivemigrator/log4j2.yaml`.
Hive Migrator debug logging
The following steps enable debug-level logging for the application and additional logging for the Java virtual machine (JVM).
Enable application debug logging
Use the following steps to enable debug level logging for Hive Migrator:
1. Open `/etc/wandisco/hivemigrator/log4j2.yaml`.
2. Edit the `log.level` property value from `info` to `debug`:

```
- name: log.level
  value: "debug"
```

3. Save the changes.
4. Restart the Hive Migrator service. See System service commands - Hive Migrator.
After you've completed your investigation, reverse the changes. The debug mode generates massive log files that may exhaust your available storage.
Enable Java connection debug logging
Use the following steps to investigate authentication or authorization problems when using an HDFS source or target:
1. Open `/etc/wandisco/hivemigrator/vars.sh`.
2. Add the following JVM argument:

```
HVM_EXTRA_JVM_ARGS="-Dsun.security.krb5.debug=true -Dsun.security.jgss.debug=true -Dsun.security.spnego.debug=true"
```

3. Add the following log location parameter:

```
LOG_OUT_FILE=/var/log/wandisco/hivemigrator/hivemigrator.out
```

4. Save the changes.
5. Restart the Hive Migrator service. See System service commands - Hive Migrator.
After you've completed your investigation, reverse the changes. The debug mode generates massive log files that may exhaust your available storage.
Learn more about adding additional JVM arguments in the knowledge base.
Change Hive Migrator log and heap dump location
The default path for Hive Migrator logs is `/var/log/wandisco/hivemigrator`.
You can change the default path by editing the `/etc/wandisco/hivemigrator/log4j2.yaml` file.
Before making changes, ensure the path exists and is writable by Hive Migrator.
1. Open `/etc/wandisco/hivemigrator/log4j2.yaml`.
2. Edit the `log.dir` property value with your new path:

```
- name: log.dir
  value: "/var/log/newlocation/hivemigrator"
```

3. Save the changes.
4. Restart the Hive Migrator service. See System service commands - Hive Migrator.
Changes to the default log location won't persist after upgrade. After upgrading to a new version, you will have to reapply these changes.
Change Hive Migrator audit log location
The default path for Hive Migrator audit logs is `/var/log/wandisco/audit/hivemigrator`.
You can change the default path by editing the `/etc/wandisco/hivemigrator/log4j2.yaml` file.
Before making changes, ensure the path exists and is writable by Hive Migrator.
1. Open `/etc/wandisco/hivemigrator/log4j2.yaml`.
2. Edit the `audit.dir` property value with your new path:

```
- name: audit.dir
  value: "/var/log/wandisco/audit/hivemigrator"
```

3. Save the changes.
4. Restart the Hive Migrator service. See System service commands - Hive Migrator.
Changes to the default log location won't persist after upgrade. After upgrading to a new version, you will have to reapply these changes.
Directory structure
The following directories are used for Hive Migrator:
Location | Content |
---|---|
/var/log/wandisco/hivemigrator | Logs |
/etc/wandisco/hivemigrator | Configuration files |
/opt/wandisco/hivemigrator | Java archive files |
/var/run/hivemigrator | Runtime files |
Remote servers
The following directories are used for Hive Migrator remote servers (remote agents):
Location | Content |
---|---|
/var/log/wandisco/hivemigrator-remote-server | Logs |
/etc/wandisco/hivemigrator-remote-server | Configuration files |
/opt/wandisco/hivemigrator-remote-server | Java archive files |
/var/run/hivemigrator-remote-server | Runtime files |