Configure Data Migrator
Find details here for the configuration properties of Data Migrator. Properties are defined in the following files:
/etc/wandisco/livedata-migrator/application.properties
/etc/wandisco/livedata-migrator/logback-spring.xml
/etc/wandisco/livedata-migrator/vars.env
/etc/wandisco/ui/vars.env
After adding new properties or changing existing values, always restart the Data Migrator service.
For information about UI logging, see Logging.
Application properties
General configuration
Use these properties to adjust the general operation of Data Migrator.
Name | Details |
---|---|
spring.jackson.serialization.INDENT_OUTPUT | Whether to apply indentation to JSON output from command results. Default value: true Allowed values: true, false |
springdoc.swagger-ui.path | The path at which clients can access the Swagger documentation for the Data Migrator REST API Default value: /ldm-api.html Allowed values: Any valid file path |
pull.threads | Threads allocated for concurrent file transfers Default value: 150 Allowed values: An integer value between 1 and 10000 |
engine.threads | The number of migrations that can run concurrently Default value: 1000 Allowed values: An integer value between 1 and 10000 Note: Increasing this value may have undesired impact on performance. Contact Support for more info. |
persisted.store | Reserved for future use Default value: true |
server.port | The TCP port used to listen for clients interacting with the REST API Default value: 18080 Allowed values: An integer value between 1024 and 65535 |
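The allowed ranges in the table above can be checked before restarting the service. The following is a minimal sketch, not a tool shipped with Data Migrator: the function names and the sample text are hypothetical, and only the three integer properties with documented ranges are checked.

```python
# Allowed integer ranges taken from the table above.
RANGES = {
    "pull.threads": (1, 10000),
    "engine.threads": (1, 10000),
    "server.port": (1024, 65535),
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def range_errors(props):
    """Return a message for each known property outside its allowed range."""
    errors = []
    for name, (low, high) in RANGES.items():
        if name not in props:
            continue  # property not set, so the default applies
        try:
            value = int(props[name])
        except ValueError:
            errors.append(f"{name}: '{props[name]}' is not an integer")
            continue
        if not low <= value <= high:
            errors.append(f"{name}: {value} is outside {low}-{high}")
    return errors


# Hypothetical excerpt of application.properties.
sample = """
pull.threads=150
server.port=80
"""
print(range_errors(parse_properties(sample)))
```

To check a real file, read /etc/wandisco/livedata-migrator/application.properties and pass its text to parse_properties.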
Data Migrator logging
Configure how Data Migrator logs requests made against the REST API.
Name | Details |
---|---|
logging.level.org.zalando.logbook | The logging level to apply to HTTP activity against the REST API of Data Migrator. This must be set to TRACE to record any log information. Default value: TRACE Allowed values: TRACE, NONE |
logbook.format.style | The logging style applied to HTTP activity records. Default value: http Allowed values: http, curl |
logbook.write.max-body-size | The maximum number of bytes from an HTTP request or response body to include in a log entry Default value: 1024 Allowed values: Any integer between 1 and 2147483647 |
logbook.exclude | A comma-separated list of patterns that match URIs for which HTTP activity should not be logged Default value: /v3/api-docs/*,/swagger-ui/,/stats/,/diagnostics/,/notifications/*,/openapi.yaml Allowed values: Any valid comma-separated list of patterns |
logbook.obfuscate.parameters | A comma-separated list of HTTP parameters that should not be recorded in log entries, for example: access_token,password Default value: (none) Allowed values: Any valid comma-separated list of HTTP parameter names |
logbook.obfuscate.headers | A comma-separated list of HTTP headers that should not be recorded in log entries, for example: authorization,x-auth-password,x-auth-token,X-Secret Default value: (none) Allowed values: Any valid comma-separated list of HTTP headers |
obfuscate.json.properties | A comma-separated list of JSON request properties by name that should not be recorded in log entries, for example: foo,bar Default value: ${hdfs.fs.type.masked.properties},${adls2.fs.type.masked.properties},${s3a.fs.type.masked.properties},${gcs.fs.type.masked.properties} Allowed values: Any valid comma-separated list of property names |
Set how many days to keep the Data Migrator log file
- Open /etc/wandisco/livedata-migrator/logback-spring.xml.
- Find the appender block for the livedata-migrator.log file, which contains <file>${LOGS}/livedata-migrator.log</file>.
- Update the value in <maxHistory>90</maxHistory> to your preferred number of days.
- Save the change.
- Restart Data Migrator.
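The edit in the steps above can also be scripted. This is a rough sketch that works on the file contents as text and changes only the appender that writes livedata-migrator.log. The function name and the sample appender layout (names, class attribute, second appender) are hypothetical, so compare them with your own logback-spring.xml before relying on it.

```python
import re

# Marker that identifies the appender for livedata-migrator.log.
LOG_FILE_TAG = "<file>${LOGS}/livedata-migrator.log</file>"


def set_max_history(xml_text, days):
    """Set <maxHistory> only inside the appender that writes livedata-migrator.log."""
    if days < 1:
        raise ValueError("days must be a positive integer")

    def update_block(match):
        block = match.group(0)
        if LOG_FILE_TAG not in block:
            return block  # leave other appenders untouched
        return re.sub(r"<maxHistory>\s*\d+\s*</maxHistory>",
                      f"<maxHistory>{days}</maxHistory>", block)

    # Visit each <appender ...>...</appender> block in turn.
    return re.sub(r"<appender\s.*?</appender>", update_block, xml_text, flags=re.S)


# Hypothetical, simplified file contents for illustration only.
sample = """<appender name="RollingFile" class="example">
  <file>${LOGS}/livedata-migrator.log</file>
  <maxHistory>90</maxHistory>
</appender>
<appender name="Other" class="example">
  <file>${LOGS}/other.log</file>
  <maxHistory>90</maxHistory>
</appender>"""
print(set_max_history(sample, 30))
```

Working on text rather than a parsed XML tree keeps comments and formatting in the file unchanged.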
Enable Data Migrator debugging mode
- Open /etc/wandisco/livedata-migrator/logback-spring.xml.
- Find the INFO level section:
<!-- LOG everything at INFO level -->
<root level="info">
<appender-ref ref="RollingFile"/>
<!-- <appender-ref ref="Console" /> -->
</root>
</configuration>
- Change <root level="info"> to <root level="debug">.
- Save the change.
After you've completed your investigation, reverse the changes. Debug mode generates very large log files that may exhaust your available storage.
Change Data Migrator log location
The default path for Data Migrator logs, heap dumps, and garbage collection (GC) logs is /var/log/wandisco/livedata-migrator.
You can change the default path by editing the /opt/wandisco/livedata-migrator/start.sh script.
Before making changes, ensure the path exists and is writable by Data Migrator.
- Open /opt/wandisco/livedata-migrator/start.sh.
- Update the values of -Dlog.dir, -XX:HeapDumpPath, and -Xloggc to change the default paths for the Data Migrator logs, heap dumps, and GC logs.
- Save the change.
- Restart Data Migrator.
Changes to the default log location won't persist after upgrade. After upgrading to a new version, you will have to reapply these changes.
Thread dump
Add the following properties to the /etc/wandisco/livedata-migrator/application.properties file to adjust the default thread dump period and the number of dumps to retain.
If you don't specify these properties, the default values apply: one thread dump every 3600 seconds with 24 dumps retained, which keeps 24 hours of thread dumps available.
Name | Details |
---|---|
threaddump.period | Thread dump period in seconds. Default value: 3600 Allowed values: Integer |
threaddump.number.files | The number of thread dumps to retain. Default value: 24 Allowed values: Integer |
To retain 7 days of thread dumps, add both properties and set the threaddump.number.files value to 168 (7 days of 24 hourly dumps at the default period).
Ensure your Data Migrator server has sufficient disk space available for additional thread dumps. Contact Support with any questions or concerns before applying these properties.
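The retention arithmetic generalizes to other periods. The sketch below assumes, as the defaults imply, that the retained window is the dump period multiplied by the number of files kept. The function name is hypothetical.

```python
import math


def dumps_to_retain(retention_days, period_seconds=3600):
    """Number of thread dump files needed to cover retention_days."""
    if retention_days <= 0 or period_seconds <= 0:
        raise ValueError("both arguments must be positive")
    seconds_to_cover = retention_days * 24 * 60 * 60
    return math.ceil(seconds_to_cover / period_seconds)


print(dumps_to_retain(1))  # 24, the default threaddump.number.files
print(dumps_to_retain(7))  # 168, the value used for 7 days
```

If you shorten threaddump.period, the same retention window needs proportionally more files and more disk space.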
State
Data Migrator uses an internally managed database, called Prevayler, to record state during operation.
Name | Details |
---|---|
prevayler.databaseLocation | The directory in which Data Migrator will write files to manage state. Default value: ${install.dir}/db Allowed values: The full path to a directory in which database files will be managed. It must be writable by the user running Data Migrator (typically hdfs). |
prevayler.persistent | Whether Data Migrator will persist state to disk in files. Default value: true Allowed values: true, false |
prevayler.force | Whether the database performs a sync operation to ensure content is written to persistent storage on each write activity. Default value: true Allowed values: true, false |
prevayler.bufferedJournal | Whether buffered journal I/O is used for the database. Default value: true Allowed values: true, false |
prevayler.mirrored | Whether actions tracked in-memory by the database are mirrored to disk on every modification. The alternative is to flush to disk periodically and on shutdown. Default value: true Allowed values: true, false |
Security
Secure access to the Data Migrator REST API through configuration. Choose between no security or HTTP basic security. To set up, see Configure basic auth.
Name | Details |
---|---|
security.type | The method of securing access to the REST API. Default value: off Allowed values: off, basic |
security.basic.user | Required when security.type=basic. The username that needs to be provided by a REST client to gain access to a secured REST API, for example: admin. If basic authentication is enabled or will be enabled on the Hive Migrator REST API, use the same username for Data Migrator and Hive Migrator. Default value: admin Allowed values: Any string that defines a username (no whitespace) |
security.basic.password | Required when security.type=basic. A bcrypt-encrypted representation of the password that needs to be provided using HTTP basic authentication to access the REST API, for example: {bcrypt}$2a$10$mQXFoGAdLryWcZLjSP31Q.5cSgtoCPO3ernnsK4F6/gva8lyn1qgu The {bcrypt} prefix must be included before the encrypted password string as shown in the example above. Default value: {bcrypt}exampleEncryptedValue Allowed values: A valid bcrypt-encrypted string |
If you enable basic authentication on Data Migrator, ensure you update the UI with the credentials to maintain functionality.
Connect to Data Migrator with basic authentication
When basic authentication is enabled, enter the username and password when prompted to connect to Data Migrator with the CLI:
connect livemigrator localhost: trying to connect...
Username: admin
Password: ********
connected
The username and password are also required when accessing the Data Migrator REST API directly. See more about basic auth.
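For direct REST access, a client sends the credentials in a standard HTTP Basic Authorization header. The following is a minimal sketch using only the Python standard library. The URL combines the default server.port and springdoc.swagger-ui.path values from this page purely as an illustration, the password is a placeholder, and whether that particular path requires authentication is an assumption.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


# Hypothetical values: replace with your own host, port, path, and credentials.
url = "http://localhost:18080/ldm-api.html"
request = urllib.request.Request(url)
request.add_header("Authorization", basic_auth_header("admin", "example-password"))

# Sending the request needs a running Data Migrator instance:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
print(request.get_header("Authorization"))
```

The client sends the plain password, while security.basic.password in application.properties stores only its bcrypt representation.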
Transport Layer Security
To enable Transport Layer Security (TLS) on the Data Migrator REST API (HTTPS), modify the following server.ssl.* properties.
To update the UI with details for TLS, remove the product instance and then add it again with the updated TLS connection details.
Data Migrator comes preconfigured with an instance running on localhost. To remove the preconfigured instance, see the Remove Data Migrators steps.
If HTTPS is enabled on the REST API, plain HTTP requests from the CLI to the REST API will fail with the following error:
Bad Request
This combination of host and port requires TLS.
Name | Details |
---|---|
server.ssl.key-store | Path or classpath to the Java KeyStore. Default value: (none) Allowed values: File system path or classpath (example: /path/to/keystore.p12, classpath:keystore.p12) |
server.ssl.key-store-password | The Java KeyStore password. Default value: (none) Allowed values: Any text string |
server.ssl.key-store-type | The Java KeyStore type. Default value: (none) Allowed values: KeyStore types |
server.ssl.key-alias | The alias for the server certificate entry. Default value: (none) Allowed values: Any text string. |
server.ssl.ciphers | The cipher suites to enable. Restricting the list strengthens security by deactivating old and deprecated SSL ciphers. The list below was tested against SSL Labs. Default value: (none, but the following list is provided) TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_DHE_RSA_WITH_AES_128_GCM_SHA256,TLS_DHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384,TLS_DHE_RSA_WITH_AES_128_CBC_SHA256,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA256,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_RSA_WITH_AES_256_CBC_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_CAMELLIA_256_CBC_SHA,TLS_RSA_WITH_CAMELLIA_256_CBC_SHA,TLS_DHE_RSA_WITH_CAMELLIA_128_CBC_SHA,TLS_RSA_WITH_CAMELLIA_128_CBC_SHA |
The example command below generates a server certificate and places it inside a new Java KeyStore named keystore.p12:
keytool -genkey -alias livedata-migrator -storetype PKCS12 -keyalg RSA -keysize 2048 -keystore keystore.p12 -validity 365
See the keytool documentation for further information on the parameters used.
Filesystem defaults
Each filesystem supported by Data Migrator can apply properties defined using the --properties or --properties-files parameters to the various filesystem add commands. You can set default properties that will apply to each type of filesystem at time of creation with these configuration items.
Name | Details |
---|---|
adls2.fs.type.default.properties | A comma-separated list of default properties to apply to ADLS Gen 2 filesystem resources on creation. Default value: fs.scheme,fs.account.name,fs.container.name,fs.auth.type,fs.oauth2.client.id,fs.insecure Allowed values: Any comma-separated list of valid ADLS Gen 2 configuration properties |
hdfs.fs.type.default.properties | A comma-separated list of default properties to apply to HDFS filesystem resources on creation. Default value: fs.defaultFS Allowed values: Any comma-separated list of valid HDFS configuration properties |
s3a.fs.type.default.properties | A comma-separated list of default properties to apply to S3A filesystem resources on creation. Default value: fs.defaultFS Allowed values: Any comma-separated list of valid S3A configuration properties |
gcs.fs.type.default.properties | A comma-separated list of default properties to apply to GCS resources on creation. Default value: bucket.name Allowed values: Any comma-separated list of valid GCS configuration properties |
local.fs.type.default.properties | A comma-separated list of default properties to apply to local filesystem resources on creation. Default value: fs.root Allowed values: Any comma-separated list of valid configuration properties |
Hadoop Distributed File System inotify
Data Migrator will poll the Hadoop cluster for NameNode events using the Hadoop Distributed File System (HDFS) inotify system. These properties can be added and configured to change the default poll periods.
Name | Details |
---|---|
hdfs.inotify.poll.period | The length of time in milliseconds between each event listener poll. Default value: 10 Allowed values: An integer value |
hdfs.inotify.sleep.period | The length of time in milliseconds for delaying the event listener poll after 10 consecutive retry failures. Default value: 10 Allowed values: An integer value |
HDFS marker storage
Data Migrator uses marker files to manage the migration of files on paths. By default, these are stored in the HDFS user's home directory if possible. If this is not possible, they will be stored in the root directory of the migration on the source filesystem. To configure another directory to store marker files in, alter the following property:
Name | Details |
---|---|
hdfs.fs.marker.dir | The directory in which marker files are stored. Default value: (user home directory if not set) Allowed values: The full path to a directory on the source filesystem in which Data Migrator's marker files, used to store migration information, will be stored and managed. It must be writable by the user running Data Migrator (typically hdfs). |
ADLS Gen2 target metadata handling properties
The following properties control ACL, permission, and owner metadata operations for ADLS Gen2 targets. If not specified, default values are used. See Access control model in Azure Data Lake Storage Gen2 for more information on Azure access control.
Property | Default | Description |
---|---|---|
adls2.fs.metadata.acl.ignore | false | When set to true, Data Migrator will not attempt to perform any setAcls operation against an ADLS Gen2 target. |
adls2.fs.metadata.perms.ignore | false | When set to true, Data Migrator will not attempt to perform any setPermission operation against an ADLS Gen2 target. This will also affect the health check for the target file system, which will not present an error if the principal under which Data Migrator operates is unable to perform setPermission operations, and will also allow a migration to start that would be prevented from doing so by an inability to perform setPermission operations. |
adls2.fs.metadata.owner.ignore | false | When set to true, Data Migrator will not attempt to perform any setOwner operations against an ADLS Gen2 target. |
Proxy Auto-Config (PAC) file support
Data Migrator allows the use of a PAC file so that traffic can be routed through HTTP proxies (examples of PAC files).
Name | Details |
---|---|
lm.proxy.pac | Path to the PAC file on the local filesystem. Default value: (none) Allowed values: A path that includes the file URI prefix (example: file:///tmp/proxy.pac ). |
PAC files for Data Migrator must contain an explicit clause that returns "DIRECT" for "localhost".
function FindProxyForURL(url, host) {
  // Connect directly, without a proxy, for localhost.
  if (dnsDomainIs(host, "localhost"))
    return "DIRECT";
}
Notification properties
Adjust notification properties in the application.properties file:
/etc/wandisco/livedata-migrator/application.properties
Name | Details |
---|---|
notifications.pending.region.warn.percent | The percentage threshold of events Data Migrator has fallen behind that will trigger a warning. A migration that exceeds this threshold will trigger the Data migration is falling behind system events notification. Default value: 90 |
notifications.pending.region.clear.percent | The percentage threshold of events Data Migrator has fallen behind that will clear the Data migration is falling behind system events notification. A migration that exceeds the notifications.pending.region.warn.percent threshold and then falls below this value will automatically clear its Data migration is falling behind system events notification. Default value: 80 |
fs.health.check.initialDelay, fs.health.check.interval | Filesystem health checks. The frequency, in milliseconds, at which the system checks for an unhealthy filesystem. Default value: 60000 |
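The warn and clear thresholds work together as a hysteresis band: a notification raised above the warn threshold stays active until the value drops below the clear threshold. The sketch below only illustrates that behavior as described in the table. It is not Data Migrator's implementation, and the function name and sample readings are hypothetical.

```python
def update_notification(active, pending_percent, warn=90, clear=80):
    """Return whether the 'falling behind' notification is active after a new reading."""
    if not active and pending_percent > warn:
        return True   # warn threshold exceeded: raise the notification
    if active and pending_percent < clear:
        return False  # dropped below the clear threshold: clear it
    return active     # otherwise keep the current state


state = False
history = []
for reading in [50, 91, 85, 79, 85]:
    state = update_notification(state, reading)
    history.append(state)
print(history)  # [False, True, True, False, False]
```

Keeping the clear threshold below the warn threshold stops the notification from toggling repeatedly when the value hovers near a single limit.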
Verifications
The following application properties are optional. They aren't included in the application properties file by default. If you want to use them, add them to the file manually.
Name | Details |
---|---|
verify.runner.threads | This property limits the number of threads used for verifications. Each verification runs on its own thread. Verifications that exceed this number are queued until threads become available. Default value: 10 |
verify.lister.threads | This property controls how many threads are available for making calls during verifications. These calls list the contents of directories on the source and target filesystems and are parallelized in separate threads. A good value is twice the value of verify.runner.threads, so that each runner thread always has two lister threads available. Default value: 20 |
verifications.location | Add this property with a path value to change the default location of verification reports from /opt/wandisco/livedata-migrator/db/verifications to the submitted value. Ensure the Data Migrator user has permissions to write to this location. |
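The two-to-one guidance for lister and runner threads can be applied mechanically when you change verify.runner.threads. This is a small sketch with a hypothetical helper name that prints the lines to add to application.properties.

```python
def verification_thread_properties(runner_threads=10):
    """Return property lines with two lister threads per runner thread."""
    if runner_threads < 1:
        raise ValueError("runner_threads must be at least 1")
    return [
        f"verify.runner.threads={runner_threads}",
        f"verify.lister.threads={2 * runner_threads}",
    ]


print("\n".join(verification_thread_properties(16)))
```

Called with no argument, the helper reproduces the default values of 10 and 20.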
gRPC maximum message size
Name | Details |
---|---|
dataagent.grpc.diagnostics.max.message.size.kb | The maximum message size in kilobytes that Data Migrator can send or receive for diagnostic gRPC messages. Don't set the value to less than the default. Default value: 4096 |
dataagent.grpc.transfer.max.message.size.kb | The maximum message size in kilobytes that Data Migrator can send or receive for gRPC data transfer messages. Don't set the value to less than the default. Default value: 20 |
SSL implementation
The SSL implementation is specified with the hcfs.ssl.channel.mode property.
Supported values are: default_jsse, default_jsse_with_gcm, default, and openssl.
default_jsse uses the Java Secure Socket Extension package (JSSE). However, when running on Java 8, the Galois/Counter Mode (GCM) cipher is removed from the list of enabled ciphers. This is due to performance issues with GCM in Java 8.
default_jsse_with_gcm uses the JSSE with the default list of cipher suites.
default attempts to use OpenSSL rather than the JSSE for SSL encryption. If OpenSSL libraries cannot be loaded, it falls back to the default_jsse behavior.
openssl attempts to use OpenSSL, but fails if OpenSSL libraries cannot be loaded.
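The fallback rules for the four modes can be summarized as a small decision function. This is only a model of the behavior described above, not the product's code. The function name, its parameters, and the returned tuple are hypothetical, and treating default_jsse_with_gcm as keeping GCM enabled is an assumption based on its name.

```python
def resolve_ssl_implementation(mode, openssl_available, java_major_version):
    """Return (implementation, gcm_enabled) for a channel mode."""
    if mode == "default_jsse":
        # GCM is removed from the enabled ciphers when running on Java 8.
        return ("JSSE", java_major_version != 8)
    if mode == "default_jsse_with_gcm":
        return ("JSSE", True)
    if mode == "default":
        if openssl_available:
            return ("OpenSSL", None)  # cipher handling is left to OpenSSL
        # OpenSSL could not be loaded: fall back to default_jsse behavior.
        return resolve_ssl_implementation("default_jsse", False, java_major_version)
    if mode == "openssl":
        if not openssl_available:
            raise RuntimeError("OpenSSL libraries cannot be loaded")
        return ("OpenSSL", None)
    raise ValueError(f"unsupported mode: {mode}")


print(resolve_ssl_implementation("default", False, 8))
```

The practical difference between default and openssl is the failure mode: default degrades to JSSE, while openssl stops with an error.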
Name | Details |
---|---|
hcfs.ssl.channel.mode | Specifies the SSL implementation used to encrypt connections. Default value: default Allowed values: default_jsse, default_jsse_with_gcm, default, openssl |
As of Data Migrator 2.1.1, hcfs.ssl.channel.mode replaces fs.s3a.ssl.channel.mode and fs.azure.ssl.channel.mode, which are no longer valid.
Talkback
Change the following properties to alter how the talkback API outputs data:
Name | Details |
---|---|
talkback.apidata.enabled | Enables an hourly process to log Data Migrator configuration from the output of REST endpoints. Default value: true |
talkback.apidata.diagnosticsCount | Specifies the maximum number of diagnostic records that the talkback process logs from the REST endpoint. Default value: 500 |
talkback.apidata.backupsCount | Specifies the maximum number of backups that the talkback process gets from the REST endpoint. Default value: 50 |
Data transfer manager memory allocation
Add the following property with an integer value between 10 and 75 to limit the percentage of memory used while transferring data.
Name | Details |
---|---|
transfer.manager.heapBasedPercentage | Limits memory usage when transferring data to the percent value supplied. Default value: 50 |
Additional Java arguments
Data Migrator
/etc/wandisco/livedata-migrator/vars.env
Name | Details |
---|---|
JVM_MAX_MEM | Specifies the maximum size of the heap that can be used by the Java virtual machine (JVM). Use k, m, or g (case insensitive) for KiB, MiB, or GiB. |
JVM_MIN_MEM | Specifies the minimum size of the heap that can be used by the Java virtual machine (JVM). Use k, m, or g (case insensitive) for KiB, MiB, or GiB. |
JVM_GC | Specifies the type of garbage collector. |
JVM_G1HeapRegionSize | Specifies the heap region size. The minimum is 1 MB and the maximum is 32 MB, and the value must be a power of 2. |
JVM_GC_LOG_NAME | Specifies the GC log name. |
JVM_GC_LOG_SIZE | Specifies the max GC log size. |
JVM_GC_NUM_LOGS | Specifies the max number of GC logs. |
HTTP_MAX_CONNECTIONS | Specifies the maximum number of keep-alive connections that are maintained. The possible range is zero to 32768. |
LDM_EXTRA_JVM_ARGS | Add extra Java arguments (for example, LDM_EXTRA_JVM_ARGS="-Djava.security.properties=\"/etc/wandisco/jvm/jvm.override.security\""). |
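The constraint on JVM_G1HeapRegionSize leaves only six valid sizes. The sketch below checks a candidate value in megabytes. The function name is hypothetical, and the exact format that vars.env expects for the value is not covered here.

```python
def is_valid_g1_region_size_mb(size_mb):
    """True if size_mb is a power of 2 between 1 and 32 inclusive."""
    return (
        isinstance(size_mb, int)
        and 1 <= size_mb <= 32
        and (size_mb & (size_mb - 1)) == 0  # powers of 2 have a single bit set
    )


print([n for n in range(1, 40) if is_valid_g1_region_size_mb(n)])  # [1, 2, 4, 8, 16, 32]
```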
UI
/etc/wandisco/ui/vars.env
Name | Details |
---|---|
JVM_MAX_MEM | Specifies the maximum size of the heap that can be used by the Java virtual machine (JVM). Use k, m, or g (case insensitive) for KiB, MiB, or GiB. |
JVM_MIN_MEM | Specifies the minimum size of the heap that can be used by the Java virtual machine (JVM). Use k, m, or g (case insensitive) for KiB, MiB, or GiB. |
LDUI_EXTRA_JVM_ARGS | Add extra Java arguments (for example, LDUI_EXTRA_JVM_ARGS="-Djava.security.properties=\"/etc/wandisco/jvm/jvm.override.security\""). |
Directory structure
The following directories are used for the Data Migrator core package:
Location | Content |
---|---|
/var/log/wandisco/livedata-migrator | Logs |
/etc/wandisco/livedata-migrator | Configuration files |
/opt/wandisco/livedata-migrator | Java archive files |
/opt/wandisco/livedata-migrator/db | Data Migrator runtime state |
/var/run/livedata-migrator | Runtime files |