Configure data transfer agent properties
See below for details on configuring the following:
- Data transfer agent - application properties
- Security properties
- Data Migrator - application properties
- Secure communication
Data transfer agent - application properties
Property | Description | Default value |
---|---|---|
dataagent.port | The port on which the data transfer agent is running. | 1433 |
dataagent.callback-period-ms | The period in milliseconds for sending the callback from the data transfer agent to Data Migrator with the progress of the file transfer. | 1000 |
dataagent.fs-timeout-sec | The period in seconds after which an unused filesystem is eligible for removal from the filesystem cache on the data transfer agent side. The cache lets the agent reuse filesystem representations across file transfers. | 600 |
dataagent.cache-cleanup-period-sec | The period in seconds between runs of the cache cleanup job. If the job detects that a filesystem in the cache hasn't been used for dataagent.fs-timeout-sec seconds or more, it deletes the filesystem from the cache. | 1800 |
dataagent.grpc-pool-max-threads | The maximum number of threads that can be used for the gRPC connection. This is also the maximum number of files that a given data transfer agent can transfer at the same time. | 150 |
dataagent.grpc-pool-keep-alive-time-sec | The period in seconds after which unused threads can be removed from the gRPC pool. The number of threads in the gRPC pool is managed automatically. We don't recommend setting this to a value lower than 10. | 60 |
dataagent.grpc-callback-thread-count | The number of threads responsible for sending progress callbacks from the data transfer agent to the server. Can be increased if transferring a very large number of small files. | 5 |
dataagent.grpc-max-inbound-message-size-kb | The maximum message size in kilobytes that the data transfer agent can receive. Don't set the value to less than the default. | 4096 |
dataagent.thread-dump-dir | The directory that contains thread dump files. | ${log.dir:./logs}/threads |
dataagent.thread-dump-period-sec | The frequency in seconds at which a new thread dump is created. We don't recommend reducing the value to below 60 seconds (one minute). | 3600 |
dataagent.thread-dump-number-files | The maximum number of thread dump files in the thread dump directory. If the limit is exceeded, old files are deleted. | 24 |
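The lower bounds recommended in the table above can be checked before an agent is started. The following is a minimal sketch, assuming the properties have already been loaded into a dictionary of strings. The helper name `check_agent_properties` and the dictionary layout are hypothetical and not part of the product.

```python
# Hypothetical helper. The lower bounds are taken from the recommendations
# in the table above; nothing here is part of the product itself.
RECOMMENDED_MINIMUMS = {
    "dataagent.grpc-pool-keep-alive-time-sec": 10,
    "dataagent.grpc-max-inbound-message-size-kb": 4096,
    "dataagent.thread-dump-period-sec": 60,
}


def check_agent_properties(props):
    """Return one warning per property set below its recommended minimum."""
    warnings = []
    for name, minimum in RECOMMENDED_MINIMUMS.items():
        # Properties that are not set fall back to their defaults, so skip them.
        if name in props and int(props[name]) < minimum:
            warnings.append(
                f"{name}={props[name]} is below the recommended minimum of {minimum}"
            )
    return warnings


# Example: a keep-alive time of 5 seconds is flagged, the thread dump period is not.
for warning in check_agent_properties({
    "dataagent.grpc-pool-keep-alive-time-sec": "5",
    "dataagent.thread-dump-period-sec": "3600",
}):
    print(warning)
```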
Security properties
Security properties are generated when you install data transfer agents. Don't change the property values.
Property | Description |
---|---|
dataagent.grpc.security.client-secret | The secret key of the installed data transfer agent. |
dataagent.grpc.security.keystore | Path to a Java keystore file with secrets. |
dataagent.grpc.security.keystore-password | Password for accessing the Java keystore. |
Data Migrator - application properties
The following application properties allow Data Migrator to communicate with the data transfer agents.
Property | Description | Default value |
---|---|---|
hdfs.fs.delegationtoken.renew.period.sec | The interval in seconds at which HDFS delegation tokens are renewed. Set this to less than the value of dfs.namenode.delegation.token.renew-interval in the configuration of the HDFS source/target. That value is given in milliseconds, so convert it to seconds before comparing. Delegation tokens are used to submit file transfer tasks to the data transfer agent for Kerberos-enabled HDFS filesystems. | 3600 |
hdfs.fs.delegationtoken.refresh.factor | A value from 0 to 1 that sets the fraction of a token's remaining lifetime below which the token is replaced with a new one. With the default value of 0.85 , once 15% of the overall lifetime of a delegation token has passed, the token isn't used for new file transfers and a new token is issued. This is important when migrating extremely large files. For instance, if the total lifetime of the token is 7 days and the transfer of each file takes 5 days, every subsequent file transfer must start with a new token that has enough remaining lifetime. We don't recommend setting this value any lower than 0.15. | 0.85 |
hdfs.fs.delegationtoken.cleanup.period.sec | The period in seconds between runs of the delegation token cache cleanup job. If the job detects that a delegation token in the cache isn't being used by any data transfer agent and the token has passed the threshold set by hdfs.fs.delegationtoken.refresh.factor , it deletes the delegation token from the cache. | 60 |
dataagent.healthcheck.timeout.sec | The period in seconds for a data transfer agent to respond before it is marked as unhealthy. | 60 |
dataagent.healthcheck.period.sec | The interval in seconds at which Data Migrator checks the health status of data transfer agents. | 60 |
dataagent.healthcheck.thread.count | The number of threads used to check the health status of data transfer agents. You can increase this value if you're using a lot of data transfer agents or if there are frequent connection issues between Data Migrator and the data transfer agent servers. | 5 |
dataagent.grpc.thread.count.max | The maximum number of threads used for the gRPC connection. Generally, this value should be the same as pull.threads . | 100 |
dataagent.grpc.thread.keepalive.time.sec | The period in seconds after which unused threads can be removed from the gRPC pool. The number of threads in the gRPC pool is managed automatically. We don't recommend setting this to a value lower than 10. | 60 |
dataagent.loadbalancer.timeout.sec | The maximum period in seconds for searching for the next available data transfer agent to submit a file transfer. If Data Migrator can't find an active data transfer agent after the specified period, an exception is thrown. We don't recommend setting the value to lower than 5. | 60 |
dataagent.transfer.attempts.max | The number of attempts to send a file transfer task to the next data transfer agent when a data transfer fails. This doesn't affect the migration.file.max.retries property. For example, if you have 10 data transfer agents, migration.file.max.retries is set to 5 , and dataagent.transfer.attempts.max is set to 7 , Data Migrator attempts to transfer the file using up to 7 data transfer agents on each of the 5 retries (5*7=35 attempts). The actual number of attempts also depends on any exceptions that occur. | 5 |
dataagent.stats.collect.period.sec | The interval in seconds at which Data Migrator gathers and updates statistical information about data transfer agents (for example, the number of bytes migrated, files migrated, and so on). | 5 |
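The arithmetic behind three of the properties above (the renew period, the refresh factor, and the transfer attempts) can be worked through in a short sketch. The function names, the 60-second safety margin, and the treatment of the exact boundary value are assumptions made for illustration. They aren't taken from the Data Migrator implementation.

```python
DAY_SEC = 24 * 60 * 60


def renew_period_sec(namenode_renew_interval_ms, safety_margin_sec=60):
    """Convert dfs.namenode.delegation.token.renew-interval (milliseconds) to
    seconds and subtract a margin so the result stays below that interval.
    The margin is an assumed value, not a documented one."""
    return namenode_renew_interval_ms // 1000 - safety_margin_sec


def token_usable_for_new_transfers(elapsed_sec, total_lifetime_sec, refresh_factor=0.85):
    """True while the remaining share of the token lifetime is at least
    refresh_factor. Treating the exact boundary as usable is an assumption."""
    remaining_fraction = (total_lifetime_sec - elapsed_sec) / total_lifetime_sec
    return remaining_fraction >= refresh_factor


def max_agent_submissions(file_max_retries, transfer_attempts_max):
    """Upper bound on submissions to agents for one file: each retry counted by
    migration.file.max.retries can try dataagent.transfer.attempts.max agents."""
    return file_max_retries * transfer_attempts_max


# A NameNode renew interval of 86400000 ms (one day) gives a value below 86400.
print(renew_period_sec(86_400_000))  # 86340

# Token with a 7-day lifetime and the default factor of 0.85:
print(token_usable_for_new_transfers(1 * DAY_SEC, 7 * DAY_SEC))  # True (about 86% left)
print(token_usable_for_new_transfers(2 * DAY_SEC, 7 * DAY_SEC))  # False (about 71% left)

# The example from the table: 5 retries with up to 7 agents each.
print(max_agent_submissions(5, 7))  # 35
```

With a 7-day token and the default factor, the token stops being used for new transfers after roughly 1.05 days, which leaves close to 6 days of lifetime for a transfer that is already running.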
Secure communication
S3 target filesystems
To enable secure communication between the data transfer agent and an S3 target filesystem type, you need to use delegation tokens. There are three types of token that give agents access to S3 buckets:
Session token
This token expires and you can’t renew it. It's useful for services that run for a short time.
Role token
This token is available with an AWS account-specific role for a short time.
(Recommended) Full delegation token
This token contains the AWS access and secret keys needed to access a bucket. It doesn't expire.
For more information about delegation tokens, see Apache Hadoop - Working with Delegation Tokens.