Configure an Alibaba Cloud Object Storage Service target
You can migrate data to an Alibaba Cloud Object Storage Service (OSS) bucket by configuring one as a target filesystem.
Follow these steps to create an Alibaba Cloud OSS target:
Alibaba Cloud OSS buckets created with the hierarchical namespace option are not supported.
Prerequisites
You need the following:
An Alibaba Cloud OSS bucket.
Authentication details for your bucket. See below for more information.
Configure an Alibaba Cloud OSS target filesystem in the UI
From the Dashboard, select an instance under Instances.
In the Filesystems & Agents menu, select Filesystems.
Select Add target filesystem.
Enter the following details:
- Filesystem Type - The type of filesystem target. Select Alibaba Object Storage.
- Display Name - Enter a name for your target filesystem.
- Access Key - Enter the Alibaba bucket access key. For example, RANDOMSTRINGACCESSKEY.
- Secret Key - Enter the secret key that corresponds with your Access Key. For example, RANDOMSTRINGPASSWORD.
- Bucket Name - The reference name of your Alibaba Cloud OSS bucket.
- Endpoint - The Alibaba Cloud OSS endpoint for your bucket.
- S3 Properties - Add optional properties to your target as key-value pairs.
Select Save. You can now use your Alibaba Cloud OSS target in data migrations.
Configure an Alibaba Cloud OSS target filesystem in the CLI
To create an Alibaba Cloud OSS target in the Data Migrator CLI, run the filesystem add s3a command:
filesystem add s3a [--file-system-id] string
[--bucket-name] string
[--endpoint] string
[--access-key] string
[--secret-key] string
[--sqs-queue] string
[--sqs-endpoint] string
[--credentials-provider] string
[--source]
[--scan-only]
[--properties-files] list
[--properties] string
[--s3type] string
[--bootstrap.servers] string
[--topic] string
For guidance about access, permissions, and security when adding an Alibaba Cloud OSS bucket as a target filesystem, see Access and Control Overview.
Alibaba Cloud OSS mandatory parameters
--file-system-id
The ID for the new filesystem resource. In the UI, this is called Display Name.
--bucket-name
The name of your Alibaba Cloud OSS bucket. In the UI, this is called Bucket Name.
--access-key
The Alibaba Cloud OSS bucket access key. For example, RANDOMSTRINGACCESSKEY.
--secret-key
The secret key to use with your access key. For example, RANDOMSTRINGPASSWORD.
--endpoint
The endpoint for your Alibaba Cloud OSS bucket. Alibaba provides a list of available endpoints in their public documentation.
Alibaba Cloud OSS optional parameters
--properties
Enter properties to use as a comma-separated list of key-value pairs. In the UI, this is called S3A Properties. See the S3A properties section for more information.
Other parameters
These parameters are for S3 sources or other types of S3 targets. Exclude them when you create an Alibaba Object Storage target.
--properties-files
Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
--sqs-queue
Enter an SQS queue name.
--sqs-endpoint
Enter an SQS endpoint.
--source
This parameter creates the filesystem as a source.
--scan-only
This parameter creates a static source filesystem for one-time migrations. This parameter needs the --source parameter.
--success-file
This parameter uses a file name or glob pattern for files that Data Migrator will migrate last in their directory. For example, --success-file /mypath/myfile.txt or --success-file /**_SUCCESS. You can use these files to confirm the source directory they're in has finished migrating. This parameter only applies to source filesystems.
Example
filesystem add s3a --file-system-id alibaba-target1
--bucket-name container2
--access-key pkExampleAccessKeyiz
--secret-key c3vq6vaNtExampleSecretKeyVuqJMIHuV9IF3n9
--endpoint oss-us-west-1.aliyuncs.com
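If you script your deployments, it can help to check that all five mandatory parameters are present before running the command. The following is a minimal Python sketch, not part of Data Migrator: the helper name build_add_s3a_command is hypothetical, and it assumes the values contain no whitespace or characters that need quoting in the CLI.

```python
# The five mandatory parameters listed in this section.
MANDATORY = ("file-system-id", "bucket-name", "access-key", "secret-key", "endpoint")


def build_add_s3a_command(params):
    """Return a one-line `filesystem add s3a` command string.

    Hypothetical helper: it only assembles text and does not contact
    Data Migrator or validate the values themselves.
    """
    missing = [name for name in MANDATORY if not params.get(name)]
    if missing:
        raise ValueError("missing mandatory parameters: " + ", ".join(missing))

    parts = ["filesystem", "add", "s3a"]
    for name in MANDATORY:
        # Assumption: values need no quoting or escaping.
        parts += ["--" + name, str(params[name])]
    return " ".join(parts)


if __name__ == "__main__":
    print(build_add_s3a_command({
        "file-system-id": "alibaba-target1",
        "bucket-name": "container2",
        "access-key": "pkExampleAccessKeyiz",
        "secret-key": "c3vq6vaNtExampleSecretKeyVuqJMIHuV9IF3n9",
        "endpoint": "oss-us-west-1.aliyuncs.com",
    }))
```

The sketch deliberately omits the parameters listed under Other parameters, because they don't apply to an Alibaba Object Storage target.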
S3A properties
Enter additional properties for Alibaba Cloud OSS filesystems by adding them as key-value pairs in the UI or as a comma-separated list of key-value pairs with the --properties parameter in the CLI. You can overwrite default property values or add new properties.
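As an illustration of the comma-separated form, the sketch below joins a set of overrides into a single value for --properties. It assumes each pair is written as key=value, which this page does not state explicitly, so check the format against your Data Migrator version. The helper name format_properties is hypothetical.

```python
def format_properties(overrides):
    """Join property overrides into a comma-separated list.

    Assumption: each pair is written as key=value. Values that contain
    a comma are rejected because they would break the list.
    """
    pairs = []
    for key, value in overrides.items():
        text = str(value)
        if "," in key or "=" in key or "," in text:
            raise ValueError("unsupported character in property: " + key)
        pairs.append(key + "=" + text)
    return ",".join(pairs)


if __name__ == "__main__":
    print(format_properties({
        "fs.s3a.fast.upload.buffer": "bytebuffer",
        "fs.s3a.fast.upload.active.blocks": 4,
    }))
```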
Default properties
These properties are defined by default when you add an Alibaba Cloud OSS filesystem. Overwrite them by specifying their keys with new values in key-value pairs.
- fs.s3a.impl (default org.apache.hadoop.fs.s3a.S3AFileSystem): The implementation class of the S3A filesystem.
- fs.AbstractFileSystem.s3a.impl (default org.apache.hadoop.fs.s3a.S3A): The implementation class of the S3A abstract filesystem.
- fs.s3a.user.agent.prefix (default APN/1.0 WANdisco/1.0 LiveDataMigrator/1.11.6): Sets a custom value that will be prepended to the user-agent header sent in HTTP requests to the S3 backend by the S3A filesystem.
- fs.s3a.impl.disable.cache (default true): Disables the S3 filesystem cache when set to 'true'.
- hadoop.tmp.dir (default tmp): The parent directory for other temporary directories.
- fs.s3a.connection.maximum (default 120): Defines the maximum number of simultaneous connections to the S3 filesystem.
- fs.s3a.threads.max (default 150): Defines the total number of threads made available for data uploads or any other queued filesystem operation.
- fs.s3a.max.total.tasks (default 60): Defines the number of operations that can be queued for execution at a time.
- fs.s3a.healthcheck (default true): Allows you to switch the health check off by changing the value from true to false. This option is useful for setting up Data Migrator while cloud services are offline. If you disable the check, it's harder to diagnose any issues with the S3A configuration.
Additional properties
These additional properties are not defined by default. Add more properties by specifying their key-value pairs.
- fs.s3a.fast.upload.buffer (default disk): Defines how the filesystem buffers the upload.
- fs.s3a.fast.upload.active.blocks (default 8): Defines how many blocks a single output stream can have uploading or queued at a given time.
- fs.s3a.block.size (default 32M): Defines the maximum size of blocks during file transfer. Use the suffix K, M, G, T, or P to scale the value in kilobytes, megabytes, gigabytes, terabytes, or petabytes, respectively.
- fs.s3a.buffer.dir (default tmp): Defines the directory used by disk buffering.
Find an additional list of S3A properties in the S3A documentation.
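To make the size suffixes for fs.s3a.block.size concrete, here is a small Python sketch that converts a value such as 32M to bytes. It assumes binary multiples (1K = 1024 bytes), which is how Hadoop conventionally interprets these suffixes. The function name parse_size is hypothetical and not part of Data Migrator.

```python
# Assumption: suffixes are binary multiples (powers of 1024).
_UNITS = {
    "K": 1024,
    "M": 1024 ** 2,
    "G": 1024 ** 3,
    "T": 1024 ** 4,
    "P": 1024 ** 5,
}


def parse_size(value):
    """Convert a size such as '32M' or '1024' to a number of bytes."""
    text = str(value).strip().upper()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    # No suffix: the value is already in bytes.
    return int(text)


if __name__ == "__main__":
    print(parse_size("32M"))
```

Under that assumption, the default of 32M corresponds to 33554432 bytes.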
Upload buffering
Migrations to an S3 target buffer all uploads. By default, uploads are buffered on the local disk of the system running Data Migrator, in the /tmp directory.
Data Migrator automatically deletes the temporary buffering files once they are no longer needed.
To use a different type of buffering, change the property fs.s3a.fast.upload.buffer. Enter one of the following values:
Buffering option | Details | Property value |
---|---|---|
Array buffer | Buffers the uploaded data in memory instead of on the disk, using the Java heap. | array |
Byte buffer | Buffers the uploaded data in memory instead of on the disk, but doesn't use the Java heap. | bytebuffer |
Disk buffering | The default option. This property buffers the upload to the disk. | disk |
Both the array and bytebuffer options may consume large amounts of memory. To limit memory use and fine-tune your migrations, adjust properties such as fs.s3a.fast.upload.active.blocks.
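For a rough sense of scale, the sketch below estimates an upper bound on in-memory buffering as concurrent streams multiplied by active blocks per stream multiplied by block size. This is an approximation under stated assumptions, not a formula from Data Migrator: it assumes every stream fills all of its allowed blocks at once and that each block can reach the fs.s3a.block.size value described above. The function name and the concurrent_streams parameter are hypothetical.

```python
def estimate_buffer_memory(concurrent_streams,
                           active_blocks=8,
                           block_size_bytes=32 * 1024 ** 2):
    """Return a rough worst-case buffer size in bytes.

    Defaults mirror the documented values for
    fs.s3a.fast.upload.active.blocks (8) and fs.s3a.block.size (32M,
    assumed here to mean 32 * 1024 * 1024 bytes).
    """
    if concurrent_streams < 0 or active_blocks < 0 or block_size_bytes < 0:
        raise ValueError("arguments must be non-negative")
    return concurrent_streams * active_blocks * block_size_bytes


if __name__ == "__main__":
    total = estimate_buffer_memory(concurrent_streams=10)
    print(total / 1024 ** 3, "GiB")
```

With these assumptions, 10 concurrent uploads at the default settings could hold about 2.5 GiB in memory, and halving fs.s3a.fast.upload.active.blocks halves the estimate.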
If the disk used for buffering runs out of space, the migration stalls with a series of errors. To avoid this, ensure the filesystem containing the buffering directory (/tmp by default) has enough free space for the transfer.
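One way to check free space before starting a migration is with Python's standard library. This is a minimal sketch: the helper name has_buffer_space is hypothetical, and required_bytes is a threshold you choose yourself, because this page does not state how much space a given migration needs.

```python
import shutil


def has_buffer_space(buffer_dir, required_bytes):
    """Return True if the filesystem holding buffer_dir has at least
    required_bytes free."""
    free = shutil.disk_usage(buffer_dir).free
    return free >= required_bytes


if __name__ == "__main__":
    # Example: require 10 GiB free in the default buffering directory.
    print(has_buffer_space("/tmp", 10 * 1024 ** 3))
```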
Next steps
If you haven't already, configure a source filesystem from which to migrate data. Then, you can create a migration to migrate data to your new Alibaba Object Storage target.