Configure an S3 target
You can migrate data to a Simple Storage Service (S3) bucket by configuring one as a target filesystem.
Follow these steps to create an S3 target:
Prerequisites
You need the following:
- An S3 bucket.
- An access key and corresponding secret key for your S3 bucket.
Configure an S3 target filesystem in the UI
1. From the Dashboard, select an instance under Instances.
2. In the Filesystems & Agents menu, select Filesystems.
3. Select Add target filesystem.
4. Enter the following details:
   - Filesystem Type - The type of filesystem target. Select S3.
   - Display Name - Enter a name for your target filesystem.
   - Bucket Name - Enter the reference name of your S3 bucket.
   - Access Key - Enter the access key for your bucket. For example, `RANDOMSTRINGACCESSKEY`. If you have configured a Vault for secrets storage, use a reference to the value stored in your secrets store.
   - Secret Key - Enter the secret key that corresponds with your access key. For example, `RANDOMSTRINGPASSWORD`. If you have configured a Vault for secrets storage, use a reference to the value stored in your secrets store.
   - S3 Properties - Add the S3 endpoint of your S3-compatible object storage. Add any additional optional properties to your S3 target as key-value pairs (a fuller sketch follows these steps). Example custom endpoint:

     ```
     fs.s3a.endpoint = s3-region0.example-objectStore.com
     ```

5. Select Save. You can now use your S3 target in data migrations.
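For example, when targeting an S3-compatible object store, the S3 Properties field might hold entries like the following. This is a sketch: the endpoint is a placeholder, and `fs.s3a.path.style.access` is a standard Hadoop S3A property included here as an assumption, since many S3-compatible stores require path-style requests.

```
# Placeholder endpoint for an S3-compatible store
fs.s3a.endpoint = s3-region0.example-objectStore.com
# Assumption: use path-style requests rather than virtual-hosted-style
fs.s3a.path.style.access = true
# Optional: where uploads are buffered (see Upload buffering below)
fs.s3a.fast.upload.buffer = disk
```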
Configure an S3 target filesystem in the CLI
To create an S3 target in the Data Migrator CLI, run the `filesystem add s3a` command:
```
filesystem add s3a [--file-system-id] string
                   [--bucket-name] string
                   [--endpoint] string
                   [--access-key] string
                   [--secret-key] string
                   [--sqs-queue] string
                   [--sqs-endpoint] string
                   [--credentials-provider] string
                   [--source]
                   [--scan-only]
                   [--properties-files] list
                   [--properties] string
                   [--s3type] string
                   [--bootstrap.servers] string
                   [--topic] string
```
S3 mandatory parameters
`--file-system-id`
The ID for the new filesystem resource. In the UI, this is called Display Name.

`--bucket-name`
The name of your S3 bucket. In the UI, this is called Bucket Name.
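A minimal sketch using only the mandatory parameters (`mytarget` and `mybucket1` are placeholder names); credentials and other settings can be supplied with the optional parameters described next:

```
filesystem add s3a --file-system-id mytarget
                   --bucket-name mybucket1
```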
S3 optional parameters
`--access-key`
When using the `org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider` credentials provider, enter the access key with this parameter. In the UI, this is called Access Key.

`--secret-key`
When using the `org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider` credentials provider, enter the secret key with this parameter. In the UI, this is called Secret Key.

`--sqs-queue`
Enter an SQS queue name.

`--sqs-endpoint`
Enter an SQS endpoint.

`--properties-files`
Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by `core-site.xml` or `hdfs-site.xml`.

`--properties`
Enter properties to use in a comma-separated key/value list. In the UI, this is called S3A Properties. See the S3a properties section for more information.

`--s3type`
Indicates an s3a-compatible filesystem type. You can set the parameter value to `aws` or leave it blank.

`--credentials-provider`
The Java class name of a credentials provider for authenticating with the S3 endpoint. In the UI, this is called Credentials Provider. IBM COS target filesystems default to a simple credentials provider.
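As an illustration, several optional parameters can be combined in one command. This is a sketch, not a recommended configuration: the region endpoint supplied through `--properties` is an assumed example value.

```
filesystem add s3a --file-system-id mytarget
                   --bucket-name mybucket1
                   --s3type aws
                   --properties fs.s3a.endpoint=s3.eu-west-1.amazonaws.com
```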
Other parameters
These parameters are for S3 sources or other types of S3 targets. Exclude them when you create an S3 target.
`--source`
This parameter creates the filesystem as a source.

`--scan-only`
This parameter creates a static source filesystem for one-time migrations. This parameter needs the `--source` parameter.

`--success-file`
This parameter uses a file name or glob pattern for files that Data Migrator will migrate last in their directory. For example, `--success-file /mypath/myfile.txt` or `--success-file /**_SUCCESS`. You can use these files to confirm that the source directory they're in has finished migrating. This parameter only applies to source filesystems.

`--endpoint`
(UI & IBM Cloud Object Storage only): Required when adding a Cloud Object Storage bucket that isn't provided by Amazon Web Services.
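For context only (these flags define a source, not a target), a sketch of registering a static S3 source for a one-time migration might look like this; `mysource` and `mysourcebucket` are placeholder names:

```
filesystem add s3a --file-system-id mysource
                   --bucket-name mysourcebucket
                   --source
                   --scan-only
                   --success-file /**_SUCCESS
```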
Example
```
filesystem add s3a --file-system-id mytarget
                   --bucket-name mybucket1
                   --credentials-provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
                   --access-key pkExampleAccessKeyiz
                   --secret-key OP87ChoExampleSecretKeyI904lT7AaDBGJpp0D
```
S3a properties
For information on properties that are added by default for new S3A filesystems, see the Command reference s3a default properties.
For information on properties that you can customize for new S3A filesystems, see the Command reference s3a custom properties.
Upload buffering
Migrations that use an S3 target buffer all uploads. By default, buffering occurs on the local disk of the system running Data Migrator, in the `/tmp` directory.
Data Migrator automatically deletes the temporary buffering files once they are no longer needed.
To use a different type of buffering, change the `fs.s3a.fast.upload.buffer` property. The following values can be supplied:
| Buffering Option | Details | Property Value |
|---|---|---|
| Array Buffer | Buffers the uploaded data in memory instead of on the disk, using the Java heap. | `array` |
| Byte Buffer | Buffers the uploaded data in memory instead of on the disk, but does not use the Java heap. | `bytebuffer` |
| Disk Buffering | The default option. Buffers the upload to the disk. | `disk` |
Both the `array` and `bytebuffer` options may consume large amounts of memory. Other properties (such as `fs.s3a.fast.upload.active.blocks`) may be used to fine-tune the migration to avoid issues.
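For example, a sketch of switching to byte-buffer buffering when adding the target through the CLI; the block count of 4 is an illustrative value, not a recommendation:

```
filesystem add s3a --file-system-id mytarget
                   --bucket-name mybucket1
                   --properties fs.s3a.fast.upload.buffer=bytebuffer,fs.s3a.fast.upload.active.blocks=4
```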
If you run out of disk space on which to buffer the migration, the migration will stall with a series of errors. To avoid this, ensure the filesystem containing the directory used for buffering (`/tmp` by default) has enough remaining space to facilitate the transfer.
Next steps
If you haven't already, configure a source filesystem from which to migrate data. Then, create a migration to move data to your new S3 target.