Configure IBM Cloud Object Storage
IBM Cloud Object Storage is currently available as a preview feature and under development. If you use IBM Cloud Object Storage as a source filesystem with Data Migrator, and have feedback to share, contact us. The feature is automatically enabled. See Preview features.
Configure IBM Cloud Object Storage as a source with the UI
To configure an IBM Cloud Object Storage bucket as a source filesystem, select IBM Cloud Object Storage in the Filesystem Type dropdown menu when configuring filesystems with the UI.
Enter the following details:
- Filesystem Type - The type of filesystem source. Choose IBM Cloud Object Storage.
- Display Name - A name for your IBM Cloud Object Storage filesystem.
- Access Key - The access key for your authentication credentials, used with the fixed authentication credentials provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider.
Although IBM Cloud Object Storage can use other providers (for example, InstanceProfileCredentialsProvider or DefaultAWSCredentialsProviderChain), they're only available in the cloud, not on-premises. As on-premises is currently the expected type of source, these other providers haven't been tested and aren't currently selectable.
- Secret Key - The secret key for your authentication credentials, used with the org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider credentials provider.
- Bucket Name - The name of your IBM Cloud Object Storage bucket.
- Topic - The name of the Kafka topic to which the notifications will be sent.
- Endpoint - An endpoint for a Kafka broker, in host/port format.
- Bootstrap Servers - A comma-separated list of host and port pairs that are addresses for Kafka brokers on a "bootstrap" Kafka cluster that Kafka clients use to bootstrap themselves.
- Port - The TCP port used for connection to the IBM Cloud Object Storage bucket. Default is 9092.
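For illustration only, the Kafka-related fields could be completed with the same example values used in the CLI example later on this page (these are example values, not defaults; substitute your own topic name and broker addresses):
Topic: newcos-events
Bootstrap Servers: 10.0.0.123:9092
Port: 9092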
Migrations from IBM Cloud Object Storage use the same S3A filesystem classes as Amazon S3. The main difference between IBM Cloud Object Storage and Amazon S3 is the messaging service used for notifications: an SQS queue for Amazon S3, and Kafka for IBM Cloud Object Storage.
Configure IBM Cloud Object Storage as a source with the CLI
Creating an IBM Cloud Object Storage source through the CLI uses the same set of commands as Amazon S3. The following examples show how the commands are used:
Add a source IBM Cloud Object Storage filesystem. Note that this does not work if SSL is used on the endpoint address.
filesystem add s3a --source --file-system-id cos_s3_source2
--bucket-name container2
--credentials-provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
--access-key pkExampleAccessKeyiz
--secret-key c3vq6vaNtExampleSecretKeyVuqJMIHuV9IF3n9
--s3type ibmcos
--bootstrap.servers=10.0.0.123:9092
--topic newcos-events
--endpoint http://10.0.0.124
Add a path mapping.
path mapping add --path-mapping-id testPath
--description tt
--source-path /
--target targetHdfs2
--target-path /repl_test1
{
"id": "testPath",
"description": "tt",
"sourceFileSystem": "cos_s3_source2",
"sourcePath": "/",
"targetFileSystem": "targetHdfs2",
"targetPath": "/repl_test1"
}
Add a file to the container.
./mc cp ~/Downloads/wq4.pptx cos/container2/
Remove a file from the container.
./mc rm cos/container2/wq4.pptx
List objects in the container.
./mc ls cos/container2/
List objects via the S3 API.
aws s3api list-objects --endpoint-url=http://10.0.0.201
--bucket container2
Configure mc.
nano ~/.mc/config.json
Add the following entry:
"cos": {
"url": "https://s3-cos.wandisco.com",
"accessKey": "pkExampleAccessKeyiz",
"secretKey": "c3vq6vaNtExampleSecretKeyVuqJMIHuV9IF3n9",
"api": "S3v4",
"path": "auto"
}
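After saving the configuration, a quick way to confirm the cos alias works (assuming the endpoint and credentials above are valid) is to list the buckets it can see:
./mc ls cos/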
Configure notifications for migrating the events stream
Migrating data from IBM Cloud Object Storage requires that filesystem events are fed into a Kafka-based notification service. Whenever an object is written, overwritten, or deleted using the S3 protocol, a notification is created and stored in a Kafka topic - a message category under which Kafka publishes the notification stream.
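The Kafka topic must exist before notifications can be delivered to it. If your brokers don't auto-create topics, you can create one manually. The following is a sketch that assumes the Kafka command-line tools (Kafka 2.2 or later) are available on a broker host and reuses the example broker address and topic name from this page; adjust partitions and replication for your cluster:
kafka-topics.sh --create --bootstrap-server 10.0.0.123:9092 --topic newcos-events --partitions 1 --replication-factor 1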
Configure Kafka notifications
Enter the following information into the IBM Cloud Object Storage Manager web interface.
- Select the Administration tab.
- In the Notification Service section, select Configure.
- On the Notification Service Configuration page, select Add Configuration.
- In the General section enter the following:
- Name: A name for the configuration, for example "IBM Cloud Object Storage Notifications"
- Topic: The name of the Kafka topic to which the notifications will be sent.
- Hostnames: A list of Kafka node endpoints in host:port format. Note that larger clusters may have multiple nodes.
- Type: Type of configuration.
- [OPTIONAL] In the Authentication section, select Enable authentication and enter your Kafka username and password.
- [OPTIONAL] In the Encryption section, select Enable TLS for Apache Kafka network connections.
- If the Kafka cluster is encrypted using a self-signed TLS certificate, paste the root CA key for your Kafka configuration in the Certificate PEM field.
- Select Save.
- A message appears confirming that the notification was created successfully and the configuration is listed in the Notification Service Configurations table.
- Select the name of the configuration (set in step 4) to assign vaults.
- In the Assignments section, select Change.
- In the Not Assigned tab, select vaults and select Assign to Configuration. Filter available vaults by selecting or typing a name into the Vault field.
Note: Notification configurations can't be assigned to container vaults, mirrored vaults, vault proxies, or vaults that are migrating data. Once a notification configuration is assigned, the associated vault can't be used in a mirror, with a vault proxy, or for data migration.
Only new operations that occur after a vault is assigned to the configuration will trigger notifications.
- Select Update.
Note: For more information, see the Apache Kafka documentation.
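To confirm that notifications are flowing end to end, perform an S3 operation against an assigned vault (for example, the mc cp upload shown earlier) and watch the topic with a console consumer. This is a verification sketch that assumes the Kafka command-line tools and the example broker and topic values used on this page:
kafka-console-consumer.sh --bootstrap-server 10.0.0.123:9092 --topic newcos-events --from-beginning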