Version: 2.2

Configure an Amazon S3 source

You can migrate data from Amazon Simple Storage Service (Amazon S3) by configuring an S3 bucket as a source filesystem.

Follow these steps to add an Amazon S3 bucket as a source using either the WANdisco® UI or CLI.

Prerequisites

You need the following:

  • An Amazon S3 bucket. See the Amazon S3 bucket documentation.

  • Authentication details for your bucket. See below for more information.

  • If you're configuring your own SQS queue in AWS for live replication with Data Migrator, the queue must be attached to the S3 bucket. For live replication, enable the following event types:

    Object creation: Select All object create events, or individually select Put, Post, Copy, and Multipart upload completed.

    Object removal: Select All object removal events, or individually select Permanently delete and Delete marker created.
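As a sketch of the event subscription described above, the following Python builds a bucket notification configuration in the shape that the S3 PutBucketNotificationConfiguration API accepts (for example, through boto3's put_bucket_notification_configuration). The queue ARN and the configuration ID are hypothetical placeholders. The wildcard event names correspond to All object create events and All object removal events; the explicit list corresponds to selecting the event types individually.

```python
import json

# Event types equivalent to selecting them individually in the S3 console.
OBJECT_CREATE_EVENTS = [
    "s3:ObjectCreated:Put",
    "s3:ObjectCreated:Post",
    "s3:ObjectCreated:Copy",
    "s3:ObjectCreated:CompleteMultipartUpload",  # Multipart upload completed
]
OBJECT_REMOVE_EVENTS = [
    "s3:ObjectRemoved:Delete",               # Permanently delete
    "s3:ObjectRemoved:DeleteMarkerCreated",  # Delete marker created
]


def build_notification_config(queue_arn: str, use_wildcards: bool = True) -> dict:
    """Return a NotificationConfiguration that sends create/remove events to an SQS queue."""
    if use_wildcards:
        # "All object create events" and "All object removal events"
        events = ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"]
    else:
        events = OBJECT_CREATE_EVENTS + OBJECT_REMOVE_EVENTS
    return {
        "QueueConfigurations": [
            {
                "Id": "data-migrator-live-replication",  # hypothetical ID
                "QueueArn": queue_arn,
                "Events": events,
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical queue ARN, for illustration only.
    arn = "arn:aws:sqs:us-east-1:111122223333:my-migration-queue"
    print(json.dumps(build_notification_config(arn), indent=2))
```

If you apply a configuration like this with boto3, be aware that the call replaces the bucket's entire notification configuration, and the queue's access policy must allow Amazon S3 to send messages to the queue.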

note

When migrating data with Amazon S3 as a source, data contained in paths with two or more consecutive forward slashes can't be replicated.
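A quick pre-flight check can flag affected objects before you start a migration. This minimal sketch assumes you already have a list of object keys (for example, from a bucket listing); the function name is hypothetical, and it only tests each key for two or more consecutive slashes without calling any AWS or Data Migrator API.

```python
from typing import Iterable, List


def unreplicable_keys(keys: Iterable[str]) -> List[str]:
    """Return the object keys that contain two or more consecutive forward slashes."""
    # Any run of 2+ slashes contains "//", so a substring test is enough.
    return [key for key in keys if "//" in key]


if __name__ == "__main__":
    sample = ["data/2024/file.csv", "data//2024/file.csv", "logs///app.log"]
    for key in unreplicable_keys(sample):
        print(f"will not be replicated: {key}")
```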

caution

When using Amazon S3 as a source, don't include the SQS initialization path (sqs-init-path/) in any migration. Including it prevents subsequent migrations from progressing to a Live status.
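Before creating a migration, you could check its path against this caution. The location of sqs-init-path/ isn't stated here, so this sketch assumes it's a top-level prefix in the bucket and that migration paths are bucket-relative, such as /data. Under that assumption, migrating the bucket root is flagged because it includes every top-level prefix. The function name is hypothetical.

```python
from typing import List

# Assumption: the SQS initialization path is a top-level prefix of the bucket.
SQS_INIT_DIR = "sqs-init-path"


def includes_sqs_init_path(migration_path: str) -> bool:
    """Return True if migrating `migration_path` would also migrate sqs-init-path/."""
    parts: List[str] = [p for p in migration_path.strip("/").split("/") if p]
    if not parts:
        # The bucket root contains every top-level prefix, including sqs-init-path/.
        return True
    # The path is sqs-init-path itself or a location beneath it.
    return parts[0] == SQS_INIT_DIR


if __name__ == "__main__":
    for path in ("/", "/data", "/sqs-init-path/", "/sqs-init-path-archive"):
        print(path, includes_sqs_init_path(path))
```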

Configure Amazon S3 as a source with the UI

  1. From the Dashboard, select a product under Products.

  2. In the Filesystems & Agents menu, select Filesystems.

  3. Select Add source filesystem.

  4. Select Amazon S3 from the Filesystem Type dropdown list.

  5. Enter the following details:

    • Display Name - The name you want to give your source filesystem.

    • Bucket Name - The name of the Amazon S3 bucket you want to use as a source.

    • Authentication Method - The Java class name of a credentials provider for authenticating with the S3 endpoint.

      The Authentication Method options available include:

      • Access Key and Secret (org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider)

        Use this provider to enter credentials as an access key and secret access key with the following entries:

        • Access Key - Enter the AWS access key. For example, RANDOMSTRINGACCESSKEY.

        • Secret Key - Enter the secret key that corresponds with your access key. For example, RANDOMSTRINGPASSWORD.

      • AWS Identity and Access Management (com.amazonaws.auth.InstanceProfileCredentialsProvider)

        Use this provider if you're running Data Migrator on an EC2 instance that has been assigned an IAM role with policies that allow it to access the S3 bucket.

      • AWS Hierarchical Credential Chain (com.amazonaws.auth.DefaultAWSCredentialsProviderChain)

        A commonly used credentials provider chain that looks for credentials in this order:

        • Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, or AWS_ACCESS_KEY and AWS_SECRET_KEY.
        • Java System Properties - aws.accessKeyId and aws.secretKey.
        • Web Identity Token credentials from the environment or container.
        • Credential profiles file at the default location (~/.aws/credentials) shared by all AWS SDKs and the AWS CLI.
        • Credentials delivered through the Amazon EC2 container service if the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable is set and the security manager has permission to access the variable.
        • Instance profile credentials delivered through the Amazon EC2 metadata service.

      • Environment Variables (com.amazonaws.auth.EnvironmentVariableCredentialsProvider)

        Use this provider to supply an access key and a secret access key through the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, or AWS_ACCESS_KEY and AWS_SECRET_KEY.

      • EC2 Instance Metadata Credentials (com.amazonaws.auth.InstanceProfileCredentialsProvider)

        Use this provider if you need instance profile credentials delivered through the Amazon EC2 metadata service.

      • Profile Credentials Provider (com.wandisco.livemigrator2.fs.ExtendedProfileCredentialsProvider)

        Use this provider to enter a custom profile configured to access Amazon S3 storage. You can find AWS credential information in a local file named credentials in a folder named .aws in your home directory.

        Enter an AWS Named Profile and a Credentials File Path (for example, ~/.aws/credentials).

        For more information, see Using the AWS Credentials File and Credential Profiles.

      • Custom Provider Class

        Use this if you want to enter your own class for the credentials provider.

      • JCEKS Keystore

        This authentication method uses an access key and a secret key for Amazon S3 contained in a Java Cryptography Extension KeyStore (JCEKS). The keystore must contain key/value pairs for the access key fs.s3a.access.key and the secret key fs.s3a.secret.key.

        info

        You must configure HDFS as a target to be able to select JCEKS Keystore. The HDFS resource must exist on the same Data Migrator instance as the Amazon S3 filesystem you're adding. Because of this dependency, review the Backup and Restore limitations before you perform a backup with this configuration.

        • JCEKS HDFS - Select the HDFS filesystem where your JCEKS file is located.

        • JCEKS Keystore Path - Enter the path containing the JCEKS keystore. For example, jceks://hdfs@nameservice01:8020/aws/credentials/s3.jceks.

          JCEKS on HDFS with Kerberos - You must add the dfs.namenode.kerberos.principal.pattern configuration property. Complete the following steps when you add an HDFS source or target with Kerberos:

          1. Under Additional Configuration, select Configuration Property Overrides from the dropdown.

          2. Select + Add Key/Value Pair and add the key dfs.namenode.kerberos.principal.pattern and the value *.

          3. Select Save, then restart Data Migrator.

        note

        If you remove filesystems configured with JCEKS authentication, remove any Amazon S3 filesystems before you remove an HDFS source.

    • S3 Service Endpoint - The endpoint for the source AWS S3 bucket. See --endpoint in the S3A parameters.

    • Simple Queue Service (SQS) Endpoints (Optional)

      Data Migrator listens to the event queue to continually migrate changes from source file paths to the target filesystem(s).

      When you add an S3 source, you have three options for the queue:

      • Add the source without a queue. Data Migrator creates a queue automatically.
        If you want Data Migrator to create its own queue, ensure your account has the necessary permissions to create and manage SQS queues and attach them to S3 buckets.

      • Add the source and enter a queue but no endpoint. This lets you use an existing queue that's reachable through a public endpoint.
        If you define your own queue, the queue must be attached to the S3 bucket.
        For more information about adding queues to buckets, see the AWS documentation.

      • Add the source and enter a queue and a service endpoint. The endpoint can be a public or a private endpoint.
        For more information about public endpoints, see the Amazon SQS endpoints documentation.

        • Queue - Enter the name of your SQS queue. This field is mandatory if you enter an SQS endpoint.

        • Endpoint - Enter the URL that you want Data Migrator to use. If you're using a Virtual Private Cloud (VPC), you must enter an endpoint.

        note

        You can set an Amazon Simple Notification Service (Amazon SNS) topic as the delivery target of the S3 event.

        Ensure you enable raw message delivery when you subscribe the SQS queue to the SNS topic.

        For more information, see the Amazon SNS documentation.

        Migration events expire after 14 days

        Data Migrator uses SQS messages to track changes to an S3 source filesystem. The maximum retention time for SQS messages is 14 days, which means events are lost after that time and can't be read by a migration.

        If you haven't used Data Migrator for 14 days or more, or your S3 migrations have been paused for that long, we recommend you reset your S3 migrations.

        Purge your SQS queue

        Your SQS queue starts to capture events as soon as it's created and live, so it may capture irrelevant events up to the time you start your first migration. Because Data Migrator would otherwise have to consume these events, we recommend you purge your SQS queue just before first use.

    • S3A Properties (Optional) - Override properties or enter additional properties by adding key/value pairs.
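The lookup order of the AWS Hierarchical Credential Chain, and the named-profile file used by the Profile Credentials Provider, can be illustrated with a short Python sketch. This isn't the AWS SDK's implementation: it checks only environment variables, a dictionary standing in for Java system properties, and an INI-format credentials file, and it marks where the web identity, container, and instance metadata steps would go. All function names are hypothetical.

```python
import configparser
import os
from pathlib import Path
from typing import Callable, List, Mapping, Optional, Tuple

Credentials = Tuple[str, str]  # (access key ID, secret access key)


def from_environment(env: Mapping[str, str]) -> Optional[Credentials]:
    # AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY first, then AWS_ACCESS_KEY/AWS_SECRET_KEY.
    for key_var, secret_var in (("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"),
                                ("AWS_ACCESS_KEY", "AWS_SECRET_KEY")):
        if env.get(key_var) and env.get(secret_var):
            return env[key_var], env[secret_var]
    return None


def from_system_properties(props: Mapping[str, str]) -> Optional[Credentials]:
    # Stand-in for the Java system properties aws.accessKeyId and aws.secretKey.
    if props.get("aws.accessKeyId") and props.get("aws.secretKey"):
        return props["aws.accessKeyId"], props["aws.secretKey"]
    return None


def from_profile(path: Path, profile: str = "default") -> Optional[Credentials]:
    # Reads a named profile from an INI-style file such as ~/.aws/credentials.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(path) or not parser.has_section(profile):
        return None
    section = parser[profile]
    key = section.get("aws_access_key_id")
    secret = section.get("aws_secret_access_key")
    return (key, secret) if key and secret else None


def resolve(env: Mapping[str, str], props: Mapping[str, str],
            credentials_file: Path, profile: str = "default"
            ) -> Tuple[str, Optional[Credentials]]:
    """Return the first source that yields credentials, checked in chain order."""
    steps: List[Tuple[str, Callable[[], Optional[Credentials]]]] = [
        ("environment variables", lambda: from_environment(env)),
        ("system properties", lambda: from_system_properties(props)),
        # Web identity token credentials would be checked here (omitted).
        ("credentials file", lambda: from_profile(credentials_file, profile)),
        # Container and EC2 instance metadata credentials would follow (omitted).
    ]
    for source, lookup in steps:
        creds = lookup()
        if creds:
            return source, creds
    return "none", None


if __name__ == "__main__":
    source, _ = resolve(os.environ, {}, Path.home() / ".aws" / "credentials")
    print(f"credentials would come from: {source}")  # never print the secret itself
```

The first source that returns a complete key pair wins, which is why, for example, stale environment variables can override a profile you expected to be used.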
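A JCEKS Keystore Path such as jceks://hdfs@nameservice01:8020/aws/credentials/s3.jceks follows Hadoop's credential provider path convention: the text before @ names the filesystem that holds the keystore, and the rest of the URI locates the file on it. This minimal sketch (the function name is hypothetical) shows that mapping, which can help when you check that the keystore file exists on the JCEKS HDFS filesystem you selected.

```python
from urllib.parse import urlsplit


def unnest_jceks_uri(provider_uri: str) -> str:
    """Return the filesystem URI of the keystore file named by a jceks:// provider path."""
    parts = urlsplit(provider_uri)
    if parts.scheme != "jceks":
        raise ValueError(f"expected a jceks:// URI, got {provider_uri!r}")
    # Before '@': the nested filesystem scheme (for example, hdfs).
    # After '@': that filesystem's authority (for example, nameservice01:8020).
    nested_scheme, _, nested_authority = parts.netloc.partition("@")
    return f"{nested_scheme}://{nested_authority}{parts.path}"


if __name__ == "__main__":
    print(unnest_jceks_uri("jceks://hdfs@nameservice01:8020/aws/credentials/s3.jceks"))
```

To populate such a keystore, Hadoop's credential command can add each alias, for example `hadoop credential create fs.s3a.access.key -provider jceks://hdfs@nameservice01:8020/aws/credentials/s3.jceks`, and `hadoop credential list -provider <path>` shows which aliases the keystore contains.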
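The three SQS queue options described above reduce to a small decision. This sketch models them with a hypothetical SqsSettings type (the field names are illustrative, not Data Migrator's configuration keys) and enforces the one rule the UI states: a queue name is mandatory when you enter an SQS endpoint.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SqsSettings:
    """Hypothetical model of the optional SQS fields in the UI."""
    queue: Optional[str] = None     # Queue field
    endpoint: Optional[str] = None  # Endpoint field


def queue_option(settings: SqsSettings) -> str:
    """Return which of the three queue options the settings select."""
    if settings.endpoint and not settings.queue:
        # The Queue field is mandatory whenever an SQS endpoint is entered.
        raise ValueError("enter a queue name when you enter an SQS endpoint")
    if not settings.queue:
        return "Data Migrator creates a queue"
    if not settings.endpoint:
        return "existing queue, public endpoint"
    return "existing queue, specified endpoint"


if __name__ == "__main__":
    for s in (SqsSettings(),
              SqsSettings(queue="my-queue"),  # hypothetical queue name
              SqsSettings(queue="my-queue", endpoint="https://sqs.us-east-1.amazonaws.com")):
        print(queue_option(s))
```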
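The raw message delivery requirement for an SNS topic matters because of how the message body changes. With raw delivery enabled, the SQS message body is the S3 event notification itself. Without it, SNS wraps the event in a JSON envelope and stores it, as a string, in the Message field. This sketch, which isn't Data Migrator's implementation and uses hypothetical function names, reads records from either form; a consumer written for raw S3 events would need the extra unwrapping step in the `if` branch, which enabling raw message delivery avoids.

```python
import json
from typing import List, Tuple
from urllib.parse import unquote_plus


def s3_records(body: str) -> List[dict]:
    """Return the S3 event records contained in an SQS message body."""
    payload = json.loads(body)
    if payload.get("Type") == "Notification" and "Message" in payload:
        # Raw message delivery is off: unwrap the SNS envelope first.
        payload = json.loads(payload["Message"])
    # Messages without records (such as S3's s3:TestEvent) yield an empty list.
    return payload.get("Records", [])


def changed_keys(body: str) -> List[Tuple[str, str]]:
    """Return (event name, decoded object key) pairs for each record."""
    # Object keys in S3 event notifications are URL-encoded, so decode them.
    return [(r["eventName"], unquote_plus(r["s3"]["object"]["key"]))
            for r in s3_records(body)]


if __name__ == "__main__":
    s3_event = json.dumps({"Records": [{
        "eventName": "ObjectCreated:Put",
        "s3": {"bucket": {"name": "my-bucket"}, "object": {"key": "data/my+report.csv"}},
    }]})
    sns_envelope = json.dumps({"Type": "Notification", "Message": s3_event})
    print(changed_keys(s3_event))      # raw message delivery enabled
    print(changed_keys(sns_envelope))  # raw message delivery disabled
```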

Filesystem Options

  • Live Migration - After existing data is moved, changes made to the source filesystem are migrated in real time using an SQS queue.

  • One-Time Migration - Existing data is moved to the target, after which the migration is complete and no further changes are migrated.
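Live migrations depend on SQS keeping S3 events until Data Migrator reads them. This sketch turns the 14-day guidance from the Migration events expire note into a simple check; the function names and dates are hypothetical. Fourteen days is the maximum SQS retention period; a queue created with the SQS default keeps messages for only four days, so you can pass the queue's actual MessageRetentionPeriod attribute (in seconds) if it differs.

```python
from datetime import datetime, timedelta, timezone

# SQS keeps messages for at most 14 days (MessageRetentionPeriod = 1,209,600 seconds).
MAX_RETENTION = timedelta(days=14)


def events_may_have_expired(last_active: datetime, now: datetime,
                            retention: timedelta = MAX_RETENTION) -> bool:
    """Return True if S3 events produced since `last_active` may have been deleted."""
    return now - last_active >= retention


def retention_from_attribute(seconds: str) -> timedelta:
    """Convert a queue's MessageRetentionPeriod attribute (a string of seconds)."""
    return timedelta(seconds=int(seconds))


if __name__ == "__main__":
    paused_at = datetime(2024, 3, 1, tzinfo=timezone.utc)   # hypothetical pause time
    checked_at = datetime(2024, 3, 6, tzinfo=timezone.utc)  # five days later
    print(events_may_have_expired(paused_at, checked_at))  # 14-day queue: False
    print(events_may_have_expired(paused_at, checked_at,
                                  retention_from_attribute("345600")))  # 4-day queue: True
```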

Next steps

Configure a target filesystem to migrate data to. Then create a migration.