Version: 2.5.4

Configure an ADLS Gen2 source

Migrate data from an Azure Data Lake Storage Gen2 (ADLS Gen2) file system by configuring it as your source file system for Data Migrator.

You can add your ADLS Gen2 file system as either a live or static source.
To add your ADLS Gen2 source as live, enable the change feed option on the storage account. To add an ADLS Gen2 source as a static source, use the One-time Migration option in the UI or the --scan-only flag when adding it with the CLI.
Migrations added to a live ADLS Gen2 source support both Live and One-time Migration types.

Find all migration types and target file systems supported here.

ADLS Gen2 Live Migration

When you add ADLS Gen2 as a live source, every live migration created with this file system has Target Match enabled. See Target Match for more information.

Limitations for live ADLS Gen2 source

  • The change feed events can be delayed by approximately 1 minute before being visible to Data Migrator.
  • For a live ADLS Gen2 source, the change feed option must be enabled on the storage account.
  • Events are not strictly ordered in ADLS Gen2, so Data Migrator takes a cautious approach to directory renames. A directory rename sometimes triggers a rescan of the paths involved. This creates additional work for Data Migrator but is necessary to keep the source and target consistent.
info

Contact Support for further information and any questions about live ADLS Gen2 as a source.

tip

There is a known limitation of the az storage fs file append command. The current implementation of this command doesn't produce events in the file system's event stream. Use alternative commands or methods if you want to test the live replication functionality with Data Migrator.
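Because change feed events can take roughly a minute to reach Data Migrator, a test that checks the target immediately after changing the source may fail even when replication works. The following is a minimal sketch of a polling helper for such tests. The check function passed to it (for example, one that looks for a file on your target) is hypothetical and not part of Data Migrator, and the timeout and interval values are illustrative assumptions.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout_s: float = 180.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call check() repeatedly until it returns True or the timeout expires.

    The default timeout is deliberately longer than the approximate
    one-minute change feed delay (an assumed safety margin).
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Pass a function that returns True once the expected change is visible on the target, for example wait_until(lambda: my_target_has_file("/data/test.txt")), where my_target_has_file is a helper you write yourself.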

Add ADLS Gen2 source with the UI

  1. From the Dashboard, select an instance under Instances.

  2. In the Filesystems & Agents menu, select Filesystems.

  3. Select Add source file system.

  4. Select Azure Data Lake Storage (ADLS) Gen2.

  5. Enter the following details:

    • Display Name - Enter a name for your file system.
    • Data Lake Storage Endpoint - The storage endpoint to connect to. You can override the default value (dfs.core.windows.net) by replacing it with a custom or private endpoint.
    • Authentication Type - The authentication type to use when connecting to your file system. Select either Shared Key or Service Principal (OAuth2).
  6. If you use Shared Key as the Authentication Type, enter the following details:

    • Account Name - The Microsoft Azure account name that owns the data lake storage.
    • Access Key - The access key associated with the Microsoft Azure account.
    • Container Name - The ADLS Gen2 container you want to migrate data from.
  7. If you use Service Principal (OAuth2) as the Authentication Type, enter the following details:

    • Account Name - The name of your ADLS Gen2 storage account.
    • Container Name - The ADLS Gen2 container you want to migrate data from.
    • Client ID - The client ID (also known as application ID) for your Azure service principal.
    • Secret - The client secret (also known as application secret) for the Azure service principal.
    • OAuth2 Endpoint - The client endpoint for the Azure service principal. Use the format https://login.microsoftonline.com/<tenant>/oauth2/v2.0/token where <tenant> is the directory ID for the Azure service principal.
  8. Select Use Secure Protocol to connect to Azure Data Lake Storage over TLS. This option is enabled by default.

  9. Under Filesystem Options, select Live Migration, or select One-time Migration to limit the available migration types to one-time and recurring. See migration types to learn more about each type.

  10. Select Save to add the file system.
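The OAuth2 Endpoint format described in step 7 can be sketched as a small helper that builds the URL from a directory (tenant) ID and checks that a value follows the same pattern. This is an illustrative sketch only: the function names are hypothetical, and the check covers only the login.microsoftonline.com format shown above, not other endpoints your environment may use.

```python
from urllib.parse import urlparse


def oauth2_token_endpoint(tenant_id: str) -> str:
    """Build the token endpoint for a directory (tenant) ID."""
    tenant_id = tenant_id.strip()
    if not tenant_id or "/" in tenant_id:
        raise ValueError("tenant_id must be a non-empty ID without slashes")
    return f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"


def looks_like_token_endpoint(url: str) -> bool:
    """Return True if url matches the format documented for OAuth2 Endpoint."""
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    return (
        parsed.scheme == "https"
        and parsed.netloc == "login.microsoftonline.com"
        and len(parts) == 4
        and parts[0] != ""
        and parts[1:] == ["oauth2", "v2.0", "token"]
    )
```

Checking the value before you paste it into the OAuth2 Endpoint field catches common mistakes such as a missing tenant segment or an http scheme.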

Add ADLS Gen2 source with the CLI

tip

Use either the filesystem add adls2 oauth or filesystem add adls2 sharedKey command depending on your file system's authentication type.

Specify the --scan-only option to configure the source file system as non-live.

OAuth

Add a live ADLS Gen2 source file system using the filesystem add adls2 oauth CLI command, which requires a service principal and OAuth 2 credentials.

See the official Microsoft documentation to find out more about OAuth and Azure.

Live OAuth example
filesystem add adls2 oauth --file-system-id myLiveSource
--storage-account-name myadls2
--oauth2-client-id b67f67ex-ampl-e2eb-bd6d-client9385id
--oauth2-client-secret 2IPO8__Secret__-9OPs8n*TexampleHJ=
--oauth2-client-endpoint https://login.microsoftonline.com/something/oauth2/v2.0/token
--container-name myContainer
--source

Add a non-live ADLS Gen2 source file system. Use the --scan-only option to configure the file system as non-live. Live changes from the event stream are not replicated.

Non-live OAuth example
filesystem add adls2 oauth --file-system-id mySource
--storage-account-name myadls2
--oauth2-client-id b67f67ex-ampl-e2eb-bd6d-client9385id
--oauth2-client-secret 2I____Secret____n*TexampleHJ=
--oauth2-client-endpoint https://login.microsoftonline.com/something/oauth2/v2.0/token
--container-name myContainer
--scan-only
--source

See the command reference for all options when using the filesystem add adls2 oauth command.
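If you generate these commands from a script, the live and non-live OAuth examples above differ only by the --scan-only flag. The following is a minimal sketch that assembles the command text from the flags shown in the examples. The function name is hypothetical, the output is a single-line command string, and the sketch does no quoting or escaping of values, so treat it as a starting point rather than a supported tool.

```python
def adls2_oauth_source_command(
    file_system_id: str,
    storage_account_name: str,
    client_id: str,
    client_secret: str,
    client_endpoint: str,
    container_name: str,
    scan_only: bool = False,
) -> str:
    """Assemble a 'filesystem add adls2 oauth' command for a source."""
    args = [
        "filesystem add adls2 oauth",
        f"--file-system-id {file_system_id}",
        f"--storage-account-name {storage_account_name}",
        f"--oauth2-client-id {client_id}",
        f"--oauth2-client-secret {client_secret}",
        f"--oauth2-client-endpoint {client_endpoint}",
        f"--container-name {container_name}",
    ]
    if scan_only:
        # --scan-only configures the source as non-live.
        args.append("--scan-only")
    args.append("--source")
    return " ".join(args)
```

Calling the function with scan_only=True produces the non-live form. Leaving it at the default produces the live form.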

Shared key

Add a live ADLS Gen2 source file system using the filesystem add adls2 sharedKey CLI command, which requires credentials in the form of an account key.

Live Shared key example
filesystem add adls2 sharedKey --file-system-id myLiveSource
--storage-account-name myadls2
--container-name myContainer
--shared-key Yi8NxHGqoQ79DBGLVn+COK__EXAMPLE_SHARED__vaS/NbzR5rtjEKEY31eIopUV
--source

Add a non-live ADLS Gen2 source file system using the filesystem add adls2 sharedKey command. Use the --scan-only option to configure the file system as non-live. Live changes from the event stream are not replicated.

Non-live Shared key example
filesystem add adls2 sharedKey --file-system-id mySource
--storage-account-name myadls2
--container-name myContainer
--shared-key Yi8NxHGqoQ79DBGLVn+COK__EXAMPLE_SHARED__vaS/NbzR5rtjEKEY31eIopUV
--scan-only
--source

See the command reference for all options when using the filesystem add adls2 sharedKey command.

Manage source filesystem

Remove or show the details of your source ADLS Gen2 file system using the CLI.

Additional information

Azure identity transformer superuser replacement

By default, when migrating from an ADLS Gen2 source, $superuser is replaced with the current Data Migrator user when it appears as the owner or owning group of a file or directory. This behavior is controlled by the fs.azure.identity.transformer.skip.superuser.replacement property, which defaults to false. To adjust this behavior and retain the $superuser ownership, set this property to true when adding your ADLS Gen2 source filesystem.

This property applies to $superuser ownership only.

Example setting property to true
filesystem add adls2 sharedKey --file-system-id adlsSource --storage-account-name StorageAC1 --container-name container1 --shared-key M2oSHAREDKEY2pNL== --source --scan-only --properties fs.azure.identity.transformer.skip.superuser.replacement=true

You can find the fs.azure.identity.transformer.skip.superuser.replacement property defined here in the official Hadoop documentation.
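As an illustration of how the property fits into the shared key command, the sketch below assembles the same single-line command as the example above, adding --properties only when $superuser ownership should be retained. The function and parameter names are hypothetical, values are not quoted or escaped, and only the single-property form shown in the example is covered.

```python
SKIP_SUPERUSER_REPLACEMENT = (
    "fs.azure.identity.transformer.skip.superuser.replacement"
)


def adls2_shared_key_source_command(
    file_system_id: str,
    storage_account_name: str,
    container_name: str,
    shared_key: str,
    scan_only: bool = False,
    keep_superuser: bool = False,
) -> str:
    """Assemble a 'filesystem add adls2 sharedKey' command for a source."""
    args = [
        "filesystem add adls2 sharedKey",
        f"--file-system-id {file_system_id}",
        f"--storage-account-name {storage_account_name}",
        f"--container-name {container_name}",
        f"--shared-key {shared_key}",
        "--source",
    ]
    if scan_only:
        args.append("--scan-only")
    if keep_superuser:
        # Setting the property to true retains $superuser ownership.
        args.append(f"--properties {SKIP_SUPERUSER_REPLACEMENT}=true")
    return " ".join(args)
```

With keep_superuser left at False, the property is omitted and the default behavior (replacement of $superuser) applies.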

Next steps

Configure a target filesystem to migrate data to. Then create a migration.

tip

When you create a migration with this source, the "Skip if Size Match" option provides more efficient handling of rename operations, preventing redundant retransfers.