Configure an ADLS Gen2 target
You can migrate data to an Azure Data Lake Storage (ADLS) Gen2 filesystem by configuring it as a target filesystem for Data Migrator.
For maximum ingress rates, capacity, request rates, and other scalability and performance targets for Azure Storage, see the Microsoft documentation.
The default configuration for ADLS Gen2 storage allows a maximum file size of 400 GB. See the related knowledge base article for steps and guidance on increasing the maximum file size.
You can authenticate to an ADLS Gen2 filesystem with OAuth 2.0 or a shared key. Select an option below to get started:
- OAuth 2.0
- Shared key
Configure an ADLS Gen2 target with OAuth 2.0
Prerequisites
You need the following:
- A service principal with either the Storage Blob Data Owner role assigned to the ADLS Gen2 storage account, or an access control list (ACL) that grants RWX permissions on the parent of each migration path. If you have many migration paths, set the ACLs on the parent of each path or on one common parent. For more information, see the Microsoft documentation.
- Your OAuth 2.0 credentials. See more information on credentials below.
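The ACL prerequisite can be worked out from a list of migration paths. The following is a minimal sketch, not part of Data Migrator: the function names and the sample paths are hypothetical, and it assumes absolute, POSIX-style migration paths. It lists the parent directory of each path and one common parent that covers them all.

```python
import posixpath


def acl_parents(migration_paths):
    """Return the parent directory of each migration path, de-duplicated and sorted."""
    return sorted({posixpath.dirname(p.rstrip("/")) or "/" for p in migration_paths})


def common_parent(migration_paths):
    """Return a single directory that is an ancestor of every migration path."""
    return posixpath.commonpath(acl_parents(migration_paths))


# Hypothetical migration paths, used only for illustration.
paths = ["/data/sales/2023", "/data/sales/2024", "/data/hr/payroll"]
print(acl_parents(paths))    # directories that each need an RWX ACL
print(common_parent(paths))  # or one directory that covers all of them
```

Setting the ACL on the common parent is less work, but it grants access to a wider part of the filesystem than per-path parents do.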
Configure an ADLS Gen2 target filesystem with OAuth 2.0 in the UI
From the Dashboard, select an instance under Instances.
In the Filesystems & Agents menu, select Filesystems.
Select Add target filesystem.
Enter the following details:
- Filesystem Type - The type of filesystem target. Select Azure Data Lake Storage (ADLS) Gen2.
- Display Name - Enter a name for your target filesystem.
- Data Lake Storage Endpoint - The storage endpoint to connect to. You can override the default value (dfs.core.windows.net) by replacing it with a custom or private endpoint.
- Authentication Type - Select Service Principal (OAuth2).
- Account Name - The name of your ADLS Gen2 storage account.
- Container Name - The name of the container in your storage account that you want to migrate data to.
- Client ID - The client ID (also known as application ID) for your Azure service principal.
- Secret - The client secret (also known as application secret) for the Azure service principal.
- OAuth2 Endpoint - The client endpoint for the Azure service principal. Use the format https://login.microsoftonline.com/<tenant>/oauth2/v2.0/token, where <tenant> is the directory ID for the Azure service principal.
- Use Secure Protocol - When enabled, Data Migrator will use TLS to connect to the Azure Data Lake Storage. Enabled by default.
Select Save. You can now use your ADLS Gen2 target in data migrations.
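The OAuth2 Endpoint value follows a fixed pattern, so it can be derived from the directory (tenant) ID. This is a minimal sketch; the helper name is hypothetical, and the tenant ID shown is the placeholder value used in the CLI example on this page.

```python
def oauth2_token_endpoint(tenant_id):
    """Build the OAuth2 endpoint URL for a given directory (tenant) ID."""
    tenant_id = tenant_id.strip()
    if not tenant_id:
        raise ValueError("tenant (directory) ID must not be empty")
    return f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"


# Placeholder tenant ID, not a real directory.
print(oauth2_token_endpoint("78u098ex-ampl-e498-8bce-ndpoint5f2e5"))
```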
Configure an ADLS Gen2 target filesystem with OAuth 2.0 in the CLI
To create an ADLS Gen2 target with OAuth 2.0 in the Data Migrator CLI, run the filesystem add adls2 oauth command:
filesystem add adls2 oauth [--container-name] string
[--file-system-id] string
[--insecure]
[--oauth2-client-endpoint] string
[--oauth2-client-id] string
[--oauth2-client-secret] string
[--properties] string
[--properties-files] list
[--source]
[--storage-account-name] string
Mandatory parameters
--container-name
The name of the container in the storage account to which content will be migrated.
--file-system-id
The ID to give the new filesystem resource.
--oauth2-client-endpoint
The client endpoint for the Azure service principal. This often takes the form of https://login.microsoftonline.com/<tenant>/oauth2/v2.0/token, where <tenant> is the directory ID for the Azure service principal. You can enter a custom URL (such as a proxy endpoint that manually interfaces with Azure Active Directory (Azure AD)).
--oauth2-client-id
The client ID (also known as application ID) for your Azure service principal.
--oauth2-client-secret
The client secret (also known as application secret) for the Azure service principal.
--storage-account-name
The name of the ADLS Gen2 storage account to target.
Optional parameters
--insecure
If you enter this parameter, Data Migrator will not use TLS to encrypt communication with ADLS Gen2. This may improve throughput, but should only be used when you have other means of securing communication.
--properties
Enter properties to use in a comma-separated key/value list.
--properties-files
Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
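Files passed to --properties-files use the Hadoop configuration layout: a configuration root element containing property entries, each with a name and a value. The following minimal sketch generates such a file's content. The function name and the property name and value are hypothetical placeholders, not settings that Data Migrator requires.

```python
import xml.etree.ElementTree as ET


def hadoop_properties_xml(properties):
    """Render a dict of properties in the core-site.xml / hdfs-site.xml layout."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; requires Python 3.9 or later
    return ET.tostring(root, encoding="unicode")


# Hypothetical property, used only to show the file layout.
print(hadoop_properties_xml({"example.property.name": "example-value"}))
```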
Other parameters
Exclude this parameter from the command unless you want to create an ADLS Gen2 source filesystem:
--source
This parameter creates the filesystem as a source.
Example
filesystem add adls2 oauth --file-system-id mytarget
--storage-account-name myadls2
--oauth2-client-id b67f67ex-ampl-e2eb-bd6d-client9385id
--oauth2-client-secret 2IPO8*secretk-9OPs8n*TexampleHJ=
--oauth2-client-endpoint https://login.microsoftonline.com/78u098ex-ampl-e498-8bce-ndpoint5f2e5/oauth2/v2.0/token
--container-name lm2target
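If you generate CLI commands from a script, a small helper can check that every mandatory parameter is present before assembling the command. This is a sketch under stated assumptions: the helper is hypothetical, it uses only the flags documented above, and it assumes values contain no spaces or characters that need quoting in the Data Migrator CLI.

```python
def build_adls2_oauth_command(file_system_id, storage_account_name, container_name,
                              client_id, client_secret, client_endpoint,
                              insecure=False):
    """Assemble a 'filesystem add adls2 oauth' command as a single line."""
    required = {
        "--file-system-id": file_system_id,
        "--storage-account-name": storage_account_name,
        "--oauth2-client-id": client_id,
        "--oauth2-client-secret": client_secret,
        "--oauth2-client-endpoint": client_endpoint,
        "--container-name": container_name,
    }
    missing = [flag for flag, value in required.items() if not value]
    if missing:
        raise ValueError("missing mandatory parameters: " + ", ".join(missing))
    parts = ["filesystem add adls2 oauth"]
    parts += [f"{flag} {value}" for flag, value in required.items()]
    if insecure:
        parts.append("--insecure")  # disables TLS; see Optional parameters
    return " ".join(parts)


# Placeholder values taken from the example above.
print(build_adls2_oauth_command(
    file_system_id="mytarget",
    storage_account_name="myadls2",
    container_name="lm2target",
    client_id="b67f67ex-ampl-e2eb-bd6d-client9385id",
    client_secret="2IPO8*secretk-9OPs8n*TexampleHJ=",
    client_endpoint="https://login.microsoftonline.com/78u098ex-ampl-e498-8bce-ndpoint5f2e5/oauth2/v2.0/token",
))
```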
Configure an ADLS Gen2 target with a shared key
Prerequisites
You need the following:
- An account key for your ADLS Gen2 storage.
Configure an ADLS Gen2 target filesystem with a shared key in the UI
From the Dashboard, select an instance under Instances.
In the Filesystems & Agents menu, select Filesystems.
Select Add target filesystem.
Enter the following details:
- Filesystem Type - The type of filesystem target. Select Azure Data Lake Storage (ADLS) Gen2.
- Display Name - Enter a name for your target filesystem.
- Data Lake Storage Endpoint - The storage endpoint to connect to. You can override the default value (dfs.core.windows.net) by replacing it with a custom or private endpoint.
- Authentication Type - Select Shared Key.
- Account Name - The name of your ADLS Gen2 storage account.
- Access Key - The shared account key to use when writing data to the storage account during a migration.
- Container Name - The name of the container in your storage account that you want to migrate data to.
- Use Secure Protocol - When enabled, Data Migrator will use TLS to connect to the Azure Data Lake Storage. Enabled by default.
Select Save. You can now use your ADLS Gen2 target in data migrations.
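To see how the account name, container name, storage endpoint, and secure protocol setting relate to each other, it can help to look at the standard Hadoop ABFS URI shape, abfs[s]://<container>@<account>.<endpoint>/<path>. The sketch below is illustrative only: this page does not describe how Data Migrator builds its connection internally, the helper is hypothetical, and joining the account name to a custom or private endpoint in the same way is an assumption.

```python
def abfs_uri(account_name, container_name, endpoint="dfs.core.windows.net",
             secure=True, path="/"):
    """Compose a Hadoop ABFS-style URI from the target filesystem details."""
    scheme = "abfss" if secure else "abfs"  # abfss uses TLS, abfs does not
    return f"{scheme}://{container_name}@{account_name}.{endpoint}/{path.lstrip('/')}"


# Placeholder account and container names from the CLI examples on this page.
print(abfs_uri("myadls2", "lm2target"))
print(abfs_uri("myadls2", "lm2target", secure=False, path="/data/sales"))
```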
Configure an ADLS Gen2 target filesystem with a shared key in the CLI
To create an ADLS Gen2 target with a shared key in the Data Migrator CLI, run the filesystem add adls2 sharedKey command:
filesystem add adls2 sharedKey [--file-system-id] string
[--storage-account-name] string
[--container-name] string
[--insecure]
[--shared-key] string
[--properties-files] list
[--properties] string
Mandatory parameters
--file-system-id
The ID to give the new filesystem resource.
--storage-account-name
The name of the ADLS Gen2 storage account to target.
--shared-key
The shared account key to use as credentials to write to the storage account.
--container-name
The name of the container in the storage account to which content will be migrated.
Optional parameters
--insecure
If you enter this parameter, Data Migrator will not use TLS to encrypt communication with ADLS Gen2. This may improve throughput, but should only be used when you have other means of securing communication.
--properties-files
Reference a list of existing properties files, each containing Hadoop configuration properties in the format used by core-site.xml or hdfs-site.xml.
--properties
Enter properties to use in a comma-separated key/value list.
Other parameters
Exclude this parameter from the command unless you want to create an ADLS Gen2 source filesystem:
--source
This parameter creates the filesystem as a source.
Example
filesystem add adls2 sharedKey --file-system-id mytarget
--storage-account-name myadls2
--container-name lm2target
--shared-key Yi8NxHGqoQ79DBGLVn+COK/sRDwbNqAEXAMPLEDaMxRkvXt2ijUtASHAREDj/vaS/NbzR5rtjEKEY31eIopUVA==
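Azure storage account keys are Base64-encoded strings, so a truncated or mistyped --shared-key value can often be caught before you run the command. The following is a minimal pre-flight sketch with a hypothetical helper name. It only checks that the value decodes as Base64; it does not confirm that the key is valid for your storage account.

```python
import base64
import binascii


def looks_like_account_key(shared_key):
    """Return True if the value is non-empty, well-formed Base64."""
    try:
        decoded = base64.b64decode(shared_key, validate=True)
    except (binascii.Error, ValueError):
        return False
    return len(decoded) > 0


# Placeholder key from the example above, not a real credential.
example_key = ("Yi8NxHGqoQ79DBGLVn+COK/sRDwbNqAEXAMPLEDaMxRkvXt2ijUt"
               "ASHAREDj/vaS/NbzR5rtjEKEY31eIopUVA==")
print(looks_like_account_key(example_key))
```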
Metadata handling properties
Add the following Data Migrator application properties to /etc/wandisco/livedata-migrator/application.properties to control ACL, permission, and owner metadata operations for ADLS Gen2 targets. If not specified, the default values are used.
See Access control model in Azure Data Lake Storage Gen2 for more information on Azure access control.
Set these properties to true for filesystems that don't require transfer of source content ownership, ACL or permissions information, or where the authorization granted to the credentials you've used to access your storage does not allow these operations to be performed.
Property | Default | Description |
---|---|---|
adls2.fs.metadata.acl.ignore | false | When set to true, Data Migrator will not attempt to perform any setAcls operation against an ADLS Gen2 target. |
adls2.fs.metadata.perms.ignore | false | When set to true, Data Migrator will not attempt to perform any setPermission operation against an ADLS Gen2 target. This also affects the health check for the target filesystem: it will not report an error if the principal under which Data Migrator operates cannot perform setPermission operations, and a migration that would otherwise be blocked for that reason is allowed to start. |
adls2.fs.metadata.owner.ignore | false | When set to true, Data Migrator will not attempt to perform any setOwner operations against an ADLS Gen2 target. |
Add ADLS Gen2 metadata handling properties
To add ADLS Gen2 target metadata handling properties to Data Migrator:
Open /etc/wandisco/livedata-migrator/application.properties.
Add each property and value to a new line.
adls2.fs.metadata.acl.ignore=true
adls2.fs.metadata.perms.ignore=true
adls2.fs.metadata.owner.ignore=true
Save the changes.
Restart the Data Migrator service to apply the change. See System service commands - Data Migrator.
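If you manage application.properties with a script rather than by hand, the edit can be sketched as a function that updates existing keys and appends missing ones. This is a minimal illustration with a hypothetical helper name and a hypothetical pre-existing property. It works on the file content as text and leaves reading, writing, and restarting the service to you.

```python
METADATA_IGNORE = {
    "adls2.fs.metadata.acl.ignore": "true",
    "adls2.fs.metadata.perms.ignore": "true",
    "adls2.fs.metadata.owner.ignore": "true",
}


def set_properties(text, updates):
    """Return properties text with each key in updates set exactly once."""
    remaining = dict(updates)
    lines = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key in remaining:
            # Replace an existing entry in place.
            lines.append(f"{key}={remaining.pop(key)}")
        else:
            lines.append(line)
    # Append any keys that were not already present.
    lines.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(lines) + "\n"


# Hypothetical existing file content, used only for illustration.
existing = "example.existing.property=1\nadls2.fs.metadata.acl.ignore=false\n"
print(set_properties(existing, METADATA_IGNORE))
```

Running the function a second time on its own output leaves the text unchanged, so it is safe to apply repeatedly.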
Next steps
If you haven't already, configure a source filesystem from which to migrate data. Then, you can create a migration to migrate data to your new ADLS Gen2 target.