Configure exclusions
Define exclusions to prevent specific directories and files from being migrated. You can use the following exclusion types:
Exclusion type | Description |
---|---|
Regex | Enter a regular expression (regex) pattern of type Java PCRE, Automata, or GLOB to match file and directory names. Filepaths that match the pattern are excluded. |
File size | Enter a number and select the file size unit (bytes, GiB, and so on). Files larger than the value are excluded. |
Date | Select a date and time in Coordinated Universal Time (UTC). Only files modified after that date and time are migrated; files modified on or before it are excluded. |
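The following minimal Python sketch makes the matching rules in the table concrete by checking a single file against the three exclusion types. It is not Data Migrator's implementation. The `is_excluded` function and its parameters are hypothetical. Regex matching is approximated with Python's `re.search`, so a pattern can match anywhere in the path, and Python's regex flavor is not identical to Java's. The sketch also assumes a file is excluded if any one configured exclusion matches it.

```python
import re
from datetime import datetime, timezone


def is_excluded(path, size_bytes, modified, regex=None, max_size_bytes=None, cutoff=None):
    """Hypothetical check of one file against regex, file size, and date exclusions."""
    # Regex exclusion: assumed "find" semantics (pattern may match anywhere in the path).
    if regex is not None and re.search(regex, path):
        return True
    # File size exclusion: only files strictly larger than the limit are excluded.
    if max_size_bytes is not None and size_bytes > max_size_bytes:
        return True
    # Date exclusion: files modified on or before the UTC cutoff are excluded.
    if cutoff is not None and modified <= cutoff:
        return True
    return False


GIB = 2**30
cutoff = datetime(2024, 1, 1, tzinfo=timezone.utc)
recent = datetime(2024, 6, 1, tzinfo=timezone.utc)

print(is_excluded("/data/big.bin", 200 * GIB, recent, max_size_bytes=100 * GIB))
print(is_excluded("/data/old.csv", 1024, cutoff, cutoff=cutoff))
print(is_excluded("/data/new.csv", 1024, recent, regex=r"\.tmp$", cutoff=cutoff))
```

A file modified exactly at the cutoff is excluded, matching the "on or before" wording in the table.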
Default exclusions
Data Migrator automatically applies default exclusions to specific filesystems, depending on the platform. For example, Azure Data Lake Storage (ADLS) Gen2 has a maximum individual file size of 4.55 TiB (5 TB), so larger files are automatically excluded.
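The 4.55 TiB figure and the 5 TB limit in the ADLS Gen2 table below are the same size expressed in binary and decimal units. This short calculation assumes "TB" means 10^12 bytes and "TiB" means 2^40 bytes.

```python
TB = 10**12   # decimal terabyte (assumed meaning of "TB")
TIB = 2**40   # binary tebibyte

limit_bytes = 5 * TB
print(f"{limit_bytes / TIB:.2f} TiB")
```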
You can remove default exclusions from the migration, but not from the system or the exclusion templates list.
Hadoop Distributed File System
The default exclusions are:
Exclusion | Exclusion type | Description |
---|---|---|
.\\._COPYING_$ | Regex (JAVA_PCRE) | Temporary files with the ._COPYING_ suffix, created while files are being copied into Hadoop Distributed File System (HDFS) |
/**/.hive-staging** | Regex (GLOB) | Hive staging directories |
/**/.spark-staging-** | Regex (GLOB) | Spark staging directories |
/**/_temporary** | Regex (GLOB) | Spark temporary directories |
(/|/.*/)\\.Trash(/.*)? | Regex (Automata) | HDFS trash directories |
(/|/.*/)\\.snapshot(/.*)? | Regex (Automata) | HDFS Snapshot directories |
Hive and Spark use the staging and temporary directories to hold intermediate files while jobs run. Hive or Spark deletes these directories automatically after use, so they're excluded by default to avoid migrating unnecessary data.
HDFS snapshot and trash directories are generally relevant only to the local cluster, so they're also excluded to avoid migrating unnecessary data.
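The three pattern types in the HDFS table behave differently. The following Python sketch checks sample paths against the HDFS defaults using rough stand-ins, not Data Migrator's matchers: `re.search` for JAVA_PCRE, `fnmatch` for GLOB (where `*` also matches `/`), and `re.fullmatch` with `re.DOTALL` for Automata, assuming Automata patterns must match the whole path. The patterns are written with single backslashes, assuming the doubled backslashes in the table are an escaped form. `matches_hdfs_default` is a hypothetical helper.

```python
import fnmatch
import re

# HDFS default patterns, with the table's doubled backslashes unescaped once (assumption).
JAVA_PCRE = [r".\._COPYING_$"]
GLOB = ["/**/.hive-staging**", "/**/.spark-staging-**", "/**/_temporary**"]
AUTOMATA = [r"(/|/.*/)\.Trash(/.*)?", r"(/|/.*/)\.snapshot(/.*)?"]


def matches_hdfs_default(path: str) -> bool:
    # JAVA_PCRE approximated with re.search: the pattern may match anywhere.
    if any(re.search(p, path) for p in JAVA_PCRE):
        return True
    # GLOB approximated with fnmatch; fnmatch's "*" also matches "/".
    if any(fnmatch.fnmatchcase(path, p) for p in GLOB):
        return True
    # Automata approximated with a whole-path match; DOTALL lets "." match any character.
    return any(re.fullmatch(p, path, re.DOTALL) for p in AUTOMATA)


for sample in [
    "/warehouse/sales/.hive-staging_hive_2024-01-01",
    "/user/alice/.Trash/Current/report.csv",
    "/data/input.csv._COPYING_",
    "/data/input.csv",
]:
    print(sample, matches_hdfs_default(sample))
```

Because the Automata patterns require a whole-path match, a directory such as /data/.Trashcan is not treated as a trash directory.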
Azure Data Lake Storage (ADLS) Gen2
The default exclusions are:
Exclusion | Exclusion type | Description |
---|---|---|
[.|\\/]$ | Regex (JAVA_PCRE) | File names cannot end with . or / |
.*([^\\/]*\\/){61,}.* | Regex (Automata) | Blob names cannot exceed 61 path segments |
.{1025,} | Regex (JAVA_PCRE) | File name length cannot exceed 1024 characters |
.*[\\\\].* | Regex (JAVA_PCRE) | Filepath or name cannot include a backslash. |
5 TB | File size | File size cannot exceed 5 TB |
These exclusions cover many of the limitations set by ADLS Gen2 directory and file naming rules.
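The following Python sketch applies the ADLS Gen2 defaults to a path and file size. It assumes JAVA_PCRE patterns can match anywhere in the path (`re.search`), Automata patterns must match the whole path (`re.fullmatch`), the table's doubled backslashes are an escaped form, and "5 TB" means 5 × 10^12 bytes. `adls_default_excluded` is a hypothetical helper, not part of Data Migrator.

```python
import re

TB = 10**12  # assumption: "5 TB" means 5 * 10**12 bytes
MAX_SIZE = 5 * TB

# ADLS Gen2 default patterns, with the table's doubled backslashes unescaped once (assumption).
PCRE = [
    r"[.|\/]$",   # trailing "." or "/" (inside [...], "|" is also a literal character)
    r".{1025,}",  # 1,025 or more characters
    r".*[\\].*",  # contains a backslash
]
AUTOMATA = [r".*([^\/]*\/){61,}.*"]  # 61 or more "segment/" groups


def adls_default_excluded(path: str, size_bytes: int) -> bool:
    if any(re.search(p, path) for p in PCRE):
        return True
    if any(re.fullmatch(p, path, re.DOTALL) for p in AUTOMATA):
        return True
    # File size exclusion: strictly larger than the limit.
    return size_bytes > MAX_SIZE


print(adls_default_excluded("/data/report.csv", 1))
print(adls_default_excluded("/data/report.", 1))
```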
Google Cloud Storage
The default exclusions are:
Exclusion | Exclusion type | Description |
---|---|---|
.*[\r\n].* | Regex (Automata) | File name cannot contain carriage return or line feed characters |
.{1025,} | Regex (JAVA_PCRE) | File name length cannot exceed 1024 |
\\.\\.? | Regex (Automata) | File name cannot be . or .. |
16 TB | File size | File size cannot exceed 16 TB |
These exclusions cover the limitations set by Google Cloud object naming guidelines.
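As with the other defaults, here is a Python sketch of the Google Cloud Storage rules applied to a single file name. It assumes Automata patterns must match the whole name and that their "." matches any character, including line breaks (emulated with `re.DOTALL`). It also assumes the patterns are checked against a file name rather than a full path, and that "16 TB" means 16 × 10^12 bytes. `gcs_default_excluded` is a hypothetical helper.

```python
import re

TB = 10**12  # assumption: "16 TB" means 16 * 10**12 bytes


def gcs_default_excluded(name: str, size_bytes: int) -> bool:
    # Automata: whole-name match; DOTALL so ".*" can span other line breaks.
    if re.fullmatch(r".*[\r\n].*", name, re.DOTALL):
        return True
    # Automata: the name is exactly "." or "..".
    if re.fullmatch(r"\.\.?", name):
        return True
    # JAVA_PCRE: 1,025 or more characters.
    if re.search(r".{1025,}", name):
        return True
    # File size exclusion: strictly larger than 16 TB.
    return size_bytes > 16 * TB


print(gcs_default_excluded("report.csv", 1))
print(gcs_default_excluded("..", 1))
```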
Configure exclusions
You can define additional exclusions and apply them to specific migrations so that matching content is ignored.
You can configure exclusions with the UI or the CLI.
The UI uses JAVA_PCRE as the regex pattern type when you add an exclusion template. To add an exclusion template that requires the GLOB or AUTOMATA pattern type, use the CLI.
Configure exclusions with the UI
Define and assign exclusions to new or existing migrations.
Add new exclusions
Adding exclusions to an existing migration changes the future actions performed for that migration, but doesn't affect previously migrated content. To add new exclusions:
1. In the Dashboard, select a product instance.
2. Under Migrations, select Exclusion Templates.
3. Select Add exclusion template to associate the exclusion with the selected filesystem, and enter the parameters for the exclusion:
   - Exclusion type - Regex, File Size, or Date.
   - Name - The name of the exclusion template. For example, 100gibfilelimit.
   - Description - A brief description of what the exclusion does. For example, "Files larger than 100GiB are excluded".
   - Value / Unit (File Size exclusions) - The value and unit for the file size limit. For example, 100 GiB.
   - Regex (Regex exclusions) - The regex pattern to use for the filename exclusion. For example, ^test\.*
   - Select Date (Date exclusions) - Files modified on or before the selected date are excluded from migrations.
4. Select Add once you've entered all the necessary information to create the exclusion template.
5. Under Migrations, select Data Migrations.
6. From the Bulk Action dropdown list, select Add exclusions.
7. Select the migrations in the list to which you want to apply the exclusion.
   Note: You can add multiple exclusions to multiple migrations at once with the UI. For more information, see Bulk add exclusions to migrations with the UI. You can also add and remove exclusions when you create migrations. For more information, see Assign exclusions to a new migration.
8. Select the exclusion you want to apply from the Add Exclusions dropdown list.
9. Select Submit to apply the exclusion.
When you use exclusions with a recurring migration, the number of excluded files reported in the migration status is a cumulative total across all recurrences of the migration.
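The cumulative count can be illustrated with a small Python sketch. The per-recurrence counts are hypothetical numbers, and the sketch assumes each recurrence's excluded-file count is simply added to the running total shown in the migration status.

```python
from itertools import accumulate

# Hypothetical number of files excluded during each recurrence of one migration.
excluded_per_run = [3, 5, 2]

# Running total after each recurrence, as the status would report it under this assumption.
reported = list(accumulate(excluded_per_run))
print(reported)
```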
Remove exclusions from the templates list
Removing exclusions from an existing migration changes the future actions performed for that migration, but doesn't affect previously migrated content. To remove an exclusion from the templates list:
1. Go to the Exclusion Templates page and search for the exclusion you want to remove.
2. Select the trash can icon.
Note: This doesn't remove the exclusion from an existing migration. For more information, see Remove exclusions from an existing migration.
You can't remove default exclusions from the templates list, but you can remove them from an existing migration.
Configure exclusions with the CLI
Exclusions restrict content migrated from a source filesystem. Adding exclusions to an existing migration changes the future actions performed for that migration, but doesn't affect previously migrated content.
Define exclusions
Define exclusions so you can apply them to migrations.
Command | Action |
---|---|
exclusion add date | Create a new date-based rule |
exclusion add file-size | Create a new file size rule |
exclusion add regex | Create a new regex exclusion rule |
Manage exclusions
Command | Action |
---|---|
exclusion delete | Delete an exclusion rule |
exclusion list | List all exclusion rules |
exclusion show | Get details for a particular exclusion rule |
For more information on how to use these commands, see Exclusion commands.