Configure exclusions
Define exclusions to prevent specific directories and files from being migrated. You can use the following exclusion types:
Exclusion type | Description |
---|---|
Regex | Enter regular expression (regex) patterns for file and directory names (Java PCRE, Automata, or GLOB type). File paths that match the pattern are excluded. |
File size | Enter a number and select the file size unit (bytes, GiB, and so on). Files larger than the value are excluded. |
Date | Select a date and time in Coordinated Universal Time (UTC). Files modified before the specified date are excluded. |
Age | Files younger than the specified age (minutes, hours, or days) at the time of scanning are excluded. |
Live migrations and Age exclusions are incompatible. You can't apply an Age exclusion to a live migration.
Age, Date, and File Size exclusions apply to files only, not to folders.
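The rules in the table can be read as four simple predicates. The following is a minimal sketch of that reading, not Data Migrator's implementation: the `Entry` class and the function names are hypothetical, and treating a regex as a search anywhere in the path is an assumption.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Entry:
    """Hypothetical stand-in for a scanned file or directory."""
    path: str
    is_dir: bool
    size_bytes: int
    modified: datetime  # timezone-aware, in UTC


def excluded_by_regex(entry: Entry, pattern: str) -> bool:
    # Regex exclusions apply to files and directories alike.
    # Assumption: the pattern may match anywhere in the path.
    return re.search(pattern, entry.path) is not None


def excluded_by_size(entry: Entry, limit_bytes: int) -> bool:
    # Files only: excluded when strictly larger than the limit.
    return not entry.is_dir and entry.size_bytes > limit_bytes


def excluded_by_date(entry: Entry, cutoff: datetime) -> bool:
    # Files only: excluded when modified before the cutoff (UTC).
    return not entry.is_dir and entry.modified < cutoff


def excluded_by_age(entry: Entry, min_age: timedelta, scan_time: datetime) -> bool:
    # Files only: excluded when younger than min_age at scan time.
    return not entry.is_dir and (scan_time - entry.modified) < min_age


scan_time = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
recent = Entry("/data/logs/app.log", False, 10, scan_time - timedelta(hours=1))
print(excluded_by_age(recent, timedelta(hours=24), scan_time))  # True
```

The `is_dir` check in the size, date, and age predicates reflects the rule that those three exclusion types apply to files only.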
Configure exclusions
Create Exclusion Templates to match patterns or conditions, then apply them to migrations.
You can configure exclusions with the UI or the CLI.
The UI uses JAVA_PCRE as the default regex pattern type when you add an exclusion template. To add exclusion templates that require a GLOB or AUTOMATA pattern type, use the CLI.
Configure exclusions with the UI
Define and assign exclusions to new or existing migrations.
Add new exclusion template
Adding exclusions to an existing migration changes the future actions performed for that migration, but doesn't affect previously migrated content. To add new exclusions:
- In the Dashboard, select a product instance.
- Under Migrations, select Exclusion Templates.
- Select Add exclusion template.
- Under Template type, choose Regex, File Size, Date, or Age.
- Enter a Name and Description.
- Enter the exclusion condition.
  - For a Regex exclusion type:
    - Select the Regex Type and enter the Regex pattern. Files or directories that match the pattern will be excluded.
  - For a File Size exclusion type:
    - Enter a Value and select a Unit, for example, 100 GiB. Files larger than the stated value will be excluded.
  - For a Date exclusion type:
    - Under Date and Time (UTC), choose a date and time. Files modified before the date will be excluded.
  - For an Age exclusion type:
    - Enter a Value and select a Unit, for example, 24 Hours. Files modified in the last 24 hours will be excluded.
- Select Add once you've entered all necessary information to create your exclusion template.
With your Exclusion Template created, apply it to a migration or migrations.
When a file's modification date is before the date specified in a Date exclusion, Data Migrator excludes it from migration. If the file is subsequently changed after that date (for example, by a move or deletion), its modification date will be later than the exclusion date, and the exclusion won't apply.
Apply an Exclusion Template to a migration
In the Dashboard, select a product instance.
Under Migrations, select Data Migrations.
Under Data Migrations, select the migration you want to apply the exclusion to.
Select Exclusions.
Under Exclusions, choose the Exclusion Template to apply from the Add new exclusion field.
Note: You can add multiple exclusions to multiple migrations at once with the UI. For more information, see Bulk add exclusions to migrations with the UI.
You can add and remove exclusions when creating migrations. For more information, see Assign exclusions to a new migration.
Select Continue to apply the exclusion to this migration and update the list of applied Exclusion Templates.
When you use exclusions with a recurring migration, the number of excluded files reported in the migration status is a cumulative total across all recurrences of the migration.
Remove exclusions from the templates list
Removing exclusions from an existing migration changes the future actions performed for that migration, but doesn't affect previously migrated content. To remove exclusions:
Go to the Exclusion Templates page and search for the exclusion you want to remove.
Select the trash can icon.
Note: This doesn't remove the exclusion from an existing migration. For more information, see Remove exclusions from an existing migration.
You can't remove default exclusions from the templates list, but you can remove them from an existing migration.
Configure exclusions with the CLI
Exclusions restrict content migrated from a source filesystem. Adding exclusions to an existing migration changes the future actions performed for that migration, but doesn't affect previously migrated content.
Define exclusions
Define exclusions so you can apply them to migrations.
Command | Action |
---|---|
exclusion add date | Create a new date-based rule |
exclusion add file-size | Create a new file size rule |
exclusion add regex | Create a new regex exclusion rule |
exclusion add age | Create a new age exclusion rule |
Manage exclusions
Command | Action |
---|---|
exclusion check regex | Check whether a regex pattern matches a given path |
exclusion delete | Delete an exclusion rule |
exclusion list | List all exclusion rules |
exclusion show | Get details for a particular exclusion rule |
exclusion source list | See restrictions applied automatically on the source file system |
exclusion target list | See restrictions applied automatically on the target file system |
exclusion user-defined list | List all user-defined restrictions |
For more information on how to use these commands, see Exclusion commands.
Default exclusions
Data Migrator automatically applies default exclusions to specific filesystems depending on the platform. For example, Azure Data Lake Storage (ADLS) Gen2 filesystem types have a maximum individual file size limit of 4.55 TiB, and any larger files are automatically excluded.
You can remove default exclusions from the migration, but not from the system or the exclusion templates list.
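The 4.55 TiB figure quoted above is consistent with the 5 TB value listed in the ADLS Gen2 default exclusions table, assuming TB there means a decimal terabyte (10^12 bytes). That interpretation is an assumption, and the helper name is hypothetical. A short check of the arithmetic:

```python
TB = 10**12   # decimal terabyte, in bytes
TIB = 2**40   # binary tebibyte, in bytes


def tb_to_tib(terabytes: float) -> float:
    """Convert decimal terabytes to binary tebibytes."""
    return terabytes * TB / TIB


print(round(tb_to_tib(5), 2))  # 4.55
```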
Hadoop Distributed File System
The default exclusions are:
Exclusion | Exclusion type | Description |
---|---|---|
.\\._COPYING_$ | Regex (JAVA_PCRE) | Hadoop Distributed File System (HDFS) copying files |
/**/.hive-staging** | Regex (GLOB) | Hive staging directories |
/**/.spark-staging-** | Regex (GLOB) | Spark staging directories |
/**/_temporary** | Regex (GLOB) | Spark temporary directories |
(/|/.*/)\\.Trash(/.*)? | Regex (Automata) | HDFS trash directories |
(/|/.*/)\\.snapshot(/.*)? | Regex (Automata) | HDFS Snapshot directories |
The Hive or Spark directories are used to stage temporary files during Hive or Spark jobs. These are automatically deleted by Hive or Spark after use, and are excluded by default to avoid the migration of unnecessary data.
The HDFS Snapshot and trash directories are (generally) only relevant to the local cluster and excluded to avoid migration of unnecessary data.
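To see which paths the trash and snapshot patterns are meant to catch, the sketch below tests them with Python's `re` module. It rests on three assumptions: the doubled backslashes in the table are escaping for a single backslash, an Automata pattern must match the whole path, and Python's regex syntax behaves the same as Automata for patterns this simple. The function name is hypothetical.

```python
import re

# Patterns from the table, with "\\." read as an escaped dot (assumption).
TRASH = re.compile(r"(/|/.*/)\.Trash(/.*)?")
SNAPSHOT = re.compile(r"(/|/.*/)\.snapshot(/.*)?")


def is_trash_or_snapshot(path: str) -> bool:
    # fullmatch approximates whole-path matching.
    return bool(TRASH.fullmatch(path) or SNAPSHOT.fullmatch(path))


print(is_trash_or_snapshot("/user/alice/.Trash/Current/report.csv"))  # True
print(is_trash_or_snapshot("/data/.snapshot/s1"))                     # True
print(is_trash_or_snapshot("/data/report.csv"))                       # False
```

To check a pattern against Data Migrator itself, use the `exclusion check regex` command listed under Manage exclusions.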
Azure Data Lake Storage (ADLS) Gen2
The default exclusions are:
Exclusion | Exclusion type | Description |
---|---|---|
[.|\\/]$ | Regex (JAVA_PCRE) | File names cannot end with . or ' ' |
.*([^\\/]*\\/){61,}.* | Regex (Automata) | Blob names cannot exceed 61 path segments |
.{1025,} | Regex (JAVA_PCRE) | File name length cannot exceed 1024 |
.*[\\\\].* | Regex (JAVA_PCRE) | Filepath or name cannot include a backslash. |
5 TB | File size | File size cannot exceed 5 TB |
These exclusions cover many of the limitations set by ADLS Gen2 directory and file naming rules.
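As an illustration of two of these rules, the sketch below applies the length and backslash patterns with Python's `re` module. It assumes that the doubled backslashes in the table are documentation escaping, and that Python's regex engine is an adequate stand-in for JAVA_PCRE on these patterns. The function name is hypothetical.

```python
import re

TOO_LONG = re.compile(r".{1025,}")         # 1025 or more characters
HAS_BACKSLASH = re.compile(r".*[\\].*")    # any backslash in the path


def adls_name_problems(path: str) -> list[str]:
    """Return the reasons a path would hit these two default exclusions."""
    problems = []
    if TOO_LONG.search(path):
        problems.append("longer than 1024 characters")
    if HAS_BACKSLASH.search(path):
        problems.append("contains a backslash")
    return problems


print(adls_name_problems("/data/" + "x" * 1018))  # [] (exactly 1024 characters)
print(adls_name_problems("/data\\report.csv"))    # ['contains a backslash']
```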
Google Cloud Storage
The default exclusions are:
Exclusion | Exclusion type | Description |
---|---|---|
.*[\r\n].* | Regex (Automata) | File name cannot contain carriage return or line feeds |
.{1025,} | Regex (JAVA_PCRE) | File name length cannot exceed 1024 |
\\.\\.? | Regex (Automata) | File name cannot be named . or .. |
16 TB | File size | File size cannot exceed 16 TB |
These exclusions cover the limitations set by Google Cloud object naming guidelines.
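The two Automata patterns can be approximated in Python as follows. This is a sketch under assumptions: the patterns are applied to a single name and must match it in full, and `re.DOTALL` is used so that `.` also matches line breaks. The function name is hypothetical.

```python
import re

# DOTALL lets "." span line breaks, so names with several of them still match.
HAS_LINE_BREAK = re.compile(r".*[\r\n].*", re.DOTALL)
DOT_NAME = re.compile(r"\.\.?")  # exactly "." or ".."


def gcs_name_rejected(name: str) -> bool:
    return bool(HAS_LINE_BREAK.fullmatch(name) or DOT_NAME.fullmatch(name))


print(gcs_name_rejected(".."))          # True
print(gcs_name_rejected("a\nb"))        # True
print(gcs_name_rejected("report.csv"))  # False
```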