Version: 3.0 (latest)

Configure exclusions

Define exclusions to prevent specific directories and files from being migrated. The following exclusion types are available:

Exclusion type  Description
Regex           Enter regular expression (regex) patterns for file and directory names (of Java PCRE, Automata, or GLOB type). File paths that match the regex are excluded.
File size       Enter a number and select the file size unit (bytes, GiB, and so on). Files larger than the value are excluded.
Date            Select a date and time in Coordinated Universal Time (UTC). Files modified before the specified date are excluded.
Age             Enter a value and select a unit (minutes, hours, days). Files less than this age at the time of scanning are excluded.
info

Live migrations and Age exclusion are incompatible. You can't apply an Age exclusion to a live migration.

note

Age, Date and File Size exclusions apply to files only and not to folders.
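
As a rough illustration of how the four exclusion types behave, the following Python sketch models each rule as a predicate over a scanned entry. The `Entry` type and the function names are hypothetical and are not part of Data Migrator. The regex case assumes search-style matching against the full path, which may differ from the product's behavior for a given pattern type.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Entry:
    """Hypothetical scan record; not a Data Migrator type."""
    path: str
    is_dir: bool
    size_bytes: int
    modified: datetime  # timezone-aware, UTC


def excluded_by_regex(entry: Entry, pattern: str) -> bool:
    # Regex exclusions can match both files and directories.
    return re.search(pattern, entry.path) is not None


def excluded_by_size(entry: Entry, limit_bytes: int) -> bool:
    # Files larger than the limit are excluded; folders are never matched.
    return not entry.is_dir and entry.size_bytes > limit_bytes


def excluded_by_date(entry: Entry, cutoff: datetime) -> bool:
    # Files modified before the cutoff are excluded; folders are never matched.
    return not entry.is_dir and entry.modified < cutoff


def excluded_by_age(entry: Entry, min_age: timedelta, scan_time: datetime) -> bool:
    # Files younger than min_age at scan time are excluded; folders are never matched.
    return not entry.is_dir and (scan_time - entry.modified) < min_age


scan_time = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
recent = Entry("/data/in/new.csv", False, 2048, scan_time - timedelta(hours=1))
big = Entry("/data/archive/dump.bin", False, 200 * 1024**3, scan_time - timedelta(days=90))
folder = Entry("/data/archive", True, 0, scan_time - timedelta(days=90))

print(excluded_by_age(recent, timedelta(hours=24), scan_time))               # True
print(excluded_by_size(big, 100 * 1024**3))                                  # True
print(excluded_by_date(folder, datetime(2024, 5, 1, tzinfo=timezone.utc)))   # False
print(excluded_by_regex(folder, r"/archive$"))                               # True
```

The folder is skipped by the Date rule but still matched by the Regex rule, which mirrors the note that Age, Date, and File Size exclusions apply to files only.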

Configure exclusions

Create Exclusion Templates to match patterns or conditions, then apply them to migrations.

You can configure exclusions with the UI or the CLI.

info

The UI defaults to JAVA_PCRE as the regex pattern type when you add an exclusion template. Use the CLI to add exclusion templates that require a GLOB or AUTOMATA pattern type.

Configure exclusions with the UI

Define and assign exclusions to new or existing migrations.

Add new exclusion template

Adding exclusions to an existing migration changes the future actions performed for that migration, but doesn't affect previously migrated content. To add new exclusions:

  1. In the Dashboard, select a product instance.
  2. Expand the Migrations menu, then select Exclusion Templates.
  3. Select Add exclusion template.
  4. Under Template type, choose Regex, File Size, Date, or Age.
  5. Enter a Name and Description.
  6. Enter the exclusion condition.
    • For a Regex exclusion type:
      • Select the Regex Type and enter the Regex pattern. Files or directories that match the pattern will be excluded.
    • For a File Size exclusion type:
      • Enter a Value and select a Unit. For example, 100 GiB. Files larger than the stated value will be excluded.
    • For a Date exclusion type:
      • Under Date and Time (UTC), choose a date and time. Files modified before the date will be excluded.
    • For an Age exclusion type:
      • Enter a Value and select a Unit. For example, 24 Hours. Files modified in the last 24 hours will be excluded.
  7. Select Add once you've entered all necessary information to create your exclusion template.

With your Exclusion Template created, apply it to a migration or migrations.

tip

When a file's modification date is before the date specified in a Date exclusion, Data Migrator excludes it from migration. If the file is subsequently modified after that date (for example, by a move or deletion), its modification date is then later than the exclusion date, and the exclusion no longer applies.
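
The tip above can be pictured as a comparison that is re-evaluated whenever the modification time changes. This is a minimal sketch with a hypothetical helper name and made-up timestamps; it isn't how Data Migrator implements the check.

```python
from datetime import datetime, timezone


def date_exclusion_applies(modified: datetime, cutoff: datetime) -> bool:
    # A Date exclusion applies only while the modification time is before the cutoff.
    return modified < cutoff


cutoff = datetime(2024, 1, 1, tzinfo=timezone.utc)                   # date set in the exclusion
original_mtime = datetime(2023, 11, 15, 9, 30, tzinfo=timezone.utc)  # file untouched since 2023
updated_mtime = datetime(2024, 2, 3, 14, 0, tzinfo=timezone.utc)     # same file after a later change

print(date_exclusion_applies(original_mtime, cutoff))  # True: excluded
print(date_exclusion_applies(updated_mtime, cutoff))   # False: no longer excluded
```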

Apply an Exclusion Template to a migration

  1. In the Dashboard, select a product instance.

  2. Expand the Migrations menu, then select Data Migrations.

  3. Under Data Migrations, select the migration you want to apply the exclusion to.

  4. Select Exclusions.

  5. Under Exclusions, choose the Exclusion Template to apply from the Add new exclusion field.

    note

    You can add multiple exclusions to multiple migrations at once with the UI. For more information, see Bulk add exclusions to migrations with the UI.

    You can add and remove exclusions when creating migrations. For more information, see Assign exclusions to a new migration.

  6. Select Continue to apply the exclusion to this migration and update the list of applied Exclusion Templates.

tip

When using exclusions with a recurring migration, the count of excluded files reported in the migration status shows a cumulative total for each recurrence of the migration.

Remove exclusions from the templates list

Removing exclusions from an existing migration changes the future actions performed for that migration, but doesn't affect previously migrated content. To remove exclusions:

  1. Go to the Exclusion Templates page and search for the exclusion you want to remove.

  2. Select the trash can icon.

    note

    This doesn't remove the exclusion from an existing migration. For more information, see Remove exclusions from an existing migration.

    You can't remove default exclusions from the templates list, but you can remove them from an existing migration.

Default exclusions

Data Migrator automatically applies default exclusions to specific filesystems depending on the platform. For example, Azure Data Lake Storage (ADLS) Gen2 filesystem types have a maximum individual file size limit of 4.55 TiB, and any larger files are automatically excluded.
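
The 4.55 TiB figure and the 5 TB File size default listed for ADLS Gen2 describe the same limit in different units, assuming TB means 10^12 bytes and TiB means 2^40 bytes. The short calculation below shows the conversion; the variable names are illustrative only.

```python
TB = 10**12   # decimal terabyte, in bytes
TIB = 2**40   # binary tebibyte, in bytes

limit_bytes = 5 * TB
limit_tib = limit_bytes / TIB

print(f"{limit_tib:.2f} TiB")  # 4.55 TiB
```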

You can remove default exclusions from the migration, but not from the system or the exclusion templates list. To view your current exclusion templates list, go to Migrations, then select Exclusion Templates.

Global default exclusions

These global exclusions apply to all data migrations.

Exclusion              Exclusion type  Description
/**/_tmp.delta_**      Regex (GLOB)    Hive Acid Temporary Files
/**/.hive-staging**    Regex (GLOB)    Hive Staging Content
/**/.spark-staging-**  Regex (GLOB)    Spark Staging Content
/**/_temporary**       Regex (GLOB)    Spark Temporary Files
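
To get a feel for which paths these GLOB patterns catch, the sketch below approximates them with Python's `fnmatch`, where `*` already spans `/`, so `**` behaves as "any characters". This is an approximation for illustration only: the glob dialect used by Data Migrator may differ in edge cases, and the sample paths and helper name are made up.

```python
from fnmatch import fnmatchcase

# Patterns copied from the global default exclusions table.
GLOBAL_DEFAULT_GLOBS = [
    "/**/_tmp.delta_**",
    "/**/.hive-staging**",
    "/**/.spark-staging-**",
    "/**/_temporary**",
]


def matches_global_default(path: str) -> bool:
    # True if the path matches any of the global default patterns.
    return any(fnmatchcase(path, glob) for glob in GLOBAL_DEFAULT_GLOBS)


samples = [
    "/warehouse/sales/.hive-staging_hive_2024-06-01/000000_0",
    "/warehouse/sales/_temporary/0/part-00000",
    "/warehouse/sales/part-00000.parquet",
]
for path in samples:
    print(path, matches_global_default(path))
```

The first two sample paths match a staging or temporary pattern; the third is ordinary data and doesn't match.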

Hadoop Distributed File System

The default exclusions are:

Exclusion                  Exclusion type     Description
.\\._COPYING_$             Regex (JAVA_PCRE)  Hadoop Distributed File System (HDFS) copying files
/**/.hive-staging**        Regex (GLOB)       Hive staging directories
/**/.spark-staging-**      Regex (GLOB)       Spark staging directories
/**/_temporary**           Regex (GLOB)       Spark temporary directories
(/|/.*/)\\.Trash(/.*)?     Regex (Automata)   HDFS trash directories
(/|/.*/)\\.snapshot(/.*)?  Regex (Automata)   HDFS Snapshot directories

The Hive or Spark directories are used to stage temporary files during Hive or Spark jobs. These are automatically deleted by Hive or Spark after use, and are excluded by default to avoid the migration of unnecessary data.

The HDFS Snapshot and trash directories are (generally) only relevant to the local cluster and excluded to avoid migration of unnecessary data.
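
The trash and snapshot patterns can be tried out with Python's `re` module. This sketch makes two assumptions that the page doesn't confirm: the doubled backslashes in the table are string escapes for a single backslash, and Automata-type patterns must match the whole path. The helper name and sample paths are hypothetical.

```python
import re

# Assumption: "\\." in the table stands for the regex "\." (a literal dot).
TRASH = re.compile(r"(/|/.*/)\.Trash(/.*)?")
SNAPSHOT = re.compile(r"(/|/.*/)\.snapshot(/.*)?")


def in_trash_or_snapshot(path: str) -> bool:
    # Assumption: Automata patterns are whole-path matches, hence fullmatch.
    return bool(TRASH.fullmatch(path) or SNAPSHOT.fullmatch(path))


print(in_trash_or_snapshot("/user/alice/.Trash/Current/report.csv"))  # True
print(in_trash_or_snapshot("/data/.snapshot/s1/part-0"))              # True
print(in_trash_or_snapshot("/user/alice/Trash/report.csv"))           # False
```

Under these assumptions, a directory named `Trash` without the leading dot isn't excluded, while anything at or below a `.Trash` or `.snapshot` directory is.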

Azure Data Lake Storage (ADLS) Gen2

The default exclusions are:

Exclusion              Exclusion type     Description
[.|\\/]$               Regex (JAVA_PCRE)  File names cannot end with . or ' '
.*([^\\/]*\\/){61,}.*  Regex (Automata)   Blob names cannot exceed 61 path segments
.{1025,}               Regex (JAVA_PCRE)  File name length cannot exceed 1024
.*[\\\\].*             Regex (JAVA_PCRE)  Filepath or name cannot include a backslash.
5 TB                   File size          File size cannot exceed 5 TB

These exclusions cover many of the limitations set by ADLS Gen2 directory and file naming rules.

Google Cloud Storage

The default exclusions are:

Exclusion   Exclusion type     Description
.*[\r\n].*  Regex (Automata)   File name cannot contain carriage return or line feeds
.{1025,}    Regex (JAVA_PCRE)  File name length cannot exceed 1024
\\.\\.?     Regex (Automata)   File name cannot be named . or ..
16 TB       File size          File size cannot exceed 16 TB

These exclusions cover the limitations set by Google Cloud object naming guidelines.