Version: 2.2

Configure exclusions

Define exclusions to prevent specific directories and files from being migrated. You can use the following exclusion types:

Exclusion type    Description
Regex             Enter regular expression (regex) patterns for file and directory names (of Java PCRE, Automata, or GLOB type). Filepaths that match the regex are excluded.
File size         Enter a number and select the file size unit (bytes, GiB, and so on). Files larger than the value are excluded.
Date              Select a date and time (Coordinated Universal Time (UTC)) after which file changes should be migrated. Files modified on or before the date and time are excluded.
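The three exclusion types above can be read as three simple predicates. The following is a minimal sketch, not Data Migrator's implementation: the `FileInfo` class and the function names are hypothetical, Python's `re` module stands in for the product's regex engines, and treating a match anywhere in the path as sufficient is an assumption.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class FileInfo:
    """Hypothetical description of a file considered for migration."""
    path: str
    size_bytes: int
    modified: datetime  # timezone-aware, in UTC


def excluded_by_regex(f: FileInfo, pattern: str) -> bool:
    # Assumption: a match anywhere in the filepath excludes the file.
    return re.search(pattern, f.path) is not None


def excluded_by_size(f: FileInfo, limit_bytes: int) -> bool:
    # Files larger than the limit are excluded.
    return f.size_bytes > limit_bytes


def excluded_by_date(f: FileInfo, cutoff: datetime) -> bool:
    # Files modified on or before the cutoff are excluded.
    return f.modified <= cutoff


f = FileInfo("/data/logs/app.log.tmp", 3 * 2**30,
             datetime(2023, 1, 1, tzinfo=timezone.utc))
print(excluded_by_regex(f, r"\.tmp$"))   # True
print(excluded_by_size(f, 1 * 2**30))    # True (3 GiB > 1 GiB)
print(excluded_by_date(f, datetime(2022, 12, 31, tzinfo=timezone.utc)))  # False
```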

Default exclusions

Data Migrator automatically applies default exclusions to specific filesystems depending on the platform. For example, Azure Data Lake Storage (ADLS) Gen2 filesystem types have a maximum individual file size limit of 4.55 TiB, and any larger files are automatically excluded.
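The 4.55 TiB figure and the 5 TB file size exclusion listed for ADLS Gen2 on this page appear to describe the same limit in different units. The short calculation below shows the conversion, assuming that "TB" means decimal terabytes (10^12 bytes) and "TiB" means binary tebibytes (2^40 bytes).

```python
TB = 10**12   # decimal terabyte, in bytes (assumed meaning of "TB")
TIB = 2**40   # binary tebibyte, in bytes

limit_bytes = 5 * TB
limit_tib = limit_bytes / TIB

print(f"{limit_tib:.2f} TiB")  # 4.55 TiB
```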

You can remove default exclusions from the migration, but not from the system or the exclusion templates list.

Hadoop Distributed File System

The default exclusions are:

Exclusion                    Exclusion type       Description
.\\._COPYING_$               Regex (JAVA_PCRE)    Hadoop Distributed File System (HDFS) copying files
/**/.hive-staging**          Regex (GLOB)         Hive staging directories
/**/.spark-staging-**        Regex (GLOB)         Spark staging directories
/**/_temporary**             Regex (GLOB)         Spark temporary directories
(/|/.*/)\\.Trash(/.*)?       Regex (Automata)     HDFS trash directories
(/|/.*/)\\.snapshot(/.*)?    Regex (Automata)     HDFS Snapshot directories

Hive and Spark use these staging and temporary directories to hold temporary files during jobs. Hive or Spark deletes them automatically after use, so they are excluded by default to avoid migrating unnecessary data.

The HDFS Snapshot and trash directories are generally relevant only to the local cluster, so they are excluded to avoid migrating unnecessary data.
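To see which paths the trash and Snapshot patterns cover, you can try them against sample paths. This is a rough sketch under two assumptions: the doubled backslash in the table stands for a single escaping backslash, and an Automata pattern must match the whole path (emulated here with Python's `re.fullmatch`). The sample paths and the function name are hypothetical.

```python
import re

# Automata-style patterns from the HDFS defaults, with "\\." read as an
# escaped literal dot (assumption).
TRASH = re.compile(r"(/|/.*/)\.Trash(/.*)?")
SNAPSHOT = re.compile(r"(/|/.*/)\.snapshot(/.*)?")


def is_trash_or_snapshot(path: str) -> bool:
    # fullmatch emulates whole-string matching (assumed Automata behaviour).
    return bool(TRASH.fullmatch(path) or SNAPSHOT.fullmatch(path))


for p in [
    "/user/alice/.Trash/Current/old.csv",  # inside a trash directory
    "/.snapshot",                          # Snapshot directory at the root
    "/data/Trash/report.csv",              # no leading dot, so not matched
    "/data/.Trashcan/report.csv",          # different directory name
]:
    print(p, is_trash_or_snapshot(p))
```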

Azure Data Lake Storage (ADLS) Gen2

The default exclusions are:

Exclusion                Exclusion type       Description
[.|\\/]$                 Regex (JAVA_PCRE)    File names cannot end with . or ' '
.*([^\\/]*\\/){61,}.*    Regex (Automata)     Blob names cannot exceed 61 path segments
.{1025,}                 Regex (JAVA_PCRE)    File name length cannot exceed 1024 characters
.*[\\\\].*               Regex (JAVA_PCRE)    Filepath or name cannot include a backslash
5 TB                     File size            File size cannot exceed 5 TB

These exclusions cover many of the limitations set by ADLS Gen2 directory and file naming rules.
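As an illustration, the length, backslash, and file size defaults can be approximated as follows. This is a sketch only: Python's `re` stands in for JAVA_PCRE, the doubled backslashes in the table are assumed to be string escaping, "5 TB" is assumed to mean 5 x 10^12 bytes, and the function name and reason strings are hypothetical.

```python
import re

TOO_LONG = re.compile(r".{1025,}")        # 1025 or more characters
HAS_BACKSLASH = re.compile(r".*[\\].*")   # any literal backslash
MAX_SIZE_BYTES = 5 * 10**12               # 5 TB, assumed decimal


def adls_exclusion_reasons(path: str, size_bytes: int) -> list[str]:
    """Return the reasons a file would be excluded (hypothetical helper)."""
    reasons = []
    if TOO_LONG.search(path):
        reasons.append("longer than 1024 characters")
    if HAS_BACKSLASH.search(path):
        reasons.append("contains a backslash")
    if size_bytes > MAX_SIZE_BYTES:
        reasons.append("larger than 5 TB")
    return reasons


print(adls_exclusion_reasons("/data/ok.txt", 10))       # []
print(adls_exclusion_reasons("/data/a\\b.txt", 10))     # ['contains a backslash']
print(adls_exclusion_reasons("/big.bin", 6 * 10**12))   # ['larger than 5 TB']
```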

Google Cloud Storage

The default exclusions are:

Exclusion     Exclusion type       Description
.*[\r\n].*    Regex (Automata)     File name cannot contain carriage returns or line feeds
.{1025,}      Regex (JAVA_PCRE)    File name length cannot exceed 1024 characters
\\.\\.?       Regex (Automata)     File name cannot be named . or ..
16 TB         File size            File size cannot exceed 16 TB

These exclusions cover the limitations set by Google Cloud object naming guidelines.
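The two Automata patterns in this list can be tried out in the same spirit. The sketch below assumes whole-string matching against a single name, emulates the Automata "any character" dot with `re.DOTALL`, and reads `\\.` as an escaped literal dot. The function name is hypothetical.

```python
import re

# DOTALL lets "." match line breaks too, as assumed for Automata patterns.
HAS_LINE_BREAK = re.compile(r".*[\r\n].*", re.DOTALL)
DOT_NAME = re.compile(r"\.\.?")  # exactly "." or ".."


def gcs_name_excluded(name: str) -> bool:
    return bool(HAS_LINE_BREAK.fullmatch(name) or DOT_NAME.fullmatch(name))


print(gcs_name_excluded("report\n2023.csv"))  # True
print(gcs_name_excluded(".."))                # True
print(gcs_name_excluded("report.csv"))        # False
```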

Configure exclusions

You can define additional exclusions and apply them to specific migrations to ignore any matching content.

You can configure exclusions with the UI or the CLI.

info

The UI uses JAVA_PCRE as the default regex pattern type when you add an exclusion template. Use the CLI to add exclusion templates that require a 'GLOB' or 'AUTOMATA' pattern type.

Configure exclusions with the UI

Define and assign exclusions to new or existing migrations.

Add new exclusions

Adding exclusions to an existing migration changes the future actions performed for that migration, but doesn't affect previously migrated content. To add new exclusions:

  1. In the Dashboard, select a product instance.

  2. Under Migrations, select Exclusion Templates.

  3. Select Add exclusion template to associate the exclusion with the selected filesystem and enter the parameters for the exclusion:

    • Exclusion type - Regex, File Size, or Date.
    • Name - The name given to the exclusion template. For example, 100gibfilelimit.
    • Description - A brief description of what the exclusion is doing. For example, "Files larger than 100GiB are excluded".
    • File Size = Value / Unit - The value and unit for the file size limit. For example, 100 GiB.
    • Regex = Regex - The regex pattern to be used for the filename exclusion. For example, ^test\.*.
    • Date = Select Date - Files modified on or before the specified date and time are excluded during migrations.
  4. Select Add once you've entered all necessary information to create your exclusion template.

  5. Under Migrations, select Data Migrations.

  6. Under the Bulk Action dropdown list, select Add exclusions.

  7. Select the migrations in the list to which you want to apply the exclusion.

    note

    You can add multiple exclusions to multiple migrations at once with the UI. For more information, see Bulk add exclusions to migrations with the UI.

    You can add and remove exclusions when you are creating migrations. For more information, see Assign exclusions to a new migration.

  8. Select the exclusion you want to apply from the Add Exclusions dropdown list.

  9. Select Submit to apply the exclusion.
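The example values from step 3 can be checked before you create a template. The sketch below is illustrative only: the dictionary is a hypothetical stand-in for a template rather than a Data Migrator data structure, Python's `re` approximates JAVA_PCRE, and the pattern is assumed to be tested against a file name rather than a full path.

```python
import re

GIB = 2**30  # bytes in one GiB

# Hypothetical representation of the file size template from step 3.
template = {
    "name": "100gibfilelimit",
    "description": "Files larger than 100GiB are excluded",
    "type": "File Size",
    "limit_bytes": 100 * GIB,
}
print(template["limit_bytes"])  # 107374182400

# The example regex: "test" at the start, optional literal dots,
# then at least one more character.
name_pattern = re.compile(r"^test\.*.")

for name in ["test.txt", "testing", "test", "mytest1"]:
    print(name, bool(name_pattern.search(name)))
# test.txt True
# testing True
# test False
# mytest1 False
```

Note that the bare name `test` isn't matched, because the trailing `.` in the pattern requires one more character after `test`.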

tip

When you use exclusions with a recurring migration, the number of excluded files reported in the migration status is a cumulative total across all recurrences of the migration.

Remove exclusions from the templates list

Removing exclusions from an existing migration changes the future actions performed for that migration, but doesn't affect previously migrated content. To remove exclusions:

  1. Go to the Exclusion Templates page and search for the exclusion you want to remove.

  2. Select the trash can icon.

    note

    This doesn't remove the exclusion from an existing migration. For more information, see Remove exclusions from an existing migration.

    You can't remove default exclusions from the templates list, but you can remove them from an existing migration.