Version: 3.0 (latest)

Verify migrations

Run a verification on a Live, Complete, or Stopped migration to discover discrepancies (inconsistencies) between source and target migration paths. The results are presented in a Verification Summary Report.

Limitations and considerations

  • As of Data Migrator 3.0, only one verification runs at a time, but each verification now issues requests in parallel, which improves performance.
  • If the full report shows a discrepancy caused by a change made to the source or target after the verification start time, it isn't a true discrepancy.
  • Data Migrator doesn't check any files and directories that are excluded from the migration for discrepancies. The summary report shows the number of files and directories excluded on the source.
  • Use the CLI to verify migrations if you don't want to set a verification cutoff point.
  • Use the UI to verify migrations if you want to set a verification cutoff point.
  • Verifications with path mappings are supported, where the root of the migration is mapped. Path mappings within the migration aren't included.
note

If your migration has S3 as a source or target filesystem, the time a directory is reported as having changed is always the same as the time that the directory was queried by the verification.

This is because of how S3 stores metadata about directories.

If you set a verification cutoff point, discrepant S3 directories will be ignored as they are considered to be after the cutoff point.

Prerequisites

Migrations must have one of the following statuses to perform verification scans:

  • Live (real-time event stream notifications for changes to the source that are replicated to the target)
  • Complete (one-time migration without event stream notifications)
  • Stopped (a user has stopped the migration manually)
note

You can’t verify a migration that is in progress. Wait for the migration to finish, or stop it, and then run the verification scan.

Verification application properties

See verification application properties for more information on additional verification configuration.

info

The introduction and use of parallel requests within verification provides a mechanism for improving verification performance and can be used to reduce impact from environment factors such as network latency. verification.default.target.parallel.requests has a higher default value than verification.default.source.parallel.requests to mitigate the expected performance impact of latency in target requests. In use cases where the target filesystem is typically passive, increasing the number of parallel requests to the target should not significantly affect target namenode performance. However, use caution when increasing parallel requests, particularly on HDFS systems with heavy end-user usage.

Verify migrations with the UI

View the migration status

  1. From the Dashboard, select the migration you want to verify.

  2. On the Migration Verification panel, view details about the migration verification status:

    • Verification Status - Not Started, In Progress, Complete.
    • Total Inconsistencies - Number of discrepancies between the source and target paths.

Verify a migration

Use the following options to create a new verification for a migration:

  1. Select Migration Verification from the sidebar menu.

  2. Path to verify
    Enter the path or paths from your source or target filesystem you want to verify using the format /path/to. The provided path must be a directory.
    If you want to verify certain files on this path, for example, everything for a specific calendar month, you can enter a path like /path/to/oct_2022_*. This option allows you to specify subsections of large migration root directories for verification scanning.

    You can add paths:

    • One by one, up to a maximum of 100.
    • Using wildcards to match multiple paths.
  3. Verification depth
    Enter a number to specify how deep in the directory you want to run the verification check.
    The number must be equal to or less than the total number of levels in the directory structure of your migration.
    Zero means there's no limit to the verification depth.

    Example
    You enter two.
    Data Migrator scans and verifies the top two levels of your migration.

  4. Verification Cutoff Point
    Select a date and time as a verification cutoff point.

    The verification checks files modified on the source filesystem before the date and time you specify and excludes any changes after that cutoff point.

    Example
    You enter the date and time a migration was completed (one-time migration) or changed to live (live migration with continuous event stream notifications). Changes made to the source filesystem after that point aren't recorded as discrepancies in the verification report.
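The cutoff rule can be illustrated with a short sketch. It assumes modification times are epoch milliseconds (the form used in the full-verification report) and that the cutoff is interpreted as UTC. The function name and the cutoff value are hypothetical, and this is not Data Migrator's implementation.

```python
from datetime import datetime, timezone


def modified_after_cutoff(modified_ms: int, cutoff: datetime) -> bool:
    """Return True if a change is later than the cutoff and would be ignored."""
    modified = datetime.fromtimestamp(modified_ms / 1000, tz=timezone.utc)
    return modified > cutoff


# Hypothetical cutoff chosen for illustration only.
cutoff = datetime(2024, 4, 16, 11, 20, tzinfo=timezone.utc)

print(modified_after_cutoff(1713266614532, cutoff))  # 11:23:34 UTC, after the cutoff
print(modified_after_cutoff(1713266233861, cutoff))  # 11:17:13 UTC, before the cutoff
```

With this cutoff, the first change would be ignored and the second would still be checked.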

Cancel a verification

You can cancel a verification that is queued or in progress. After you select Start verification, you can simply select Cancel check.
You can also delete a verification report for a completed or canceled verification.

View a verification summary report

After you select Start verification, the Verification Summary Report panel is updated with information found on the source and target during the verification scan. You can view this while the verification is in progress or when it's complete or canceled. The scan may not start immediately because only one verification runs at a time and new verifications are queued. When the verification starts checking the source and target, you can compare:

  • The number of files and directories found on the source and the target
  • Total amount of data on the source and the target
  • Total number of inconsistencies found, including file size mismatches, missing or extra files and directories
note

You can view up to a maximum of the last 10 verifications for a migration. If you create more than 10 verifications for a migration, the oldest are garbage collected.

tip

If you discover inconsistencies on your target file system and need to remove them with the Target Match migration option, see the Target Match activity monitoring section to learn more about comparing Target Match actions with inconsistencies.

Inconsistencies

Include | Don't include
Data that exists on one side and not on the other. | Changes made after a specified time (see Verification Cutoff Point).
File size mismatches. | Exclusions.
info

Adding exclusions to an active migration may cause inconsistencies.

Files migrated to the target before adding the exclusion are listed as discrepancies because they exist on both the source and target.

  • An excluded file that exists on both the source and target (FOUND_ON_BOTH) is a discrepancy.
  • An excluded file that exists on the source but not the target (MISSING_ON_TARGET) isn't a discrepancy.
note

Full verification reports list all excluded files.

Download a full verification report

After the verification is complete, you can view and expand reports for completed verifications under Last x Verification Reports. Select Download all files to view, share, and analyze the results of the migration verification. This will give you a full report as a tar archive file containing summary.json and the following files:

  • full-verification.jsonl.gz
  • missing_on_source_paths.jsonl.gz
  • repair.jsonl (Only included in reports using the --repair CLI option.)
  • target_inconsistent_paths.jsonl.gz
  • verification-discrepancy.jsonl.gz
  • verification-discrepancy.csv.gz
  • verification.json
tip

Reports are downloaded as .gz files. Use a tool like gunzip or any compatible decompression utility to extract the file before viewing.
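If you'd rather not extract the files first, Python's standard gzip module can read them directly. This is a minimal sketch, assuming the report file has been unpacked from the tar archive to a local path; the helper name read_jsonl_gz is hypothetical.

```python
import gzip
import json
from typing import Iterator


def read_jsonl_gz(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a gzip-compressed JSONL file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)
```

For example, iterating over read_jsonl_gz("verification-discrepancy.jsonl.gz") gives one dictionary per reported record.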

You can also download individual files from the 10 most recent reports (the maximum displayed) by selecting the download icon next to each file you want.

The verification-discrepancy report shows all discrepancies (inconsistencies) between these two filesystems while the full-verification report shows discrepancies and all checks. You can use the full-verification report to review reasons why particular files didn't show as discrepancies (inconsistencies).

To download the full verification report, under Last x Verification Reports, expand the main verification report you need and select the download icon beside full-verification.jsonl.gz.

tip

See the following Knowledge Base article to adjust the Verification Report location.

Review the full-verification report

Example full-verification report.

full-verification.jsonl example
{"timestamp":1713266643021,"scanResult":"MISSING_ON_TARGET","reason":"MISSING_ON_TARGET","sourcePath":"/data/5/AddedToSourceAfterMigration","sourceLength":644,"sourceModifiedTime":1713266257688,"targetPath":null,"targetLength":-1,"targetModifiedTime":-1}
{"timestamp":1713266643024,"scanResult":"MISSING_ON_SOURCE","reason":"IGNORED_AFTER_TIMESTAMP","sourcePath":null,"sourceLength":-1,"sourceModifiedTime":-1,"targetPath":"/data/5/AddedToTargetAfterCutoffIgnoreMe","targetLength":16170,"targetModifiedTime":1713266614532}
{"timestamp":1713266643024,"scanResult":"MISSING_ON_SOURCE","reason":"MISSING_ON_SOURCE","sourcePath":null,"sourceLength":-1,"sourceModifiedTime":-1,"targetPath":"/data/5/AddedToTargetAfterMigration","targetLength":644,"targetModifiedTime":1713266233861}
{"timestamp":1713266643024,"scanResult":"MISSING_ON_TARGET","reason":"EXCLUDED","sourcePath":"/data/5/DoNotMigrateFile1","sourceLength":644,"sourceModifiedTime":1713262290632,"targetPath":null,"targetLength":-1,"targetModifiedTime":-1}
{"timestamp":1713266643024,"scanResult":"FOUND_ON_BOTH","reason":"SIZE_MISMATCH","sourcePath":"/data/5/FileSizeMismatch","sourceLength":3220,"sourceModifiedTime":1713266312234,"targetPath":"/data/5/FileSizeMismatch","targetLength":644,"targetModifiedTime":1713266297804}
{"timestamp":1713266643025,"scanResult":"FOUND_ON_BOTH","reason":"OK","sourcePath":"/data/5/sourceFile1","sourceLength":644,"sourceModifiedTime":1713194477365,"targetPath":"/data/5/sourceFile1","targetLength":644,"targetModifiedTime":1713257593900}
{"timestamp":1713266643025,"scanResult":"FOUND_ON_BOTH","reason":"OK","sourcePath":"/data/5/sourceFile2","sourceLength":644,"sourceModifiedTime":1713194477480,"targetPath":"/data/5/sourceFile2","targetLength":644,"targetModifiedTime":1713257593900}

In this example verification, three inconsistencies were reported: an extra file on the source, an extra file on the target, and a file with a size mismatch. An additional file on the target wasn't reported as an inconsistency because of the cutoff, and an extra file on the source wasn't reported because of an exclusion.

scanResult | reason | File | Explanation | Inconsistency
MISSING_ON_TARGET | MISSING_ON_TARGET | /data/5/AddedToSourceAfterMigration | File added to source after migration | Inconsistency
MISSING_ON_SOURCE | IGNORED_AFTER_TIMESTAMP | /data/5/AddedToTargetAfterCutoffIgnoreMe | File added to target after the verification cutoff | No inconsistency
MISSING_ON_SOURCE | MISSING_ON_SOURCE | /data/5/AddedToTargetAfterMigration | File added to target after migration | Inconsistency
MISSING_ON_TARGET | EXCLUDED | /data/5/DoNotMigrateFile1 | File missing on target as expected because of exclusion applied to the migration for this filename | No inconsistency
FOUND_ON_BOTH | SIZE_MISMATCH | /data/5/FileSizeMismatch | File on both filesystems but size doesn't match | Inconsistency
FOUND_ON_BOTH | OK | /data/5/sourceFile1 | File exists on source and target. Ok. | No inconsistency
FOUND_ON_BOTH | OK | /data/5/sourceFile2 | File exists on source and target. Ok. | No inconsistency

scanResult and reason types

scanResult | Description
FOUND_ON_BOTH | File exists on both source and target filesystems
MISSING_ON_TARGET | File exists on source but not on target
MISSING_ON_SOURCE | File exists on target but not on source

Reason | Description
OK | No inconsistency
MISSING_ON_TARGET | File exists on source but not on target
MISSING_ON_SOURCE | File exists on target but not on source
SIZE_MISMATCH | File exists on source and target but sizes differ
IGNORED_AFTER_TIMESTAMP | File missing on either source or target but no inconsistency due to verification cutoff
EXCLUDED | File missing due to exclusion rule applied to migration
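As a rough illustration of how the reason values map to inconsistencies, the following sketch tallies the records of a decompressed full-verification report. It assumes that only MISSING_ON_TARGET, MISSING_ON_SOURCE, and SIZE_MISMATCH count as inconsistencies, as in the example above. The function name is hypothetical, and the sample lines are trimmed copies of the example records.

```python
import json
from collections import Counter

# Assumption: these reasons are the ones reported as inconsistencies.
INCONSISTENT_REASONS = {"MISSING_ON_TARGET", "MISSING_ON_SOURCE", "SIZE_MISMATCH"}


def summarize(lines):
    """Count records per reason and return (per-reason counts, inconsistency total)."""
    reasons = Counter()
    for line in lines:
        if line.strip():
            reasons[json.loads(line)["reason"]] += 1
    total = sum(n for reason, n in reasons.items() if reason in INCONSISTENT_REASONS)
    return reasons, total


# Trimmed versions of the seven example records (only the fields used here).
sample = [
    '{"scanResult":"MISSING_ON_TARGET","reason":"MISSING_ON_TARGET"}',
    '{"scanResult":"MISSING_ON_SOURCE","reason":"IGNORED_AFTER_TIMESTAMP"}',
    '{"scanResult":"MISSING_ON_SOURCE","reason":"MISSING_ON_SOURCE"}',
    '{"scanResult":"MISSING_ON_TARGET","reason":"EXCLUDED"}',
    '{"scanResult":"FOUND_ON_BOTH","reason":"SIZE_MISMATCH"}',
    '{"scanResult":"FOUND_ON_BOTH","reason":"OK"}',
    '{"scanResult":"FOUND_ON_BOTH","reason":"OK"}',
]

reasons, total = summarize(sample)
print(total)          # 3
print(reasons["OK"])  # 2
```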

Review the condensed reports

While the full report lists every discovered inconsistency, the missing_on_source_paths.jsonl.gz and target_inconsistent_paths.jsonl.gz files give a condensed, filtered view that is easier to review. If a directory is inconsistent, its sub-paths aren't listed. Paths that already have work scheduled are also omitted (for example, a missing path with a scheduled action isn't included in the condensed report).
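The directory-collapsing idea can be sketched as follows. This is an illustration only, not Data Migrator's algorithm: it keeps a path only when none of its ancestors is also listed, and it doesn't model the filtering of paths with scheduled work. The function name and sample paths are hypothetical.

```python
from pathlib import PurePosixPath


def condense(paths):
    """Keep only paths that have no ancestor in the input list."""
    unique = {PurePosixPath(p) for p in paths}
    kept = [p for p in unique if not any(parent in unique for parent in p.parents)]
    return sorted(str(p) for p in kept)


print(condense([
    "/data/5/dirA",
    "/data/5/dirA/file1",
    "/data/5/dirA/sub/file2",
    "/data/5/fileB",
]))
# ['/data/5/dirA', '/data/5/fileB']
```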

tip

Set up email notifications

You can be notified by email about the status of migration verifications and receive the results by email. Go to the Email Notification page to set up these alerts.

For more information, see Configure email notifications.

Verify migrations with the CLI

Use the following commands to manage verifications.

note

View the verification status using migration verification show for individual verification jobs or migration verification list for all verification jobs.

migration verification start

Start a new verification for a migration.

Example: Start a new verification for a migration

Trigger a new verification for a migration
migration verification start --migration-id myNewMigration --depth 0 --timestamp 2022-11-15T16:24 --paths /MigrationPath
note

Any path or paths supplied using the --paths option must be a directory.
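If you generate these commands from a script, the following sketch builds the command line shown above using only the options from that example. It assumes a single path and a cutoff with minute precision. The function name is hypothetical, and whether the Data Migrator CLI parses POSIX-style quoting the same way as a shell is also an assumption.

```python
import shlex
from datetime import datetime


def build_start_command(migration_id: str, path: str, cutoff: datetime, depth: int = 0) -> str:
    """Return a 'migration verification start' command line as a single string."""
    args = [
        "migration", "verification", "start",
        "--migration-id", migration_id,
        "--depth", str(depth),
        # Minute precision, matching the 2022-11-15T16:24 form in the example.
        "--timestamp", cutoff.strftime("%Y-%m-%dT%H:%M"),
        "--paths", path,
    ]
    return shlex.join(args)


print(build_start_command("myNewMigration", "/MigrationPath", datetime(2022, 11, 15, 16, 24)))
# migration verification start --migration-id myNewMigration --depth 0 --timestamp 2022-11-15T16:24 --paths /MigrationPath
```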

Example: Start a verification for a migration and rescan inconsistent paths

Attempt to repair inconsistencies discovered from a previous verification by automatically rescanning inconsistent paths using the --repair and --paths options.

Note: If there are paths missing on source, the repair will only add these to the pending regions if the migration is currently using Target Match.

A repair won't run if the path to rescan is determined to be the root directory of the migration. If you find a very large number of inconsistencies it may be more efficient to reset the migration.

info

Data Migrator will rescan inconsistent paths for this migration. Depending on your migration scan type, action policy, and exclusions, it may not always be possible to resolve all inconsistencies. For example, if your migration is not using Target Match to remove extraneous files/folders from the target, rescanning won't resolve this inconsistency with this migration configuration.

Trigger a new verification for a migration and rescan inconsistent paths
migration verification start --migration-id mig1 --paths /migrationpath/1 --repair

migration verification list

List summaries for all or specified verifications.

Examples

List summaries for all verifications.
migration verification list
List in-progress and queued verification summaries for a specific migration
migration verification list --migration-id myNewMigration --states IN_PROGRESS,QUEUED

migration verification show

Show the status of a specific migration verification.

Example

Example status of a completed verification
migration verification show --verification-id 91c79b1b-c61f-4c39-be61-18072ac3a086-1676979356465

migration verification stop

Stop a queued or in-progress migration verification.

Example

Stop a migration verification
migration verification stop --verification-id ab123c03-697b-48a5-93cc-abc23838d37d-1668593022565

migration verification report

Download a full verification report.

Examples

Download a verification report
migration verification report --verification-id ab123c03-697b-48a5-93cc-abc23838d37d-1668593022565 --out-dir /user/exampleVerificationDirectory

View verification summary

After a verification is complete, you can view a verification summary as a JSON file (summary.json) in your verification folder.

This report contains details including any paths that have discrepancies.

Where multiple summaries are output, they are enclosed in a JSON array:
[{summary1}, {summary2}, ...]
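When processing summaries in a script, it can help to accept both a single summary object and an array of summaries. The sketch below makes no assumptions about the fields inside a summary. The function name is hypothetical, and the "id" key in the usage lines is a placeholder rather than a documented field.

```python
import json


def load_summaries(text: str) -> list:
    """Parse summary JSON and always return a list of summary objects."""
    parsed = json.loads(text)
    return parsed if isinstance(parsed, list) else [parsed]


# "id" is a placeholder key used only for illustration.
print(len(load_summaries('{"id": "a"}')))                 # 1
print(len(load_summaries('[{"id": "a"}, {"id": "b"}]')))  # 2
```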