Verify migrations
Run a verification on a Live, Complete, or Stopped migration to discover discrepancies (inconsistencies) between its source and target paths. The results appear in a Verification Summary Report.
Limitations and considerations
- As of Data Migrator 3.0, only one verification runs at a time, but each verification now issues its requests in parallel, which improves performance.
- If the full report shows a discrepancy caused by a change to the source or target that was made after the verification start time, it isn't a true discrepancy.
- Data Migrator doesn't check any files and directories that are excluded from the migration for discrepancies. The summary report shows the number of files and directories excluded on the source.
- Use the CLI to verify migrations if you don't want to set a verification cutoff point.
- Use the UI to verify migrations if you want to set a verification cutoff point.
- Verifications with path mappings are supported, where the root of the migration is mapped. Path mappings within the migration aren't included.
If your migration has S3 as a source or target filesystem, the time a directory is reported as having changed is always the same as the time that the directory was queried by the verification.
This is because of how S3 stores metadata about directories.
If you set a verification cutoff point, discrepant S3 directories will be ignored as they are considered to be after the cutoff point.
Prerequisites
Migrations must have one of the following statuses to perform verification scans:
- Live (real-time event stream notifications for changes to the source that are replicated to the target)
- Complete (one-time migration without event stream notifications)
- Stopped (a user has stopped the migration manually)
You can’t verify a migration that is in progress. Wait for the migration to finish, or stop it, and then run the verification scan.
Verification application properties
See verification application properties for more information on additional verification configuration.
The introduction and use of parallel requests within verification provides a mechanism for improving verification performance and can be used to reduce impact from environment factors such as network latency.
verification.default.target.parallel.requests has a higher default value than verification.default.source.parallel.requests to mitigate the expected performance impact of latency in target requests.
In use cases where the target filesystem is typically passive, increasing the number of parallel requests to the target should not significantly affect target namenode performance.
However, use caution when increasing parallel requests, particularly on HDFS systems with heavy end-user usage.
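As a rough illustration of why higher latency calls for more parallel requests, the sketch below uses a back-of-envelope model in which requests are issued in waves and each wave costs one round trip. This is an assumption made for illustration only, not a description of how Data Migrator schedules its requests, and the numbers in the usage note are hypothetical.

```python
import math


def estimated_scan_seconds(request_count: int,
                           latency_seconds: float,
                           parallel_requests: int) -> float:
    """Estimate wall-clock time when latency dominates each request.

    Simplified model (an assumption): requests go out in waves of
    `parallel_requests`, and every wave takes one latency period.
    """
    if parallel_requests < 1:
        raise ValueError("parallel_requests must be at least 1")
    waves = math.ceil(request_count / parallel_requests)
    return waves * latency_seconds
```

Under this model, 10,000 requests at 50 ms each take about 500 seconds when issued one at a time and about 50 seconds with 10 in parallel. The model ignores load on the filesystem itself, which is the reason for the caution above about busy HDFS systems.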
Verify migrations with the UI
View the migration status
From the Dashboard, select the migration you want to verify.
The Migration Verification panel shows details about the migration verification status:
- Verification Status - Not Started, In Progress, Complete.
- Total Inconsistencies - Number of discrepancies between the source and target paths.
Verify a migration
Use the following options to create a new verification for a migration:
Select Migration Verification from the sidebar menu.
Path to verify
Enter the path or paths from your source or target filesystem you want to verify using the format /path/to. The provided path must be a directory.
If you want to verify certain files on this path, for example, everything for a specific calendar month, you can enter a path like /path/to/oct_2022_*. This option allows you to specify subsections of large migration root directories for verification scanning.
You can add paths:
- One by one, up to a maximum of 100.
- Using wildcards to match multiple paths.
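To get a feel for which paths a wildcard entry would select, the following sketch uses Python's shell-style matching from the standard library. It assumes the wildcard behaves like a shell glob, which this page doesn't confirm, and the candidate paths are hypothetical.

```python
from fnmatch import fnmatchcase


def select_paths(candidates, pattern):
    """Return the candidate paths that match a shell-style wildcard pattern.

    Note: in fnmatch, '*' also matches '/', so a pattern ending in '*'
    matches deeper paths too. Data Migrator may treat this differently.
    """
    return [path for path in candidates if fnmatchcase(path, pattern)]


# Hypothetical directory names, for illustration only.
example_dirs = [
    "/path/to/oct_2022_01",
    "/path/to/oct_2022_02",
    "/path/to/nov_2022_01",
]
october_dirs = select_paths(example_dirs, "/path/to/oct_2022_*")
```

Here `october_dirs` holds the two `oct_2022_` directories and leaves out the November one.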
Verification depth
Enter a number to specify how deep in the directory you want to run the verification check.
The number must be equal to or less than the total number of levels in the directory structure of your migration.
Zero means there's no limit to the verification depth.
Example
You enter two.
Data Migrator scans and verifies the top two levels of your migration.
Verification Cutoff Point
Select a date and time as a verification cutoff point. The verification checks files modified on the source filesystem before the date and time you specify and excludes any changes after that cutoff point.
Example
You enter the date and time a migration was completed (one-time migration) or changed to live (live migration with continuous event stream notifications). No changes to the source filesystem are recorded as discrepancies in the verification report.
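The cutoff rule can be sketched as a simple time comparison. The snippet below assumes modification times are Unix epoch milliseconds (as they appear to be in the full-verification report) and that a change made exactly at the cutoff counts as after it; neither detail is stated on this page. The function names and the cutoff value are hypothetical.

```python
from datetime import datetime, timezone


def to_epoch_ms(moment: datetime) -> int:
    """Convert a timezone-aware datetime to Unix epoch milliseconds."""
    return int(moment.timestamp() * 1000)


def is_checked(modified_ms: int, cutoff: datetime) -> bool:
    """True if a change made at `modified_ms` falls before the cutoff.

    Changes at or after the cutoff are treated as ignored (an assumption).
    """
    return modified_ms < to_epoch_ms(cutoff)


# Hypothetical cutoff: 16 April 2024, 11:20 UTC.
cutoff = datetime(2024, 4, 16, 11, 20, tzinfo=timezone.utc)
```

With this cutoff, a file modified at 1713266312234 is checked, while one modified at 1713266614532 is ignored.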
Cancel a verification
You can cancel a verification that is queued or in progress. After you select Start verification, you can simply select Cancel check.
You can also delete a verification report for a completed or canceled verification.
View a verification summary report
After you select Start verification, the Verification Summary Report panel is updated with information from the source and target found during the verification scan. You can view this while the verification is in progress or when it's complete or canceled. The scan may not start immediately because only one verification can run at a time and new verifications are queued. When the verification starts checking the source and target, you can compare:
- The number of files and directories found on the source and the target
- Total amount of data on the source and the target
- Total number of inconsistencies found, including file size mismatches, missing or extra files and directories
You can view up to a maximum of the last 10 verifications for a migration. If you create more than 10 verifications for a migration, the oldest are garbage collected.
If you discover inconsistencies on your target file system and need to remove them with the Target Match migration option, see the Target Match activity monitoring section to learn more about comparing Target Match actions with inconsistencies.
Inconsistencies
Include | Don't include |
---|---|
Data that exists on one side and not on the other. | Changes made after a specified time (see Verification Cutoff Point above). |
File size mismatches. | Exclusions. |
Adding exclusions to an active migration may cause inconsistencies.
Files migrated to the target before adding the exclusion are listed as discrepancies because they exist on both the source and target.
- An excluded file that exists on both the source and target (FOUND_ON_BOTH) is a discrepancy.
- An excluded file that exists on the source but not the target (MISSING_ON_TARGET) isn't a discrepancy.
Full verification reports list all excluded files.
Download a full verification report
After the verification is complete, you can view and expand reports for completed verifications under Last x Verification Reports. Select Download all files to view, share, and analyze the results of the migration verification. This will give you a full report as a tar archive file containing summary.json and the following files:
- full-verification.jsonl.gz
- missing_on_source_paths.jsonl.gz
- repair.jsonl (only included in reports using the --repair CLI option)
- target_inconsistent_paths.jsonl.gz
- verification-discrepancy.jsonl.gz
- verification-discrepancy.csv.gz
- verification.json
Reports are downloaded as .gz files. Use a tool like gunzip or any compatible decompression utility to extract the file before viewing.
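If you prefer to read a report without extracting it first, a short script can stream the compressed file directly. This is a minimal sketch using Python's standard gzip and json modules; it assumes the .jsonl.gz files hold one JSON object per line, as the example full-verification report suggests. The function name is hypothetical.

```python
import gzip
import json


def read_jsonl_gz(path):
    """Yield one dict per non-empty line of a gzip-compressed JSON Lines file."""
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

For example, `for record in read_jsonl_gz("full-verification.jsonl.gz"): ...` processes the records one at a time, which keeps memory use low for large reports.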
You can also download individual files from the 10 latest reports by selecting the download icon next to each file you want. The maximum number of reports displayed is 10.
The verification-discrepancy report shows all discrepancies (inconsistencies) between the source and target filesystems, while the full-verification report shows discrepancies and all checks. You can use the full-verification report to review why particular files weren't reported as discrepancies (inconsistencies).
To download the full verification report, under Last x Verification Reports, expand the main verification report you need and select the download icon beside full-verification.jsonl.gz.
See the following Knowledge Base article to adjust the Verification Report location.
Review the full-verification report
Example full-verification report.
{"timestamp":1713266643021,"scanResult":"MISSING_ON_TARGET","reason":"MISSING_ON_TARGET","sourcePath":"/data/5/AddedToSourceAfterMigration","sourceLength":644,"sourceModifiedTime":1713266257688,"targetPath":null,"targetLength":-1,"targetModifiedTime":-1}
{"timestamp":1713266643024,"scanResult":"MISSING_ON_SOURCE","reason":"IGNORED_AFTER_TIMESTAMP","sourcePath":null,"sourceLength":-1,"sourceModifiedTime":-1,"targetPath":"/data/5/AddedToTargetAfterCutoffIgnoreMe","targetLength":16170,"targetModifiedTime":1713266614532}
{"timestamp":1713266643024,"scanResult":"MISSING_ON_SOURCE","reason":"MISSING_ON_SOURCE","sourcePath":null,"sourceLength":-1,"sourceModifiedTime":-1,"targetPath":"/data/5/AddedToTargetAfterMigration","targetLength":644,"targetModifiedTime":1713266233861}
{"timestamp":1713266643024,"scanResult":"MISSING_ON_TARGET","reason":"EXCLUDED","sourcePath":"/data/5/DoNotMigrateFile1","sourceLength":644,"sourceModifiedTime":1713262290632,"targetPath":null,"targetLength":-1,"targetModifiedTime":-1}
{"timestamp":1713266643024,"scanResult":"FOUND_ON_BOTH","reason":"SIZE_MISMATCH","sourcePath":"/data/5/FileSizeMismatch","sourceLength":3220,"sourceModifiedTime":1713266312234,"targetPath":"/data/5/FileSizeMismatch","targetLength":644,"targetModifiedTime":1713266297804}
{"timestamp":1713266643025,"scanResult":"FOUND_ON_BOTH","reason":"OK","sourcePath":"/data/5/sourceFile1","sourceLength":644,"sourceModifiedTime":1713194477365,"targetPath":"/data/5/sourceFile1","targetLength":644,"targetModifiedTime":1713257593900}
{"timestamp":1713266643025,"scanResult":"FOUND_ON_BOTH","reason":"OK","sourcePath":"/data/5/sourceFile2","sourceLength":644,"sourceModifiedTime":1713194477480,"targetPath":"/data/5/sourceFile2","targetLength":644,"targetModifiedTime":1713257593900}
In this example verification, an extra file on the source, an extra file on the target, and a file with a size mismatch result in 3 inconsistencies. An additional file on the target isn't reported as an inconsistency because of the cutoff, and an extra file on the source isn't reported as an inconsistency because of an exclusion.
scanResult | reason | File | Explanation | Inconsistency |
---|---|---|---|---|
MISSING_ON_TARGET | MISSING_ON_TARGET | /data/5/AddedToSourceAfterMigration | File added to source after migration | Inconsistency |
MISSING_ON_SOURCE | IGNORED_AFTER_TIMESTAMP | /data/5/AddedToTargetAfterCutoffIgnoreMe | File added to target after the verification cutoff | No inconsistency |
MISSING_ON_SOURCE | MISSING_ON_SOURCE | /data/5/AddedToTargetAfterMigration | File added to target after migration | Inconsistency |
MISSING_ON_TARGET | EXCLUDED | /data/5/DoNotMigrateFile1 | File missing on target as expected because of exclusion applied to the migration for this filename | No inconsistency |
FOUND_ON_BOTH | SIZE_MISMATCH | /data/5/FileSizeMismatch | File on both filesystems but size doesn't match | Inconsistency |
FOUND_ON_BOTH | OK | /data/5/sourceFile1 | File exists on source and target. Ok. | No inconsistency |
FOUND_ON_BOTH | OK | /data/5/sourceFile2 | File exists on source and target. Ok. | No inconsistency |
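The count of 3 inconsistencies can be reproduced with a small script. The sketch below takes the seven example records, trimmed to their scanResult and reason fields, and treats the reasons OK, IGNORED_AFTER_TIMESTAMP, and EXCLUDED as non-inconsistencies, following the table above. The helper names are hypothetical, and the sketch doesn't cover excluded files found on both sides, because this page doesn't show how those are reported.

```python
import json
from collections import Counter

# Reasons that the table above marks as "No inconsistency".
BENIGN_REASONS = {"OK", "IGNORED_AFTER_TIMESTAMP", "EXCLUDED"}

# The example records, trimmed to two fields for brevity.
SAMPLE_JSONL = """
{"scanResult":"MISSING_ON_TARGET","reason":"MISSING_ON_TARGET"}
{"scanResult":"MISSING_ON_SOURCE","reason":"IGNORED_AFTER_TIMESTAMP"}
{"scanResult":"MISSING_ON_SOURCE","reason":"MISSING_ON_SOURCE"}
{"scanResult":"MISSING_ON_TARGET","reason":"EXCLUDED"}
{"scanResult":"FOUND_ON_BOTH","reason":"SIZE_MISMATCH"}
{"scanResult":"FOUND_ON_BOTH","reason":"OK"}
{"scanResult":"FOUND_ON_BOTH","reason":"OK"}
"""


def summarize(jsonl_text):
    """Return (reason counts, number of inconsistencies) for JSON Lines text."""
    reasons = Counter()
    for line in jsonl_text.splitlines():
        if line.strip():
            reasons[json.loads(line)["reason"]] += 1
    inconsistencies = sum(
        count for reason, count in reasons.items()
        if reason not in BENIGN_REASONS
    )
    return reasons, inconsistencies


reason_counts, inconsistency_count = summarize(SAMPLE_JSONL)
```

For the sample, `inconsistency_count` is 3 and `reason_counts["OK"]` is 2, matching the table.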
scanResult and reason types
scanResult | Description |
---|---|
FOUND_ON_BOTH | File exists on both source and target filesystems |
MISSING_ON_TARGET | File exists on source but not on target |
MISSING_ON_SOURCE | File exists on target but not on source |
Reason | Description |
---|---|
OK | No inconsistency |
MISSING_ON_TARGET | File exists on source but not on target |
MISSING_ON_SOURCE | File exists on target but not on source |
SIZE_MISMATCH | File exists on source and target but sizes differ |
IGNORED_AFTER_TIMESTAMP | File missing on either source or target but no inconsistency due to verification cutoff |
EXCLUDED | File missing due to exclusion rule applied to migration |
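The numeric time fields in the example records look like Unix epoch milliseconds, with -1 standing in for a side where the file is missing. This page doesn't state either convention explicitly, so treat both as assumptions. Under those assumptions, a small helper (hypothetical name) makes the values readable:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(value: int) -> Optional[datetime]:
    """Convert epoch milliseconds to a UTC datetime.

    Negative values (such as the -1 seen for a missing file) return None.
    """
    if value < 0:
        return None
    # timedelta arithmetic avoids floating-point rounding of milliseconds.
    return EPOCH + timedelta(milliseconds=value)
```

For example, `ms_to_utc(1713266643021).isoformat()` gives a timestamp on 16 April 2024, and `ms_to_utc(-1)` returns `None`.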
Review the condensed reports
While a full report shows each discovered inconsistency, the missing_on_source_paths.jsonl.gz and target_inconsistent_paths.jsonl.gz files give a condensed, filtered view of the inconsistencies.
If a directory is inconsistent, its sub-paths aren't listed. Paths that already have work scheduled aren't listed either (for example, a missing path that has an action scheduled for it is left out of the condensed report).
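The first condensing rule, dropping sub-paths of a directory that is already listed, can be sketched as follows. This illustrates the idea only and is not Data Migrator's implementation. It assumes any listed path that has descendants in the input is a directory, and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def condense(paths):
    """Keep only paths that have no listed ancestor."""
    kept = []
    # Sorting by path components puts every parent before its children.
    unique = sorted({PurePosixPath(p) for p in paths}, key=lambda p: p.parts)
    for candidate in unique:
        if not any(parent in candidate.parents for parent in kept):
            kept.append(candidate)
    return [str(path) for path in kept]


condensed = condense([
    "/data/5/dir/a",
    "/data/5/dir",
    "/data/5/dir/sub/b",
    "/data/5/other",
])
```

Here `condensed` contains only `/data/5/dir` and `/data/5/other`, because the other two entries sit under `/data/5/dir`.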
Set up email notifications
You can be notified by email about the status of migration verifications and receive the results by email. Go to the Email Notification page to set up these alerts.
For more information, see Configure email notifications.
Verify migrations with the CLI
Use the following commands to manage verifications.
View the verification status using migration verification show for individual verification jobs or migration verification list for all verification jobs.
migration verification start
Start a new verification for a migration.
Example: Start a new verification for a migration
migration verification start --migration-id myNewMigration --depth 0 --timestamp 2022-11-15T16:24 --paths /MigrationPath
Any path or paths supplied using the --paths option must be a directory.
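If you generate these commands from a script, the main detail to get right is the --timestamp format, which in the example above is a date and time to the minute (2022-11-15T16:24). The sketch below builds the command text using only the options shown in that example. The function name is hypothetical, it handles a single path because this page doesn't show how multiple paths are separated, and it only produces a string to paste into the Data Migrator CLI.

```python
import shlex
from datetime import datetime
from typing import Optional


def build_start_command(migration_id: str,
                        path: str,
                        depth: int = 0,
                        cutoff: Optional[datetime] = None) -> str:
    """Return a 'migration verification start' command line as text."""
    args = ["migration", "verification", "start",
            "--migration-id", migration_id,
            "--depth", str(depth)]
    if cutoff is not None:
        # Same format as the example: year-month-day, 'T', hour:minute.
        args += ["--timestamp", cutoff.strftime("%Y-%m-%dT%H:%M")]
    args += ["--paths", path]
    return shlex.join(args)


command = build_start_command(
    "myNewMigration", "/MigrationPath",
    depth=0, cutoff=datetime(2022, 11, 15, 16, 24),
)
```

With these inputs, `command` reproduces the example line above.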
Example: Start a verification for a migration and rescan inconsistent paths
Attempt to repair inconsistencies discovered from a previous verification by automatically rescanning inconsistent paths using the --repair and --paths options.
Note: If there are paths missing on source, the repair will only add these to the pending regions if the migration is currently using Target Match.
A repair won't run if the path to rescan is determined to be the root directory of the migration. If you find a very large number of inconsistencies it may be more efficient to reset the migration.
Data Migrator will rescan inconsistent paths for this migration. Depending on your migration scan type, action policy, and exclusions, it may not always be possible to resolve all inconsistencies. For example, if your migration is not using Target Match to remove extraneous files/folders from the target, rescanning won't resolve this inconsistency with this migration configuration.
migration verification start --migration-id mig1 --paths /migrationpath/1 --repair
migration verification list
List summaries for all or specified verifications.
Examples
migration verification list
migration verification list --migration-id myNewMigration --states IN_PROGRESS,QUEUED
migration verification show
Show the status of a specific migration verification.
Example
migration verification show --verification-id 91c79b1b-c61f-4c39-be61-18072ac3a086-1676979356465
migration verification stop
Stop a queued or in-progress migration verification.
Example
migration verification stop --verification-id ab123c03-697b-48a5-93cc-abc23838d37d-1668593022565
migration verification report
Download a full verification report.
Examples
migration verification report --verification-id ab123c03-697b-48a5-93cc-abc23838d37d-1668593022565 --out-dir /user/exampleVerificationDirectory
View verification summary
After a verification is complete, you can view a verification summary as a JSON file (summary.json) in your verification folder.
This report contains details including any paths that have discrepancies.
Where multiple summaries are output, they are enclosed in a JSON array: {summary1}, {summary2}, ...
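Because the output can hold either one summary or several, a script that reads it may want to normalize both shapes into a list. The following is a minimal sketch: it assumes a single summary is a JSON object and multiple summaries are a standard JSON array of objects. The function name is hypothetical, and no summary field names are assumed.

```python
import json


def load_summaries(text: str) -> list:
    """Parse summary JSON and always return a list of summary objects."""
    data = json.loads(text)
    if isinstance(data, dict):
        return [data]   # single summary
    if isinstance(data, list):
        return data     # multiple summaries in an array
    raise ValueError("expected a JSON object or an array of objects")
```

Callers can then loop over the result the same way in both cases, for example `for summary in load_summaries(open("summary.json").read()): ...`.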