Release notes
Product Version | LiveData UI | LiveData Migrator | Hive Migrator | CLI |
---|---|---|---|---|
3.2.0 | 14.10.3 | 3.2.2 | 2.6.9 | 2.1.3 |
Release Highlights
Databricks Ingest Performance
In this release, we've significantly enhanced Data Migrator's ability to migrate and convert Hive-formatted tables to Delta format on Databricks by introducing cross-partition batching for COPY INTO commands.
Previously, when converting partitioned Hive tables into Delta format, Data Migrator processed each partition independently. This meant issuing a separate COPY INTO command for every partition, which resulted in an excessive number of operations and inefficient use of resources on Databricks.
Now, instead of processing partitions in isolation, Data Migrator batches files across multiple partitions into fewer, larger COPY INTO operations. Grouping files this way significantly reduces the number of commands issued and lets Databricks handle ingestion workloads more efficiently. The performance improvement is greatest for datasets with a large number of small or unevenly populated partitions. The result is faster overall migration times and better scalability.
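The difference between the two approaches can be sketched in a few lines of Python. This is an illustration only, not Data Migrator's implementation: the function names, the `(partition, path)` input shape, and the `max_files_per_batch` limit are all assumptions made for the example, and each returned batch stands in for one COPY INTO operation.

```python
from collections import defaultdict


def per_partition_batches(files):
    """Previous behaviour (sketch): one batch, i.e. one COPY INTO, per partition."""
    grouped = defaultdict(list)
    for partition, path in files:
        grouped[partition].append(path)
    return list(grouped.values())


def cross_partition_batches(files, max_files_per_batch):
    """New behaviour (sketch): fill each batch with files regardless of partition.

    max_files_per_batch is a hypothetical limit chosen for illustration.
    """
    if max_files_per_batch < 1:
        raise ValueError("max_files_per_batch must be at least 1")
    batches, current = [], []
    for _partition, path in files:
        current.append(path)
        if len(current) == max_files_per_batch:
            batches.append(current)
            current = []
    if current:  # keep the final, partially filled batch
        batches.append(current)
    return batches


# Ten partitions holding one small file each (made-up paths).
files = [
    (f"dt=2024-01-{day:02d}", f"/warehouse/t/dt=2024-01-{day:02d}/part-0.parquet")
    for day in range(1, 11)
]
old_batches = per_partition_batches(files)
new_batches = cross_partition_batches(files, max_files_per_batch=4)
```

With these made-up inputs, the per-partition approach yields ten batches while the cross-partition approach yields three, which is the kind of reduction that helps tables made up of many small partitions.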
More information on Databricks prerequisites can be found here.
Parallel Scan
Scanning files and directories is central to the data migration process. Scanning is most challenging on large filesystems with high latency and significant client activity. To meet these challenges we have re-engineered our scanning approach and introduced Parallel Scan.
In this release we have made improvements to Parallel Scan, which can be enabled as a beta feature. Parallel Scan can improve scanning performance, reduce migration times, and enhance the management of your system resources through the introduction of a Global Scan Pool for Target Match migrations. Any improvements will depend on your particular environment, the number of migrations, and the structure of your data in terms of files and directories. We will be targeting additional improvements around tuning capabilities in a future release.
The Parallel Scan beta feature can be enabled for testing and evaluation. More information can be found here. Please consult with support for additional guidance.
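As a rough illustration of the idea behind a shared scan pool, the sketch below lists directories concurrently using one thread pool that every scan draws from. It is not the product's scanner: `GLOBAL_SCAN_POOL`, its size, and the level-by-level traversal are assumptions made for the example, and it runs against a temporary local directory rather than a real source filesystem.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# One pool shared by every scan, rather than one pool per migration.
# The size of 4 is an arbitrary value for this sketch.
GLOBAL_SCAN_POOL = ThreadPoolExecutor(max_workers=4)


def list_dir(path):
    """List a single directory, returning (files, subdirectories)."""
    files, subdirs = [], []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
            else:
                files.append(entry.path)
    return files, subdirs


def scan_tree(root, pool=GLOBAL_SCAN_POOL):
    """Scan a tree level by level, listing each level's directories in parallel."""
    found, frontier = [], [root]
    while frontier:
        results = list(pool.map(list_dir, frontier))
        frontier = []
        for files, subdirs in results:
            found.extend(files)
            frontier.extend(subdirs)
    return sorted(found)


def demo():
    """Build a small temporary tree and return the relative paths found."""
    with tempfile.TemporaryDirectory() as root:
        for sub in ("a", "b", os.path.join("a", "c")):
            os.makedirs(os.path.join(root, sub), exist_ok=True)
        for rel in ("top.txt", "a/1.txt", "a/c/2.txt", "b/3.txt"):
            with open(os.path.join(root, *rel.split("/")), "w") as handle:
                handle.write("x")
        return [os.path.relpath(p, root).replace(os.sep, "/") for p in scan_tree(root)]
```

Because the pool is shared, the total number of concurrent listing calls stays bounded no matter how many scans run at once, which is one way a global pool can help manage system resources.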
Recheck and Repair options for Verification now available from UI
The verification repair and recheck options are now available in the UI. It is important to understand how these features work. Provided the migration is using Target Match, a verification repair adds a pending region to be rescanned, and the verification itself will likely complete before the repair has taken effect. Once the repair has taken effect, you can run a verification recheck to confirm which inconsistencies the repair has resolved. This is therefore a two-stage process: verification repair, then verification recheck. A verification recheck can also be performed at any time, provided the most recently executed verification report contains at least one inconsistency.
For more information on verification repair see here.
Note that verification report generation does not wait for the action triggered by the repair to complete. In most cases you will need to run a recheck once the verification report has been generated and the changes made by the repair have taken effect.
Hive Remote Agents - Multi-Filesystem Support
This release introduces multi-filesystem support for Hive remote agents. In S3-based environments, you can now use a single Hive remote agent to serve multiple S3 target filesystems. Once the remote agent is installed and configured on your target, you can choose which filesystem a dataset is associated with, per metadata migration, using the default fs override parameter.
Other Improvements
UI Performance
The UI H2 database has been upgraded in this release, and users need to take note of the space requirements during the upgrade process - see Storage Requirements for the UI database upgrade in Release 3.2. During the upgrade, ensure that the free space on your filesystem is at least ten times the size of your current UI H2 database so that the upgrade can complete successfully. This upgrade will greatly enhance the performance and maintainability of the UI going forward.
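The ten-times rule can be checked before upgrading with a short script. The sketch below is an assumption-laden illustration rather than a supported tool: the function names are hypothetical, and you must supply the path to your own UI H2 database file, which is not stated here.

```python
import os
import shutil


def has_room_for_h2_upgrade(db_size_bytes, free_bytes, factor=10):
    """Return True if free space is at least `factor` times the database size."""
    return free_bytes >= factor * db_size_bytes


def check_h2_upgrade_space(db_file, factor=10):
    """Compare a database file's size with the free space on its filesystem.

    db_file is the path to your UI H2 database file (supplied by you).
    """
    db_size = os.path.getsize(db_file)
    free = shutil.disk_usage(os.path.dirname(os.path.abspath(db_file))).free
    return has_room_for_h2_upgrade(db_size, free, factor)
```

For example, a 2 GB database would need at least 20 GB free on the same filesystem before starting the upgrade.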
UI Improvements
We have consolidated the Metrics and Diagnostics pages for a migration into a single Metrics page for simplicity and ease of use. Within this page we have added a full history for the migration file size distribution and also introduced a separate page called Active Transfers which displays the active file transfers per migration.
Several areas of the user experience have also been improved. You can now create metadata rules from the metadata migrations page. Resetting or deleting a migration now warns that the associated verification reports will be deleted, and bulk reset gives the same warning.
CLI Enhancements
In this release, we have taken the opportunity to enhance the CLI in several key areas. The following commands transform the information available to users and provide an insight into the progress of your migrations:
migration stats: now in a much improved format, including detailed information on the migration and filesystem scanner
migration list: interactive table with options to sort, filter and see migration stats for a selected migration
file transfers: new command introduced to improve the feedback to users on the transfer of files during a migration
In addition to these changes, tab completion for the migration verification show command now functions correctly and displays more information, making it much easier to distinguish between verification reports.
More information on the CLI commands used to manage migrations can be found here. Further information on all of the CLI commands is available here.
Resolved Issues
Data Migrator Core
LM2-8419 Introduce parallelism into Scanning (Global Scan Pool)
LM2-8593 Can't update LDAP config because license is invalid
LM2-8607 Removal of Legacy verification reports
LM2-8640 REST API - New metrics and logging for Parallel Scan
LM2-8665 Target match stats counted in non-target match migration
LM2-8674 Unable to create GCS FS
LM2-8680 ORC files failing to be enriched for Iceberg Migrations
LM2-8682 Attempting to create s3a source when one already exists can fail
LM2-8695 Parallel Scan file status cache does not get updated
LM2-8720 Handle case when root of Source Filesystem is unavailable
LM2-8733 Invalid caching of rename pending region child paths
LM2-8734 Manage multiple CurrentRegions for parallel scan
LM2-8738 AWS SDK Deprecation WARN message in logs after adding S3 filesystem
LM2-8745 Recurring parallel scan migration stats
LM2-8747 "Internal Server Error", while adding AWS endpoint
LM2-8754 Repair failing due to non Target Match for Parallel scan labelled Migration
LM2-8756 DanglingPathAction not included in the TRANSFERRING set
LM2-8760 Hadoop client library update for aws
LM2-8763 Cancel verification api throws exception
LM2-8767 Two Way scan should not abandon window if source path does not exist
LM2-8776 Exception adding S3 compatible storage due to missing region
LM2-8791 DanglingPathActions should attempt to clean up parent directory structure
LM2-8810 Parallel Scan - Global ThreadPool Allocation
LM2-8820 Log Errors : waiting for leases but the lease queue is empty
LM2-8823 Scanner should count invalid regions towards the Iteration Limit
LM2-8824 Optimize scanning specific method so that it scales
LM2-8826 Parallel scan deletion of currently scanning region parent
LM2-8828 LDM fails to start if S3 target is inaccessible
LM2-8829 Removing unused source FS interrupts connection to unrelated source FS
LM2-8830 CRC Files generated on localFS target
LM2-8832 Validation only works on localFS with opening /
LM2-8833 Files missing from Target incorrectly reported
LM2-8870 Live Migrator "Requested array size exceeds VM limit" error
Hive Migrator
HVM-5244 Logging from the calls for the token-exchange call can become heavy
HVM-5249 Insert timestamp not working to Unity Databricks
HVM-5257 Renaming a partitioned table and inserting data duplicates table contents
HVM-5281 Slow API calls when dealing with remote agent config
HVM-5292 OpenAPI definitions are broken
HVM-5317 Ability to add key/value pairs as additional properties for REST catalog
HVM-5318 Iceberg double slash in the manifestList path of the metadata JSON
HVM-5365 Remote Hive Agents require further SDK implementation/Investigation
HVM-5368 Databricks alter table can force unnecessary drop/create
HVM-5383 Checksum comparison between HDP 2.6.5 and HDP 3.1.5 client
HVM-5384 Enable Kerberos Support in Iceberg Hive Agent
HVM-5385 Reset migration removes targetLocation value
HVM-5388 Databricks : Cluster usage vs threadcount limit
HVM-5392 Path mappings evaluated for agent's filesystem
HVM-5393 Efficient Data Event Handling
HVM-5394 Multithreaded File Scanner on migration restart
HVM-5395 Batch Copy-into stats
HVM-5397 HiveAgent alter table fallback to drop/create does not batch partitions
HVM-5405 Remote agent dropCreateTable does not re-add partitions
HVM-5406 Remote agent does internal restart to update defaultFsOverride value
UI
ONEUI-7349 Update 'Migrator' to 'Instance'
ONEUI-8073 UI can cut off the right hand side of the screen
ONEUI-8248 Upgrade PMD to 7.x.x
ONEUI-8282 Migration file size distribution - Full History
ONEUI-8287 Combine Metrics and diagnostics page into one
ONEUI-8290 Active file transfers per migration
ONEUI-8312 Display HVM Remote Server Versions on the UI
ONEUI-8317 LDMFailedPathsSyncDataProvider dumps migrations to logs
ONEUI-8353 OAuth Server URI should be specifically describing the Token Endpoint
ONEUI-8354 Remove info level logging in the LMV2MigrationTableSummaryResource
ONEUI-8360 Discrepancies with the license usage
ONEUI-8364 Checksum action policy cannot be selected when S3 is a source or target
ONEUI-8369 Enable recheck for verification report
ONEUI-8377 Warning for delete of verification reports on migration reset/delete
ONEUI-8378 Unable to delete FS if instance has any running migrations
ONEUI-8384 Add ability to create metadata rule in metadata migrations page
ONEUI-8386 Enable repair option for verification via UI
ONEUI-8387 H2 database upgrade should force compaction after ingest from dump
ONEUI-8389 H2 upgrade service improvement to manage file lock
ONEUI-8390 Update and simplify the AWS Auth options on AWS S3 Filesystems
ONEUI-8398 Bulk reset action shows migrations which can't be reset
ONEUI-8403 Include migration-ids in bulk reset with verifications alert text
ONEUI-8404 Migration overview can report 'Active for Invalid Date'
ONEUI-8409 Warning level logs displaying as Errors/Failures in the UI notifications
ONEUI-8410 Retries screen keeps collapsing itself
ONEUI-8416 Active File Transfers screen pagination issue
ONEUI-8417 Filters on Bulk Actions screens not filtering results
ONEUI-8428 Table page doesn't update when data removed
ONEUI-8432 Values persist in UI after verifications have been deleted
ONEUI-8433 Unable to bulk reset migrations on upgrade
CLI
LDMC-567 Migration Stats enhancements
LDMC-568 LDM File Transfers enhancements
LDMC-619 Warning for delete of verification reports on migration reset/delete
LDMC-635 Tab auto complete issues with migration verification show command
LDMC-642 More info for migration verification show command on tab completion
LDMC-647 Must supply a value for --fs-root when creating a local filesystem