Version: 2.6

Upgrade Data Migrator

We recommend you regularly upgrade Data Migrator so you can take advantage of new functionality and other improvements. To upgrade, run through the prerequisites covered below and then run a newer version of the Data Migrator installer. The installer upgrades your Data Migrator instance to the new version.

If your existing deployment uses Lightweight Directory Access Protocol (LDAP/Active Directory) to manage user access, take note of the following known issue.

Before you upgrade

Read through the following section before you begin a product upgrade.

When you upgrade, you'll probably need to make some configuration changes.

Files in the /etc/wandisco directory contain custom configuration changes. The files generally used for configuration are:

Existing configuration

New versions can introduce additional configuration properties and improved default values. Compare your existing configuration and apply any new properties or applicable values supplied with the new configuration files included with your latest version. Check the release notes for changes to these files and make any changes before you restart services.

RPM

For RPM-based installations, your modified configuration is preserved when an RPM upgrade is applied. The new configuration is saved to the same folder with a .rpmnew extension. Compare it with your existing configuration and apply any new properties or applicable values supplied with the new configuration files.

Debian based

If you’re on a Debian-based system, your current configuration is saved with the .dpkg-old extension and is no longer used. A new version of the configuration file, containing any new defaults and properties, is created and used. Compare the new configuration with the saved file and add your existing custom configuration to the new file before restarting services.

In most cases, it is recommended to keep your current configuration, and introduce any new properties as required. For /etc/wandisco/ui/application-prod.properties, it is essential to keep the existing configuration to ensure the UI starts. See Debian automatic handling of configuration files for more information.
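
The comparison step described above can be sketched in shell. This is a minimal illustration, not part of the product: the function name compare_saved_configs is hypothetical, and it only assumes that saved variants sit next to the active file with the .rpmnew or .dpkg-old extensions mentioned in this section, and that the standard find and diff utilities are available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show differences between each active configuration
# file and the variant the package manager saved beside it.
compare_saved_configs() {
  local config_dir="$1"
  local variant active
  # .rpmnew (new packaged defaults on RPM) and .dpkg-old (your previous
  # file on Debian) are the extensions named in this guide.
  find "$config_dir" -type f \( -name '*.rpmnew' -o -name '*.dpkg-old' \) | sort |
  while IFS= read -r variant; do
    active="${variant%.*}"          # strip the final extension
    if [ -f "$active" ]; then
      echo "== $active vs $variant"
      # diff exits with 1 when the files differ; that is expected here.
      diff -u "$active" "$variant" || true
    fi
  done
}

# Example call (not run automatically):
#   compare_saved_configs /etc/wandisco
```

The function only prints differences. Deciding which properties to carry across remains a manual step guided by the release notes.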

Hotfix patch

Newer releases can include previously issued hotfixes. If a hotfix is included in the version you're upgrading to and is no longer required, fully remove the hotfix patch from your deployment.

If you've deployed a hotfix on your current version, see Hotfix patch removal important information to confirm if it's still required for your latest upgraded version.

Upgrading from 1.15.1 or earlier

If you're currently running LiveData Migrator 1.15.1 or earlier, you must first upgrade to LiveData Migrator 1.16 before upgrading to the latest version. Use the following installation steps. Before you start, read the 1.16 Release Notes.
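
As a rough aid for choosing the upgrade path, the version rule above can be expressed as a small shell check. This is a sketch under assumptions: the function name needs_intermediate_upgrade is hypothetical, the version is passed in by hand as a dotted string such as 1.15.1, and GNU sort with the -V (version sort) option is available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) when the given version is
# 1.15.1 or earlier, meaning an intermediate upgrade to 1.16 is needed.
needs_intermediate_upgrade() {
  local current="$1"
  local threshold="1.15.1"
  local highest
  # sort -V compares dotted versions component by component, so 1.9.0
  # sorts before 1.15.1. If the threshold is the highest of the two,
  # the current version is at or below it.
  highest="$(printf '%s\n%s\n' "$current" "$threshold" | sort -V | tail -n 1)"
  [ "$highest" = "$threshold" ]
}

# Example call (not run automatically):
#   if needs_intermediate_upgrade "1.14.0"; then
#     echo "Upgrade to 1.16 first"
#   fi
```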

Upgrading from 1.16 or later

Read through the following upgrade notices before starting your upgrade to the latest version:

caution

Upgrading to Data Migrator 2.5 and later

Location mapping properties

If you're upgrading to 2.5.4 and your migrations include Hive metastore tables that have a path SerDe property (created by Spark or by custom SerDes) indicating the data location, and that location must be transformed to the location of the data on your target platform, review the location mapping properties information and contact Support so that these properties can be adjusted accordingly.

Databricks agents

If you're upgrading from any Data Migrator version prior to 2.5 and have Databricks agents, you must stop and delete all Databricks migrations and remove any Databricks agents before upgrading to Data Migrator 2.5 or later. This is required because of the significant improvements to this agent type in 2.5.

info

Upgrading to Data Migrator 2.3 or earlier

After upgrading to any version of Data Migrator prior to 2.4, if Hive Migrator starts before Data Migrator, metadata migrations are reported as stopped.

Hive Migrator tries to check the Data Migrator license and fails to connect. The lack of connection leads to stopped metadata migrations as Hive Migrator assumes the license is invalid.

Prevention

Prevent automatic stop of metadata migrations by extending the time and retry intervals for the license check.

To adjust the license check intervals:

  1. Open or edit /etc/wandisco/hivemigrator/application.properties
  2. Uncomment the following properties and adjust their values as shown below.
hivemigrator.integration.liveDataMigrator.connectionMaxRetries=10
hivemigrator.integration.liveDataMigrator.connectionRetryDelay=50000
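
If you prefer to script this edit, the following is a minimal sketch rather than a supported tool. The function name set_property is hypothetical. It assumes GNU sed (for -i and -E), that each property sits on its own line and is optionally prefixed with #, and that the value contains no | character.

```shell
#!/usr/bin/env bash
# Hypothetical helper: uncomment a property and set its value, or
# append it if the file doesn't contain the key at all.
set_property() {
  local file="$1" key="$2" value="$3"
  # Escape the dots in the key so they match literally in the regex.
  local key_re="${key//./\\.}"
  if grep -Eq "^[[:space:]]*#?[[:space:]]*${key_re}=" "$file"; then
    # Replace the whole line, dropping any leading comment marker.
    sed -i -E "s|^[[:space:]]*#?[[:space:]]*${key_re}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Example calls (not run automatically), using the values shown above:
#   f=/etc/wandisco/hivemigrator/application.properties
#   set_property "$f" hivemigrator.integration.liveDataMigrator.connectionMaxRetries 10
#   set_property "$f" hivemigrator.integration.liveDataMigrator.connectionRetryDelay 50000
```

Take a copy of application.properties before editing it in place, so the change can be reverted.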

Resume stopped metadata migrations post upgrade

If the license check has already failed and migrations have already been stopped, resume migrations individually or in bulk.

  • To resume individual stopped metadata migrations, go to the Metadata Migrations panel on the Overview page and filter by Stopped. Select the migration then select Resume.

  • To resume multiple stopped metadata migrations, go to Metadata Migrations and under Bulk Actions, select Resume. See Bulk actions.

info

Upgrading to Data Migrator 2.3 if using Auto source cleanup

The Auto source cleanup feature was removed in Data Migrator 2.3.

If Auto source cleanup is enabled on any migrations, disable it on all of them before upgrading to Data Migrator 2.3.

info

If upgrading from Data Migrator 2.0 to 2.2 and currently using Data Transfer Agents, additional steps are required to start the DTA service. See the Known Issue for steps and more info.

info

Upgrading to Data Migrator 1.21 if using a Databricks agent

Data Migrator 1.21 doesn't support Databricks JDBC driver version 2.6.22 or earlier. Upgrade to JDBC driver version 2.6.25 or higher to continue using Databricks agents with Data Migrator.

info

Upgrading to Data Migrator 1.20 or later if using remote agents

If your current deployment uses remote agents, you must complete additional steps before proceeding with the upgrade. See the following knowledge base article - known issue.

Your configuration files stay the same after upgrading. On an RPM installation, configuration files from the new version are also added to the same folder. These new configuration files have the extension .rpmnew and are ignored by Data Migrator by default. You can compare them with your existing files and copy changes across accordingly, or use the new files.

The upgrade automatically overwrites shell scripts (such as start.sh) with the newer versions.

info

Don't change the encrypted database password for the UI in application-prod.properties. If you change the key, the UI won't start. If you're on a Debian-based system, you're prompted to decide whether to keep the old application-prod.properties file or use the new one from the installer. To ensure the UI starts, choose to keep the existing file.

info

Upgrading to Data Migrator 1.21 - Critical steps

Data Migrator 1.20 changes Hive Migrator user configuration. If upgrading to 1.21 and authenticating with Hive through a Kerberos principal that doesn't map to the hive user, ensure there's a valid proxyuser setting in core-site.xml. Otherwise, metadata migrations will fail. See the related known issue for more information.

info

Upgrading to/through Data Migrator 1.19 - Critical steps for Hive Migrator

This issue applies when upgrading from any pre-1.19 version to version 1.19 or later. For example: 1.18 to 1.20 or 1.18 to 1.21.

Large Hive Migrator databases may take up to 30 minutes to optimize. This process is automatic and occurs when you first start Hive Migrator after upgrading to Data Migrator 1.19 or later. If the Hive Migrator service is interrupted during this optimization, the database may be irreversibly corrupted.

We strongly recommend that you:

  • Back up the Hive Migrator database before you run a reset (purge) of all the metadata migrations.
    The default location of the database is here:

    /opt/wandisco/hivemigrator/hivemigrator.db.mv.db
  • Reset all metadata migrations. You can do this through the Swagger-based REST API documentation for metadata migrations with the /migration/reset/all command. This command purges the Hive Migrator database and clears the statistics and checksums for all migrations.

    The API call for doing metadata migration resets:

    curl -X 'POST' \
    'http://myldmhost.exampleurl.com:6780/migration/reset/all' \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
    "forceStop": true
    }'

    A successful reset will produce the following output, including a "Success." for each migration:

    [
    {
    "migrationName": "MetaMigration1",
    "status": "OK",
    "errorCode": 0,
    "message": "Success."
    },
    {
    "migrationName": "MetaMigration2",
    "status": "OK",
    "errorCode": 0,
    "message": "Success."
    }
    ]
  • Ensure that the Hive Migrator service is not interrupted when it is first started after the upgrade.
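
The backup recommended in the first bullet can be sketched as a simple file copy. This is an illustration only: the function name backup_hivemigrator_db and the backup directory are hypothetical, and the sketch assumes the database file isn't being written to while it is copied.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the Hive Migrator database file to a
# timestamped backup and verify the copy byte for byte.
backup_hivemigrator_db() {
  local db_file="$1" backup_dir="$2"
  local stamp target
  if [ ! -f "$db_file" ]; then
    echo "Database file not found: $db_file" >&2
    return 1
  fi
  mkdir -p "$backup_dir"
  stamp="$(date +%Y%m%d-%H%M%S)"
  target="$backup_dir/$(basename "$db_file").$stamp.bak"
  cp -p "$db_file" "$target"        # -p keeps ownership and timestamps
  if ! cmp -s "$db_file" "$target"; then
    echo "Backup differs from source: $target" >&2
    return 1
  fi
  echo "$target"                    # print the backup path on success
}

# Example call (not run automatically), using the default location above
# and an assumed backup directory:
#   backup_hivemigrator_db /opt/wandisco/hivemigrator/hivemigrator.db.mv.db /var/backups/hivemigrator
```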

Update Hive Migrator database

Data Migrator includes a script that performs a safe database schema update. This script runs automatically during installations or upgrades using RPM or Debian. No additional actions are required.

Manual database upgrade

danger

Only perform a manual database upgrade if instructed to do so by Support.

If the automatic database update is interrupted or fails for any reason, contact Support for assistance. If instructed to do so, you can manually perform the database upgrade using the following script.

Hive Migrator database upgrade script:

/opt/wandisco/hivemigrator/bin/hivemigrator-db-upgrade.sh

Running the upgrade script performs the following:

  • Creates a temporary directory /opt/wandisco/hivemigrator/hvm-db-upgrade-tmp. You can change its location.

  • Copies the H2 database defined in /etc/wandisco/hivemigrator/application.properties to the temporary directory.

    The default entry in application.properties is:
    # H2 database location
    hivemigrator.storagePath=/opt/wandisco/hivemigrator/hivemigrator.db
  • If an old H2 driver is present:

    • Detects agent databases placed in /opt/wandisco/hivemigrator/agent/.
    • Copies the agent database and runs H2 version transition for each agent database copy.
    • Overwrites the existing agent database with the copy if the version transition was successful.
    • Applies any missing schema updates up to version 1.14 to the main database.
    • Runs H2 version transition and deletes the old H2 driver.
  • Applies the new schema to the database copy.

  • Overwrites the existing database with the copy if the schema update was successful.

  • Deletes the temporary directory.

Change the temporary database location

The script creates a temporary directory in the same folder as the existing database. To select a different temporary directory, use this command before running the script:

export CUSTOM_TMP_DIR="<Full-Path-To-Different-Directory>"

Obtain a new installer and upgrade Data Migrator

To upgrade to the latest version of Data Migrator, download and run a new Data Migrator installer in the same way as for a first-time installation.

Upgrading to a newer version won't affect your filesystems or migrations. Any migrations that are in progress simply continue transferring data as normal.

note

You can check the component versions of your current installation by running the command livedata-migrator --version on your Data Migrator host machine.

info

The hivemigrator-azure-hdi.noarch package is no longer included in versions after Data Migrator 1.18 and isn't automatically removed during upgrade. If you have upgraded from 1.18 or earlier, remove the package manually using your package manager.

System and custom users for upgrades

If you want to run the installer using a default user, run the following command:

./livedata-migrator.sh

Alternative /tmp directory

The Data Migrator installer extracts its contents to a temporary directory and decompresses them. By default, the temporary directory is a sub-directory of /tmp.

In some situations, extracting and decompressing in the default temporary directory fails. For example, if there is not enough disk space remaining, or if /tmp is mounted as noexec.

To avoid these issues, extract the contents to a different temporary directory by adding the --target option when you run the installer:

Example
./livedata-1.21.0-4-full_rpm_installer.sh --target /opt/wandisco/alternate_tmp_dir

Do not use /opt/wandisco/tmp as the value for --target or the installation will fail.

You can delete your temporary directory and its contents after installation.
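
To find out in advance whether /tmp is mounted as noexec, you could inspect the mount table. The sketch below is an assumption-laden illustration: the function name mount_has_noexec is hypothetical, it reads the Linux /proc/mounts format (device, mount point, filesystem type, options), and it only looks at an entry whose mount point matches exactly. If /tmp is not a separate mount, the options of its parent filesystem apply instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Exit codes:
#   0 = the mount point is mounted with noexec
#   1 = the mount point is mounted without noexec
#   2 = no entry with exactly this mount point was found
mount_has_noexec() {
  local mount_point="$1"
  local mounts_file="${2:-/proc/mounts}"
  awk -v mp="$mount_point" '
    # Field 2 is the mount point, field 4 the comma-separated options.
    # The last matching entry wins, since later mounts overlay earlier ones.
    $2 == mp { opts = $4; found = 1 }
    END {
      if (!found) exit 2
      n = split(opts, o, ",")
      for (i = 1; i <= n; i++) if (o[i] == "noexec") exit 0
      exit 1
    }
  ' "$mounts_file"
}

# Example call (not run automatically):
#   if mount_has_noexec /tmp; then
#     echo "/tmp is noexec: run the installer with --target <other directory>"
#   fi
```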

The default system user for the Data Migrator and the UI services is hdfs, and the default system user for the Hive Migrator service is hive.

If you want to upgrade the product using a custom user and custom user group, run the following commands:

Thin installer
./livedata-migrator.sh --user <custom user> --group <custom group>
Fat installer
./livedata-migrator.sh -- --user <custom user> --group <custom group>

This sets the custom user and custom user group for all services and their respective directories.

For more information about configuring custom users, go to Configure system users.

If you don’t enter a custom user and group, then the pre-existing user and group are used from the following files:

  • /opt/wandisco/hivemigrator/vars.sh
  • /opt/wandisco/livedata-migrator/vars.env
  • /opt/wandisco/ui/vars.env

If any of these files don’t exist, the default user for that component is used instead.

Upgrade a Hive Migrator remote agent

Use the following steps to upgrade a Hive Migrator remote agent:

  1. Run the hive agent show command and copy the installationCommand value.
  2. Upload the new hivemigrator-remote-server-installer.sh file to the remote host.
    note

    You can find the hivemigrator-remote-server-installer.sh file under /opt/wandisco/hivemigrator.

  3. Make the installer executable:
    chmod +x hivemigrator-remote-server-installer.sh
  4. Run the installation command copied in step 1:
    Example
    ./hivemigrator-remote-server-installer.sh -- --silent --config 25ma-example-string-AbCdEfGhIjKADogCJpemxlbj==
  5. Restart the hivemigrator-remote-server service:
    systemctl restart hivemigrator-remote-server
  6. Check the remote agent is healthy using the hive agent check command.

Install components using RPM/DEB

If you're installing our product components individually using RPM/DEB, you can enter a custom user or group by adding a properties file with the custom user and group.

Example

/opt/wandisco/tmp/ldm.properties:

USERNAME="custom"
GROUPNAME="custom"

/opt/wandisco/tmp/ui.properties:

USERNAME="custom"
GROUPNAME="custom"

/opt/wandisco/tmp/hvm.properties:

HIVE_MIGRATOR_SERVER_USER="custom"
HIVE_MIGRATOR_SERVER_GROUP="custom"

When you install using RPM/DEB, the properties files containing the custom user and group names are used to set the user and group of each service and its respective directories.

If you upgrade a single component without using a properties file, then the RPM/DEB checks for the pre-existing user and group in /opt/wandisco/hivemigrator/vars.sh, /opt/wandisco/livedata-migrator/vars.env, and /opt/wandisco/ui/vars.env. If any of these files don't exist, the installer uses the default user for that component.

note

This applies to the hivemigrator-remote-server installer.

If you don't enter a custom user or group to the installer when you upgrade, the existing vars.env/vars.sh for each component of the product is retained, and existing property values are inserted into the new vars.env/vars.sh provided by the component packaging.

We don't currently retain previous custom properties when you upgrade with a custom user or group.

Next steps

Continue migrating data as before. Learn how to get started.