Upgrade Data Migrator
We recommend you regularly upgrade Data Migrator so you can take advantage of new functionality and other improvements. To upgrade, run through the prerequisites covered below and then run a newer version of the Data Migrator installer. The installer upgrades your Data Migrator instance to the new version.
If your existing deployment uses Lightweight Directory Access Protocol (LDAP/Active Directory) to manage user access, take note of the following known issue.
Before you upgrade
Read through the following section before you begin a product upgrade.
When you upgrade, you'll probably need to make some configuration changes.
Files in the /etc/wandisco directory contain custom configuration changes. The files generally used for configuration are:
/etc/wandisco/livedata-migrator/application.properties
/etc/wandisco/livedata-migrator/vars.env
/etc/wandisco/ui/application-prod.properties
/etc/wandisco/hivemigrator/application.properties
Existing configuration
New versions can introduce additional configuration properties and improved default values. Compare your existing configuration and apply any new properties or applicable values. Check the release notes for changes to these files and make any changes before you restart services.
vars.env
When the RPM is removed or an upgraded RPM is applied, /etc/wandisco/livedata-migrator/vars.env is saved to /etc/wandisco/livedata-migrator/vars.env.rpmsave. Original values are preserved in /etc/wandisco/livedata-migrator/vars.env.
application.properties
If modified, /etc/wandisco/livedata-migrator/application.properties is preserved when an RPM upgrade is applied. The latest configuration is saved to the same folder with a .rpmnew extension for RPM-based installations or a .dpkg-dist extension for Debian-based installations.
If you're on a Debian-based system, you may be prompted to keep your old configuration file or use the new one from the installer. This can happen when both the shipped configuration file and your local configuration have changed.
In most cases, we recommend keeping your current configuration and introducing any new properties as required. For /etc/wandisco/ui/application-prod.properties, it is essential to keep the existing configuration to ensure the UI starts.
See Debian automatic handling of configuration files for more information.
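For example, a minimal way to compare your kept configuration with the copy shipped by the newer package, using the file names described above (the .rpmnew file applies to RPM-based systems; compare against the .dpkg-dist file on Debian-based systems):
diff /etc/wandisco/livedata-migrator/application.properties /etc/wandisco/livedata-migrator/application.properties.rpmnew
Review the differences and copy any new properties you need into your existing file.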
Hotfix patch
Newer releases can include previously issued hotfixes. If the fix is included in your new version and the hotfix patch is no longer required, fully remove the patch from your deployment.
If you've deployed a hotfix on your current version, see Hotfix patch removal important information to confirm if it's still required for your latest upgraded version.
Currently, when you upgrade, if Hive Migrator starts before LiveData Migrator, metadata migrations show as stopped. Hive Migrator tries to check the Data Migrator license and fails to connect. The lack of connection leads to stopped metadata migrations as Hive Migrator assumes the license is invalid.
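If you want to confirm whether Hive Migrator started before LiveData Migrator after an upgrade, a quick sketch (this assumes systemd service units named hivemigrator and livedata-migrator; adjust to your deployment):
# Show when each service last became active; the later timestamp started last.
systemctl show -p Id,ActiveEnterTimestamp hivemigrator livedata-migrator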
Resume stopped metadata migrations
- To resume individual stopped metadata migrations, go to the Metadata Migrations panel on the Overview page and filter by Stopped.
- To resume multiple stopped metadata migrations, go to Metadata Migrations and, under Bulk Actions, select Resume. See Bulk actions.
Upgrading from 1.15.1 or earlier
If you're currently running LiveData Migrator 1.15.1 or earlier, you must first upgrade to LiveData Migrator 1.16 before upgrading to the latest version. Use the following installation steps. Before you start, read the 1.16 Release Notes.
Upgrading from 1.16 or later
Read through the following upgrade notices before starting your upgrade to the latest version:
If you're upgrading from Data Migrator 2.0 to 2.2 and currently use Data Transfer Agents, additional steps are required to start the DTA service. See the Known Issue for steps and more information.
Upgrading to Data Migrator 1.21 if using a Databricks agent
Data Migrator 1.21 doesn't support Databricks JDBC driver version 2.6.22 or earlier. Upgrade to JDBC driver version 2.6.25 or higher to continue using Databricks agents with Data Migrator.
Upgrading to Data Migrator 1.20 or later if using remote agents
If your current deployment uses remote agents, you must complete additional steps before proceeding with the upgrade. See the knowledge base article for this known issue.
Configuration files stay the same after upgrading, but configuration files from the new version are also added to the same folder on an RPM installation. These new configuration files have the extension .rpmnew and are ignored by Data Migrator by default. You can compare them and copy changes across accordingly, or use the new files.
The upgrade automatically overwrites shell scripts (such as start.sh) with the newer versions.
Don't change the encrypted database password for the UI in application-prod.properties. If you change the key, the WANdisco UI won't start. If you're on a Debian-based system, you're prompted to decide whether to keep the old application-prod.properties file or use the new one from the installer. To ensure the UI starts, choose to keep the existing file.
Upgrading to Data Migrator 1.21 - Critical steps
Data Migrator 1.20 changes Hive Migrator user configuration. If upgrading to 1.21 and authenticating with Hive through a Kerberos principal that doesn't map to the hive user, ensure there's a valid proxyuser setting in core-site.xml. Otherwise, metadata migrations will fail. See the related known issue for more information.
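For example, a quick way to check for an existing proxyuser entry before the upgrade (the core-site.xml path and the ldmuser name below are assumptions; substitute the path on your cluster and the user your principal maps to):
# Look for proxyuser entries for the user your Kerberos principal maps to.
grep -B1 -A2 'hadoop.proxyuser.ldmuser' /etc/hadoop/conf/core-site.xml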
Upgrading to/through Data Migrator 1.19 - Critical steps for Hive Migrator
This issue applies when upgrading from any pre-1.19 version to any later version, for example, 1.18 to 1.20 or 1.18 to 1.21.
Large Hive Migrator databases may take up to 30 minutes to optimize. This process is automatic and occurs when you first start Hive Migrator after upgrading to Data Migrator 1.19 or later. If the Hive Migrator service is interrupted during this optimization, it may irreversibly corrupt the database.
We strongly recommend that you:
- Back up the Hive Migrator database before you run a reset (purge) of all the metadata migrations. See the backup sketch after this list. The default location of the database is /opt/wandisco/hivemigrator/hivemigrator.db.mv.db.
- Reset all metadata migrations. You can do this through the Swagger-based REST API documentation for metadata migrations with the /migration/reset/all command. This command purges the Hive Migrator database and clears the statistics and checksums for all migrations. The API call for doing metadata migration resets:
curl -X 'POST' \
'http://myldmhost.exampleurl.com:6780/migration/reset/all' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"forceStop": true
}'
A successful reset will produce the following output, including a "Success." for each migration:
[
{
"migrationName": "MetaMigration1",
"status": "OK",
"errorCode": 0,
"message": "Success."
},
{
"migrationName": "MetaMigration2",
"status": "OK",
"errorCode": 0,
"message": "Success."
}
]
- Ensure that the Hive Migrator service is not interrupted when it is first started after the upgrade.
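The following is a minimal sketch of the backup step above. It assumes the default database location and that the Hive Migrator service unit is named hivemigrator; adjust the paths and service name to match your deployment.
# Stop Hive Migrator so the database file isn't being written to.
systemctl stop hivemigrator
# Copy the database file, preserving permissions and timestamps.
cp -p /opt/wandisco/hivemigrator/hivemigrator.db.mv.db /opt/wandisco/hivemigrator/hivemigrator.db.mv.db.backup
# Start Hive Migrator again before running the reset.
systemctl start hivemigrator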
Update Hive Migrator database
Data Migrator includes a script that performs a safe database schema update. This script runs automatically during installations or upgrades using RPM or Debian. No additional actions are required.
Manual database upgrade
Only perform a manual database upgrade if instructed to do so by support.
If the automatic database update is interrupted or fails for any reason, contact support for assistance. If instructed to do so, you can manually perform the database upgrade using the following script.
Hive Migrator database upgrade script:
/opt/wandisco/hivemigrator/bin/hivemigrator-db-upgrade.sh
Running the upgrade script performs the following:
- Creates a temporary directory /opt/wandisco/hivemigrator/hvm-db-upgrade-tmp. You can change its location.
- Copies the H2 database defined in /etc/wandisco/hivemigrator/application.properties to the temporary directory. The default entry in application.properties is:
  # H2 database location
  hivemigrator.storagePath=/opt/wandisco/hivemigrator/hivemigrator.db
- If an old H2 driver is present:
  - Detects agent databases placed in /opt/wandisco/hivemigrator/agent/.
  - Copies the agent database and runs H2 version transition for each agent database copy.
  - Overwrites the existing agent database with the copy if the version transition was successful.
  - Applies any missing schema updates up to version 1.14 to the main database.
  - Runs H2 version transition and deletes the old H2 driver.
- Applies the new schema to the database copy.
- Overwrites the existing database with the copy if the schema update was successful.
- Deletes the temporary directory.
Change the temporary database location
The script creates a temporary directory in the same folder as the existing database. To select a different temporary directory, use this command before running the script:
export CUSTOM_TMP_DIR="<Full-Path-To-Different-Directory>"
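For example, running the manual upgrade with a custom temporary directory (the /data path below is illustrative; only run the script if support has instructed you to):
export CUSTOM_TMP_DIR="/data/hvm-db-upgrade-tmp"
/opt/wandisco/hivemigrator/bin/hivemigrator-db-upgrade.sh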
Obtain a new installer and upgrade Data Migrator
To upgrade to the latest version of Data Migrator, download and run a new Data Migrator installer in the same way you do to install for the first time.
Upgrading to a newer version won't affect your filesystems or migrations. Any migrations that are in progress simply continue transferring data as normal.
You can check the component versions of your current installation by running the command livedata-migrator --version on your Data Migrator host machine.
The hivemigrator-azure-hdi.noarch package is no longer included in versions after Data Migrator 1.18 and isn't automatically removed during upgrade. If you have upgraded from 1.18 or lower, remove the package manually using your package manager.
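For example, a sketch of removing the package on an RPM-based system (use the equivalent command for your package manager):
# Remove the obsolete Hive Migrator HDI package left behind by the upgrade.
yum remove hivemigrator-azure-hdi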
System and custom users for upgrades
If you want to run the installer using a default user, run the following command:
./livedata-migrator.sh
The Data Migrator installer extracts its contents to a temporary directory and decompresses them. By default, the temporary directory is a sub-directory of /tmp.
In some situations, extracting and decompressing in the default temporary directory fails. For example, if there is not enough disk space remaining, or if /tmp is mounted as noexec.
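To check for these conditions before you run the installer, a quick sketch:
# Show the mount that contains /tmp and its options (look for noexec).
findmnt -T /tmp
# Check how much free space remains.
df -h /tmp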
To avoid these issues, extract the contents to a different temporary directory by adding the --target option when you run the installer:
./livedata-1.21.0-4-full_rpm_installer.sh --target /opt/wandisco/alternate_tmp_dir
Do not use /opt/wandisco/tmp as the value for --target or the installation will fail.
You can delete your temporary directory and its contents after installation.
The default system user for the Data Migrator and the WANdisco UI services is hdfs, and the default system user for the Hive Migrator service is hive.
If you want to upgrade the product using a custom user and custom user group, run the following commands:
./livedata-migrator.sh --user <custom user> --group <custom group>
./livedata-migrator.sh -- --user <custom user> --group <custom group>
This sets the custom user and custom user group for all services and their respective directories.
For more information about configuring custom users, go to Configure system users.
If you don’t enter a custom user and group, then the pre-existing user and group are used from the following files:
/opt/wandisco/hivemigrator/vars.sh
/opt/wandisco/livedata-migrator/vars.env
/opt/wandisco/ui/vars.env
If any of these files don’t exist, the default user for that component is used instead.
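To see which user and group an existing installation will reuse, you can inspect those files, for example (variable names differ between components):
grep -iE 'user|group' /opt/wandisco/hivemigrator/vars.sh /opt/wandisco/livedata-migrator/vars.env /opt/wandisco/ui/vars.env 2>/dev/null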
Upgrade a Hive Migrator remote agent
Use the following steps to upgrade a Hive Migrator remote agent:
1. Run the hive agent show command and copy the installationCommand value.
2. Upload the new hivemigrator-remote-server-installer.sh file to the remote host.
   Note: You can find the hivemigrator-remote-server-installer.sh file under /opt/wandisco/hivemigrator.
3. Make the installer executable:
   chmod +x hivemigrator-remote-server-installer.sh
4. Run the installation command copied in step 1. Example:
   ./hivemigrator-remote-server-installer.sh -- --silent --config 25ma-example-string-AbCdEfGhIjKADogCJpemxlbj==
5. Restart the hivemigrator-remote-server service:
   systemctl restart hivemigrator-remote-server
6. Check the remote agent is healthy using the hive agent check command.
Install components using RPM/DEB
If you're installing our product components individually using RPM/DEB, you can enter a custom user or group by adding a properties file with the custom user and group.
Example
/opt/wandisco/tmp/ldm.properties:
USERNAME="custom"
GROUPNAME="custom"
/opt/wandisco/tmp/ui.properties:
USERNAME="custom"
GROUPNAME="custom"
/opt/wandisco/tmp/hvm.properties:
HIVE_MIGRATOR_SERVER_USER="custom"
HIVE_MIGRATOR_SERVER_GROUP="custom"
When you install using RPM/DEB, the properties files containing the custom user and group names are used to set the user and group of each service and its respective directories.
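For instance, a sketch of creating one of these properties files before installing or upgrading the component package (the directory comes from the example above; the package file name is illustrative):
mkdir -p /opt/wandisco/tmp
cat > /opt/wandisco/tmp/ldm.properties <<'EOF'
USERNAME="custom"
GROUPNAME="custom"
EOF
# Then install or upgrade the component package, for example on an RPM-based system:
rpm -U livedata-migrator-<version>.noarch.rpm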
If you upgrade a single component without using a properties file, then the RPM/DEB checks for the pre-existing user and group in /opt/wandisco/hivemigrator/vars.sh, /opt/wandisco/livedata-migrator/vars.env, and /opt/wandisco/ui/vars.env. If any of these files don't exist, the installer uses the default user for that component.
This applies to the hivemigrator-remote-server installer.
If you don't enter a custom user or group to the installer when you upgrade, the existing vars.env/vars.sh for each component of the product is retained, and existing property values are inserted into the new vars.env/vars.sh provided by the component packaging.
We don't currently retain previous custom properties when you upgrade with a custom user or group.
Next steps
Continue migrating data as before. Learn how to get started.