Configure backup and restore
Data Migrator's backup and restore feature creates a snapshot of application settings and configuration files so that you can quickly restore a Data Migrator instance to an earlier state.
A restored Data Migrator instance recreates all migrations on the same paths as before without attempting to reconcile with earlier completed migrations. The following cases also apply:
- Running with "Overwrite": Source data will be transferred again and overwrite the data on target.
- Running with "Skip if size match": Source data won't be transferred again, as long as file sizes on the source and target match.
You can back up and restore the application state for data and metadata migrations using the REST API, Data Migrator command line interface (CLI), or User Interface (UI).
Limitations
You can't restore backups with source filesystems containing properties that are dependent on another filesystem.
For example, consider an S3 source whose credentials are stored in a JCEKS keystore on an HDFS target. During a restore, the S3 source can't be added because the HDFS target doesn't exist yet, so the restore fails. Before performing a backup, change the S3 authentication method to an option other than JCEKS.
Backup
The following details show you what data is covered in a backup and what commands are available to manage your backups.
Here's what a Data Migrator backup includes
The backup archive files are stored in /opt/wandisco/livedata-migrator/db/backups
by default. Once a backup file has been unzipped, the relevant files are stored in:
- Configs
/opt/wandisco/livedata-migrator/db/backups/configs/etc/wandisco/livedata-migrator
- Objects
/opt/wandisco/livedata-migrator/db/backups/objects
Backed-up function | Object/configuration (see above paths) | Description |
---|---|---|
Application properties | configs/etc/wandisco/livedata-migrator/application.properties | Application configuration. See Configure Data Migrator. |
Bandwidth settings | objects/BandwidthPolicy.json | Settings that limit Data Migrator's use of available network bandwidth. See Manage your bandwidth limit. |
Additional configuration properties | objects/ConfigurationPropertiesWrapper.json | Requeue and max migration configuration. |
Data transfer agents | objects/DataAgents.json | Settings that define data transfer agents. Data Migrator attempts to register agents again with the information provided from the backup. See Register an agent. |
Email registration | objects/EmailRegistrations.json | Email address and type. |
Environmental configuration | configs/etc/wandisco/livedata-migrator/vars.env | Environmental variables stored in vars.env . |
Exclusions | objects/RegexExclusions.json objects/FileSizeExclusions.json objects/DateExclusions.json | Settings for file and directory patterns that you want to exclude from migrations. See Configure exclusions. |
Logging configuration | configs/etc/wandisco/livedata-migrator/logback-spring.xml | Logging variables stored in logback-spring.xml . |
Migrations | objects/Migrations.json | Settings that define data migrations. See Create a migration. |
Path mapping | objects/PathMappings.json | Settings that create alternate paths for specific target filesystems. See Create path mappings. |
Secure keys for filesystem access | Optional configuration files such as /etc/wandisco/livedata-migrator/application.properties | Secret configuration entries are masked using the logging property obfuscate.json.properties . See Masking secret properties. |
Schedule of backups | objects/ScheduleConfig.json | Backup schedule configuration. |
SMTP configuration | objects/SmtpConfigurations.json | SMTP settings. |
Source configuration | objects/FileSystemConfigurations.json | Settings that define the source filesystem. See Configure source filesystem. |
Targets | objects/FileSystemConfigurations.json | Settings that define target filesystems. See Configure target filesystems. |
Backed-up configuration files are not restored automatically. The steps for manually restoring these files are listed in Manually restore configuration files.
Here's what a metadata backup includes
A metadata backup file can include the following objects and configurations:
Backed-up function | Object/configuration (path in backup file) | Description |
---|---|---|
Agent configuration | objects/AgentConfigs.json | Configuration for Hive Migrator agents. |
Application properties | configs/etc/wandisco/hivemigrator/application.properties | Application configuration for Hive Migrator. See Configure Hive Migrator. |
Backup schedule | objects/BackupSchedule.json | The schedule configuration for metadata backups. |
Environmental configuration | configs/etc/wandisco/hivemigrator/vars.sh | Application environment variables. |
Instance ID | configs/etc/wandisco/hivemigrator/instanceId | An identifier for the Hive Migrator instance. |
Logging | configs/etc/wandisco/hivemigrator/log4j2.yaml | Logging configuration. |
Migrations | objects/Migrations.json | Hive Migrator migrations. |
Replication rules | objects/ReplicationRules.json | Hive Migrator DB and table replication patterns. |
State information | objects/Conditions.json | Application state configuration. For example, this flags if the source agent was auto-discovered. |
Here's what a Data Migrator backup doesn't include
Currently, the following objects and configurations aren't included in a backup:
Object/configuration | Description |
---|---|
Certificates | Encryption keys and certificates are not included. |
LDAP/Access control settings | LDAP/Access Control settings are backed up but are not automatically applied after an instance is restored. The feature must be re-enabled manually. See Manage user access using LDAP. |
API Access Control LDAP config | The API Access Control LDAP config file /etc/wandisco/livedata-migrator/ldap.properties is not included in backups. Manually copy the file to the same path on the new instance and ensure it is owned by the same user that runs the Data Migrator service. |
License file | Product license files are not included. These must be manually restored. |
To manually add files to a backup, see Add extra files to a backup.
Backup configuration (data migrations)
The following data migration backup configuration parameters are stored in /etc/wandisco/livedata-migrator/application.properties.
Parameter | Description | Default | Recommendation |
---|---|---|---|
backups.listMaxSize | The maximum number of backup entries returned by the REST API GET command. | 1000 | Same as default |
backups.namePrefix | The prefix added to generated backup files. | lm2backup | Same as default |
backups.location | The file path where backup files are stored. | ${install.dir}db/backups (fresh installation) ${install.dir}db/backupDir (upgrade from an earlier version) | Same as default |
backups.filePaths[N] | Provide a path to a file that you want to include in a backup. Change the [N] into an integer. You can add multiple file paths by repeating the entry with incremental numbering. For example, backups.filePaths[0] ,backups.filePaths[1] , backups.filePaths[2] . | Commented out by default | Same as default |
To apply any changes, restart the Data Migrator service.
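As a sketch of how the backups.filePaths[N] entries from the table above stack up, the following works on a scratch copy of a properties file. The file contents and extra paths here are illustrative, not the real defaults.

```shell
# Sketch: add backups.filePaths entries to a scratch copy of
# application.properties (contents and paths are illustrative).
PROPS=$(mktemp)
printf 'backups.namePrefix=lm2backup\nbackups.listMaxSize=1000\n' > "$PROPS"
# Each backups.filePaths[N] entry needs a unique integer index.
printf 'backups.filePaths[0]=/etc/myapp/extra.conf\n' >> "$PROPS"
printf 'backups.filePaths[1]=/etc/myapp/extra2.conf\n' >> "$PROPS"
grep -c 'backups.filePaths' "$PROPS"
```

On a real instance you would edit /etc/wandisco/livedata-migrator/application.properties directly and then restart the Data Migrator service.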
Backup configuration (metadata migrations)
The following metadata migration backup configuration parameters are stored in /etc/wandisco/hivemigrator/application.properties.
Parameter | Description | Default | Recommendation |
---|---|---|---|
hivemigrator.backup.location | The file path where Hive Migrator backup files are stored. | /opt/wandisco/hivemigrator/backups | Same as default |
hivemigrator.backup.listMaxSize | The maximum number of backup entries returned by the REST API GET command. | 1000 | Same as default |
backups.namePrefix | The prefix added to generated Hive Migrator backup files. | hvmbackup | Same as default |
To apply any changes, restart the Hive Migrator service.
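For example, changing the backup location can be sketched on a scratch copy of the properties file before touching the real one. The target path /data/hvm-backups is illustrative.

```shell
# Sketch: repoint the Hive Migrator backup location on a scratch copy
# of the properties file (the new path is illustrative).
PROPS=$(mktemp)
echo 'hivemigrator.backup.location=/opt/wandisco/hivemigrator/backups' > "$PROPS"
sed -i 's|^hivemigrator.backup.location=.*|hivemigrator.backup.location=/data/hvm-backups|' "$PROPS"
grep '^hivemigrator.backup.location' "$PROPS"
```

On a real instance, apply the same change to the Hive Migrator properties file and restart the Hive Migrator service.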
Masking secret properties
Sensitive or secret information stored in backup files is made unreadable using the property obfuscate.json.properties
, located in /etc/wandisco/livedata-migrator/application.properties
. The default value includes the following list of filesystem-based parameters:
${hdfs.fs.type.masked.properties},${adls2.fs.type.masked.properties},
${s3a.fs.type.masked.properties},${gcs.fs.type.masked.properties}
Secret properties for data transfer agents are stored in the backup file:
agent.secret.properties=clientSecret,clientCertKey
Each parameter lists multiple JSON request property values. These values are masked (substituted with random characters) for anyone viewing the file:
Filesystem mask parameters | Description | Default |
---|---|---|
${agent.secret.properties} | Properties for data transfer agents to communicate between the Data Migrator server and the agent. | clientSecret,clientCertKey |
${hdfs.fs.type.masked.properties} | HDFS filesystem properties to be masked. | hdfs.example.secretKey1 ,hdfs.example.secretKey2 |
${adls2.fs.type.masked.properties} | Azure Data Lake Storage Gen2 filesystem properties to be masked. | fs.secret.Key,sharedKey ,fs.oauth2.client.secret ,oauthClientSecret |
${s3a.fs.type.masked.properties} | Amazon S3a filesystem properties to be masked. | fs.s3a.access.key ,fs.s3a.secret.key ,secretKey,accessKey |
${gcs.fs.type.masked.properties} | Google Cloud service properties to be masked. | fs.gs.auth.service.account.private.key.id ,fs.gs.auth.service.account.private.key ,privateKey,privateKeyId ,jsonKeyFile ,p12KeyFile |
Review your configuration files
Don't assume that the default masking covers all sensitive properties. Review your configuration files and add additional masked.properties parameters as required.
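One way to check is to scan the unzipped backup for a known secret value. The following is a sketch on a scratch directory; the file name and secret value are illustrative.

```shell
# Sketch: after unzipping a backup, confirm a known secret value does
# not appear in clear text in the backed-up JSON objects.
OBJ=$(mktemp -d)
echo '{"accessKey":"************","endpoint":"s3.example.com"}' > "$OBJ/FileSystemConfigurations.json"
if grep -rq 'MY_REAL_SECRET' "$OBJ"; then
  echo "clear-text secret found - extend masked.properties"
else
  echo "no clear-text secrets found"
fi
```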
- For new installations, backup files are stored in /opt/wandisco/livedata-migrator/db/backups. For upgrades, the previous default location, /opt/wandisco/livedata-migrator/db/backupDir, may be used. To change this location, set a different path using the backups.location parameter. See Backup configuration.
- Backup files have the following filename pattern: lm2backup-DateTime-mig(MigrationNumber). For example: lm2backup-20220711135000.8420-mig7.zip.
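The timestamp and migration count embedded in that filename pattern can be pulled apart with shell parameter expansion, which is handy when scripting against the backups directory:

```shell
# Sketch: parse the lm2backup-<DateTime>-mig<MigrationCount>.zip pattern.
name="lm2backup-20220711135000.8420-mig7.zip"
stamp="${name#lm2backup-}"; stamp="${stamp%-mig*}"   # 20220711135000.8420
migs="${name##*-mig}"; migs="${migs%.zip}"           # 7
echo "timestamp=$stamp migrations=$migs"
```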
Add extra files to a backup
To add extra files to a backup, use the following steps:
- Open /etc/wandisco/livedata-migrator/application.properties in a text editor.
- Add a backups.filePaths[N] parameter for each file, with the file's path. Each parameter name must be unique, so change the bracketed [N] to an integer and increment it for each copy of the parameter. For example:
  backups.filePaths[0]=/file-to-be-backed-up/file1
  backups.filePaths[1]=/file-to-be-backed-up/file2
  backups.filePaths[2]=/file-to-be-backed-up/file3
- Save the file.
- Restart Data Migrator. See System service commands - Data Migrator.
Inspect the contents of a data backup file
Use the following commands to check the contents of a backup file:
cd /opt/wandisco/livedata-migrator/db/backups
ls -l
total 64
-r-------- 1 hdfs hdfs 6751 Jul 5 12:19 lm2backup-20220705121918.8780-mig0.zip
-r-------- 1 hdfs hdfs 6751 Jul 5 12:34 lm2backup-20220705123406.3700-mig3.zip
-r-------- 1 hdfs hdfs 6751 Jul 5 12:34 lm2backup-20220705123418.9220-mig3.zip
-r-------- 1 hdfs hdfs 6753 Jul 7 07:17 lm2backup-20220707071729.9360-mig7.zip
-r-------- 1 hdfs hdfs 5334 Jul 11 12:49 lm2backup-20220711124912.2670-mig9.zip
-r-------- 1 hdfs hdfs 5334 Jul 11 13:13 lm2backup-20220711131301.5990-mig9.zip
-r-------- 1 hdfs hdfs 6753 Jul 11 13:48 lm2backup-20220711134845.7880-mig9.zip
-r-------- 1 hdfs hdfs 6753 Jul 11 13:50 lm2backup-20220711135000.8420-mig9.zip
Select a backup file that you want to inspect and run the following unzip command:
unzip lm2backup-20220711135000.8420-mig9.zip
inflating: objects/BandwidthPolicy.json
inflating: objects/EmailRegistrations.json
inflating: objects/PathMappings.json
inflating: objects/FileSystemConfiguration.json
inflating: objects/Migrations.json
inflating: objects/RegexExclusions.json
inflating: objects/ScheduleConfig.json
inflating: objects/DateExclusions.json
inflating: objects/FileSizeExclusions.json
inflating: configs/etc/wandisco/livedata-migrator/application.properties
inflating: configs/etc/wandisco/livedata-migrator/vars.env
inflating: configs/etc/wandisco/livedata-migrator/logback-spring.xml
Files not present in their default locations aren't backed up. Missing files don't trigger a notification or log error.
Schedule backups
Data Migrator supports an automatic scheduled backup but it's not enabled by default.
A backup is created immediately when a backup schedule is first enabled or later updated. Backups are then created according to the schedule period parameter.
Example:
You enable a backup schedule and set it to 600 (minutes). Data Migrator immediately creates a backup, then creates another backup every 600 minutes. If you change the schedule to 60, Data Migrator immediately creates another backup and then creates backups every 60 minutes.
Inspect the contents of a metadata backup file
Use the following commands to check the contents of a metadata backup file:
cd /opt/wandisco/hivemigrator/backups
ls -l
total 308
-r-------- 1 hive hadoop 3468 Aug 17 15:21 hvmbackup-20220817152118.7840-mig0.zip
-r-------- 1 hive hadoop 3468 Aug 17 15:28 hvmbackup-20220817152818.7820-mig0.zip
-r-------- 1 hive hadoop 3468 Aug 17 15:35 hvmbackup-20220817153518.7800-mig0.zip
-r-------- 1 hive hadoop 3468 Aug 17 15:42 hvmbackup-20220817154218.7790-mig0.zip
-r-------- 1 hive hadoop 3468 Aug 17 15:45 hvmbackup-20220817154527.1330-mig0.zip
-r-------- 1 hive hadoop 3468 Aug 17 15:49 hvmbackup-20220817154918.7800-mig0.zip
-r-------- 1 hive hadoop 3468 Aug 17 15:56 hvmbackup-20220817155618.7790-mig0.zip
Select a backup file that you want to inspect and run the following unzip command:
unzip hvmbackup-20220817155618.7790-mig0.zip
inflating: objects/BackupSchedule.json
inflating: objects/AgentConfigs.json
inflating: objects/Conditions.json
inflating: configs/etc/wandisco/hivemigrator/application.properties
inflating: configs/etc/wandisco/hivemigrator/instanceId
inflating: configs/etc/wandisco/hivemigrator/vars.sh
inflating: configs/etc/wandisco/hivemigrator/log4j2.yaml
Files not present in their default locations aren't backed up. Missing files don't trigger a notification or log error.
Backup commands
The backup and restore feature can be managed through the UI, REST API, and the Data Migrator CLI. Select where you want to manage backups:
UI commands
Manage backups and restore from backup using the UI. The available commands are described below:
Configure backup and restore operations for data migration and metadata migrations from the Configuration > Backup and restore UI section.
Create a backup schedule (data migrations)
- Sign in to the UI.
- Under Instances, select the Data Migrator instance name. The default name is LDM - localhost.
- Select Backup and restore from the Configuration links on the side menu.
- Select the Schedule data backups checkbox to create a scheduled backup.
- [Optional] Enter a backup frequency in minutes. The default is 60.
- Select Apply schedule. You'll get a notification that the schedule was applied.
Create a backup schedule (metadata migrations)
- Sign in to the UI.
- Under Instances, select the Data Migrator instance name. The default name is LDM - localhost.
- Select Backup and restore from the Configuration links on the side menu.
- Select the Schedule metadata backups checkbox to create a scheduled metadata backup.
- [Optional] Enter a backup frequency in minutes. The default is 60.
- Select Apply schedule. You'll get a notification that the schedule was applied.
Create immediate backups
Select Back up now. This option ignores any schedule settings and creates immediate data and metadata backups. You can verify that the backup files are created by checking the UI Notifications screen or selecting Restore from backup, which contains a complete list, searchable with a date range.
API commands
Data migrations
You can use the REST API to handle backup and restore operations with scripted automation. For manual API calls, you can also use the web interface of the Swagger-based REST API documentation.
The REST API commands use the following endpoint:
http://<ldm-hostname>:18080/backups/
Create a backup
Use the following command to create a backup file and store it in the backups directory:
curl -X POST "http://127.0.0.1:18080/backups"
{
"createdAt" : 1657547400842,
"size" : 16753,
"migrationsCount" : 7,
"backupName" : "lm2backup-20220711135000.8420-mig7.zip"
}
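When scripting against this endpoint, you often want just the backupName from the JSON response. The following sketch stubs the response so it runs offline; in practice you would capture it from the curl call shown above.

```shell
# Sketch: extract backupName from the JSON response. The response here
# is stubbed; in practice capture it with:
#   response=$(curl -s -X POST "http://127.0.0.1:18080/backups")
response='{ "createdAt" : 1657547400842, "size" : 16753, "migrationsCount" : 7, "backupName" : "lm2backup-20220711135000.8420-mig7.zip" }'
backup=$(printf '%s' "$response" | sed -n 's/.*"backupName" *: *"\([^"]*\)".*/\1/p')
echo "$backup"
```

A dedicated JSON tool such as jq is more robust than sed for real responses, if it's available on the host.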
List backup files
Use the following command to list the backup files that have already been created:
curl -X GET "http://127.0.0.1:18080/backups"
Create a backup schedule
Use the following command to create a backup schedule:
curl -X PUT "http://127.0.0.1:18080/backups/config/schedule/" -H 'Content-Type:application/json' -d '{"enabled": true, "periodMinutes": 480}'
{
"enabled" : true,
"periodMinutes" : 480
}
Review existing schedule configuration
Use the following command to verify that a schedule is enabled:
curl -X GET "http://127.0.0.1:18080/backups/config/schedule/"
{
"enabled" : true,
"periodMinutes" : 480
}
Metadata migrations
You can use the REST API to handle backup and restore operations with scripted automation. For manual API calls, you can also use the web interface of the Swagger-based REST API documentation.
The commands for the Hive Migrator REST API use the following endpoint:
http://<ldm-hostname>:6780/docs
Create a metadata backup
Use the following command to create a metadata backup file and store it in the backups directory:
curl -X POST "http://127.0.0.1:6780/backups"
{
"createdAt": 1660751127133,
"size": 3468,
"migrationsCount": 2,
"backupName": "hvmbackup-20220817154527.1330-mig2.zip"
}
- Backup files are stored in /opt/wandisco/hivemigrator/backups. To change this location, set a different path using the hivemigrator.backup.location parameter. See Backup configuration.
- Backup files have the following filename pattern: hvmbackup-DateTime-mig(MigrationNumber). For example: hvmbackup-20220817154527.1330-mig2.zip.
List metadata backup files
Use the following command to list the metadata backup files that have already been created.
curl -X GET "http://127.0.0.1:6780/backups"
Get backup details
Use the following command to view the details of a specified metadata backup:
curl -X 'GET' \
'http://127.0.0.1:6780/backups/hvmbackup-20220817155618.7790-mig0.zip' \
-H 'accept: application/json'
{
"createdAt": 1660751778779,
"size": 3468,
"migrationsCount": 0,
"backupName": "hvmbackup-20220817155618.7790-mig0.zip"
}
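The createdAt field in these responses is an epoch timestamp in milliseconds. A quick conversion to a readable timestamp (using GNU date, as found on typical Linux hosts):

```shell
# Sketch: convert a createdAt value (epoch milliseconds) to UTC.
created_ms=1660751778779
date -u -d "@$((created_ms / 1000))" '+%Y-%m-%d %H:%M:%S'
# 2022-08-17 15:56:18
```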
Schedule backups
Data Migrator supports an automatic scheduled metadata backup but it's not enabled by default.
A backup is created immediately when a metadata backup schedule is first enabled or later updated. Backups are then created according to the schedule period parameter.
Create a backup schedule
Use the following command to create a backup schedule:
curl -X PUT "http://127.0.0.1:6780/backups/schedule" -H 'Content-Type:application/json' -d '{"enabled": true, "periodMinutes": 14}'
{
"enabled": true,
"periodMinutes": 14
}
Review existing schedule configuration
Use the following command to verify that a schedule is enabled:
curl -X GET "http://127.0.0.1:6780/backups/schedule/"
{
"enabled" : true,
"periodMinutes" : 14
}
CLI commands
Available backup and restore commands are listed in the command reference page with the other Data Migrator CLI commands:
Data backup commands
Metadata backup commands
- hive backup add
- hive backup config show
- hive backup list
- hive backup schedule configure
- hive backup schedule show
- hive backup show
Restore from backup
Use the restore function to return Data Migrator and Hive Migrator to an earlier state, as recorded in a stored backup file. The restore command is often run on a reinstalled instance with no existing state. To restore to an existing Data Migrator instance, use the following steps:
Delete the Data Migrator default database
These steps remove current Data Migrator settings such as migrations, path mappings, and exclusions.
- Open a terminal on the Data Migrator instance.
- Switch to the root user or use sudo -i.
- Navigate to the database directory:
  cd /opt/wandisco/livedata-migrator/db/
- Delete the instance's default database directory:
  rm -r default-db
- Restart Data Migrator to initialize the empty database. See Data Migrator service commands.
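The steps above can be sketched as a shell session on a scratch directory. A real run operates on /opt/wandisco/livedata-migrator/db as root, with the service restarted afterwards.

```shell
# Sketch of the reset on a scratch directory standing in for
# /opt/wandisco/livedata-migrator/db (real run requires root).
DB=$(mktemp -d)
mkdir -p "$DB/default-db"
touch "$DB/default-db/state.dat"
rm -r "$DB/default-db"                        # delete the default database
[ ! -d "$DB/default-db" ] && echo "default-db removed"
```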
Delete Hive Migrator default database
These steps remove current Hive Migrator settings.
- Open a terminal on the Data Migrator instance.
- Switch to the root user or use sudo -i.
- Navigate to the database directory:
  cd /opt/wandisco/hivemigrator/
- Delete the instance's default Hive Migrator database file:
  rm hivemigrator.db.mv.db
- Delete the Hive Migrator configuration backups:
  rm /etc/wandisco/hivemigrator/agents.yaml.bck
  rm /etc/wandisco/hivemigrator/hive-migrator.yaml.bck
- Restart the Hive Migrator service. See Hive Migrator service commands.
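Again as a sketch, the Hive Migrator reset on scratch directories. A real run uses /opt/wandisco/hivemigrator and /etc/wandisco/hivemigrator as root.

```shell
# Sketch of the Hive Migrator reset on scratch directories standing in
# for /opt/wandisco/hivemigrator and /etc/wandisco/hivemigrator.
HVM=$(mktemp -d); CFG=$(mktemp -d)
touch "$HVM/hivemigrator.db.mv.db" "$CFG/agents.yaml.bck" "$CFG/hive-migrator.yaml.bck"
rm "$HVM/hivemigrator.db.mv.db"               # delete the database file
rm "$CFG/agents.yaml.bck" "$CFG/hive-migrator.yaml.bck"  # delete config backups
[ ! -e "$HVM/hivemigrator.db.mv.db" ] && echo "state files removed"
```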
If running in a Hadoop environment without Kerberos authentication, Data Migrator attempts to auto-discover and add HDFS and Hive sources on restart. Manually remove any auto-discovered sources before restoring from backup.
Manually restore configuration files
These steps must be completed using the command line:
- Unzip the data or metadata backup file to retrieve the backed-up configuration files. See Inspect the contents of a data backup file.
  The following configuration files are backed up by default:
  Data backup:
  6122 07-11-2022 13:50 configs/etc/wandisco/livedata-migrator/application.properties
  377 07-11-2022 13:50 configs/etc/wandisco/livedata-migrator/vars.env
  11914 07-11-2022 13:50 configs/etc/wandisco/livedata-migrator/logback-spring.xml
  Metadata backup:
  1752 08-17-2022 15:56 configs/etc/wandisco/hivemigrator/application.properties
  13 08-17-2022 15:56 configs/etc/wandisco/hivemigrator/instanceId
  697 08-17-2022 15:56 configs/etc/wandisco/hivemigrator/vars.sh
  4239 08-17-2022 15:56 configs/etc/wandisco/hivemigrator/log4j2.yaml
- Rename the current configuration files. Example rename (apply to all backed-up configuration files):
  cd /etc/wandisco/livedata-migrator/
  mv application.properties application.properties.replaced
- Move the backed-up config files into their correct location. Example move (apply to all backed-up configuration files):
  mv /backup/files/location/application.properties /etc/wandisco/livedata-migrator/application.properties
- Restart services. See System service commands.
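The rename-and-replace sequence above can be sketched on scratch directories. Real paths are /etc/wandisco/livedata-migrator and the directory where you unzipped the backup.

```shell
# Sketch of rename-and-replace on scratch directories.
ETC=$(mktemp -d)   # stands in for /etc/wandisco/livedata-migrator
SRC=$(mktemp -d)   # stands in for the unzipped backup's configs dir
echo 'current' > "$ETC/application.properties"
echo 'restored' > "$SRC/application.properties"
mv "$ETC/application.properties" "$ETC/application.properties.replaced"
mv "$SRC/application.properties" "$ETC/application.properties"
cat "$ETC/application.properties"
# restored
```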
Restore commands
The following details show how backup files are used to restore Data Migrator and Hive Migrator to an earlier state.
UI commands
- Sign in to the UI.
- Under Instances, select the Data Migrator instance name. The default name is LDM - localhost.
- Select Backup and Restore from the Configuration section of the menu.
- Select Restore from backup.
- Select the three-dot button for the Data or Metadata backup file from which to restore your instance.
- Select Restore.
- Check the notifications for confirmation that the backup restored successfully.
API commands
Data restore command
Restore Data Migrator from a backup by using the following curl command:
curl -X POST "http://127.0.0.1:18080/backups/restore/<backup-file-name>.zip"
{
"createdAt" : 1657024458922,
"size" : 26751,
"migrationsCount" : 0,
"backupName" : "lm2backup-20220705123418.9220-mig0.zip"
}
A notification confirming the restore appears in the UI.
Metadata restore command
Restore Hive Migrator from a metadata backup by using the following curl command:
curl -X POST "http://127.0.0.1:6780/backups/restore/<backup-file-name>.zip"
{
"createdAt" : 1673367251093,
"size" : 4819,
"migrationsCount" : 1,
"backupName" : "hvmbackup-20230110161411.0930-mig1.zip"
}
A notification confirming the restore appears in the UI.
CLI commands
Available backup and restore commands are listed in the command reference page with the other Data Migrator CLI commands: