Version: 2.4.3

Databricks target prerequisites

Review the prerequisites required to migrate to Databricks using a Metastore Agent.

Prerequisites

When adding a Databricks Metastore Agent, choose either a Unity Catalog or a Workspace Hive Metastore (Legacy) Metastore Type. See the general prerequisites, Workspace Hive Metastore (Legacy) prerequisites, and Unity Catalog prerequisites below.

Workspace Hive Metastore (Legacy) prerequisites

Unity Catalog prerequisites

  • External location created in Databricks. Learn more from Azure, AWS, and GCP.

Data formats

To ensure a successful migration to Databricks, the source tables must be in one of the following formats:

  • CSV
  • JSON
  • AVRO
  • ORC
  • PARQUET
  • Text
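As a quick pre-migration sanity check, a sketch like the following could flag source files whose extension does not map to one of the supported formats. The function name and extension mapping are illustrative assumptions, not part of Data Migrator:

```python
# Illustrative sketch: map common file extensions to the formats listed
# above. The names here are assumptions, not a Data Migrator API.
SUPPORTED_FORMATS = {"csv", "json", "avro", "orc", "parquet", "text"}

EXTENSION_TO_FORMAT = {
    ".csv": "csv",
    ".json": "json",
    ".avro": "avro",
    ".orc": "orc",
    ".parquet": "parquet",
    ".txt": "text",
}


def is_supported(path: str) -> bool:
    """Return True if the file's extension maps to a supported format."""
    for ext, fmt in EXTENSION_TO_FORMAT.items():
        if path.lower().endswith(ext):
            return fmt in SUPPORTED_FORMATS
    return False
```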

Cluster and file system

Ensure you have the following before you start:

Cloud storage mounted

Example: Script to mount ADLS Gen2 or blob storage with Azure Blob File System

    configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": "<application-id>",
        "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>", key="<service-credential-key-name>"),
        "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token"
    }

    # Optionally, you can add example-directory-name to the source URI of your mount point.
    dbutils.fs.mount(
        source="abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
        mount_point="/mnt/<mount-name>",
        extra_configs=configs
    )

Replace:

  • <application-id> with the Application (client) ID for the Azure Active Directory application.
  • <scope-name> with the Databricks secret scope name.
  • <service-credential-key-name> with the name of the key containing the client secret.
  • <directory-id> with the Directory (tenant) ID for the Azure Active Directory application.
  • <container-name> with the name of a container in the ADLS Gen2 storage account.
  • <storage-account-name> with the ADLS Gen2 storage account name.
  • <mount-name> with the name of the intended mount point in DBFS.
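To illustrate how the placeholders fit together, a hypothetical helper (not part of dbutils or Data Migrator) could assemble the ABFSS source URI from the container and storage account values listed above:

```python
# Hypothetical helper: builds the ABFSS source URI used by the mount
# script above. The function name is an assumption for illustration.
def build_abfss_source(container_name: str, storage_account_name: str,
                       directory_name: str = "") -> str:
    """Assemble an abfss:// URI, optionally ending in a directory name."""
    uri = f"abfss://{container_name}@{storage_account_name}.dfs.core.windows.net/"
    return uri + directory_name
```

Passing a non-empty `directory_name` corresponds to the optional directory mentioned in the script's comment.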

Install Databricks driver

To install the JDBC driver:

  1. Download the Databricks JDBC driver.
note

Data Migrator only supports JDBC driver version 2.6.25 or higher.

  2. Unzip the package and upload the DatabricksJDBC42.jar file to the Data Migrator host machine.

  3. Move the DatabricksJDBC42.jar file to the Data Migrator directory:

    /opt/wandisco/hivemigrator/agent/databricks
  4. Change ownership of the jar file to the Hive Migrator system user and group:

    chown hive:hadoop /opt/wandisco/hivemigrator/agent/databricks/DatabricksJDBC42.jar
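Since only driver versions 2.6.25 or higher are supported, the minimum-version comparison can be sketched as follows. The function is illustrative only; it is not how Data Migrator itself validates the driver:

```python
# Illustrative check: compare a dotted JDBC driver version string
# against the 2.6.25 minimum noted above.
MIN_DRIVER_VERSION = (2, 6, 25)


def driver_version_ok(version: str) -> bool:
    """Return True if the dotted version string meets the minimum."""
    parts = tuple(int(p) for p in version.split("."))
    return parts >= MIN_DRIVER_VERSION
```

Tuple comparison handles versions of differing lengths, so "2.7" compares higher than "2.6.25" while "2.6" does not.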

Next steps