Databricks target prerequisites
Review the following prerequisites before migrating to Databricks with a Metastore Agent.
Prerequisites
When you add a Databricks Metastore Agent, choose either Unity Catalog or Workspace Hive Metastore (Legacy) as the Metastore Type. Find more information below on the general prerequisites, as well as on:
- Workspace Hive Metastore (Legacy) prerequisites
- Unity Catalog prerequisites
Data formats
To ensure a successful migration to Databricks, the source tables must be in one of the following formats (a way to inspect a table's format is sketched after this list):
- CSV
- JSON
- AVRO
- ORC
- PARQUET
- Text
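If you're unsure which format a source table uses, you can inspect its metadata before migrating. The following is a minimal PySpark sketch, assuming a Spark session with Hive support configured against the source metastore and a hypothetical table sales.orders:

from pyspark.sql import SparkSession

# Build a session that can read the source Hive metastore.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# DESCRIBE FORMATTED reports the InputFormat, OutputFormat, and SerDe,
# which identify the storage format (e.g. Parquet, ORC, Avro, text).
for row in spark.sql("DESCRIBE FORMATTED sales.orders").collect():
    if "format" in row.col_name.lower() or "serde" in row.col_name.lower():
        print(f"{row.col_name}: {row.data_type}")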
Cluster and file system
Ensure you have the following before you start (a quick notebook check is sketched after this list):
- A Databricks cluster running Databricks Runtime 15.1 at minimum.
- A Databricks File System (DBFS).
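To confirm the cluster meets these requirements, you can check the runtime version and DBFS access from a notebook. A minimal sketch (spark and dbutils are provided automatically in Databricks notebooks; the configuration key below is an assumption that holds on standard clusters):

# Print the cluster's Databricks Runtime version, e.g. "15.1.x-scala2.12".
print(spark.conf.get("spark.databricks.clusterUsageTags.sparkVersion"))

# List the DBFS root to confirm the file system is reachable.
for entry in dbutils.fs.ls("/"):
    print(entry.path)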
Cloud storage mounted
Example: Script to mount ADLS Gen2 or blob storage with Azure Blob File System
configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": "<application-id>",
           "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>", key="<service-credential-key-name>"),
           "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token"}

# Optionally, you can add a directory name to the source URI of your mount point.
dbutils.fs.mount(
    source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
    mount_point = "/mnt/<mount-name>",
    extra_configs = configs)
Replace:
- <application-id> with the Application (client) ID for the Azure Active Directory application.
- <scope-name> with the Databricks secret scope name.
- <service-credential-key-name> with the name of the key containing the client secret.
- <directory-id> with the Directory (tenant) ID for the Azure Active Directory application.
- <container-name> with the name of a container in the ADLS Gen2 storage account.
- <storage-account-name> with the ADLS Gen2 storage account name.
- <mount-name> with the name of the intended mount point in DBFS.
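After running the mount script with your values substituted, you can verify the mount from the same notebook:

# List the mounted container to confirm the mount succeeded.
display(dbutils.fs.ls("/mnt/<mount-name>"))

# Alternatively, check that the mount point is registered.
print(any(m.mountPoint == "/mnt/<mount-name>" for m in dbutils.fs.mounts()))

# If you need to redo the mount, unmount it first:
# dbutils.fs.unmount("/mnt/<mount-name>")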
Install Databricks driver
To install the JDBC driver:
- Download the Databricks JDBC driver. Data Migrator only supports JDBC driver version 2.6.25 or higher.
- Unzip the package and upload the DatabricksJDBC42.jar file to the Data Migrator host machine.
- Move the DatabricksJDBC42.jar file to the Data Migrator directory: /opt/wandisco/hivemigrator/agent/databricks
- Change ownership of the jar file to the Hive Migrator system user and group:
chown hive:hadoop /opt/wandisco/hivemigrator/agent/databricks/DatabricksJDBC42.jar
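Taken together, the steps after the download look like the following shell session on the Data Migrator host. This is a sketch: the downloaded zip file name varies by driver version, so adjust it to match your package.

# Unzip the downloaded driver package (file name varies by version).
unzip DatabricksJDBC42-*.zip

# Move the jar into the Data Migrator agent directory.
mv DatabricksJDBC42.jar /opt/wandisco/hivemigrator/agent/databricks/

# Hand ownership to the Hive Migrator system user and group.
chown hive:hadoop /opt/wandisco/hivemigrator/agent/databricks/DatabricksJDBC42.jar

# Confirm the jar is in place with the expected owner.
ls -l /opt/wandisco/hivemigrator/agent/databricks/DatabricksJDBC42.jar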
Next steps
- Continue to add a Databricks Metastore Agent.