Version: 2.4.3

Configure Snowflake as a target

note

Snowflake is currently available as a preview feature and under development. If you use Snowflake as a target metastore with Data Migrator, and have feedback to share, contact us. The feature is automatically enabled. See Preview features.

Configure Snowflake as a target metastore using either the UI or the CLI.

We support the following tables for migration:

Apache Orc
Apache Parquet
Partitioned and non-partitioned tables
Non-transactional tables only

info

Disable transactional tables with the following command:

tblproperties('transactional'='false')

We support public and private Snowflake schemas.

Prerequisites

Ensure you have the following before you start:

A Snowflake account identifier, warehouse, and stage. See the Snowflake documentation.
A cloud storage service already configured as your target filesystem. See Configure target filesystems.
If you choose the private key method of authentication, you'll need a private key file for Snowflake key pair authentication. The file must be stored on your Data Migrator cluster host. In this case, the Hive user requires access permissions to the private key file. See Configure system users.
Appropriate user and role privileges to ensure Data Migrator accesses and modifies only those Snowflake target schemas required for migrations. See below for examples.

Additional information.

See the following Snowflake Known issue.

Examples of how to set up the Hive user with limited permissions on Snowflake

This example shows how you can grant permissions to access and modify specific databases and schemas in Snowflake.

Example 1
USE ROLE ACCOUNTADMIN;
//Skip if warehouse already exists
CREATE WAREHOUSE "HVM_WAREHOUSE";
CREATE DATABASE "HVM_DB";
CREATE SCHEMA "HVM_DB"."HVM_SCHEMA";
CREATE ROLE "HVM_INTEGRATION";
GRANT USAGE ON WAREHOUSE "HVM_WAREHOUSE" TO ROLE "HVM_INTEGRATION";
GRANT USAGE ON DATABASE "HVM_DB" TO ROLE "HVM_INTEGRATION";
GRANT USAGE ON SCHEMA "HVM_DB"."HVM_SCHEMA" TO ROLE "HVM_INTEGRATION";
//For internal stage
GRANT READ ON STAGE "HVM_DB"."HVM_SCHEMA".HVM_STAGE TO ROLE "HVM_INTEGRATION";
//For external stage
GRANT USAGE ON STAGE in "HVM_DB"."HVM_SCHEMA" TO ROLE "HVM_INTEGRATION";
GRANT CREATE STAGE ON SCHEMA "HVM_DB"."HVM_SCHEMA" TO ROLE "HVM_INTEGRATION";
GRANT CREATE INTEGRATION ON ACCOUNT TO ROLE "HVM_INTEGRATION";
//There is no FUTURE grants support for databases, thus migration of databases requires ACCOUNTADMIN role
//If limited list of databases going to be migrated with HVM, databases can be created manually by admin or admin can
//grant ACCOUNTADMIN role to the HVM_USER for short period of time to migrate databases only (use particular HVM migration rule)
//Note: Changing the properties of a database, including comments, requires the OWNERSHIP privilege for the database.

//Invoke for all schemas
GRANT SELECT,INSERT,TRUNCATE ON FUTURE TABLES in schema <**DB.SCHEMA***> to role "HVM_INTEGRATION";
GRANT SELECT ON FUTURE VIEWS in schema <**DB.SCHEMA***> to role "HVM_INTEGRATION";

//Create new user with private key
CREATE USER "HVM_USER" RSA_PUBLIC_KEY="********" DEFAULT_ROLE = "HVM_INTEGRATION";
//Grant configured HVM_INTEGRATION role to the new user
GRANT ROLE "HVM_INTEGRATION" TO USER "HVM_USER";

This example shows how you can create a stored procedure and grant the LiveData Migrator user permissions. This method calls the stored procedure rather than using the CREATE DATABASE command.

Example 2
USE ROLE ACCOUNTADMIN ;
CREATE OR REPLACE PROCEDURE demo_db.public.pTestDatabaseCreation(DBNAME string)
RETURNS text
LANGUAGE javascript
COMMENT = 'Create a database without ACCOUNTADMIN privileges'
EXECUTE AS OWNER
as
    $$
    var stmt = snowflake.createStatement({
        sqlText: "create database "+ DBNAME
    })
    var ret = stmt.execute();
    return ret;
    $$
;
GRANT USAGE ON PROCEDURE
demo_db.public.pTestDatabaseCreation(string) TO ROLE SYSADMIN;
USE ROLE SYSADMIN;
CALL demo_db.public.pTestDatabaseCreation('my_test_database');

Snowflake agent timeout recommendations

Occasionally, migrations may fail due to brief network disruption or heavy data processing. Reduce the chance of these failures by setting the following timeout properties in the hive agent add snowflake CLI command:

Property	Description	Recommended
`--networkTimeout`	Number of milliseconds to wait for a response when interacting with the Snowflake service before returning an error. `0` (zero) specifies that no network timeout is set. Default=600000	600000
`--queryTimeout`	Number of seconds to wait for a query to complete before returning an error. `0` (zero) specifies that the driver should wait indefinitely. Default=0	0

The recommended 10 minute (600000ms) networkTimeout ensures that brief disruption to the network won't cause migration failures but migrations will not hang indefinitely.
Leaving queryTimeout at the default 0 value ensures that long query times will never result in a timeout.