Version: 2.6

Configure Google Dataproc as a target

Configure Google Dataproc as a target metastore using either the UI or the CLI.

Remote agent

A remote agent is a service deployed on a remote host that connects to Data Migrator to handle metadata transfer. For Dataproc, deploy the remote agent on the Dataproc cluster, which must run a supported OS: currently Ubuntu 20.04 or Ubuntu 18.04.
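
Before you deploy, you may want to confirm that the target host runs one of the supported Ubuntu releases. The following is a minimal sketch, assuming the host provides a standard /etc/os-release file. The helper name is_supported_os is hypothetical and is not part of Data Migrator.

```shell
#!/usr/bin/env bash
# Sketch: check whether a host runs Ubuntu 18.04 or 20.04.
# is_supported_os is a hypothetical helper, not a Data Migrator command.

is_supported_os() {
  # Path to an os-release file; defaults to the standard location.
  local release_file="${1:-/etc/os-release}"
  local id version_id

  # Return 2 if the file cannot be read at all.
  [ -r "$release_file" ] || return 2

  # Source the file in a subshell so its variables don't leak into the caller.
  id="$( . "$release_file" && printf '%s' "${ID:-}" )"
  version_id="$( . "$release_file" && printf '%s' "${VERSION_ID:-}" )"

  [ "$id" = "ubuntu" ] || return 1
  case "$version_id" in
    18.04|20.04) return 0 ;;
    *) return 1 ;;
  esac
}
```

On the Dataproc host, running is_supported_os with no arguments returns 0 for a supported release, 1 for an unsupported one, and 2 if the release file is unreadable.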

caution

Migration of transactional tables to a Google Dataproc metastore target is currently unsupported.

Prerequisites

See the knowledge base article Setting up a Dataproc agent.

Deploy a remote Hive agent for Dataproc with the CLI

  1. On your local host, run the hive agent add dataproc command with the following parameters to configure your remote Hive agent.

    • --host The host where the remote Hive agent will be deployed.
    • --port The port for the remote Hive agent to use on the remote host. This port is used to communicate with the local Data Migrator server.
    • --no-ssl (Optional) Transport Layer Security (TLS) encryption and certificate authentication are enabled by default between Data Migrator and the remote agent. Use this parameter to disable them.
  2. Transfer the remote server installer to your remote host:

    Example of secure transfer from local to remote host
    scp /opt/wandisco/hivemigrator/hivemigrator-remote-server-installer.sh myRemoteHost:~
  3. On your remote host, run the installer in silent mode as the root user (or with sudo):

    ./hivemigrator-remote-server-installer.sh -- --silent
    note

    The agent port will default to 5052. To set a custom agent port, run the installer with the --agent-port parameter. For example, ./hivemigrator-remote-server-installer.sh -- --silent --agent-port <custom port>.

  4. On your remote host, start the remote server service:

    service hivemigrator-remote-server start
    Example for remote Dataproc deployment - automated
    hive agent add dataproc --name targetautoAgent --autodeploy --ssh-user root --ssh-key /root/.ssh/id_rsa --ssh-port 22 --host myRemoteHost.example.com --port 5052 --config-path <example directory path> --file-system-id mytargethdfs
    Example for remote Dataproc deployment - manual
    hive agent add dataproc --name targetmanualAgent --host myRemoteHost.example.com --port 5052 --config-path <example directory path> --file-system-id mytargethdfs
note

If you enter Kerberos and configuration path information for remote agents, ensure the directories and Kerberos principal are correct for your chosen remote host (not your local host).
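
If you add agents for several hosts, it can help to assemble the manual deployment command from variables and check the port before you paste it into the Data Migrator CLI. The following is a rough sketch only. The helper build_agent_add_cmd is hypothetical, it only prints text, and it uses only the parameters shown in the examples above.

```shell
#!/usr/bin/env bash
# Sketch: print a manual-deployment "hive agent add dataproc" command.
# build_agent_add_cmd is a hypothetical helper; it does not contact Data Migrator.

build_agent_add_cmd() {
  local name="$1" host="$2" port="$3" config_path="$4" fs_id="$5"
  # Sixth argument: "no" disables TLS; anything else keeps the default (TLS on).
  local use_ssl="${6:-yes}"

  # Reject empty or non-numeric ports before printing anything.
  case "$port" in
    ''|*[!0-9]*) echo "port must be a number" >&2; return 1 ;;
  esac
  # Reject ports outside the valid TCP range.
  if [ "${#port}" -gt 5 ] || [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
    echo "port must be between 1 and 65535" >&2
    return 1
  fi

  local cmd="hive agent add dataproc --name $name --host $host --port $port"
  cmd="$cmd --config-path $config_path --file-system-id $fs_id"
  # Append --no-ssl only when TLS is explicitly disabled.
  if [ "$use_ssl" = "no" ]; then
    cmd="$cmd --no-ssl"
  fi
  printf '%s\n' "$cmd"
}
```

For example, build_agent_add_cmd targetmanualAgent myRemoteHost.example.com 5052 "<example directory path>" mytargethdfs prints the manual deployment command shown earlier. The configuration path must exist on the remote host, not the local one.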

Configure a secure TLS/SSL keystore connection to an agent

You can set up a keystore to enable TLS between Hive Migrator and remote agents.

See Configure a secure TLS/SSL keystore connection to a remote agent for more information.
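
Before you enter keystore details, you may find it useful to list the keystore's entries to confirm its type and the aliases it holds. The following is a minimal sketch using the JDK keytool utility, assuming keytool is installed on the host that holds the keystore. The helper list_keystore_entries and the KEYTOOL override variable are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch: list the entries in a keystore so you can confirm its aliases.
# list_keystore_entries is a hypothetical wrapper around the JDK keytool.

list_keystore_entries() {
  local keystore_path="$1" keystore_type="${2:-JKS}"
  # KEYTOOL can point at a specific keytool binary; defaults to the one on PATH.
  local keytool_bin="${KEYTOOL:-keytool}"

  # Accept only JKS and PKCS12 keystores.
  case "$keystore_type" in
    JKS|PKCS12) ;;
    *) echo "keystore type must be JKS or PKCS12" >&2; return 1 ;;
  esac

  # -storepass is omitted, so keytool prompts for the keystore password.
  "$keytool_bin" -list -keystore "$keystore_path" -storetype "$keystore_type"
}
```

Each entry that keytool lists begins with its alias, which you can compare against the certificate alias and trusted certificate chain alias you plan to use.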

Configure Google Dataproc with the UI

  1. From the Dashboard, select an instance under Instances.

  2. Under the Filesystems & Agents menu, select Metastore Agents.

  3. Select Connect to Metastore.

  4. Select the Filesystem in which the data associated with the metadata is held.
    For Dataproc agents, this is usually a Google Cloud Storage bucket.

  5. Select Google Cloud Dataproc as the Metastore Type.

  6. Download the installer to the Dataproc cluster virtual machine.

  7. Make the installer script executable.

    chmod +x hivemigrator-remote-server-installer.sh
  8. Run the installation command.

    ./hivemigrator-remote-server-installer.sh -- --silent
  9. Start the service.

    service hivemigrator-remote-server start
  10. Enter a Display Name.

  11. Enter the hostname or IP address of the cluster edge node.

  12. Enter the port for communication between the Hive Migrator service and the Dataproc server.

  13. Choose whether to use TLS.

  14. [Optional] - Configure a secure TLS/SSL connection to the agent.

    1. Under Secure Connection to a Metastore Agent, select Use Keystore for Certificates.
    2. Enter the following details:
      • Keystore Type - Select JKS or PKCS12 as the keystore type.
      • Keystore Path - Enter the path to the keystore file. For example, /etc/wandisco/hivemigrator/agent/name/keystore.jks.
      • Keystore Password - Enter the password for the keystore.
      • Certificate Alias - Enter the alias of the certificate stored in the keystore.
      • Trusted Certificate Chain Alias - Enter the alias of the trusted certificate chain stored in the keystore.
    3. Select Check connection to test the connection to the metastore with the details you entered.
      If Data Migrator can connect to the remote agent successfully, you can continue configuring the agent.
  15. Optional Settings:

    • Configuration path
    • Kerberos Configuration
      • Use the principal assigned to the Dataproc cluster.
      • Enter a default filesystem override to override the default filesystem URI. We recommend this for complex use cases only.
  16. Select Save.
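
Steps 7 to 9 above can be combined into one script that you run on the Dataproc cluster virtual machine. The following is a sketch under stated assumptions: the installer is in the current directory, and you run the script as root or with sudo. The names install_remote_agent, run, and DRY_RUN are hypothetical and are not product features.

```shell
#!/usr/bin/env bash
# Sketch: make the installer executable, install silently, start the service.
# install_remote_agent, run, and DRY_RUN are hypothetical names.

INSTALLER="./hivemigrator-remote-server-installer.sh"

run() {
  # With DRY_RUN=1, print the command instead of executing it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

install_remote_agent() {
  # Optional custom agent port; the installer default is used when empty.
  local agent_port="${1:-}"

  run chmod +x "$INSTALLER" || return 1
  if [ -n "$agent_port" ]; then
    run "$INSTALLER" -- --silent --agent-port "$agent_port" || return 1
  else
    run "$INSTALLER" -- --silent || return 1
  fi
  run service hivemigrator-remote-server start
}
```

Running DRY_RUN=1 install_remote_agent prints the three commands without executing them, which lets you review them first. Pass a port number as the first argument to install with a custom agent port.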