
Configure an IBM Spectrum Scale/Storage Scale (GPFS) source

Migrate data from your IBM Storage Scale (formerly Spectrum Scale, GPFS - General Parallel File System) filesystem by adding it as a source in Data Migrator.

tip

IBM Spectrum Scale and IBM Storage Scale refer to the same underlying technology (the distributed file system based on GPFS). IBM Storage Scale is the newer name, adopted in a recent rebranding to align IBM's storage products under the "Storage" portfolio. You may find references to either name depending on the age or version of third-party documentation.

Prerequisites

  • HDFS CES (Cluster Export Service) is required on IBM Spectrum Scale to allow Data Migrator to interface with the source. See IBM Spectrum Scale Cluster Export Service for more information.
  • For live migration, the Spectrum Scale 'Clustered watch folder' feature is required with an external Kafka sink. See IBM Spectrum Scale Clustered watch folder for more information.
  • For live migration, the following events must be enabled in your 'Clustered watch folder' configuration: IN_ATTRIB, IN_CLOSE_WRITE, IN_CREATE, IN_DELETE, IN_MOVED_FROM, IN_MOVED_TO, and WARNING.
  • For live migration, ensure clustered watches are configured to produce events for each independent fileset you intend to migrate with Data Migrator (see the example commands after this list).
  • The Data Migrator marker directory must be included as a watched folder. If not set, it defaults to the user's home directory.
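
As a sketch of the watch configuration, the commands below check that the HDFS CES service is running and enable a clustered watch on an example fileset with the required events. The device name fs1, the fileset fset1, and the broker and topic values are placeholders, and mmwatch options vary between Spectrum Scale releases, so confirm the exact syntax against the IBM documentation for your version.

Check CES and enable an example clustered watch
# Confirm the HDFS CES service is enabled and running
mmces service list

# Enable a clustered watch that produces the required events to an external Kafka sink
mmwatch fs1 enable --fileset fset1 \
  --events IN_ATTRIB,IN_CLOSE_WRITE,IN_CREATE,IN_DELETE,IN_MOVED_FROM,IN_MOVED_TO,WARNING \
  --event-handler kafkasink \
  --sink-brokers "hostname:9092" \
  --sink-topic my-event-topic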
info

When you add IBM Spectrum Scale as a live source, all migrations automatically default to Target Match because Kafka provides an unordered event stream.

note

You can add a maximum of one IBM Spectrum Scale (GPFS) source filesystem.

Configure an IBM Spectrum Scale source filesystem with the UI

  1. From the Dashboard, select an instance under Instances.

  2. In the Filesystems & Agents menu, select Filesystems.

  3. Select Add source filesystem.

  4. Select IBM Spectrum Scale (GPFS) from the Filesystem Type dropdown list.

  5. Enter the following details:

    • Display Name - Enter a name for your source filesystem.

    • Default Filesystem - Enter the filesystem address. For example, hdfs://nameservice:8020.

    • Mount Point - Enter the root of the GPFS HDFS mount. For example, /gpfs/fs1/cluster-data/

      tip

      Typically, this is the value of gpfs.mnt.dir appended with the value of gpfs.data.dir from the source configuration. The following example commands, run on the Spectrum Scale source, retrieve these values.

      [root@source ~]# mmhdfs config get gpfs-site.xml -k gpfs.mnt.dir
      gpfs.mnt.dir=/gpfs/fs1
      [root@source ~]# mmhdfs config get gpfs-site.xml -k gpfs.data.dir
      gpfs.data.dir=cluster-data
    • Kerberos Configuration

      • Kerberos Principal - Enter a principal that will map to the HDFS superuser using auth_to_local rules, or add the Data Migrator user principal to the superuser group on the Hadoop cluster you're using.
        • For example: Create the Kerberos principal ldmuser@realm.com. Using auth_to_local rules, ensure the principal maps to the user hdfs, or that the user ldmuser is added to the superuser group.
      • Kerberos Keytab Location - Enter the path to the Kerberos keytab file containing the Kerberos Principal. The keytab file must be accessible to the local system user running the Data Migrator service (the default is hdfs) and must be accessible from the edge node where Data Migrator is installed.
        • For example: Copy the ldmuser.keytab file (where ldmuser is your intended user) containing the Kerberos principal into the /etc/security/keytabs/ directory on the edge node running Data Migrator, make its permissions accessible to the HDFS user running Data Migrator, and enter the /etc/security/keytabs/ldmuser.keytab path during Kerberos configuration for the filesystem.
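
      For illustration of the auth_to_local mapping described above, a rule that maps the example principal ldmuser@realm.com to the hdfs superuser could appear as follows in the hadoop.security.auth_to_local property in core-site.xml. The realm and username are placeholders; adapt the rule to your own principal.

      Example auth_to_local rule
      RULE:[1:$1@$0](ldmuser@realm.com)s/.*/hdfs/
      DEFAULT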
    • Advanced Configuration - Enter paths to configuration files or additional properties with key/value pairs.

      • Configuration Property File Paths (Optional) - Enter the directory or directories containing your HDFS configuration (such as the core-site.xml and hdfs-site.xml) on your Data Migrator host's local filesystem. This is required if you have Kerberos or a High Availability (HA) HDFS.
        note

        Data Migrator reads core-site.xml and hdfs-site.xml once, during filesystem creation, applying any configuration within paths added under Configuration Property File Paths. After creation, these paths are no longer visible in the UI. You can see all filesystem properties using the API.

      • Configuration Property Overrides (Optional) - Enter override properties or additional properties for your filesystem by adding key-value pairs. (Example).
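
        For example, to have the HDFS client connect to datanodes by hostname rather than IP address, you could add the key-value pair below. The property shown is only an illustration; supply whichever overrides your environment needs.

        dfs.client.use.datanode.hostname=true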
    • Filesystem Options (Select either)

      • Live Migration - Select to include Live as a migration type when creating migrations.
      • One-time Migration - Select to limit migration types available to one-time. See migration types to learn more about each type.
    • Kafka Event Source (Optional, for live migration, see prerequisites).

      • Bootstrap servers - Enter hostname and port of Kafka Bootstrap servers. Use comma-separated pairs for multiple servers. For example, hostname:9092,hostname2:9092.
      • Topic name - Enter the Kafka topic name for event delivery. For example, my-event-topic.
      • Group identifier - Enter the Kafka consumer identifier. For example, my-group-id.
        caution

        The Group identifier must be unique and unused for each Data Migrator instance. Data Migrator cannot share a consumer group with other consumers, as another consumer in the same group could prevent Data Migrator from retrieving events from the topic. Similarly, if you have multiple Data Migrator instances using the same IBM Spectrum Scale source and the same Kafka topic, each must be supplied with a unique Group identifier.
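
        To confirm that a Group identifier is not already in use, you can list the existing consumer groups on your Kafka cluster with the standard Kafka tooling (an illustrative command; the script name and location depend on your Kafka distribution):

        kafka-consumer-groups.sh --bootstrap-server hostname:9092 --list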

      • Use TLS (Optional) - Select to use TLS and specify a custom truststore location and password if required for authentication with Kafka.
        • Truststore Location - Enter the full local path of the Truststore file. This must be accessible to the local system user running the Data Migrator service.
        • Truststore Password - Enter the Truststore password.
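
        If your Kafka brokers use certificates that are not signed by an already-trusted CA, you can build a custom truststore with keytool. A minimal sketch, assuming the broker CA certificate has been exported to ca.crt (a hypothetical path) and using an example truststore location:

        keytool -importcert -alias kafka-ca -file ca.crt \
          -keystore /etc/cirata/livedata-migrator/conf/kafka-truststore.jks \
          -storepass <truststore-password> -noprompt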
    • Kafka Kerberos Configuration (Optional)

      • Kafka Kerberos Principal - If using Kerberos with Kafka, enter the Kafka Kerberos principal used to authenticate with Kafka.
      • Kafka Kerberos Keytab Location - If using Kerberos with Kafka, enter the path to the Kerberos keytab containing the supplied Kafka Kerberos Principal. The keytab file must be accessible to the local system user running the Data Migrator service.
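
    Before you save, you can optionally check that a keytab is readable and contains the expected principal using standard Kerberos tools, shown here with the ldmuser keytab from the earlier example:

    klist -kt /etc/security/keytabs/ldmuser.keytab
    kinit -kt /etc/security/keytabs/ldmuser.keytab ldmuser@realm.com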
  6. Select Save to add your IBM Spectrum Scale (GPFS) filesystem.

Configure an IBM Spectrum Scale source filesystem with the CLI

Create an IBM Spectrum Scale (GPFS) source with the filesystem add gpfs command in the Data Migrator CLI. See the filesystem add gpfs command reference for all options.

Example

Add a live IBM Spectrum Scale (GPFS) source
filesystem add gpfs --default-fs hdfs://SourceCluster:8020 --file-system-id GPFS-Source --gpfs-kerberos-keytab /etc/security/keytabs/hdfs.headless.keytab --gpfs-kerberos-principal hdfs04@REALM.HADOOP --kafka-bootstrap-servers bootstrapServer1:9093 --kafka-group-id kafGroup1 --kafka-kerberos-keytab /etc/security/keytabs/kafka.service.keytab --kafka-kerberos-principal kafka/gpfsapr@REALM.HADOOP --kafka-topic FS1-WATCH-EVENT --mount-point /gpfs/fs1/cluster-data --properties-files /etc/wandisco/livedata-migrator/conf/ --use-ssl
note

If your filesystem add command includes the --use-ssl option, you will be prompted for your Kafka SSL truststore location and password at the command prompt.

Kafka SSL truststore location: /etc/cirata/livedata-migrator/conf/kafka-keystore.p12
Kafka SSL truststore password: ********

Update an IBM Spectrum Scale source filesystem with the CLI

Update an existing IBM Spectrum Scale source filesystem with the CLI using the filesystem update gpfs command.

Example

Update a live IBM Spectrum Scale (GPFS) source
filesystem update gpfs --file-system-id GPFS-Source --default-fs hdfs://SourceCluster:8020 --properties-files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml

Advanced configuration: Kafka consumer

You can use Configuration Property Overrides or the --properties option to supply additional configuration as key-value pairs, including Kafka consumer configuration. Any Kafka consumer property must be prefixed with kafka.consumer.

For example, to set the consumer property fetch.min.bytes to 4096, use the --properties option with the kafka.consumer.fetch.min.bytes key, as shown below.

Apply fetch.min.bytes consumer configuration
filesystem add gpfs --default-fs hdfs://Cluster:8020 --file-system-id GPFS --gpfs-kerberos-keytab /etc/security/keytabs/hdfs.keytab --gpfs-kerberos-principal hdfs04@REALM.HADOOP --kafka-bootstrap-servers Server1:9093 --kafka-group-id Group1 --kafka-kerberos-keytab /etc/security/keytabs/kafka.keytab --kafka-kerberos-principal kafka/gpfsapr@REALM.HADOOP --kafka-topic FS1-WATCH-EVENT --mount-point /gpfs/fs1/cluster-data --properties-files /etc/wandisco/livedata-migrator/conf/ --use-ssl --properties kafka.consumer.fetch.min.bytes=4096

Refer to the official Apache Kafka documentation for further information on consumer configuration.

Next steps

Configure a target filesystem to migrate data to. Then create a migration.

tip

When you create a migration with this source, the "Skip if Size Match" option provides more efficient handling of rename operations, preventing redundant retransfers.