Getting Started with Ice Flow
This tutorial walks you through setting up Ice Flow from scratch. By the end you will have a warehouse, two catalogs, a scope, and a running replication.
Prerequisites
- A running Symphony instance with the Ice Flow extension installed
- At least one Iceberg catalog endpoint you can connect to (Hive, JDBC, Glue, REST, Hadoop, or Nessie)
- If replicating, a second catalog or a second warehouse location to replicate into
- Network connectivity from the Symphony host to your catalog endpoints and storage systems
If Ice Flow itself is not yet installed, follow Installation first.
Step 1 — Add a Warehouse
A warehouse defines where Iceberg table data files are stored. You need at least one warehouse before you can create a catalog.
- Navigate to Iceberg > Warehouses
- Click Add New Warehouse
- Enter a Name (e.g. production-s3) and the Location path (e.g. s3://my-bucket/iceberg/warehouse)
- Click Create
The location should point to the root directory where your Iceberg tables store their data files. Common formats:
| Storage | Example location |
|---|---|
| S3 | s3://bucket-name/warehouse |
| HDFS | hdfs://nameservice1/warehouse |
| Azure | abfss://container@account.dfs.core.windows.net/warehouse |
| GCS | gs://bucket-name/warehouse |
| Local | file:///opt/warehouse |
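Before typing a location into the form, it can help to sanity-check its URI scheme against the table above. The following is a minimal sketch in Python, not part of Ice Flow; the function name storage_kind and the scheme-to-label mapping are illustrative assumptions drawn only from the example locations listed here.

```python
from urllib.parse import urlparse

# Schemes taken from the example locations in the table above.
# This mapping is illustrative; it is not an Ice Flow API.
KNOWN_SCHEMES = {
    "s3": "S3",
    "hdfs": "HDFS",
    "abfss": "Azure",
    "gs": "GCS",
    "file": "Local",
}


def storage_kind(location: str) -> str:
    """Return the storage label for a warehouse location, or raise ValueError."""
    scheme = urlparse(location).scheme
    if scheme not in KNOWN_SCHEMES:
        raise ValueError(f"unrecognised or missing scheme in location: {location!r}")
    return KNOWN_SCHEMES[scheme]


if __name__ == "__main__":
    print(storage_kind("abfss://container@account.dfs.core.windows.net/warehouse"))  # prints "Azure"
```

A bare path such as /opt/warehouse has no scheme and is rejected by this sketch, which is a quick way to catch a missing file:// prefix.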
Step 2 — Add a Catalog
A catalog is a connection to an Iceberg metadata store.
- Navigate to Iceberg > Catalogs
- Click Add New Catalog
- Select the Catalog type (e.g. Hive)
- Enter a Name (e.g. production-hive)
- Select the Warehouse you created in Step 1
- Fill in the type-specific properties; the form pre-populates sensible defaults. For a Hive catalog, set the URI to your Metastore Thrift endpoint (e.g. thrift://metastore.example.com:9083)
- Click Create
If your catalog requires Kerberos authentication, set Authentication to Kerberos and provide the principal and keytab. See Configure Kerberos for details.
Verify the Connection
Open the catalog you just created and click the Content tab. If the connection is working, you will see the namespaces (databases) in the catalog. Click a namespace to browse its tables.
If no content appears, check the catalog URI, credentials, and network connectivity. See Troubleshooting for common issues.
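To rule out basic network problems, you can test from the Symphony host whether the catalog endpoint accepts TCP connections. The sketch below is a generic Python check and not an Ice Flow tool; the helper names parse_endpoint and endpoint_reachable are hypothetical. A successful connection only shows that the port is open, not that credentials or the catalog configuration are correct.

```python
import socket
from typing import Tuple
from urllib.parse import urlparse


def parse_endpoint(uri: str) -> Tuple[str, int]:
    """Extract (host, port) from a URI such as thrift://host:9083."""
    parsed = urlparse(uri)
    if parsed.hostname is None or parsed.port is None:
        raise ValueError(f"expected scheme://host:port, got {uri!r}")
    return parsed.hostname, parsed.port


def endpoint_reachable(uri: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the endpoint can be opened."""
    host, port = parse_endpoint(uri)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


if __name__ == "__main__":
    # Example URI from Step 2; replace with your own Metastore endpoint.
    print(endpoint_reachable("thrift://metastore.example.com:9083"))
```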
Step 3 — Define a Scope
Scopes select which tables to include in monitoring or replication.
- Navigate to Iceberg > Scopes
- Click Add New Scope
- Enter a Name (e.g. all-analytics-tables)
- Enter the Namespace exactly as it appears in the catalog (case-sensitive)
- Choose the Match type:
  - By Name to match a single table (e.g. orders)
  - By Pattern to match multiple tables with a regex (e.g. .* for all tables in the namespace)
- Click Create
You can browse the catalog content first (Step 2) to see the available namespaces and table names.
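To get a feel for how the two match types differ, you can try a pattern against your table names before saving the scope. This is a rough sketch in Python under two assumptions that this tutorial does not confirm: that a pattern must match the whole table name, and that your pattern behaves the same in Python's regex dialect as in Ice Flow's. The function matches_scope is hypothetical.

```python
import re


def matches_scope(table: str, match_type: str, value: str) -> bool:
    """Check one table name against a scope definition.

    match_type is "name" (exact comparison) or "pattern" (regex).
    Whole-name matching for patterns is an assumption of this sketch.
    """
    if match_type == "name":
        return table == value
    if match_type == "pattern":
        return re.fullmatch(value, table) is not None
    raise ValueError(f"unknown match type: {match_type!r}")


if __name__ == "__main__":
    tables = ["orders", "orders_2024", "customers"]
    # By Name selects a single table; By Pattern can select several.
    print([t for t in tables if matches_scope(t, "name", "orders")])
    print([t for t in tables if matches_scope(t, "pattern", "orders.*")])
```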
Step 4 — Create a Replication
With a source catalog, a target catalog (or a second warehouse), and at least one scope, you can set up replication. If you have not yet added the target catalog, repeat Steps 1 and 2 for it first.
- Navigate to Iceberg > Replications
- Click Create New Replication
- Enter a Name (e.g. prod-to-dr)
- Select the Source catalog and Target catalog
- Choose the Mode:
  - Continuous: keeps replicating as changes occur (requires a Hive source)
  - One-time: performs a single sync pass
- Choose the Copy type:
  - Latest snapshot: faster, copies only the current state
  - All snapshots: preserves full history for time-travel queries
- Add one or more Inclusion scopes (the scope you created in Step 3)
- Optionally add Exclusion scopes to skip specific tables
- Click Create
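The way inclusion and exclusion scopes combine can be pictured as a simple filter: a table is replicated if some inclusion scope selects it and no exclusion scope does. The Python sketch below illustrates that idea with pattern scopes only and ignores namespaces; select_tables is a hypothetical helper, and the whole-name regex matching is an assumption rather than documented Ice Flow behaviour.

```python
import re
from typing import Iterable, List


def select_tables(
    tables: Iterable[str],
    include: Iterable[str],
    exclude: Iterable[str] = (),
) -> List[str]:
    """Keep tables matched by any include pattern and by no exclude pattern."""
    include_patterns = list(include)
    exclude_patterns = list(exclude)

    def hit(patterns: List[str], table: str) -> bool:
        return any(re.fullmatch(p, table) for p in patterns)

    return [
        t for t in tables
        if hit(include_patterns, t) and not hit(exclude_patterns, t)
    ]


if __name__ == "__main__":
    tables = ["orders", "orders_tmp", "customers"]
    # Include everything, then skip temporary tables.
    print(select_tables(tables, include=[".*"], exclude=[".*_tmp"]))
```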
Add Location Mappings (if needed)
If your source and target catalogs use different storage locations, you need a location mapping so Ice Flow knows where to write data files on the target.
- Go to the Replications page and click the Location Mappings tab
- Click Create New Location Mapping
- Select the Source warehouse and Target warehouse
- Set the Source path and Target path
- Click Create
- Associate the mapping with your replication on the replication's Mappings tab
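Conceptually, a location mapping rewrites the source path prefix of each file to the target path prefix. The following Python sketch shows that rewrite under the assumption that the mapping is a plain prefix substitution; map_location and the bucket names are hypothetical, and Ice Flow's actual mapping rules may differ.

```python
def map_location(path: str, source_prefix: str, target_prefix: str) -> str:
    """Rewrite a file path from the source prefix to the target prefix."""
    src = source_prefix.rstrip("/")
    dst = target_prefix.rstrip("/")
    # Require a match on a path-segment boundary so that
    # ".../warehouse2" is not treated as being under ".../warehouse".
    if path != src and not path.startswith(src + "/"):
        raise ValueError(f"{path!r} is not under {source_prefix!r}")
    return dst + path[len(src):]


if __name__ == "__main__":
    print(
        map_location(
            "s3://prod-bucket/warehouse/analytics/orders/data/part-0.parquet",
            "s3://prod-bucket/warehouse",
            "s3://dr-bucket/warehouse",
        )
    )
```

Checking the segment boundary avoids accidentally mapping a sibling directory whose name merely starts with the source path.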
Step 5 — Monitor Progress
After starting a replication, track its progress:
- Open the replication detail page
- Click the Operations tab to see individual table-level sync events
- Click an operation row to view the File Transfers panel, showing each file copied with its size and duration
For continuous replications, the status will show Replicating as it watches for new changes. For one-time replications, it will transition to Complete when finished.
Optional — Set Up a Monitor
Monitors observe catalog changes without replicating them. This is useful for auditing, alerting, or understanding change patterns before setting up replication.
- Navigate to Iceberg > Monitors
- Click Add New Monitor
- Select the Source catalog (must be a Hive catalog)
- Add Inclusion scopes
- Set the Poll period (default: 1000 ms)
- Click Create
View detected events on the monitor's Events tab.
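The poll period controls how often the monitor checks the catalog for changes. As a rough mental model only, polling can be sketched as a loop that compares the current state with the previous one and reports differences. Everything in this Python sketch is an assumption for illustration: fetch_state stands in for whatever the monitor reads from the catalog, and the state is modelled as a mapping from table name to snapshot ID.

```python
import time
from typing import Callable, Dict, Iterator, Optional, Tuple

State = Dict[str, int]  # table name -> snapshot ID (illustrative model)


def poll_changes(
    fetch_state: Callable[[], State],
    poll_period_ms: int = 1000,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[str, Optional[int], Optional[int]]]:
    """Yield (table, old_snapshot, new_snapshot) whenever a table changes."""
    previous = fetch_state()
    polls = 0
    while max_polls is None or polls < max_polls:
        sleep(poll_period_ms / 1000)
        current = fetch_state()
        # Compare every table seen in either state, in a stable order.
        for table in sorted(current.keys() | previous.keys()):
            if current.get(table) != previous.get(table):
                yield table, previous.get(table), current.get(table)
        previous = current
        polls += 1


if __name__ == "__main__":
    # Simulated catalog states instead of a real catalog connection.
    states = iter([{"orders": 1}, {"orders": 2}, {"orders": 2, "customers": 1}])
    for event in poll_changes(lambda: next(states), poll_period_ms=10, max_polls=2):
        print(event)
```

A shorter poll period detects changes sooner at the cost of more frequent requests to the catalog.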
Next Steps
- Core Concepts — understand how the building blocks fit together
- Manage Catalogs — advanced catalog management including property editing
- Catalog Type Reference — complete property reference for each catalog type
- Upgrades — upgrade Ice Flow in place
- Uninstallation — stop and remove Ice Flow