Core Concepts
Ice Flow manages Apache Iceberg catalogs as a service within Symphony. This page explains the key concepts and how they relate to each other.
Catalogs
A catalog is a connection to an Iceberg metadata store. The metadata store tracks which tables exist, their schemas, and where their data files are located. Ice Flow supports six catalog types:
| Type | Metadata store |
|---|---|
| Hadoop | Filesystem-based metadata (HDFS) |
| Hive | Apache Hive Metastore (Thrift) |
| JDBC | Relational database (PostgreSQL, MySQL, etc.) |
| AWS Glue | AWS Glue Data Catalog |
| REST | Apache Iceberg REST Catalog API |
| Nessie | Project Nessie versioned catalog |
Each catalog is associated with a warehouse and has its own set of connection properties. You can connect multiple catalogs at the same time, including catalogs of different types.
Catalogs that have active monitors or replications cannot be deleted until those associations are removed.
Warehouses
A warehouse is a storage location where Iceberg table data files reside. Every catalog is associated with a warehouse. Common warehouse locations include S3 buckets, HDFS paths, Azure Blob containers, and GCS buckets.
Warehouses serve two purposes:
- Catalog association — tells Ice Flow where a catalog's data lives
- Location mappings — enables path translation during replication between catalogs with different storage backends
Scopes
A scope is a reusable table selector. Scopes target a single namespace and match tables by exact name or regular expression pattern:
- By Name: matches one table exactly (e.g. orders)
- By Pattern: matches tables by regex (e.g. .* for all tables, order.* for tables starting with "order")
Scopes are defined independently and then attached to monitors or replications as inclusion scopes (tables to process) or exclusion scopes (tables to skip). This separation means you define a pattern once and reuse it across multiple monitors and replications.
Namespace names are case-sensitive and must match exactly as they appear in the catalog.
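The selection rules above can be sketched in a few lines of Python. This is a hypothetical model, not Ice Flow's implementation: the `Scope` class, its field names, the `is_selected` helper, and the choice of full-match regex semantics are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    """Hypothetical model of a scope: one namespace plus a name or a regex."""
    namespace: str
    selector: str
    is_pattern: bool = False

    def matches(self, namespace: str, table: str) -> bool:
        # Namespace comparison is exact and case-sensitive.
        if namespace != self.namespace:
            return False
        if self.is_pattern:
            # Assumption: the pattern must match the whole table name.
            return re.fullmatch(self.selector, table) is not None
        return table == self.selector


def is_selected(namespace, table, include, exclude=()):
    """Process a table if an inclusion scope matches and no exclusion scope does."""
    included = any(s.matches(namespace, table) for s in include)
    excluded = any(s.matches(namespace, table) for s in exclude)
    return included and not excluded


all_sales = Scope("sales", ".*", is_pattern=True)
order_tables = Scope("sales", "order.*", is_pattern=True)

print(is_selected("sales", "orders", [all_sales]))                  # True
print(is_selected("sales", "orders", [all_sales], [order_tables]))  # False
print(is_selected("Sales", "orders", [all_sales]))                  # False
```

Because each scope is a standalone value, the same `order_tables` object can serve as an inclusion scope in one place and an exclusion scope in another, which mirrors the reuse described above.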
Monitors
A monitor continuously observes changes to tables in a Hive catalog. It polls the Hive Metastore event stream at a configurable interval and records detected changes (creates, modifications, renames, deletes).
Monitors are passive — they record events but do not copy data. They are useful for:
- Understanding change patterns before setting up replication
- Auditing what has changed and when
- Alerting on unexpected changes
Monitors require a Hive catalog as the source because they rely on the Hive Metastore event stream.
Replications
A replication copies Iceberg tables from a source catalog to a target catalog, including both metadata and data files. The target receives a fully independent, queryable copy.
Modes
| Mode | Behaviour |
|---|---|
| One-time | Performs a single sync pass and stops. Works with all catalog types. |
| Continuous | Watches for changes and replicates them as they occur. Requires a Hive source. |
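
The constraint in the table can be expressed as a small validation step. The sketch below is illustrative only: the enum names and the `validate_replication` function are hypothetical and are not part of any Ice Flow API.

```python
from enum import Enum


class CatalogType(Enum):
    HADOOP = "hadoop"
    HIVE = "hive"
    JDBC = "jdbc"
    GLUE = "glue"
    REST = "rest"
    NESSIE = "nessie"


class Mode(Enum):
    ONE_TIME = "one-time"
    CONTINUOUS = "continuous"


def validate_replication(mode: Mode, source: CatalogType) -> None:
    """Raise ValueError for a mode/source combination the table rules out."""
    if mode is Mode.CONTINUOUS and source is not CatalogType.HIVE:
        raise ValueError(
            f"continuous replication requires a Hive source, got {source.value}"
        )


validate_replication(Mode.ONE_TIME, CatalogType.NESSIE)   # accepted
validate_replication(Mode.CONTINUOUS, CatalogType.HIVE)   # accepted
try:
    validate_replication(Mode.CONTINUOUS, CatalogType.REST)
except ValueError as err:
    print(err)  # continuous replication requires a Hive source, got rest
```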
Copy Strategies
| Strategy | Behaviour |
|---|---|
| Latest snapshot | Copies only the current table state. Faster and uses less storage. |
| All snapshots | Copies the complete snapshot history, preserving time-travel capabilities. |
Operations and File Transfers
Each table synchronised during a replication cycle is recorded as an operation. Each operation may involve copying one or more data files, recorded as file transfers. This two-level tracking lets you see both the high-level progress (which tables were synced) and the low-level detail (which files were copied, their sizes, and transfer durations).
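One way to picture the two-level tracking is as a pair of record types, with table-level totals derived from the file-level entries. The class and field names below are hypothetical and chosen for illustration; they are not Ice Flow's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class FileTransfer:
    """Low-level record: one copied data file."""
    path: str
    size_bytes: int
    duration_seconds: float


@dataclass
class Operation:
    """High-level record: one table synchronised in a replication cycle."""
    table: str
    transfers: list = field(default_factory=list)

    @property
    def total_bytes(self) -> int:
        return sum(t.size_bytes for t in self.transfers)

    @property
    def total_seconds(self) -> float:
        return sum(t.duration_seconds for t in self.transfers)


op = Operation("sales.orders")
op.transfers.append(FileTransfer("data/00000-0.parquet", 4_000_000, 1.5))
op.transfers.append(FileTransfer("data/00001-0.parquet", 6_000_000, 2.5))

print(op.table, len(op.transfers), op.total_bytes, op.total_seconds)
# sales.orders 2 10000000 4.0
```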
Location Mappings
A location mapping defines how file paths are translated when replicating between catalogs with different storage locations. Each mapping specifies:
- A source warehouse and source path
- A target warehouse and target path
During replication, Ice Flow rewrites file paths so that data files land in the correct location on the target storage system. For example, a mapping might translate s3://prod-bucket/warehouse to s3://dr-bucket/warehouse.
Location mappings are global — they are not tied to a specific replication. You create mappings once and associate them with replications as needed.
Mappings are path-scoped: a mapping applies only to tables whose source location falls under its source path. A table that matches no mapping is replicated using the catalog's default warehouse, so an unrelated mapping never disrupts other replications.
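The path translation can be sketched as a prefix rewrite. This is a minimal illustration under stated assumptions, not Ice Flow's algorithm: `LocationMapping` and `translate` are hypothetical names, warehouses are omitted for brevity, and the rule that the longest matching source path wins is an assumption the text above does not confirm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LocationMapping:
    source_path: str
    target_path: str


def _is_under(path: str, prefix: str) -> bool:
    # Match on whole path segments so "warehouse" does not match "warehouse2".
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def translate(path: str, mappings) -> Optional[str]:
    """Rewrite path with the most specific matching mapping, or return None.

    None signals that no mapping applies; the caller would then fall back
    to the catalog's default warehouse.
    """
    candidates = [m for m in mappings if _is_under(path, m.source_path)]
    if not candidates:
        return None
    # Assumption: when several mappings match, the longest source path wins.
    best = max(candidates, key=lambda m: len(m.source_path.rstrip("/")))
    suffix = path[len(best.source_path.rstrip("/")):]
    return best.target_path.rstrip("/") + suffix


mappings = [LocationMapping("s3://prod-bucket/warehouse", "s3://dr-bucket/warehouse")]

print(translate("s3://prod-bucket/warehouse/sales/orders/data/f1.parquet", mappings))
# s3://dr-bucket/warehouse/sales/orders/data/f1.parquet
print(translate("s3://other-bucket/raw/f2.parquet", mappings))
# None
```

Matching on whole path segments is a deliberate choice in this sketch: a plain string prefix test would wrongly apply the mapping to a sibling location such as s3://prod-bucket/warehouse2.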
How They Fit Together
Warehouse ──── Catalog ──── Scope ──── Monitor
    │             │           │
    │             │           └──── Replication
    │             │                      │
    └── Location Mapping ────────────────┘
- You define warehouses to describe where data lives
- You create catalogs that connect to metadata stores and reference a warehouse
- You define scopes that select tables by namespace and name pattern
- You attach scopes to monitors (for observation) or replications (for copying)
- If source and target use different storage, location mappings translate paths during replication
Consistency Checking
After replication, you can run a consistency check to verify that the target catalog matches the source. The check compares table states and reports one of three outcomes: consistent, inconsistent, or check failed. This is especially useful after an initial one-time replication or when investigating potential data drift.
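The three outcomes can be modelled as follows. This sketch makes two assumptions that the text above does not state: a table's state is reduced to its current snapshot id, and "failed" means the check itself could not run. The names `CheckResult` and `check_consistency` are hypothetical.

```python
from enum import Enum
from typing import Callable, Dict


class CheckResult(Enum):
    CONSISTENT = "consistent"
    INCONSISTENT = "inconsistent"
    FAILED = "failed"


def check_consistency(load_source: Callable[[], Dict[str, int]],
                      load_target: Callable[[], Dict[str, int]]) -> CheckResult:
    """Compare two table-state maps and classify the outcome.

    Assumption: a table's state is its current snapshot id, keyed by
    table name. The loaders stand in for catalog lookups.
    """
    try:
        source = load_source()
        target = load_target()
    except Exception:
        # The check itself could not run (e.g. a catalog was unreachable).
        return CheckResult.FAILED
    return CheckResult.CONSISTENT if source == target else CheckResult.INCONSISTENT


def unreachable():
    raise ConnectionError("catalog unreachable")


source_state = {"sales.orders": 101, "sales.customers": 57}

print(check_consistency(lambda: source_state, lambda: dict(source_state)).value)
# consistent
print(check_consistency(lambda: source_state, lambda: {"sales.orders": 100}).value)
# inconsistent
print(check_consistency(lambda: source_state, unreachable).value)
# failed
```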
License Enforcement
Ice Flow integrates with Symphony's license enforcement system. When enforcement is active — either globally or specifically for the Ice Flow extension — all replication and monitoring activities are paused. The UI displays a banner indicating the enforcement state. Activities resume automatically when enforcement is lifted.