83 changes: 75 additions & 8 deletions iceberg/ingest-from-iceberg.mdx
Original file line number Diff line number Diff line change
@@ -6,9 +6,12 @@ description: "Ingest data from external Apache Iceberg tables in RisingWave for

This guide shows how to read data from external Apache Iceberg tables in RisingWave. Use it when your Iceberg tables are managed by external systems and you want to treat them as a streaming source for real-time processing.

Create an Iceberg source with the `CREATE SOURCE` statement. In the WITH clause, provide the catalog and object storage settings for the table.
You can ingest data from Iceberg tables using two approaches:

After the source is created, you can run ad hoc queries against it or maintain materialized views for continuous analytics.
1. **[Continuous ingestion (default)](#continuous-ingestion-with-create-source)**: Create an Iceberg source with the `CREATE SOURCE` statement for continuous, streaming ingestion of append-only data.
2. **[Periodic full reload](#periodic-full-reload-with-create-table)**: Create an Iceberg table with `refresh_mode = 'FULL_RELOAD'` for scheduled full table refreshes. This approach requires `CREATE TABLE` (not `CREATE SOURCE`), and data is loaded only after a refresh is triggered, either manually or by the configured schedule.

After the source or table is created, you can run ad hoc queries against it or maintain materialized views for continuous analytics.

## Prerequisites

@@ -17,7 +20,9 @@ After the source is created, you can run ad hoc queries against it or maintain m
* Network connectivity between RisingWave and your storage system.
* Knowledge of your Iceberg catalog type and configuration.

## Basic connection example
## Continuous ingestion with CREATE SOURCE

### Basic connection example

The following example creates a source for a table in S3 using AWS Glue as the catalog:

@@ -36,18 +41,16 @@ WITH (
);
```

<Note>
When you read from an external Iceberg table, RisingWave automatically derives column names and data types from the Iceberg table metadata. Use the [DESCRIBE](/sql/commands/sql-describe) statement to view the schema:

```sql
DESCRIBE my_iceberg_source;
```
</Note>
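As a minimal sketch of the two usage patterns (ad hoc queries and materialized views), the statements below first query `my_iceberg_source` directly and then define a materialized view on it. The column `event_type` and the view name `event_counts` are hypothetical; substitute columns that `DESCRIBE` reports for your table.

```sql
-- Ad hoc query: inspect a few rows from the Iceberg source.
SELECT * FROM my_iceberg_source LIMIT 10;

-- Continuous analytics: keep a per-type count up to date.
-- `event_type` is a hypothetical column used only for illustration.
CREATE MATERIALIZED VIEW event_counts AS
SELECT event_type, COUNT(*) AS event_count
FROM my_iceberg_source
GROUP BY event_type;
```

Once the view exists, `SELECT * FROM event_counts;` returns the current aggregate without rescanning the source.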

## Parameters
### Parameters

| Parameter | Description | Example |
|-----------|-------------|---------|
|:----------|:------------|:--------|
| `connector` | Required. For Iceberg sources, it must be `'iceberg'` | `'iceberg'` |
| `database.name` | Required. The Iceberg database/namespace name. | `'analytics'` |
| `table.name` | Required. The Iceberg table name. | `'user_events'` |
@@ -60,7 +63,7 @@ You also need to specify catalog and object storage parameters in the `CREATE SO

For details on how data types are mapped between RisingWave and Iceberg, see the [Data type mapping guide](/iceberg/data-types).

## Source example
### Source example

For a REST catalog:

@@ -80,6 +83,70 @@ WITH (
);
```

## Periodic full reload with CREATE TABLE

<Note>
Added in v2.7.0. This feature is currently in the **[technical preview](/changelog/product-lifecycle#product-release-lifecycle)** stage.
</Note>

For batch-style workloads where you need to periodically reload an entire Iceberg table, you can create a table with `refresh_mode = 'FULL_RELOAD'`. This mode is useful when:

- The external Iceberg table contains mutable data (updates and deletes), which continuous ingestion does not support.
- You need a point-in-time snapshot of the entire table.
- Periodic full reloads fit your use case better than continuous streaming.

### Create a refreshable table

```sql
CREATE TABLE iceberg_batch_table (
id int primary key,
name varchar
) WITH (
connector = 'iceberg',
catalog.type = 'storage',
warehouse.path = 's3://my-data-lake/warehouse',
database.name = 'public',
table.name = 'my_iceberg_table',
s3.access.key = 'your-access-key',
s3.secret.key = 'your-secret-key',
s3.region = 'us-west-2',
refresh_mode = 'FULL_RELOAD', -- Required for periodic refresh
refresh_interval_sec = '60' -- Reload every 60 seconds
);
```

### Parameters

| Parameter | Description | Required | Example |
|:----------|:------------|:---------|:--------|
| `refresh_mode` | Must be set to `'FULL_RELOAD'` to enable periodic refresh functionality | Yes | `'FULL_RELOAD'` |
| `refresh_interval_sec` | Interval in seconds between automatic refresh operations | No | `'60'` |

RisingWave checks all refreshable tables at a configurable interval (default: 60 seconds, set by `stream_refresh_scheduler_interval_sec` in the RisingWave configuration file). If `refresh_interval_sec` is lower than this scheduler interval, refreshes may be triggered less often than configured.
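To spot tables whose configured interval is shorter than the scheduler interval, one option is to filter the refresh state catalog. This sketch assumes the default scheduler interval of 60 seconds and that `trigger_interval_secs` is a numeric column; adjust the literal if you changed `stream_refresh_scheduler_interval_sec`.

```sql
-- List refreshable tables whose configured interval is below the
-- scheduler interval (60 is the default; change it to match your config).
SELECT table_id, trigger_interval_secs, last_trigger_time
FROM rw_catalog.rw_refresh_table_state
WHERE trigger_interval_secs < 60;
```

Any rows returned are candidates for a larger `refresh_interval_sec` or a smaller scheduler interval.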

### Manual refresh

You can manually trigger a refresh at any time using the `REFRESH TABLE` command:

```sql
REFRESH TABLE iceberg_batch_table;
```
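Because `refresh_interval_sec` is optional, a table can presumably be refreshed only on demand. The following sketch assumes that omitting the parameter disables scheduled refreshes. The table name `iceberg_manual_table` is hypothetical, and the connection settings mirror the earlier example.

```sql
-- Hypothetical table with no refresh_interval_sec, so it is
-- refreshed only when REFRESH TABLE is run (assumption).
CREATE TABLE iceberg_manual_table (
    id int primary key,
    name varchar
) WITH (
    connector = 'iceberg',
    catalog.type = 'storage',
    warehouse.path = 's3://my-data-lake/warehouse',
    database.name = 'public',
    table.name = 'my_iceberg_table',
    s3.access.key = 'your-access-key',
    s3.secret.key = 'your-secret-key',
    s3.region = 'us-west-2',
    refresh_mode = 'FULL_RELOAD'
);

-- Load the current contents of the Iceberg table.
REFRESH TABLE iceberg_manual_table;

-- Check the loaded row count once the refresh has finished.
SELECT COUNT(*) FROM iceberg_manual_table;
```

Until the first refresh completes, the table holds no data, so the count query is only meaningful afterwards.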

### Monitor refresh status

Query the `rw_catalog.rw_refresh_table_state` system catalog to monitor refresh operations:

```sql
SELECT table_id, current_status, last_trigger_time, last_success_time, trigger_interval_secs
FROM rw_catalog.rw_refresh_table_state;
```

The `current_status` field shows the current state of the refresh job:
- `IDLE`: No refresh operation is currently in progress
- `REFRESHING`: A refresh operation is in progress

For more details, see [REFRESH TABLE](/sql/commands/sql-refresh-table).
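The state catalog identifies tables by ID only. To see which tables are currently refreshing by name, a join against `rw_catalog.rw_tables` is one approach. This sketch assumes that `table_id` corresponds to the `id` column of `rw_tables` and that `current_status` can be compared with a string literal.

```sql
-- Show the names of tables with a refresh in progress.
-- Assumes rw_refresh_table_state.table_id matches rw_tables.id.
SELECT t.name, s.current_status, s.last_trigger_time
FROM rw_catalog.rw_refresh_table_state AS s
JOIN rw_catalog.rw_tables AS t ON t.id = s.table_id
WHERE s.current_status = 'REFRESHING';
```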

## What's next?

### Query data
5 changes: 3 additions & 2 deletions iceberg/ov-external.mdx
@@ -116,7 +116,8 @@ With this configuration, RisingWave acts as a real-time transformation layer bet
Use this matrix to understand current support for ingesting from and delivering to external Iceberg tables.

| Loading mode | Append-only | Mutable |
| --- | --- | --- |
| :-- | :-- | :-- |
| One-time loading | Supported | Supported (both equality and position deletes) |
| Periodic loading | Planned for a future release (v2.7) | Planned for a future release (v2.7) |
| [Periodic loading](/iceberg/ingest-from-iceberg#periodic-full-reload-with-create-table) | Supported (Technical preview, v2.7+) | Supported (Technical preview, v2.7+) |
| Continuous loading | Supported | No support (Iceberg CDC is immature) |

2 changes: 1 addition & 1 deletion iceberg/ov-internal.mdx
@@ -52,6 +52,6 @@ For complete details on configuration, see the [Iceberg table maintenance](/iceb
## Catalog and compaction summary

| Component | RisingWave Native Options | Alternative Options | Description |
| --- | --- | --- | --- |
| :-- | :-- | :-- | :-- |
| **Catalog service** | Built-in catalog (JDBC) or a self-hosted REST catalog (Lakekeeper) | Glue, Hive, Nessie, or custom REST catalogs | Stores metadata and schema information |
| **Compaction service** | RisingWave's built-in compaction service | External services (Amazon EMR) or self-hosted Spark | Merges small files and expires old snapshots |
1 change: 1 addition & 0 deletions mint.json
@@ -555,6 +555,7 @@
"sql/commands/sql-grant",
"sql/commands/sql-insert",
"sql/commands/sql-recover",
"sql/commands/sql-refresh-table",
"sql/commands/sql-revoke",
"sql/commands/sql-select",
"sql/commands/sql-set-background-ddl",
69 changes: 69 additions & 0 deletions sql/commands/sql-refresh-table.mdx
@@ -0,0 +1,69 @@
---
title: "REFRESH TABLE"
description: "Use the `REFRESH TABLE` command to manually trigger a refresh operation for tables created with the FULL_RELOAD refresh mode."
---

<Note>
Added in v2.7.0. This feature is currently in the **[technical preview](/changelog/product-lifecycle#product-release-lifecycle)** stage.
</Note>

The `REFRESH TABLE` command manually triggers a full reload of data from the external source for tables configured with the `FULL_RELOAD` refresh mode. This is useful when you need to immediately update the table data without waiting for the next scheduled refresh.

When a table is created with `refresh_mode = 'FULL_RELOAD'`, it can be configured to automatically refresh at a specified interval using the `refresh_interval_sec` parameter. The `REFRESH TABLE` command allows you to trigger an additional refresh on demand.

## Syntax

```sql
REFRESH TABLE table_name;
```

## Parameters

| Parameter | Description |
|:----------|:------------|
| `table_name` | The name of the table to refresh. The table must be created with `refresh_mode = 'FULL_RELOAD'`. |

## Example

Create a table with the `FULL_RELOAD` refresh mode:

```sql
CREATE TABLE iceberg_batch_table (
id int primary key,
name varchar
) WITH (
connector = 'iceberg',
catalog.type = 'storage',
table.name = 'my_iceberg_table',
database.name = 'public',
refresh_mode = 'FULL_RELOAD',
refresh_interval_sec = '60' -- Automatically refresh every 60 seconds
);
```

Manually trigger a refresh:

```sql
REFRESH TABLE iceberg_batch_table;
```

## Monitor refresh status

You can monitor the status of refresh operations using the `rw_catalog.rw_refresh_table_state` system catalog:

```sql
SELECT table_id, current_status, last_trigger_time, last_success_time, trigger_interval_secs
FROM rw_catalog.rw_refresh_table_state;
```

This query returns information about all refreshable tables, including:
- `table_id`: The unique identifier of the table
- `current_status`: The current status of the refresh job (e.g., `IDLE`, `REFRESHING`)
- `last_trigger_time`: The timestamp when a refresh was last triggered
- `last_success_time`: The timestamp when the refresh last completed successfully
- `trigger_interval_secs`: The configured refresh interval in seconds
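These columns can also be combined to flag tables whose most recent refresh has not yet succeeded. The sketch below assumes both timestamp columns are comparable and that `last_success_time` is NULL for a table that has never completed a refresh.

```sql
-- Find tables whose latest trigger has no matching success yet:
-- either no refresh has ever succeeded, or the last success
-- predates the last trigger (still running, or failed).
SELECT table_id, current_status, last_trigger_time, last_success_time
FROM rw_catalog.rw_refresh_table_state
WHERE last_success_time IS NULL
   OR last_success_time < last_trigger_time;
```

A row with `current_status` of `IDLE` in this result may indicate a refresh that ended without succeeding, while `REFRESHING` simply means the reload is still in progress.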

## Related topics

- [Ingest data from Iceberg tables](/iceberg/ingest-from-iceberg)
- [RisingWave catalogs](/sql/system-catalogs/rw-catalog)
1 change: 1 addition & 0 deletions sql/system-catalogs/rw-catalog.mdx
@@ -113,6 +113,7 @@ SELECT name, initialized_at, created_at FROM rw_sources;
| `rw_materialized_views` | Contains information about materialized views in the database, including their unique IDs, names, schema IDs, owner IDs, definitions, append-only information, access control lists, initialization and creation timestamps, and the cluster version when the materialized view was initialized and created. |
| `rw_meta_snapshot` | Contains information about existing snapshots of the RisingWave meta service. You can use this relation to get IDs of meta snapshots and then restore the meta service from a snapshot. For details, see [Back up and restore meta service](/operate/meta-backup). |
| `rw_rate_limit` | Contains information about rate limit configurations for sources, including source names, node names, fragment types, and assigned rate limit values. |
| `rw_refresh_table_state` | Contains information about the status and configuration of refresh jobs for tables with `refresh_mode = 'FULL_RELOAD'`. Shows the current status, last trigger time, last success time, and configured refresh interval for each refreshable table. |
| `rw_relation_info` | Contains low-level relation information about tables, sources, materialized views, and indexes that are available in the database. |
| `rw_relations` | Contains information about relations in the database, including their unique IDs, names, types, schema IDs, and owners. |
| `rw_resource_groups` | Contains information about resource groups in the database, including their streaming workers, parallelism, databases, and streaming jobs. |