@tabVersion (Contributor) commented Nov 11, 2025

  • Introduced a new migration for periodic refresh jobs, including the creation of the periodic_refresh_jobs table.
  • Implemented the GlobalRefreshManager to manage ongoing refresh processes and periodic refresh jobs.
  • Added functionality to initialize and trigger periodic refresh jobs based on defined intervals.
  • Updated relevant modules to support the new refresh management system.

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Follows up on #23527.

Key aspects of this change include:

  1. Refactored Refresh Mode: The FULL_RECOMPUTE enum and related logic have been renamed and replaced with FULL_RELOAD, better reflecting the operation of reloading data from external sources.
  2. Persistent Refresh Job State: A new database model refresh_job is introduced to store the state (e.g., IDLE, REFRESHING), last trigger time, and configured refresh interval for each refreshable table. This state is now persistent and managed by the meta service.
  3. Scheduled Refreshes: The FULL_RELOAD mode now supports an optional refresh_interval_sec property, allowing users to configure tables to automatically reload their data at specified intervals.
  4. User-facing Observability: A new system catalog, rw_catalog.rw_refresh_table_state, has been added. Users can query this table to view the current status, last trigger time, and configured interval for all refreshable tables.
  5. Monitoring: New Grafana dashboards have been added to track metrics related to refresh job durations, finish rates, and cron job triggers/misses, providing deeper insights into the refresh process.
  6. Integration: The GlobalRefreshManager is tightly integrated with the meta service's barrier management and catalog operations, ensuring consistent and controlled refreshes.
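As a rough illustration of points 2 and 3, the persisted state and the interval check could be modeled as below. This is a minimal sketch, not the code in this PR: `RefreshJob`, `RefreshState`, the field names, and `is_due` are hypothetical names chosen for the example, and the actual `refresh_job` model may differ.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical mirror of the persisted job state (IDLE, REFRESHING).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RefreshState {
    Idle,
    Refreshing,
}

/// Hypothetical in-memory view of one refresh job row.
#[derive(Debug, Clone)]
pub struct RefreshJob {
    pub table_id: u32,
    pub state: RefreshState,
    /// `None` until the first refresh has been triggered.
    pub last_trigger_time: Option<SystemTime>,
    /// `None` models a table without `refresh_interval_sec` (manual refresh only).
    pub trigger_interval: Option<Duration>,
}

impl RefreshJob {
    /// Returns true when a scheduled refresh should start at `now`.
    pub fn is_due(&self, now: SystemTime) -> bool {
        // Never start a second refresh while one is still running.
        if self.state != RefreshState::Idle {
            return false;
        }
        // Without an interval the table is never refreshed automatically.
        let Some(interval) = self.trigger_interval else {
            return false;
        };
        match self.last_trigger_time {
            // Never triggered before: due immediately.
            None => true,
            Some(last) => match now.duration_since(last) {
                Ok(elapsed) => elapsed >= interval,
                // `now` is earlier than the last trigger (clock moved back): wait.
                Err(_) => false,
            },
        }
    }
}
```

Keeping the check a pure function of the stored row and the current time is one way to make the schedule survive a restart, since nothing depends on in-memory timers.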

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • I have checked the Release Timeline and Currently Supported Versions to determine which release branches I need to cherry-pick this PR into.

Documentation

  • My PR needs documentation updates.

Release note

PLEASE MARK AS EXPERIMENTAL

Now, with FULL_RELOAD, you gain:

  • Scheduled Refreshes: Configure refreshable tables to automatically reload their data at specified intervals.
  • Persistent Job State: RisingWave now keeps track of the refresh status, last trigger time, and interval for each FULL_RELOAD table, even across restarts.
  • Enhanced Observability: A new system catalog rw_catalog.rw_refresh_table_state allows you to monitor the status of your refresh jobs.

This means you can set up data sources (such as Iceberg tables) to be reloaded into RisingWave periodically, so your views and queries stay up to date with the latest batch data without manual intervention.

How it Works

When creating a refreshable table, you can now specify a FULL_RELOAD mode with an optional refresh_interval_sec property:

CREATE TABLE iceberg_batch_table (
    id int primary key,
    name varchar
) WITH (
    connector = 'iceberg',
    catalog.type = 'storage',
    table.name = 'my_iceberg_table',
    database.name = 'public',
    refresh_mode = 'FULL_RELOAD', -- must be set to `FULL_RELOAD`
    refresh_interval_sec = '60' -- reload the table every 60 seconds
);

You can still refresh the table manually:

REFRESH TABLE iceberg_batch_table;

The status can be queried with SQL:

SELECT table_id, current_status, last_trigger_time, trigger_interval_secs
FROM rw_catalog.rw_refresh_table_state;

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR introduces infrastructure for periodic refresh table jobs and refactors the progress tracking mechanism from a static global singleton to an instance-based approach managed by a new GlobalRefreshManager.

  • Added a new periodic_refresh_jobs database table to track periodic refresh schedules and status
  • Introduced GlobalRefreshManager to centralize refresh process management and periodic job scheduling
  • Refactored REFRESH_TABLE_PROGRESS_TRACKER from a static LazyLock<Mutex<>> to an instance-based Arc<RwLock<>> owned by GlobalRefreshManager
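The move from a static global to an instance-owned tracker can be sketched as follows. This is only an illustration under assumptions: `RefreshManagerSketch`, `TableProgress`, and `report_actor_finished` are hypothetical stand-ins, not the types in `refresh_manager.rs`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical per-table progress record; the real tracker holds more.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct TableProgress {
    pub finished_actors: usize,
}

/// Shared handle: many concurrent readers, occasional writers.
pub type ProgressTrackerRef = Arc<RwLock<HashMap<u32, TableProgress>>>;

/// Stand-in for a manager that owns the tracker instead of a `static`.
#[derive(Default)]
pub struct RefreshManagerSketch {
    tracker: ProgressTrackerRef,
}

impl RefreshManagerSketch {
    pub fn new() -> Self {
        Self::default()
    }

    /// Hands out a clone of the `Arc`, so every caller sees the same map.
    pub fn tracker(&self) -> ProgressTrackerRef {
        Arc::clone(&self.tracker)
    }

    /// Records that one more actor of `table_id` has finished.
    pub fn report_actor_finished(&self, table_id: u32) {
        let mut guard = self.tracker.write().expect("tracker lock poisoned");
        guard.entry(table_id).or_default().finished_actors += 1;
    }
}
```

Because each manager instance owns its own map, two instances (for example in separate tests) do not share state, which a process-wide `static` cannot offer.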

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| src/meta/src/stream/refresh_manager.rs | Core implementation of GlobalRefreshManager with periodic refresh scheduling, job registration, and refactored progress tracking from static global to instance-based |
| src/meta/src/barrier/worker.rs | Updated to accept and pass GlobalRefreshManagerRef to barrier worker context |
| src/meta/src/barrier/manager.rs | Updated to accept and pass GlobalRefreshManagerRef to barrier manager |
| src/meta/src/barrier/context/recovery.rs | Updated to use instance-based progress tracker instead of static global |
| src/meta/src/barrier/context/mod.rs | Added global_refresh_manager field to worker context struct |
| src/meta/src/barrier/context/context_impl.rs | Updated all progress tracker accesses to use instance-based tracker from global_refresh_manager |
| src/meta/service/src/stream_service.rs | Added global_refresh_manager parameter to stream service and passed to RefreshManager |
| src/meta/node/src/server.rs | Initialized GlobalRefreshManager, started periodic refresh loop, and wired it through the service stack (contains duplicate code blocks that need cleanup) |
| src/meta/model/src/periodic_refresh_job.rs | New entity model for periodic refresh jobs table |
| src/meta/model/src/lib.rs | Added periodic_refresh_job module export |
| src/meta/model/migration/src/m20251110_224156_periodic_refresh_jobs.rs | New migration to create periodic_refresh_jobs table |
| src/meta/model/migration/src/lib.rs | Registered new periodic refresh jobs migration |

- Introduced `RefreshProgressTracker` to manage progress across multiple actors during refresh operations, preventing race conditions.
- Updated data structures to track per-actor progress for list and load phases.
- Added new `RefreshProgress` protobuf message for communication.
- Enhanced `BarrierCompleteResult` to include refresh progress data.
- Integrated the tracker into `DatabaseCheckpointControl` and updated related components for compatibility.
- Added migration for new refresh job table and related functionality.

Next steps include integrating the tracker with barrier checkpoint control and updating RPC call sites to handle refresh progress.
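The per-actor tracking described in this commit message can be illustrated with a small sketch. It assumes that a phase may only advance once every expected actor has reported, which is presumably how a single fast actor is kept from flipping the table state early. `TableRefreshProgress`, `Phase`, and the method names are hypothetical and are not taken from the PR.

```rust
use std::collections::HashSet;

/// Coarse phase of one table refresh, derived from per-actor reports.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Listing,
    Loading,
    Finished,
}

/// Hypothetical tracker for one refreshing table.
pub struct TableRefreshProgress {
    expected: HashSet<u32>,
    list_done: HashSet<u32>,
    load_done: HashSet<u32>,
}

impl TableRefreshProgress {
    /// `actors` is the full set of actor ids that must report.
    pub fn new(actors: impl IntoIterator<Item = u32>) -> Self {
        Self {
            expected: actors.into_iter().collect(),
            list_done: HashSet::new(),
            load_done: HashSet::new(),
        }
    }

    /// Records a list-phase report; unknown actors are ignored.
    pub fn report_list_finished(&mut self, actor: u32) {
        if self.expected.contains(&actor) {
            self.list_done.insert(actor);
        }
    }

    /// Records a load-phase report; unknown actors are ignored.
    pub fn report_load_finished(&mut self, actor: u32) {
        if self.expected.contains(&actor) {
            self.load_done.insert(actor);
        }
    }

    /// A phase completes only when all expected actors have reported it.
    /// Sets make duplicate reports from the same actor harmless.
    pub fn phase(&self) -> Phase {
        if self.list_done.len() < self.expected.len() {
            Phase::Listing
        } else if self.load_done.len() < self.expected.len() {
            Phase::Loading
        } else {
            Phase::Finished
        }
    }
}
```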
@tabVersion force-pushed the tab/refactor-tracker-2 branch from 0b6a3a8 to 682af22 on November 11, 2025 09:40
@tabVersion tabVersion requested a review from Copilot November 11, 2025 09:41
Copilot finished reviewing on behalf of tabVersion November 11, 2025 09:42
Copilot AI (Contributor) left a comment

Pull Request Overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.

- Updated SLT queries to include retry logic with backoff for improved reliability.
- Removed redundant logging in `context_impl.rs` after table refresh completion.
- Enhanced error handling in `alter_op.rs` for refresh job insertion, logging when a job already exists.
- Added logging for table refresh completion in `refresh_manager.rs`.
- Changed logging level from info to debug in `materialize.rs` for progress tracking.
tab added 6 commits November 12, 2025 16:36
- Removed the `refresh_state` field from the `Table` message in `catalog.proto` and related code.
- Updated `TableCatalog` and other components to eliminate references to the removed `refresh_state`.
- Refactored the `RefreshState` enum and its usage across the codebase to streamline refresh job management.
- Adjusted migration files to reflect the removal of the `source_refresh_mode` migration.
- Enhanced the refresh job status handling in various modules to ensure consistency.
- Introduced `ListRefreshTableStatesRequest` and `ListRefreshTableStatesResponse` messages in `meta.proto` to facilitate querying refresh job states.
- Implemented the `list_refresh_table_states` method in the `FrontendMetaClient` and `StreamManagerService` to handle the new RPC.
- Created `RwRefreshTableState` struct to represent the state of refresh jobs in the system catalog.
- Updated migration files to accommodate changes in refresh job handling.
- Enhanced the `MetaClient` to support the new RPC call for listing refresh table states.
…tures

- Added a new migration for `source_refresh_mode` to enhance refresh job capabilities.
- Updated `RefreshJob` and `RwRefreshTableState` structures to remove deprecated fields and accommodate new logic.
- Modified the `StreamManagerService` to handle timestamp conversions for last trigger times.
- Enhanced the `GlobalRefreshManager` to streamline refresh job management and ensure proper state handling.
- Included `chrono` dependency for improved date and time handling across the codebase.
@tabVersion tabVersion marked this pull request as ready for review November 12, 2025 16:48
@tabVersion tabVersion requested a review from a team as a code owner November 12, 2025 16:48
@tabVersion tabVersion requested review from MrCroxx, chenzl25 and hzxa21 and removed request for a team November 12, 2025 16:48
@chenzl25 (Contributor) commented

@tabVersion This is a user-facing feature, so please add a release note to describe this feature and provide an example to illustrate how to use it, thanks.

@hzxa21 (Collaborator) commented Nov 18, 2025

> @tabVersion This is a user-facing feature, so please add a release note to describe this feature and provide an example to illustrate how to use it, thanks.

Please also mention that this is an experimental feature, for the doc team's reference.

  message SourceRefreshMode {
    message SourceRefreshModeStreaming {}
-   message SourceRefreshModeFullRecompute {}
+   message SourceRefreshModeFullRecompute {
Collaborator

User-facing question: after this PR, users specify refresh_mode = 'FULL_RECOMPUTE' to use the refreshable batch source table feature. Personally, I feel FULL_RECOMPUTE can mislead users into thinking that there will be a full recomputation of the MV, rather than just of the table to calculate the diff.

I am wondering whether we should call it refresh_mode = 'SNAPSHOT_DIFF' instead. WDYT? @tabVersion @chenzl25

Contributor

We borrowed the name from here: https://docs.databricks.com/aws/en/optimizations/incremental-refresh#determine-the-refresh-type-of-an-update I think it is fine, because that is how this table is refreshed. The snapshot diff is more like something we generate for the downstream.

Contributor Author

We had an offline discussion; the name was finalized as FULL_RELOAD.

tab added 3 commits November 18, 2025 15:56
…ation

- Added a new `refresh_manager.py` file to define panels for monitoring refresh job metrics in the RisingWave Dev Dashboard.
- Updated the `MetaMetrics` struct to include metrics for refresh job duration, finish count, cron job triggers, and misses.
- Enhanced the `GlobalRefreshManager` to track and report refresh job metrics, including success and failure statuses.
- Modified the `remove_progress_tracker` method to log metrics upon job completion or failure.
- Updated the dashboard JSON files to reflect the new refresh manager panels.
…ELOAD'

- Changed references in multiple files to reflect the updated refresh mode terminology.
- Adjusted error messages and logic to ensure consistency with the new naming convention.
- Updated related tests and utility functions to align with the changes.
@tabVersion tabVersion added this pull request to the merge queue Nov 24, 2025
Merged via the queue into main with commit f2fb5f9 Nov 24, 2025
38 of 45 checks passed
@tabVersion tabVersion deleted the tab/refactor-tracker-2 branch November 24, 2025 08:59
@github-actions (Contributor) commented

✅ Cherry-pick PRs (or issues if encountered conflicts) have been created successfully to all target branches.
