feat: Add Debezium-style Snapshot for Initial Data Capture #37
All looks good for the MVP. As a next step, we could support multi-instance processing for the initial snapshot. We can't apply this to normal slot capturing because of PostgreSQL limitations, but it should be feasible for the initial state.

If we run the initial snapshot with multiple instances, we need to design how the work is divided among them and ensure that a passive instance behaves consistently (i.e., it must not interfere with processing). We should also plan for failure cases: how ownership is recovered, how timeouts are handled, and how we guarantee progress if an instance goes passive or crashes.

For the next implementation, I suggest creating a chunk table with partial ranges at snapshot initialization, then processing the snapshot chunk by chunk. Once all chunks covering the snapshot are finished, we can start consuming slot events.
* feat: distributed snapshot support (wip)
* refactor: file paths for snapshot
…hen testing multiple instance
…d increase coordinator wait timeout
# Conflicts: # config/config.go
# Conflicts: # README.md # config/config.go # connector.go
* chore: update benchmark build
* feat: add initial benchmark test

Co-authored-by: Serhat Karabulut <[email protected]>
emreodabas
left a comment
lgtm
related #34
Short Description (for PR summary)
Implements a snapshot mechanism to capture existing database rows before starting CDC.
Features include crash recovery, checkpoint-based resume, configurable batch processing,
and full PostgreSQL type support. Includes a comprehensive example and monitoring metrics.