🦋 Changeset detected. Latest commit: f1f4c0f. The changes in this PR will be included in the next version bump. This PR includes changesets to release 12 packages.
This instruments the time spent in various parts of the MongoDB replication process, and logs the timings with each change stream batch.
The goal here is to help diagnose slow replication issues on a high level. When replication is slower than expected, we want to know whether the bottleneck is (1) the source db or network, (2) the storage db, or (3) replication process CPU. The stats here help identify that.
This includes:

1. `duration`: Roughly equal to the total duration between batches.
2. `wait_for_change_stream`: Time spent waiting for the next batch on the change stream, including the source db waiting for more changes, scanning the oplog, processing the change stream pipeline, and network transfer time.
3. `parse_duration`: Time spent converting raw change stream buffers into input for the sync config.
4. `evaluating_duration`: Time spent evaluating sync queries.
5. `flush_duration`: Time spent writing to the storage database.
6. `lock_delay`: Time spent waiting for other replication jobs within the same process.
7. `retry_delay`: Time spent waiting for other replication jobs in different processes.

Fields 2-7 give a rough breakdown of the total duration, but this is not an exhaustive list: there is additional CPU overhead that is not explicitly tracked.
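The accumulation pattern behind these fields can be sketched roughly as follows. This is a hypothetical illustration, not the PR's actual implementation: the `BatchTimings` class and its methods are invented here, and only the stat names (`wait_for_change_stream`, `parse_duration`, etc.) come from the description above.

```typescript
// Hypothetical sketch of per-batch stage timing for a replication loop.
// Each call to stage() attributes the time since the previous mark to a
// named stat, so the stats roughly partition the total batch duration.
class BatchTimings {
  private marks: Record<string, number> = {};
  private readonly start = Date.now();
  private last = this.start;

  // Record the elapsed time since the previous stage under `name`.
  // Accumulates, since a stage may run multiple times per batch.
  stage(name: string): void {
    const now = Date.now();
    this.marks[name] = (this.marks[name] ?? 0) + (now - this.last);
    this.last = now;
  }

  // Total duration plus the per-stage breakdown, ready to attach to a
  // structured log line emitted with each change stream batch.
  summary(): Record<string, number> {
    return { duration: Date.now() - this.start, ...this.marks };
  }
}

// Usage in a (simplified, hypothetical) batch loop:
//   const t = new BatchTimings();
//   const batch = await changeStream.next(); t.stage('wait_for_change_stream');
//   const parsed = parse(batch);            t.stage('parse_duration');
//   const ops = evaluate(parsed);           t.stage('evaluating_duration');
//   await storage.flush(ops);               t.stage('flush_duration');
//   logger.info('replicated batch', t.summary());
```

A real implementation would likely use a monotonic clock (e.g. `performance.now()` from `perf_hooks`) rather than `Date.now()` to avoid wall-clock jumps; `Date.now()` is used here only to keep the sketch dependency-free.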
Other source databases do not currently track any of this, and Postgres storage does not yet report the storage-specific stats.