@proton-lisandro-pin
Contributor

@proton-lisandro-pin proton-lisandro-pin commented Nov 24, 2025

Initial work to support ranged GTID updates from proxysql_mysqlbinlog.

This PR reworks gtid_interval_t so GTID intervals can be compared and merged, and modifies the logic that adds new GTIDs to gtid_set_t so it understands range updates. Notably, it automatically de-dupes and compacts overlapping ranges when possible.
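
As an illustration of what "compared and merged" could mean for an inclusive interval, here is a minimal sketch. It is not the PR's code: the `Interval` name, its members, and the rule that touching intervals count as mergeable are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the PR's interval type; not ProxySQL's actual code.
struct Interval {
    int64_t start;
    int64_t end;  // inclusive upper bound

    Interval(int64_t s, int64_t e) : start(s), end(e) {
        assert(s <= e);  // an interval must not be inverted
    }

    // Order by start, then by end, so that a sorted list places
    // mergeable intervals next to each other.
    bool operator<(const Interval& other) const {
        return start != other.start ? start < other.start : end < other.end;
    }

    // Absorb `other` if the two overlap or touch (for example 1-5 and 6-10).
    // Returns true when this interval now covers both, false if a gap
    // separates them and nothing was changed.
    bool merge(const Interval& other) {
        if (other.start > end + 1 || start > other.end + 1) {
            return false;
        }
        start = std::min(start, other.start);
        end = std::max(end, other.end);
        return true;
    }

    std::string to_string() const {
        return std::to_string(start) + "-" + std::to_string(end);
    }
};
```

With this sketch, merging 6-10 into 1-5 yields 1-10, while merging 12-15 into 1-10 is rejected because 11 is missing.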

A PR to properly parse ranges in I1= and I2= GTID update messages will follow; until then, this PR changes no functionality within ProxySQL.

See sysown/proxysql_mysqlbinlog#33 for details.

@gemini-code-assist

Summary of Changes

Hello @proton-lisandro-pin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces foundational changes to ProxySQL's GTID management system, specifically to enable support for ranged GTID updates. It redefines how GTID intervals are represented and manipulated internally, moving from basic pairs to a more robust Gtid_Interval class with built-in logic for merging and comparison. The core addGtid function has been revamped to intelligently handle these new interval objects, ensuring that GTID sets can correctly incorporate and consolidate ranges. This work is a prerequisite for an upcoming feature that will parse and apply ranged GTID messages from proxysql_mysqlbinlog.

Highlights

  • GTID Interval Rework: The gtid_interval_t type has been refactored from a simple std::pair to a dedicated Gtid_Interval class, providing methods for comparison, merging, and string representation.
  • Enhanced GTID Set Logic: The gtid_set_t now supports ranged GTID updates, with the addGtid function being significantly updated to handle the insertion and merging of Gtid_Interval objects, including sorting and consolidating overlapping ranges.
  • New Source File: A new source file, proxysql_gtid.cpp, has been added to encapsulate the implementation of the Gtid_Interval class.
  • No Immediate Functional Change: This PR lays the groundwork for future ranged GTID parsing, and as such, introduces no immediate functional changes to ProxySQL's behavior regarding GTID updates, as proper parsing will follow in a subsequent PR.

@proton-lisandro-pin
Contributor Author

Question: none of the files in lib\ appears to have unit tests defined 😕

I'll be happy to add tests for proxysql_gtid.cpp if you point me to where you'd prefer these - maybe under test\lib?

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces significant changes to support ranged GTID updates by refactoring gtid_interval_t from a std::pair into a dedicated Gtid_Interval class and reworking the addGtid logic. The changes are well-structured and lay a good foundation for parsing GTID ranges. My review focuses on improving the C++ implementation of the new Gtid_Interval class and the associated functions, with suggestions for enhancing const correctness, readability, and adherence to C++ best practices. These improvements will make the new code more robust and easier to maintain.

@proton-lisandro-pin proton-lisandro-pin force-pushed the ranged_gtid_updates branch 2 times, most recently from dd126a2 to b1a781c Compare November 24, 2025 17:51
@proton-lisandro-pin proton-lisandro-pin force-pushed the ranged_gtid_updates branch 9 times, most recently from c8ac908 to 93dfdf8 Compare November 26, 2025 12:35
@proton-lisandro-pin proton-lisandro-pin force-pushed the ranged_gtid_updates branch 2 times, most recently from fda6306 to 893fc40 Compare December 12, 2025 12:49
Collaborator

@wazir-ahmed wazir-ahmed left a comment

Looks good overall. Just a few minor comments to address.

Comment on lines 469 to 482
    // merge overlapping GTID ranges, if any
    it->second.sort();
    auto a = it->second.begin();
    while (a != it->second.end()) {
        auto b = std::next(a);
        if (b == it->second.end()) {
            break;
        }
        if (a->merge(*b)) {
            it->second.erase(b);
            continue;
        }
        a++;
    }

Collaborator

Performing sort+merge outside this function scope, after inserting all intervals, perhaps in read_all_gtids(), would be more efficient. WDYT?

Contributor Author

@proton-lisandro-pin proton-lisandro-pin Dec 15, 2025

The sort+merge is necessary to efficiently handle GTID gaps, and i was planning to have this logic encapsulated in gtid_interval_t once it gets converted to a class 😞

However, i just realized that most GTID ranges added will extend the last interval in the list, as GTID updates are streamed in sequence, so we can optimize. Fixed to avoid sorting+merging when that's the case.
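
A rough sketch of that optimization, under stated assumptions: the list is kept sorted and compacted between calls, and `Interval` and `add_interval` are hypothetical names rather than the PR's actual code. Updates that land at or beyond the last interval never trigger a sort; only out-of-order updates fall through to sort+merge.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <list>

// Illustrative interval type (inclusive bounds); not the PR's actual class.
struct Interval {
    int64_t start;
    int64_t end;

    bool operator<(const Interval& o) const {
        return start != o.start ? start < o.start : end < o.end;
    }

    // Absorb `o` when the two intervals overlap or touch; report success.
    bool merge(const Interval& o) {
        if (o.start > end + 1 || start > o.end + 1) return false;
        if (o.start < start) start = o.start;
        if (o.end > end) end = o.end;
        return true;
    }
};

// Hypothetical insertion routine. `intervals` is assumed to be sorted and
// compacted on entry, and is left in that state on return.
void add_interval(std::list<Interval>& intervals, Interval incoming) {
    assert(incoming.start <= incoming.end);

    // Fast path 1: empty list, or the update lies beyond the last interval
    // with a gap, so appending keeps the list sorted and compact.
    if (intervals.empty() || incoming.start > intervals.back().end + 1) {
        intervals.push_back(incoming);
        return;
    }

    // Fast path 2: the update starts inside or right after the last
    // interval, so it can only extend that interval's upper bound.
    if (incoming.start >= intervals.back().start) {
        intervals.back().merge(incoming);
        return;
    }

    // Slow path: out-of-order update. Append, sort, then merge neighbours.
    intervals.push_back(incoming);
    intervals.sort();
    auto a = intervals.begin();
    while (a != intervals.end()) {
        auto b = std::next(a);
        if (b == intervals.end()) break;
        if (a->merge(*b)) {
            intervals.erase(b);
            continue;  // retry `a` against its new neighbour
        }
        ++a;
    }
}
```

For an in-order stream such as 1, 2, 3-5 the list stays at a single interval without sorting; a late 6-9 arriving after 10-12 takes the slow path and collapses everything into 1-12.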

Collaborator

@wazir-ahmed wazir-ahmed Dec 16, 2025

With this patch, gtid_interval_t has a merge method which does the job of extending an existing GTID interval. But this can still lead to the creation of adjacent intervals, such as 1-5 and 6-10. We need to run sort+merge to combine intervals like this. But the question is: how frequently do we need to run this operation?

  1. If GTIDs are mostly streamed in sequence, then calling the merge method in addGtid() will take care of that and extend the existing interval.
  2. We only need to run sort+merge to clean up adjacent intervals, and we don't need to do that during the insertion of each interval (addGtid).

I'm suggesting that read_all_gtids() would be a good place for this, especially considering the batch update feature in the mysqlbinlog reader. For each read(), which might contain n txids (or, in future, txid intervals), we can run sort+merge once after inserting all of them.
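
The batch approach suggested here could look roughly like the sketch below. It is only an illustration: `Interval`, `compact`, and `add_batch` are hypothetical names, and the real read_all_gtids() is not shown in this conversation. Every txid from one read is appended without any per-insert work, and a single sort+merge pass runs at the end.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <list>
#include <vector>

// Illustrative interval type (inclusive bounds); not the PR's actual class.
struct Interval {
    int64_t start;
    int64_t end;

    bool operator<(const Interval& o) const {
        return start != o.start ? start < o.start : end < o.end;
    }

    // Absorb `o` when the two intervals overlap or touch; report success.
    bool merge(const Interval& o) {
        if (o.start > end + 1 || start > o.end + 1) return false;
        if (o.start < start) start = o.start;
        if (o.end > end) end = o.end;
        return true;
    }
};

// One sort followed by one linear pass that merges neighbouring intervals.
void compact(std::list<Interval>& intervals) {
    intervals.sort();
    auto a = intervals.begin();
    while (a != intervals.end()) {
        auto b = std::next(a);
        if (b == intervals.end()) break;
        if (a->merge(*b)) {
            intervals.erase(b);
            continue;  // `a` grew, so compare it with its new neighbour
        }
        ++a;
    }
}

// Hypothetical batch entry point: append every txid received in one read,
// then compact once instead of once per txid.
void add_batch(std::list<Interval>& intervals, const std::vector<int64_t>& txids) {
    for (int64_t txid : txids) {
        assert(txid > 0);  // GTID transaction ids start at 1
        intervals.push_back(Interval{txid, txid});
    }
    compact(intervals);
}
```

The trade-off is that the list temporarily holds one entry per txid in the batch, in exchange for sorting once per read rather than on every insertion.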

Collaborator

i was planning to have this logic encapsulated in gtid_interval_t once it gets converted to a class

We should put sort+merge in gtid_set_t once it gets converted into a class.

Contributor Author

@proton-lisandro-pin proton-lisandro-pin Dec 16, 2025

With this patch, gtid_interval_t has a merge method which does the job of extending an existing GTID interval. But this can still lead to the creation of adjacent intervals, such as 1-5 and 6-10. We need to run sort+merge to combine intervals like this.

No, merge actually merges intervals 😄 So 6-10 merged with 1-5 will correctly yield 1-10, and compact the range.

I have unit tests showcasing this, but i'm not quite sure where to contribute them. ProxySQL doesn't seem to have a lot of unit coverage.

But the question is: how frequently do we need to run this operation?

Not often, in practice: in my testing, GTIDs streamed to ProxySQL almost always arrive in order. The problem is that, even when rare, the existing implementation will not deal with gaps or out-of-order updates, and will quickly balloon memory usage in such scenarios.

The overhead added by this sort+merge approach is minimal and guarantees optimal GTID management, which is worth the added complexity IMHO.

I'm suggesting that read_all_gtids() would be a good place for this, especially considering the batch update feature in the mysqlbinlog reader. For each read(), which might contain n txids (or, in future, txid intervals), we can run sort+merge once after inserting all of them.
(...)
We should put sort+merge in gtid_set_t once it gets converted into a class.

Still WIP, but proton-lisandro-pin@a2cfdac should explain the intended approach better.

The end result is that GTID management concerns are decoupled from the code that ingests GTID updates from the mysqlbinlog client (see, for example, line 334).

Collaborator

@wazir-ahmed wazir-ahmed Dec 16, 2025

No, merge actually merges intervals 😄 So 6-10 merged with 1-5 will correctly yield 1-10, and compact the range.

Calling merge for each individual message from mysqlbinlog is not enough. For example, if ProxySQL receives txids in the following sequence, adjacent intervals will form.

I2=1    // 1-1
I2=6    // 1-1, 6-6
I2=7    // 1-1, 6-7
I2=2    // 1-2, 6-7
I2=3    // 1-3, 6-7
I2=4    // 1-4, 6-7
I2=8    // 1-4, 6-8
I2=9    // 1-4, 6-9
I2=10   // 1-4, 6-10
I2=5    // 1-5, 6-10

We need to run the sort+merge logic to combine intervals like this. This is what I was describing before.
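
The sequence above can be replayed with a simplified model of per-message extension. This sketch assumes each txid grows the first interval it touches and never joins two existing intervals; `Interval` and `add_txid_extend_only` are hypothetical names, and the model is not taken from the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <list>

// Illustrative interval with inclusive bounds; not the PR's actual class.
struct Interval {
    int64_t start;
    int64_t end;
};

// Extend-only insertion: grow the first interval that `txid` touches,
// otherwise start a new single-element interval. Two existing intervals
// are never joined here, which is the limitation being discussed.
void add_txid_extend_only(std::list<Interval>& intervals, int64_t txid) {
    assert(txid > 0);  // GTID transaction ids start at 1
    for (Interval& iv : intervals) {
        if (txid >= iv.start && txid <= iv.end) {
            return;  // already covered
        }
        if (txid == iv.end + 1) {
            iv.end = txid;  // extends the upper bound
            return;
        }
        if (txid + 1 == iv.start) {
            iv.start = txid;  // extends the lower bound
            return;
        }
    }
    intervals.push_back(Interval{txid, txid});
}
```

Feeding it 1, 6, 7, 2, 3, 4, 8, 9, 10, 5 leaves two intervals, 1-5 and 6-10: the final txid 5 extends 1-4, but nothing then notices that 1-5 and 6-10 now touch. A separate sort+merge pass is what collapses them into 1-10.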

Collaborator

@wazir-ahmed wazir-ahmed Dec 16, 2025

the existing implementation will not deal with gaps or out-of-order updates, and will quickly balloon memory usage in such scenarios.

  • The for loop at the end of existing addGtid() function is intended to detect the intervals that are adjacent and merge them. It would convert { 1-5, 6-10 } to { 1-10 }.
  • But it is not perfect. For example, if ProxySQL receives I2=6 before I2=1, that leads to the intervals { 6-10, 1-5 }, which the current implementation won't treat as adjacent. So sorting is necessary before iterating through the intervals (your implementation fixes this).

@wazir-ahmed
Collaborator

@proton-lisandro-pin please resolve the conflicts.

@wazir-ahmed wazir-ahmed changed the title from "Add logic to support ranged GTID updates from proxysql_mysqlbinlog." to "refactor: Extract GTID interval logic into Gtid_Interval class" Dec 16, 2025