-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Define ControlService interface #12540
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr-macos on macOS Ventura 13.x
|
Result of foundationdb-pr on Linux RHEL 9
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
| // Locality describes the physical or logical location of a worker process. | ||
| // These fields help FoundationDB make intelligent placement decisions for | ||
| // data replication and fault tolerance. | ||
| message Locality { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional locality fields will be ignored in the Locality message, correct? Might be worth to note that in the comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't quite get what you are saying. I'm probably missing something here?
fdbctl/protos/control_service.proto
Outdated
|
|
||
| // Response containing the current cluster coordinators. | ||
| message GetCoordinatorsReply { | ||
| // List of coordinator addresses in the format "ip:port" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth to note that hostname:port is also possible for cluster files with DNS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. If my understanding is correct the other commands (exclude/include) do not yet support DNS yet? I will check meanwhile, but in case you know it already.
| message GetStatusReply { | ||
| // JSON-formatted status information containing cluster health, performance metrics, | ||
| // configuration, processes, and workload information | ||
| optional string result = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not an expert in photo buf, but is there a better type than string? This would mean the clients have to parse the string again and is there any way to validate that the string is actually json formatted? Is there a reason the result is optional? Even if the cluster is unavailable we should return a json-formatted status with the error message (that's at least the current behaviour with fdbcli and when you query \xff\xff/status/json.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is ProtoJSON format, but comes with some limitations and not sure our schema can be represented in it. I can look into it.
The goal down the future is to split the Status into different fields itself, so that we only get the information we are interested in but it is quite some work which I do intend to do down the line and it was string as a intermediate step.
The reasoning for keeping is optional is more around the conventions around protobuf. (1) it is the default specifier, and is recommended. Marking it explicitly is to keep backward compatibility with proto2. Infact there is no required field in proto3 (2) around why protobuf recommends optional type itself, is to keep things backward and forward compatible.
| // Request to kill (terminate) worker processes. | ||
| // This is a forceful operation that immediately stops processes. | ||
| // Use with caution - prefer exclude for graceful removal. | ||
| message KillRequest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm not mistaken the Kill and Suspend features use the same underlying API call, do you think it makes sense to combine it in one gRPC request?
Use with caution - prefer exclude for graceful removal.
I think that comment might be a bit misleading since the kill and the exclude command are used differently for cluster operations (assuming most people use the fdbmonitor or the operator to manage their processes), e.g. mot of the time the kill command would be used to coordinate a restart of processes, e.g. because of a version upgrade or because some of the knobs have changed. But the kill command wouldn't be used to remove a process (because they will be restarted by the monitor).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
I didn't notice the suspend command. Probably makes sense to call it reboot then? Seems like only difference between suspend and kill is that suspend literally just does a thread_sleep before kill which makes fdbserver be able to anything else.
Do you know how we use suspend command in operartions?
78b1e72 to
6a07470
Compare
vishesh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've partially addressed most of the feedback. There are couple small ones left, but they are all valid ones so I will incorporate those. Main thing left behind is reviews around configure which are all valid, but do want to redo that whole definition as it is quite ugly as I mentioned in the response there as well.
| // Locality describes the physical or logical location of a worker process. | ||
| // These fields help FoundationDB make intelligent placement decisions for | ||
| // data replication and fault tolerance. | ||
| message Locality { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't quite get what you are saying. I'm probably missing something here?
fdbctl/protos/control_service.proto
Outdated
|
|
||
| // Response containing the current cluster coordinators. | ||
| message GetCoordinatorsReply { | ||
| // List of coordinator addresses in the format "ip:port" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. If my understanding is correct the other commands (exclude/include) do not yet support DNS yet? I will check meanwhile, but in case you know it already.
| message GetStatusReply { | ||
| // JSON-formatted status information containing cluster health, performance metrics, | ||
| // configuration, processes, and workload information | ||
| optional string result = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is ProtoJSON format, but comes with some limitations and not sure our schema can be represented in it. I can look into it.
The goal down the future is to split the Status into different fields itself, so that we only get the information we are interested in but it is quite some work which I do intend to do down the line and it was string as a intermediate step.
The reasoning for keeping is optional is more around the conventions around protobuf. (1) it is the default specifier, and is recommended. Marking it explicitly is to keep backward compatibility with proto2. Infact there is no required field in proto3 (2) around why protobuf recommends optional type itself, is to keep things backward and forward compatible.
| // Request to kill (terminate) worker processes. | ||
| // This is a forceful operation that immediately stops processes. | ||
| // Use with caution - prefer exclude for graceful removal. | ||
| message KillRequest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
I didn't notice the suspend command. Probably makes sense to call it reboot then? Seems like only difference between suspend and kill is that suspend literally just does a thread_sleep before kill which makes fdbserver be able to anything else.
Do you know how we use suspend command in operartions?
|
|
||
| // Request to change the cluster's coordinators. | ||
| // Coordinators maintain the cluster's configuration and state. Changing them | ||
| // is a critical operation that should be done carefully. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main concern was that the coordinators much be reachable from the clients (I don't think we do that, but potentially firewall rules etc), or it can lead to unavailability. One thing I wasn't also sure of was the behavior when insersect(BeforeSet, AfterSet) = empty. Last is coordination state. I will try to look up documentation on what exactly can happen an if there are
| // Response to a coordinator change operation. | ||
| message ChangeCoordinatorsReply { | ||
| // True if the coordinators were actually changed, false if they remained the same | ||
| optional bool changed = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can probably add another field "failed_to_chaged" coordinators if not all coordinators were changed. Or perhaps an enum is better. Looking into it.
| // This is one of the most critical operations for a FoundationDB cluster, | ||
| // affecting redundancy, storage engines, process roles, and various other settings. | ||
| message ConfigureRequest { | ||
| // Database creation flags |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me get back to you offline on this. ConfigureRequest is quite ugly and confusing (both here and in fdbcli), and I was thinking about splitting this in general for tss and general configuration. Wasn't quite sure how though. Would like to hear your thoughts on this part on how we can do this better. My experience with command has always been limited to configuring a new database and changing proxies/tlogs/... etc.
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-macos on macOS Ventura 13.x
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
Result of foundationdb-pr on Linux RHEL 9
|
6a07470 to
dcad3c1
Compare
Result of foundationdb-pr on Linux RHEL 9
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x
|
Result of foundationdb-pr-macos on macOS Ventura 13.x
|
dcad3c1 to
a17564f
Compare
a17564f to
0816849
Compare
Result of foundationdb-pr-macos on macOS Ventura 13.x
|
Adds RPCs and messages that provides administration and management operations for FDB cluster.
0816849 to
1ff9d82
Compare
Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr on Linux RHEL 9
|
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
Result of foundationdb-pr on Linux RHEL 9
|
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr-macos on macOS Ventura 13.x
|
Result of foundationdb-pr on Linux RHEL 9
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
Adds RPCs and messages that provides administration and management operations for FDB cluster.
Code-Reviewer Section
The general pull request guidelines can be found here.
Please check each of the following things and check all boxes before accepting a PR.
For Release-Branches
If this PR is made against a release-branch, please also check the following:
release-branchormainif this is the youngest branch)