Skip to content

Conversation

@vishesh
Copy link
Contributor

@vishesh vishesh commented Nov 6, 2025

Adds RPCs and messages that provides administration and management operations for FDB cluster.

Code-Reviewer Section

The general pull request guidelines can be found here.

Please check each of the following things and check all boxes before accepting a PR.

  • The PR has a description, explaining both the problem and the solution.
  • The description mentions which forms of testing were done and the testing seems reasonable.
  • Every function/class/actor that was touched is reasonably well documented.

For Release-Branches

If this PR is made against a release-branch, please also check the following:

  • This change/bugfix is a cherry-pick from the next younger branch (younger release-branch or main if this is the youngest branch)
  • There is a good reason why this PR needs to go into a release branch and this reason is documented (either in the description above or in a linked GitHub issue)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: 78b1e72
  • Duration 0:25:54
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 78b1e72
  • Duration 0:35:13
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 78b1e72
  • Duration 0:46:21
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 78b1e72
  • Duration 0:49:36
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 78b1e72
  • Duration 0:51:57
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 78b1e72
  • Duration 0:56:14
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 78b1e72
  • Duration 1:08:03
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

// Locality describes the physical or logical location of a worker process.
// These fields help FoundationDB make intelligent placement decisions for
// data replication and fault tolerance.
message Locality {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional locality fields will be ignored in the Locality message, correct? Might be worth to note that in the comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't quite get what you are saying. I'm probably missing something here?


// Response containing the current cluster coordinators.
message GetCoordinatorsReply {
// List of coordinator addresses in the format "ip:port"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth to note that hostname:port is also possible for cluster files with DNS?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. If my understanding is correct the other commands (exclude/include) do not yet support DNS yet? I will check meanwhile, but in case you know it already.

message GetStatusReply {
// JSON-formatted status information containing cluster health, performance metrics,
// configuration, processes, and workload information
optional string result = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not an expert in photo buf, but is there a better type than string? This would mean the clients have to parse the string again and is there any way to validate that the string is actually json formatted? Is there a reason the result is optional? Even if the cluster is unavailable we should return a json-formatted status with the error message (that's at least the current behaviour with fdbcli and when you query \xff\xff/status/json.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is ProtoJSON format, but comes with some limitations and not sure our schema can be represented in it. I can look into it.

The goal down the future is to split the Status into different fields itself, so that we only get the information we are interested in but it is quite some work which I do intend to do down the line and it was string as a intermediate step.

The reasoning for keeping is optional is more around the conventions around protobuf. (1) it is the default specifier, and is recommended. Marking it explicitly is to keep backward compatibility with proto2. Infact there is no required field in proto3 (2) around why protobuf recommends optional type itself, is to keep things backward and forward compatible.

// Request to kill (terminate) worker processes.
// This is a forceful operation that immediately stops processes.
// Use with caution - prefer exclude for graceful removal.
message KillRequest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken the Kill and Suspend features use the same underlying API call, do you think it makes sense to combine it in one gRPC request?

Use with caution - prefer exclude for graceful removal.

I think that comment might be a bit misleading since the kill and the exclude command are used differently for cluster operations (assuming most people use the fdbmonitor or the operator to manage their processes), e.g. mot of the time the kill command would be used to coordinate a restart of processes, e.g. because of a version upgrade or because some of the knobs have changed. But the kill command wouldn't be used to remove a process (because they will be restarted by the monitor).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

I didn't notice the suspend command. Probably makes sense to call it reboot then? Seems like only difference between suspend and kill is that suspend literally just does a thread_sleep before kill which makes fdbserver be able to anything else.

Do you know how we use suspend command in operartions?

Copy link
Contributor Author

@vishesh vishesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've partially addressed most of the feedback. There are couple small ones left, but they are all valid ones so I will incorporate those. Main thing left behind is reviews around configure which are all valid, but do want to redo that whole definition as it is quite ugly as I mentioned in the response there as well.

// Locality describes the physical or logical location of a worker process.
// These fields help FoundationDB make intelligent placement decisions for
// data replication and fault tolerance.
message Locality {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't quite get what you are saying. I'm probably missing something here?


// Response containing the current cluster coordinators.
message GetCoordinatorsReply {
// List of coordinator addresses in the format "ip:port"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. If my understanding is correct the other commands (exclude/include) do not yet support DNS yet? I will check meanwhile, but in case you know it already.

message GetStatusReply {
// JSON-formatted status information containing cluster health, performance metrics,
// configuration, processes, and workload information
optional string result = 1;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is ProtoJSON format, but comes with some limitations and not sure our schema can be represented in it. I can look into it.

The goal down the future is to split the Status into different fields itself, so that we only get the information we are interested in but it is quite some work which I do intend to do down the line and it was string as a intermediate step.

The reasoning for keeping is optional is more around the conventions around protobuf. (1) it is the default specifier, and is recommended. Marking it explicitly is to keep backward compatibility with proto2. Infact there is no required field in proto3 (2) around why protobuf recommends optional type itself, is to keep things backward and forward compatible.

// Request to kill (terminate) worker processes.
// This is a forceful operation that immediately stops processes.
// Use with caution - prefer exclude for graceful removal.
message KillRequest {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

I didn't notice the suspend command. Probably makes sense to call it reboot then? Seems like only difference between suspend and kill is that suspend literally just does a thread_sleep before kill which makes fdbserver be able to anything else.

Do you know how we use suspend command in operartions?


// Request to change the cluster's coordinators.
// Coordinators maintain the cluster's configuration and state. Changing them
// is a critical operation that should be done carefully.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern was that the coordinators much be reachable from the clients (I don't think we do that, but potentially firewall rules etc), or it can lead to unavailability. One thing I wasn't also sure of was the behavior when insersect(BeforeSet, AfterSet) = empty. Last is coordination state. I will try to look up documentation on what exactly can happen an if there are

// Response to a coordinator change operation.
message ChangeCoordinatorsReply {
// True if the coordinators were actually changed, false if they remained the same
optional bool changed = 1;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably add another field "failed_to_chaged" coordinators if not all coordinators were changed. Or perhaps an enum is better. Looking into it.

// This is one of the most critical operations for a FoundationDB cluster,
// affecting redundancy, storage engines, process roles, and various other settings.
message ConfigureRequest {
// Database creation flags
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me get back to you offline on this. ConfigureRequest is quite ugly and confusing (both here and in fdbcli), and I was thinking about splitting this in general for tss and general configuration. Wasn't quite sure how though. Would like to hear your thoughts on this part on how we can do this better. My experience with command has always been limited to configuring a new database and changing proxies/tlogs/... etc.

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: 6a07470
  • Duration 0:24:38
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 6a07470
  • Duration 0:36:42
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 6a07470
  • Duration 0:46:34
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 6a07470
  • Duration 0:52:12
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 6a07470
  • Duration 1:04:25
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 6a07470
  • Duration 1:09:00
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 6a07470
  • Duration 1:10:27
  • Result: ❌ FAILED
  • Error: Error while executing command: ctest -j ${NPROC} --no-compress-output -T test --output-on-failure. Reason: exit status 8
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: dcad3c1
  • Duration 0:07:26
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: dcad3c1
  • Duration 0:07:24
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: dcad3c1
  • Duration 0:07:29
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: dcad3c1
  • Duration 0:07:31
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: dcad3c1
  • Duration 0:07:44
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: dcad3c1
  • Duration 0:36:54
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: dcad3c1
  • Duration 0:52:43
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: a17564f
  • Duration 0:05:29
  • Result: ❌ FAILED
  • Error: git checkout failed with exit status 128: fatal: unable to read tree (a17564f0572852ef92711da4a30b9b92825bec5c) for primary source and source version a17564f0572852ef92711da4a30b9b92825bec5c
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

Adds RPCs and messages that provides administration and management
operations for FDB cluster.
@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: a17564f
  • Duration 0:06:26
  • Result: ❌ FAILED
  • Error: git checkout failed with exit status 128: fatal: unable to read tree (a17564f0572852ef92711da4a30b9b92825bec5c) for primary source and source version a17564f0572852ef92711da4a30b9b92825bec5c
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: a17564f
  • Duration 0:07:17
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: a17564f
  • Duration 0:07:20
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: a17564f
  • Duration 0:07:22
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: a17564f
  • Duration 0:07:27
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: a17564f
  • Duration 0:07:46
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 0816849
  • Duration 0:04:03
  • Result: ❌ FAILED
  • Error: git checkout failed with exit status 128: fatal: unable to read tree (08168490e7baeaf6aec096571214df19d989bb89) for primary source and source version 08168490e7baeaf6aec096571214df19d989bb89
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: 0816849
  • Duration 0:04:03
  • Result: ❌ FAILED
  • Error: git checkout failed with exit status 128: fatal: unable to read tree (08168490e7baeaf6aec096571214df19d989bb89) for primary source and source version 08168490e7baeaf6aec096571214df19d989bb89
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 0816849
  • Duration 0:04:03
  • Result: ❌ FAILED
  • Error: git checkout failed with exit status 128: fatal: unable to read tree (08168490e7baeaf6aec096571214df19d989bb89) for primary source and source version 08168490e7baeaf6aec096571214df19d989bb89
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 0816849
  • Duration 0:04:03
  • Result: ❌ FAILED
  • Error: git checkout failed with exit status 128: fatal: unable to read tree (08168490e7baeaf6aec096571214df19d989bb89) for primary source and source version 08168490e7baeaf6aec096571214df19d989bb89
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 0816849
  • Duration 0:04:27
  • Result: ❌ FAILED
  • Error: git checkout failed with exit status 128: fatal: unable to read tree (08168490e7baeaf6aec096571214df19d989bb89) for primary source and source version 08168490e7baeaf6aec096571214df19d989bb89
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: 1ff9d82
  • Duration 0:24:51
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 1ff9d82
  • Duration 0:44:59
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 1ff9d82
  • Duration 0:49:13
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 1ff9d82
  • Duration 0:54:03
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 1ff9d82
  • Duration 0:58:31
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 1ff9d82
  • Duration 1:10:25
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants