Skip to content

Conversation

@clumens
Copy link
Contributor

@clumens clumens commented Dec 10, 2025

Ugh, this one wanted to get out of control. There's a whole lot of code in the fence daemon. This is basically the same as the previous PRs, but I did not move code out of fenced_commands.c into fenced_messages.c for consistency with other daemons. There's just so much code in that file that it didn't seem worth it to introduce so much churn for no good reason. I also didn't update the header file blocks in the files I didn't touch for similar reasons.

There's a lot that could still be done just in fenced_commands.c:

  • Reduce scope of loop iteration variables.
  • Split all the device table code out into its own file with its own API.
  • Split all the topology table code out into its own file with its own API.
  • Split all the metadata cache code out into its own file with its own API.
  • Create additional types for callback functions and put these and enums into a header file.
  • Do the loop conditional inverting and unindenting trick.
  • Make sure all these functions and variables are actually used.
  • Etc.

This could probably keep one of us busy for a week or two, after which time maybe reorganizing the remaining code into fenced_messages.c would be easier and make more sense.

@clumens clumens requested a review from nrwahl2 December 10, 2025 20:02
@nrwahl2
Copy link
Contributor

nrwahl2 commented Dec 10, 2025

This could probably keep one of us busy for a week or two, after which time maybe reorganizing the remaining code into fenced_messages.c would be easier and make more sense.

It's absolutely on the list of things I'd like to do. I did a bunch of refactors in #3863 and #3866, but those are just the tip of the iceberg. That's when I saw what a mess it is.

Lately I've been focusing on booth refactors while waiting for review of my libcrmcommon refactors. Booth isn't widely used, but I shudder to think of trying to fix anything there someday, given the current state of the code...

Copy link
Contributor

@nrwahl2 nrwahl2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review, through "Don't call pcmk__assert in fenced handlers."

Copy link
Contributor

@nrwahl2 nrwahl2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done reviewing

This brings it in line with how other daemons are organized.  Note that
unlike other daemons I've reorganized recently, fenced uses both
corosync and IPC.  This will require further reorganization and
refactoring in later commits.  For the moment, not everything IPC
related is in this file.
* Add doxygen comments to explain what each function does.

* Remove unnecessary whitespace in function headers.

* Standardize log messages with what's happening in the same functions
  in attrd.

* Standardize variable names (c becomes client, qbc becomes c, request
  becomes msg).

* Make sure return values and comments match what libqb documents.
* Add fenced_ipc_init to register callback functions and add them to the
  mainloop.  This means the callback struct can now be static.

* Add fenced_ipc_cleanup to clean up.  This includes client dropping and
  cleanup tasks that were previously not happening.  I'm not sure if
  this was a bug or if it doesn't really matter, but it's still good to do
  this in all daemons.
This isn't all the corosync code - just the initial setup/teardown
stuff.  There are no code changes aside from renaming the cluster
variable to fenced_cluster and defining it in fenced_corosync.c.
This brings them in line with what's happening in attrd.  Note that
stonith_peer_callback remains unchanged, but there's other things that
need to be done to that function so it will be handled separately.
* Add doxygen comments to each.

* Improve whitespace.

* Rename groupName parameter to group_name.

* Rearrange code in fenced_cpg_dispatch to be more in line with what's
  happening in attrd.
* Add doxygen comments to each.

* Improve whitespace.

* Rename groupName parameter to group_name.
This brings it closer to what is happening in attrd.  For the moment
it's still basically just a wrapper around stonith_command, but that
will change in future patches.
And make it look a lot more like the version in attrd.  It looks likely
that there's always going to be daemon-specific code to these
handle_request functions, but so far it looks like it will be pretty
minimal.
The comment indicates the reply might be freed before we log the op, but
I don't see anywhere that happens.
If the handler only deals with one type of message (IPC or CPG), it
should first check for the error case and handle that.  If it deals with
both types of messages, it should first check for whichever one requires
less code to process.  This allows as much of the body of the function
as possible to be indented as little as possible.
A couple of these handlers are only meant for IPC clients - if we get a
CPG client, that's an unknown request.  One and only one of ipc_client
or peer must be non-NULL at a time, and in general we don't assert in
other daemons like this.
* Add some additional checks in line with what is happening in other
  daemon dispatch functions.

* Move a couple comments above the code they document for line length
  reasons.

* Remove the redundant trace logging call.  This is already done by
  call_api_dispatch.
To make this look as similar as possible to the dispatch functions in
other daemons, I want to get rid of the stonith_command function.  This
will result in a little bit of code duplication for now, but I'm
planning on eventually reducing that duplication across all daemons.
Instead, inline the portions of stonith_command that would be used and
then get rid of that function entirely.
This does lose some of the existing error handling - if there's an OOM
condition, this code will no longer abort and it will instead be up to
whatever next allocates memory to fail.  However, this is almost exactly
what's happening in attrd and OOM should be extremely rare.

It's expected that eventually all daemons will use identical code for
this purpose, at which point we can handle these rare error cases
better.
And move the definition of attrd_cluster into attrd_corosync.c where it
belongs.
Some (but not all) daemons can handle both IPC and CPG message types, so
the unknown request message should be able to print both.  I'm doing
this even on daemons that only handle one type in the interest of making
these functions as similar as possible.

Other daemons have a different style of unknown request function.
Rather than sort that out right now (I think we'll need to figure out
the ACK vs NACK thing first), I'm just removing the message type detail
from the result.
@nrwahl2 nrwahl2 merged commit d09928f into ClusterLabs:main Dec 12, 2025
1 check passed
@clumens clumens deleted the fenced-ipc-cleanup branch December 13, 2025 00:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants