
Conversation

@HideoYamauchi
Contributor

Hi All,

This PR is a modified version of the following PR.

In Pacemaker 3.0, the local node name is not set when the fencer (fenced) calculates the state transition, so no STONITH device is registered.

As a result, unfencing fails even with a cib.xml that uses fence_scsi and worked with Pacemaker 2.1.

For compatibility with Pacemaker 2.1, this change sets the local node name when the fencer calculates the state transition.

We strongly request that this fix be included in the RHEL 10.0 release version of Pacemaker.

Best Regards,
Hideo Yamauchi.

…nced failed when the node name was unknown (for compatibility with 2.1 series).
@HideoYamauchi
Contributor Author

Hi All,

Additional information:

This issue occurs when a cib.xml file with an empty nodes section is loaded.

However, loading a cib.xml file generated with pcs -f that has an empty nodes section has already been addressed by fixes including the following PR.

Since that fix, Pacemaker's cibadmin --replace --xml-file xxx.xml --scope=configuration command allows replacing the configuration with a cib.xml file that has an empty nodes section.

For this reason, fenced needs to register STONITH devices correctly even when a cib.xml file with an empty nodes section is loaded.
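
For reference, a minimal reproduction of this scenario might look like the sketch below (a hedged illustration: the file name, resource id, and device path are hypothetical, and the XML is stripped down rather than a complete configuration):

```sh
# Hypothetical reproduction sketch: the file name, resource id, and device
# path are examples only; the XML is a stripped-down illustration, not a
# complete configuration.
cat > boilerplate.xml <<'EOF'
<configuration>
  <crm_config/>
  <nodes/>
  <resources>
    <primitive id="scsi-fence" class="stonith" type="fence_scsi">
      <instance_attributes id="scsi-fence-attrs">
        <nvpair id="scsi-fence-devices" name="devices" value="/dev/sdb"/>
      </instance_attributes>
      <meta_attributes id="scsi-fence-meta">
        <nvpair id="scsi-fence-provides" name="provides" value="unfencing"/>
      </meta_attributes>
    </primitive>
  </resources>
  <constraints/>
</configuration>
EOF

# Command quoted from the discussion above. With Pacemaker 3.0 (before this
# fix), the fencer was left without a local node name after this replacement,
# so the fence_scsi device was not registered and unfencing failed.
cibadmin --replace --xml-file boilerplate.xml --scope=configuration
```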

Best Regards,
Hideo Yamauchi.

@clumens
Contributor

clumens commented Mar 25, 2025

@wenningerk @nrwahl2 I'd like to hear your opinions on this, thanks.

@wenningerk
Contributor

@clumens Still working on an opinion ;-) Currently it feels a bit hacky.

@nrwahl2
Contributor

nrwahl2 commented Mar 26, 2025

I'm planning to take a closer look at this today (after my post-meeting nap).

@nrwahl2
Contributor

nrwahl2 commented Mar 27, 2025

@HideoYamauchi I apologize for not having sufficiently analyzed this yet.

Question: Have you figured out exactly what changed between 2.1.9 and 3.0.0, that caused this to stop working? Are you able to point to specific commits?

I see that you pointed to #2046. That's from 2020, so I don't think you're saying that commit caused this problem.

@nrwahl2
Contributor

nrwahl2 commented Mar 27, 2025

Question: Have you figured out exactly what changed between 2.1.9 and 3.0.0, that caused this to stop working? Are you able to point to specific commits?

From a quick look, I wouldn't be surprised if it's bf7ffcd. That was introduced in 3.0.0 and touches (actually moves) the same code.

@nrwahl2
Contributor

nrwahl2 commented Mar 27, 2025

From a quick look, I wouldn't be surprised if it's bf7ffcd. That was introduced in 3.0.0 and touches (actually moves) the same code.

Yes, in fact it looks like all your fix does is partially revert bf7ffcd. With your proposed patch, we create the fake node in the original (buggy) location again (when the fencer is making the request). And then we create the fake node again in the new ("fixed") location from bf7ffcd.

I'm not sure yet what to do about this. We probably don't want to do the fake node creation twice, but we need to figure out the correct ordering of events.

@HideoYamauchi
Contributor Author

HideoYamauchi commented Mar 27, 2025

Hi @nrwahl2,

Thank you for your comment.

This is due to the fix you pointed out.

With that commit, the node placement of STONITH resources is no longer calculated when the fencer computes placement from its cib.xml.

It's true that pe__create_fake_local_node() is executed twice.
However, by the time of the second pe__create_fake_local_node() call, the node named by scheduler->priv->local_node_name has already been created, so it is not created a second time.

Best Regards,
Hideo Yamauchi.

@nrwahl2
Contributor

nrwahl2 commented Mar 28, 2025

It's true that pe__create_fake_local_node() is executed twice. However, by the time of the second pe__create_fake_local_node() call, the node named by scheduler->priv->local_node_name has already been created, so it is not created a second time.

Ah yes, you're right. So it seems like this will fix the issue and will be pretty non-invasive.

I still want to think about whether we can do it without adding a scheduler flag and without making two calls (even though it's safe to do so).

I appreciate you finding the bug, the cause, and the strategy for fixing it.

@HideoYamauchi
Contributor Author

HideoYamauchi commented Mar 28, 2025

@nrwahl2

As long as the same functionality for placing fenced's STONITH devices is provided, I'll leave the approach to the fix up to you.
There may be a better way to fix it.

Best Regards,
Hideo Yamauchi.

@nrwahl2
Contributor

nrwahl2 commented Mar 28, 2025

I'm thinking about avoiding the timing issue altogether by inserting a fake node at the beginning of cluster_status() and overwriting it later if a real node is found.

nrwahl2 added a commit to nrwahl2/pacemaker that referenced this pull request Mar 28, 2025
This fixes a regression introduced by bf7ffcd. As of that commit, the
fake local node is created after all resources have been unpacked. So it
doesn't get added to resources' allowed_nodes tables.

This prevents registration of fencing devices when the fencer receives a
CIB diff that doesn't contain the local node. For example, the user may
have replaced the CIB with a boilerplate configuration that has an empty
nodes section.

See the following pull requests from Hideo Yamauchi and their
discussions:
ClusterLabs#3849
ClusterLabs#3852

Thanks to Hideo for the report and finding the cause.

Signed-off-by: Reid Wahl <[email protected]>
@nrwahl2
Contributor

nrwahl2 commented Mar 28, 2025

On second thought, this fix might break something for remote nodes... the fake local node would get created before we've found and unpacked all remote nodes.

Then again, I don't think the fencer gets proxied to remote nodes, so it might be okay. Anyway, I suggested another approach in #3855. I hope it works :)

nrwahl2 added a commit to nrwahl2/pacemaker that referenced this pull request Mar 30, 2025
This fixes a regression introduced by bf7ffcd. As of that commit, the
fake local node is created after all resources have been unpacked. So it
doesn't get added to resources' allowed_nodes tables.

This prevents registration of fencing devices when the fencer receives a
CIB diff that removes the local node. For example, the user may have
replaced the CIB with a boilerplate configuration that has an empty
nodes section. Registering a fencing device requires that the local node
be in the resource's allowed nodes table.

One option would be to add the fake local node to all resources' allowed
nodes tables immediately after creating it. However, it shouldn't
necessarily be an allowed node for all resources. For example, if
symmetric-cluster=false, then a node should not be placed in a
resource's allowed nodes table by default.

It's difficult to ensure correctness of the allowed nodes tables when a
fake local node is involved. It may involve repeated code or a fragile
and confusing dependence on the order of unpacking.

Since the fake local node is a hack in the first place, it seems better
to avoid using it where possible. Currently the only code that even sets
the local_node_name member of scheduler->priv is in:
* the fencer
* crm_mon when showing the "times" section

This commit works as follows. If the fencer receives a CIB diff
notification that contains the nodes section, it triggers a full refresh
of fencing device registrations. In our example above, where a user has
replaced the CIB with a configuration that has an empty nodes section,
this means all fencing device registrations will be removed.

However, the controller also has a CIB diff notification callback:
do_cib_updated(). The controller's callback repopulates the nodes
section with up-to-date information from the cluster layer (or its node
cache) if it finds that an untrusted client like cibadmin has modified the
nodes section. Then it updates the CIB accordingly.

The fencer will receive this updated CIB and refresh fencing device
registrations again, re-registering the fencing devices that were just
removed.

Note that in the common case, we're not doing all this wasted work. The
"remove and then re-register" sequence should happen only when a user
has modified the CIB in a sloppy way (for example, by deleting nodes
from the CIB's nodes section that have not been removed from the
cluster).

In short, it seems better to rely on the controller's maintenance of the
CIB's node list than to rely on a "fake local node" hack in the
scheduler.

See the following pull requests from Hideo Yamauchi and their
discussions:
ClusterLabs#3849
ClusterLabs#3852

Thanks to Hideo for the report and finding the cause.

Signed-off-by: Reid Wahl <[email protected]>
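
As a rough way to observe the behavior described in this commit message, one could compare the fencer's device registrations before and after such a replacement, along the lines of the sketch below (illustrative only; it assumes the hypothetical boilerplate.xml from the earlier reproduction sketch):

```sh
# Illustrative check only; assumes a running cluster and the hypothetical
# boilerplate.xml from the earlier reproduction sketch.

# Devices currently registered with the fencer.
stonith_admin --list-registered

# Replace the configuration with a cib.xml whose nodes section is empty.
cibadmin --replace --xml-file boilerplate.xml --scope=configuration

# The controller's do_cib_updated() callback should repopulate the nodes
# section from the cluster layer (or its node cache)...
cibadmin --query --scope=nodes

# ...and the fencer should then refresh its device registrations, so the
# fencing device should reappear here shortly.
stonith_admin --list-registered
```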