COO-1687: feat: migrate to EndpointSlice service discovery#1028

Merged: openshift-merge-bot[bot] merged 2 commits into rhobs:main from jan--f:endpointslices-migration on Mar 12, 2026

jan--f (Collaborator) commented Mar 9, 2026

Prometheus Operator defaults to watching the deprecated Endpoints API for service discovery. Switch the operator's own ServiceMonitors to use EndpointSlice explicitly, which eliminates the deprecation log noise from the operator's internal components.

Changes:

  • Set serviceDiscoveryRole: EndpointSlice on the ServiceMonitors we own (observability-operator, health-analyzer, thanos-querier) so that prometheus-operator uses the EndpointSlice role for these jobs.
  • Add discovery.k8s.io/endpointslices to all Prometheus RBAC roles and ClusterRoles (alongside the existing endpoints permission) so that Prometheus can serve Endpoints-based and EndpointSlice-based ServiceMonitors simultaneously.
  • Add discovery.k8s.io/endpointslices to the korrel8r ClusterRole so the correlation tool can read both endpoint representations.
  • Add the corresponding kubebuilder markers and update the generated cluster role YAML and CSV.
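As a rough illustration of the RBAC part of this change, the Go sketch below models a Prometheus role as a list of policy rules and checks that it grants read access to both endpoints (core API group) and endpointslices (discovery.k8s.io). The policyRule type is a hypothetical, dependency-free stand-in for k8s.io/api/rbac/v1.PolicyRule, the helper names are invented for illustration, and the get;list;watch verbs in the kubebuilder markers are an assumption, not a copy of the markers in this PR.

```go
package main

import "fmt"

// policyRule is a hypothetical stand-in for rbacv1.PolicyRule,
// reduced to the fields this sketch needs.
type policyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// Markers of this shape generate the ClusterRole rules (verbs assumed):
//
// +kubebuilder:rbac:groups="",resources=endpoints,verbs=get;list;watch
// +kubebuilder:rbac:groups=discovery.k8s.io,resources=endpointslices,verbs=get;list;watch

// contains reports whether s is an element of list.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// allows reports whether a single rule grants every verb in verbs
// on the given API group and resource.
func allows(rules []policyRule, group, resource string, verbs ...string) bool {
	for _, r := range rules {
		if !contains(r.APIGroups, group) || !contains(r.Resources, resource) {
			continue
		}
		ok := true
		for _, v := range verbs {
			if !contains(r.Verbs, v) {
				ok = false
				break
			}
		}
		if ok {
			return true
		}
	}
	return false
}

// canServeBothRoles checks read access to both the legacy Endpoints API and
// the EndpointSlice API, so that ServiceMonitors using either role can coexist.
func canServeBothRoles(rules []policyRule) bool {
	return allows(rules, "", "endpoints", "get", "list", "watch") &&
		allows(rules, "discovery.k8s.io", "endpointslices", "get", "list", "watch")
}

func main() {
	// Illustrative "before" role: core-group discovery permissions only.
	before := []policyRule{
		{APIGroups: []string{""}, Resources: []string{"services", "endpoints", "pods"}, Verbs: []string{"get", "list", "watch"}},
	}
	// "After": the existing rule plus an endpointslices rule.
	after := append(before, policyRule{
		APIGroups: []string{"discovery.k8s.io"},
		Resources: []string{"endpointslices"},
		Verbs:     []string{"get", "list", "watch"},
	})
	fmt.Println(canServeBothRoles(before)) // false
	fmt.Println(canServeBothRoles(after))  // true
}
```

Keeping the endpoints rule next to the new endpointslices rule, instead of replacing it, is what lets unmodified user ServiceMonitors keep working.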

The Prometheus CR's global serviceDiscoveryRole is intentionally left unset (defaulting to Endpoints) so that user-created ServiceMonitors continue to work without modification. Users can opt individual ServiceMonitors into EndpointSlice by setting serviceDiscoveryRole: EndpointSlice on them.
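The precedence described above can be sketched in Go as follows. Going only by this description, the sketch assumes that a serviceDiscoveryRole set on a ServiceMonitor takes precedence over the Prometheus CR's global value, and that an unset global value means Endpoints. effectiveRole is a hypothetical helper, not prometheus-operator's actual code.

```go
package main

import "fmt"

// Role values for serviceDiscoveryRole, as named in this PR.
const (
	roleEndpoints     = "Endpoints"
	roleEndpointSlice = "EndpointSlice"
)

// effectiveRole returns the discovery role used for a ServiceMonitor's jobs.
// An empty string means "field not set". Assumed precedence:
// ServiceMonitor value, then Prometheus CR global value, then Endpoints.
func effectiveRole(serviceMonitorRole, prometheusRole string) string {
	if serviceMonitorRole != "" {
		return serviceMonitorRole
	}
	if prometheusRole != "" {
		return prometheusRole
	}
	return roleEndpoints
}

func main() {
	// Global role left unset, as in this PR.
	fmt.Println(effectiveRole("", ""))                // user ServiceMonitor: Endpoints
	fmt.Println(effectiveRole(roleEndpointSlice, "")) // opted-in ServiceMonitor: EndpointSlice
}
```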

openshift-ci bot commented Mar 9, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jan--f

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot added the approved label Mar 9, 2026
jan--f changed the title from "feat: migrate to EndpointSlice service discovery" to "COO-1687: feat: migrate to EndpointSlice service discovery" Mar 9, 2026
openshift-ci-robot (Collaborator) commented Mar 9, 2026

@jan--f: This pull request references COO-1687 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

jan--f (Collaborator, Author) commented Mar 9, 2026

/retest

jan--f (Collaborator, Author) commented Mar 12, 2026

/retest

jan--f force-pushed the endpointslices-migration branch from de800c3 to bc8b9f7 March 12, 2026 10:43
Jan Fajerski and others added 2 commits March 12, 2026 10:50
Prometheus Operator defaults to watching the deprecated Endpoints API for
service discovery. Switch the operator's own ServiceMonitors to use
EndpointSlice explicitly, which eliminates the deprecation log noise from
the operator's internal components.

Changes:
- Set serviceDiscoveryRole: EndpointSlice on the ServiceMonitors we own
  (observability-operator, health-analyzer, thanos-querier) so that
  prometheus-operator uses the EndpointSlice role for these jobs.
- Add discovery.k8s.io/endpointslices to all Prometheus RBAC roles and
  ClusterRoles (alongside the existing endpoints permission) so that
  Prometheus can serve both kinds of ServiceMonitors simultaneously.
- Add discovery.k8s.io/endpointslices to the korrel8r ClusterRole so
  the correlation tool can read both endpoint representations.
- Add the corresponding kubebuilder markers and update the generated
  cluster role YAML and CSV.

The Prometheus CR's global serviceDiscoveryRole is intentionally left
unset (defaulting to Endpoints) so that user-created ServiceMonitors
continue to work without modification. Users can opt individual
ServiceMonitors into EndpointSlice by setting serviceDiscoveryRole:
EndpointSlice on them.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Jan Fajerski <jan@fajerski.name>

fix: revert serviceDiscoveryRole from monitoring.coreos.com ServiceMonitors

The operator's self-monitoring ServiceMonitor and the health-analyzer
ServiceMonitor are monitoring.coreos.com objects processed by the
platform prometheus-operator on OpenShift, which we don't control.
Setting serviceDiscoveryRole: EndpointSlice on them requires the
platform Prometheus to have endpointslices access and the platform
prometheus-operator to correctly generate TLS-aware scrape configs
for the endpointslice role — neither of which is guaranteed across
OCP versions.

The thanos-querier ServiceMonitor (monitoring.rhobs) is handled by
the obo-prometheus-operator we manage, so it retains the EndpointSlice
setting safely.

Fixes TestOperatorMetrics/metrics_ingested_in_Prometheus on OCP clusters.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
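The rule this follow-up commit settles on can be summarized in a short Go sketch: EndpointSlice is set only on ServiceMonitors read by a prometheus-operator the project itself manages. serviceMonitorRoleFor is a hypothetical helper; it illustrates the decision, not how the operator code is actually structured.

```go
package main

import "fmt"

// serviceMonitorRoleFor returns the serviceDiscoveryRole to set for a
// ServiceMonitor in the given API group ("" means leave the field unset).
// Hypothetical helper illustrating the rule from the commit message.
func serviceMonitorRoleFor(apiGroup string) string {
	switch apiGroup {
	case "monitoring.rhobs":
		// Reconciled by the managed obo-prometheus-operator.
		return "EndpointSlice"
	default:
		// Includes monitoring.coreos.com, which the platform
		// prometheus-operator handles: EndpointSlice access and TLS-aware
		// scrape configs are not guaranteed across OCP versions.
		return ""
	}
}

func main() {
	for _, sm := range []struct{ name, group string }{
		{"thanos-querier", "monitoring.rhobs"},
		{"observability-operator", "monitoring.coreos.com"},
		{"health-analyzer", "monitoring.coreos.com"},
	} {
		fmt.Printf("%s (%s): %q\n", sm.name, sm.group, serviceMonitorRoleFor(sm.group))
	}
}
```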
jan--f force-pushed the endpointslices-migration branch from bc8b9f7 to 3adb1fb March 12, 2026 10:54
simonpasquier (Contributor)

/lgtm

PeterYurkovich (Member)

/cherry-pick release-1.4

openshift-cherrypick-robot

@PeterYurkovich: once the present PR merges, I will cherry-pick it on top of release-1.4 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.4

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-merge-bot bot merged commit cbd6ba3 into rhobs:main Mar 12, 2026
11 checks passed
openshift-cherrypick-robot

@PeterYurkovich: #1028 failed to apply on top of branch "release-1.4":

Applying: feat: migrate to EndpointSlice service discovery
Using index info to reconstruct a base tree...
M	bundle/manifests/observability-operator.clusterserviceversion.yaml
Falling back to patching base and 3-way merge...
Auto-merging bundle/manifests/observability-operator.clusterserviceversion.yaml
CONFLICT (content): Merge conflict in bundle/manifests/observability-operator.clusterserviceversion.yaml
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0001 feat: migrate to EndpointSlice service discovery

PeterYurkovich pushed a commit to PeterYurkovich/observability-operator that referenced this pull request Mar 12, 2026
* feat: migrate to EndpointSlice service discovery

* fix: revert serviceDiscoveryRole from monitoring.coreos.com ServiceMonitors

---------

Signed-off-by: Jan Fajerski <jan@fajerski.name>
Co-authored-by: Jan Fajerski <jan@fajerski.name>
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>