Add cloud-connector as a native foremanctl feature #569

Open
jeremylenz wants to merge 15 commits into theforeman:master from jeremylenz:cloud-connector-feature

jeremylenz commented Jun 12, 2026

Contributor

Summary

  • Re-implements the upstream satellite_operations.cloud_connector role natively in foremanctl (SAT-45966 / SAT-44641)
  • New cloud-connector feature gated behind --add-feature cloud-connector, with rh-cloud dependency
  • Installs rhc and yggdrasil-worker-forwarder, configures the worker, starts rhcd, sets rhc_instance_id via the Foreman API, and announces to Sources
  • Works with both foremanctl deploy and forge deploy-dev (dev uses separate admin credentials)
  • Early pre-checks validate package availability and mutual exclusion with the iop feature
  • Adds Foreman CA to system trust store so the worker binary can verify Foreman's certificate
  • Optional --cloud-connector-http-proxy flag for environments without direct internet access

Companion PR: theforeman/foreman_rh_cloud#1214 (adds the announce_to_sources API endpoint)

Test plan

  • foremanctl deploy --add-feature cloud-connector completes successfully
  • foremanctl features --list-enabled includes cloud-connector
  • systemctl status rhcd shows active/running
  • /etc/rhc/workers/foreman_rh_cloud.toml has correct content
  • hammer settings info --name rhc_instance_id shows the consumer cert CN
  • Second foremanctl deploy is idempotent
  • forge deploy-dev --add-feature cloud-connector completes successfully
  • Enabling both cloud-connector and iop fails early with a clear error
  • Deploying without yggdrasil-worker-forwarder repo enabled fails early with a clear error

🤖 Generated with Claude Code

- name: Verify cloud-connector is not used with iop
  ansible.builtin.assert:
    that:
      - "'iop' not in enabled_features"

Member

We should be able to enforce this at the parameter level that these two features cannot co-exist.

Contributor Author

What exactly do you mean by parameter? I attempted to solve this with a new PR, which I am now realizing I hadn't raised yet. Stand by.

Contributor Author

Is #570 what you had in mind?

    path: /etc/pki/consumer/cert.pem
  register: __cloud_connector_consumer_cert

- name: Verify consumer certificate exists

Member

Is this sufficient? What if the system is registered somewhere else, for example? What if the system has a consumer cert but it's stale, or the system cannot reach console.redhat.com?

Contributor Author

All of those things will cause it to break later, either during deploy or after. But this is the same thing the original role does. (I actually added this check so that it will fail earlier than it would otherwise.)
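
A minimal sketch of the early check described here, assuming the certificate path shown in the diff above. The registered variable name and the exact failure wording are illustrative, not taken from the PR:

```yaml
# Sketch only: fail in the checks phase if the consumer certificate is missing.
# The variable name example_consumer_cert is hypothetical.
- name: Check for the subscription-manager consumer certificate
  ansible.builtin.stat:
    path: /etc/pki/consumer/cert.pem
  register: example_consumer_cert

- name: Verify consumer certificate exists
  ansible.builtin.assert:
    that:
      - example_consumer_cert.stat.exists
    fail_msg: >-
      /etc/pki/consumer/cert.pem was not found.
      The system must be registered with subscription-manager.
```

This only proves that the file is present. A stale certificate or an unreachable console.redhat.com would still surface later.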

Member

Would a subscription-manager status tell us more?

Contributor Author

subscription-manager status gives us this:

# subscription-manager status
+-------------------------------------------+
   System Status Details
+-------------------------------------------+
Overall Status: Registered
Content Access Mode is set to Simple Content Access. This host has access to content, regardless of subscription status.

subscription-manager identity gives us the actual consumer uuid:

# subscription-manager identity
system identity: cf46f013-6455-4ceb-913e-b1af651f098b
name: ip-10-0-198-13.rhos-01.prod.psi.rdu2.redhat.com
org name: 11949999
org ID: 11949999

But that command does make an API call.
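
If the extra API call were acceptable, a stricter pre-check could be sketched as follows. This is not part of the PR: it assumes subscription-manager identity exits non-zero on an unregistered system, and the variable name is hypothetical.

```yaml
# Sketch only: treat a failing "identity" query as "not registered".
- name: Query the consumer identity (contacts the subscription server)
  ansible.builtin.command: subscription-manager identity
  register: example_rhsm_identity
  changed_when: false   # read-only query
  failed_when: false    # let the assert below produce the error message

- name: Verify the system is registered
  ansible.builtin.assert:
    that:
      - example_rhsm_identity.rc == 0
    fail_msg: The system must be registered with subscription-manager.
```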

    group: root
    mode: '0755'

- name: Add Foreman CA to system trust store

Member

Is this how we do it today? This is an anti-pattern we have been trying to avoid.

Contributor Author

foreman-installer does the same thing via puppet-certs:

We need it here because yggdrasil-worker-forwarder is a Go binary that uses the OS trust store — no way to pass it a CA path. That said, we could move this into the certificates role so it's done once globally instead of per-feature. Would that be cleaner?

@jeremylenz jeremylenz Jun 12, 2026

Contributor Author

I was apprehensive about this too. I figured we can change it later.
(The reply above was written by Claude.)

Member

These are both 404s because they were moved quite a while ago. I think we need to consider updating yggdrasil-worker-forwarder rather than starting this trend of relying on the system store. Let's phone a friend for another opinion. @evgeni ☎️

@jeremylenz jeremylenz Jun 15, 2026

Contributor Author

We need to update it anyway to use DBUS (so it can run on RHEL 10), so we can probably just tack that change on there.

Contributor Author

https://github.com/theforeman/yggdrasil-worker-forwarder doesn't seem to use GitHub Issues, so I will create an internal Jira for that.

Contributor Author

I updated the description of https://redhat.atlassian.net/browse/SAT-27307. Ideally, though, I would like this PR to be merged with the "incorrect" approach as a temporary measure, to move things forward.

protocol = "grpc"
env = [
  "FORWARDER_USER={{ cloud_connector_user }}",
  "FORWARDER_PASSWORD={{ cloud_connector_password }}",

Member

This is the part we still need to work out how to properly handle, that is, the authentication.

Contributor Author

As I see it, our options are:

  1. Keep things as they are here, feeding admin/changeme to rhc. Not acceptable, because the admin password ends up in the worker config.toml.
  2. Create a service user with limited permissions. Doable in the existing PRs here and in foreman_rh_cloud.
  3. Create a service user plus a personal access token and use that, same as the previous architecture. FAM doesn't have a module for personal access tokens, so this would be a fair bit of work.

Option 2 seems promising, and I am pushing that update now.
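
Option 2 could look roughly like the sketch below, using the role and user modules from the theforeman.foreman collection. The role name, the placeholder mail address, and every variable name here are assumptions for illustration, and dispatch_cloud_requests is assumed to be the permission provided by the foreman_rh_cloud side:

```yaml
# Sketch only: a limited role plus a service user for the worker.
# All variable names below are hypothetical placeholders.
- name: Create a role limited to dispatching cloud requests
  theforeman.foreman.role:
    server_url: "{{ example_foreman_url }}"
    username: "{{ example_admin_user }}"
    password: "{{ example_admin_password }}"
    name: Cloud Connector
    filters:
      - permissions:
          - dispatch_cloud_requests
    state: present

- name: Create the service user that the worker authenticates as
  theforeman.foreman.user:
    server_url: "{{ example_foreman_url }}"
    username: "{{ example_admin_user }}"
    password: "{{ example_admin_password }}"
    name: "{{ example_service_user }}"
    user_password: "{{ example_service_password }}"
    mail: cloud-connector@example.com  # placeholder address
    auth_source: Internal
    roles:
      - Cloud Connector
    state: present
```

With this split, admin credentials are only used for the setup calls, and only the limited service credentials would need to appear in the worker config.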

@jeremylenz

Contributor Author

Pushed updates addressing review feedback (1bfe640):

  • Handler: switched from ansible.builtin.service to ansible.builtin.systemd_service since we use daemon_reload
  • Workers directory: removed the task — confirmed rhc package on EL9 creates /etc/rhc/workers/
  • Settings API: replaced raw uri calls with theforeman.foreman.setting module for rhc_instance_id and allow_auto_inventory_upload
  • Lint fix: added noqa: no-static-secrets on the defaults password (same pattern as foreman_development)

@jeremylenz jeremylenz force-pushed the cloud-connector-feature branch from 908e8b1 to 95a9cc9 Compare June 15, 2026 18:13
jeremylenz and others added 11 commits June 23, 2026 16:20
Re-implements the upstream satellite_operations.cloud_connector role
natively in foremanctl so users can enable it via:
  foremanctl deploy --add-feature cloud-connector

The new role installs rhc and yggdrasil-worker-forwarder, templates
the worker config, starts the rhcd service, and sets rhc_instance_id
via the Foreman API. Optional HTTP proxy support is included.

Works with both foremanctl deploy and forge deploy-dev (with
appropriate credential overrides for the dev environment).

Enforces mutual exclusion with the iop feature at runtime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Move iop mutual exclusion and package availability checks into a
new check_cloud_connector role that runs in the checks phase, before
any services are deployed. This avoids a long deploy-dev run failing
late when it reaches the cloud_connector role.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Use ca_path with the Foreman CA certificate instead of validate_certs,
matching the pattern used by other roles (foreman, check_foreman_api).
The self-signed CA cert is always available in the deploy context.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

After setting the rhc_instance_id, POST to the new
/api/v2/rh_cloud/announce_to_sources endpoint to register the
Satellite in Sources on console.redhat.com. This replaces the
Ruby-side CloudConnectorAnnounceTask that previously triggered
on REX job completion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The yggdrasil-worker-forwarder binary uses the OS trust store and
doesn't accept a CA path argument. Add the Foreman CA certificate
to the system trust store so the worker can verify Foreman's
self-signed certificate when forwarding cloud requests.

Also fix Content-Type header on the announce_to_sources POST.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Set allow_auto_inventory_upload to true via the Foreman API,
matching the previous cloud connector setup behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Verify /etc/pki/consumer/cert.pem exists early in the checks
phase, since the cloud_connector role needs it to derive the
rhc_instance_id from the certificate CN.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Remove cross-role variable references from defaults (use standalone
  fallback values; base.yaml provides the real overrides)
- Rename task "Configure rhc-cloud-connector-worker" for consistency
- Rename "Announce Satellite to Sources" to "Announce to Sources"
- Fix var-naming lint: use role-prefixed variable names instead of
  double-underscore prefix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Use ansible.builtin.systemd_service instead of service for handler
- Remove redundant workers directory task (rhc package creates it)
- Use theforeman.foreman.setting module instead of raw uri for settings
- Add noqa for static secret in role defaults

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Instead of storing admin credentials in the worker config, create a
dedicated cloud_connector_user with a limited role that only grants
dispatch_cloud_requests permission. The service user password is
generated and persisted like other foremanctl secrets.

Admin credentials are still used for the FAM calls to create the
user/role and manage settings, but they no longer end up on disk
in the worker config file.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Clarifies that these are the admin credentials used for Foreman API
calls during setup, distinct from cloud_connector_service_user/password
which are the limited-permission credentials baked into the worker config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jeremylenz jeremylenz force-pushed the cloud-connector-feature branch from 2092035 to 84335ce Compare June 23, 2026 20:21
@jeremylenz

Contributor Author

rebased & fixed conflicts

@qcjames53 qcjames53 left a comment

Contributor

I'm not enough in the loop to give this a full review but had a few comments that might be helpful. Consider this 80% good points 20% ass-talking. Here's what I was able to test directly:

  • The iop conflict and missing consumer cert checks both work on the positive and negative cases (used manual subscription manager commands for the latter).
  • The yggdrasil-worker-forwarder check works in the negative (was unable to figure out package install on centos9).
  • The cloud connector role works.

Beyond these tests, I have questions about the whole approach of cloud connector even after reading the planning docs and such.

  1. How are we splitting RHEL and Centos 9 deploys? I think the differentiation needs to be clearer since (best I can tell) yggdrasil-worker-forwarder is RHEL only. If the packaged container images are centos9 and the rpms needed for cloud connector are RHEL-only, that seems like a big problem to me!
  2. deploy-dev needs to work with cloud connector. At minimum a procedure but ideally out of the box. I think we really need a procedure for setting up a centos9 deploy-dev environment to build/test cloud connector. If that's not possible, then we need a good procedure for setting up a RHEL environment as deploy-dev.
  3. If it's possible to pass in subscription manager credentials safely, I would greatly prefer if we could deploy an environment with cloud connector immediately instead of requiring manual registration for the quadlet VM, THEN adding the feature. I understand this won't be an issue for customers but this would be HUGE for testing purposes.

I have to gracefully bow out of reviewing this for a few reasons: capacity + artemis stuff + this baby could come any day! Sorry to not complete testing but I put in an honest effort.

The system must be registered with subscription-manager.

- name: Check that yggdrasil-worker-forwarder package is available
  ansible.builtin.command: dnf info yggdrasil-worker-forwarder

Contributor

yggdrasil-worker-forwarder - Is this available in centos9? Where is it shipped? I know it won't be a problem for shipping pre-built images but for dev environments and such we should really document a pathway for getting this installed.

- name: Read client ID from CN of consumer certificate
  ansible.builtin.command: openssl x509 -in /etc/pki/consumer/cert.pem -subject -noout
  register: cloud_connector_cert_output
  changed_when: false

Contributor

Same as above but opposite. Idempotency?

Contributor Author

This is correct — changed_when: false means "this command never changes anything," which is right for a read-only openssl x509 call. Standard Ansible pattern for info-gathering tasks.
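
For illustration, the read-only task can be paired with a follow-up that pulls the CN out of the subject line. The first task mirrors the diff above. The second is a hypothetical sketch (the fact name and the regular expression are assumptions, not the PR's code) and assumes the subject contains a CN written as either CN=value or CN = value:

```yaml
- name: Read client ID from CN of consumer certificate
  ansible.builtin.command: openssl x509 -in /etc/pki/consumer/cert.pem -subject -noout
  register: cloud_connector_cert_output
  changed_when: false  # read-only, so it never reports "changed"

# Hypothetical follow-up: capture everything after "CN =" up to the
# next separator (comma, slash, or space).
- name: Extract the CN from the subject line
  ansible.builtin.set_fact:
    example_rhc_instance_id: >-
      {{ cloud_connector_cert_output.stdout
         | regex_search('CN ?= ?([^,/ ]+)', '\\1')
         | first }}
```

Because neither task modifies the host, a second run reports no changes, which is what the idempotency question is about.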

jeremylenz commented Jun 24, 2026

Contributor Author

I'll respond with thoughts off the top of my head :)

  1. How are we splitting RHEL and Centos 9 deploys? I think the differentiation needs to be clearer since (best I can tell) yggdrasil-worker-forwarder is RHEL only. If the packaged container images are centos9 and the rpms needed for cloud connector are RHEL-only, that seems like a big problem to me!

For cloud connector to work, the following must be available on your base system:

  1. ansible-runner command - Downstream, from the Satellite repo, e.g. satellite-6.19-for-rhel-9-x86_64-rpms. Upstream, from the Foreman Plugins repo, e.g. https://yum.theforeman.org/plugins/3.19/el9/x86_64/
  2. yggdrasil-worker-forwarder - Downstream, from the Satellite repo. Upstream, from the Foreman Plugins repo, e.g. https://yum.theforeman.org/plugins/3.19/el9/x86_64/
  3. rhc - from AppStream (CentOS or RHEL)

In the previous architecture, these weren't even listed as prerequisites, even in upstream docs. Downstream, you'd already have all of these since the Satellite system must be registered and have those repos enabled. But upstream, anything can happen. It seems this is a documentation gap. cc @Lennonka

  2. deploy-dev needs to work with cloud connector. At minimum a procedure but ideally out of the box. I think we really need a procedure for setting up a centos9 deploy-dev environment to build/test cloud connector. If that's not possible, then we need a good procedure for setting up a RHEL environment as deploy-dev.

I was testing with deploy-dev on RHEL (with --target-host localhost). I think it works fine, as long as you have the repos available that I mentioned above.

- Make update-ca-trust a handler triggered by CA file changes instead
  of an always-changed task, fixing idempotency
- Move check_cloud_connector out of the checks loop and call it
  separately with a when clause, matching the check_database_index
  pattern

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
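
The handler pattern this commit describes can be sketched as follows. This is an illustration rather than the PR's actual tasks: the destination file name is hypothetical, and foreman_ca_certificate is assumed to be a path to the CA file on the target host.

```yaml
# tasks/main.yaml (sketch)
- name: Add Foreman CA to system trust store
  ansible.builtin.copy:
    src: "{{ foreman_ca_certificate }}"
    dest: /etc/pki/ca-trust/source/anchors/example-foreman-ca.crt
    remote_src: true
    owner: root
    group: root
    mode: '0644'
  notify: Update CA trust   # fires only when the copied file changed

# handlers/main.yaml (sketch)
- name: Update CA trust
  ansible.builtin.command: update-ca-trust
  changed_when: true
```

On a second deploy the copy task reports "ok", the handler is not notified, and the run stays idempotent.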
@jeremylenz

Contributor Author

@qcjames53 Addressed 2 comments and replied to the third

The default was accidentally set to 'cloud_connector_admin_user'
(a leftover from the rename) instead of the intended login name
'cloud_connector_user'.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@lfu

lfu commented Jun 26, 2026

Tested cloud connector feature in development mode. All core functionality works correctly. Found 3 necessary configuration changes for development environment.


🔍 Testing Methodology

Environment: Development mode using forge deploy-dev with Rails source code
Test Scope:

  • Cloud connector deployment
  • Service user authentication
  • API endpoint functionality
  • Worker configuration

🔗 Related/Dependent PRs

Foreman RH Cloud Plugin

Foremanctl Branch Preservation

Note: These PRs should be merged before or alongside this cloud connector feature PR.


✅ What Works

1. Cloud Connector Deployment

  • ✓ Cloud connector role created successfully
  • ✓ Required packages present (rhc, yggdrasil-worker-forwarder) - installed as prerequisite
  • ✓ Worker config file created: /etc/rhc/workers/foreman_rh_cloud.toml
  • ✓ rhcd service running and enabled

Note: The cloud connector role does not install packages - it assumes rhc and yggdrasil-worker-forwarder are already present (installed manually or by prerequisite roles).

2. Service User Authentication

Created service user cloud_connector_user with proper credentials:

User ID: 5
Role: Cloud Connector
Auth Source: Internal
Last Login: Successfully authenticated via API

3. API Endpoint Verification

Manual test of /api/v2/rh_cloud/cloud_request endpoint:

Request:

curl -u cloud_connector_user:$PASSWORD \
  -X POST \
  http://localhost:3000/api/v2/rh_cloud/cloud_request \
  -H 'Content-Type: application/json' \
  -d '{"directive":"foreman_rh_cloud", ...}'

Result:

✓ Authenticated user cloud_connector_user against INTERNAL authentication source
✓ Authorized user cloud_connector_user(cloud_connector_user)
✓ Request processed (playbook URL validation occurred as expected)

4. Configuration Validation

  • rhc_instance_id setting configured: 5fc35187-ee43-48a8-b61a-01c0153d0562
  • ✓ Worker URL correct: http://localhost:3000/api/v2/rh_cloud/cloud_request
  • ✓ Credentials properly stored in worker config

🔧 Required Code Changes

Found 3 necessary changes for development environment support:

Change 1: Development URL Configuration

File: development/playbooks/deploy-dev/deploy-dev.yaml

Issue: Development uses Rails dev server on port 3000, not containerized Foreman

Fix:

vars:
  cloud_connector_url: "http://localhost:3000"
  cloud_connector_admin_user: "{{ foreman_development_admin_user }}"
  cloud_connector_admin_password: "{{ foreman_development_admin_password }}"

Why: Points worker to the correct development Rails server endpoint


Change 2: Disable Certificate Validation for HTTP

File: src/roles/cloud_connector/tasks/main.yaml

Issue: Development uses HTTP (no SSL), certificate validation fails

Fix:

- name: Create cloud connector service user
  theforeman.foreman.user:
    # ... existing parameters ...
    validate_certs: false  # <-- Add this

Why: HTTP connections don't have valid certificates in development


Change 3: Make CA Certificate Path Optional

File: src/roles/cloud_connector/tasks/main.yaml

Issue: No CA certificate exists for HTTP connections, undefined variable error

Fix:

- name: Create cloud connector service user
  theforeman.foreman.user:
    # ... existing parameters ...
    ca_path: "{{ foreman_ca_certificate | default(omit) }}"  # <-- Change this

Why: Prevents Ansible error when foreman_ca_certificate is undefined (HTTP mode)


⚠️ Known Limitations

Automated pytest Suite Skipped

Issue: All 6 cloud connector tests were skipped:

tests/cloud_connector_test.py::test_rhc_package_installed SKIPPED
tests/cloud_connector_test.py::test_yggdrasil_worker_forwarder_package_installed SKIPPED
tests/cloud_connector_test.py::test_workers_directory_exists SKIPPED
tests/cloud_connector_test.py::test_worker_config_exists SKIPPED
tests/cloud_connector_test.py::test_worker_script_exists SKIPPED
tests/cloud_connector_test.py::test_rhcd_service_running SKIPPED

Root Cause:

  • pytest hardcoded to check ./foremanctl features (production state)
  • Development uses ./forge deploy-dev (separate state)
  • Feature flags from --add-feature are temporary, not persisted

Impact:

  • ❌ Automated test suite cannot verify development deployments
  • ✅ Manual verification confirms all test criteria pass

Testing completed with: Claude Code

@jeremylenz

Contributor Author

Re: Change 1: cloud_connector_url defaults to foreman_url, which is https://{{ fqdn }}, and httpd proxies to the Rails server. The HTTPS URL through httpd works fine in dev; we tested it. Using http://localhost:3000 directly would bypass httpd and also break the ca_path on all the FAM module calls. The worker TOML would also have an HTTP URL, which means the runtime cloud requests would go over unencrypted HTTP.

Change 2 is not necessary unless I were to do Change 1, which I think is not needed for the reason above.

Updating now with Change 3.

Use default(omit) for ca_path so the role works in environments
where foreman_ca_certificate is not defined (e.g. HTTP-only dev
setups). Also skip the CA trust store copy when undefined.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jeremylenz

Contributor Author

Btw, our role does install the packages; see lines 2-7 of tasks/main.yaml:

  - name: Install rhc and yggdrasil-worker-forwarder
    ansible.builtin.package:
      name:
        - rhc
        - yggdrasil-worker-forwarder

foreman-protector is not used in foremanctl deployments. The
disable_plugin parameter was carried over from the upstream role
where it was needed for installer-based deployments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>