openshift-hyperfleet · openshift-merge-bot · Mar 30, 2026 · Mar 27, 2026 · Mar 27, 2026 · Mar 27, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -0,0 +1,116 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [Unreleased]
+
+### Added
+
+### Changed
+
+### Deprecated
+
+### Removed
+
+### Fixed
+
+### Security
+
+## [0.1.1](https://github.com/openshift-hyperfleet/hyperfleet-sentinel/compare/v0.1.0...v0.1.1) - 2026-03-10
+
+### Added
+- Standard labels (`resource_type`, `resource_selector`) on Sentinel Prometheus metrics for consistent monitoring across HyperFleet components
+- ServiceMonitor resource for Prometheus Operator environments
+- PodDisruptionBudget to protect Sentinel availability during voluntary disruptions
+- Helm chart linting and template validation to CI via Makefile targets
+- Support for nested field paths in `message_data` configuration for richer event content
+- Functional health and readiness probes beyond basic liveness checks
+
+### Changed
+- Updated `hyperfleet-broker` to v1.1.0 and integrated `MetricsRecorder` for broker-level observability
+- Standardized Helm value structure for consistency across HyperFleet charts
+- Moved Sentinel Helm chart to `charts/` directory following repository conventions
+- GCP-specific monitoring resources are now disabled by default
+- Standardized Dockerfile and Makefile for unified image build process
+- Standardized version injection to avoid collisions with `go-toolset` environment variables
+
+### Fixed
+- RabbitMQ connection URL now included in broker ConfigMap for proper broker discovery
+- CA certificates copied from builder stage to `ubi9-micro` runtime, resolving TLS verification failures
+- Clarified Helm deployment instructions for GKE environments using Quay images
+
+## [0.1.0](https://github.com/openshift-hyperfleet/hyperfleet-sentinel/compare/v0.0.0...v0.1.0) - 2026-02-19
+
+### Added
+- Initial release of HyperFleet Sentinel Service
+- Kubernetes resource polling for clusters and nodepools
+- CloudEvents publishing with broker abstraction (GCP Pub/Sub, RabbitMQ, Stub)
+- Horizontal sharding via resource selector labels
+- Configurable polling intervals and separate max age intervals for not-ready and ready resources
+- CEL-based message data templating for custom CloudEvents payloads
+- Prometheus metrics for observability
+- Grafana dashboard for monitoring
+- PodMonitoring support for GKE with Google Cloud Managed Prometheus
+- Helm chart for deployment
+- Integration tests with testcontainers (RabbitMQ and GCP Pub/Sub)
+- OpenAPI client generation from hyperfleet-api specification
+- Configuration validation at startup
+- HyperFleet API client with retry logic
+- Unit tests and golangci-lint checks
+
+---
+
+<!-- Changelog Guidelines:
+
+Follow these guidelines when updating the changelog:
+
+1. **What to include:**
+   - All notable changes that affect users
+   - New features, bug fixes, security fixes
+   - Breaking changes (mark with "BREAKING CHANGE" in description)
+   - Deprecations and removals
+
+2. **What NOT to include:**
+   - Internal refactoring that doesn't affect users (editorial or layout changes only); changes to component boundaries or interfaces MUST still be logged because they affect E2E testing
+   - Development tooling changes
+   - Documentation typo fixes
+   - Code formatting changes
+
+3. **How to categorize changes:**
+   - **Added** for new features
+   - **Changed** for changes in existing functionality
+   - **Deprecated** for soon-to-be removed features
+   - **Removed** for now removed features
+   - **Fixed** for any bug fixes
+   - **Security** for vulnerability fixes
+
+4. **Version format:**
+   - Use semantic versioning (MAJOR.MINOR.PATCH)
+   - Include release date in YYYY-MM-DD format
+   - Link to release tags if available
+
+5. **Entry format:**
+   ```markdown
+   ### Added
+   - Brief description of the change
+   - Another change with [link to issue/PR](URL) if relevant
+   ```
+
+6. **Example entries:**
+   ```markdown
+   ### Added
+   - New API endpoint for cluster status monitoring
+   - Support for custom authentication providers
+
+   ### Changed
+   - BREAKING CHANGE: Updated API response format for cluster resources
+   - Improved error handling for network timeouts
+
+   ### Fixed
+   - Fixed memory leak in status polling service
+   - Resolved authentication timeout issues
+   ```
+-->
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1,177 @@
+# CLAUDE.md
+
+## Project Identity
+
+HyperFleet Sentinel is a **Kubernetes resource watcher** that polls the HyperFleet API for cluster/nodepool updates, makes orchestration decisions based on max age intervals, and publishes CloudEvents to message brokers. It is stateless, horizontally scalable via label-based sharding, and delegates all state persistence to the API.
+
+- **Language**: Go 1.25+
+- **Messaging**: Broker abstraction supporting RabbitMQ, GCP Pub/Sub, and Stub implementations
+- **API Client**: Generated from [hyperfleet-api](https://github.com/openshift-hyperfleet/hyperfleet-api) OpenAPI spec
+- **Deployment**: Helm chart with PodMonitoring (GKE) and ServiceMonitor (Prometheus Operator)
+
+## Critical First Steps
+
+**Generated OpenAPI client is NOT committed to git.** Before any build, test, or development task:
+
+```bash
+make generate    # Downloads spec from hyperfleet-api and generates Go client
+```
+
+Setup sequence for a fresh clone:
+1. `make generate` — generate OpenAPI client in `pkg/api/openapi/`
+2. `make download` — fetch Go dependencies
+3. `make build` — build `bin/sentinel` binary
+4. `make test` — verify unit tests pass
+
+## Verification Commands
+
+| Command | What it does |
+|---|---|
+| `make verify` | go vet + format check (fast) |
+| `make lint` | golangci-lint (comprehensive) |
+| `make test` | unit tests only (no external deps) |
+| `make test-integration` | integration tests with testcontainers (RabbitMQ, Pub/Sub) |
+| `make test-helm` | Helm chart lint and validation |
+| `make test-all` | lint + unit + integration + helm tests |
+
+Use `make verify && make test` for fast local feedback. Use `make test-all` before pushing.
+
+## Code Conventions
+
+### Commit Messages
+Format: `HYPERFLEET-### - type: description`
+
+Example:
+```
+HYPERFLEET-427 - feat: add standard metrics labels
+
+Adds resource_type and resource_selector labels to all
+Prometheus metrics for consistent querying.
+
+Co-Authored-By: Claude <noreply@anthropic.com>
+```
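+
+A `commit-msg` hook or CI check could enforce this format. The Go sketch below is hypothetical (no such check is described in this repository): the regex, the accepted set of `type` values, and the function name `validSubject` are all assumptions.
+
+```go
+package main
+
+import (
+	"regexp"
+	"strings"
+)
+
+// subjectRE matches "HYPERFLEET-<digits> - <type>: <description>".
+// The list of accepted types is an assumption (Conventional Commits style).
+var subjectRE = regexp.MustCompile(`^HYPERFLEET-\d+ - (feat|fix|docs|refactor|test|chore|ci|build|perf): \S.*$`)
+
+// validSubject checks only the first line (subject) of a commit message.
+func validSubject(msg string) bool {
+	subject, _, _ := strings.Cut(msg, "\n")
+	return subjectRE.MatchString(subject)
+}
+```
+
+Only the subject line is checked; the body and `Co-Authored-By` trailer are free-form.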
+
+### Import Ordering
+1. Standard library
+2. External packages (`github.com/google/cel-go`, `github.com/prometheus/client_golang`)
+3. HyperFleet packages (`github.com/openshift-hyperfleet/hyperfleet-broker`, etc.)
+4. Internal packages (`github.com/openshift-hyperfleet/hyperfleet-sentinel/internal/...`)
+
+### Configuration
+- Config lives in `internal/config/config.go` — struct tags for YAML, validation via `Validate()`
+- All durations are `time.Duration` values, written in YAML as Go duration strings (e.g., `5s`, `30m`)
+- Environment variables override YAML only for broker credentials (via the hyperfleet-broker library)
+- Config validation fails fast at startup — never run with invalid config
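+
+A minimal sketch of the fail-fast pattern, assuming a trimmed-down, hypothetical `Config` (the real struct in `internal/config/config.go` has more fields and possibly different names; `poll_interval` in particular is illustrative). The ordering rule `max_age_not_ready <= max_age_ready` is an assumed invariant, not a documented one.
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+	"time"
+)
+
+// Config is a hypothetical, trimmed-down view of the Sentinel config.
+// Durations are written in YAML as Go duration strings such as "5s" or "30m".
+type Config struct {
+	PollInterval   time.Duration `yaml:"poll_interval"`
+	MaxAgeNotReady time.Duration `yaml:"max_age_not_ready"`
+	MaxAgeReady    time.Duration `yaml:"max_age_ready"`
+}
+
+// Validate returns the first problem it finds; the caller exits instead of running.
+func (c Config) Validate() error {
+	if c.PollInterval <= 0 {
+		return errors.New("poll_interval must be positive")
+	}
+	if c.MaxAgeNotReady <= 0 || c.MaxAgeReady <= 0 {
+		return errors.New("max_age_not_ready and max_age_ready must be positive")
+	}
+	// Assumed invariant: unstable resources are revisited at least as often as stable ones.
+	if c.MaxAgeNotReady > c.MaxAgeReady {
+		return fmt.Errorf("max_age_not_ready (%s) must not exceed max_age_ready (%s)",
+			c.MaxAgeNotReady, c.MaxAgeReady)
+	}
+	return nil
+}
+```
+
+At startup, `main` would call `Validate()` once and exit non-zero on error instead of entering the polling loop.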
+
+### Error Handling
+- Errors propagate with context: `fmt.Errorf("failed to poll API: %w", err)`
+- Log errors at the boundary (main service loop), not deep in call stack
+- Use structured logging: `logger.Error("msg", "key", value, "error", err)`
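+
+The sketch below shows the intended split using only the standard library: inner functions wrap with `%w` and return, and a single boundary function logs. `fetchClusters`, `pollOnce`, `runCycle`, and `errUnavailable` are hypothetical stand-ins, and `log/slog` is an assumed logger choice whose key/value call shape matches the convention above.
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+	"log/slog"
+	"os"
+)
+
+// errUnavailable is a hypothetical error returned by the API client.
+var errUnavailable = errors.New("api unavailable")
+
+// fetchClusters is a hypothetical API call that always fails, for illustration.
+func fetchClusters() error { return errUnavailable }
+
+// pollOnce adds context and returns the error without logging it.
+func pollOnce() error {
+	if err := fetchClusters(); err != nil {
+		return fmt.Errorf("failed to poll API: %w", err)
+	}
+	return nil
+}
+
+// runCycle is the boundary (main service loop): the only place the error is logged.
+func runCycle(logger *slog.Logger, resourceType string) error {
+	err := pollOnce()
+	if err != nil {
+		logger.Error("poll cycle failed", "resource_type", resourceType, "error", err)
+	}
+	return err
+}
+
+// newLogger builds a structured JSON logger writing to stderr.
+func newLogger() *slog.Logger {
+	return slog.New(slog.NewJSONHandler(os.Stderr, nil))
+}
+```
+
+Because `%w` preserves the cause, callers can still branch on it with `errors.Is(err, errUnavailable)`.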
+
+### Metrics
+- All metrics defined in `pkg/metrics/metrics.go` — use Prometheus client conventions
+- Standard labels on all metrics: `resource_type`, `resource_selector`
+- Counter: `_total` suffix (e.g., `hyperfleet_sentinel_events_published_total`)
+- Gauge: no suffix (e.g., `hyperfleet_sentinel_pending_resources`)
+- Histogram: `_seconds` suffix (e.g., `hyperfleet_sentinel_poll_duration_seconds`)
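+
+These rules are simple enough to check mechanically, for example in a unit test that walks every registered metric. The helpers below (`followsConvention`, `hasStandardLabels`, the `MetricKind` type, and the `hyperfleet_sentinel_` prefix check inferred from the examples) are hypothetical and not part of `pkg/metrics`.
+
+```go
+package main
+
+import (
+	"slices"
+	"strings"
+)
+
+// MetricKind is a simplified, hypothetical stand-in for the Prometheus metric type.
+type MetricKind int
+
+const (
+	Counter MetricKind = iota
+	Gauge
+	Histogram
+)
+
+// standardLabels are the labels every Sentinel metric is expected to carry.
+var standardLabels = []string{"resource_type", "resource_selector"}
+
+// followsConvention checks the name prefix (inferred from the examples) and suffix rules.
+func followsConvention(name string, kind MetricKind) bool {
+	if !strings.HasPrefix(name, "hyperfleet_sentinel_") {
+		return false
+	}
+	switch kind {
+	case Counter:
+		return strings.HasSuffix(name, "_total")
+	case Histogram:
+		return strings.HasSuffix(name, "_seconds") // duration histograms
+	case Gauge:
+		return !strings.HasSuffix(name, "_total") // _total is reserved for counters
+	default:
+		return false
+	}
+}
+
+// hasStandardLabels reports whether all standard labels are present.
+func hasStandardLabels(labels []string) bool {
+	for _, want := range standardLabels {
+		if !slices.Contains(labels, want) {
+			return false
+		}
+	}
+	return true
+}
+```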
+
+### Testing
+- Unit tests: mock external dependencies (API client, broker), fast, deterministic
+- Integration tests: testcontainers for real RabbitMQ/Pub/Sub, slower, covers end-to-end flows
+- Test file naming: `*_test.go` alongside implementation
+- Integration tests: `test/integration/*_test.go` with build tag `//go:build integration`
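+
+A sketch of how a unit test can replace the broker with an in-memory fake. The `Publisher` interface, `fakePublisher`, `publishAll`, and `errBrokerDown` are hypothetical; the real `hyperfleet-broker` API has its own types. In the repository, the assertions would live in a `*_test.go` file using `testing.T`.
+
+```go
+package main
+
+import (
+	"context"
+	"errors"
+	"fmt"
+)
+
+// Publisher is a hypothetical seam over hyperfleet-broker; the real API differs.
+type Publisher interface {
+	Publish(ctx context.Context, topic string, payload map[string]any) error
+}
+
+// errBrokerDown is a canned failure that tests can inject.
+var errBrokerDown = errors.New("broker unavailable")
+
+// fakePublisher records payloads in memory, or fails if err is set.
+type fakePublisher struct {
+	published []map[string]any
+	err       error
+}
+
+// Compile-time check that the fake satisfies the interface.
+var _ Publisher = (*fakePublisher)(nil)
+
+func (f *fakePublisher) Publish(_ context.Context, _ string, payload map[string]any) error {
+	if f.err != nil {
+		return f.err
+	}
+	f.published = append(f.published, payload)
+	return nil
+}
+
+// publishAll is a stand-in for the Sentinel code under test.
+func publishAll(ctx context.Context, p Publisher, topic string, ids []string) error {
+	for _, id := range ids {
+		if err := p.Publish(ctx, topic, map[string]any{"id": id}); err != nil {
+			return fmt.Errorf("failed to publish %s: %w", id, err)
+		}
+	}
+	return nil
+}
+```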
+
+### CloudEvents Structure
+Events use CEL expressions from `message_data` config to build payloads:
+```yaml
+message_data:
+  id: resource.id            # CEL expressions, not static values
+  kind: resource.kind
+  href: resource.href
+  generation: resource.generation
+```
+
+CEL context includes:
+- `resource` — the cluster/nodepool object from API
+- `reason` — decision string ("not_ready", "ready_stale", "ready_fresh")
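+
+For plain member-access expressions like the ones above (including nested paths), evaluation reduces to walking a dotted path through the CEL context. The Go sketch below does only that: it is **not** a CEL evaluator (no operators, functions, or literals), and `resolvePath`/`buildPayload` are hypothetical helpers meant to show how `message_data` keys map to payload fields.
+
+```go
+package main
+
+import (
+	"fmt"
+	"strings"
+)
+
+// resolvePath walks a dotted path such as "resource.id" through nested maps.
+// It handles plain member access only; it is not a CEL evaluator.
+func resolvePath(env map[string]any, path string) (any, error) {
+	var cur any = env
+	for _, key := range strings.Split(path, ".") {
+		obj, ok := cur.(map[string]any)
+		if !ok {
+			return nil, fmt.Errorf("%s: cannot read field %q of a non-object", path, key)
+		}
+		next, found := obj[key]
+		if !found {
+			return nil, fmt.Errorf("%s: field %q not found", path, key)
+		}
+		cur = next
+	}
+	return cur, nil
+}
+
+// buildPayload evaluates every message_data entry against the context.
+func buildPayload(messageData map[string]string, env map[string]any) (map[string]any, error) {
+	payload := make(map[string]any, len(messageData))
+	for field, expr := range messageData {
+		value, err := resolvePath(env, expr)
+		if err != nil {
+			return nil, err
+		}
+		payload[field] = value
+	}
+	return payload, nil
+}
+```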
+
+## Project Boundaries
+
+**DO NOT**:
+- Modify generated code in `pkg/api/openapi/` — regenerate via `make generate` instead
+- Add dependencies without checking licenses (`go-licenses` reports in CI)
+- Commit broker credentials or GCP service account keys
+- Add business logic to Sentinel — orchestration decisions only, execution belongs in adapters
+- Store state in Sentinel — it is stateless, API is the source of truth
+- Poll faster than API can handle — respect backpressure and rate limits
+
+**DO**:
+- Use `make generate` after any hyperfleet-api spec changes
+- Add tests for new features (unit + integration if broker/API interaction)
+- Update Prometheus metrics when adding observable behaviors
+- Update CHANGELOG.md for user-visible changes
+- Follow the ObjectReference pattern for CloudEvents payloads (id, kind, href)
+- Use broker abstraction (`hyperfleet-broker`) — never import RabbitMQ/Pub/Sub clients directly
+
+## Architecture Context
+
+Sentinel is one component in the HyperFleet control plane:
+- **API** persists cluster/nodepool state (source of truth)
+- **Sentinel** watches API, decides when resources need reconciliation, publishes events
+- **Adapters** consume events, execute provisioning/deprovisioning, report status back to API
+- **Broker** (RabbitMQ or Pub/Sub) decouples Sentinel from adapters
+
+Sentinel's job: **decide when**, not **execute how**. Max age intervals define "when":
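+- `max_age_not_ready`: a short maximum age, so not-ready (unstable) resources trigger reconciliation events frequently
+- `max_age_ready`: a long maximum age, so ready (stable) resources trigger reconciliation events infrequently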
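+
+One plausible reading of these rules as code, mapped onto the three `reason` values listed in the CEL context section. It assumes each resource exposes a ready flag and a last-updated timestamp, and that a `ready_fresh` resource produces no event; `decide` and its parameters are hypothetical, and Sentinel's actual logic may measure age from a different field.
+
+```go
+package main
+
+import "time"
+
+// Reason values, as listed in the CEL context section.
+const (
+	ReasonNotReady   = "not_ready"
+	ReasonReadyStale = "ready_stale"
+	ReasonReadyFresh = "ready_fresh"
+)
+
+// decide returns the reason and whether an event should be published now.
+// Assumption: age is measured from a hypothetical lastUpdated timestamp.
+func decide(ready bool, lastUpdated, now time.Time, maxAgeNotReady, maxAgeReady time.Duration) (string, bool) {
+	age := now.Sub(lastUpdated)
+	if !ready {
+		return ReasonNotReady, age >= maxAgeNotReady
+	}
+	if age >= maxAgeReady {
+		return ReasonReadyStale, true
+	}
+	return ReasonReadyFresh, false
+}
+```
+
+The function only decides; publishing and any reconciliation work stay outside it, matching the "decide when, not execute how" split.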
+
+## Local Development
+
+```bash
+# 1. Start HyperFleet API (see hyperfleet-api repo) and RabbitMQ
+docker run -d -p 5672:5672 -p 15672:15672 rabbitmq:3-management
+
+# 2. Configure (see configs/dev-example.yaml and broker.yaml for templates)
+# 3. Run Sentinel
+./bin/sentinel serve --config config.yaml
+
+# Watch events at http://localhost:15672 (guest/guest)
+```
+
+For detailed local/GKE deployment, see [docs/running-sentinel.md](docs/running-sentinel.md).
+
+## Helm Chart
+
+Chart lives in `charts/` with values for:
+- Multiple Sentinel instances with different `resource_selector` (sharding)
+- Monitoring: PodMonitoring (GKE/GMP) or ServiceMonitor (Prometheus Operator)
+- Broker config via ConfigMap (type, topic) + Secret (credentials)
+
+Example: deploy 2 Sentinels watching different shards:
+```bash
+helm install sentinel-shard-1 ./charts \
+  --set 'config.resourceSelector[0].label=shard' \
+  --set-string 'config.resourceSelector[0].value=1' \
+  --set broker.topic=hyperfleet-prod-clusters
+
+helm install sentinel-shard-2 ./charts \
+  --set 'config.resourceSelector[0].label=shard' \
+  --set-string 'config.resourceSelector[0].value=2' \
+  --set broker.topic=hyperfleet-prod-clusters
+```
+
+Both read from the same API and publish to the same topic, but watch different label-filtered subsets.
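+
+The selector semantics can be sketched as an AND over label/value pairs. Whether Sentinel applies the selector client-side or passes it to the API as a filter is not stated here, and `LabelSelector`/`matches` are hypothetical names.
+
+```go
+package main
+
+// LabelSelector mirrors one resourceSelector entry (label/value) from the Helm values.
+type LabelSelector struct {
+	Label string
+	Value string
+}
+
+// matches reports whether labels satisfy every selector entry (AND semantics, assumed).
+// With no entries, every resource matches.
+func matches(labels map[string]string, selectors []LabelSelector) bool {
+	for _, s := range selectors {
+		if v, ok := labels[s.Label]; !ok || v != s.Value {
+			return false
+		}
+	}
+	return true
+}
+```
+
+With distinct `shard` values, each labeled resource matches exactly one of the two releases above; a resource with no `shard` label matches neither, so it would go unwatched unless another instance covers it.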
+
+## Validation Checklist
+
+Before submitting a PR:
+1. `make generate` — ensure OpenAPI client is current
+2. `make fmt` — format code
+3. `make verify` — vet and format check
+4. `make lint` — pass golangci-lint
+5. `make test` — pass unit tests
+6. `make test-integration` — pass integration tests (if broker/API changes)
+7. `make test-helm` — validate Helm chart
+8. Update CHANGELOG.md for user-visible changes
+9. Add metrics if new observable behavior
+10. Commit message follows `HYPERFLEET-### - type: description` format