012 - Hot reload feature proposal#83

Open
Uzziee wants to merge 5 commits into kroxylicious:main from Uzziee:hot-reload-proposal

Conversation


@Uzziee Uzziee commented Nov 18, 2025

This proposal adds hot reload functionality, enabling the proxy to pick up changes to the virtual cluster configuration without restarting the application.

Signed-off-by: Urjit Patel <105218041+Uzziee@users.noreply.github.com>

@SamBarker SamBarker left a comment


Design PR#83 Feedback - Configuration Reload Design

Date: 2026-01-28
Reviewer: Sam Barker
Design PR: #83

Executive Summary

Thank you for putting together this design proposal! Configuration reload is a critical operational feature that many users have been asking for, and your design work provides a solid foundation for moving this forward.

The current proposal focuses on file watch as the primary mechanism. This feedback suggests an alternative HTTP-first approach with 2-phase validation and discusses enhancements that will make either approach production-ready. The feedback builds on analysis of the POC implementation (PR#3176).

Your POC demonstrates the core reload mechanism works well - the questions here are primarily about the trigger mechanism and operator integration patterns. The groundwork you've laid out makes these decisions much clearer.

Proposed Change to Design: HTTP Endpoints as Primary Interface

Current Design Proposal

The design PR currently proposes file watch as the primary mechanism for configuration reload (Part 1), with potential HTTP endpoints as future work.

Recommended Alternative: HTTP-First Approach

I recommend inverting this: make HTTP endpoints the primary interface, with file watching as an optional convenience layer.

Rationale for HTTP-first:

Universal: Works on bare metal, Kubernetes, and any deployment model
Operator-friendly: Natural integration point for Kubernetes operator (operator detects ConfigMap changes → POST /admin/config/reload)
Testable: Easy to test programmatically (integration tests can POST directly)
Observable: Clear success/failure responses (200 OK vs 400 Bad Request with error details)
Composable: File watching can be implemented as a layer that calls the HTTP endpoint internally
Kubernetes-native: Aligns with how operators interact with workloads (API calls, not filesystem)

File watching challenges:

  • ❌ Read-only filesystem (Kubernetes security best practice blocks file writes)
  • ❌ ConfigMap mounting complexity (..data symlinks, atomic updates)
  • ❌ No feedback mechanism (how does operator know reload succeeded/failed?)
  • ❌ Race conditions (file watch triggers before ConfigMap fully mounted)

Proposed architecture:

Core: HTTP Management Endpoints

Proxy exposes on localhost:9190 (management port):
    ↓
POST /admin/config/validate (validate without applying)
POST /admin/config/reload (apply changes)
GET /admin/config/status (current config version, last operation status)
GET /admin/health (proxy health for liveness/readiness, already exists)
    ↓
Core reload mechanism (shared by all trigger mechanisms)

2-Phase Workflow:

  1. Validate: Build models, initialize filters, check internal consistency (no port binding)
  2. Reload: If validation passes, apply changes (bind ports, register gateways)

Security:

  • Default bind: localhost:9190 (local access only)
  • For Kubernetes: Bind to 0.0.0.0:9190 (pod IP accessible to operator)
  • Authentication: Optional (TLS client certificates, bearer tokens)
  • Recommendations:
    • Bare metal: Keep localhost binding, use local access controls
    • Kubernetes: Use NetworkPolicy to restrict operator→proxy traffic
    • Production: Consider mTLS for operator↔proxy communication

Trigger Mechanisms (How to Call HTTP Endpoints)

Option 1: Direct HTTP (Kubernetes Operator)

Operator detects ConfigMap change
    ↓
POST /admin/config/validate to management Service
    ↓
POST /admin/config/reload to all pod IPs

✅ Native Kubernetes integration
✅ Immediate feedback via HTTP responses
✅ No filesystem coupling

Option 2: File Watcher (Bare Metal)

Sidecar process watches config file
    ↓
On file change → POST to localhost:9190/admin/config/validate
    ↓
If valid → POST to localhost:9190/admin/config/reload

Sidecar options:

  • Shell script: Simple inotifywait wrapper
    # -m keeps inotifywait running across events; --fail makes curl exit non-zero on HTTP 4xx/5xx
    inotifywait -m -e modify /etc/kroxylicious/config.yaml | while read -r _; do
      if curl --fail -X POST http://localhost:9190/admin/config/validate --data-binary @/etc/kroxylicious/config.yaml; then
        curl --fail -X POST http://localhost:9190/admin/config/reload --data-binary @/etc/kroxylicious/config.yaml
      fi
    done
  • Go binary: More robust error handling, retry logic
  • In-process Java: WatchService (if proxy can write to filesystem for persistence)

✅ Familiar workflow for bare metal users
✅ Decoupled from proxy (sidecar can be restarted independently)
✅ Uses same HTTP endpoints as Kubernetes
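The in-process Java option above can be sketched with the JDK's WatchService and HttpClient. This is illustrative only: it assumes the management port and endpoints from this proposal (localhost:9190, /admin/config/validate, /admin/config/reload), and the ConfigWatcher class and buildRequest helper are hypothetical names, not an existing API.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.*;

public class ConfigWatcher {

    // Build the POST for a given endpoint; the body is the current file content.
    static HttpRequest buildRequest(String endpoint, Path configFile) throws IOException {
        return HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9190" + endpoint))
                .header("Content-Type", "application/yaml")
                .POST(HttpRequest.BodyPublishers.ofString(Files.readString(configFile)))
                .build();
    }

    public static void watch(Path configFile) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            // WatchService watches directories, not individual files.
            configFile.getParent().register(watcher, StandardWatchEventKinds.ENTRY_MODIFY);
            while (true) {
                WatchKey key = watcher.take();
                for (WatchEvent<?> event : key.pollEvents()) {
                    if (configFile.getFileName().equals(event.context())) {
                        // Validate first; only reload if validation succeeded.
                        HttpResponse<String> v = client.send(
                                buildRequest("/admin/config/validate", configFile),
                                HttpResponse.BodyHandlers.ofString());
                        if (v.statusCode() == 200) {
                            client.send(buildRequest("/admin/config/reload", configFile),
                                    HttpResponse.BodyHandlers.ofString());
                        }
                    }
                }
                key.reset();
            }
        }
    }
}
```

Note this mirrors the shell sidecar: same two-step validate-then-reload flow, same endpoints.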

This means:

  • HTTP endpoints are the primitive (required)
  • File watching is optional convenience (can be added later)
  • Both deployment models use same tested, validated endpoints
  • Validation catches config errors before any cluster goes down

Note: This is a significant change from the current design proposal, which focuses on file watch without a validation phase. If the community prefers file watch as the primary mechanism, we should address the challenges listed above (read-only filesystem, feedback mechanisms, etc.) in the design.

Cluster Modification Semantics

The design's remove→add pattern is architecturally necessary:

The proxy's channel state machine has a fundamental constraint: each frontend channel (client→proxy) has a 1:1 relationship with a backend channel (proxy→broker). There's no mechanism to redirect an existing backend connection without closing the frontend connection.

This means:

  • Any cluster modification requires draining connections (1-30 seconds downtime per cluster)
  • "Atomic swap" approaches don't eliminate downtime—they would require hot-swapping filters in the Netty pipeline, which introduces filter state management complexity
  • The remove→add pattern is the correct architectural choice, not a limitation to be overcome

Implication for design: Document that cluster modifications incur brief downtime (1-30s) and this is by design, not a quality issue.

Rollback Strategy (Needs Discussion)

Current POC behavior: Rollback ALL clusters on ANY failure (all-or-nothing semantics)

This is a critical design decision that requires community consensus. The choice affects operational complexity, user experience, and downtime characteristics. See "Questions for Design Discussion" below for detailed analysis of trade-offs.

Key question: When cluster-a succeeds but cluster-b fails, should we:

  • Option A: Rollback cluster-a (all-or-nothing) → simpler operations, more downtime
  • Option B: Keep cluster-a on new config (partial success) → less downtime, more complexity

Recommendation for design: Dedicate a section to this decision, present both options fairly, and explicitly request community feedback before proceeding.

Core Design: HTTP Endpoints with 2-Phase Commit

Validation Endpoint (Core Component)

API:

POST /admin/config/validate
Content-Type: application/yaml

{new configuration YAML}

Response (200 OK):
{
  "valid": true,
  "configVersion": "a3f5b2c19e4d"  // SHA-256 hash of config
}

Response (400 Bad Request):
{
  "valid": false,
  "errors": [
    "Filter 'record-encryption' initialization failed: KMS URL required",
    "Port conflict: 9293 used by cluster-a and cluster-b"
  ]
}

What it validates:

  • ✅ YAML syntax and structure
  • ✅ Filter types exist (registered via SPI)
  • ✅ FilterFactory.initialize() succeeds (filter config valid)
  • ✅ Port ranges internally consistent (no duplicate ports in config)

What it doesn't validate (runtime concerns):

  • ❌ Ports actually available on the OS (might be in use)
  • ❌ External dependencies reachable (KMS might be down during reload)
  • ❌ Upstream Kafka cluster healthy

Why this split is acceptable:

Validation is about catching configuration errors (syntax, invalid filter config). Runtime failures (port conflicts, KMS down at reload time) are handled by rollback. We can't guarantee "config valid at 10:00am" means "will succeed at 10:02am" for external dependencies.

Implementation note: Validation should build models and initialize filters without binding ports or registering gateways. This makes validation:

  • Fast (no network operations)
  • Deterministic (same result on all pods)
  • Resource-light (no double-memory usage)
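To make the "internally consistent" check concrete, here is a minimal sketch of the duplicate-port validation in plain Java. The flat cluster→port map and the PortConsistencyCheck class name are simplifications for illustration; a real implementation would walk the parsed virtual cluster model.

```java
import java.util.*;

public class PortConsistencyCheck {

    // Detect two virtual clusters declaring the same listener port,
    // without touching the network (no bind, no external calls).
    static List<String> findPortConflicts(Map<String, Integer> clusterPorts) {
        // Group cluster names by port; TreeMap gives deterministic error ordering.
        Map<Integer, List<String>> byPort = new TreeMap<>();
        clusterPorts.forEach((cluster, port) ->
                byPort.computeIfAbsent(port, p -> new ArrayList<>()).add(cluster));

        List<String> errors = new ArrayList<>();
        byPort.forEach((port, clusters) -> {
            if (clusters.size() > 1) {
                Collections.sort(clusters);
                errors.add("Port conflict: " + port + " used by " + String.join(" and ", clusters));
            }
        });
        return errors;
    }
}
```

Because this is pure computation over the parsed config, it has the properties listed above: fast, deterministic across pods, and resource-light.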

Reload Endpoint (Core Component)

API:

POST /admin/config/reload
Content-Type: application/yaml

{new configuration YAML}

Response (200 OK):
{
  "success": true,
  "configVersion": "a3f5b2c19e4d",
  "clustersModified": ["cluster-a", "cluster-b"]
}

Response (500 Internal Server Error):
{
  "success": false,
  "error": "Failed to modify cluster-b: filter initialization failed",
  "configVersion": "abc123"  // Rolled back to previous version
}

What it does:

  1. Applies configuration changes (remove→add clusters as needed)
  2. If any operation fails → rollback all changes
  3. Returns success/failure with current config version

Configuration Options

Management endpoint binding:

# proxy-config.yaml
admin:
  host: "localhost"  # Default: localhost only (bare metal)
  # host: "0.0.0.0"  # Kubernetes: bind to pod IP
  port: 9190
  tls:  # Optional: mTLS for operator communication
    keyStore: /path/to/keystore.jks
    trustStore: /path/to/truststore.jks

Benefits of this architecture:

  • Catches 90% of errors before any cluster goes down (validation phase)
  • Clear error messages before disruption
  • Same HTTP endpoints for Kubernetes and bare metal
  • File watching is optional, can be added as sidecar later
  • Security: localhost by default, configurable for Kubernetes

Kubernetes Integration Patterns

Management Service

Problem: Operator creates Services for Kafka traffic (ports 9292+) but not for the management port (9190).

Proposed: Create dedicated management Service for operator↔proxy communication:

apiVersion: v1
kind: Service
metadata:
  name: my-proxy-management
spec:
  type: ClusterIP  # Internal only
  selector:
    app.kubernetes.io/instance: minimal
    app.kubernetes.io/component: proxy
  ports:
  - name: management
    port: 9190
    targetPort: 9190

Benefits:

  • ✅ Automatic pod readiness handling (the Service only routes to ready pods; requests fail fast if none are ready)
  • ✅ Stable DNS endpoint (my-proxy-management.ns.svc.cluster.local)
  • ✅ Survives pod restarts/rescheduling
  • ✅ Follows Kubernetes best practices (Services for stable endpoints)

Usage:

  • Validation: POST http://my-proxy-management:9190/admin/config/validate (one pod via Service)
  • Reload: Iterate over pods, POST directly to pod IPs (all pods must succeed)

Recommendation: Add management Service pattern to Kubernetes deployment section of design.

Read-Only Filesystem Support

Problem: Kubernetes deployments use securityContext.readOnlyRootFilesystem: true as security best practice. Current design persists config to disk after successful reload, which fails with read-only filesystem.

Proposed: Make config file persistence optional:

Deployment models:

  • Bare metal: Config file on disk, persist on successful reload
  • Kubernetes: Config in ConfigMap (operator-managed), no disk persistence

Recommendation: Document read-only filesystem support as a requirement for Kubernetes deployments.

Checksum-Based Change Detection

Problem: Operator needs to detect "config actually changed" vs "CRD reconciliation loop with no real change."

Proposed: Store SHA-256 hash of config YAML in KafkaProxy annotation:

apiVersion: kroxylicious.io/v1alpha1
kind: KafkaProxy
metadata:
  name: minimal
  annotations:
    kroxylicious.io/config-checksum: "a3f5b2c19e4d"  # SHA-256 hash
spec:
  # ... config ...

Operator logic:

String newChecksum = sha256(generateYaml(kafkaProxy));
// Annotations may be null on a freshly created resource
Map<String, String> annotations = kafkaProxy.getMetadata().getAnnotations();
String oldChecksum = annotations == null ? null : annotations.get("kroxylicious.io/config-checksum");

if (newChecksum.equals(oldChecksum)) {
    LOGGER.debug("Config unchanged, skipping reload");
    return;  // No-op, avoid unnecessary reload
}

// Config changed, trigger 2-phase reload
ValidationResult validation = validateViaManagementService(yaml);
if (validation.valid()) {
    reloadAllPods(yaml);
    if (annotations == null) {
        annotations = new HashMap<>();
        kafkaProxy.getMetadata().setAnnotations(annotations);
    }
    annotations.put("kroxylicious.io/config-checksum", newChecksum);
}
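The operator snippet above assumes an undefined sha256 helper; a minimal sketch using only the JDK (HexFormat requires Java 17+). Note the hash is computed over the rendered YAML text, so purely cosmetic formatting changes also count as "changed".

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

public class ConfigChecksum {

    // Hash the rendered config YAML to get a deterministic config version.
    static String sha256(String configYaml) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(configYaml.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(hash); // 64 lowercase hex chars
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 support is mandated by the Java security spec
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```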

Benefits:

  • ✅ Automatic no-op detection (reconciliation loop doesn't trigger unnecessary reloads)
  • ✅ Rollback detection (reverting config doesn't reload if already at that state)
  • ✅ O(1) comparison vs deep config diff

Recommendation: Add checksum-based change detection to operator integration section.

Additional Design Components

Configurable Drain Timeout

Problem: Hard-coded 30-second drain timeout is too short for Kafka consumers with long poll timeouts (default 5 minutes).

Proposed:

# proxy-config.yaml
admin:
  drainTimeoutSeconds: 300  # 5 minutes for graceful connection drain

Trade-off: Longer timeouts mean longer reload times, but fewer disrupted clients.

Recommendation: Add configurable drain timeout to design.

Observability and Status Reporting

Configuration Status Endpoint:

Separate configuration status from health checks (health is for liveness/readiness):

GET /admin/config/status
{
  "currentConfigVersion": "sha256:a3f5b2c19e4d...",
  "appliedAt": "2026-01-28T10:15:30Z",
  "lastReloadAttempt": {
    "timestamp": "2026-01-28T10:15:30Z",
    "status": "SUCCESS",
    "requestedVersion": "sha256:a3f5b2c19e4d...",
    "durationMs": 1234,
    "clustersModified": ["cluster-a"]
  },
  "lastValidationAttempt": {
    "timestamp": "2026-01-28T10:15:25Z",
    "status": "SUCCESS",
    "requestedVersion": "sha256:a3f5b2c19e4d..."
  }
}

// After reload failure with rollback failure:
{
  "currentConfigVersion": "sha256:abc123...",  // Previous version still running
  "appliedAt": "2026-01-28T09:00:00Z",
  "lastReloadAttempt": {
    "timestamp": "2026-01-28T10:20:00Z",
    "status": "ROLLBACK_PARTIAL_FAILURE",
    "requestedVersion": "sha256:newversion...",
    "rollbackState": {
      "successful": ["cluster-a"],
      "failed": {
        "cluster-b": "Failed to re-register gateway: port 9293 in use"
      }
    }
  }
}

Health endpoint stays focused on proxy health:

GET /admin/health
{
  "status": "UP",
  "checks": {
    "netty": "UP",
    "virtualClusters": "UP"
  }
}

Benefit: Clean separation - operators query /admin/config/status for reload state, /admin/health for liveness/readiness.

Recommendation: Add dedicated config status endpoint to design.

Metrics:

kroxylicious_config_reload_total{result="success|failure"} counter
kroxylicious_config_reload_duration_seconds histogram
kroxylicious_config_version_info{version="a3f5b2c19e4d"} gauge

Use cases:

  • Alerting on reload failures
  • Tracking reload duration trends
  • Capacity planning (reload frequency)

Recommendation: Add metrics to observability section.

Error Handling and Recovery

Rollback Failure Handling:

Current design: Log "CRITICAL: system may be in inconsistent state"

Proposed: Track rollback state and expose via health endpoint (see above).

Recovery path:

  1. Query /admin/health to see which clusters failed rollback
  2. Manual intervention:
    • Verify cluster state (is port bound? filter initialized?)
    • Either retry reload or manually fix state
  3. Operator automation (future):
    • Detect rollback failure from health endpoint
    • Attempt recovery (remove failed cluster, re-add from old config)

Recommendation: Document rollback failure recovery procedures.

Concurrent Reload Prevention:

  • Only one reload at a time (enforced via lock)
  • Concurrent requests fail fast with 409 Conflict
POST /admin/config/reload
{new config}

Response (409 Conflict):
{
  "error": "Reload already in progress",
  "inProgressSince": "2026-01-28T10:15:30Z"
}

Recommendation: Document concurrency model in API specification.
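One way to sketch the "one reload at a time" guard: a compare-and-set on a timestamp, so a losing caller can be answered with 409 Conflict carrying the inProgressSince value from the response above. The ReloadGuard class is a hypothetical name, not an existing API.

```java
import java.time.Instant;
import java.util.concurrent.atomic.AtomicReference;

public class ReloadGuard {

    // null means no reload in progress; non-null holds the start time.
    private final AtomicReference<Instant> inProgressSince = new AtomicReference<>();

    /** Returns true if this caller acquired the single reload slot. */
    public boolean tryAcquire() {
        return inProgressSince.compareAndSet(null, Instant.now());
    }

    /** Exposed in the 409 body as "inProgressSince". */
    public Instant inProgressSince() {
        return inProgressSince.get();
    }

    /** Called when the reload finishes, whether it succeeded or rolled back. */
    public void release() {
        inProgressSince.set(null);
    }
}
```

The endpoint handler would call tryAcquire(), return 409 with inProgressSince() on failure, and release() in a finally block.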

Design Document Structure

Suggest organizing the design document as follows. Note: This structure assumes the HTTP-first approach described above. If the community prefers the file watch approach, the structure would need to adjust accordingly (swap "HTTP Endpoints" with "File Watch" as primary, etc.).

1. Goals and Non-Goals

Goals:

  • Zero-restart configuration updates
  • Universal deployment model (bare metal, Kubernetes)
  • Operator-friendly integration
  • Clear error handling and rollback

Non-Goals:

  • Zero-downtime modification (brief downtime per cluster is acceptable)
  • Hot-swapping filters in active connections
  • Partial success / continue-on-failure

2. Architecture

2.1 Core: HTTP Management Endpoints

Required endpoints:

  • POST /admin/config/validate - Validate config without applying
  • POST /admin/config/reload - Apply validated config
  • GET /admin/config/status - Current config version, last operation status
  • GET /admin/health - Proxy health (liveness/readiness)

Security:

  • Default bind: localhost:9190 (bare metal)
  • Kubernetes bind: 0.0.0.0:9190 (pod IP)
  • Optional TLS/mTLS for authentication
  • NetworkPolicy to restrict access in Kubernetes

2.2 Trigger Mechanisms (Optional)

Direct HTTP (Kubernetes):

  • Operator calls endpoints directly
  • No file watching needed

File Watcher Sidecar (Bare Metal):

  • Separate process watches config file
  • Calls HTTP endpoints on change
  • Options: shell script, Go binary, Java WatchService
  • Decoupled from proxy process

2.3 Reload Mechanism

  • Remove→add pattern (architecturally necessary)
  • Sequential processing (simplicity > parallelism)
  • All-or-nothing rollback (operational simplicity - needs discussion)

2.4 Validation Strategy

  • Build models + initialize filters without port binding
  • Deterministic (same result on all pods)
  • Catches config errors, not runtime failures

3. Deployment Patterns

3.1 Bare Metal

  • HTTP endpoints on localhost:9190

  • Config file on disk (optional)

  • Persist config to disk on success (if writable filesystem)

3.2 Kubernetes with Operator

  • HTTP endpoints on 0.0.0.0:9190
  • Config in ConfigMap (operator-managed)
  • Management Service for validation (exposes port 9190)
  • Checksum-based change detection (avoid no-op reloads)
  • 2-phase commit (validate via Service → reload all pods)
  • Read-only filesystem support (no disk persistence)
    • Sidecar file watcher (optional) → calls HTTP endpoints

4. Failure Modes and Recovery

  • Filter initialization failure → rollback
  • Port binding failure → rollback
  • Rollback failure → tracked state, manual recovery
  • Concurrent reload → fail fast with 409

5. Observability

  • Logging throughout reload process
  • Metrics for reload operations

6. Future Enhancements

  • Granular endpoints (/reload/cluster/{name})
  • Canary rollout strategies
  • Blue-green at pod level (operator)

Questions for Design Discussion

  1. Should FilterFactory.initialize() be documented as validation-safe?

    • Must be idempotent (can be called multiple times)?
    • Should avoid side effects (don't connect to external services)?
    • Or allow filter authors to decide (validation calls real KMS if they want)?
  2. Rollback Strategy: All-or-Nothing vs Partial Success (Critical Design Decision)

    This requires community consensus before proceeding.

    Scenario: Config change affects cluster-a, cluster-b, cluster-c

    • cluster-a: modify succeeds ✅ (downtime: 2s)
    • cluster-b: modify fails ❌ (downtime: 30s)
    • cluster-c: modify succeeds ✅ (downtime: 2s)

    Option A: All-or-Nothing (Current POC)

    Result: Rollback cluster-a and cluster-c
    Final state: All clusters on OLD config
    Total downtime: cluster-a (4s), cluster-b (30s), cluster-c (4s)
    

    Pros:

    • ✅ Single source of truth (config file intent OR previous state, never mixed)
    • ✅ Predictable retry path (fix issue → retry → all move together)
    • ✅ No configuration drift (never "cluster-a on v2, cluster-b on v1")
    • ✅ Simple status model (one config version for entire proxy)
    • ✅ Follows declarative configuration philosophy (Kubernetes/GitOps)

    Cons:

    • ❌ Unnecessary downtime for successful clusters during rollback
    • ❌ Wastes successful work (cluster-a, cluster-c succeeded but rolled back)

    Option B: Partial Success / Continue-on-Failure

    Result: Keep cluster-a and cluster-c on new config
    Final state: cluster-a (NEW), cluster-b (OLD), cluster-c (NEW)
    Total downtime: cluster-a (2s), cluster-b (30s), cluster-c (2s)
    

    Pros:

    • ✅ Less total downtime (no rollback for successful clusters)
    • ✅ Preserves successful work

    Cons:

    • ❌ Configuration drift (reality doesn't match declared intent)
    • ❌ Complex status model (per-cluster versions: {a: "v2", b: "v1", c: "v2"})
    • ❌ Unclear retry path (should cluster-a reload again? How does operator know?)
    • ❌ Reconciliation complexity (which clusters already on target version?)
    • ❌ Requires granular reload endpoints (/reload/cluster/{name})
    • ❌ Confusing user experience ("Reload failed" but some clusters succeeded?)

    Operational Comparison:

    Aspect                       | All-or-Nothing                   | Partial Success
    -----------------------------|----------------------------------|------------------------------
    Source of truth              | Config OR previous state (clear) | Mixed state (confusing)
    Retry after fixing cluster-b | Simple (reload all)              | Complex (skip a,c or reload?)
    Status API                   | One version                      | Per-cluster versions
    Downtime on failure          | Higher (rollback)                | Lower (no rollback)
    Operator logic               | Simple                           | Complex reconciliation
    User understanding           | Clear                            | Confusing

    User Experience Example:

    All-or-Nothing:

    $ kubectl apply -f new-config.yaml
    Error: Config reload failed on cluster-b (filter init error)
    Status: All clusters on version abc123 (previous config)
    Action: Fix cluster-b config, retry apply
    

    Partial Success:

    $ kubectl apply -f new-config.yaml
    Error: Config reload failed on cluster-b (filter init error)
    Status: cluster-a (def456), cluster-b (abc123), cluster-c (def456)
    Question: Should I retry? Will cluster-a reload again?
    

    Questions for the community:

    • Which operational model do users prefer?
    • Is configuration drift acceptable as a trade-off for less downtime?
    • Should this be configurable, or should we pick one approach?
    • If configurable:
      admin:
        rollbackStrategy: ALL  # Default? Or FAILED_ONLY?
    • Do we need granular reload endpoints regardless of rollback strategy?

    Claude's recommendation: Start with all-or-nothing (simpler, matches the declarative config philosophy), gather operational feedback, and add partial success later if users request it. But this needs community buy-in, not just a maintainer decision.

  3. Should we define granular reload endpoints now or defer?

    • POST /admin/config/reload (full config, current)
    • POST /admin/config/reload/cluster/{name} (single cluster, future?)
  4. What should config version format be?

    • SHA-256 hash (deterministic, no clock dependency)
    • Timestamp-based (easier for humans to understand)
    • Operator-provided (e.g., ConfigMap resourceVersion)

Summary

The configuration reload design addresses a critical operational need. This feedback proposes HTTP endpoints with 2-phase commit (validate → reload) as the primary interface (alternative to the current file watch proposal) for the following reasons:

Why HTTP-first with validation:

  • Better Kubernetes integration (operator-friendly, read-only filesystem compatible)
  • Clear observability (HTTP responses vs file watch with no feedback)
  • Testability (programmatic testing vs file system manipulation)
  • Validation catches config errors before any cluster goes down
  • File watching can still be supported as a convenience layer that calls HTTP internally

Core components proposed:

  1. POST /admin/config/validate - Validates config without applying (deterministic, fast)
  2. POST /admin/config/reload - Applies validated config (with rollback on failure)
  3. Management Service - Kubernetes Service exposing port 9190 for operator access
  4. Checksum-based change detection - Avoid unnecessary reloads on no-op reconciliation
  5. Read-only filesystem support - Make disk persistence optional for Kubernetes

Key takeaway: The architectural constraints (channel state machine, draining requirement) mean the design correctly accepts brief downtime per cluster modification. This is not a limitation—it's the right trade-off for operational simplicity and safety.

Recommended next steps:

  1. Discuss HTTP vs file watch as primary mechanism - This is a fundamental design choice that needs community input
  2. Discuss rollback strategy - All-or-nothing vs partial success requires consensus
  3. Add validation endpoint and 2-phase commit to design
  4. Add Kubernetes integration patterns (management Service, checksum-based change detection)
  5. Document failure modes and recovery procedures
  6. Refine POC implementation (PR#3176) based on finalized design

Excellent work on the POC—it provides a solid foundation for whichever trigger mechanism the community prefers!

Signed-off-by: Urjit Patel <105218041+Uzziee@users.noreply.github.com>

Uzziee commented Feb 2, 2026

Hi @gunnarmorling @SamBarker @tombentley regarding the security risk for HTTP over TCP, we can either

  1. Expose the endpoint only on localhost (this would immediately give the required security, as one would have to exec into the pods and then run the HTTP command)
  2. Replace HTTP over TCP with HTTP over a UNIX socket.

I believe option #1 would satisfy our needs without the additional code complexity of a UNIX-socket-based approach. WDYT?


SamBarker commented Feb 2, 2026

Pulling some conversation from slack for posterity:

My initial reaction would be that config should only happen in exactly one way, which is files. A new config would also be provided as a file. An HTTP endpoint should be there for triggering the validation and eventually application of changed config, but which itself would be read from a file.

Inline updates to files are suboptimal in case of validation failures, because then you have again that mismatch of the state of the config file (already changed) and what actually is applied by the proxy (not changed).

How about the following:

  • Config is provided in a file (as it is today)
  • When changing config, create a new file with those changes, e.g. by copying and mutating that copy as needed (on K8s, this could be a new config map)
  • Have HTTP endpoints for a) validating a config file (by specifying its location in the file system) and b) applying a config file (by specifying its location in the file system)

That way, config is always in files (so it can be in source control, etc.), you can always easily examine the current state of config. The operator could support this by alternating through two config maps, one with currently applied config, and one with changed config, staged for application.

by @gunnarmorling

Yeah, I hear the point about two methods of providing config.

I'd mentally turned the issues with file watches & the need for read-only file systems into a requirement for HTTP upload, which is not really true.

How does this sound:

  1. HTTP POST to /config/validate/file for validation - served by any active process. Valid files are made available on the filesystem
  2. HTTP PUT / PATCH to /config/file/, the config path which the proxy then reloads from (could be the same path for k8s); this step triggers the actual incremental reload
  3. HTTP GET on /config/[status|version|info] to serve details about what's currently running in the proxy?

by @SamBarker


SamBarker commented Feb 2, 2026

Hi @gunnarmorling @SamBarker @tombentley regarding the security risk for HTTP over TCP, we can either

  1. Expose the endpoint only on localhost (this would immediately give the required security, as one would have to exec into the pods and then run the HTTP command)
  2. Replace HTTP over TCP with HTTP over a UNIX socket.

I believe option #1 would satisfy our needs without the additional code complexity of a UNIX-socket-based approach. WDYT?

We can already support option 1 using the bindAddress config property.

Hopefully that's good enough to get started with.


Uzziee commented Feb 5, 2026

Hi @SamBarker given that we are still discussing how to trigger the reload, can we get alignment on the actual graceful restart of the virtual clusters? (Part 2 of the design)
If Part 2 is approved, I can submit a PR for it (graceful restart of clusters) while we finalize Part 1 (triggering the hot reload).

@SamBarker

Sorry @Uzziee, I've been too slow getting back to this.

In principle, yes. I think we need a little more thought to work out the interface between the trigger and the actual reloading code.

One other thought: how does the runtime know that a plugin's configuration has changed? Take the ACL authz plugin that has a rules file: the content of that file might have changed even though the path to it is the same. The runtime doesn't, and shouldn't, understand plugin configuration, so we should consider some way of asking plugins to detect config changes as well.

@SamBarker

Thanks for sticking with this @Uzziee, the proposal has come a long way and there's a lot of good thinking here. I've been noodling on a few areas and wanted to share where my head is at. Happy to discuss any of this further.

Decoupling trigger from apply

I think it would help to draw a clearer line between the trigger mechanism (the HTTP endpoint) and the bit that actually applies a new configuration to the running proxy. Something like a ProxyControl interface with an applyConfiguration(Configuration) method — the HTTP endpoint deals with parsing and validation, then hands the Configuration off to ProxyControl to do the actual work. That way if we later want to trigger from a file watcher or an operator callback, there's an obvious place to plug in. It would also make the apply logic easier to test in isolation.
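As a rough sketch of that seam (all names here are illustrative, not an existing Kroxylicious API):

```java
public class ProxyControlSketch {

    /** Placeholder for the parsed, validated proxy configuration. */
    record Configuration(String version) {}

    /** The trigger-agnostic apply interface: HTTP endpoint, file watcher,
     *  or operator callback all hand off through this one method. */
    interface ProxyControl {
        void applyConfiguration(Configuration configuration);
    }

    /** The trigger side only parses/validates, then delegates the actual work. */
    static void onReloadRequest(String yaml, ProxyControl control) {
        // Real code would parse and validate the YAML here; the version
        // derivation below is a stand-in.
        Configuration parsed = new Configuration(Integer.toHexString(yaml.hashCode()));
        control.applyConfiguration(parsed);
    }
}
```

Because ProxyControl is a plain interface, the apply logic can be unit-tested with a recording stub and no HTTP server at all.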

Plugin resource tracking

This is the bit I've been chewing on most. The runtime can spot when a filter's YAML config blob changes (via equals()), but it has no way of knowing when external files that a plugin reads during initialize() change — things like password files, TLS keystores, ACL rules. Those reads tend to happen deep in nested plugin stacks (e.g. RecordEncryption → KmsService → CredentialProvider → FilePassword) so the runtime has no visibility.

One approach that seems promising: add a readResource(URI) method to FilterFactoryContext. Instead of plugins doing direct file I/O, they'd read through this method. The runtime would read the content, hash it, track the dependency, and return the content — all in one go. On a subsequent reload check, the runtime can re-read and re-hash the tracked URIs to see if anything changed.

Some of the thinking behind this:

  • Thread-local context access: We could provide a static FilterFactoryContext.current() method (similar to how Vert.x handles context) so that code deep in the call stack — like FilePassword — can access the context without us having to thread it through every intermediate SPI. The single-thread guarantee on initialize() makes this safe. I'm happy to put together a PR for this part myself.

  • URI-based with pluggable resolvers: I'm inclined toward taking a URI rather than a Path. Files are the common case today, but there are plausible near-term use cases for reading resources over HTTP (e.g. fetching schemas or credentials from a remote endpoint). If the API takes a URI, we can ship a file:// resolver by default and add other scheme resolvers (e.g. https://) later via ServiceLoader — without changing the plugin-facing API. The resolver itself would be a simple interface (scheme() + read(URI)) so adding a new scheme doesn't require changes to the runtime, just a new implementation on the classpath.

  • Returns content, not typed objects: I think readResource should return InputStream/String rather than trying to deserialize into typed objects. The runtime's concern is tracking dependencies — the plugin knows what the content means.

  • Throws outside initialize(): I'm inclined to have FilterFactoryContext.current() throw IllegalStateException if called outside of factory initialization, rather than returning null or a stub. Reading resources outside initialize() would create untracked dependencies, and I'd rather that be a loud failure than a silent one.

  • Consistent change detection: Because the runtime reads the content and computes the hash in the same operation that provides the bytes to the plugin, the hash always matches what the plugin actually received. There's no gap between checking for changes and reading the new content.

None of this is set in stone — I'm keen to hear what others think, especially around the FilterFactoryContext.current() approach.
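A rough sketch of what the runtime side of readResource could look like, under the assumptions above. ResourceResolver, FileResourceResolver, and ResourceTracker are all hypothetical names; the real FilterFactoryContext would delegate to something like this:

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical resolver SPI: one implementation per URI scheme, discovered
// via ServiceLoader in the real thing.
interface ResourceResolver {
    String scheme();
    byte[] read(URI uri) throws IOException;
}

// Default file:// resolver.
class FileResourceResolver implements ResourceResolver {
    public String scheme() { return "file"; }
    public byte[] read(URI uri) throws IOException {
        return Files.readAllBytes(Path.of(uri));
    }
}

// Read, hash, and record the dependency in one operation, so the recorded
// hash always matches the bytes the plugin actually received.
class ResourceTracker {
    private final Map<String, ResourceResolver> resolvers = new ConcurrentHashMap<>();
    private final Map<URI, String> observedHashes = new ConcurrentHashMap<>();

    ResourceTracker(ResourceResolver... all) {
        for (ResourceResolver r : all) {
            resolvers.put(r.scheme(), r);
        }
    }

    byte[] readResource(URI uri) throws IOException {
        ResourceResolver resolver = resolvers.get(uri.getScheme());
        if (resolver == null) {
            throw new IOException("No resolver for scheme: " + uri.getScheme());
        }
        byte[] content = resolver.read(uri);
        observedHashes.put(uri, sha256(content));
        return content;
    }

    // On a reload check, re-read and re-hash every tracked URI.
    boolean anyChanged() throws IOException {
        for (Map.Entry<URI, String> e : observedHashes.entrySet()) {
            byte[] current = resolvers.get(e.getKey().getScheme()).read(e.getKey());
            if (!sha256(current).equals(e.getValue())) {
                return true;
            }
        }
        return false;
    }

    private static String sha256(byte[] content) {
        try {
            return HexFormat.of().formatHex(MessageDigest.getInstance("SHA-256").digest(content));
        }
        catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Adding an https:// resolver later is then just another ResourceResolver implementation on the classpath; neither the tracker nor the plugin-facing API changes.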

Minimising disruption

For now I think restarting a modified cluster by tearing it down and rebuilding it (remove + add) is the right starting point — dropping connections is unavoidable. It is worth calling this out explicitly in the proposal so it's clear this is a known trade-off rather than something we've overlooked. More surgical reloads (swapping just the filter chain without dropping connections or routing changes) could be interesting to explore later but I wouldn't want to block on that.

Similarly, I think we should call out the all-or-nothing rollback semantics as a deliberate choice. Even with a clean internal separation, failures during apply (port conflicts, TLS errors at bind time) can still happen, so we'll need a rollback strategy regardless.

One small thing — Should the 30-second drain timeout be configurable? Long-running consumer rebalances or slow produces with acks=all can legitimately exceed that.


Design exploration notes (for context, not part of the main proposal)

These are ideas we explored and set aside during discussion. Recording them here so we don't retread the same ground.

ResourceDependency (plugin-declared change detection): An alternative to readResource where plugins declare dependencies with an opaque version token (Object currentVersion()) and the runtime compares tokens between checks. More general (works for any resource type) but the runtime's version check and the plugin's re-read during initialize() are independent operations that can see different versions of the resource. Also relies on plugin authors remembering to declare dependencies. We leaned toward readResource for the common case since it's harder to accidentally miss a dependency.

Returning null from FilterFactoryContext.current(): We considered returning null when called outside initialize(), with a fallback to direct I/O. The worry was that silently succeeding means untracked dependencies. Also considered a no-op stub but that has the same problem. Throwing seemed like the clearest contract.

Typed readResource return (Jackson deserialization): We considered readResource(URI, Class<T>) to deserialize into typed objects, but the runtime would then need to know serialization formats (JSON? YAML? properties?). The current resources (passwords, keystores) are simple enough that raw bytes/string feels like the right level.

Plan/apply split on the public interface: We considered exposing plan() and apply() separately on ProxyControl to enable dry-run validation. Decided this is an internal concern — the trigger just needs applyConfiguration(). A validate/dry-run endpoint could be added later without changing the interface.

Kafka topic as config source: We discussed storing configuration in a dedicated Kafka topic. This feels more like a trigger mechanism than a resource dependency. Too early to design for — the URI-based readResource doesn't need to accommodate it.

ConfigurationReconciler naming: We considered this to describe the "compare desired vs current and converge" pattern, but there are actual Kubernetes reconcilers in the source tree and overloading the term seemed likely to cause confusion.

@Uzziee
Author

Uzziee commented Feb 13, 2026

Hey @SamBarker

I think it would help to draw a clearer line between the trigger mechanism (the HTTP endpoint) and the bit that actually applies a new configuration to the running proxy. Something like a ProxyControl interface with an applyConfiguration(Configuration) method — the HTTP endpoint deals with parsing and validation, then hands the Configuration off to ProxyControl to do the actual work. That way if we later want to trigger from a file watcher or an operator callback, there's an obvious place to plug in. It would also make the apply logic easier to test in isolation.

Heheh, this is something already in the works by me. I feel we are still not agreed on the "how to trigger" part of this problem, so I have already started working on a design which decouples the trigger mechanism. I'll update the new proposal here in a few days :)

Some high level thoughts about it

ReloadResult result = proxy.reload(Configuration newConfig, ReloadOptions reloadOptions)


ReloadOptions
├── OnFailure
│   └── appState: ROLLBACK | TERMINATE | CONTINUE
└── OnSuccess
    └── persistConfigToDisk: true | false

We can have an interface which triggers should implement and which internally invokes this proxy.reload() method.
e.g. HttpReloadTrigger, FileWatcherReloadTrigger, MyOwnSuperAwesomeCustomReloadTrigger

Behavior Matrix

OnFailure: ROLLBACK vs TERMINATE vs CONTINUE

| Aspect | ROLLBACK (default) | TERMINATE | CONTINUE |
| --- | --- | --- | --- |
| Cluster operations fail | Undo all successful operations in reverse order (remove added, restore modified, re-add removed) | No undo. Partial changes persist until proxy shuts down. | No undo. Partial changes persist. Proxy keeps running. |
| FilterChainFactory | Old factory remains active; new factory is closed. New connections use old filters. | New factory is committed. Moot — proxy is shutting down. | New factory is committed despite failure. New connections use new filters. |
| Proxy state | Running, consistent — fully operational with old config. | Shut down — process exits (or caller handles). | Running, inconsistent — some clusters old, some new. |
| Future result | Completes exceptionally with ReloadException. | Completes exceptionally (after shutdown initiated). | Completes exceptionally with ReloadException. |
| Recovery | Automatic — proxy is in known-good state. Retry anytime. | External — process supervisor (K8s, systemd) restarts. | Manual — operator inspects, fixes, calls reload() again. |

OnSuccess: persistConfigToDisk = true vs false

| Aspect | persistConfigToDisk = true (default) | persistConfigToDisk = false |
| --- | --- | --- |
| Config file | Overwritten with new config (old config backed up as .bak) | Unchanged — old config remains on disk |
| Proxy restart | Proxy starts with new config after restart | Proxy starts with old config after restart (reload was ephemeral) |
| Use case | Production — config file should always reflect running state | K8s (config comes from CRD), tests, temporary experiments |

Combined Examples

| Scenario | OnFailure | OnSuccess | Effect |
| --- | --- | --- | --- |
| Production HTTP reload | rollback() | withPersist() | Safest: rollback + save to disk |
| K8s Operator reconciler | terminate() | withoutPersist() | Pod restarts on failure; K8s owns the config |
| Integration test | continueRunning() | withoutPersist() | Test can inspect partial state; nothing written to disk |
| Debug session | continueRunning() | withPersist() | Keep running for investigation; save what was attempted |
| CI pipeline | rollback() | withoutPersist() | Safe rollback; config comes from CI, don't overwrite |

Related to the plugin resource tracking, I believe checking the hash would be the most optimal way to go about it.
I'll be honest, I did not quite understand your proposal on this as I don't have much context around how plugins are configured. I'll try to take a look at that part once I'm done with the above-mentioned proposal 🥲

One small thing — Should the 30-second drain timeout be configurable?

I was already planning to make it configurable. Since this is just a POC PR, a lot of hardening work might still be pending; I'll be creating a separate PR anyway when it's time to submit.

@Uzziee
Author

Uzziee commented Feb 17, 2026

Hi @SamBarker, as part of this proposal, what I am proposing is to just have the proxy.reload() method added to begin with.
In later enhancements, we could have trigger interfaces which use this internal method to reload, like HTTPTrigger, FileWatcherTrigger. This will help the PR move forward without being stuck on the discussion around the "how to trigger" part.

ReloadResult result = proxy.reload(Configuration newConfig, ReloadOptions reloadOptions)


ReloadOptions
├── OnFailure
│   └── appState: ROLLBACK | TERMINATE
└── OnSuccess
    └── persistConfigToDisk: true | false

We can later have interfaces which triggers should implement and which internally invoke this proxy.reload() method. (We can have this as part of future enhancements.)
e.g. HttpReloadTrigger, FileWatcherReloadTrigger, MyOwnSuperAwesomeCustomReloadTrigger
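A minimal sketch of how the trigger decoupling could look. All names here (Proxy, ReloadTrigger, HttpReloadTrigger, and the record stand-ins for Configuration/ReloadOptions/ReloadResult) are hypothetical, not the real Kroxylicious types:

```java
import java.util.concurrent.CompletableFuture;

// Stand-ins for the real configuration and result types.
record Configuration(String yaml) {}
record ReloadOptions(OnFailure onFailure, boolean persistConfigToDisk) {
    enum OnFailure { ROLLBACK, TERMINATE }
}
record ReloadResult(boolean success) {}

// The core operation, independent of any trigger.
interface Proxy {
    CompletableFuture<ReloadResult> reload(Configuration newConfig, ReloadOptions options);
}

// The trigger contract: each trigger sources a Configuration however it
// likes, then delegates to proxy.reload().
interface ReloadTrigger {
    void start(Proxy proxy);
}

// Example trigger: a real HTTP trigger would parse each request body into
// a Configuration; shown here with a canned config for brevity.
class HttpReloadTrigger implements ReloadTrigger {
    private final ReloadOptions options =
            new ReloadOptions(ReloadOptions.OnFailure.ROLLBACK, true);

    @Override
    public void start(Proxy proxy) {
        // In the real trigger this would happen per incoming HTTP request.
        proxy.reload(new Configuration("virtualClusters: []"), options);
    }
}
```

A FileWatcherReloadTrigger or a custom trigger would implement the same interface, so the reload semantics stay identical regardless of what initiated the change.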

Behavior Matrix

OnFailure: ROLLBACK vs TERMINATE

| Aspect | ROLLBACK (default) | TERMINATE |
| --- | --- | --- |
| Cluster operations fail | Undo all successful operations in reverse order (remove added, restore modified, re-add removed) | No undo. Partial changes persist until proxy shuts down |
| FilterChainFactory | Old factory remains active; new factory is closed. New connections use old filters | New factory is committed (proxy is shutting down) |
| Proxy state | Running, consistent — fully operational with old config | Shut down — process exits (or caller handles) |
| Future result | Completes exceptionally with ReloadException | Completes exceptionally (after shutdown initiated) |
| Recovery | Automatic — proxy is in known-good state; retry anytime | External — process supervisor (K8s, systemd) restarts |

OnSuccess: persistConfigToDisk = true vs false

| Aspect | persistConfigToDisk = true (default) | persistConfigToDisk = false |
| --- | --- | --- |
| Config file | Overwritten with new config (old config backed up as .bak) | Unchanged — old config remains on disk |
| Proxy restart | Proxy starts with new config after restart | Proxy starts with old config after restart (reload was ephemeral) |
| Use case | Production — config file should always reflect running state | K8s (config comes from CRD), tests, temporary experiments |

Combined Examples

| Scenario | OnFailure | OnSuccess | Effect |
| --- | --- | --- | --- |
| Production HTTP reload | rollback() | withPersist() | Safest: rollback + save to disk |
| K8s Operator reconciler | terminate() | withoutPersist() | Pod restarts on failure; K8s owns the config |
| CI pipeline | rollback() | withoutPersist() | Safe rollback; config comes from CI, don't overwrite |

What do you think ?

@SamBarker
Member

Thanks @Uzziee, this is heading in the right direction.

Core API shape

Agreed on deferring the trigger mechanism and focusing on the core operation first.

A thought on naming: "reload" presupposes re-reading from somewhere. Something like applyConfiguration(Configuration) better describes what it does — "make the running proxy match this configuration." Worth getting right early since it'll show up everywhere.

I'd push back on ReloadOptions as a per-call parameter though. Things like rollback-vs-terminate and disk persistence will vary between deployments, but they shouldn't vary between invocations within the same deployment. A multi-tenant ingress-style deployment might want to limp on with partial success; a sidecar model has different constraints again. These are decisions the operator makes at deployment time, not decisions the trigger makes per invocation — so they belong in the proxy's static configuration rather than the API. That keeps applyConfiguration(Configuration) simple and gives us space to figure out the right options as we understand deployment models better.

One more open question: the current proposal works with state-of-the-world snapshots — pass a complete Configuration and the proxy diffs it against what's running. That's a good starting point, but worth thinking about whether we eventually want something more granular (delta-based operations, or more targeted snapshots). No need to solve now, but the API shape should leave room for it. Thoughts?

Drain timeout

Agreed it should be configurable — details during implementation.

Plugin resource tracking

I think this is orthogonal to the core mechanism. I'd suggest splitting it into a separate proposal so it doesn't block this one and reviewers can engage with each concern independently.

The problem in brief: the runtime can detect when a filter's YAML config changes (via equals()), but has no visibility into external resources plugins read during initialize() — password files, TLS keystores, ACL rules. Those reads happen deep in plugin call stacks (e.g. RecordEncryption → KmsService → CredentialProvider → FilePassword), so the runtime can't detect when they change. Without addressing this, a reload would miss those changes entirely.

We've been exploring an approach where plugins read external resources through the runtime rather than doing direct file I/O. The runtime tracks what was read and hashes the content, so it can detect changes on subsequent checks. This makes dependency tracking automatic rather than opt-in. There are open design questions around the API shape, how deeply nested code accesses the context, and whether to support non-file resources — happy to go into detail if useful.

Proposal structure

The PR discussion has covered a lot of ground and it's getting hard for someone coming in cold to follow. I'd suggest updating the proposal document itself to reflect where we've landed:

  • Reframe around applyConfiguration() as the core API, with trigger mechanisms as future work
  • Call out remove+add with brief per-cluster downtime as a deliberate design choice
  • Call out all-or-nothing rollback as the initial default (consistent with startup, where any cluster failure fails the whole proxy) while acknowledging other deployment models may need different behaviour
  • Mention plugin resource tracking as a known gap, with a pointer to a separate proposal

That way reviewers can engage with the document rather than reconstructing the position from comments.

SamBarker and others added 3 commits February 18, 2026 16:36
Rewrite the hot reload proposal to focus on architectural decisions
rather than implementation detail. The PR discussion has established
consensus on several key points that the document didn't reflect:

- Reframe around applyConfiguration(Configuration) as the core API,
  decoupled from trigger mechanisms (HTTP, file watcher, operator)
- Remove all Java class implementations and handler chains — these
  belong in the code PR where they're reviewable in context
- Call out remove+add with brief per-cluster downtime as deliberate
- Call out all-or-nothing rollback as the initial default, consistent
  with startup behaviour
- Move ReloadOptions to deployment-level static configuration rather
  than per-call parameters
- Identify plugin resource tracking as a known gap with pointer to
  separate proposal
- Flag open questions (config granularity, failure behaviour options,
  drain timeout configurability)
- Defer trigger mechanism design as explicit future work

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
- Fix summary to read as proposed behaviour, not existing
- Use "administrators" instead of "operators" for humans to avoid
  confusion with the Kubernetes operator process
- Fix filter config examples (KMS endpoint, key selection pattern)
- Clarify failure behaviour is consistent across trigger mechanisms
- Note thundering herd as a known trade-off of remove+add
- Fix "original proposal" to "earlier iterations"

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>