helm: tighten /ref volume split between orchestrator, workers, and API

## Problem

The current Helm chart configures the orchestrator and every provider worker through a single `defaults` block, so every worker pod gets the same RW mount on `/ref`. That is broader than the application requires, and it forces every worker to share a writable PVC with the orchestrator.

## Actual access requirements

After tracing `climate_ref.config.PathConfig` and `climate_ref_celery.worker_tasks`:

| Path                 | API   | Provider workers       | Orchestrator           | Migrate Job |
| -------------------- | ----- | ---------------------- | ---------------------- | ----------- |
| `/ref` (config TOML) | RO    | RO                     | RO                     | RO          |
| `/ref/software`      | RO    | RO                     | **RW** (`ref providers setup` writes conda envs) | — |
| `/ref/scratch`       | —     | RW (per-pod `emptyDir` is fine) | RW            | — |
| `/ref/results`       | RO    | —                      | RW (`handle_result` task copies scratch -> results) | — |
| `/ref/log`           | —     | —                      | RW (currently unused)  | —           |
| `/tmp` (HOME)        | RW    | RW                     | RW                     | RW          |

Provider workers run `celery start-worker --provider X` and consume only their provider queue. The orchestrator runs `celery start-worker` (no `--provider` flag) and is the only deployment that consumes the default `celery` queue, where `handle_result` performs the scratch -> results copy. So provider workers never touch `/ref/results` and never write `/ref/software`.
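The "Provider workers" column translates into a pod-spec fragment roughly like the one below. This is a minimal sketch. It assumes the shared PVC is named `ref-shared` (hypothetical), and that the `scratch` directory already exists on that PVC, because a container runtime may be unable to create a mount point inside a read-only mount.

```yaml
# Provider worker pod spec fragment (sketch; volume and PVC names are hypothetical).
volumes:
  - name: ref
    persistentVolumeClaim:
      claimName: ref-shared   # hypothetical PVC name
      readOnly: true          # forces every mount of this volume to be read-only
  - name: scratch
    emptyDir: {}              # per-pod scratch, discarded with the pod
  - name: home
    emptyDir: {}              # writable HOME at /tmp
containers:
  - name: worker
    volumeMounts:
      - name: ref
        mountPath: /ref       # config TOML and /ref/software (conda envs), read-only
        readOnly: true
      - name: scratch
        mountPath: /ref/scratch
      - name: home
        mountPath: /tmp
```

Setting `readOnly` on both the PVC source and the mount is redundant but explicit. The `persistentVolumeClaim.readOnly` flag alone already forces the mounts of that volume to be read-only.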

## Why this matters

- Provider workers should not have RW access to the conda env tree (`/ref/software`); a buggy or compromised diagnostic could clobber other providers' environments.
- Workers should not need shared RW access to `/ref/scratch`; a per-pod `emptyDir` is enough because the orchestrator copies the artefacts out via `handle_result` before the pod is recycled.
- Today these constraints are not expressed in the chart, so users who configure storage through `defaults.volumes` end up with a single shared RW PVC mounted in every worker pod.

## Proposed direction

1. Surface the orchestrator as its own top-level chart block (or a sentinel under `providers`) with its own `volumes` / `volumeMounts` defaults rather than relying on the implicit `providers.orchestrator` entry.
2. Set chart defaults so:
   - Orchestrator: RW `/ref` (or RW `/ref/software`, `/ref/results`, `/ref/log` plus RO config).
   - Provider workers: RO `/ref` + per-pod `emptyDir` for `/ref/scratch`.
   - API: RO `/ref` (already the recommended pattern).
3. Update `helm/ci/gh-actions-values.yaml` and `helm/local-test-values.yaml` to match the new split, and document the contract in `helm/README.md` under "Required Volumes".
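The defaults in step 2 could look roughly like this in `values.yaml`. This is a sketch only. The top-level `orchestrator` and `api` blocks and their `volumes` / `volumeMounts` keys are hypothetical shapes for the proposal and are not the chart's current schema. It uses the coarse "RW `/ref`" option for the orchestrator.

```yaml
# values.yaml sketch of the proposed split (block and key names are hypothetical).
orchestrator:                 # proposed top-level block replacing providers.orchestrator
  volumes:
    - name: ref
      persistentVolumeClaim:
        claimName: ref-shared # hypothetical PVC name
  volumeMounts:
    - name: ref
      mountPath: /ref         # RW: software, scratch, results, log

defaults:                     # applied to every provider worker
  volumes:
    - name: ref
      persistentVolumeClaim:
        claimName: ref-shared
        readOnly: true
    - name: scratch
      emptyDir: {}            # per-pod scratch
  volumeMounts:
    - name: ref
      mountPath: /ref
      readOnly: true
    - name: scratch
      mountPath: /ref/scratch

api:
  volumes:
    - name: ref
      persistentVolumeClaim:
        claimName: ref-shared
        readOnly: true
  volumeMounts:
    - name: ref
      mountPath: /ref
      readOnly: true
```

Because one pod writes to the PVC while others read it, the claim still needs the `ReadWriteMany` access mode when pods land on different nodes. `ReadWriteOnce` only allows multiple pods to share it on a single node.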

## Workaround until then

The looser layout (single shared RW PVC at `/ref` for all worker pods) still works and is what `helm/README.md` currently documents. This issue tracks tightening the model rather than a regression.
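For comparison, the looser layout amounts to something like the following. This is a sketch that assumes `defaults` accepts `volumes` and `volumeMounts` lists. The PVC name is hypothetical.

```yaml
# Current loose layout (sketch): one RW PVC at /ref in every worker pod,
# orchestrator included. Key shapes under `defaults` are assumptions.
defaults:
  volumes:
    - name: ref
      persistentVolumeClaim:
        claimName: ref-shared   # hypothetical PVC name
  volumeMounts:
    - name: ref
      mountPath: /ref           # read-write: no readOnly flag anywhere
```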

## References

- `climate_ref/config.py` `PathConfig` — defines the `/ref/{software,scratch,results,log}` layout
- `climate_ref_celery/worker_tasks.py:handle_result` — orchestrator-only task that copies scratch -> results
- `helm/templates/providers/deployment.yaml` — currently includes `orchestrator` under the same range as provider workers
