diff --git a/docs/adr/ADR-027-cross-environment-domain-generalization.md b/docs/adr/ADR-027-cross-environment-domain-generalization.md index 03b249803b..8f44163d85 100644 --- a/docs/adr/ADR-027-cross-environment-domain-generalization.md +++ b/docs/adr/ADR-027-cross-environment-domain-generalization.md @@ -60,8 +60,59 @@ Five concurrent lines of research have converged on the domain generalization pr ## 2. Decision +### 2.0 — 2026-Q2 Re-scope: MERIDIAN-MAE foundation pre-training (primary path) + +> **Status of this subsection:** Active. Supersedes the *training strategy* of §2.1–§2.6 (the dual-path / domain-adversarial / geometry-conditioned *architecture* is retained — it becomes the **fine-tune-stage head** on top of a pre-trained encoder, not a from-scratch network). +> **Driver:** `docs/research/sota/2026-Q2-agentic-ai-and-edge-for-ruview.md` (§B1) and the 2025→2026 evidence below. + +**What changed.** The 2026 WiFi-sensing literature converged on a single result: **masked-autoencoder (MAE) pre-training on large, heterogeneous CSI pools beats supervised baselines on cross-domain tasks, and the bottleneck is data breadth, not model capacity.** + +- *Scale What Counts, Mask What Matters* (arXiv:2511.18792): pre-trains/evaluates across **14 datasets, >1.3 M CSI samples, 4 device types, 2.4/5/6 GHz**; **log-linear** cross-domain gains with pre-training data (+2.2 % to +15.7 % over supervised), **marginal** gains from bigger models. +- **CIG-MAE** (arXiv:2512.04723): dual-stream MAE reconstructing **both amplitude and phase**, with information-guided masking — phase reconstruction is now SOTA-competitive (historically the hard part). +- **AM-FM** (2026; arXiv:2602.11200, already cited in §1.2): ~9.2 M samples, ~20 device types — the data-breadth thesis at scale. +- *A Tutorial-cum-Survey on SSL for Wi-Fi Sensing* (arXiv:2506.12052) and ACM TOSN (10.1145/3715130): MAE is the consistently strong SSL choice for CSI. 
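
As a rough illustration of the dual-stream masked-reconstruction objective the bullets above describe, the sketch below scores a reconstruction only on the masked tokens, summing an amplitude MSE and a weighted phase MSE. It is a dependency-free sketch under stated assumptions, not the `csi_mae` API: the `Token` struct and the `masked_dual_stream_loss` function are hypothetical names, and plain MSE on phase ignores wrap-around.

```rust
/// One CSI token: a per-subcarrier snapshot for each stream.
/// (Hypothetical layout, for illustration only.)
pub struct Token {
    pub amp: Vec<f32>,
    pub phase: Vec<f32>,
}

/// Mean squared error between two equal-length slices.
fn mse(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "streams must have the same width");
    if a.is_empty() {
        return 0.0;
    }
    let sum: f32 = a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum();
    sum / a.len() as f32
}

/// Reconstruction loss averaged over the masked token positions only:
/// MSE(amplitude) + phase_w * MSE(phase). Visible tokens contribute nothing,
/// which is what makes this a masked-autoencoder objective.
pub fn masked_dual_stream_loss(
    target: &[Token],
    recon: &[Token],
    masked: &[usize],
    phase_w: f32,
) -> f32 {
    if masked.is_empty() {
        return 0.0;
    }
    let total: f32 = masked
        .iter()
        .map(|&i| {
            mse(&target[i].amp, &recon[i].amp)
                + phase_w * mse(&target[i].phase, &recon[i].phase)
        })
        .sum();
    total / masked.len() as f32
}
```

Because both streams share one list of masked indices, a single mask drives amplitude and phase together; errors on visible tokens never reach the loss.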
+ +**Revised decision.** The primary MERIDIAN program is now a **three-stage** pipeline: + +1. **Pre-train** a CIG-MAE-style **dual-stream (amplitude + phase) masked autoencoder** on every CSI source RuView can reach — own recordings (`data/recordings/`, overnight captures), MM-Fi + Wi-Pose (ADR-015), public CSI corpora, and the multi-band virtual-subcarrier streams from `ruvsense/multiband.rs`. Thesis: *data breadth > pose-net capacity*. +2. **Fine-tune** the existing MERIDIAN heads — the 17-keypoint / DensePose-UV regression heads, the AETHER contrastive embedding (ADR-024), and the domain-adversarial / geometry-conditioned layers of §2.1–§2.6 — on top of the **frozen-then-unfrozen** pre-trained encoder. The §2.x machinery is now *regularisation on a good representation* rather than the load-bearing structure. +3. **Adapt** per room with **source-free unsupervised domain adaptation** (MU-SHOT-Fi, arXiv:2605.01369; Wi-SFDAGR) wired behind `ruvsense/coherence_gate.rs::Recalibrate` — a bounded MicroLoRA-delta + EWC++ pass on the head, triggered by the coherence z-score, logged via the witness chain. (Tracked separately; see the companion ADR referenced in the survey's Part C #2.) + +**Why this is better than from-scratch (§2.1 as the primary path).** A model trained from scratch on one or two single-environment datasets *cannot* see enough multipath/hardware diversity to learn an environment-agnostic representation — that's the layout-overfitting / multipath-memorisation failure in §1.1. A pre-trained encoder front-loads that diversity, so the SISO-multistatic ESP32 input (§B3) has to carry far less, and the per-room work shrinks to adaptation (stage 3), not retraining. + +**Token convention (implementation).** A CSI window `[T, tx, rx, sub]` → a sequence of `N = T·tx·rx` tokens, each a `sub`-dim *channel snapshot* — the same `[B, T·tx·rx, sub]` layout `model.rs::ModalityTranslator` already consumes. 
Amplitude and phase share the token grid, so one mask drives both streams. + +**Implementation status & plan.** + +- ✅ **Iteration 1**: `wifi-densepose-train::csi_mae` — `MaeConfig` (+`validate`), `MaskStrategy`, `TokenLayout`, deterministic `mask_csi_window` / `reassemble_tokens` (pure Rust, dependency-free PRNG, unit tests, builds & tests under `cargo test --no-default-features`); the re-scoped ADR (this section). +- ✅ **Iteration 2a**: information-guided masking — `MaskStrategy::InfoGuided` now masks high-information tokens (token "information" = variance of amplitude + variance of phase), weighted-without-replacement via Efraimidis–Spirakis, deterministic in seed; replaces the iter-1 Random fallback. +3 tests. +- ✅ **Iteration 2b** (CI-verified): `csi_mae::model` behind `tch-backend` — `CsiMae` (dual-stream amp+phase per-token embed → fuse → residual-MLP encoder over visible tokens → flatten-to-latent bottleneck → learned per-position query + broadcast latent → residual-MLP decoder → `dec_amp_head`/`dec_ph_head` → `index_select` the masked positions); `CsiMae::reconstruction_loss` (MSE amp + `phase_w`·MSE phase); `MaeBatch::from_windows` (partition from window 0, reused across the batch — `n_tokens` is fixed); `pretrain_step`; `src/bin/pretrain_mae.rs` (synthetic-data driver, `required-features = ["tch-backend"]`); a gated "loss halves when overfitting one batch" smoke test. v0 limits noted in the module docs: fixed `n_tokens`, batch-shared masking, MSE on unwrapped phase. The dev box that wrote this had no LibTorch, so the tch path is verified by CI (`tch-backend` feature), not locally. 
+- ◻ **Iteration 3+**: pool & ingest heterogeneous CSI (own recordings + MM-Fi + Wi-Pose + multi-band virtual sub-carriers); real pre-train run (GPU — `scripts/gcloud-train.sh` / the cognitum project); per-sample masking + self-attention transformer blocks (lift the v0 limits); fine-tune the §2.x heads on top of the pre-trained encoder; cross-domain eval (§4.6 protocol); ship the encoder as an RVF segment (§4.7). +- ⏸ **Out of scope here**: the per-room SFDA adaptation (stage 3) — its own ADR. + +#### Iteration 3 plan — heterogeneous-CSI ingest, GPU pre-train, fine-tune handoff + +The remaining prototype work (the parts that can't run on the dev box): + +1. **Heterogeneous-CSI ingest.** A `csi_mae`-adjacent loader that pools every reachable CSI source into a uniform `[T, tx, rx, sub]` window stream, normalising sub-carrier count to 56 (via `wifi-densepose-train::subcarrier::interpolate_subcarriers`) and amplitude scale per-frame: + - own captures: `data/recordings/*.csi.jsonl`, overnight recordings; + - `MmFiDataset` (ADR-015, NeurIPS-2023 MM-Fi, 114 sub-carriers → interpolate); + - Wi-Pose (ADR-015); + - multi-band virtual sub-carriers from `ruvsense/multiband.rs` (3 channels × 56 → 168) — treated as extra tokens, not extra streams; + - public CSI corpora as available. + Implemented as a `CsiDataset` impl (e.g. `PooledCsiDataset`) that round-robins / weights sources; `pretrain-mae` gains a `--datasets ` flag selecting it instead of `SyntheticCsiDataset`. *Thesis (arXiv:2511.18792): breadth of this pool — devices, bands, rooms — is what buys cross-domain generalisation; the model stays small.* +2. **GPU pre-train run.** `scripts/pretrain-mae-gcloud.sh` (added this iteration — a thin mirror of `scripts/gcloud-train.sh`): provisions a GCloud VM in `cognitum-20260110`, builds `wifi-densepose-train` with `--features tch-backend,cuda`, runs `pretrain-mae`, downloads the `.ot` variable store, tears the VM down. 
Currently drives `SyntheticCsiDataset` (the smoke path); the `--data-dir`/`--datasets` plumbing for the real corpus is the one TODO in that script. *Not run as part of this prototype.* +3. **Lift the v0 model limits.** Per-sample masking (gather/scatter so each window in a batch can have its own mask), self-attention transformer blocks in the encoder/decoder (replacing the residual MLPs and the flatten-to-latent bottleneck — this also removes the fixed-`n_tokens` constraint), a circular phase-reconstruction loss. +4. **Fine-tune handoff.** Load the pre-trained `CsiMae` encoder weights into the `model::WiFiDensePoseModel` front-end (the `ModalityTranslator` slot), freeze for a warm-up, then unfreeze; train the 17-keypoint / DensePose-UV heads, the AETHER contrastive embedding (ADR-024), and the §2.1–§2.6 domain-adversarial / geometry-conditioned layers *as regularisers on top of the pre-trained representation*. A `train` sub-command flag (`--init-encoder `) wires this. +5. **Cross-domain eval.** Run §4.6's protocol (leave-one-room-out / leave-one-device-out) on the fine-tuned model vs. the from-scratch baseline; the win condition is the +2.2 %…+15.7 % cross-domain band that 2511.18792 reports for MAE pre-training. +6. **Ship the encoder** as an RVF segment (§4.7) so deployments load a pre-trained backbone and only carry the small task head + per-room adapter (stage 3 / the SFDA ADR). + +The remainder of this ADR (§2.1 onward) describes the **fine-tune-stage architecture** — read it as "the head and regularisers that sit on top of the §2.0 pre-trained encoder", not as a from-scratch design. + ### 2.1 Architecture: Environment-Disentangled Dual-Path Transformer +> *(Now the fine-tune-stage head — see §2.0.)* + MERIDIAN adds a domain generalization layer between the CSI encoder and the pose/embedding heads. 
The core insight is explicit factorization: decompose the latent representation into a **pose-relevant** component (invariant across environments) and an **environment** component (captures room geometry, hardware, layout): ``` @@ -546,3 +597,12 @@ ADR-011 Proof-of-Reality ──→ ⏳ Independent (Python v1 issue, high pr 8. Ramesh, S. et al. (2025). "LatentCSI: High-resolution efficient image generation from WiFi CSI using a pretrained latent diffusion model." arXiv:2506.10605. https://arxiv.org/abs/2506.10605 9. Ganin, Y. et al. (2016). "Domain-Adversarial Training of Neural Networks." JMLR 17(59):1-35. https://jmlr.org/papers/v17/15-239.html 10. Perez, E. et al. (2018). "FiLM: Visual Reasoning with a General Conditioning Layer." AAAI 2018. arXiv:1709.07871. https://arxiv.org/abs/1709.07871 + +**2026-Q2 re-scope (§2.0) — masked-autoencoder foundation pre-training:** + +11. "Scale What Counts, Mask What Matters: Evaluating Foundation Models for Zero-Shot Cross-Domain Wi-Fi Sensing." arXiv:2511.18792. https://arxiv.org/html/2511.18792 — 14 datasets / >1.3 M CSI samples; data-breadth > model-capacity. +12. "CIG-MAE: Cross-Modal Information-Guided Masked Autoencoder for Self-Supervised WiFi Sensing." arXiv:2512.04723. https://arxiv.org/html/2512.04723v1 — dual-stream amplitude+phase MAE, information-guided masking. +13. "MU-SHOT-Fi: Self-Supervised Multi-User Wi-Fi Sensing with Source-free Unsupervised Domain Adaptation." arXiv:2605.01369. https://arxiv.org/html/2605.01369 — per-room SFDA (MERIDIAN stage 3). +14. "A Tutorial-cum-Survey on Self-Supervised Learning for Wi-Fi Sensing: Trends, Challenges, and Outlook." arXiv:2506.12052. https://arxiv.org/html/2506.12052 +15. "Evaluating Self-Supervised Learning for WiFi CSI-Based Human Activity Recognition." ACM Trans. Sensor Networks. https://dl.acm.org/doi/10.1145/3715130 +16. RuView 2026-Q2 SOTA survey — `docs/research/sota/2026-Q2-agentic-ai-and-edge-for-ruview.md` (§B1, Part C #1). 
diff --git a/docs/research/sota/2026-Q2-agentic-ai-and-edge-for-ruview.md b/docs/research/sota/2026-Q2-agentic-ai-and-edge-for-ruview.md new file mode 100644 index 0000000000..80ba33ffe4 --- /dev/null +++ b/docs/research/sota/2026-Q2-agentic-ai-and-edge-for-ruview.md @@ -0,0 +1,241 @@ +# Agentic-AI Breakthroughs & Related SOTA — Applicability to RuView (2026-Q2) + +> **Status:** Research note — non-binding survey. Nothing here is an accepted decision. +> **Date:** 2026-05-11 · **Author:** research pass (Claude Code) · **Scope owner:** ruv +> **Companion docs:** `docs/research/sota/2026-Q2-rf-sensing-and-edge-rust.md`, +> `docs/research/sota-surveys/wifi-sensing-ruvector-sota-2026.md`, +> `docs/research/sota-surveys/ruview-multistatic-fidelity-sota-2026.md` +> **Maps onto ADRs:** 015, 016, 017, 024, 027, 028, 029–032, 039, 040, 069, 081, 084–086, 095, 096 + +--- + +## 0. TL;DR — the eight findings that matter for RuView + +| # | Breakthrough / SOTA result | Why it matters here | Lands against | Horizon | +|---|---|---|---|---| +| 1 | **WiFi-sensing foundation models scale with *data*, not capacity** — a 14-dataset / 1.3 M-CSI-sample MAE study (arXiv 2511.18792) shows log-linear cross-domain gains from pre-training breadth; larger models barely help. AM-FM (2026) pushes this to ~9.2 M samples / 20 device types. | This is the answer to **MERIDIAN (ADR-027)**: the path to cross-room generalisation is a CSI MAE pre-trained on heterogeneous capture, then a tiny task head — *not* a bigger pose net. RuView already has the data-collection plumbing (`scripts/collect-training-data.py`, overnight recordings, MM-Fi/Wi-Pose under ADR-015). | ADR-015, ADR-016, ADR-027 | Medium | +| 2 | **MAE on amplitude *and* phase** — CIG-MAE (arXiv 2512.04723) reconstructs both with a symmetric dual-stream encoder + information-guided masking; phase reconstruction is now SOTA-competitive. | RuView throws away most phase information after `phase_align.rs` / `coherence.rs`. 
A dual-stream amplitude+phase MAE pre-text task fits the existing `wifi-densepose-train` graph and the AETHER contrastive head (ADR-024). | ADR-016, ADR-024 | Medium | +| 3 | **Source-free unsupervised domain adaptation works for Wi-Fi** — MU-SHOT-Fi (arXiv 2605.01369), Wi-SFDAGR — adapt a deployed model to a new environment with *no source data and no labels*, just the target stream. | This is exactly the **recalibration gate** decision in `ruvsense/coherence_gate.rs` (`Recalibrate`). Today that gate only flags drift; SFDA gives it something to *do* — adapt the head online, on-device, without phoning home. Pairs naturally with SONA / MicroLoRA + EWC++ already in the stack. | ADR-027, ADR-081, ADR-095/096 | Medium | +| 4 | **The "agent harness on the MCU" pattern** — ESP-Claw (Espressif) and the broader hybrid edge/cloud consensus: heavy reasoning stays in the cloud, but the *loop* (sense → decide → act, plus skills/memory/routing) runs on the microcontroller. ESP32-P4 (dual RISC-V 400 MHz, 768 KB SRAM, 32 MB PSRAM); ESP32-S3 vector ISA accelerates NN kernels. | RuView's ESP32 firmware is already a tiny agent: `adaptive_controller.c` (ADR-081) is the decide loop, the WASM tier (ADR-040) is the skill sandbox, `temporal_task.c` (ADR-095/096) is the on-device model, `edge_processing.c` is sensor fusion. Worth re-framing the firmware explicitly as a *constrained autonomous agent* and stealing patterns (skill registry, memory budget, watchdog-as-supervisor). ESP32-P4 is a credible next hardware tier (vs. the S3's ~200 KB free heap). | ADR-040, ADR-081, ADR-095/096, ADR-028 | Short→Medium | +| 5 | **Agent memory has matured into a discipline** — hierarchical episodic/semantic/procedural memory, the LOCOMO benchmark, the ICLR-2026 *MemAgents* workshop, "Memory in the Age of AI Agents" survey, mem0's "State of Agent Memory 2026". | RuView/ruflo already runs a ReasoningBank-style loop (HNSW-indexed trajectories, verdict→distil→consolidate). 
The new framing adds: (a) *procedural* memory as a first-class type — the **fix-marker witness guard** (`scripts/fix-markers.json`, merged 2026-05-11) is a primitive instance of this; (b) typed-memory eval (LOCOMO-style) for the dev-loop memory. | ADR-016 (memory side), repo CI | Short | +| 6 | **Long-horizon agents + continual learning are converging into "the system, not the model, is the unit of progress"** — METR's task-duration-doubling (~1 h tasks early-2025 → multi-hour by late-2026); Q2-2026 long-horizon agents from the frontier labs; continual-learning + world-models maturing together. | Two reads for RuView: (a) the *dev* workflow — multi-PR, multi-day swarm work is now tractable; the witness/fix-marker guard is the kind of "system memory" that makes long-horizon dev safe (don't silently revert). (b) the *product* — a RuView deployment that *learns its room over weeks* (longitudinal biomechanics drift in `ruvsense/longitudinal.rs`, the persistent field model ADR-030) is the same "world model that adapts" story, just for RF. | ADR-030, ADR-027, repo workflow | Medium | +| 7 | **Streaming / drift-tolerant vector quantisation** — CoDEQ (arXiv 2512.18335, in ruvector): frozen kd-tree + live leaf centroids via Welford → O(1) updates, no k-means retrain, 7.5× faster build than PQ, ~7 % standalone recall@10 (coarse pre-filter only). RaBitQ / Extended-RaBitQ (arXiv 2405.12497 + follow-ups) remains the high-recall 1-bit workhorse. | RuView's vector tier is RaBitQ + HNSW (ADR-084/085, ADR-016). CoDEQ is **not** a replacement (recall too low) and its "no k-means" advantage is over PQ, which RuView doesn't use. It's a potential *new tier* only if (a) on-ESP32 adaptive quant becomes a goal, or (b) HNSW build cost shows up in a profile as the mesh scales. Deferred — see Part C. | ADR-016, ADR-084/085 | Defer | +| 8 | **Agentic verification / "show your work" is now an explicit research theme** (eval harnesses, trajectory judging, reproducibility gates). 
| RuView is ahead of the curve here: the ADR-028 deterministic-pipeline proof (`archive/v1/data/proof/verify.py` → SHA-256), the release witness bundle, and the new fix-marker regression guard *are* the "verifiable agent output" pattern applied to a codebase. Worth writing this up as a reference pattern (it's reusable beyond RuView). | ADR-028, repo CI | Done / extend | + +**One-line recommendation:** the highest-leverage move is **#1 + #2 + #3 together** — a heterogeneous-CSI masked-autoencoder pre-train (amplitude+phase) plus a source-free online-adaptation hook on the recalibration gate. That's the credible path to closing MERIDIAN, and every piece of plumbing it needs already exists in the repo. Everything else is incremental or already in flight. + +--- + +## 1. Method & scope + +This note surveys two adjacent literatures as of 2026-Q2 and filters hard for RuView relevance: + +- **Agentic AI** — long-horizon agents, agent memory architectures, self-improving / continual-learning agents, multi-agent coordination, on-device ("edge") agents, retrieval & memory compression, agentic evaluation/verification. +- **RF / WiFi sensing** — CSI foundation models & self-supervised pre-training, domain adaptation, the DensePose-from-WiFi lineage, multistatic / distributed sensing, mmWave + WiFi fusion, adversarial robustness & privacy, through-wall / stand-off radar. + +Selection bias is deliberate: a result is in only if there's a concrete hook into an existing RuView crate, ADR, or workflow. Pure-LLM-application work (browser agents, code agents, RAG-over-docs) is out except where it has changed how *this project's* dev loop or firmware should be structured. References (Part E) carry arXiv IDs / DOIs where available. + +--- + +## PART A — Agentic-AI breakthroughs (the relevant slice) + +### A1. Long-horizon agents and "the system is the unit of progress" + +The headline 2026 shift isn't a model — it's that agents can now hold a goal across hours and many steps. 
METR's measurement (AI task duration doubling roughly every seven months) crossed from ~1-hour tasks in early 2025 toward multi-hour workstreams by late 2026, and the frontier labs are explicitly targeting "long-horizon agents" for H1-2026. Commentary across the field (adaline labs' "Beyond Transformers", the *Architecture of Agency* guide, MLM's "7 trends") converges on: when continual learning + long-horizon planning + world models mature *simultaneously*, the unit of engineering stops being "a model" and becomes "a system" — harness + memory + tools + verification. + +**RuView implications.** Two, on two different timescales: + +1. *Dev workflow* (already realised this session). Multi-PR, multi-day swarm work is tractable, and the failure mode is *forgetting / silently reverting* a hard-won fix. The fix-marker witness guard (`scripts/check_fix_markers.py` + `scripts/fix-markers.json`, the `Fix-Marker Regression Guard` workflow, merged in PR #526) is exactly the "system memory that survives the agent" primitive that makes long-horizon dev safe. It's a procedural-memory artifact (see A2) and a verification artifact (see A7) at once. **Action:** keep growing the manifest; treat it as the project's "don't regress" ledger. +2. *Product* (medium horizon). A RuView install that *learns a room over weeks* — longitudinal biomechanics drift (`ruvsense/longitudinal.rs`, Welford stats), the persistent field-model eigenstructure (ADR-030, `ruvsense/field_model.rs`), cross-room fingerprinting (`ruvsense/cross_room.rs`) — is the same "world model that adapts" pattern, in the RF domain. The agentic literature's framing (episodic→semantic→procedural consolidation) is a useful lens for organising what the deployed node should remember about its environment vs. discard. + +### A2. Agent memory has become a discipline + +2026 is the year "agent memory" stopped being an implementation detail.
Signals: the **LOCOMO** benchmark for long-term conversational memory (the first apples-to-apples comparison of memory architectures); the **ICLR-2026 MemAgents workshop**; the *"Memory in the Age of AI Agents: A Survey"* paper list; mem0's "State of AI Agent Memory 2026". The settled taxonomy: **episodic** (what happened), **semantic** (distilled facts), **procedural** (how to do things / skills) — with consolidation passes promoting episodic → semantic → procedural, hierarchical stores, and (newer) multi-agent *shared* memory. + +**RuView/ruflo already runs a ReasoningBank-style loop** — HNSW-indexed trajectory store, verdict-judge → distil → consolidate, experience replay (this is in the `.claude-flow/` coordination layer and the `reasoningbank-*` skills). What the 2026 framing adds: + +- **Procedural memory as a first-class artifact.** The fix-marker manifest is procedural memory for the *codebase*: "the way we do CSI capture is `WIFI_PS_NONE` before promiscuous (#521); the way we handle sibling UDP magics is `ruview_sibling_packet_name` (#517); …". It's checked-in, diffable, CI-enforced. That's exactly what the literature recommends and almost nobody does. **Action:** generalise the pattern — a `docs/research/`-adjacent "decisions ledger" that links ADRs ↔ code markers ↔ tests is a natural extension. +- **Typed-memory evaluation.** A LOCOMO-style harness for the *dev-loop* memory (does recall surface the right ADR/pattern for a given task?) would be cheap and would catch memory rot. Low priority but easy. +- **Sensor-side memory.** The deployed ESP32 node has its own (tiny) episodic/semantic split: `edge_processing.c` ring buffer (episodic), the rolling vitals/presence stats (semantic), the NVS config + adaptive-controller policy (procedural). Worth being explicit about the budget (the S3 has ~200 KB free heap) and what gets consolidated vs. dropped. + +### A3. 
Self-improving / continual-learning agents + +The "self-improving" thread: agents that analyse past outcomes and evolve their strategies (this is the verdict→distil loop again, plus online weight updates). The hard problem is doing it without catastrophic forgetting. RuView is *already living in this space*: **SONA** (self-optimising neural architecture) + **MicroLoRA** online adaptation + **EWC++** for forgetting-resistance is named in the project's own positioning, and ADR-024 (AETHER contrastive CSI embedding) + ADR-095/096 (on-ESP32 temporal head with sparse GQA) are the substrate. + +**What's new and worth pulling in:** + +- **Source-free domain adaptation as the *update rule*** (see B2) — MU-SHOT-Fi / Wi-SFDAGR show you can adapt to a new environment with target stream only. Wire that as the action behind `coherence_gate.rs::Recalibrate`: when the coherence z-score gate says "this room has drifted", run an SFDA pass on the temporal head (MicroLoRA delta, EWC++ regulariser, bounded steps) instead of just degrading to `PredictOnly`. +- **Bounded, auditable online learning.** The fix-marker / witness culture should extend to learned weights: every on-device adaptation event should be logged (the firmware already has the witness-chain primitives from the Cognitum/`brain` work and ADR-028) so "the model in this room is now divergent from the shipped checkpoint" is *observable*, not silent — the same lesson as #505 (mislabelled firmware), one layer up. + +### A4. Multi-agent coordination at the edge — RuView's mesh *is* a swarm + +A 4–6 node ESP32 RuView deployment is, structurally, a multi-agent system: each node senses, runs an edge-tier pipeline, makes local decisions (channel hop, role, send-rate via `adaptive_controller.c`), and they fuse via TDM + multistatic attention. 
The agentic-systems literature (the arXiv 2601.12560 architectures/taxonomies paper; Byzantine-consensus and market-based task-allocation work) maps cleanly: + +- **Role assignment / task allocation** — which node is the coordinator, which compute the per-cluster π hop (ADR-083), which go `PredictOnly` when degraded — is a market/auction problem. RuView does this ad hoc today; the literature has cleaner protocols. +- **Byzantine robustness** — ADR-032 (multistatic mesh security hardening) and `ruvsense/adversarial.rs` (physically-impossible-signal detection, multi-link consistency) are RuView's answer to "a node lies / is spoofed". The 2026 framing: treat it as Byzantine-fault-tolerant sensor fusion with an explicit fault model, not just heuristics. +- **Shared memory across the swarm** — the multi-agent-shared-memory thread suggests the nodes should converge on a shared *field model* (ADR-030) the way agents converge on shared semantic memory: CRDT-style, eventually consistent, with the coordinator as a soft leader (raft-ish). Some of this is sketched in `radio_ops.rs` (mesh header, node status, anomaly alerts). + +**Action:** a short ADR re-framing the firmware mesh as a BFT sensor swarm with explicit role-auction + shared-field-model-as-CRDT would unify ADR-029/030/032/081/083 and give the implementation a literature to lean on. Low urgency, high coherence value. + +### A5. The "agent harness on the MCU" pattern (most directly actionable) + +Espressif's **ESP-Claw** crystallised something RuView half-built already: keep the LLM in the cloud, but run the *whole agent loop* — skills, tools, memory, routing, the sense→decide→act cycle — on the microcontroller. The hardware backs it: **ESP32-P4** (dual RISC-V @ 400 MHz, ~768 KB internal SRAM, 32 MB PSRAM, HW H.264) is Espressif's AI-focused part; the **ESP32-S3** vector ISA already accelerates NN kernels (which is why ADR-095/096's on-device temporal head is viable at all). 
The broader 2026 consensus is *hybrid*: MCU runs lightweight models + rule-based agents for real-time decisions, offload heavy reasoning only when needed. + +**RuView's firmware is already a constrained agent — name it as one:** + +| Agent component (ESP-Claw-ish) | RuView firmware equivalent | +|---|---| +| Decide loop | `adaptive_controller.c` (ADR-081) — fast/med/slow ticks, channel/role/send-rate policy | +| Skill sandbox | WASM tier (ADR-040) — uploadable `.wasm` modules, `on_timer()`, the upload/list/start/stop endpoints | +| On-device model | `temporal_task.c` (ADR-095/096) — sparse-GQA temporal head, emits `0xC5110007` classifications | +| Sensor fusion / perception | `edge_processing.c` — vitals, presence, fall detection, feature vectors; `csi_collector.c` — capture + the `WIFI_PS_NONE` fix | +| Memory | NVS config (procedural), ring buffer (episodic), rolling stats (semantic), witness chain (audit) | +| Supervisor / watchdog | `task_wdt` + the `EDGE_BATCH_LIMIT` yield discipline (#266/#321) — "kill the agent if it starves the idle task" | +| Telemetry / trajectory | UDP packet stream (`0xC5110001`–`0xC5110007`), boot log, OTA status | + +**Actions worth doing:** +1. **An ADR that explicitly models the firmware as a tiered autonomous agent** (Tier-0 rules → Tier-1 WASM skills → Tier-2 on-device temporal head → Tier-3 cloud), with a stated memory budget and a "supervisor" contract. This mostly *documents and unifies* ADR-040/081/086/095/096 — but the framing buys clarity and a literature. +2. **Track ESP32-P4 as a real hardware tier.** The S3's ~200 KB free heap is the binding constraint on the temporal head and the WASM tier; the P4's 32 MB PSRAM changes that calculus. Worth a feasibility note (and it's a natural place for a bigger MAE-derived head). +3. 
**Steal ESP-Claw's skill-registry ergonomics** for the WASM tier (ADR-040) — versioned skills, declared capabilities, a deny-by-default policy (this echoes the `brain_sdk_allow`/`deny` pattern already in the `logi-brain` MCP surface). + +### A6. Retrieval & memory compression + +The fast-moving sub-area: 1-bit / few-bit vector quantisation with theoretical error bounds. **RaBitQ** (arXiv 2405.12497) and **Extended-RaBitQ** are the high-recall workhorses — and RuView already runs them (ADR-084 RaBitQ similarity sensor, ADR-085 pipeline expansion, plus the ruvector RaBitQ binding). **CoDEQ** (arXiv 2512.18335, in ruvector) is the new streaming/drift-tolerant entrant: frozen kd-tree + Welford-updated leaf centroids, O(1) updates, 7.5× faster build than PQ, but ~7 % standalone recall@10 — a coarse pre-filter, never a sole index. **Graph RAG / hybrid (sparse+dense) retrieval** with MMR diversity reranking is the other settled pattern (and `ruflo-rag-memory` already implements it for the dev loop). + +**RuView read (see Part C for the verdict):** RaBitQ + HNSW (current) is the right stack for high-recall CSI/pose matching. CoDEQ is *deferred* — its "no k-means retrain" edge is over PQ (which RuView doesn't use), and there's no measured bottleneck it relieves. It becomes interesting only on the ESP32 (where a tiny, streaming, retrain-free quantiser might be the *only* thing that fits) — i.e. it's a possible ingredient of the ADR-095/096 line, not of the server-side index. + +### A7. Agentic verification / "show your work" + +A genuine 2026 research theme: eval harnesses, trajectory judging, reproducibility gates, "the agent must produce a verifiable artifact." 
RuView is, unusually, *ahead* here: + +- **ADR-028 deterministic-pipeline proof** — `archive/v1/data/proof/verify.py` feeds a seeded reference signal through the production pipeline and SHA-256-hashes the output; `expected_features.sha256` pins it; `verify-pipeline.yml` re-runs it on every PR (twice, for determinism). This is a "tamper-evident agent output" by construction. +- **Release witness bundle** — `scripts/generate-witness-bundle.sh` → a recipient-verifiable `VERIFY.sh` packet (witness log + proof + test results + firmware hashes + crate versions). +- **Fix-marker regression guard** (new, merged PR #526) — `scripts/fix-markers.json` + `scripts/check_fix_markers.py` + the `Fix-Marker Regression Guard` workflow: every shipped fix asserts its own continued presence; reverting one fails CI; intentional removal forces a manifest diff (= the audit trail). +- **`firmware-ci.yml` version-guard** (new) — a release tag can't ship a binary whose `version.txt` doesn't match the tag (the #505 lesson, automated). + +**Action:** write this up as a reusable pattern (it generalises well beyond RuView — it's basically "ReasoningBank verdicts, but for a repo"). A `docs/` note or a public gist; possibly an ADR ("verification artifacts as a project contract"). The `ruflo-core:witness` plugin is the cross-project version of the same idea — worth a cross-reference. + +--- + +## PART B — RF / WiFi-sensing SOTA (related research) + +### B1. 
WiFi-sensing foundation models & self-supervised CSI — *the big one* + +The dominant 2025→2026 result: **masked autoencoding (MAE) pre-training on large, heterogeneous CSI pools beats supervised baselines on cross-domain tasks, and the bottleneck is data breadth, not model size.** + +- *"Scale What Counts, Mask What Matters: Evaluating Foundation Models for Zero-Shot Cross-Domain Wi-Fi Sensing"* (arXiv 2511.18792) — pre-trains/evaluates across **14 datasets, >1.3 M CSI samples, 4 device types, 2.4/5/6 GHz**; finds **log-linear** cross-domain gains with pre-training data (+2.2 % to +15.7 % over supervised), **marginal** gains from bigger models. Tasks: activity, gesture, user-ID. +- **AM-FM** (2026) — billed as the first true WiFi foundation model: **~9.2 M samples, ~20 device types**. +- *"A Tutorial-cum-Survey on Self-Supervised Learning for Wi-Fi Sensing"* (arXiv 2506.12052) and the ACM TOSN evaluation (DOI 10.1145/3715130) — MAE is the consistently strong SSL choice for CSI. +- **CIG-MAE** (arXiv 2512.04723) — dual-stream MAE reconstructing **both amplitude and phase**, with information-guided masking (mask the high-info regions). Phase reconstruction is now competitive — historically the hard part. +- **CIR–CSI consistency** (arXiv 2502.11965), **WiFo-CF** (arXiv 2508.04068) — channel/CSI-feedback foundation models from the comms side; relevant as architecture priors and for the multi-link / MIMO framing. + +**RuView mapping.** This *is* the MERIDIAN program (ADR-027) — and RuView already has the pieces: +- Data plumbing: `scripts/collect-training-data.py`, `scripts/collect-ground-truth.py`, overnight CSI recordings, MM-Fi + Wi-Pose ingestion (ADR-015), the `data/recordings/` corpus. +- Training graph: `wifi-densepose-train` (ADR-016, ruvector-integrated) — a place to bolt an MAE pre-text head on. +- Embedding: AETHER contrastive head (ADR-024) — natural fine-tune target after MAE pre-train. 
+- Compression: `CompressedCsiBuffer` (`dataset.rs`, ruvector-temporal-tensor) — already streams CSI history; a sensible substrate for masked-token pre-training. + +**Concrete plan (this is the recommendation):** ADR-027 should become "**heterogeneous-CSI MAE pre-train (amplitude+phase, CIG-MAE-style) → small task head**", with the explicit thesis *data breadth > pose-net capacity*. Phase 1: pool every CSI source RuView can reach (own recordings, MM-Fi, Wi-Pose, public CSI datasets, multi-band virtual subcarriers from `multiband.rs`) and run an MAE pre-train. Phase 2: fine-tune the 17-keypoint head + AETHER embedding on top. Phase 3: ship the encoder; the per-room work becomes adaptation (B2), not retraining. + +### B2. Source-free / unsupervised domain adaptation + +- **MU-SHOT-Fi** (arXiv 2605.01369) — self-supervised *multi-user* Wi-Fi sensing with **source-free** unsupervised domain adaptation: adapt with target stream only, no source data, no labels. +- **Wi-SFDAGR** — WiFi cross-domain gesture recognition via source-free domain adaptation (IEEE). +- *Self-supervised WiFi-based identity recognition in multi-user smart environments* (PMC12115556) — relevant for AETHER re-ID across rooms. + +**RuView mapping.** This is the *missing action* behind `ruvsense/coherence_gate.rs::Recalibrate` and the recalibration recommendations in `coherence.rs` (`RECOMMEND_RECAL` quality flag, already on the wire in `rv_feature_state.h`). Today the gate detects environment drift and degrades; SFDA lets it *fix* itself: a bounded MicroLoRA-delta adaptation pass on the temporal head, EWC++-regularised, triggered by the coherence z-score, logged via the witness chain. Multi-user SFDA (MU-SHOT-Fi) is directly relevant because RuView's whole point is multi-person (the `DynamicPersonMatcher`, the COCO-17 multi-track output). + +### B3. 
The DensePose-from-WiFi lineage — where RuView sits + +Origin: **CMU's "DensePose From WiFi"** (Geng et al., arXiv 2301.00250, building on the 2022 RI thesis CMU-RI-TR-22-59) — UV-coordinate dense pose from CSI using 3×3 MIMO commercial NICs (Intel 5300 / Atheros). The honest gap (well-documented in RuView's own issues #506/#509): that work relies on rich multi-antenna spatial resolution; ESP32 is 1×1 SISO. RuView's bet is to recover spatial diversity *across nodes* (4–6 ESP32, TDM, multistatic attention-weighted fusion, 168 virtual subcarriers via 3 channels × 56) rather than within one rich NIC — plus a Rust pipeline at sub-50 ms (~800× over the original Python) and SONA on-device adaptation. The foundation-model results (B1) are what make the ESP32 path plausible: a strong pre-trained CSI encoder lowers how much the noisy SISO multistatic input has to carry. + +### B4. Multistatic / distributed RF sensing & multi-band fusion + +Active area, and RuView's `ruvsense/` is a fairly complete implementation of it: multi-band frame fusion + cross-channel coherence (`multiband.rs`), iterative LO phase-offset estimation (`phase_align.rs`), attention-weighted multistatic fusion with geometric diversity (`multistatic.rs`), RF tomography with an ISTA L1 solver (`tomography.rs`), the persistent room-eigenstructure field model (`field_model.rs`, ADR-030), cross-viewpoint attention with geometric bias and Cramér-Rao / Fisher-information bounds (`ruvector/src/viewpoint/`). The SOTA reading is mostly: the geometry-aware-attention + information-bound framing RuView already uses is the right one; the gaps are (a) a cleaner statistical fault model (→ A4, ADR-032) and (b) tying the field model to the foundation-model encoder (a pre-trained encoder + a per-room eigenstructure prior is a strong combo). + +### B5. 
mmWave + WiFi fusion, vital signs + +ADR-063 (mmWave sensor fusion, ESP32-C6 + Seeed MR60BHA2 over UART, 60 GHz FMCW) and ADR-021 (ESP32 CSI-grade vital-sign extraction, the `wifi-densepose-vitals` 4-stage pipeline) put RuView in the multimodal-vitals SOTA. The literature trend: WiFi gives coarse presence/macro-motion + room-scale coverage; 60 GHz FMCW gives precise HR/BR but narrow FOV; fusion (Kalman / attention) beats either. RuView's `mmwave_fusion_bridge.py` + the `0xC5110004` fused-vitals packet are the implementation. Watch: the cardiac/respiration-from-CSI-alone work keeps improving (RuView already does breathing/HR from CSI in `breathing.rs` / `bvp.rs`) — a good MAE pre-train (B1) should help here too since respiration is a periodic-structure problem. + +### B6. Adversarial robustness & privacy + +`ruvsense/adversarial.rs` (physically-impossible-signal detection, multi-link consistency) + ADR-032 (multistatic mesh security) are RuView's stake. The 2026 framing: (a) **spoofing/jamming** as Byzantine faults in a sensor swarm (→ A4) with a stated adversary model; (b) **privacy** — WiFi sensing is "camera-free" but still biometric (gait, breathing, re-ID embeddings are PII); the project already gestures at this (privacy logs in the ADR-084 RaBitQ-sensor framing), and the broader move is toward on-device-only processing + differential-privacy on any exported embedding. The `aidefence` / PII-detection surface in the ruflo toolchain is the dev-side analogue. + +### B7. Through-wall / NLOS, stand-off radar tier + +ADR-091 (stand-off radar tier research) and the single-sided through-wall thread (issue #424) sit here. SOTA: through-wall pose/activity is real but resolution-limited; multistatic helps (more look-angles see around the wall differently); the foundation-model encoders (B1) are starting to include NLOS data in their pre-training pools, which is the cleanest path to robustness. 
Mostly a "watch" item for RuView — the multi-node multistatic architecture is already the right substrate. + +--- + +## PART C — Synthesis: what's actionable for RuView + +Prioritised. "Effort" is rough; "horizon" is short (<1 mo) / medium (1–3 mo) / long (>3 mo) / defer. + +| Rank | Action | Impact | Effort | Horizon | New ADR? | Notes | +|---|---|---|---|---|---|---| +| **1** | **MERIDIAN ⇒ heterogeneous-CSI MAE pre-train (amplitude+phase, CIG-MAE-style) → small task head.** Pool all reachable CSI (own recordings + MM-Fi + Wi-Pose + public + multi-band virtual subcarriers), MAE pre-train in `wifi-densepose-train`, fine-tune the 17-kpt + AETHER heads on top. | Very high — this is the cross-room story | High | Long | Re-scope ADR-027; possibly fold ADR-016 | All plumbing exists. Thesis: *data breadth > pose-net capacity* (2511.18792). | +| **2** | **Source-free online adaptation as the action behind `coherence_gate.rs::Recalibrate`.** Bounded MicroLoRA-delta + EWC++ pass on the temporal head, triggered by the coherence z-score, logged via the witness chain. | High — turns a "detect" into a "fix" | Medium | Medium | New ADR (sibling to ADR-027/081/095) | MU-SHOT-Fi / Wi-SFDAGR. Multi-user variant matters (RuView is multi-person). | +| **3** | **ADR: "firmware as a tiered autonomous agent."** Document & unify ADR-040/081/086/095/096 under the ESP-Claw-style agent model (Tier-0 rules → Tier-1 WASM skills → Tier-2 on-device temporal head → Tier-3 cloud), with a memory budget and a supervisor contract. | Medium — clarity + a literature to lean on | Low | Short | Yes (mostly documentation) | Cheap; high coherence value. | +| **4** | **Track ESP32-P4 as a hardware tier.** Feasibility note: 32 MB PSRAM vs. the S3's ~200 KB free heap changes what the temporal head / WASM tier / a bigger MAE-derived head can be. | Medium — unblocks the on-device-model ceiling | Low | Short | Note → ADR if pursued | Espressif's AI-focused part; S3 vector ISA stays the floor. 
| +| **5** | **Write up the verification-artifact pattern** (ADR-028 proof + witness bundle + fix-marker guard + version-guard) as a reusable reference; cross-link `ruflo-core:witness`. Keep growing `fix-markers.json`. | Medium — reuse beyond RuView; protects long-horizon dev | Low | Short | Optional ADR | RuView is ahead of the field here; make it legible. | +| **6** | **ADR: firmware mesh as a BFT sensor swarm** — explicit role-auction + shared-field-model-as-CRDT + a stated adversary model. Unifies ADR-029/030/032/081/083. | Medium — coherence; sets up ADR-032 properly | Medium | Medium | Yes | Lean on the agentic-systems / Byzantine-sensor-fusion literature. | +| **7** | **Phase-aware features end-to-end.** RuView discards most phase after `phase_align.rs`/`coherence.rs`; CIG-MAE shows phase reconstruction is now SOTA-competitive. Carry an amplitude+phase representation into the embedding. | Medium | Medium | Medium | Folds into #1 | Likely just falls out of #1 if the MAE is dual-stream. | +| **8** | **CoDEQ — DEFER, with a stub.** Add a one-paragraph "alternatives considered" to ADR-017: streaming/drift-tolerant coarse quant; no current bottleneck; revisit only if (a) on-ESP32 adaptive quant becomes a goal or (b) HNSW build/rebuild shows up in a profile as the mesh scales. | Low (now) | Trivial | Defer | Stub in ADR-017 | RaBitQ+HNSW is the right stack today; CoDEQ's "no k-means" edge is over PQ, which RuView doesn't use. | + +**If you do one thing:** #1. If you do two: #1 + #2 (they're complementary — pre-train for breadth, adapt for the room). #3/#4/#5 are cheap and worth slipstreaming. #6 is housekeeping that pays off when ADR-032 gets serious. #7 is probably free given #1. #8 is "don't lose the idea." + +--- + +## PART D — Watch list / open questions + +- **AM-FM and the next WiFi foundation models** — when one ships with permissive weights + a CSI tokeniser RuView can adopt, #1 gets dramatically cheaper. Watch arXiv / Hugging Face. 
+- **Phase-faithful CSI capture on ESP32** — how much usable phase does the S3 actually deliver under the MGMT-only promiscuous regime (#396)? Worth a measurement; gates how much of CIG-MAE's amplitude+phase advantage RuView can realise on cheap hardware. +- **On-device MAE-derived heads** — is a distilled/quantised MAE encoder small enough for the S3 (or does it need the P4)? Determines whether the foundation model lives only server-side or also on the node. +- **LOCOMO-style eval for the dev-loop memory** — does ReasoningBank recall actually surface the right ADR/pattern? Cheap to measure; would catch memory rot. +- **Byzantine fault model for the mesh** — pin down the adversary (spoofed node? jammed link? compromised firmware? replay?) before ADR-032 implementation, not after. +- **Differential privacy on exported embeddings** — AETHER re-ID embeddings are biometric; if any leave the box (multi-room hand-off, cloud tier), what's the DP budget? +- **CoDEQ revisit trigger** — only if a profile shows HNSW build/rebuild as a bottleneck, or if on-ESP32 adaptive quant becomes a stated goal. +- **ESP-Claw / on-MCU agent frameworks** — track Espressif's releases; the skill-registry / capability-policy ergonomics are directly stealable for the WASM tier (ADR-040). + +--- + +## PART E — References + +WiFi / RF sensing: +- DensePose From WiFi — Geng et al., arXiv [2301.00250](https://arxiv.org/abs/2301.00250); CMU RI thesis CMU-RI-TR-22-59. +- Scale What Counts, Mask What Matters: Evaluating Foundation Models for Zero-Shot Cross-Domain Wi-Fi Sensing — arXiv [2511.18792](https://arxiv.org/html/2511.18792). +- CIG-MAE: Cross-Modal Information-Guided Masked Autoencoder for Self-Supervised WiFi Sensing — arXiv [2512.04723](https://arxiv.org/html/2512.04723v1). +- MU-SHOT-Fi: Self-Supervised Multi-User Wi-Fi Sensing with Source-free Unsupervised Domain Adaptation — arXiv [2605.01369](https://arxiv.org/html/2605.01369). 
+- A Tutorial-cum-Survey on Self-Supervised Learning for Wi-Fi Sensing — arXiv [2506.12052](https://arxiv.org/html/2506.12052). +- Evaluating Self-Supervised Learning for WiFi CSI-Based Human Activity Recognition — ACM TOSN, [10.1145/3715130](https://dl.acm.org/doi/10.1145/3715130). +- A MIMO Wireless Channel Foundation Model via CIR–CSI Consistency — arXiv [2502.11965](https://arxiv.org/html/2502.11965). +- WiFo-CF: Wireless Foundation Model for CSI Feedback — arXiv [2508.04068](https://arxiv.org/pdf/2508.04068). +- Wi-SFDAGR: WiFi-Based Cross-Domain Gesture Recognition via Source-Free Domain Adaptation — IEEE (DOI per IEEE Xplore listing). +- Self-Supervised WiFi-Based Identity Recognition in Multi-User Smart Environments — PMC [PMC12115556](https://pmc.ncbi.nlm.nih.gov/articles/PMC12115556/). +- (project context) RuView issues #68 (MERIDIAN/ADR-027), #506, #509, #424. + +Agentic AI / memory / edge: +- Agentic Artificial Intelligence (AI): Architectures, Taxonomies, and Evaluation of LLM Agents — arXiv [2601.12560](https://arxiv.org/html/2601.12560v1). +- MemAgents: Memory for LLM-Based Agentic Systems — ICLR-2026 workshop proposal, OpenReview [U51WxL382H](https://openreview.net/pdf?id=U51WxL382H). +- Memory in the Age of AI Agents: A Survey — paper list: github.com/Shichun-Liu/Agent-Memory-Paper-List. +- 2026 agent papers collection — github.com/VoltAgent/awesome-ai-agent-papers. +- State of AI Agent Memory 2026 — mem0.ai/blog/state-of-ai-agent-memory-2026. +- LOCOMO benchmark (long-term conversational memory) — see the mem0 and agent-memory survey references. +- METR — measuring AI ability to complete long tasks (task-duration doubling). +- ESP-Claw / on-MCU AI agents — Espressif; xda-developers coverage; ESP32-P4 / ESP32-S3 vector ISA datasheets. +- (project) ReasoningBank — the trajectory verdict→distil→consolidate loop RuView/ruflo implements; `reasoningbank-*` skills. 
+ +Retrieval / quantisation: +- RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound — arXiv [2405.12497](https://arxiv.org/abs/2405.12497); Extended-RaBitQ follow-ups. +- CoDEQ: streaming vector quantisation (frozen kd-tree + Welford leaf centroids) — arXiv 2512.18335; in `ruvector`. Gist: https://gist.github.com/ruvnet/d10fe656bd0fa68b4eb873ad299c6d4e. + +RuView internal (for the mapping): +- ADRs: 014, 015, 016, 017, 021, 022, 024, 027, 028, 029, 030, 031, 032, 039, 040, 045, 060, 061, 062, 063, 069, 080–086, 089–096 — see `docs/adr/`. +- Crates: `wifi-densepose-{core,signal,nn,train,mat,hardware,ruvector,api,db,config,wasm,cli,sensing-server,wifiscan,vitals}`, `nvsim` — see project `CLAUDE.md`. +- Modules: `ruvsense/*` (signal crate), `viewpoint/*` (ruvector crate); firmware `main/*.c`. +- Verification artifacts: `archive/v1/data/proof/verify.py`, `scripts/generate-witness-bundle.sh`, `scripts/check_fix_markers.py` + `scripts/fix-markers.json`, `.github/workflows/{verify-pipeline,firmware-ci,fix-regression-guard}.yml`. +- Related surveys in this repo: `docs/research/sota/2026-Q2-rf-sensing-and-edge-rust.md`, `docs/research/sota-surveys/{wifi-sensing-ruvector-sota-2026,ruview-multistatic-fidelity-sota-2026,sota-wifi-sensing-2025}.md`, `docs/research/rf-topological-sensing/*`. + +--- + +*Generated by Claude Code (research pass), 2026-05-11. 
Treat as input to ADR discussions, not as decisions.* diff --git a/scripts/pretrain-mae-gcloud.sh b/scripts/pretrain-mae-gcloud.sh new file mode 100644 index 0000000000..b9b44df1ce --- /dev/null +++ b/scripts/pretrain-mae-gcloud.sh @@ -0,0 +1,162 @@ +#!/bin/bash +# ============================================================================== +# GCloud GPU driver for the MERIDIAN CSI masked-autoencoder pre-train (ADR-027 §2.0) +# ============================================================================== +# +# Creates a GCloud VM with a GPU, builds wifi-densepose-train with the +# `tch-backend` (+ `cuda`) feature, runs the `pretrain-mae` binary, downloads +# the pre-trained variable store (`.ot`), and tears the VM down. +# +# STATUS: prototype wiring stub (ADR-027 §2.0, iteration 3). The `pretrain-mae` +# binary currently drives the *deterministic SyntheticCsiDataset* — that's the +# end-to-end smoke path. The real heterogeneous-CSI pre-train (MM-Fi + Wi-Pose + +# data/recordings/ + multi-band virtual sub-carriers) needs the ingest pipeline +# tracked in ADR-027 §2.0 "Iteration 3 plan"; the TODO markers below show where +# it plugs in. This script is intentionally a thin, reviewable shell of the real +# gcloud-train.sh (which it mirrors) — it has NOT been run. 
+# +# Usage: +# bash scripts/pretrain-mae-gcloud.sh [OPTIONS] +# +# Options: +# --gpu l4|a100|h100 GPU type (default: l4) +# --zone ZONE GCloud zone (default: us-central1-a) +# --hours N Max VM lifetime in hours (default: 3) +# --epochs N Pre-train epochs (default: 20) +# --samples N Synthetic samples (until the real ingest lands) (default: 4096) +# --batch N Mini-batch size (default: 64) +# --mask-ratio R Token mask ratio (default: 0.75) +# --lr R Adam learning rate (default: 1e-3) +# --out FILE Local path for the downloaded .ot (default: data/models/mae-pretrained.ot) +# --data-dir DIR (future) heterogeneous CSI corpus to upload — see TODO below +# --dry-run Build + run a tiny pre-train locally with synthetic data; no VM +# --keep-vm Do not delete the VM after the run +# --instance NAME Custom VM instance name +# +# Prerequisites (same as gcloud-train.sh): +# - gcloud CLI authenticated: gcloud auth login +# - Project set: gcloud config set project cognitum-20260110 +# - GPU quota in the chosen zone +# +# Cost (same envelope as gcloud-train.sh): +# L4 ~$0.80/hr (prototyping) · A100 40GB ~$3.60/hr (full pre-train) · H100 80GB ~$11/hr +# ============================================================================== + +set -euo pipefail + +# ── Defaults ────────────────────────────────────────────────────────────────── +PROJECT="cognitum-20260110" +GPU_TYPE="l4" +ZONE="us-central1-a" +HOURS=3 +EPOCHS=20 +SAMPLES=4096 +BATCH=64 +MASK_RATIO=0.75 +LR="1e-3" +OUT="data/models/mae-pretrained.ot" +DATA_DIR="" +DRY_RUN=0 +KEEP_VM=0 +INSTANCE="meridian-mae-$(date +%s)" + +# ── Arg parse ───────────────────────────────────────────────────────────────── +while [[ $# -gt 0 ]]; do + case "$1" in + --gpu) GPU_TYPE="$2"; shift 2;; + --zone) ZONE="$2"; shift 2;; + --hours) HOURS="$2"; shift 2;; + --epochs) EPOCHS="$2"; shift 2;; + --samples) SAMPLES="$2"; shift 2;; + --batch) BATCH="$2"; shift 2;; + --mask-ratio) MASK_RATIO="$2"; shift 2;; + --lr) LR="$2"; shift 2;; + --out) 
OUT="$2"; shift 2;; + --data-dir) DATA_DIR="$2"; shift 2;; + --dry-run) DRY_RUN=1; shift;; + --keep-vm) KEEP_VM=1; shift;; + --instance) INSTANCE="$2"; shift 2;; + -h|--help) sed -n '2,46p' "$0"; exit 0;; + *) echo "unknown option: $1" >&2; exit 2;; + esac +done + +case "$GPU_TYPE" in + l4) ACCEL="type=nvidia-l4,count=1"; MACHINE="g2-standard-8";; + a100) ACCEL="type=nvidia-tesla-a100,count=1"; MACHINE="a2-highgpu-1g";; + h100) ACCEL="type=nvidia-h100-80gb,count=1"; MACHINE="a3-highgpu-1g";; + *) echo "unknown --gpu: $GPU_TYPE (l4|a100|h100)" >&2; exit 2;; +esac + +PRETRAIN_ARGS="--epochs $EPOCHS --samples $SAMPLES --batch $BATCH --mask-ratio $MASK_RATIO --lr $LR --save mae-pretrained.ot" + +# ── Dry run: build + tiny pre-train locally (synthetic data), no VM ─────────── +if [[ "$DRY_RUN" -eq 1 ]]; then + echo "[dry-run] cargo run -p wifi-densepose-train --features tch-backend --bin pretrain-mae -- --epochs 2 --samples 64 --batch 8" + echo "[dry-run] (requires LibTorch — set LIBTORCH or use a tch download-libtorch feature build)" + cd "$(dirname "$0")/../v2" + cargo run -p wifi-densepose-train --features tch-backend --bin pretrain-mae -- --epochs 2 --samples 64 --batch 8 + exit 0 +fi + +# ── Provision VM ────────────────────────────────────────────────────────────── +echo "==> Project: $PROJECT Zone: $ZONE GPU: $GPU_TYPE Machine: $MACHINE Instance: $INSTANCE" +gcloud config set project "$PROJECT" >/dev/null +gcloud compute instances create "$INSTANCE" \ + --zone="$ZONE" --machine-type="$MACHINE" \ + --accelerator="$ACCEL" --maintenance-policy=TERMINATE \ + --image-family=pytorch-latest-gpu --image-project=deeplearning-platform-release \ + --boot-disk-size=128GB --metadata="install-nvidia-driver=True" \ + --max-run-duration="${HOURS}h" --instance-termination-action=DELETE + +cleanup() { + if [[ "$KEEP_VM" -eq 0 ]]; then + echo "==> Deleting VM $INSTANCE" + gcloud compute instances delete "$INSTANCE" --zone="$ZONE" --quiet || true + else + echo "==> --keep-vm set; VM 
$INSTANCE left running (remember to delete it)." + fi +} +trap cleanup EXIT + +run_remote() { gcloud compute ssh "$INSTANCE" --zone="$ZONE" --command="$1"; } + +echo "==> Waiting for SSH..." +for _ in $(seq 1 30); do run_remote "true" 2>/dev/null && break; sleep 10; done + +echo "==> Provisioning toolchain on the VM" +run_remote 'set -e + curl --proto "=https" --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y + source "$HOME/.cargo/env" + # The pytorch-latest-gpu image ships libtorch; point tch at it. + TORCH_DIR="$(python -c "import torch,os;print(os.path.dirname(torch.__file__))")" + echo "export LIBTORCH=$TORCH_DIR" >> "$HOME/.bashrc" + echo "export LD_LIBRARY_PATH=$TORCH_DIR/lib:\$LD_LIBRARY_PATH" >> "$HOME/.bashrc" + sudo apt-get update -qq && sudo apt-get install -y -qq git build-essential pkg-config' + +echo "==> Uploading repo" +# Copy the repo subtrees with scp. Paths are resolved relative to this script so +# the upload works from any working directory; unlike rsync, scp does not exclude +# build artifacts (for example a local target/ directory). +REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)" +run_remote "mkdir -p ~/ruview" +gcloud compute scp --recurse --zone="$ZONE" \ + "$REPO_ROOT/v2" "$REPO_ROOT/scripts" "$REPO_ROOT/docs" "$INSTANCE":~/ruview/ >/dev/null + +# TODO (ADR-027 §2.0, iter 3 ingest): when --data-dir is given, upload the +# heterogeneous CSI corpus and point pretrain-mae at it instead of the synthetic +# dataset (needs a `--data-dir`/`--datasets` flag on the bin first; see the plan). 
+if [[ -n "$DATA_DIR" ]]; then + echo "==> Uploading CSI corpus from $DATA_DIR" + gcloud compute scp --recurse --zone="$ZONE" "$DATA_DIR" "$INSTANCE":~/ruview/csi-corpus/ >/dev/null + PRETRAIN_ARGS="$PRETRAIN_ARGS # TODO: --data-dir ~/ruview/csi-corpus" +fi + +echo "==> Building + running pre-train on the VM" +run_remote "set -e; source \$HOME/.cargo/env; source \$HOME/.bashrc + cd ~/ruview/v2 + cargo build --release -p wifi-densepose-train --features tch-backend,cuda + cargo run --release -p wifi-densepose-train --features tch-backend,cuda --bin pretrain-mae -- $PRETRAIN_ARGS" + +echo "==> Downloading pre-trained variable store → $OUT" +mkdir -p "$(dirname "$OUT")" +gcloud compute scp --zone="$ZONE" "$INSTANCE":~/ruview/v2/mae-pretrained.ot "$OUT" + +echo "==> Done. Pre-trained encoder: $OUT" +echo " Next: fine-tune the ADR-027 §2.x heads on top of it (see §2.0 'Iteration 3 plan')." diff --git a/v2/crates/wifi-densepose-train/Cargo.toml b/v2/crates/wifi-densepose-train/Cargo.toml index ac0fa37d86..323f33fa1a 100644 --- a/v2/crates/wifi-densepose-train/Cargo.toml +++ b/v2/crates/wifi-densepose-train/Cargo.toml @@ -20,6 +20,11 @@ name = "verify-training" path = "src/bin/verify_training.rs" required-features = ["tch-backend"] +[[bin]] +name = "pretrain-mae" +path = "src/bin/pretrain_mae.rs" +required-features = ["tch-backend"] + [features] default = [] tch-backend = ["tch"] diff --git a/v2/crates/wifi-densepose-train/README.md b/v2/crates/wifi-densepose-train/README.md index 4610f7b072..d8f620c0d8 100644 --- a/v2/crates/wifi-densepose-train/README.md +++ b/v2/crates/wifi-densepose-train/README.md @@ -82,6 +82,24 @@ wifi-densepose-train/src/ trainer.rs -- (tch) Training loop orchestrator [feature-gated] ``` +## MERIDIAN-MAE — masked-autoencoder pre-training (ADR-027 §2.0) + +The `csi_mae` module implements a CIG-MAE-style **dual-stream (amplitude + phase)** masked +autoencoder for cross-domain CSI pre-training. 
The thesis (2026-Q2 SOTA survey, arXiv:2511.18792): +cross-room generalisation is a *data-breadth* problem — pre-train one CSI encoder on heterogeneous +capture, attach a small task head — not a bigger-pose-net problem. + +* Pure-Rust (always built): `MaeConfig`, `MaskStrategy` (`Random` / `InfoGuided` — the latter + variance-weights token selection so high-information tokens are masked), `TokenLayout`, + `mask_csi_window`, `reassemble_tokens`. Dependency-free deterministic masking. +* `csi_mae::model` (feature `tch-backend`): `CsiMae` (encoder over visible tokens → latent → + decoder reconstructs masked amplitude+phase), `reconstruction_loss`, `MaeBatch`, `pretrain_step`. +* Driver: `cargo run -p wifi-densepose-train --features tch-backend --bin pretrain-mae -- --epochs 5` + (synthetic data). GPU run: `bash scripts/pretrain-mae-gcloud.sh` (prototype wiring stub). + +See `docs/adr/ADR-027-cross-environment-domain-generalization.md` §2.0 for the full plan +(heterogeneous-CSI ingest, GPU pre-train, fine-tune handoff, cross-domain eval). + ## Related Crates | Crate | Role | diff --git a/v2/crates/wifi-densepose-train/src/bin/pretrain_mae.rs b/v2/crates/wifi-densepose-train/src/bin/pretrain_mae.rs new file mode 100644 index 0000000000..8e3b370c20 --- /dev/null +++ b/v2/crates/wifi-densepose-train/src/bin/pretrain_mae.rs @@ -0,0 +1,108 @@ +//! `pretrain-mae` — drive the MERIDIAN CSI masked-autoencoder pre-train on a +//! deterministic `SyntheticCsiDataset` (ADR-027 §2.0, prototype iteration 2). +//! +//! This is the *prototype* driver — it exercises the full pre-train loop +//! (mask → encode visible → reconstruct masked amplitude+phase → optimiser +//! step) end-to-end on synthetic CSI. Real cross-domain pre-training (iter 3+) +//! ingests heterogeneous capture — MM-Fi / Wi-Pose / `data/recordings/` / +//! multi-band virtual sub-carriers — and runs on GPU (`scripts/gcloud-train.sh` +//! / the cognitum project). +//! +//! ```text +//! 
cargo run -p wifi-densepose-train --features tch-backend --bin pretrain-mae -- --epochs 5 +//! ``` +//! +//! Only compiled with `--features tch-backend` (see Cargo.toml `required-features`). + +use clap::Parser; +use tch::nn::OptimizerConfig; +use tch::{nn, Device}; + +use wifi_densepose_train::csi_mae::model::{pretrain_step, CsiMae, MaeBatch}; +use wifi_densepose_train::csi_mae::{MaeConfig, MaskStrategy, TokenLayout}; +use wifi_densepose_train::dataset::{CsiDataset, SyntheticConfig, SyntheticCsiDataset}; + +/// MERIDIAN CSI masked-autoencoder pre-train (prototype, synthetic data). +#[derive(Parser, Debug)] +#[command(name = "pretrain-mae", version, about)] +struct Cli { + /// Number of epochs over the synthetic dataset. + #[arg(long, default_value_t = 5)] + epochs: usize, + /// Mini-batch size (windows per optimiser step). + #[arg(long, default_value_t = 8)] + batch: usize, + /// Number of synthetic samples to generate. + #[arg(long, default_value_t = 256)] + samples: usize, + /// Adam learning rate. + #[arg(long, default_value_t = 1e-3)] + lr: f64, + /// Fraction of tokens masked per window. + #[arg(long, default_value_t = 0.75)] + mask_ratio: f64, + /// Optional path to save the pre-trained variable store (`.ot`). 
+ #[arg(long)] + save: Option<String>, +} + +fn main() -> anyhow::Result<()> { + let cli = Cli::parse(); + let _ = tracing_subscriber::fmt::try_init(); + + let ds = SyntheticCsiDataset::new(cli.samples, SyntheticConfig::default()); + if ds.len() < cli.batch { + anyhow::bail!("need at least --batch ({}) samples, have {}", cli.batch, ds.len()); + } + let s0 = ds.get(0)?; + let layout = TokenLayout::from_window(s0.amplitude.view()); + let n_tokens = layout.n_tokens as i64; + + let mut cfg = MaeConfig::default(); + cfg.token_dim = layout.token_dim; + cfg.mask_ratio = cli.mask_ratio; + cfg.validate().map_err(anyhow::Error::msg)?; + + let device = Device::cuda_if_available(); + let vs = nn::VarStore::new(device); + let model = CsiMae::new(&vs.root(), &cfg, n_tokens); + let mut opt = nn::Adam::default().build(&vs, cli.lr)?; + + println!( + "pretrain-mae: device={device:?} n_tokens={n_tokens} token_dim={} V={} M={} samples={} batch={} epochs={} lr={} mask_ratio={}", + cfg.token_dim, model.n_visible, model.n_masked, cli.samples, cli.batch, cli.epochs, cli.lr, cli.mask_ratio + ); + + let mut step: u64 = 0; + for epoch in 0..cli.epochs { + let mut epoch_loss = 0.0_f64; + let mut nb = 0_usize; + let mut i = 0_usize; + while i + cli.batch <= ds.len() { + let mut windows = Vec::with_capacity(cli.batch); + for j in i..i + cli.batch { + let s = ds.get(j)?; + windows.push((s.amplitude, s.phase)); + } + let seed = step.wrapping_mul(0x9E37_79B9_7F4A_7C15) ^ 0xC511_0027; + let batch = MaeBatch::from_windows(&windows, &cfg, seed, MaskStrategy::InfoGuided, device) + .map_err(anyhow::Error::msg)?; + let loss = pretrain_step(&model, &mut opt, &batch); + if !loss.is_finite() { + anyhow::bail!("non-finite loss at epoch {epoch} step {step}"); + } + epoch_loss += loss; + nb += 1; + step += 1; + i += cli.batch; + } + let avg = if nb > 0 { epoch_loss / nb as f64 } else { f64::NAN }; + println!("epoch {epoch}: avg reconstruction loss = {avg:.6} ({nb} batches)"); + } + + if let Some(path) = cli.save { + 
vs.save(&path)?; + println!("saved pre-trained variable store → {path}"); + } + Ok(()) +} diff --git a/v2/crates/wifi-densepose-train/src/csi_mae.rs b/v2/crates/wifi-densepose-train/src/csi_mae.rs new file mode 100644 index 0000000000..554fb0a605 --- /dev/null +++ b/v2/crates/wifi-densepose-train/src/csi_mae.rs @@ -0,0 +1,1047 @@ +//! Masked-autoencoder pre-training for cross-domain CSI — **MERIDIAN-MAE** (ADR-027 §2.0). +//! +//! Implements a [CIG-MAE]-style **dual-stream** (amplitude + phase) masked +//! autoencoder over CSI "channel-snapshot" tokens. The pre-train objective is: +//! hide a large fraction of the tokens, encode only the visible ones, and +//! reconstruct the hidden amplitude *and* phase. The thesis (from the 2026-Q2 +//! SOTA survey, `docs/research/sota/2026-Q2-agentic-ai-and-edge-for-ruview.md`): +//! cross-room generalisation is a **data-breadth** problem — pre-train one CSI +//! encoder on heterogeneous capture, then attach a small task head — not a +//! bigger-pose-net problem. +//! +//! # Token convention +//! +//! A CSI window `amplitude: [T, tx, rx, sub]` is flattened to a sequence of +//! `N = T·tx·rx` tokens, each a `sub`-dimensional vector (one *channel +//! snapshot*). This matches the `[B, T·tx·rx, sub]` layout the supervised model +//! already consumes (see `model.rs::ModalityTranslator`). Amplitude and phase +//! share the same `[N, sub]` token grid, so a single mask applies to both +//! streams — exactly the dual-stream setup CIG-MAE uses. +//! +//! # What's in this module +//! +//! * **Pure Rust** (always compiled, covered by `cargo test --no-default-features`): +//! [`MaeConfig`] (+ `validate`), [`MaskStrategy`], [`TokenLayout`], the +//! deterministic masking ([`mask_csi_window`]) and re-assembly +//! ([`reassemble_tokens`]). A tiny inline PRNG keeps masking reproducible with +//! no extra dependency. +//! * **`#[cfg(feature = "tch-backend")]`** — the `model` submodule: the +//! 
encoder/decoder networks, the reconstruction loss, and the pre-train step. +//! That code is *not* exercised by the default workspace test job; treat +//! compile-checking it as requiring a LibTorch toolchain. +//! +//! # Status +//! +//! Prototype. **iter 1**: masking pipeline + config + tests + ADR §2.0. +//! **iter 2a**: information-guided masking ([`MaskStrategy::InfoGuided`]). +//! **iter 2b**: the [`model`] submodule — `CsiMae` (MLP-based v0 dual-stream +//! encoder/decoder, batch-shared masking), `reconstruction_loss`, `MaeBatch`, +//! `pretrain_step`, plus the `pretrain-mae` binary (`bin/pretrain_mae.rs`, +//! `--features tch-backend`). **iter 3+** (see ADR-027 §2.0 "Iteration 3 plan" +//! and `scripts/pretrain-mae-gcloud.sh`): heterogeneous-CSI ingest, the real +//! GPU pre-train run, per-sample masking + self-attention transformer blocks +//! (lifting the v0 limits), and the fine-tune handoff into the §2.x heads. +//! +//! [CIG-MAE]: https://arxiv.org/html/2512.04723v1 + +use ndarray::{Array2, ArrayView4}; +use serde::{Deserialize, Serialize}; + +use crate::error::ConfigError; + +// --------------------------------------------------------------------------- +// PRNG — tiny, dependency-free, deterministic. (SplitMix64.) +// --------------------------------------------------------------------------- + +/// Minimal deterministic PRNG (SplitMix64) used only for reproducible masking. +/// +/// Not cryptographic; the point is that the same `seed` always yields the same +/// token permutation so masked-autoencoder runs are byte-reproducible. +#[derive(Debug, Clone)] +struct SplitMix64(u64); + +impl SplitMix64 { + fn new(seed: u64) -> Self { + // Avoid the degenerate all-zero state. 
+ Self(seed ^ 0x9E37_79B9_7F4A_7C15) + } + fn next_u64(&mut self) -> u64 { + self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15); + let mut z = self.0; + z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9); + z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB); + z ^ (z >> 31) + } + /// Uniform `usize` in `[0, n)` via plain modulo reduction (the bias is negligible for our `n`). + fn below(&mut self, n: usize) -> usize { + deb_assert_nonzero(n); + (self.next_u64() % (n as u64)) as usize + } +} + +#[inline] +fn deb_assert_nonzero(n: usize) { + debug_assert!(n > 0, "SplitMix64::below requires n > 0"); +} + +/// In-place Fisher–Yates shuffle of `xs` using `rng`. +fn shuffle<T>(xs: &mut [T], rng: &mut SplitMix64) { + let n = xs.len(); + if n < 2 { + return; + } + for i in (1..n).rev() { + let j = rng.below(i + 1); + xs.swap(i, j); + } +} + +/// Per-token "information" score used by [`MaskStrategy::InfoGuided`]: the +/// (population) variance of the token's amplitude values plus the variance of +/// its phase values. Near-constant tokens (e.g. a quiet sub-carrier slice) score +/// near zero, so they're less likely to be masked; structured tokens score +/// higher. `amp`/`phase` are the flattened `[N, sub]` grids; `i` is the token row. +fn token_information(amp: &Array2<f32>, phase: &Array2<f32>, i: usize) -> f64 { + let var = |row: ndarray::ArrayView1<f32>| -> f64 { + let m = row.len(); + if m == 0 { + return 0.0; + } + let mean = row.iter().map(|&x| x as f64).sum::<f64>() / m as f64; + row.iter().map(|&x| { let d = x as f64 - mean; d * d }).sum::<f64>() / m as f64 + }; + var(amp.row(i)) + var(phase.row(i)) +} + +// --------------------------------------------------------------------------- +// Masking strategy +// --------------------------------------------------------------------------- + +/// How tokens are chosen for masking in the MAE pre-text task. 
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)] +#[serde(rename_all = "snake_case")] +pub enum MaskStrategy { + /// Uniform-random token masking (the MAE default — cheap, strong baseline). + Random, + /// Information-guided masking (CIG-MAE style): preferentially mask + /// high-variance tokens so the model can't trivially in-paint flat regions. + /// + /// Implemented in [`mask_csi_window`] as weighted sampling without replacement, + /// with each token weighted by the variance of its amplitude plus phase values. + InfoGuided, +} + +impl Default for MaskStrategy { + fn default() -> Self { + MaskStrategy::Random + } +} + +// --------------------------------------------------------------------------- +// MaeConfig +// --------------------------------------------------------------------------- + +/// Hyper-parameters for the CSI masked autoencoder. +/// +/// Defaults track the MAE / CIG-MAE recipes (high mask ratio, narrow decoder). +/// Dimensions are deliberately small — this is a prototype encoder, and the +/// survey's finding is that *data breadth*, not model size, is the bottleneck. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct MaeConfig { + /// Fraction of tokens hidden from the encoder, in `(0, 1)`. MAE uses ~0.75. + pub mask_ratio: f64, + /// Masking strategy. + pub mask_strategy: MaskStrategy, + /// Token (sub-carrier) dimension. Must match the dataset after interpolation + /// (the system target is 56 — see `TrainingConfig::num_subcarriers`). + pub token_dim: usize, + /// Encoder embedding dimension. + pub encoder_dim: usize, + /// Number of encoder blocks. The v0 model builds this many residual MLP + /// blocks; self-attention transformer blocks are planned for iteration 3. + pub encoder_depth: usize, + /// Number of encoder attention heads (validated, but unused by the MLP-based v0 model). + pub encoder_heads: usize, + /// Decoder embedding dimension (MAE uses a *narrower* decoder than the encoder). + pub decoder_dim: usize, + /// Number of decoder blocks (residual MLP blocks in the v0 model). 
+ pub decoder_depth: usize, + /// Number of decoder attention heads. + pub decoder_heads: usize, + /// Weight of the phase-reconstruction loss relative to amplitude (CIG-MAE ≈ 1.0). + pub phase_loss_weight: f64, + /// Default RNG seed for masking when a per-call seed isn't supplied. + pub seed: u64, +} + +impl Default for MaeConfig { + fn default() -> Self { + Self { + mask_ratio: 0.75, + mask_strategy: MaskStrategy::Random, + token_dim: 56, + encoder_dim: 128, + encoder_depth: 4, + encoder_heads: 4, + decoder_dim: 64, + decoder_depth: 2, + decoder_heads: 4, + phase_loss_weight: 1.0, + seed: 0xC511_0027, + } + } +} + +impl MaeConfig { + /// Validate the configuration. Mirrors the `TrainingConfig::validate` style. + pub fn validate(&self) -> Result<(), ConfigError> { + let bad = |field: &'static str, reason: String| ConfigError::invalid_value(field, reason); + + if !(self.mask_ratio > 0.0 && self.mask_ratio < 1.0) { + return Err(bad( + "mask_ratio", + format!("must be in (0, 1), got {}", self.mask_ratio), + )); + } + if self.token_dim == 0 { + return Err(bad("token_dim", "must be >= 1".into())); + } + for (field, v) in [ + ("encoder_dim", self.encoder_dim), + ("decoder_dim", self.decoder_dim), + ("encoder_heads", self.encoder_heads), + ("decoder_heads", self.decoder_heads), + ] { + if v == 0 { + return Err(bad(field, "must be >= 1".into())); + } + } + if self.encoder_dim % self.encoder_heads != 0 { + return Err(bad( + "encoder_dim", + format!( + "must be divisible by encoder_heads ({} % {} != 0)", + self.encoder_dim, self.encoder_heads + ), + )); + } + if self.decoder_dim % self.decoder_heads != 0 { + return Err(bad( + "decoder_dim", + format!( + "must be divisible by decoder_heads ({} % {} != 0)", + self.decoder_dim, self.decoder_heads + ), + )); + } + if !(self.phase_loss_weight >= 0.0 && self.phase_loss_weight.is_finite()) { + return Err(bad( + "phase_loss_weight", + format!("must be a finite, non-negative number, got {}", self.phase_loss_weight), + )); + } + 
Ok(()) + } +} + +// --------------------------------------------------------------------------- +// Token layout +// --------------------------------------------------------------------------- + +/// Token-grid layout derived from a CSI window of shape `[T, tx, rx, sub]`. +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub struct TokenLayout { + /// Number of tokens, `T · tx · rx`. + pub n_tokens: usize, + /// Per-token dimension, `sub`. + pub token_dim: usize, + /// Window frame count `T`. + pub frames: usize, + /// Transmit-antenna count `tx`. + pub tx: usize, + /// Receive-antenna count `rx`. + pub rx: usize, +} + +impl TokenLayout { + /// Derive the layout from a `[T, tx, rx, sub]` view. + pub fn from_window(window: ArrayView4<f32>) -> Self { + let s = window.shape(); + Self { + n_tokens: s[0] * s[1] * s[2], + token_dim: s[3], + frames: s[0], + tx: s[1], + rx: s[2], + } + } + + /// Flatten a `[T, tx, rx, sub]` window into a `[N, sub]` token matrix + /// (row `f·tx·rx + t·rx + r` = the snapshot for frame `f`, tx `t`, rx `r`). + pub fn flatten(window: ArrayView4<f32>) -> Array2<f32> { + let layout = Self::from_window(window); + window + .to_owned() + .into_shape((layout.n_tokens, layout.token_dim)) + .expect("[T,tx,rx,sub] -> [T*tx*rx, sub] reshape is always valid") + } +} + +// --------------------------------------------------------------------------- +// Masking +// --------------------------------------------------------------------------- + +/// The result of masking one CSI sample for the MAE pre-text task. +/// +/// `visible_idx` and `masked_idx` are sorted ascending, are disjoint, and +/// together cover `0..n_tokens`. The encoder sees `visible_*`; the decoder is +/// trained to reconstruct `target_*` at the `masked_idx` positions. +#[derive(Debug, Clone)] +pub struct MaskedCsi { + /// Token indices visible to the encoder. Length `N − round(r·N)` (after clamping), ≥ 1. + pub visible_idx: Vec<usize>, + /// Token indices hidden from the encoder (reconstruction targets). 
Length `N − |visible|`, ≥ 1. + pub masked_idx: Vec<usize>, + /// Per-token boolean mask over `0..N`; `true` ⇒ masked (target). + pub mask: Vec<bool>, + /// Visible amplitude tokens, shape `[|visible|, token_dim]`. + pub visible_amp: Array2<f32>, + /// Visible phase tokens, shape `[|visible|, token_dim]`. + pub visible_phase: Array2<f32>, + /// Target (masked) amplitude tokens, shape `[|masked|, token_dim]`. + pub target_amp: Array2<f32>, + /// Target (masked) phase tokens, shape `[|masked|, token_dim]`. + pub target_phase: Array2<f32>, + /// Layout of the source window. + pub layout: TokenLayout, +} + +/// Deterministically split a CSI window's tokens into visible / masked sets and +/// return the masked-out amplitude+phase as reconstruction targets. +/// +/// * `amplitude`, `phase` — `[T, tx, rx, sub]`, identical shapes. +/// * `mask_ratio` — fraction hidden; clamped so at least one token is visible +/// and at least one is masked. +/// * `strategy` — [`MaskStrategy::Random`] (uniform) or [`MaskStrategy::InfoGuided`] +/// (CIG-MAE-style: preferentially mask high-information tokens, where a token's +/// "information" is the variance of its amplitude + phase values — flat tokens +/// are trivially in-painted, so masking them teaches less). Both are +/// deterministic in `seed`. +/// * `seed` — makes the choice reproducible. A good per-sample seed is +/// `base_seed ^ (sample_index as u64).wrapping_mul(0x9E3779B97F4A7C15)`. +/// +/// # Errors +/// +/// Returns [`ConfigError::InvalidValue`] when the shapes mismatch, the window +/// has no tokens, or `mask_ratio` is not in `(0, 1)`. 
+pub fn mask_csi_window( + amplitude: ArrayView4<f32>, + phase: ArrayView4<f32>, + mask_ratio: f64, + strategy: MaskStrategy, + seed: u64, +) -> Result<MaskedCsi, ConfigError> { + if amplitude.shape() != phase.shape() { + return Err(ConfigError::InvalidValue { + field: "phase".into(), + reason: format!( + "amplitude/phase shape mismatch: {:?} vs {:?}", + amplitude.shape(), + phase.shape() + ), + }); + } + if !(mask_ratio > 0.0 && mask_ratio < 1.0) { + return Err(ConfigError::InvalidValue { + field: "mask_ratio".into(), + reason: format!("must be in (0, 1), got {mask_ratio}"), + }); + } + + let layout = TokenLayout::from_window(amplitude); + let n = layout.n_tokens; + if n == 0 { + return Err(ConfigError::InvalidValue { + field: "amplitude".into(), + reason: "CSI window has zero tokens (empty T/tx/rx)".into(), + }); + } + + // Number of masked tokens, clamped so both partitions are non-empty. + let mut n_mask = (mask_ratio * n as f64).round() as usize; + if n_mask == 0 { + n_mask = 1; + } + if n_mask >= n { + n_mask = n - 1; + } + + let amp_flat = TokenLayout::flatten(amplitude); + let phase_flat = TokenLayout::flatten(phase); + + // Pick the n_mask masked token indices according to the strategy. + let mut rng = SplitMix64::new(seed); + let masked_set: Vec<usize> = match strategy { + MaskStrategy::Random => { + // Uniform: shuffle [0, n) and take the first n_mask. + let mut perm: Vec<usize> = (0..n).collect(); + shuffle(&mut perm, &mut rng); + perm[..n_mask].to_vec() + } + MaskStrategy::InfoGuided => { + // Weighted-without-replacement by per-token information (variance of + // amplitude+phase). Efraimidis–Spirakis: key_i = u_i^(1/w_i), + // pick the n_mask largest keys. Deterministic given `seed`. + let mut keyed: Vec<(f64, usize)> = (0..n) + .map(|i| { + let w = token_information(&amp_flat, &phase_flat, i) + 1e-6; + // u in (0, 1]: avoid 0 so ln() is finite. key = u^(1/w); + // rank by ln(key) = ln(u)/w (monotone, avoids tiny powers). 
+ let u = ((rng.next_u64() >> 11) as f64 + 1.0) / (((1u64 << 53) as f64) + 1.0); + let key = u.ln() / w; // larger (closer to 0) ⇒ more likely chosen + (key, i) + }) + .collect(); + // Largest key = least-negative ln(u)/w ⇒ sort descending by key. + keyed.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap_or(std::cmp::Ordering::Equal)); + keyed[..n_mask].iter().map(|&(_, i)| i).collect() + } + }; + + let mut masked_idx = masked_set; + masked_idx.sort_unstable(); + let masked_lookup: std::collections::HashSet<usize> = masked_idx.iter().copied().collect(); + let mut visible_idx: Vec<usize> = (0..n).filter(|i| !masked_lookup.contains(i)).collect(); + visible_idx.sort_unstable(); + + let mut mask = vec![false; n]; + for &i in &masked_idx { + mask[i] = true; + } + + let gather = |src: &Array2<f32>, idx: &[usize]| -> Array2<f32> { + let mut out = Array2::<f32>::zeros((idx.len(), layout.token_dim)); + for (row, &i) in idx.iter().enumerate() { + out.row_mut(row).assign(&src.row(i)); + } + out + }; + + Ok(MaskedCsi { + visible_amp: gather(&amp_flat, &visible_idx), + visible_phase: gather(&phase_flat, &visible_idx), + target_amp: gather(&amp_flat, &masked_idx), + target_phase: gather(&phase_flat, &masked_idx), + visible_idx, + masked_idx, + mask, + layout, + }) +} + +/// Re-assemble a full `[N, token_dim]` token grid from encoder-visible tokens +/// plus decoder-predicted masked tokens. Useful for evaluating / visualising +/// reconstructions (it is *not* needed for training the loss). +/// +/// # Errors +/// +/// Returns [`ConfigError::InvalidValue`] if the index sets don't partition +/// `0..N` or the row counts don't match the index lengths / `token_dim`. 
+pub fn reassemble_tokens( + layout: TokenLayout, + visible_idx: &[usize], + visible: &Array2<f32>, + masked_idx: &[usize], + predicted: &Array2<f32>, +) -> Result<Array2<f32>, ConfigError> { + let n = layout.n_tokens; + let inv = |field: &'static str, reason: String| ConfigError::invalid_value(field, reason); + if visible_idx.len() + masked_idx.len() != n { + return Err(inv( + "indices", + format!( + "visible ({}) + masked ({}) != n_tokens ({n})", + visible_idx.len(), + masked_idx.len() + ), + )); + } + if visible.nrows() != visible_idx.len() || predicted.nrows() != masked_idx.len() { + return Err(inv("rows", "row count does not match index length".into())); + } + if visible.ncols() != layout.token_dim || predicted.ncols() != layout.token_dim { + return Err(inv("token_dim", "column count does not match layout.token_dim".into())); + } + + let mut out = Array2::<f32>::zeros((n, layout.token_dim)); + let mut seen = vec![false; n]; + for (row, &i) in visible_idx.iter().enumerate() { + if i >= n || seen[i] { + return Err(inv("visible_idx", format!("out of range or duplicate index {i}"))); + } + seen[i] = true; + out.row_mut(i).assign(&visible.row(row)); + } + for (row, &i) in masked_idx.iter().enumerate() { + if i >= n || seen[i] { + return Err(inv("masked_idx", format!("out of range or duplicate index {i}"))); + } + seen[i] = true; + out.row_mut(i).assign(&predicted.row(row)); + } + Ok(out) +} + +// --------------------------------------------------------------------------- +// tch-gated: the MAE networks + pre-train step +// --------------------------------------------------------------------------- + +/// CSI masked-autoencoder networks (LibTorch / `tch`). 
+/// +/// **Compiled only with `--features tch-backend`.** Not exercised by the default +/// `cargo test --workspace --no-default-features` CI job — compile-/run-checking +/// this submodule requires a LibTorch toolchain (`LIBTORCH` was unset on the dev +/// box that wrote it, so it is CI-verified only; if a `tch` API call below has +/// drifted, it's a localised fix). +/// +/// # v0 design (iteration 2) +/// +/// A deliberately small **dual-stream** MAE, MLP-based (no self-attention yet — +/// transformer blocks are iteration 3): +/// +/// ```text +/// visible amplitude [B, V, sub] ─► amp_embed ─┐ +/// ├─ cat ─► tok_fuse ─► relu ─► enc_blocks(residual MLP) ─► [B, V, enc] +/// visible phase [B, V, sub] ─► ph_embed ─┘ │ +/// reshape [B, V·enc] │ +/// to_latent│ +/// ▼ +/// latent [B, enc] +/// from_latent│ +/// ▼ +/// learned per-position query pos_query [N, dec] + ─► relu ─► dec_blocks(residual MLP) ─► [B, N, dec] +/// (broadcast latent over N positions) │ +/// ┌──────────────────────┤ +/// dec_amp_head dec_ph_head +/// [B, N, sub] [B, N, sub] +/// index_select(masked positions) ─► (pred_amp, pred_ph) [B, M, sub] +/// ``` +/// +/// Limitations to lift later: (1) a *fixed* `n_tokens` (the bottleneck flattens +/// all visible token embeddings, so V — hence N and `mask_ratio` — is baked in +/// at `new()` time); (2) **batch-shared masking** (`MaeBatch::from_windows` +/// computes one partition from the first window and reuses it for every window +/// in the batch, so `masked_pos` is shared); per-sample masking via +/// gather/scatter is iteration 3; (3) MSE on unwrapped phase rather than a +/// circular loss. +#[cfg(feature = "tch-backend")] +pub mod model { + use super::{mask_csi_window, MaeConfig, MaskStrategy}; + use ndarray::{Array2, Array4, Axis}; + use tch::{nn, nn::Module, Device, Kind, Reduction, Tensor}; + + /// A residual MLP block: `LayerNorm(x + relu(Linear(x)))`. 
+ #[derive(Debug)] + struct ResidualMlp { + lin: nn::Linear, + ln: nn::LayerNorm, + } + impl ResidualMlp { + fn new(p: &nn::Path, dim: i64) -> Self { + Self { + lin: nn::linear(p / "lin", dim, dim, Default::default()), + ln: nn::layer_norm(p / "ln", vec![dim], Default::default()), + } + } + fn forward(&self, x: &Tensor) -> Tensor { + self.ln.forward(&(x + self.lin.forward(x).relu())) + } + } + + /// The CSI masked autoencoder. See the module docs for the v0 design. + #[derive(Debug)] + pub struct CsiMae { + /// Hyper-parameters this model was built with. + pub cfg: MaeConfig, + /// Number of tokens per window (`T·tx·rx`) — fixed at construction. + pub n_tokens: i64, + /// Number of masked (target) tokens per window. + pub n_masked: i64, + /// Number of visible (encoder-input) tokens per window. + pub n_visible: i64, + device: Device, + amp_embed: nn::Linear, + ph_embed: nn::Linear, + tok_fuse: nn::Linear, + enc_blocks: Vec<ResidualMlp>, + to_latent: nn::Linear, + from_latent: nn::Linear, + /// Learned per-position query, shape `[n_tokens, decoder_dim]`. + pos_query: Tensor, + dec_blocks: Vec<ResidualMlp>, + dec_amp_head: nn::Linear, + dec_ph_head: nn::Linear, + } + + impl CsiMae { + /// Build a `CsiMae` under `vs` for windows of exactly `n_tokens` tokens. + /// + /// `n_tokens` is fixed because the bottleneck flattens all visible token + /// embeddings; it must equal `T·tx·rx` of the windows fed at train/eval + /// time (e.g. `TokenLayout::from_window(sample.amplitude.view()).n_tokens`). + pub fn new(vs: &nn::Path, cfg: &MaeConfig, n_tokens: i64) -> Self { + assert!(n_tokens >= 2, "n_tokens must be >= 2"); + let td = cfg.token_dim as i64; + let enc = cfg.encoder_dim as i64; + let dec = cfg.decoder_dim as i64; + // Mirror mask_csi_window's clamping so the shapes line up exactly. 
+ let mut n_mask = (cfg.mask_ratio * n_tokens as f64).round() as i64; + if n_mask < 1 { + n_mask = 1; + } + if n_mask >= n_tokens { + n_mask = n_tokens - 1; + } + let n_vis = n_tokens - n_mask; + + let enc_blocks = (0..cfg.encoder_depth) + .map(|i| ResidualMlp::new(&(vs / "enc" / i), enc)) + .collect(); + let dec_blocks = (0..cfg.decoder_depth) + .map(|i| ResidualMlp::new(&(vs / "dec" / i), dec)) + .collect(); + let pos_query = vs.var( + "pos_query", + &[n_tokens, dec], + nn::Init::Randn { mean: 0.0, stdev: 0.02 }, + ); + + Self { + cfg: cfg.clone(), + n_tokens, + n_masked: n_mask, + n_visible: n_vis, + device: vs.device(), + amp_embed: nn::linear(vs / "amp_embed", td, enc, Default::default()), + ph_embed: nn::linear(vs / "ph_embed", td, enc, Default::default()), + tok_fuse: nn::linear(vs / "tok_fuse", 2 * enc, enc, Default::default()), + enc_blocks, + to_latent: nn::linear(vs / "to_latent", n_vis * enc, enc, Default::default()), + from_latent: nn::linear(vs / "from_latent", enc, dec, Default::default()), + pos_query, + dec_blocks, + dec_amp_head: nn::linear(vs / "dec_amp_head", dec, td, Default::default()), + dec_ph_head: nn::linear(vs / "dec_ph_head", dec, td, Default::default()), + } + } + + /// Reconstruct the masked amplitude & phase tokens. + /// + /// * `vis_amp`, `vis_phase` — `[B, n_visible, token_dim]`. + /// * `masked_pos` — the `n_masked` masked token indices (shared across + /// the batch in this v0; see the module docs). + /// * returns `(pred_amp, pred_phase)`, each `[B, n_masked, token_dim]`. + pub fn forward( + &self, + vis_amp: &Tensor, + vis_phase: &Tensor, + masked_pos: &[i64], + train: bool, + ) -> (Tensor, Tensor) { + let _ = train; // dropout/layernorm-train hooks would go here in iter 3 + let enc = self.cfg.encoder_dim as i64; + let b = vis_amp.size()[0]; + + // Per-token dual-stream embed → fuse. 
+ let a = self.amp_embed.forward(vis_amp); // [B, V, enc] + let p = self.ph_embed.forward(vis_phase); // [B, V, enc] + let mut t = self.tok_fuse.forward(&Tensor::cat(&[&a, &p], -1)).relu(); // [B, V, enc] + for blk in &self.enc_blocks { + t = blk.forward(&t); + } + + // Bottleneck: flatten visible token embeddings → latent [B, enc]. + let flat = t.reshape([b, self.n_visible * enc]); + let latent = self.to_latent.forward(&flat).relu(); // [B, enc] + + // Decoder: learned per-position query + broadcast latent context. + let ctx = self.from_latent.forward(&latent).unsqueeze(1); // [B, 1, dec] + let mut d = (self.pos_query.unsqueeze(0) + ctx).relu(); // [B, N, dec] + for blk in &self.dec_blocks { + d = blk.forward(&d); + } + + let all_amp = self.dec_amp_head.forward(&d); // [B, N, td] + let all_ph = self.dec_ph_head.forward(&d); // [B, N, td] + let idx = Tensor::from_slice(masked_pos).to_device(self.device); // [M] i64 + (all_amp.index_select(1, &idx), all_ph.index_select(1, &idx)) + } + + /// Dual-stream reconstruction loss: `MSE(pred_amp, tgt_amp) + w·MSE(pred_phase, tgt_phase)`. + pub fn reconstruction_loss( + pred_amp: &Tensor, + pred_phase: &Tensor, + tgt_amp: &Tensor, + tgt_phase: &Tensor, + phase_w: f64, + ) -> Tensor { + let amp_l = pred_amp.mse_loss(tgt_amp, Reduction::Mean); + let ph_l = pred_phase.mse_loss(tgt_phase, Reduction::Mean); + amp_l + ph_l * phase_w + } + } + + /// One batch of masked CSI windows ready for [`pretrain_step`]. + /// + /// All windows in the batch are masked with the *same* seed (v0 + /// simplification), so `masked_pos` / `n_visible` / `n_masked` are shared. + #[derive(Debug)] + pub struct MaeBatch { + /// Visible amplitude tokens, `[B, n_visible, token_dim]`. + pub vis_amp: Tensor, + /// Visible phase tokens, `[B, n_visible, token_dim]`. + pub vis_phase: Tensor, + /// Target (masked) amplitude tokens, `[B, n_masked, token_dim]`. + pub tgt_amp: Tensor, + /// Target (masked) phase tokens, `[B, n_masked, token_dim]`. 
+ pub tgt_phase: Tensor, + /// Masked token indices (length `n_masked`), shared across the batch. + pub masked_pos: Vec, + /// `T·tx·rx` of every window in the batch. + pub n_tokens: i64, + } + + impl MaeBatch { + /// Build a batch from `(amplitude, phase)` windows (each `[T,tx,rx,sub]`). + /// + /// The visible/masked token partition is computed once from the **first** + /// window (via [`mask_csi_window`] with `strategy`/`seed`) and reused for + /// every window in the batch, so `masked_pos` is shared — the + /// fixed-`n_tokens` model requires it. Every window must have the same + /// `[T,tx,rx,sub]` shape. Returns `Err` on a shape mismatch / empty batch. + pub fn from_windows( + windows: &[(Array4, Array4)], + cfg: &MaeConfig, + seed: u64, + strategy: MaskStrategy, + device: Device, + ) -> Result { + if windows.is_empty() { + return Err("MaeBatch::from_windows: empty batch".into()); + } + let td = cfg.token_dim; + + // Partition from window 0; reuse it for the rest of the batch. + let m0 = mask_csi_window(windows[0].0.view(), windows[0].1.view(), cfg.mask_ratio, strategy, seed) + .map_err(|e| format!("MaeBatch window 0: {e}"))?; + if m0.layout.token_dim != td { + return Err(format!("MaeBatch window 0: token_dim {} != cfg.token_dim {td}", m0.layout.token_dim)); + } + let n_tokens = m0.layout.n_tokens as i64; + let visible_idx = m0.visible_idx.clone(); + let masked_idx = m0.masked_idx.clone(); + let masked_pos: Vec = masked_idx.iter().map(|&x| x as i64).collect(); + + let gather = |grid: &Array2, idx: &[usize]| -> Array2 { + let mut out = Array2::::zeros((idx.len(), td)); + for (r, &i) in idx.iter().enumerate() { + out.row_mut(r).assign(&grid.row(i)); + } + out + }; + + let mut vis_amp_rows: Vec> = Vec::with_capacity(windows.len()); + let mut vis_ph_rows: Vec> = Vec::with_capacity(windows.len()); + let mut tgt_amp_rows: Vec> = Vec::with_capacity(windows.len()); + let mut tgt_ph_rows: Vec> = Vec::with_capacity(windows.len()); + + for (i, (amp, ph)) in 
windows.iter().enumerate() { + let layout = super::TokenLayout::from_window(amp.view()); + if layout.token_dim != td || layout.n_tokens as i64 != n_tokens { + return Err(format!( + "MaeBatch window {i}: shape {:?} incompatible with batch (n_tokens={n_tokens}, token_dim={td})", + amp.shape() + )); + } + if amp.shape() != ph.shape() { + return Err(format!("MaeBatch window {i}: amplitude/phase shape mismatch")); + } + let amp_flat = super::TokenLayout::flatten(amp.view()); + let ph_flat = super::TokenLayout::flatten(ph.view()); + vis_amp_rows.push(gather(&amp_flat, &visible_idx)); + vis_ph_rows.push(gather(&ph_flat, &visible_idx)); + tgt_amp_rows.push(gather(&amp_flat, &masked_idx)); + tgt_ph_rows.push(gather(&ph_flat, &masked_idx)); + } + + let stack3 = |rows: &[Array2<f32>]| -> Tensor { + let views: Vec<_> = rows.iter().map(|r| r.view()).collect(); + let a3 = ndarray::stack(Axis(0), &views).expect("uniform [k, td] rows stack"); + let (b, k, d) = a3.dim(); + let std = a3.as_standard_layout(); + Tensor::from_slice(std.as_slice().expect("contiguous")) + .reshape([b as i64, k as i64, d as i64]) + .to_device(device) + }; + + Ok(MaeBatch { + vis_amp: stack3(&vis_amp_rows), + vis_phase: stack3(&vis_ph_rows), + tgt_amp: stack3(&tgt_amp_rows), + tgt_phase: stack3(&tgt_ph_rows), + masked_pos, + n_tokens, + }) + } + } + + /// Run one optimiser step on `batch`. Returns the (scalar) reconstruction loss. 
+ pub fn pretrain_step(model: &CsiMae, opt: &mut nn::Optimizer, batch: &MaeBatch) -> f64 { + let (pred_amp, pred_ph) = model.forward(&batch.vis_amp, &batch.vis_phase, &batch.masked_pos, true); + let loss = CsiMae::reconstruction_loss( + &pred_amp, + &pred_ph, + &batch.tgt_amp, + &batch.tgt_phase, + model.cfg.phase_loss_weight, + ); + opt.backward_step(&loss); + f64::try_from(&loss).unwrap_or(f64::NAN) + } + + #[cfg(test)] + mod tests { + use super::*; + use crate::csi_mae::{MaeConfig, MaskStrategy, TokenLayout}; + use tch::nn::OptimizerConfig; + + /// Deterministic synthetic CSI window `[T, tx, rx, sub]` with structure. + fn synth(seed: u64, frames: usize, tx: usize, rx: usize, sub: usize) -> (Array4, Array4) { + let mut s = seed.wrapping_mul(0x9E37_79B9_7F4A_7C15) ^ 0xDEAD_BEEF; + let mut next = || { + s = s.wrapping_add(0x9E37_79B9_7F4A_7C15); + let mut z = s; + z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9); + z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB); + ((z ^ (z >> 31)) as f64 / u64::MAX as f64) as f32 + }; + let amp = Array4::from_shape_fn((frames, tx, rx, sub), |(f, _, _, c)| { + 0.5 + 0.4 * ((f as f32 * 0.3 + c as f32 * 0.1).sin()) + 0.05 * next() + }); + let ph = Array4::from_shape_fn((frames, tx, rx, sub), |(f, _, _, c)| { + 0.3 * ((f as f32 * 0.2 - c as f32 * 0.05).cos()) + 0.05 * next() + }); + (amp, ph) + } + + #[test] + fn loss_decreases_when_overfitting_one_batch() { + tch::manual_seed(7); + let (frames, tx, rx, sub) = (6usize, 1usize, 1usize, 8usize); + let n_tokens = (frames * tx * rx) as i64; + let windows: Vec<_> = (0..3).map(|i| synth(i, frames, tx, rx, sub)).collect(); + + let mut cfg = MaeConfig::default(); + cfg.token_dim = sub; + cfg.encoder_dim = 32; + cfg.decoder_dim = 16; + cfg.encoder_depth = 1; + cfg.decoder_depth = 1; + cfg.mask_ratio = 0.5; + cfg.validate().unwrap(); + + // sanity: the model's derived n_visible matches mask_csi_window's. 
+ let m0 = mask_csi_window(windows[0].0.view(), windows[0].1.view(), cfg.mask_ratio, MaskStrategy::Random, 1).unwrap(); + assert_eq!(TokenLayout::from_window(windows[0].0.view()).n_tokens as i64, n_tokens); + + let vs = nn::VarStore::new(Device::Cpu); + let model = CsiMae::new(&vs.root(), &cfg, n_tokens); + assert_eq!(model.n_visible, m0.visible_idx.len() as i64); + assert_eq!(model.n_masked, m0.masked_idx.len() as i64); + + let mut opt = nn::Adam::default().build(&vs, 1e-2).unwrap(); + let batch = MaeBatch::from_windows(&windows, &cfg, 1, MaskStrategy::Random, Device::Cpu).unwrap(); + + let l0 = pretrain_step(&model, &mut opt, &batch); + let mut last = l0; + for _ in 0..60 { + last = pretrain_step(&model, &mut opt, &batch); + } + assert!(l0.is_finite() && last.is_finite(), "loss must be finite (l0={l0}, last={last})"); + assert!(last < 0.5 * l0, "overfitting one batch should cut loss in half: l0={l0}, last={last}"); + } + } +} + +// --------------------------------------------------------------------------- +// Tests (pure-Rust portion) +// --------------------------------------------------------------------------- + +#[cfg(test)] +mod tests { + use super::*; + use ndarray::Array4; + + fn synth_window(frames: usize, tx: usize, rx: usize, sub: usize, seed: u64) -> (Array4, Array4) { + let mut rng = SplitMix64::new(seed); + let mk = |rng: &mut SplitMix64| { + Array4::::from_shape_fn((frames, tx, rx, sub), |_| (rng.next_u64() as f32) / (u64::MAX as f32)) + }; + let a = mk(&mut rng); + let p = mk(&mut rng); + (a, p) + } + + #[test] + fn mae_config_defaults_validate() { + MaeConfig::default().validate().expect("default MaeConfig must validate"); + } + + #[test] + fn mae_config_rejects_bad_values() { + let mut c = MaeConfig::default(); + c.mask_ratio = 1.0; + assert!(c.validate().is_err()); + let mut c = MaeConfig::default(); + c.encoder_dim = 130; // not divisible by encoder_heads (4) + assert!(c.validate().is_err()); + let mut c = MaeConfig::default(); + c.token_dim = 
0; + assert!(c.validate().is_err()); + } + + #[test] + fn token_layout_matches_window() { + let (a, _p) = synth_window(8, 2, 3, 56, 1); + let l = TokenLayout::from_window(a.view()); + assert_eq!(l, TokenLayout { n_tokens: 8 * 2 * 3, token_dim: 56, frames: 8, tx: 2, rx: 3 }); + assert_eq!(TokenLayout::flatten(a.view()).dim(), (48, 56)); + } + + #[test] + fn masking_partitions_exhaustively_and_disjointly() { + let (a, p) = synth_window(10, 1, 1, 56, 7); + let m = mask_csi_window(a.view(), p.view(), 0.75, MaskStrategy::Random, 42).unwrap(); + let n = m.layout.n_tokens; + assert!(!m.visible_idx.is_empty() && !m.masked_idx.is_empty()); + assert_eq!(m.visible_idx.len() + m.masked_idx.len(), n); + // disjoint + exhaustive + let mut all: Vec<usize> = m.visible_idx.iter().chain(m.masked_idx.iter()).copied().collect(); + all.sort_unstable(); + assert_eq!(all, (0..n).collect::<Vec<usize>>()); + // mask vec agrees with masked_idx + assert_eq!(m.mask.iter().filter(|&&b| b).count(), m.masked_idx.len()); + for &i in &m.masked_idx { assert!(m.mask[i]); } + for &i in &m.visible_idx { assert!(!m.mask[i]); } + // target/visible row counts + dims + assert_eq!(m.target_amp.dim(), (m.masked_idx.len(), 56)); + assert_eq!(m.visible_phase.dim(), (m.visible_idx.len(), 56)); + // mask ratio ≈ 0.75 on n=10 → 8 masked, sorted ascending + assert_eq!(m.masked_idx.len(), 8); + assert!(m.masked_idx.windows(2).all(|w| w[0] < w[1])); + } + + #[test] + fn masking_is_deterministic_in_seed() { + let (a, p) = synth_window(6, 1, 1, 16, 3); + let m1 = mask_csi_window(a.view(), p.view(), 0.5, MaskStrategy::Random, 123).unwrap(); + let m2 = mask_csi_window(a.view(), p.view(), 0.5, MaskStrategy::Random, 123).unwrap(); + let m3 = mask_csi_window(a.view(), p.view(), 0.5, MaskStrategy::Random, 124).unwrap(); + assert_eq!(m1.masked_idx, m2.masked_idx); + assert_eq!(m1.visible_amp, m2.visible_amp); + assert_ne!(m1.masked_idx, m3.masked_idx); // different seed → different partition + } + + /// Build a window where the first half of 
the tokens are (near-)constant + /// (low information) and the second half are noisy (high information). + /// Returns `(amp, phase, n_tokens, n_low)`. + fn split_info_window() -> (ndarray::Array4<f32>, ndarray::Array4<f32>, usize, usize) { + // 20 frames, 1x1, 8 sub → 20 tokens; first 10 constant, last 10 noisy. + let frames = 20; + let sub = 8; + let mut rng = SplitMix64::new(999); + let amp = ndarray::Array4::<f32>::from_shape_fn((frames, 1, 1, sub), |(f, _, _, _)| { + if f < 10 { 1.0 } else { (rng.next_u64() as f32) / (u64::MAX as f32) } + }); + let phase = ndarray::Array4::<f32>::from_shape_fn((frames, 1, 1, sub), |(f, _, _, _)| { + if f < 10 { 0.0 } else { (rng.next_u64() as f32) / (u64::MAX as f32) } + }); + (amp, phase, frames, 10) + } + + #[test] + fn info_guided_masking_prefers_high_information_tokens() { + let (a, p, _n, n_low) = split_info_window(); + // Mask 50% (10 of 20). With info-guided selection the noisy tokens + // (indices 10..20) should dominate the masked set far beyond chance. + let mut high_count_total = 0usize; + let trials = 8; + for seed in 0..trials { + let m = mask_csi_window(a.view(), p.view(), 0.5, MaskStrategy::InfoGuided, seed).unwrap(); + assert_eq!(m.masked_idx.len(), 10); + let high = m.masked_idx.iter().filter(|&&i| i >= n_low).count(); + high_count_total += high; + } + // Random would average ~5/10 high per trial; info-guided should be ≥ ~8/10. 
+ let avg_high = high_count_total as f64 / trials as f64; + assert!(avg_high >= 7.5, "info-guided avg high-info masked = {avg_high}, expected >= 7.5"); + } + + #[test] + fn info_guided_masking_is_deterministic_in_seed() { + let (a, p, _n, _) = split_info_window(); + let m1 = mask_csi_window(a.view(), p.view(), 0.4, MaskStrategy::InfoGuided, 5).unwrap(); + let m2 = mask_csi_window(a.view(), p.view(), 0.4, MaskStrategy::InfoGuided, 5).unwrap(); + let m3 = mask_csi_window(a.view(), p.view(), 0.4, MaskStrategy::InfoGuided, 6).unwrap(); + assert_eq!(m1.masked_idx, m2.masked_idx); + assert_eq!(m1.target_amp, m2.target_amp); + assert_ne!(m1.masked_idx, m3.masked_idx); + // still a valid exhaustive/disjoint partition + let n = m1.layout.n_tokens; + assert_eq!(m1.visible_idx.len() + m1.masked_idx.len(), n); + let mut all: Vec = m1.visible_idx.iter().chain(m1.masked_idx.iter()).copied().collect(); + all.sort_unstable(); + assert_eq!(all, (0..n).collect::>()); + } + + #[test] + fn token_information_is_zero_for_constant_and_positive_for_varied() { + let (a, p, _n, _) = split_info_window(); + let amp_flat = TokenLayout::flatten(a.view()); + let ph_flat = TokenLayout::flatten(p.view()); + assert!(token_information(&_flat, &ph_flat, 0) < 1e-9); // constant token + assert!(token_information(&_flat, &ph_flat, 15) > 1e-6); // noisy token + } + + #[test] + fn masking_clamps_extreme_ratios() { + let (a, p) = synth_window(4, 1, 1, 8, 9); + // huge ratio still leaves ≥1 visible + let m = mask_csi_window(a.view(), p.view(), 0.999, MaskStrategy::Random, 1).unwrap(); + assert_eq!(m.visible_idx.len(), 1); + // tiny ratio still masks ≥1 + let m = mask_csi_window(a.view(), p.view(), 0.0001, MaskStrategy::Random, 1).unwrap(); + assert_eq!(m.masked_idx.len(), 1); + // out-of-range ratio is an error + assert!(mask_csi_window(a.view(), p.view(), 0.0, MaskStrategy::Random, 1).is_err()); + assert!(mask_csi_window(a.view(), p.view(), 1.0, MaskStrategy::Random, 1).is_err()); + } + + #[test] + fn 
shape_mismatch_is_an_error() { + let (a, _) = synth_window(4, 1, 1, 8, 1); + let (_, p) = synth_window(4, 1, 1, 16, 1); + assert!(mask_csi_window(a.view(), p.view(), 0.5, MaskStrategy::Random, 1).is_err()); + } + + #[test] + fn reassemble_round_trips_the_masking() { + let (a, p) = synth_window(5, 1, 1, 16, 11); + let m = mask_csi_window(a.view(), p.view(), 0.6, MaskStrategy::Random, 77).unwrap(); + // "perfect decoder": predicted == true masked tokens + let recon = reassemble_tokens(m.layout, &m.visible_idx, &m.visible_amp, &m.masked_idx, &m.target_amp).unwrap(); + let orig = TokenLayout::flatten(a.view()); + assert_eq!(recon, orig); + // a bad partition is rejected + assert!(reassemble_tokens(m.layout, &m.visible_idx, &m.visible_amp, &[], &Array2::zeros((0, 16))).is_err()); + } +} diff --git a/v2/crates/wifi-densepose-train/src/lib.rs b/v2/crates/wifi-densepose-train/src/lib.rs index 8831c54978..bb36ca0a0e 100644 --- a/v2/crates/wifi-densepose-train/src/lib.rs +++ b/v2/crates/wifi-densepose-train/src/lib.rs @@ -44,6 +44,7 @@ #![warn(missing_docs)] pub mod config; +pub mod csi_mae; pub mod dataset; pub mod domain; pub mod error; @@ -79,6 +80,7 @@ pub use error::TrainResult as TrainResultAlias; pub use subcarrier::{compute_interp_weights, interpolate_subcarriers, select_subcarriers_by_variance}; // MERIDIAN (ADR-027) re-exports. +pub use csi_mae::{mask_csi_window, reassemble_tokens, MaeConfig, MaskStrategy, MaskedCsi, TokenLayout}; pub use domain::{ AdversarialSchedule, DomainClassifier, DomainFactorizer, GradientReversalLayer, };
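
Review note: the tests above pin down the contract of the random masking partition (exhaustive and disjoint index sets, ascending order, determinism in the seed, a ratio strictly inside (0, 1), and clamping so both sides stay non-empty). A minimal std-only sketch of that contract follows. The `partition_tokens` helper, its `String` error type, the round-to-nearest rule, and this `SplitMix64` stand-in are assumptions made for illustration; they are not the `csi_mae` implementation, which this diff does not show.

```rust
/// Illustrative stand-in for a seedable PRNG (standard SplitMix64 constants).
/// The crate's own `SplitMix64` may differ in detail.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical helper: split token indices `0..n` into `(visible, masked)`.
/// Both lists are returned sorted ascending.
fn partition_tokens(n: usize, ratio: f64, seed: u64) -> Result<(Vec<usize>, Vec<usize>), String> {
    // Reject 0.0, 1.0, NaN, and anything outside the open interval.
    if !(ratio > 0.0 && ratio < 1.0) {
        return Err(format!("mask ratio {ratio} must lie strictly between 0 and 1"));
    }
    if n < 2 {
        return Err("need at least 2 tokens to have both visible and masked sets".to_string());
    }
    // Assumed rule: round to nearest, then clamp so neither side is empty.
    let n_mask = ((n as f64 * ratio).round() as usize).clamp(1, n - 1);

    // Seeded Fisher-Yates shuffle (modulo bias is ignored in this sketch).
    let mut idx: Vec<usize> = (0..n).collect();
    let mut rng = SplitMix64(seed);
    for i in (1..n).rev() {
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        idx.swap(i, j);
    }

    let mut masked = idx[..n_mask].to_vec();
    let mut visible = idx[n_mask..].to_vec();
    masked.sort_unstable();
    visible.sort_unstable();
    Ok((visible, masked))
}
```

Under these assumptions, `partition_tokens(10, 0.75, 42)` yields 8 masked and 2 visible indices, matching the count asserted in `masking_partitions_exhaustively_and_disjointly`; whether the real `mask_csi_window` rounds the same way for other ratios is not established by this diff.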