
feat(MERIDIAN): CSI masked-autoencoder pre-training — ADR-027 re-scope + iter-1 scaffold (#68) #529

Open
ruvnet wants to merge 7 commits into main from feat/meridian-csi-mae-prototype

Conversation


@ruvnet ruvnet commented May 11, 2026

Re-scopes MERIDIAN (ADR-027 / #68) toward the 2026-Q2 SOTA conclusion — cross-room generalisation is a data-breadth problem — and lands the first implementation iteration. Draft: iterating via /loop until the prototype is complete (see Roadmap below).

What's here (iteration 1)

  1. docs/research/sota/2026-Q2-agentic-ai-and-edge-for-ruview.md — the survey that motivates this: agentic-AI breakthroughs + RF/WiFi-sensing SOTA, mapped to RuView ADRs. (Also published as a gist.)
  2. ADR-027 §2.0 re-scope — the primary MERIDIAN path is now: (1) pre-train a CIG-MAE-style dual-stream amplitude+phase masked autoencoder on heterogeneous CSI; (2) fine-tune the existing §2.1–§2.6 heads (17-kpt/DensePose, AETHER, domain-adversarial, geometry-conditioned) on top; (3) per-room source-free domain adaptation behind coherence_gate.rs::Recalibrate (separate ADR). §2.1+ is retained, re-framed as the fine-tune-stage head. References: arXiv:2511.18792 (data > capacity), 2512.04723 (CIG-MAE), 2605.01369 (MU-SHOT-Fi), 2506.12052, ACM TOSN 10.1145/3715130.
  3. wifi-densepose-train::csi_mae — the masking pipeline:
    • MaeConfig (+validate), MaskStrategy {Random, InfoGuided}
    • TokenLayout — flattens a [T,tx,rx,sub] CSI window to [N=T·tx·rx, sub] tokens (the layout model.rs::ModalityTranslator already consumes)
    • mask_csi_window — deterministic visible/masked partition + amp & phase reconstruction targets; reproducible via a tiny inline SplitMix64 PRNG (no new dep); clamps so both partitions are non-empty (the determinism mechanism is sketched just after this list)
    • reassemble_tokens — round-trips visible + predicted tokens back to a full grid (for reconstruction eval/viz)
    • csi_mae::model — gated behind tch-backend; v0 stub for now (interface fixed, networks land in iter 2)
    • 8 new unit tests; cargo test -p wifi-densepose-train --no-default-features → 118 lib tests pass (the tch-gated submodule is not exercised by the default workspace job — compile-checking it needs LibTorch).
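
To make the reproducibility claim concrete, here is a minimal sketch of the mechanism. It is not the committed code: `partition_tokens` is a hypothetical stand-in, and the exact clamping rule is paraphrased from the description above. SplitMix64 is the standard public-domain mixer, so the visible/masked split is a pure function of (n_tokens, mask_ratio, seed):

```rust
/// SplitMix64: the standard public-domain 64-bit mixer. Enough for a
/// reproducible shuffle without pulling in the `rand` crate.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Deterministic visible/masked partition over token indices 0..n_tokens
/// (n_tokens >= 2 assumed): Fisher-Yates driven by SplitMix64, then the
/// first n_masked indices form the masked set. The clamp keeps both
/// partitions non-empty.
fn partition_tokens(n_tokens: usize, mask_ratio: f32, seed: u64) -> (Vec<usize>, Vec<usize>) {
    let n_masked = ((n_tokens as f32 * mask_ratio) as usize).clamp(1, n_tokens - 1);
    let mut idx: Vec<usize> = (0..n_tokens).collect();
    let mut rng = SplitMix64(seed);
    for i in (1..n_tokens).rev() {
        let j = (rng.next_u64() % (i as u64 + 1)) as usize; // modulo bias is negligible here
        idx.swap(i, j);
    }
    (idx[..n_masked].to_vec(), idx[n_masked..].to_vec())
}
```

Same seed and same shapes give the same partition, which is what makes the reconstruction targets and the unit tests deterministic.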

Diff: 4 files, +888.

Roadmap (the /loop will work through this until complete)

  • iter 2 — the tch encoder/decoder (dual-stream → shared latent → narrow decoder over all positions with learned mask tokens → reconstruct amp+phase), reconstruction_loss, pretrain_step, a pretrain-mae binary on SyntheticCsiDataset/MmFiDataset; information-guided masking; a gated "loss decreases over N steps on synthetic data" smoke test.
  • iter 3+ — pool & ingest heterogeneous CSI; a real pre-train run (GPU — scripts/gcloud-train.sh / cognitum); wire the §2.x heads on top; cross-domain eval (ADR-027 §4.6); ship the encoder as an RVF segment (§4.7).
  • mark PR ready when iter 2 is in and green.

Notes

🤖 Generated with claude-flow

ruvnet added 6 commits May 11, 2026 12:45
Surveys the relevant slice of the 2026-Q2 agentic-AI literature (long-horizon
agents, agent-memory discipline, self-improving/continual learning, the ESP32
mesh-as-a-swarm framing, the "agent harness on the MCU" pattern, retrieval/
quantization incl. the CoDEQ verdict, agentic verification) and the related
RF/WiFi-sensing SOTA (CSI foundation models — the 1.3M-sample MAE scaling study
showing data > capacity, AM-FM, CIG-MAE amplitude+phase MAE; source-free domain
adaptation; the DensePose-from-WiFi lineage; multistatic fusion; mmWave+WiFi
vitals; adversarial/privacy; through-wall). Maps every finding to a RuView ADR
with impact/effort/horizon. Headline recommendation: re-scope MERIDIAN (ADR-027)
as a heterogeneous-CSI MAE pre-train → small task head.

Lives under docs/research/sota/ alongside 2026-Q2-rf-sensing-and-edge-rust.md.

Co-Authored-By: claude-flow <ruv@ruv.net>
…#68)

Adds §2.0 — the primary MERIDIAN path is now a three-stage pipeline:
  1. pre-train a CIG-MAE-style dual-stream (amplitude+phase) masked autoencoder
     on heterogeneous CSI (data breadth > pose-net capacity — arXiv:2511.18792);
  2. fine-tune the existing §2.1–§2.6 heads (17-kpt/DensePose, AETHER, domain-
     adversarial, geometry-conditioned) on top of the pre-trained encoder;
  3. adapt per-room with source-free unsupervised domain adaptation behind
     coherence_gate.rs::Recalibrate (separate ADR).

§2.1+ is retained but re-framed as the fine-tune-stage head, not a from-scratch
design. Adds the supporting references (2511.18792, 2512.04723, 2605.01369,
2506.12052, ACM TOSN 10.1145/3715130) and points at the 2026-Q2 SOTA survey.

Co-Authored-By: claude-flow <ruv@ruv.net>
…iter 1, #68)

New `wifi-densepose-train::csi_mae` module (ADR-027 §2.0):

  - MaeConfig (+ validate), MaskStrategy {Random, InfoGuided}
  - TokenLayout — flattens a [T,tx,rx,sub] CSI window to [N=T*tx*rx, sub] tokens
    (the same layout model.rs::ModalityTranslator consumes; the flatten/reassemble
    round trip is sketched after this list)
  - mask_csi_window — deterministic visible/masked token partition + amplitude &
    phase reconstruction targets; reproducible via a tiny inline SplitMix64 PRNG
    (no extra dependency); clamps so both partitions are non-empty
  - reassemble_tokens — round-trips encoder-visible + decoder-predicted tokens
    back to a full [N, sub] grid (for reconstruction eval/viz)
  - model submodule (gated behind `tch-backend`): v0 skeleton — the
    encoder/decoder networks, reconstruction loss, and pretrain_step land in
    iteration 2 (transformer blocks, per-sample masking, info-guided masking,
    a `pretrain-mae` bin)
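
For orientation, a minimal sketch of the flatten/reassemble round trip, assuming the row-major [T,tx,rx,sub] layout stated above and ndarray on the non-tch path; the helper names are hypothetical stand-ins for TokenLayout and reassemble_tokens:

```rust
use ndarray::{Array2, Array4};

/// Flatten a [T, tx, rx, sub] CSI window into [N = T*tx*rx, sub] tokens.
/// Row-major, so token n maps back to
/// (t, a, r) = (n / (TX*RX), (n / RX) % TX, n % RX).
fn flatten_window(window: Array4<f32>) -> Array2<f32> {
    let (t, tx, rx, sub) = window.dim();
    window
        .into_shape((t * tx * rx, sub))
        .expect("standard-layout window reshapes without copying")
}

/// Scatter visible + predicted tokens back into the full [N, sub] grid,
/// using the index lists produced by the masking step (reconstruction
/// eval/viz path).
fn reassemble(
    n_tokens: usize,
    sub: usize,
    visible_idx: &[usize],
    visible: &Array2<f32>,
    masked_idx: &[usize],
    predicted: &Array2<f32>,
) -> Array2<f32> {
    let mut grid = Array2::<f32>::zeros((n_tokens, sub));
    for (row, &n) in visible_idx.iter().enumerate() {
        grid.row_mut(n).assign(&visible.row(row));
    }
    for (row, &n) in masked_idx.iter().enumerate() {
        grid.row_mut(n).assign(&predicted.row(row));
    }
    grid
}
```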

8 new unit tests; builds and tests green under
`cargo test -p wifi-densepose-train --no-default-features` (118 lib tests pass).
The tch-gated `model` submodule is not exercised by the default workspace test
job — compile-checking it needs a LibTorch toolchain.

Co-Authored-By: claude-flow <ruv@ruv.net>
csi_mae::mask_csi_window now dispatches on MaskStrategy:
  - Random:      uniform Fisher–Yates (as before).
  - InfoGuided:  CIG-MAE-style — preferentially mask high-information tokens.
                 A token's "information" = variance of its amplitude values +
                 variance of its phase values (token_information()); near-constant
                 tokens are trivially in-painted so masking them teaches less.
                 Selection is weighted-without-replacement (Efraimidis–Spirakis:
                 key_i = u_i^(1/w_i), ranked by ln(u_i)/w_i) — exact, and
                 deterministic given `seed` (the u_i come from SplitMix64).
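
In sketch form, with hypothetical free functions standing in for what the commit folds into mask_csi_window:

```rust
/// Per-token information score: variance of the token's amplitudes plus
/// variance of its phases. Near-constant tokens score ~0, so InfoGuided
/// rarely masks them.
fn token_information(amp: &[f32], phase: &[f32]) -> f64 {
    fn var(xs: &[f32]) -> f64 {
        let n = xs.len() as f64;
        let mean = xs.iter().map(|&x| x as f64).sum::<f64>() / n;
        xs.iter().map(|&x| (x as f64 - mean).powi(2)).sum::<f64>() / n
    }
    var(amp) + var(phase)
}

/// Efraimidis-Spirakis: rank items by ln(u_i)/w_i (monotone in the key
/// u_i^(1/w_i)) and keep the top k. Exact weighted sampling without
/// replacement, deterministic once `uniform` is seeded (SplitMix64-backed
/// in the real code).
fn weighted_sample(weights: &[f64], k: usize, mut uniform: impl FnMut() -> f64) -> Vec<usize> {
    let mut keyed: Vec<(f64, usize)> = weights
        .iter()
        .enumerate()
        .map(|(i, &w)| (uniform().max(1e-12).ln() / w.max(1e-12), i))
        .collect();
    // ln(u) < 0, so a larger weight pulls the key toward 0, i.e. more likely kept
    keyed.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap());
    keyed.truncate(k);
    keyed.into_iter().map(|(_, i)| i).collect()
}
```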

Replaces the iteration-1 "InfoGuided falls back to Random with a warning" stub.
+3 unit tests (info-guided skews ≥7.5/10 toward high-info tokens; deterministic
in seed; token_information ≈ 0 for constant tokens). `cargo test -p
wifi-densepose-train --no-default-features` → 121 lib tests pass.

Still to do (iter 2b, next loop tick): the real csi_mae::model (tch encoder/
decoder + reconstruction_loss + pretrain_step), bin/pretrain_mae.rs, a gated
"loss decreases" smoke test.

Co-Authored-By: claude-flow <ruv@ruv.net>
…-mae bin (iter 2b, #68)

Real CSI masked-autoencoder behind feature `tch-backend` (ADR-027 §2.0):

  - CsiMae: dual-stream per-token amp+phase embed → fuse → residual-MLP encoder
    over the visible tokens → flatten-to-latent bottleneck → learned per-position
    query + broadcast latent → residual-MLP decoder → dec_amp_head / dec_ph_head
    → index_select the masked positions. (MLP-based v0; self-attention transformer
    blocks are iter 3.)
  - CsiMae::reconstruction_loss(pred_amp, pred_phase, tgt_amp, tgt_phase, phase_w)
    = MSE(amp) + phase_w * MSE(phase).
  - MaeBatch::from_windows — partition computed once from window 0 and reused
    across the batch (the bottleneck fixes n_tokens), ndarray → tch conversion.
  - pretrain_step(model, opt, batch) -> f64 — one Adam step, returns the loss
    (sketched together with the loss just after this list).
  - src/bin/pretrain_mae.rs — synthetic-data pre-train driver (required-features
    = ["tch-backend"]); clap args for epochs/batch/samples/lr/mask-ratio/save.
  - #[cfg(feature="tch-backend")] smoke test: loss halves when overfitting one
    batch over 60 steps; also asserts model.n_visible/n_masked match
    mask_csi_window's clamping.
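
A sketch of those two pieces under the tch-backend feature. Assumptions: the `forward` closure stands in for CsiMae's forward pass, and the helper shapes mirror the description above rather than the committed signatures:

```rust
use tch::{nn, Reduction, Tensor};

/// MSE(amp) + phase_w * MSE(phase), as described above. The real method may
/// differ in normalisation details.
fn reconstruction_loss(
    pred_amp: &Tensor, pred_phase: &Tensor,
    tgt_amp: &Tensor, tgt_phase: &Tensor,
    phase_w: f64,
) -> Tensor {
    pred_amp.mse_loss(tgt_amp, Reduction::Mean)
        + pred_phase.mse_loss(tgt_phase, Reduction::Mean) * phase_w
}

/// One Adam step; returns the scalar loss, matching the described
/// `pretrain_step(...) -> f64` shape.
fn pretrain_step(
    forward: impl Fn() -> (Tensor, Tensor), // stands in for model.forward(batch)
    opt: &mut nn::Optimizer,
    tgt_amp: &Tensor, tgt_phase: &Tensor,
    phase_w: f64,
) -> f64 {
    let (pred_amp, pred_phase) = forward();
    let loss = reconstruction_loss(&pred_amp, &pred_phase, tgt_amp, tgt_phase, phase_w);
    opt.backward_step(&loss); // zeroes grads, backprops, applies the Adam update
    loss.double_value(&[]) // extract the scalar for logging
}
```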

v0 limits (documented in the module): fixed n_tokens; batch-shared masking;
MSE on unwrapped phase (vs a circular loss). The dev box has no LibTorch, so the
tch path is CI-verified (`--features tch-backend`), not locally. The default
`cargo test -p wifi-densepose-train --no-default-features` stays green (121 lib
tests) — the model module and the bin are both feature-gated.
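
On that last limit: a standard circular alternative a later iteration could swap in (my sketch of what "a circular loss" would look like, not committed code). 1 - cos(Δφ) is smooth, bounded, and treats φ and φ + 2π as identical, so wrap-around no longer produces large errors:

```rust
use tch::{Kind, Tensor};

/// mean(1 - cos(pred - tgt)): zero when phases agree modulo 2π, so a
/// prediction off by a full wrap is not penalised as a huge error.
fn circular_phase_loss(pred_phase: &Tensor, tgt_phase: &Tensor) -> Tensor {
    let cos_err = (pred_phase - tgt_phase).cos(); // in [-1, 1], 1 = perfect
    (cos_err * -1.0 + 1.0).mean(Kind::Float) // i.e. mean(1 - cos(Δφ))
}
```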

Co-Authored-By: claude-flow <ruv@ruv.net>
Iteration status block: iter 1 + 2a + 2b done; iter 3 plan listed (heterogeneous-CSI ingest, real GPU pre-train, per-sample masking + transformer blocks, fine-tune §2.x heads, cross-domain eval, RVF segment).

Co-Authored-By: claude-flow <ruv@ruv.net>
@ruvnet ruvnet marked this pull request as ready for review May 11, 2026 17:05

ruvnet commented May 11, 2026

Iteration 2 complete — marking ready for review (the loop continues for iteration 3: heterogeneous-CSI ingest plan + GPU pre-train wiring stubs + per-sample masking / transformer blocks + fine-tuning the §2.x heads on the pre-trained encoder + cross-domain eval).

Added since the draft:

  • iter 2a — MaskStrategy::InfoGuided: CIG-MAE-style masking, weighting token selection by per-token information (variance of amplitude + variance of phase), exact weighted-without-replacement (Efraimidis–Spirakis), deterministic in seed. +3 tests.
  • iter 2b — csi_mae::model behind tch-backend: CsiMae (dual-stream amp+phase per-token embed → fuse → residual-MLP encoder over visible tokens → flatten-to-latent bottleneck → learned per-position query + broadcast latent → residual-MLP decoder → dec_amp_head/dec_ph_head → index_select masked positions); reconstruction_loss (MSE amp + w·MSE phase); MaeBatch::from_windows (partition from window 0, reused across the batch — n_tokens is fixed); pretrain_step; src/bin/pretrain_mae.rs (synthetic-data driver; its CLI surface is sketched below); a #[cfg(feature="tch-backend")] smoke test (loss halves when overfitting one batch). v0 limits documented: fixed n_tokens, batch-shared masking, MSE on unwrapped phase.
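
For reference, a hypothetical clap mirror of that flag surface. The flag names come from the list above; the defaults are invented for illustration, so treat the committed bin as authoritative:

```rust
use clap::Parser;

/// pretrain-mae driver arguments (sketch; defaults are placeholders).
#[derive(Parser)]
struct Args {
    #[arg(long, default_value_t = 10)]
    epochs: usize,
    #[arg(long, default_value_t = 32)]
    batch: usize,
    /// Number of synthetic CSI windows to generate.
    #[arg(long, default_value_t = 1024)]
    samples: usize,
    #[arg(long, default_value_t = 1e-3)]
    lr: f64,
    #[arg(long, default_value_t = 0.75)]
    mask_ratio: f32,
    /// Where to write the .ot variable store, if anywhere.
    #[arg(long)]
    save: Option<std::path::PathBuf>,
}

fn main() {
    let args = Args::parse();
    // ... build SyntheticCsiDataset, CsiMae, Adam; loop pretrain_step ...
}
```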

cargo test -p wifi-densepose-train --no-default-features → 121 lib tests pass (the model module + the bin are feature-gated; the dev box has no LibTorch, so the tch-backend path is CI-verified). PR diff now 6 files, +1441.

🤖 Generated with claude-flow

Closes the prototype's "iter 3 = plan + wiring documented" item (ADR-027 §2.0):

  - scripts/pretrain-mae-gcloud.sh — GCloud GPU driver for the MAE pre-train: a
    thin, reviewable mirror of scripts/gcloud-train.sh that provisions a VM in
    cognitum-20260110, builds wifi-densepose-train --features tch-backend,cuda,
    runs the `pretrain-mae` binary, downloads the .ot variable store, tears the
    VM down. Currently drives SyntheticCsiDataset (the smoke path); the one TODO
    is the --data-dir/--datasets plumbing for the real heterogeneous corpus.
    NOT run as part of this prototype. Also supports --dry-run (local synthetic
    pre-train, needs LibTorch).
  - ADR-027 §2.0 — added the "Iteration 3 plan" subsection: heterogeneous-CSI
    ingest (own recordings + MM-Fi + Wi-Pose + multi-band virtual sub-carriers,
    normalised to 56 sub-carriers), the GPU run, lifting the v0 limits
    (per-sample masking, transformer blocks, circular phase loss), the fine-tune
    handoff (load the CsiMae encoder into WiFiDensePoseModel via a
    `--init-encoder <mae.ot>` flag, then train the §2.x heads as regularisers),
    cross-domain eval (§4.6 protocol), and shipping the encoder as an RVF segment
    (the encoder warm-start is sketched after this list).
  - wifi-densepose-train/README.md — new "MERIDIAN-MAE" section pointing at the
    csi_mae module, the pretrain-mae binary, the gcloud script, and ADR-027 §2.0.
  - csi_mae.rs module doc — updated the iteration-status block.
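
The planned handoff can be sketched as a partial checkpoint load. This assumes tch's VarStore::load_partial (which, as I read the tch API, fills matching variables and returns the names it could not find in the file); the helper name is hypothetical, and --init-encoder is the flag name from the ADR text:

```rust
use std::path::Path;
use tch::nn::VarStore;

/// Warm-start the fine-tune model's encoder from the MAE checkpoint passed
/// via --init-encoder; §2.x head variables absent from the .ot file stay at
/// their fresh initialisation.
fn init_encoder_from_mae(vs: &mut VarStore, mae_ckpt: &Path) -> Result<(), tch::TchError> {
    let not_found = vs.load_partial(mae_ckpt)?;
    eprintln!("--init-encoder: {} variables left at fresh init (heads)", not_found.len());
    Ok(())
}
```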

cargo test -p wifi-densepose-train --no-default-features → 121 lib tests pass.

This completes the MERIDIAN CSI-MAE *prototype* (iter 1 masking pipeline +
iter 2 tch model/pretrain loop/bin + iter 3 plan/wiring). Real cross-domain
results need the heterogeneous ingest + a GPU pre-train run (iter 3 execution),
out of scope for the prototype.

Co-Authored-By: claude-flow <ruv@ruv.net>

ruvnet commented May 11, 2026

Prototype complete — loop stopped.

Final state of feat/meridian-csi-mae-prototype (6 files, +1643 vs main):

  • iter 1 — csi_mae masking pipeline (MaeConfig/validate, MaskStrategy, TokenLayout, deterministic mask_csi_window/reassemble_tokens, dependency-free PRNG) + ADR-027 §2.0 re-scope + the 2026-Q2 SOTA survey.
  • iter 2a — MaskStrategy::InfoGuided (CIG-MAE-style, variance-weighted token selection, Efraimidis–Spirakis, deterministic in seed).
  • iter 2b — csi_mae::model (tch-backend): CsiMae (dual-stream amp+phase encoder over visible tokens → latent → decoder with learned per-position query → reconstruct masked amp+phase), reconstruction_loss, MaeBatch::from_windows, pretrain_step, bin/pretrain-mae, a gated "loss halves when overfitting one batch" smoke test. v0 limits documented (fixed n_tokens, batch-shared masking, MSE on unwrapped phase).
  • iter 3 — scripts/pretrain-mae-gcloud.sh (GCloud GPU driver, mirrors gcloud-train.sh; one TODO = heterogeneous-corpus plumbing), ADR-027 §2.0 "Iteration 3 plan" (ingest spec, GPU run, lifting the v0 limits, fine-tune handoff via --init-encoder, cross-domain eval, RVF segment), wifi-densepose-train/README.md section.

cargo test -p wifi-densepose-train --no-default-features → 121 lib tests pass (the tch-backend model module + bin are feature-gated and CI-verified — no LibTorch on the dev box).

Out of scope for the prototype (= iter 3 execution): the heterogeneous-CSI ingest implementation, the actual GPU pre-train run, per-sample masking + transformer blocks, and fine-tuning the §2.x heads on the pre-trained encoder. All planned in ADR-027 §2.0.

Ready for review.

🤖 Generated with claude-flow
