Validate expert grid compatibility in DenoisingMoEPredictor.__init__ by frodre · Pull Request #1234 · ai2cm/ace

frodre · 2026-06-06T21:42:25Z

First in a 5-PR stack adding support for longitude domains that cross the 0/360 prime meridian in downscaling. This standalone hardening PR moves expert grid-compatibility validation into the predictor constructor so every construction path is protected, not just the config-build path. Only the primary expert's coordinates are used for input prep and output coords, so an expert built on a mismatched grid would otherwise silently downscale onto the wrong grid.

Changes:

- `fme.downscaling.predictors.serial_denoising`: move `_validate_experts_compatible` from `DenoisingMoEConfig.build` into `DenoisingMoEPredictor.__init__`, so it holds for `build`, `from_state`, and future callers (e.g. `with_rolled_lon`).
- `fme.downscaling.test_models`: add `test_denoising_moe_predictor_rejects_mismatched_expert_grids`, constructing the predictor directly with mismatched-grid experts and asserting it raises.

- Tests added
- If dependencies changed, "deps only" image rebuilt and "latest_deps_only_image.txt" file updated

Base: main

Stack

| PR | Head → Base | Title |
|----|-------------|-------|
| #1234 | `refactor/moe-validate-experts-init` → `main` | Validate expert grid compatibility in `DenoisingMoEPredictor.__init__` |
| #1235 | `feature/lon-roll-primitives` → PR1 | Add longitude roll primitives |
| #1236 | `feature/lon-roll-data-layer` → PR2 | Roll seam-crossing longitudes in the data layer |
| #1237 | `feature/lon-roll-model` → PR3 | Add with_rolled_lon to models |
| #1238 | `feature/lon-roll-integration` → PR4 | Roll the model in inference/predict/evaluator |

Move _validate_experts_compatible out of DenoisingMoEConfig.build and into DenoisingMoEPredictor.__init__. Only the primary expert's coordinates are used for input prep and output coords, so a mismatched expert would silently downscale onto the wrong grid. Enforcing in __init__ closes that gap on every construction path (build, from_state, and future callers), not just the config-build path. Add a regression test constructing the predictor directly with mismatched-grid experts and asserting it raises.
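
The reasoning above can be illustrated with a minimal sketch. It is not the code from this PR: `ToyExpert`, `ToyMoEPredictor`, and `validate_experts_compatible` are hypothetical stand-ins, and comparing coordinates by exact equality is an assumption. The point is only that a check placed in `__init__` also covers `from_state` and any other path that ends in the constructor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyExpert:
    """Hypothetical stand-in for an expert model; only its grid matters here."""
    name: str
    lat: tuple
    lon: tuple


def validate_experts_compatible(experts):
    """Raise if any expert's grid differs from the primary (first) expert's grid."""
    if not experts:
        raise ValueError("At least one expert is required.")
    primary = experts[0]
    for expert in experts[1:]:
        if (expert.lat, expert.lon) != (primary.lat, primary.lon):
            raise ValueError(
                f"Expert {expert.name!r} is on a different grid than "
                f"primary expert {primary.name!r}."
            )


class ToyMoEPredictor:
    def __init__(self, experts):
        # Validation lives in the constructor, so every construction path runs it.
        validate_experts_compatible(experts)
        self.experts = list(experts)
        # Only the primary expert's coordinates are exposed downstream.
        self.coords = (experts[0].lat, experts[0].lon)

    @classmethod
    def from_state(cls, state):
        # No explicit validation call here: it is inherited from __init__.
        return cls([ToyExpert(**s) for s in state["experts"]])
```

With this layout, constructing `ToyMoEPredictor` directly or through `from_state` with experts on different grids raises `ValueError` in both cases, which is the behavior the regression test in this PR asserts for the real predictor.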

frodre · 2026-06-06T22:13:17Z

        if expert_renames is not None and len(expert_renames) != len(experts):
            raise ValueError("expert_renames and experts must have the same length.")
+
+        _validate_experts_compatible(experts)


I want validation to occur anytime we initialize a model, not only through the build pathway. This will ensure the static / coordinate rolled models are initialized properly in subsequent PRs.

PR 2 of 5 in the prime-meridian longitude stack. Adds the pure coordinate/data rolling utilities needed to re-express a global grid in a seam-crossing domain's convention. These have no production callers yet (later PRs wire them into the data and model layers), so they are reviewable in isolation with full unit coverage. The interval-based roll only triggers when an interval actually crosses the seam (`start < 0` or `stop > 360`), so in-range intervals are a no-op and non-global grids are left untouched.

### Primitives overview (PR #1235)

These primitives are always used as a pair: `find_roll_anchor` (or `find_roll_anchor_from_interval`) computes the roll amount once; callers pass it to all subsequent `roll_lon_coords` and `roll_lon_data` calls so coordinates and field tensors shift by the same amount. Two downstream pathways use them:

- Dataset load: rolls each loaded grid into the user's configured `lon_extent` convention (PR #1236)
- Model setup: rolls the model's fine grid to match the incoming coarse batch's convention (PR #1237)

Changes:

- `fme.downscaling.data.utils`: add `ClosedInterval.finite_values`, `_requires_lon_roll`, `coords_require_lon_roll`, `find_roll_anchor`, `find_roll_anchor_from_interval`, `roll_lon_coords`, `roll_lon_data`, and private helpers `_validate_rollable_lon` and `_validate_monotonic_lon`.
  - `roll_lon_coords` (1-D coordinate tensor) and `roll_lon_data` (N-D field tensor) form a parallel pair: both apply the same roll amount, but `roll_lon_coords` also remaps values to keep the result monotonically increasing, while `roll_lon_data` is a pure cyclic shift. Callers pre-compute the roll amount once via `find_roll_anchor` and pass it to both.
  - `roll_latlon_coords` is not included here; it operates on a `LatLonCoordinates` struct rather than a raw tensor and belongs in the PR that first uses it.
- `fme.downscaling.data` (`__init__`): export the new roll helpers.
- `fme.downscaling.data.test_utils`: unit tests for roll amounts, seam-crossing conventions, round-trip invertibility, non-global/non-uniform rejection, and invalid input validation.

- [x] Tests added
- [ ] If dependencies changed, "deps only" image rebuilt and "latest_deps_only_image.txt" file updated

Base: `refactor/moe-validate-experts-init` (PR 1)
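
To make the coords/data pairing described for PR #1235 concrete, here is a rough pure-Python sketch on 1-D lists. The function names and signatures below (`find_roll_amount`, `roll_coords`, `roll_data`) are illustrative assumptions, not the signatures of the helpers in `fme.downscaling.data.utils`, which operate on tensors. The sketch also assumes a global, monotonically increasing grid in [0, 360) and an interval that wraps at most once.

```python
def requires_lon_roll(start, stop):
    """An interval needs a roll only if it crosses the 0/360 seam."""
    return start < 0.0 or stop > 360.0


def find_roll_amount(lon, start, stop):
    """Signed number of grid points to shift; 0 for in-range intervals.

    Positive: move the last k points to the front (west-of-0 convention).
    Negative: move the first k points to the end (east-of-360 convention).
    """
    if not requires_lon_roll(start, stop):
        return 0
    if start < 0.0:
        return sum(1 for x in lon if x >= start + 360.0)
    return -sum(1 for x in lon if x <= stop - 360.0)


def roll_data(values, shift):
    """Pure cyclic shift along the longitude axis (a 1-D list here)."""
    n = len(values)
    if n == 0:
        return []
    k = shift % n
    return values[-k:] + values[:-k] if k else list(values)


def roll_coords(lon, shift):
    """Same cyclic shift, then remap the moved points by 360 so the result stays increasing."""
    rolled = roll_data(lon, shift)
    n = len(rolled)
    if shift > 0:
        return [x - 360.0 if i < shift else x for i, x in enumerate(rolled)]
    if shift < 0:
        return [x + 360.0 if i >= n + shift else x for i, x in enumerate(rolled)]
    return rolled


lon = [45.0, 135.0, 225.0, 315.0]
field = ["a", "b", "c", "d"]

shift = find_roll_amount(lon, -90.0, 90.0)  # 1: only 315 moves west of 0
rolled_lon = roll_coords(lon, shift)         # [-45.0, 45.0, 135.0, 225.0]
rolled_field = roll_data(field, shift)       # ['d', 'a', 'b', 'c']
```

The roll amount is computed once and passed to both functions, so the value that sat at 315 degrees still sits at the same grid point after it is relabeled as -45 degrees. Rolling the data by `-shift` undoes the shift.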

PR 3 of 5 in the prime-meridian longitude stack. Applies the roll primitives (PR 2) in the data layer so a longitude interval that crosses the 0/360 seam can be subset instead of raising `NotImplementedError`. In-range intervals resolve to a zero roll and behave exactly as before.

Changes:

- `fme.downscaling.data.datasets.HorizontalSubsetDataset`: roll data and coordinates into the requested interval's convention rather than raising on wraparound.
- `fme.downscaling.data.config`: extract `_build_aligned_subset_pair`, which rolls coarse and fine lon coords into the extent's convention (`_roll_lons_to_extent_convention`) before `adjust_fine_coord_range`, so fine/coarse subselection stays aligned across the seam.
- `fme.downscaling.data.static.StaticInputs.roll`: roll static fields and their lon coordinates to match.
- `fme.downscaling.data.test_config`, `fme.downscaling.data.test_datasets`, `fme.downscaling.data.test_static`: tests for seam-crossing subsetting (negative and >360 conventions), fine/coarse scale-factor preservation across the seam (even and odd downscale factors), end-to-end paired loader with a seam-crossing extent, and `StaticInputs.roll`.

Note: surfacing the coarse grid convention on `GriddedData`/`PairedGriddedData` (`coarse_latlon_coords`) was deferred to the integration PR after review discussion.

- [x] Tests added
- [ ] If dependencies changed, "deps only" image rebuilt and "latest_deps_only_image.txt" file updated

Base: `feature/lon-roll-primitives` (PR 2)
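
The fine/coarse alignment described for PR #1236 can be sketched as follows. This is an illustration under stated assumptions, not the `_build_aligned_subset_pair` implementation: all names are hypothetical, grids are uniform and global with cell-centered longitudes, the roll is done by wrapping and sorting rather than by an explicit cyclic shift, and at least one coarse cell is assumed to fall inside the interval.

```python
def roll_to_extent(lon, values, start, stop):
    """Re-express a global grid in the [start, start + 360) convention if the interval crosses the seam."""
    if 0.0 <= start and stop <= 360.0:
        return list(lon), list(values)  # in-range: zero roll
    wrapped = [((x - start) % 360.0) + start for x in lon]
    order = sorted(range(len(lon)), key=wrapped.__getitem__)
    return [wrapped[i] for i in order], [values[i] for i in order]


def subset(lon, values, start, stop):
    """Keep points whose longitude lies in the closed interval [start, stop]."""
    keep = [i for i, x in enumerate(lon) if start <= x <= stop]
    return [lon[i] for i in keep], [values[i] for i in keep]


def aligned_subset_pair(coarse_lon, coarse_vals, fine_lon, fine_vals, start, stop):
    """Subset coarse cells first, then take the fine cells covering exactly those coarse cells."""
    c_lon, c_vals = subset(*roll_to_extent(coarse_lon, coarse_vals, start, stop), start, stop)
    half_width = (coarse_lon[1] - coarse_lon[0]) / 2.0
    # The fine range spans the outer edges of the selected coarse cells.
    f_start, f_stop = c_lon[0] - half_width, c_lon[-1] + half_width
    f_lon, f_vals = subset(*roll_to_extent(fine_lon, fine_vals, f_start, f_stop), f_start, f_stop)
    return (c_lon, c_vals), (f_lon, f_vals)


coarse_lon = [45.0, 135.0, 225.0, 315.0]          # 90-degree cells
fine_lon = [22.5 + 45.0 * i for i in range(8)]    # 45-degree cells, downscale factor 2

(c_lon, c_vals), (f_lon, f_vals) = aligned_subset_pair(
    coarse_lon, list("abcd"), fine_lon, list(range(8)), -90.0, 90.0
)
# c_lon == [-45.0, 45.0] and f_lon == [-67.5, -22.5, 22.5, 67.5]
```

For the seam-crossing extent `[-90, 90]`, two coarse cells and four fine cells are selected, so the downscale factor of 2 is preserved across the seam. An in-range extent such as `[0, 180]` takes the zero-roll branch and selects the same cells it would have without any rolling logic.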

PR 4 of 5 in the prime-meridian longitude stack (PRs 1–3 now merged to main). Lets a model re-express its grid in a seam-crossing coarse domain's longitude convention while sharing the trained network weights, so a single checkpoint can generate over a domain expressed west of 0 or east of 360.

Changes:

- `fme.downscaling.models.DiffusionModel.with_rolled_lon`: rebuild the model through its constructor with `full_fine_coords` and `static_inputs` rolled to match the coarse grid, anchored on the western coarse-cell edge so the fine grid stays aligned to whole coarse cells; returns `self` when no roll is needed. Inference-only (rebuilding re-wraps the module under torch distributed).
- `fme.downscaling.predictors.serial_denoising.DenoisingMoEPredictor.with_rolled_lon`: roll every expert (preserving the shared-grid invariant) and rebuild so the sigma dispatcher is reconstructed from the rolled experts.
- `fme.downscaling.data` exports `roll_lon_coords` for the model layer.
- `fme.downscaling.test_models`: tests for no-roll passthrough, coord shifting with shared weights (including value-level checks that coords and static data roll together, and that a double roll is a no-op), and coarse-cell alignment for a seam-crossing domain. MoE rolling tests live in `test_serial_denoising` next to the existing grid-validation test.
- Test cleanup: shared `cell_centered_coordinate` helper in `test_utils` replaces per-file midpoint-coordinate constructions (`test_models`, `test_config`); removed a test and helper in `test_models`/`test_serial_denoising` duplicated from #1234.

- [x] Tests added
- [ ] If dependencies changed, "deps only" image rebuilt and "latest_deps_only_image.txt" file updated

Base: `main` (PRs 1–3 of the stack merged)

### Stack

| PR | Head → Base | Title | Status |
|----|-------------|-------|--------|
| #1234 | `refactor/moe-validate-experts-init` → `main` | Validate expert grid compatibility in `DenoisingMoEPredictor.__init__` | merged |
| #1235 | `feature/lon-roll-primitives` → `main` | Add longitude roll primitives | merged |
| #1236 | `feature/lon-roll-data-layer` → `main` | Roll seam-crossing longitudes in the data layer | merged |
| #1237 | `feature/lon-roll-model` → `main` | Add with_rolled_lon to models | this PR |
| #1238 | `feature/lon-roll-integration` → PR4 | Roll the model in inference/predict/evaluator | open |
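
The `with_rolled_lon` behavior described for PR #1237 (rebuild through the constructor, share weights, return `self` when nothing changes) might look roughly like the sketch below. `ToyModel` and `ToyMoE` are hypothetical classes, the `west_edge` parameter is an assumed simplification of the western coarse-cell edge anchor, and plain lists stand in for tensors and `StaticInputs`.

```python
class ToyModel:
    def __init__(self, weights, fine_lon, static):
        self.weights = weights          # held by reference, never copied
        self.fine_lon = list(fine_lon)
        self.static = list(static)      # one static value per longitude point

    def with_rolled_lon(self, west_edge):
        """Return a model whose grid lies in [west_edge, west_edge + 360)."""
        wrapped = [((x - west_edge) % 360.0) + west_edge for x in self.fine_lon]
        order = sorted(range(len(wrapped)), key=wrapped.__getitem__)
        rolled_lon = [wrapped[i] for i in order]
        if rolled_lon == self.fine_lon:
            return self  # no roll needed
        # Rebuild through the constructor; coords and static data roll together.
        return ToyModel(self.weights, rolled_lon, [self.static[i] for i in order])


class ToyMoE:
    def __init__(self, experts):
        # Constructor-level grid check, so the rebuilt predictor is validated too.
        if any(e.fine_lon != experts[0].fine_lon for e in experts[1:]):
            raise ValueError("All experts must share one grid.")
        self.experts = list(experts)

    def with_rolled_lon(self, west_edge):
        rolled = [e.with_rolled_lon(west_edge) for e in self.experts]
        if all(r is e for r, e in zip(rolled, self.experts)):
            return self
        return ToyMoE(rolled)


lon = [45.0, 135.0, 225.0, 315.0]
model = ToyModel({"w": [1.0]}, lon, list("abcd"))
rolled = model.with_rolled_lon(-90.0)
# rolled.fine_lon == [-45.0, 45.0, 135.0, 225.0]; rolled.static == ['d', 'a', 'b', 'c']
```

Because every expert is rolled with the same anchor, the rolled experts still share one grid and pass the constructor check. Rolling an already rolled model with the same anchor returns the same object, which mirrors the "double roll is a no-op" test mentioned above.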

frodre added 2 commits June 6, 2026 15:10

Remove verbose comment

a3cb3ba

Move tests to correct place

f393c79

Merge branch 'main' into refactor/moe-validate-experts-init

12a0b9a

frodre marked this pull request as ready for review June 6, 2026 22:13

AnnaKwa approved these changes Jun 8, 2026

frodre merged commit 2043d79 into main Jun 8, 2026
7 checks passed

frodre deleted the refactor/moe-validate-experts-init branch June 8, 2026 16:48

frodre merged 4 commits into `main` from `refactor/moe-validate-experts-init`
