EUFLOW · kilojoules · May 21, 2026
diff --git a/examples/cases/open_source_scada/README.md b/examples/cases/open_source_scada/README.md
@@ -0,0 +1,84 @@
+# open_source_scada — Twin Groves Wind Farm
+
+This example case packages one year (2010-09-01 to 2011-08-31, hourly) of
+per-turbine SCADA power and wind-speed observations from the **Twin
+Groves Wind Farm**, owned and operated by Invenergy in McLean County,
+Illinois, USA. The farm has approximately 200 Vestas V82 1.65 MW
+turbines (82 m rotor, 80 m hub height).
+
+The packaged data is suitable for benchmarking flow models against
+operational observations: it provides a public reference case with a
+known turbine type, a publicly available manufacturer power curve, and
+permissive licensing.
+
+## Files
+
+| path | content |
+|--|--|
+| `plant_wind_farm/FLOW_toy_study_wind_farm.yaml` | Turbine x/y coordinates (UTM zone 16N, EPSG:32616). |
+| `plant_energy_turbine/turbine.yaml` | Vestas V82 1.65 MW power and thrust curves (OEM values, also at https://github.com/NatLabRockies/turbine-models). |
+| `plant_energy_turbine/VestasV82_1.65MW_82.csv` | Same OEM curve in CSV form (WS, P, Cp, Ct). |
+| `plant_energy_resource/` | ERA5 reanalysis wind timeseries at hub height + Weibull resource + utility scripts. |
+| `plant_energy_site/FLOW_toy_study_energy_site.yaml` | Site polygon and energy-resource include. |
+| `wind_energy_system/system.yaml` | Top-level WindIO definition tying farm, site, and analysis together. |
+| `wind_energy_system/analysis.yaml` | Example Bastankhah Gaussian wake-model analysis configuration. |
+| `outputs/observedPower.nc` | **Observed** per-turbine power (W) and wind speed (m/s) per hour, shape `(time=8760, turbine=200)`. |
+
+## Units and conventions
+
+- `observedPower.nc.power`: per-turbine electrical power output in **watts (W)**.
+  Maximum across all (time, turbine) entries is the rated power
+  $1.65\times10^{6}$ W, consistent with the V82 OEM curve packaged in
+  `plant_energy_turbine/turbine.yaml`.
+- `observedPower.nc.rotor_effective_velocity`: per-turbine wind speed in
+  **m/s**. Currently the raw nacelle-anemometer reading; this is **not** a
+  freestream value (no Nacelle Transfer Function correction has been
+  applied, and no upstream met mast is provided).
+- Time axis is hourly UTC. Approximately 20% of (time, turbine) entries
+  are NaN due to operational gaps, maintenance, and turbine outages — this
+  is real operational SCADA, not a synthetic dataset.
+
+## Historical note on the power scaling
+
+Prior to the cleanup PR that accompanies this README, the
+`observedPower.nc.power` values were stored with a **multiplicative factor
+of 1.65 applied** (i.e., the rated-power numerical value in MW). Maximum
+values were near 2.72 MW, which is inconsistent with the V82 1.65 MW
+rated power and made the file unusable for any benchmark relying on the
+packaged OEM power curve.
+
+If you have a copy of the older file, you can recover the corrected
+values by dividing `power` by `1.65`:
+
+```python
+import xarray as xr
+ds = xr.open_dataset("outputs/observedPower.nc")
+ds["power"] = ds["power"] / 1.65   # only if you have the pre-PR file
+```
+
+The current file in this repository has the correction applied. A
+self-consistency check is included in `tests/sanity_open_source_scada.py`
+(see PR description).
+
+## Suggested usage
+
+```python
+import xarray as xr
+ds = xr.open_dataset("outputs/observedPower.nc")
+print(ds)                       # (time: 8760, turbine: 200)
+print(ds.power.max().item())    # ~1.65e6 W
+print(ds.power.attrs)           # units, source
+```
+
+## License
+
+MIT (inherited from the WIFA repository).
+
+## Provenance and citation
+
+If you use this dataset in published work, please cite the WIFA
+repository and acknowledge Twin Groves Wind Farm / Invenergy as the
+source of the SCADA observations. The Vestas V82 1.65 MW OEM power and
+thrust curves are reproduced from public manufacturer data (also
+available at the NatLabRockies turbine-models repository:
+https://github.com/NatLabRockies/turbine-models/blob/main/turbine_models/data/Onshore/VestasV82_1.65MW_82.csv).
diff --git a/examples/cases/open_source_scada/outputs/observedPower.nc b/examples/cases/open_source_scada/outputs/observedPower.nc
diff --git a/examples/cases/open_source_scada/tests/sanity_open_source_scada.py b/examples/cases/open_source_scada/tests/sanity_open_source_scada.py
@@ -0,0 +1,81 @@
+"""Sanity check for examples/cases/open_source_scada/outputs/observedPower.nc.
+
+Verifies that the cleaned per-turbine power timeseries is consistent with
+the Vestas V82 1.65 MW OEM power curve packaged in the same case
+directory. Catches regressions of the historical "multiply by rated MW"
+bug fixed in the cleanup PR.
+
+Run from the WIFA repository root:
+
+    python examples/cases/open_source_scada/tests/sanity_open_source_scada.py
+"""
+from __future__ import annotations
+
+from pathlib import Path
+
+import numpy as np
+import xarray as xr
+import yaml
+
+
+CASE_DIR = Path(__file__).resolve().parents[1]
+OBSERVED = CASE_DIR / "outputs" / "observedPower.nc"
+TURBINE = CASE_DIR / "plant_energy_turbine" / "turbine.yaml"
+
+RATED_W = 1.65e6
+TOL_RATED = 0.02 * RATED_W            # 2% above rated tolerated (overshoot)
+TOL_BIN_FRAC = 0.20                   # 20% bin-median tolerance vs OEM
+WS_TEST_BINS = list(range(5, 14))     # exclude very-low (start-up scatter)
+                                      # and very-high (storm shutdown scatter)
+
+
+def load_oem_pc(turbine_yaml: Path):
+    d = yaml.safe_load(turbine_yaml.read_text())
+    pc = d["performance"]["power_curve"]
+    ws = np.array(pc["power_wind_speeds"], dtype=float)
+    p_kw = np.array(pc["power_values"], dtype=float)
+    return ws, p_kw * 1000.0          # convert to W
+
+
+def main():
+    print(f"Checking {OBSERVED} ...")
+    ds = xr.open_dataset(OBSERVED)
+    P = ds["power"].values             # (time, turbine), W
+    WS = ds["rotor_effective_velocity"].values
+    oem_ws, oem_p = load_oem_pc(TURBINE)
+
+    valid = np.isfinite(P) & np.isfinite(WS)
+    p_max = float(np.nanmax(P))
+    print(f"  max(power) = {p_max:.1f} W (rated = {RATED_W:.1f} W)")
+    assert p_max <= RATED_W + TOL_RATED, (
+        f"max(power)={p_max} exceeds rated+tol={RATED_W + TOL_RATED}; "
+        "is the historical 'multiply by rated MW' scaling re-introduced?"
+    )
+
+    fails = []
+    for lo in WS_TEST_BINS:
+        m = valid & (WS >= lo) & (WS < lo + 1)
+        if m.sum() < 30:
+            continue
+        oem_at = float(np.interp(lo + 0.5, oem_ws, oem_p))
+        med = float(np.median(P[m]))
+        rel = abs(med - oem_at) / max(oem_at, 1.0)
+        flag = "OK" if rel <= TOL_BIN_FRAC else "FAIL"
+        print(f"  WS={lo:2d}-{lo+1:2d}: n={m.sum():6d}  "
+              f"OEM={oem_at:7.0f} W  median={med:7.0f} W  rel-err={rel:.2f} [{flag}]")
+        if rel > TOL_BIN_FRAC:
+            fails.append((lo, oem_at, med, rel))
+
+    if fails:
+        msg = ", ".join(f"WS={lo}: rel-err={r:.2f}" for lo, _, _, r in fails)
+        raise AssertionError(
+            f"Power-curve agreement worse than {TOL_BIN_FRAC:.0%} on bins: {msg}"
+        )
+
+    nan_frac = float(np.isnan(P).mean())
+    print(f"  NaN fraction (operational gaps): {nan_frac:.2f}  (expect ~0.20)")
+    print("Sanity check passed.")
+
+
+if __name__ == "__main__":
+    main()