
Bake numba JIT cache into model-worker #1398

Merged: sambles merged 12 commits into main from feature/jit-baked-in on Jun 1, 2026

@sambles (Contributor) commented May 26, 2026

Bake numba JIT cache into model-worker

  • Added Numba JIT warm-up cache files to the built image (only works on x86 systems from 2015-ish or newer, i.e. CPUs supporting x86-64-v3)
  • Added Numba Cache check to CI
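The x86 restriction comes from the baked code targeting the x86-64-v3 level (NUMBA_CPU_NAME=x86-64-v3). A quick way to check whether a Linux host meets that level is to compare its /proc/cpuinfo flags against the per-level flag lists commonly used for this purpose. This is a sketch, not part of the PR: the function names are hypothetical, and note that Linux reports LZCNT as `abm`.

```python
"""Rough host check for the x86-64-v3 level assumed by NUMBA_CPU_NAME=x86-64-v3.

Flag names follow the common /proc/cpuinfo mapping (e.g. LZCNT shows up as
"abm"). This is a sketch, not an exhaustive psABI check.
"""
from pathlib import Path

# Linux /proc/cpuinfo flag names, grouped by x86-64 microarchitecture level.
LEVEL_FLAGS = {
    1: {"lm", "cmov", "cx8", "fpu", "fxsr", "mmx", "syscall", "sse2"},
    2: {"cx16", "lahf_lm", "popcnt", "sse4_1", "sse4_2", "ssse3"},
    3: {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"},
}


def parse_flags(cpuinfo_text: str) -> set[str]:
    """Return the flags listed on the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def max_level(flags: set[str]) -> int:
    """Highest level whose flags, and those of every lower level, are all present."""
    level = 0
    for lvl in sorted(LEVEL_FLAGS):
        if not LEVEL_FLAGS[lvl] <= flags:
            break
        level = lvl
    return level


def host_supports_v3(cpuinfo_path: str = "/proc/cpuinfo") -> bool:
    """True if the host's CPU flags cover x86-64-v3 (Linux only)."""
    return max_level(parse_flags(Path(cpuinfo_path).read_text())) >= 3
```

On an ARM or non-Linux host there is no matching `flags` line, so the sketch reports level 0 rather than guessing.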

Notes:

  • The file src/utils/warmup-serial.py is needed to avoid a race condition: it calls the oasislmf warm-up as warmup(max_workers=1). A possible fix in "Fixes for warm up code" (OasisLMF#2001) could mean it's no longer needed
  • Cached files are stored in /home/worker/.numba_jit_cache using NUMBA_CACHE_DIR
  • To avoid cache misses between Intel and AMD CPUs, NUMBA_CPU_FEATURES is set to blank (an empty value)
  • Added scripts/test-jit-cache.sh, which checks that a running image uses the cache files by re-running the warm-up and checking the cache for regenerated files
  • The *.py files in oasislmf MUST be timestamp-normalised for this to work. This happens automatically when building on an image pulled from a registry. To get the same result locally, touch is used to zero the timestamps: RUN find /root/.local -exec touch -d @0 {} +

Numba JIT Cache — Baked into Docker Image

Problem

The model_worker image was not using pre-compiled Numba JIT cache files. Although .nbi/.nbc files were present in the image, Numba considered
them invalid at runtime and recompiled everything from scratch on first use, adding significant startup latency.


Root Cause

Numba validates its cache by comparing the stamp of the source .py file recorded in the .nbi index (its mtime, together with its file size) against the actual mtime and size of the file at runtime. If they differ, the cache is considered stale.
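A simplified model of that check, assuming (from our reading of numba's caching code) that the recorded stamp is the pair (st_mtime, st_size). It shows that an mtime change alone, with identical file content, is enough to make the cache look stale. The helper names are hypothetical.

```python
"""Simplified model of the source-stamp check behind Numba's on-disk cache.

Assumption: the .nbi index stores (st_mtime, st_size) of the source file and
ignores its entries if the current stamp differs. Helper names are hypothetical.
"""
import os
import tempfile


def source_stamp(path: str) -> tuple[float, int]:
    """Return the (mtime, size) pair we assume Numba records for a source file."""
    st = os.stat(path)
    return (st.st_mtime, st.st_size)


def cache_is_stale(recorded_stamp: tuple[float, int], path: str) -> bool:
    """A cache entry is stale if the source stamp no longer matches."""
    return source_stamp(path) != recorded_stamp


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "manager.py")
    with open(src, "w") as f:
        f.write("def compute(): pass\n")

    os.utime(src, (0, 0))             # normalised, like `touch -d @0`
    recorded = source_stamp(src)      # stamp captured at warm-up time
    print(cache_is_stale(recorded, src))  # False: same mtime and size

    # Content unchanged, but the mtime drifts (e.g. a rebuilt layer).
    os.utime(src, (1_700_000_000, 1_700_000_000))
    print(cache_is_stale(recorded, src))  # True: Numba would recompile
```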

In a multi-stage Docker build, pip install sets file timestamps to the current build time. When those files are copied between stages (e.g. COPY --from=jit-warmup), Docker can reset or drift the mtimes, causing a mismatch between what Numba recorded during warmup and what it sees at runtime.

Registry-pulled images (e.g. coreoasis/model_worker:2.5.3) don't have this problem because their layer timestamps are fixed at the time the image was
originally built and pushed — they never change.


Solution

Two changes to Dockerfile.model_worker:

1. Normalise pip-installed file timestamps to epoch 0

Added to the build-packages stage, after all pip install steps:

RUN find /root/.local -exec touch -d @0 {} +

This sets the mtime (and atime) of every installed file and directory, including every .py file, to 1970-01-01 00:00:00 UTC, making timestamps stable and identical across every build regardless of when it runs. This applies to all pip installs, including optional branch overrides (oasislmf_branch, ods_tools_branch, odm_branch).
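For readers less familiar with `touch -d @0`, here is a rough Python equivalent of the find/touch step. It is a sketch only (the Dockerfile uses the shell command), and `normalise_timestamps` is a hypothetical name. One deliberate difference from `touch`: symlinks are stamped themselves rather than followed where the OS supports that, so a dangling link does not abort the walk.

```python
"""Python sketch of `find /root/.local -exec touch -d @0 {} +`.

Sets atime and mtime of the root and everything beneath it to `stamp`
(default: the Unix epoch). Unlike `touch`, symlinks are not followed when
the platform supports that.
"""
import os

# Can os.utime act on a symlink itself (Linux/macOS) rather than its target?
_CAN_SKIP_SYMLINKS = os.utime in os.supports_follow_symlinks


def normalise_timestamps(root: str, stamp: int = 0) -> int:
    """Set atime/mtime of root and all descendants; return descendant count."""
    count = 0
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            os.utime(path, (stamp, stamp), follow_symlinks=not _CAN_SKIP_SYMLINKS)
            count += 1
    # `find` also includes its starting point, so stamp the root too.
    os.utime(root, (stamp, stamp))
    return count
```

Changing a child's timestamps does not modify its parent directory's mtime, so the order in which paths are stamped doesn't matter.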

2. Dedicated JIT cache directory (NUMBA_CACHE_DIR)

Instead of writing cache files into __pycache__ directories alongside the source files, Numba is directed to a separate directory:

ENV NUMBA_CACHE_DIR=/home/worker/.numba_jit_cache

Set in both the jit-warmup stage (so warmup writes there) and the final stage (so runtime reads from the same location). Only this directory is copied
into the final image — source .py files are never touched, so their normalised mtimes are preserved exactly as Numba recorded them during warmup.

---
Dockerfile Structure

STAGE 1: build-packages
  - pip install all dependencies
  - touch -d @0 to normalise all file timestamps

STAGE 2: model_worker
  - COPY --from=build-packages /root/.local → /home/worker/.local
  - all source files have mtime = 1970-01-01

STAGE 3: jit-warmup (FROM model_worker)
  - ENV NUMBA_CPU_NAME=x86-64-v3
  - ENV NUMBA_CACHE_DIR=/home/worker/.numba_jit_cache
  - runs warmup-serial.py (serial execution to avoid parallel cache race)
  - ~200 .nbi files produced in NUMBA_CACHE_DIR

STAGE 4: final (FROM model_worker)
  - ENV NUMBA_CPU_NAME=x86-64-v3
  - ENV NUMBA_CACHE_DIR=/home/worker/.numba_jit_cache
  - copies only .numba_jit_cache from jit-warmup stage
  - source .py files untouched, mtimes unchanged
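Since the final stage copies only .numba_jit_cache, a cheap sanity check on the built image is that the directory exists, holds a plausible number of index files, and that every .nbi has at least one .nbc beside it. The sketch below assumes the `<stem>.nbi` / `<stem>.<n>.nbc` naming visible in the Verification listing. `check_cache_dir` is hypothetical and is not the CI check added in this PR.

```python
"""Hypothetical sanity check for a baked NUMBA_CACHE_DIR.

Assumes each index file `<stem>.nbi` has one or more data files
`<stem>.<n>.nbc` in the same directory, as in the verification listing.
"""
import re
from pathlib import Path


def check_cache_dir(cache_dir: str, min_index_files: int = 1) -> list[str]:
    """Return a list of problems found; an empty list means the cache looks sane."""
    root = Path(cache_dir)
    if not root.is_dir():
        return [f"{cache_dir} does not exist"]

    problems = []
    # NUMBA_CACHE_DIR may contain per-package subdirectories, so search recursively.
    index_files = sorted(root.rglob("*.nbi"))
    if len(index_files) < min_index_files:
        problems.append(f"expected >= {min_index_files} .nbi files, found {len(index_files)}")

    for nbi in index_files:
        stem = nbi.name[: -len(".nbi")]
        data_pattern = re.compile(re.escape(stem) + r"\.\d+\.nbc$")
        if not any(data_pattern.match(p.name) for p in nbi.parent.iterdir()):
            problems.append(f"{nbi.name} has no matching .nbc data file")
    return problems
```

In CI this could be called as `check_cache_dir("/home/worker/.numba_jit_cache", min_index_files=...)`. The threshold is a judgement call, given that the warm-up produces roughly 200 index files.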

Serial Warmup

The warmup uses warmup-serial.py (from the saas repo) which calls oasislmf.warmup.warmup(max_workers=1). The upstream default is parallel execution
across CPU cores, which causes a cache race condition — multiple task groups concurrently compile the same shared @njit functions with slightly
different type signatures, leaving the cache index in a mixed state. Forcing max_workers=1 eliminates this.
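To illustrate the kind of race involved, here is a toy model. Two workers each read the index, add their own overload and write it back. If both read before either writes, the last writer wins and one overload is lost, whereas serial execution keeps both. This illustration rests on several assumptions (JSON instead of Numba's real index format, made-up signature strings, hypothetical function names) and is not Numba's code.

```python
"""Toy model of a lost update on a shared cache index.

Assumption: each worker does read-index -> add entry -> write-index with no
lock between the read and the write. Illustrates the class of race only.
"""
import json
import os
import tempfile


def read_index(path):
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def write_index(path, index):
    with open(path, "w") as f:
        json.dump(index, f)


def run_parallel_interleaved(path, entries):
    # Worst-case interleaving: every worker reads before any worker writes.
    snapshots = [read_index(path) for _ in entries]
    for snapshot, (key, value) in zip(snapshots, entries):
        snapshot[key] = value
        write_index(path, snapshot)  # last writer wins


def run_serial(path, entries):
    # max_workers=1: each read sees the previous worker's write.
    for key, value in entries:
        index = read_index(path)
        index[key] = value
        write_index(path, index)


entries = [("f(int64[:])", "f.1.nbc"), ("f(float64[:])", "f.2.nbc")]
with tempfile.TemporaryDirectory() as tmp:
    parallel_idx = os.path.join(tmp, "parallel.nbi")
    serial_idx = os.path.join(tmp, "serial.nbi")
    run_parallel_interleaved(parallel_idx, entries)
    run_serial(serial_idx, entries)
    print(sorted(read_index(parallel_idx)))  # ['f(float64[:])']
    print(sorted(read_index(serial_idx)))    # ['f(float64[:])', 'f(int64[:])']
```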

---
Verification

After a model run, the cache directory should be unchanged — no new .nbc files, no updated timestamps:

before:
  manager.compute_event_losses-787.py312.1.nbc  14:33  ← baked in
  manager.compute_event_losses-787.py312.nbi    14:33

after model run:
  manager.compute_event_losses-787.py312.1.nbc  14:33  ← unchanged ✓
  manager.compute_event_losses-787.py312.nbi    14:33  ← unchanged ✓
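The same before/after comparison can be scripted. Below is a Python sketch, which assumes that a snapshot of (mtime_ns, size) for each .nbi/.nbc file is enough to detect regeneration. The function names are hypothetical, and scripts/test-jit-cache.sh may do this differently.

```python
"""Sketch of a before/after cache comparison: snapshot NUMBA_CACHE_DIR,
run the workload, snapshot again and report what changed."""
from pathlib import Path


def snapshot(cache_dir: str) -> dict[str, tuple[int, int]]:
    """Map relative path -> (mtime_ns, size) for every .nbi/.nbc file."""
    root = Path(cache_dir)
    result = {}
    for p in root.rglob("*"):
        if p.suffix in {".nbi", ".nbc"} and p.is_file():
            st = p.stat()
            result[str(p.relative_to(root))] = (st.st_mtime_ns, st.st_size)
    return result


def diff_snapshots(before: dict, after: dict) -> dict[str, list[str]]:
    """Classify files as added, removed or changed between two snapshots."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
    }


def cache_was_reused(before: dict, after: dict) -> bool:
    """No new and no rewritten cache files means nothing was recompiled."""
    d = diff_snapshots(before, after)
    return not (d["added"] or d["changed"])
```

Comparing size as well as mtime_ns also catches a file that is rewritten in place within the same timestamp granularity.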

---
Known Limitation — Branch Installs

When building with oasislmf_branch pointing to a development branch, the warmup compiles against that branch's pytools code. If the branch has changed
function bodies or type handling relative to the released version, the warmup may not cover all type specializations that real model data triggers,
resulting in a small number of additional compilations on first run. These are one-time per container instance and cached in the writable container
layer for subsequent runs.

This is a warmup coverage issue in the oasislmf repo — the synthetic dummy data in oasislmf/warmup.py needs to exercise the same type signatures that
real models use.

sambles marked this pull request as draft May 26, 2026 14:14
sambles mentioned this pull request May 28, 2026
sambles self-assigned this May 29, 2026
sambles moved this to In Progress in Oasis Dev Team Tasks May 29, 2026
sambles marked this pull request as ready for review May 29, 2026 15:32
sambles linked an issue May 29, 2026 that may be closed by this pull request
sambles added the Enhancement (Small improvement or refinement.) label May 29, 2026
sambles added 2 commits May 29, 2026 16:47
NUMBA_CPU_FEATURES= is Unlikely to slow exec  (i hope)

 NUMBA_CPU_NAME=x86-64-v3 tells LLVM to compile targeting the x86-64-v3 microarchitecture level, which already implies AVX2 and FMA (the SIMD instructions
  that matter for numeric/financial workloads) as well as BMI1/BMI2, F16C, LZCNT and MOVBE. Those are still active.

  What NUMBA_CPU_FEATURES= suppresses is the host-specific extras that auto-detection adds on top:

  ┌─────────────────┬─────────────────────────────────────────────┐
  │     Machine     │            Auto-detected extras             │
  ├─────────────────┼─────────────────────────────────────────────┤
  │ AMD Zen3        │ sha, sse4a, clzero, rdpru, mwaitx, wbnoinvd │
  ├─────────────────┼─────────────────────────────────────────────┤
  │ Intel (typical) │ avx512f, avx512bw, various Intel extensions │
  └─────────────────┴─────────────────────────────────────────────┘

  For oasislmf's pytools workloads (array arithmetic, loss calculations, stream I/O), none of those vendor-specific extensions are in the hot path —
  Numba wouldn't generate SHA instructions for financial model maths regardless. The meaningful speedup from AVX2 and FMA is preserved.

  The only scenario where you'd see a slowdown is if a future oasislmf function was explicitly written to exploit avx512f on Intel. At that point you'd
  want to reconsider the portability trade-off — but for now there's no practical impact.
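Here is a toy model of why both variables are set. It assumes, based on our reading of numba's caching code, that each cached overload is keyed partly on the target triple, CPU name and CPU feature string, and that the NUMBA_CPU_* variables replace the host-detected values when present. Under those assumptions the baked entries match on any host. The CPU names are real LLVM names, but the feature strings are made-up examples and `target_key` is hypothetical.

```python
"""Toy model: why NUMBA_CPU_NAME plus an empty NUMBA_CPU_FEATURES makes
cache keys identical on Intel and AMD hosts.

Assumption: the cache key includes (target triple, CPU name, CPU features),
and the env vars override host detection when present (even if empty).
"""


def target_key(env: dict, host_cpu: str, host_features: str,
               triple: str = "x86_64-unknown-linux-gnu") -> tuple[str, str, str]:
    # A set NUMBA_CPU_NAME replaces the detected CPU name.
    cpu = env["NUMBA_CPU_NAME"] if "NUMBA_CPU_NAME" in env else host_cpu
    # A set NUMBA_CPU_FEATURES replaces detected features, even when it is "".
    features = env["NUMBA_CPU_FEATURES"] if "NUMBA_CPU_FEATURES" in env else host_features
    return (triple, cpu, features)


intel = ("skylake-avx512", "+avx2,+fma,+avx512f,+avx512bw")  # example host
amd = ("znver3", "+avx2,+fma,+sha,+clzero")                  # example host

baked_env = {"NUMBA_CPU_NAME": "x86-64-v3", "NUMBA_CPU_FEATURES": ""}
name_only = {"NUMBA_CPU_NAME": "x86-64-v3"}

print(target_key(baked_env, *intel) == target_key(baked_env, *amd))  # True
print(target_key({}, *intel) == target_key({}, *amd))                # False
print(target_key(name_only, *intel) == target_key(name_only, *amd))  # False
```

The last case shows why the CPU name override alone would not be enough in this model: the auto-detected feature string still differs between vendors.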
sambles requested a review from sstruzik June 1, 2026 07:36
sambles merged commit 9f0a002 into main Jun 1, 2026
37 checks passed
sambles deleted the feature/jit-baked-in branch June 1, 2026 11:06
github-project-automation (bot) moved this from In Progress to Done in Oasis Dev Team Tasks Jun 1, 2026
awsbuild added this to the 2.5.4 milestone Jun 1, 2026

Labels

Enhancement Small improvement or refinement.

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Numba JIT baked into image

3 participants