A dedicated JSON-only CloudFront distribution that serves the SlideRule schema endpoints. Separate from the web client's distribution — different bucket, different distribution, different CORS and cache policy.
The publishable tree lives at
schema-endpoints/merged/source/ and
mirrors the S3 bucket layout, which mirrors the public URL structure
1:1 with no rewriting, no CloudFront Function, no Lambda@Edge.
CloudFront takes the URL path, strips the leading /, and looks up that
exact key in the bucket.
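The mapping is simple enough to state in code — a sketch for illustration only (the real lookup happens inside CloudFront, not in this repo):

```python
def path_to_key(url_path: str) -> str:
    """CloudFront's implicit origin mapping: drop the leading '/'
    and use the remainder verbatim as the S3 object key."""
    return url_path.removeprefix("/")

# "/source/schema/icesat2.json" is served from bucket key
# "source/schema/icesat2.json" -- no other transformation.
```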
merged/ is an artifact produced by schema-endpoints/merge.py
from schema-endpoints/authored/ (human-edited)
and schema-endpoints/generated/ (tool-emitted).
It is committed to git so reviewers see the S3-bound diff on every PR,
and make verify asserts it matches what merge.py would produce
today. See schema-endpoints/README.md for
the three-tier architecture.
schema-endpoints/merged/source/ ← on disk in this repo
← same keys in s3://sliderule-schema-test/
← same paths at https://schema.testsliderule.org/
├── schema.json (index of available domains)
└── schema/
├── core.json (shared request parameters)
├── icesat2.json (ICESat-2 request parameters)
├── gedi.json (GEDI request parameters)
├── swot.json (not yet generated → 404)
├── cre.json (not yet generated → 404)
│
├── icesat2/
│ ├── fields.json (selector listing)
│ │
│ ├── fields/ ← columns added by *_fields selectors
│ │ ├── atl03_ph.json
│ │ ├── atl03_geo.json
│ │ ├── atl03_corr.json
│ │ ├── atl03_bckgrd.json
│ │ ├── atl06.json
│ │ ├── atl08.json
│ │ ├── atl09.json
│ │ └── atl13.json
│ │
│ └── output/ ← per-API output column schemas
│ ├── atl03x.json (base + fit/phoreal/yapc/atl24/atl13 mods)
│ ├── atl06x.json (base land-ice segment columns)
│ ├── atl08x.json (base land/veg segment columns)
│ ├── atl13x.json (base inland-water columns)
│ └── atl24x.json (base bathymetry photon columns)
│
└── gedi/
├── fields.json (future: GEDI selector listing)
├── fields/ (future: GEDI *_fields selectors)
└── output/
└── gedil4ax.json (base GEDI L4A footprint columns)
Anything not present in the source tree (including swot.json, cre.json,
and any unpublished path) returns HTTP 404 with body:
{"error": "not yet generated"}This is configured via CloudFront custom_error_response pointing at
/errors/not-found.json, which make deploy uploads alongside the schema
tree. The source of that body is
schema-endpoints/authored/errors/not-found.json.
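In Terraform this is a single `custom_error_response` block on the distribution — a sketch using the AWS provider's attribute names (the resource name and the elided surrounding configuration are assumptions):

```hcl
resource "aws_cloudfront_distribution" "schema" {
  # ... origin, cache behavior, aliases elided ...

  # S3 behind an OAI returns 403 for missing keys; surface it as a
  # 404 with the uploaded JSON body instead.
  custom_error_response {
    error_code         = 403
    response_code      = 404
    response_page_path = "/errors/not-found.json"
  }
}
```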
Every file served by the distribution starts life in
schema-endpoints/authored/ (human-edited) or
schema-endpoints/generated/ (tool-emitted).
schema-endpoints/merge.py fuses the two into
merged/, which scripts/build.sh copies verbatim into
build/ for the S3 sync.
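The fusion step can be pictured as a recursive dictionary merge — a simplified sketch only; the actual precedence and validation rules live in `merge.py`:

```python
def deep_merge(base: dict, overlay: dict) -> dict:
    """Merge overlay into base: nested dicts are merged recursively,
    anything else in overlay wins. A sketch of the kind of fusion
    merge.py performs over authored/ and generated/ trees."""
    out = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = value
    return out
```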
| Published URL | In-repo source | Origin in the sliderule repo |
|---|---|---|
| `/source/schema.json` (domain/API index) | `schema-endpoints/authored/schema.json` | Hand-written in this repo |
| `/source/schema/{core,icesat2,gedi}.json` | Merged from `generated/<domain>/params.json` + `authored/<domain>/{structure,behavior}.json` | `generated/<domain>/params.json` will eventually come from the sliderule server's `/source/defaults` endpoint (`packages/core/endpoints/defaults.lua`); hand-maintained for now |
| `/source/schema/icesat2/fields.json` | `schema-endpoints/authored/icesat2/fields.json` | Hand-written selector listing |
| `/source/schema/icesat2/fields/<selector>.json`, `/source/schema/gedi/fields/<selector>.json` | `schema-endpoints/generated/{icesat2,gedi}/fields/<selector>.json` | `schema_fields/fields_<selector>.json` (from local `scripts/enumerate_h5_fields.py`, adopted into this repo Apr 2026) |
| `/source/schema/icesat2/output/<api>.json` | `schema-endpoints/generated/icesat2/output/<api>.json` | `sliderule/tmp_server_generated_schema_test/schema_<API>DataFrame.json` (from `test_server_generated_schema.sh`) |
| `/source/schema/gedi/output/gedil4ax.json` | `schema-endpoints/generated/gedi/output/gedil4ax.json` | `sliderule/tmp_server_generated_schema_test/schema_Gedi04aDataFrame.json` |
Field enumerations (granule-level HDF5 structure):
# From this repo:
python3 scripts/download_h5_granules.py --output-dir ./granules/
python3 scripts/enumerate_h5_fields.py --earthdata --output-dir ./schema_fields/
# Or point at granules you already have:
python3 scripts/enumerate_h5_fields.py \
--atl03 granules/ATL03_*.h5 \
--atl24 granules/ATL24_*.h5 \
--gedi_l4a granules/GEDI04_A_*.h5 \
--output-dir ./schema_fields/
# Mirror into schema-endpoints/generated/ and re-merge:
cp schema_fields/fields_atl*.json schema-endpoints/generated/icesat2/fields/
cp schema_fields/fields_gedi_*.json schema-endpoints/generated/gedi/fields/
python3 schema-endpoints/merge.py

For a new product whose HDF5 group paths aren't yet in `SELECTOR_MAP`,
use `--walk <granule>` to discover the structure first, then fill in
the paths in the `SELECTOR_MAP` entry before re-running the normal
enumeration flow.
Output DataFrame schemas (what the server actually returns per API):
# From the sliderule repo:
cd ../sliderule
bash scripts/test_server_generated_schema.sh
# Output lands in sliderule/tmp_server_generated_schema_test/
# Mirror into this repo and re-merge:
cp ../sliderule/tmp_server_generated_schema_test/schema_Atl03DataFrame.json schema-endpoints/generated/icesat2/output/atl03x.json
cp ../sliderule/tmp_server_generated_schema_test/schema_Atl06DataFrame.json schema-endpoints/generated/icesat2/output/atl06x.json
# ... etc for atl08x, atl13x, atl24x, gedil4ax (Gedi04aDataFrame.json)
python3 schema-endpoints/merge.py

Request-parameter schemas (`schema/core.json`, `icesat2.json`, `gedi.json`):
# Against a running sliderule server:
curl http://<server>:9081/source/defaults | jq '.core' > schema-endpoints/generated/core/params.json
curl http://<server>:9081/source/defaults | jq '.icesat2' > schema-endpoints/generated/icesat2/params.json
curl http://<server>:9081/source/defaults | jq '.gedi' > schema-endpoints/generated/gedi/params.json
python3 schema-endpoints/merge.py

Commit the resulting `schema-endpoints/merged/` diff. `make verify`
will flag any drift in CI if you forget.
| Environment | Domain | S3 bucket |
|---|---|---|
| test | `schema.testsliderule.org` | `sliderule-schema-test` |
| prod (future) | `schema.slideruleearth.io` | `sliderule-schema-prod` |
Per-environment wrapper targets in the Makefile
(deploy-to-testsliderule, deploy-to-slideruleearth, etc.) carry the
DOMAIN / S3_BUCKET / DOMAIN_APEX variables, matching the pattern in
sliderule-web-client/Makefile. DISTRIBUTION_ID is auto-resolved from the
domain alias via aws cloudfront list-distributions.
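The lookup amounts to scanning the `list-distributions` response for the distribution whose alias list contains the domain. A Python sketch over a mocked response shape (the Makefile does the equivalent with the AWS CLI and a JMESPath query; the IDs below are fake):

```python
# Mimics (a subset of) the `aws cloudfront list-distributions` JSON shape.
resp = {"DistributionList": {"Items": [
    {"Id": "E2EXAMPLE1", "Aliases": {"Items": ["schema.testsliderule.org"]}},
    {"Id": "E2EXAMPLE2", "Aliases": {"Items": ["client.testsliderule.org"]}},
]}}

def resolve_distribution_id(resp: dict, domain: str):
    """Return the Id of the distribution aliased to `domain`, else None."""
    for dist in resp["DistributionList"]["Items"]:
        if domain in dist["Aliases"]["Items"]:
            return dist["Id"]
    return None
```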
make build Run merge, then stage schema-endpoints/merged/
into build/
make clean Remove build/
make verify Assert merged/ matches merge.py output
(run after any edit to authored/ or generated/)
make live-update Verify + build + aws s3 sync + invalidation
(requires DOMAIN + S3_BUCKET + DOMAIN_APEX)
make deploy Alias for live-update
make terraform-apply Create/update distribution + bucket + DNS
make terraform-destroy Tear down the above
make smoketest curl the public endpoints and verify
status + Content-Type + CORS
# Per-env wrappers (no variables needed):
make deploy-to-testsliderule Infra + content at schema.testsliderule.org
make live-update-testsliderule Content only (assumes infra exists)
make destroy-testsliderule Tear down the test env
schema-endpoints/merged/ is a build
artifact — output of merge.py from
authored/ + generated/ — that is nevertheless committed to git.
Two concrete reasons:
- **Paired-diff review.** Every source edit under `authored/` or
  `generated/` is committed alongside the resulting `merged/` diff, so
  reviewers see the exact bytes going to S3 next to the edit that caused
  them. A coupling added in `authored/icesat2/behavior.json` shows up
  paired with the new field appearing in the correct param in the merged
  `source/schema/icesat2.json` — this catches translation bugs (wrong
  group, wrong position) that the source diff alone wouldn't surface.
- **A simple drift check.** `make verify` asserts
  `git diff --quiet schema-endpoints/merged/` after running `merge.py`.
  If someone edits `authored/` or `generated/` without regenerating, the
  diff is non-empty and verify fails. No schema-comparison logic needed —
  just git. The merge output is deterministic (no `sort_keys`, `indent=2`,
  trailing newline), so the git-diff check is reliable.
terraform/.terraform.lock.hcl is also committed (per HashiCorp
recommendation): without it, terraform init picks the latest provider
matching the version constraint and different teammates resolve to
different SHAs. Committing the lock pins the whole team to the same
provider build.
Ignored (see .gitignore):
- `/build/` — downstream of `merged/` (pure `cp -R`). No new review signal, regenerated on every `make build`.
- `**/.terraform/*`, `*.tfstate*`, `*.tfplan`, `*.tfvars` — ephemeral or secret-bearing. Terraform state lives in S3 per `terraform/backend.tf`.
- `__pycache__/`, `*.pyc`, `.DS_Store`, etc. — per-user / per-OS noise.
Workflow implication. Edits to authored/ or generated/ are
two-step commits: change the source, python3 schema-endpoints/merge.py,
git add both trees, commit together. The friction is deliberate —
it's the price of the paired-diff review benefit.
- Origin: S3 bucket, fronted by an Origin Access Identity. The bucket is private; only CloudFront can read it.
- Path mapping: 1:1. CloudFront strips the leading `/` from the request path and looks for that exact key in the bucket. No CloudFront Functions, no Lambda@Edge, no SPA fallback.
- Content-Type: `aws s3 sync` auto-detects `application/json` from the `.json` extension — no per-file content-type flag needed.
- CORS: `Access-Control-Allow-Origin: *`, `Methods: GET, OPTIONS`, `Headers: *`. Applied via a CloudFront response headers policy so every response (including errors) gets the CORS headers.
- Cache: `Cache-Control: max-age=60` while iterating. Raise once the schemas stabilise.
- TLS: ACM certificate for `schema.<apex>`, DNS-validated against the existing Route 53 zone for the apex. TLS 1.2+.
- Errors: 403/404 from S3 → 404 from CloudFront with body `/errors/not-found.json` (`{"error": "not yet generated"}`). This covers `swot.json`, `cre.json`, and any other unpublished path.
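The response headers policy maps to the AWS provider's `aws_cloudfront_response_headers_policy` resource — a sketch; the resource and policy names here are assumptions:

```hcl
resource "aws_cloudfront_response_headers_policy" "schema_cors" {
  name = "schema-json-cors"

  # Attached to the cache behavior, so every response -- including the
  # custom 404 body -- carries these CORS headers.
  cors_config {
    access_control_allow_credentials = false
    access_control_allow_origins { items = ["*"] }
    access_control_allow_methods { items = ["GET", "OPTIONS"] }
    access_control_allow_headers { items = ["*"] }
    origin_override = true
  }
}
```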
make smoketest runs these against https://$DOMAIN:
curl /source/schema.json -> 200 application/json
curl /source/schema/core.json -> 200 application/json
curl /source/schema/icesat2.json -> 200 application/json
curl /source/schema/gedi.json -> 200 application/json
curl /source/schema/icesat2/fields.json -> 200 application/json
curl /source/schema/icesat2/fields/atl03_ph.json -> 200 application/json
curl /source/schema/icesat2/fields/atl03_geo.json -> 200 application/json
curl /source/schema/icesat2/fields/atl03_corr.json -> 200 application/json
curl /source/schema/icesat2/fields/atl03_bckgrd.json -> 200 application/json
curl /source/schema/icesat2/fields/atl06.json -> 200 application/json
curl /source/schema/icesat2/fields/atl08.json -> 200 application/json
curl /source/schema/icesat2/fields/atl09.json -> 200 application/json
curl /source/schema/icesat2/fields/atl13.json -> 200 application/json
curl /source/schema/icesat2/output/atl03x.json -> 200 application/json
curl /source/schema/icesat2/output/atl06x.json -> 200 application/json
curl /source/schema/icesat2/output/atl08x.json -> 200 application/json
curl /source/schema/icesat2/output/atl13x.json -> 200 application/json
curl /source/schema/icesat2/output/atl24x.json -> 200 application/json
curl /source/schema/gedi/output/gedil4ax.json -> 200 application/json
curl /source/schema/swot.json -> 404 application/json
curl /source/schema/cre.json -> 404 application/json
curl -H "Origin: https://example.com" /source/schema.json -> header Access-Control-Allow-Origin: *
The repo ships a Claude skill at
skills/sliderule-schema/ — a thin HTTPS
client that fetches JSON from the deployed schema distribution. Pure
transport: all interpretation lives in
SKILL.md.
make package-skill-schema # → skills/sliderule-schema.skill
make package-skills # same thing today (future-proof for more skills)

Each `.skill` archive is a zip with the skill directory at the root
(e.g. `sliderule-schema/…`). Packages are ~3 KB — `SKILL.md` plus one
tiny Python client, no corpus or model bytes bundled. Archives are
gitignored (`/skills/*.skill`); the on-disk `skills/<name>/` directory
is the source of truth, rebuilt on demand.
Mirrors the pattern in sliderule-search-server,
which ships sliderule-docsearch and nsidc-reference the same way.
1. Populate `schema-endpoints/authored/` and `schema-endpoints/generated/` with the files listed in the "Source files" table above. Anything not present will simply 404 in production.
2. `python3 schema-endpoints/merge.py` to produce `schema-endpoints/merged/` (and commit the result).
3. `make deploy-to-testsliderule`
   - Terraform stands up the bucket, distribution, ACM cert, and Route 53 record.
   - The same wrapper then runs `live-update`, which verifies, stages, and syncs the JSON tree and kicks off an invalidation.
4. `make smoketest-testsliderule`
CloudFront distribution creation takes a few minutes; DNS propagation can
take a few more. If smoketest fails immediately after the first apply,
give it 5–10 minutes and re-run.
# After editing anything under schema-endpoints/authored/ or schema-endpoints/generated/:
python3 schema-endpoints/merge.py # refresh merged/, commit the diff
make live-update-testsliderule # verify + build + sync + invalidate
make smoketest-testsliderule

- `DOMAIN`, `S3_BUCKET`, `DOMAIN_APEX` — set by the per-env wrapper targets, or overrideable on the command line for ad-hoc deploys.
- `DISTRIBUTION_ID` — looked up from the `DOMAIN` alias; no manual input.
- `terraform/backend.tf` — state is stored in `s3://sliderule/tf-states/schema-server.tfstate` with per-domain workspaces, mirroring the web client's backend layout.