Incremental baking — gallery & listing regeneration design

Status: design / not implemented. Companion to architecture.md and the living-site direction. Captures the decision to avoid full-gallery regeneration when a sprite is added or removed, so the event-driven re-bake loop (DynamoDB Streams → render Lambda → S3) can update incrementally instead of rebuilding ~4,100 pages per change. Raised 2026-06-03.

Problem

The gallery uses offset pagination over a newest-first list: page N = items[(N-1)*60 … N*60]. Inserting one sprite at the front shifts every item down one slot, so the contents of every page change, the last page's item count changes, and a new last page appears whenever the old one was full. One add or remove therefore forces a re-bake of all gallery pages: the page boundaries are a sliding window over the list, and every insert moves it. This is fine for a one-shot full build, but it is incompatible with per-event incremental re-baking.
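The cascade is easy to reproduce on a toy list. The sketch below is illustrative only: the ids, the list length, and the helper names (`offsetPages`, `changedPageCount`) are made up for the example and are not taken from generate.civet.

```typescript
// Offset pagination: page N (1-based) is the slice items[(N-1)*size, N*size).
const PAGE_SIZE = 60;

function offsetPages<T>(items: T[], size: number = PAGE_SIZE): T[][] {
  const pages: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    pages.push(items.slice(i, i + size));
  }
  return pages;
}

// Count how many page positions differ between two paginations.
// A page that exists on only one side counts as changed.
function changedPageCount<T>(oldPages: T[][], newPages: T[][]): number {
  const n = Math.max(oldPages.length, newPages.length);
  let changed = 0;
  for (let i = 0; i < n; i++) {
    const a: T[] = i < oldPages.length ? oldPages[i] : [];
    const b: T[] = i < newPages.length ? newPages[i] : [];
    const same = a.length === b.length && a.every((x, j) => x === b[j]);
    if (!same) changed++;
  }
  return changed;
}

// 180 hypothetical sprite ids, newest first (higher id = newer).
const oldList: number[] = [];
for (let id = 180; id >= 1; id--) oldList.push(id);

// One new sprite is inserted at the front.
const newList = [181].concat(oldList);

const oldPages = offsetPages(oldList);
const newPages = offsetPages(newList);
const changed = changedPageCount(oldPages, newPages);
console.log(`${changed} of ${newPages.length} pages changed`); // 4 of 4 pages changed
```

Three full pages become four, and none of them keeps its previous contents. At gallery scale the same effect touches every page.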

What's already fine (don't change)

  • Per-sprite pages — independent; add/remove re-bakes exactly one. They are the SEO asset and are reached via the sitemap, not via gallery pagination.
  • Tag and profile listings — naturally localized: adding a sprite only shifts its tags' pages and its owner's profile, not the global list. Bounded re-bake; acceptable as-is.
  • The global gallery is the pathological case (every add affects it). That's the target of this doc.

Goal

Add/remove a sprite ⇒ a bounded, O(1) set of page re-bakes, so the Streams→Lambda loop's fan-out map is small.

Options

A. Fixed-boundary stable pages (fully static)

  • Assign each sprite a dense, append-only sequence at creation (archive: freeze once by id-order; new sprites: next counter value). Page = floor(seq / 60), permanent.
  • Never repack on delete — a removed sprite leaves its page slightly under-full (no shift).
  • Add ⇒ only the current top page changes (or a new top page rolls over). Remove ⇒ only that one page.
  • Bonus: every sprite's gallery-page URL becomes permanent (no listing-URL churn → SEO-stable).
  • Cost: page numbers run oldest→newest (page 1 = oldest). Browse newest-first by entering at the highest page and paginating down; relabel controls Newest / Newer / Older / Oldest instead of First/Last/numbers.
  • Subtlety: don't hardcode the changing max page in the pagination bar, or old pages go stale when a new top page appears. Make "Newest" a relative link to /sprites/ (a tiny landing that points at the current top page and re-bakes on add); the logarithmic neighbors (±10, …) are stable numbers. Then old pages are truly immutable.
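A minimal sketch of the fixed-boundary mapping in Option A, assuming `seq` is a zero-based counter (the doc does not fix the starting value) and treating the result as a zero-based page index. The function names are hypothetical.

```typescript
// Assumption: seq is a zero-based, append-only counter assigned once per sprite.
const SPRITES_PER_PAGE = 60;

// Permanent page index for a sprite: it depends only on the sprite's own seq,
// never on how many sprites exist or have been deleted.
function stablePageOf(seq: number): number {
  return Math.floor(seq / SPRITES_PER_PAGE);
}

// Gallery pages to re-bake when the sprite with this seq is added or removed.
// Deletes never repack, so no neighbouring page is affected.
function pagesToRebake(seq: number): number[] {
  return [stablePageOf(seq)];
}

// Current top page index, given the next counter value that would be assigned.
// The /sprites/ landing would point here.
function topPage(nextSeq: number): number {
  return nextSeq === 0 ? 0 : stablePageOf(nextSeq - 1);
}
```

With this mapping `stablePageOf(59)` is 0 and `stablePageOf(60)` is 1, so the 61st sprite rolls over to a new top page while page 0 stays byte-identical. If the displayed numbering should start at 1 (the "page 1 = oldest" wording above), add 1 at render time rather than in the stored mapping.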

B. Bake page-1 + cursor-fed deep pages (partly dynamic)

  • Pre-bake only the newest page (/sprites/, re-baked per add — cheap, SEO-fast).
  • Serve deeper pages dynamically via DynamoDB cursor pagination (Query + LastEvaluatedKey on a GSI keyed by created/id): either rendered on-demand by the Lambda and cached to S3/CloudFront (ISR-on-demand) or client-fetched (infinite scroll).
  • No deep static pages ⇒ no cascade, and you stop pre-baking ~4,100 pages. Crawl coverage is unaffected because the sitemap already lists every per-sprite URL — deep gallery pagination isn't the crawl path.
  • Cost: deep pages aren't pre-warmed (first hit renders) and carry little SEO weight (acceptable).
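To show why a cursor does not cascade, here is an in-memory stand-in for the Query + LastEvaluatedKey pattern in Option B. It does not call DynamoDB; `SpriteKey`, `QueryPage`, and `queryNewestFirst` are hypothetical names, and the stand-in only mimics the shape of a paginated result (a real Query can also return a key on the final page).

```typescript
// Stand-in for a GSI sorted by (created, id). Field names are assumptions.
interface SpriteKey { created: number; id: string }
interface QueryPage { items: SpriteKey[]; lastEvaluatedKey: SpriteKey | undefined }

// Newest first: later `created` sorts earlier; id breaks ties (descending).
function compareNewestFirst(a: SpriteKey, b: SpriteKey): number {
  if (a.created !== b.created) return b.created - a.created;
  return a.id < b.id ? 1 : a.id > b.id ? -1 : 0;
}

// Return up to `limit` items that sort strictly after the cursor.
function queryNewestFirst(all: SpriteKey[], limit: number, startAfter?: SpriteKey): QueryPage {
  const sorted = all.slice().sort(compareNewestFirst);
  const rest = startAfter
    ? sorted.filter((s) => compareNewestFirst(s, startAfter) > 0)
    : sorted;
  const items = rest.slice(0, limit);
  const more = rest.length > limit;
  return { items, lastEvaluatedKey: more ? items[items.length - 1] : undefined };
}

const sprites: SpriteKey[] = [
  { created: 1, id: "a" }, { created: 2, id: "b" }, { created: 3, id: "c" },
  { created: 4, id: "d" }, { created: 5, id: "e" },
];

const head = queryNewestFirst(sprites, 2);                        // e, d
const deep = queryNewestFirst(sprites, 2, head.lastEvaluatedKey); // c, b

// A newer sprite arrives. The page after the same cursor is unchanged.
sprites.push({ created: 6, id: "f" });
const deepAgain = queryNewestFirst(sprites, 2, head.lastEvaluatedKey); // still c, b
```

Because the cursor is a key rather than an offset, only the head (the baked /sprites/ page) sees the new sprite; anything cached for a given cursor stays valid.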

C. Hybrid

Re-bake a small head (the first 1–2 pages) on every add, and serve the tail either from fixed-boundary pages (as in A) or on demand (as in B).

D. Date-based buckets (year / month)

URLs like /sprites/2024/09/ — each sprite lives in its creation-month bucket forever (creation date is immutable). Add ⇒ re-bakes only the current month (+ its within-month sub-pages if it overflows 60) and the index; past months are frozen. Remove ⇒ only that sprite's month.

  • Pros: human-meaningful, SEO-friendly, immutable past, natural archive browsing (the blog-archive pattern); effectively a concrete instantiation of A's fixed boundaries with a date key instead of an assigned sequence (no counter to manage).
  • Cons: uneven bucket sizes — big months still need within-month pagination, but only the current month ever shifts (past months frozen); needs a reliable creation date per sprite (most have one from CF-log first-seen / the 2017 DB; the ~135k untitled have approximate dates); a flat newest-first scroll must merge buckets at the index; dateless sprites need a fallback bucket.
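The bucket key in Option D is a pure function of the creation date. A minimal sketch, assuming dates are interpreted in UTC and that dateless sprites go to a fallback path; `/sprites/undated/` and `bucketPath` are hypothetical names, not decided ones.

```typescript
// Hypothetical fallback path for sprites with no usable creation date.
const UNDATED_BUCKET = "/sprites/undated/";

// Map an immutable creation date to its year/month bucket URL.
// UTC accessors keep the result independent of the build machine's timezone.
function bucketPath(created: Date | null): string {
  if (created === null || isNaN(created.getTime())) return UNDATED_BUCKET;
  const year = created.getUTCFullYear();
  const monthNumber = created.getUTCMonth() + 1; // getUTCMonth() is zero-based
  const month = (monthNumber < 10 ? "0" : "") + monthNumber;
  return `/sprites/${year}/${month}/`;
}
```

For example, `bucketPath(new Date(Date.UTC(2024, 8, 15)))` returns `/sprites/2024/09/`. Since the input never changes after creation, neither does the bucket, which is what keeps past months frozen.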

Event-loop fan-out (the payoff)

With a fix in place, the re-bake set for an add/remove is bounded: { the sprite's page, the top gallery page (or page-1), /sprites/ landing, its tag pages, its owner's profile page }. With today's offset pagination the set is "all gallery pages" — untenable for per-event re-baking.
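As a rough sketch, the bounded set can be computed from the sprite alone. Everything except the `/sprites/` landing is a placeholder here: the URL shapes, the `SpriteRef` fields, and the simplification that each tag and profile contributes a single path (their own offset pagination may touch a few pages each).

```typescript
// Hypothetical shape of what the event handler knows about one sprite.
interface SpriteRef { id: string; owner: string; tags: string[]; galleryPage: number }

// Paths to re-bake for one add/remove. URL patterns are illustrative only.
function rebakeSet(s: SpriteRef): string[] {
  const paths: string[] = [
    `/sprite/${s.id}/`,                // the sprite's own page
    `/sprites/page/${s.galleryPage}/`, // its one stable gallery page
    "/sprites/",                       // the landing that points at the top page
  ];
  for (const tag of s.tags) paths.push(`/tag/${tag}/`);
  paths.push(`/user/${s.owner}/`);
  // De-duplicate while preserving order (e.g. a tag listed twice).
  return paths.filter((p, i) => paths.indexOf(p) === i);
}
```

The size of the result is 4 plus the number of distinct tags, independent of how many sprites the gallery holds.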

Moderation ties in

Takedowns and adult-gating are just DynamoDB writes (delete the item, or set its status/adult attributes) → the same Streams→re-bake loop → the same fan-out. removed.tsv/adult.tsv remain the build-time mechanism for the frozen ndjson archive; living content uses per-item Dynamo status. The render components are moderation-agnostic (the adult gate is a prop; comments are whatever the data layer passes). See the moderation notes in TODO.md and the living-site discussion.

Current leanings (to validate during exploration — NOT final)

  • Stay fully static → Option A or D (fixed-boundary / date-bucketed stable pages + a relative /sprites/ landing): everything pre-baked, permanent per-sprite listing URLs, O(1) re-bake. D (year/month) is the more human-/SEO-friendly instantiation when creation dates are reliable.
  • Comfortable with a dynamic deep gallery → Option B (page-1 + cursor feed): less generated HTML, leans on Dynamo's native cursor pagination, and matches the fact that the sitemap (not gallery pages) is what Google actually crawls.

Keep tag/profile listings on offset pagination (already localized); apply a stable-boundary/date scheme only to a tag or user that grows pathologically large.

Exploration step (do this before committing)

This is not decided, and the option list above is not exhaustive. Before implementing:

  1. Brainstorm more options. Treat A–D as a starting set, not a menu — actively look for other schemes and record them here. Seeds: content-addressed/hashed buckets; a "recent N + frozen archive" split; reverse-chronological keyset/cursor with a baked head; append-only feed files the client stitches; CDN edge assembly (compose pages at the edge from fragments); a hybrid of date buckets + a rolling recent page; etc.
  2. Pin the evaluation criteria. At least: re-bake fan-out size per add/remove; crawl/SEO behaviour and URL stability/permanence; browse UX (newest-first + deep navigation); build cost (full) + per-event cost; runtime/cold-start cost if dynamic; data requirements (reliable dates? an assigned seq? a GSI?); and complexity / failure modes.
  3. Spike the leading candidates against the real dataset and the DynamoDB-Streams → render-Lambda loop: measure the actual re-bake fan-out and page counts, and sanity-check the browse UX + crawl path. Reuse the existing parity harness (diff vs generate.mjs) wherever output should still match.
  4. Then decide — record the choice + rationale here, and resolve the specifics:
    • bucket/sequence key (date vs assigned seq), where it's stored/assigned, archive backfill, and a fallback bucket for dateless sprites.
    • deep gallery: static / on-demand-cached / client infinite-scroll.
    • control relabeling + canonical/SEO policy for listing pages (esp. any oldest=page-1 inversion).
    • threshold for applying the scheme to a hot tag/user.

Current code (the starting point)

render-app/src/generate.civet paginates the gallery with offset pages via Listing.astro + Pagination.astro. Per-sprite, tag, and profile re-bakes are already localized; the global gallery pagination is what this redesign replaces. The event Lambda (render-app/src/handler.civet) shares the same Astro components, so whatever scheme is chosen is rendered identically in batch and per-event.