TODO

  • Remove old rails code (branch remove-rails-app; deleted 232 files — app/config/test/public/db/bin/etc. + Gemfile. Kept infra/, static-site/, docs/. Clears 70/71 Dependabot alerts.)
  • Explore the 8 archived Heroku apps in more detail — decide what (if anything) is worth reviving or mirroring to GitHub. See docs/heroku-source-archive.md.

Static archive — post-launch follow-ups

Phase 1 is live on https://pixieengine.com (cutover 2026-06-01). See docs/deploy.md, infra/, static-site/.

Cleanup

  • Delete the staging bucket pixieengine-com-site now that content lives in pixieengine-static. (Verified no CloudFront dist used it as an origin; deleted via CloudShell 2026-06-02.)
  • Delete the pixieengine-audit IAM user (discovery complete). See docs/aws-inventory.md §6. (defer, this user has come in handy while exploring)
  • Confirm www.pixieengine.com still redirects to the apex (separate dist E1YLL4NVXDIDA5, untouched by the cutover).

Deploy

  • Deployed attribution + moderation (2026-06-02). generate.mjs → aws s3 sync --delete (full, then --size-only delta: 3,217 up / 400 del) → CloudFront invalidation (E2QQUW2BPHXXNP /*). Live-verified: removed→404, adult→gate+noindex, attribution rel=author live, normal→200, sitemap→200. This was also the first deploy of the attribution rollout (creator links + 11,876 profile pages). Gotchas logged: don't pipe the sync to tail (it truncates the count log), and re-generate after any further moderation before syncing. --size-only is unsafe for paginated listings: when the page count changes but keeps the same number of digits (e.g. last page 4115→4108), each page's byte size is identical, so --size-only skips it and leaves stale "Last »" links. Use a default (mtime) sync after a regen; otherwise the gallery/tag/profile pagination desyncs.
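
The --size-only gotcha above can be reproduced in isolation. This is a minimal sketch, assuming a hypothetical pager template (`renderLastLink` and its URL shape are illustrative, not the markup generate.mjs actually emits): two pages that differ only in a same-width page number have the same byte length, so a size-only comparison sees no change.

```javascript
// Hypothetical pager markup; the real template in generate.mjs may differ.
function renderLastLink(lastPage) {
  return `<a href="/sprites/page/${lastPage}/">Last »</a>`;
}

const beforeRegen = renderLastLink(4115); // last page before the moderation regen
const afterRegen = renderLastLink(4108);  // last page after 400 deletions

// A size-only sync compares byte length, not content.
const sameSize =
  Buffer.byteLength(beforeRegen, "utf8") === Buffer.byteLength(afterRegen, "utf8");
const sameContent = beforeRegen === afterRegen;

console.log({ sameSize, sameContent });
```

Any page whose only change is a digit-for-digit substitution behaves this way, which is why a sync that also compares modification time is the safer choice after a regenerate.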

Notes from moderation

  • Not a bug: .../original. images (~3,166). The S3 object is literally named original. (trailing dot, no extension) and returns 200 image/png — the live site displays these correctly. The review tool showed "no image" because the server hardcoded original.png; fixed to use the dataset's real img field (+ a thumb.png fallback). Likely uploads whose source file had no extension.

Content / SEO

  • User attribution + profile pages (2026-06-02). /<display_name>/ profile pages (×11,896, paginated) with avatar + bio + their sprites/tunes; sprite/tune cards now link to the creator. ~104,989 sprites attributed (rest are post-2017 S3-only → Anonymous). Display_name + public bio + CDN-verified avatar only; no email/hash/token. Revises architecture decision #4 (was "no listing"); now consistent with the published recovered comment handles. Source: static-site/extract-attribution.sh → build/{users.ndjson,sprite_owners.tsv,tune_owners.tsv}. No infra/functions/rewrite.js change (profiles ride the dir→index.html rule). Deployed 2026-06-02 together with the moderation changes (see Deploy above): aws s3 sync (most HTML changed) + a broad CloudFront invalidation.
  • Submit https://pixieengine.com/sitemap.xml to Google Search Console. (sitemap index → 6 children, 252,566 URLs, all 200/valid; robots.txt advertises it)
  • Recover titles for the ~135k still-untitled sprites (more CF-log / Wayback mining) → re-merge via static-site/build-dataset.mjs, regenerate, re-sync. See docs/wayback-recovery.md, docs/cloudfront-log-recovery.md.
  • (Optional) Return a true 410 for missing sprites via Lambda@Edge — currently 404 (CloudFront custom-error responses can't emit 410; both de-index cleanly).
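
For the optional 410 item above, a Lambda@Edge origin-response trigger could rewrite the status. The sketch below is an assumption-laden illustration, not the project's code: the `/sprites/<id>/` path pattern and the function name `handleOriginResponse` are hypothetical, and it only changes the status line, leaving the body as the origin returned it. It relies on the documented Lambda@Edge event shape (`event.Records[0].cf.request` / `.response`, with `status` as a string).

```javascript
// Assumed URL shape for sprite pages; adjust to the real routes.
const SPRITE_PATH = /^\/sprites\/\d+(\/|$)/;

// Origin-response logic: turn a 404 for a sprite page into 410 Gone.
function handleOriginResponse(event) {
  const { request, response } = event.Records[0].cf;
  if (response.status === "404" && SPRITE_PATH.test(request.uri)) {
    response.status = "410";
    response.statusDescription = "Gone";
  }
  return response;
}

// In a deployed function this would be wired up as the handler, e.g.
//   exports.handler = async (event) => handleOriginResponse(event);
```

Non-sprite 404s pass through unchanged, so the existing custom-error behavior for other paths would not be affected.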

Performance / CDN

  • Drop the id % 4 CDN domain sharding. build-dataset.mjs:36, generate.mjs:236, and the client loader (generate.mjs:528) spread image/avatar URLs across 0–3.pixiecdn.com — an HTTP/1.1-era trick for more parallel connections. All four hostnames already serve HTTP/2 (dist E30UBGU2BPKA0U) and point at the same distribution/origin, so sharding now costs up to 4× the DNS+TCP+TLS handshakes and breaks HTTP/2 connection reuse + HPACK sharing. Fix is generator-only (regenerate HTML, no image migration): collapse ${id % 4} to a single existing host (e.g. always 0.pixiecdn.com) — reusing one of the four avoids any new alias/ACM cert. Then re-generate + s3 sync + invalidate.
  • Enable HTTP/3 on the HTTP/2-only dists. Static site E2QQUW2BPHXXNP and image CDN E30UBGU2BPKA0U are HTTP2 only; others in the account already run HTTP2and3. Static dist = one CDK line in infra/lib/pixie-static-stack.civet (httpVersion: cf.HttpVersion.HTTP2_AND_3 in the Distribution props). Image dist is legacy/not in CDK → bump via console or update-distribution.
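
The sharding change above amounts to replacing one small host-selection function. A minimal sketch, with hypothetical function names (`shardedHost`, `singleHost`); only the `id % 4` rule and the `pixiecdn.com` hostnames come from the notes above:

```javascript
// Current behavior as described: spread assets across 0-3.pixiecdn.com by id.
function shardedHost(id) {
  return `${id % 4}.pixiecdn.com`;
}

// Proposed behavior: always reuse one existing hostname, so the browser
// keeps a single HTTP/2 connection to the CDN.
function singleHost(_id) {
  return "0.pixiecdn.com";
}

// Count how many distinct hostnames a page with these sprite ids would contact.
function distinctHosts(ids, hostFor) {
  return new Set(ids.map(hostFor)).size;
}

const idsOnPage = Array.from({ length: 60 }, (_, i) => 1000 + i);
console.log("sharded:", distinctHosts(idsOnPage, shardedHost));
console.log("single: ", distinctHosts(idsOnPage, singleHost));
```

Each distinct hostname costs its own DNS lookup and connection setup, so the count printed here is a rough proxy for the handshake overhead per gallery page.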

Security

  • Dependabot: all 71 open alerts (incl. all 3 critical / 17 high) are in the Rails Gemfile.lock — auto-resolve once remove-rails-app merges. The lone npm alert (#163, brace-expansion 5.0.5, GHSA-jxxr-4gwj-5jf2, moderate) was dismissed (tolerable_risk): bundled inside aws-cdk-lib@2.257.0 (latest) via minimatch — overrides/audit fix can't rewrite a bundled dep, no upstream patch yet, build-time-only CDK dep with no real exposure. Will auto-resolve on npm install once AWS bundles brace-expansion ≥ 5.0.6.
  • Rotate exposed prod secrets (see docs/aws-inventory.md §6):
    • stay-pegged AWS keys (local [default]) — rotated.
    • GitHub token — verified dead (legacy 40-char classic PAT, returns 401; auto-expired/revoked). No action needed.
    • ADMIN_CODE: moot. The Heroku app is at 0 dynos with no DB, and nothing in the static archive reads it. Dies with the app.

Future

  • Phase 2 — whimsy progressive enhancement: Cognito login + favorites/comments via api-whimsy-space + Briefcase S3. See docs/architecture.md.
  • Design the incremental gallery / baking system so that adding or removing a sprite doesn't re-bake every gallery page (offset-pagination cascade: a front-insert shifts all ~4,100 pages). This is for the event-driven re-bake loop (DynamoDB Streams → render Lambda → S3).
    • Exploration step (do first): brainstorm options beyond the starting set (list is NOT exhaustive), pin evaluation criteria (re-bake fan-out, SEO/URL stability, UX, build/per-event cost, data needs), spike the leaders against real data + the Streams→Lambda loop, then decide.
    • Starting options: (A) fixed-boundary stable pages (floor(seq/60), no repack, oldest=page-1, browse newest-first via Newest/Older, relative /sprites/ landing); (B) bake page-1 + cursor-fed deep gallery (Dynamo Query+LastEvaluatedKey; sitemap is the crawl path); (D) year/month date-based buckets (/sprites/2024/09/, immutable past months); + others to brainstorm. Tag/profile listings already localized. Full writeup + tradeoffs: docs/incremental-baking.md.
  • Moderation: reactive takedown process for the ~105k post-2017 sprites.
    • Removal mechanism (2026-06-02). static-site/removed.tsv (tracked, source of truth) + remove.mjs appender; generate.mjs enforces it — removed sprites/tunes get no page (S3 404 → /410 "Gone") and drop from every gallery/tag/profile/comment/sitemap surface; a user row removes the profile, de-attributes their art to Anonymous, and scrubs their comment handle. Verified end-to-end. Runbook (incl. urgent CDN-image deletion + NCMEC note): static-site/README.md "Removing content".
    • Per-comment removal (2026-06-02). comment type in removed.tsv, addressed by <spriteId>#<hash> (stable content hash via comment-key.mjs, shared by generator + tool so it can't drift). node remove.mjs comment <spriteId> lists a thread's keys; the removal drops one comment's body+handle while the rest of the thread survives. Closes the gap where a user removal scrubbed only the handle, not an abusive comment body. Verified end-to-end.
    • Intake: abuse@ address + per-page "Report" link (mailto / reuse the Feedback form).
    • Proactive scan of the unreviewed content:
      • Text-signal scan (2026-06-02). moderation/scan-text.mjs + moderation/terms.tsv → build/scan-candidates.tsv (ranked review queue). First run flagged 320 items (sev3=5 csai, sev2=289 sexual/hate, sev1=26), incl. comment-handle/body hits. Covers the 119,666 sprites (48%) with any title/tags/description, incl. 87% of the 105,542 unreviewed (CF-log-recovered titles). 5 sev-3 to action first: sprites 139734, 153693 ("pedo bear"), 185865, 185897, 258085 ("loli").
      • Image classifier scan — REQUIRED for the 128,509 sprites (52%) with no text signal (incl. 13,744 unreviewed). Plan: NSFWJS (tfjs-node, local/free) PoC on the text-flagged set + a random blind-set sample to gauge accuracy on 64×64 pixel art, then full run. Caveat: NSFW classifiers are photo-trained; pixel-art recall is unproven. Output appends to build/scan-candidates.tsv → flows into the same review tool.
      • CSAM hash-matching — the real legal exposure; a generic NSFW model does NOT detect it. Needs PhotoDNA / NCMEC / Cloudflare CSAM tool (authorized enrollment). Separate track.
      • Replay scan for uploads/empties (2026-06-02). moderation/scan-replay.mjs — fetches replay.json, counts ops (v0 stroke-array / v1 {history}); flags empty (0 ops), single-stroke/single-op/single-resize-upload (one paste), large-canvas (>256px), no-replay. No image decode needed; resumable cache (build/replay-scan-cache.ndjson); output → build/replay-candidates.tsv → review with node moderation/review.mjs build/replay-candidates.tsv. PoC (600 ids): 284 drawn, 245 pre-replay, 71 flagged (~20% of replay-era sprites). Precision strong on large uploads (1190×1540, 1280×720…) + empties; small single-op lower-confidence (review-gated). Covers the post-2017 (id ≳110k) upload-prone set.
        • Full replay scan done (2026-06-02). 149,446 ids; 121,474 drawn. 25,601 flagged: empty 3,884 (v0, truly blank, sev2 trash), no-edits 5,927 (v1 history=0; may be uploads, sev1), large-canvas 5,885, single-resize-upload 5,491, single-op 2,484, single-stroke 1,747, no-replay 183. Key finding: v1 history=0 ≠ blank (initialState may hold an upload); only v0-empty is auto-trash. Formats + this distinction documented in docs/replay-format.md. Output: build/replay-candidates.tsv → node moderation/review.mjs build/replay-candidates.tsv.
        • Review the replay candidates: judge upload/no-edits items on merit (uploads ≠ bad).
      • Pixel-decode emptiness verify (2026-06-02). The replay empty signal (v0 ops=0) over-flags badly: pixel decode showed only 643 of 3,884 (17%) are truly empty (512 transparent + 131 uniform); 3,241 (83%) actually have content. Added zero-dep PNG decoder moderation/png-decode.mjs (node:zlib, 8-bit non-interlaced, color types 0/2/3/4/6) + moderation/check-empty.mjs → build/empty-verified.tsv (real empties) + build/empty-content.tsv (false empties → merit review). Lesson: replay signals need pixel confirmation before any "trash" claim. Decoder is reusable for the broader image-decode pass.
      • Review-tool scroll-jump fixed (2026-06-02). render() now resets scroll only on page/filter change and preserves position after a decision.
      • Image-decode pass (optional follow-up) — for scribble detection + the pre-replay (<110k) set + photo pixel-distribution. Zero-dep PNG decoder (node:zlib) → coverage % / distinct colors / continuous-tone. Heavier (decode ~248k). Decisions: dep (hand-roll vs sharp/pngjs) + scope. Lower priority now that replays cover uploads/empties.
    • Review tool (2026-06-02). moderation/review.mjs (local server) + review.html — 100-at-a-time grid, arrow-key paging, integer-scaled native pixel art, comment/user views, Valid removal / 🔞 Adult / False positive buttons + multi-select & bulk (filter → select-all-shown → bulk-decide). Valid → removed.tsv; adult → adult.tsv; every decision → moderation/reviewed.tsv (FP allowlist, so dismissed items don't resurface). Lists aws s3 rm for valid sprite removals. Loads the scan queue or any id list. Shared removed-list.mjs / comment-key.mjs keep formats from drifting. First session: 269 removals, 50 FPs.
    • Adult (18+) gating (2026-06-02). "Tasteful NSFW" allowed but age-restricted. adult.tsv (tracked, shares the list machinery); generate.mjs keeps the sprite page but adds noindex + a fail-closed self-attestation interstitial + 🔞 badge, drops og:image/JSON-LD, and excludes it from galleries/tags/profiles/sitemap (listable). Verified end-to-end. Static stopgap only — real "logged-in + over-18" enforcement needs the Phase-2 dynamic layer (Cognito); adult.tsv is the input that will drive it. Open follow-up: decide whether adult shows blurred-in-gallery vs fully hidden (currently hidden).
    • Triage/review tool (page through candidates by CF-log traffic → write removed.tsv).
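
To make the re-bake fan-out criterion from the incremental-baking item concrete, the sketch below compares offset pagination with option (A), fixed-boundary pages (`floor(seq/60)`). It is an illustration under stated assumptions: the function names are hypothetical, sprites are modeled as consecutive sequence numbers starting at 0, pages are zero-based here (the notes number the oldest page as page-1), and "changed" means a page whose membership differs after appending one new sprite.

```javascript
const PAGE_SIZE = 60;

// Option (A): the page is a pure function of the sprite's sequence number.
function fixedPageOf(seq) {
  return Math.floor(seq / PAGE_SIZE);
}

// Offset pagination, newest first: the page depends on how many sprites exist.
function offsetPageOf(seq, total) {
  const rank = total - 1 - seq; // 0 = newest sprite
  return Math.floor(rank / PAGE_SIZE);
}

// Count pages whose membership changes when one sprite is appended.
function pagesChanged(total, pageOf) {
  const changed = new Set();
  for (let seq = 0; seq < total; seq++) {
    const before = pageOf(seq, total);
    const after = pageOf(seq, total + 1);
    if (before !== after) {
      changed.add(before);
      changed.add(after);
    }
  }
  changed.add(pageOf(total, total + 1)); // page that receives the new sprite
  return changed.size;
}

const total = 6000; // 100 full pages in this toy dataset
console.log("fixed-boundary pages to re-bake:", pagesChanged(total, fixedPageOf));
console.log("offset pages to re-bake:        ", pagesChanged(total, offsetPageOf));
```

In this toy dataset the fixed-boundary scheme touches only the page that receives the new sprite, while offset pagination touches every existing page, which is the cascade the design work aims to avoid.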