Store compact transcript.jsonl in v1 checkpoints, point metadata at it #1419
computermode wants to merge 1 commit into
Committed checkpoint writes now generate a compact transcript (transcript.jsonl) from the full transcript via transcript/compact and store it in the session directory next to full.jsonl, so it is pushed with the entire/checkpoints/v1 branch. The root metadata.json sessions[].transcript pointer targets transcript.jsonl when it was generated and falls back to full.jsonl otherwise (unparseable or external-agent transcripts, checkpoints from older CLI versions).

Unlike the removed v2 (cumulative compact plus compact-unit offset), the stored compact is pre-sliced to the checkpoint's own portion: compact.Compact runs with StartLine = checkpoint_transcript_start. v1's checkpoint_transcript_start must keep full.jsonl units for CLI readers, so a pre-sliced file avoids a new offset field and is self-describing for consumers.

Generation is best-effort and never fails the checkpoint write; finalization (UpdateCommitted) regenerates the compact from the new content and keeps the previous one on generation failure so the metadata pointer never dangles.

CLI read paths (rewind/resume/explain) are unaffected: they read full.jsonl by filename, not through the metadata pointer.

Entire-Checkpoint: 63b41777384a
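To make "pre-sliced" concrete, here is a minimal sketch of scoping a JSONL transcript to a checkpoint's own portion. It is not the `transcript/compact` implementation: `sliceFromLine` is a hypothetical helper, it assumes the start line is a zero-based index, and it performs no compaction of the entries themselves.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// sliceFromLine returns the JSONL lines of full starting at the zero-based
// line index startLine. Hypothetical stand-in for the scoping that StartLine
// provides; the zero-based convention is an assumption.
func sliceFromLine(full []byte, startLine int) [][]byte {
	var out [][]byte
	sc := bufio.NewScanner(bytes.NewReader(full))
	// Transcript lines can be long, so raise the scanner's per-line limit.
	sc.Buffer(make([]byte, 0, 64*1024), 16*1024*1024)
	for i := 0; sc.Scan(); i++ {
		if i < startLine {
			continue // belongs to an earlier checkpoint
		}
		// Copy the line: the scanner reuses its internal buffer.
		out = append(out, append([]byte(nil), sc.Bytes()...))
	}
	return out
}

func main() {
	full := []byte("{\"n\":0}\n{\"n\":1}\n{\"n\":2}\n")
	for _, line := range sliceFromLine(full, 1) {
		fmt.Println(string(line))
	}
}
```

Because the stored file already starts at the checkpoint's first line, a consumer can read it from the top without knowing any offset.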
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit afd9428. Configure here.
            agentType = sessionMeta.Agent
        }
    }
    if err := s.replaceTranscript(ctx, opts.Transcript, agentType, startLine, opts.PrecomputedBlobs, sessionPath, entries); err != nil {
Summary pointer not updated
Medium Severity
UpdateCommitted regenerates transcript.jsonl via replaceTranscript but never updates the checkpoint root CheckpointSummary sessions[].transcript path. If initial WriteCommitted pointed at full.jsonl because compaction failed or the transcript was not yet compactable, finalize can add a valid transcript.jsonl while metadata still points at full.jsonl.
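One way to address this is to derive the pointer from what is actually present in the session directory after every write, including finalize. The sketch below is an assumption about shape only: `transcriptPointer`, the `entries` set, and the session directory name `"0"` are hypothetical, while the two file names come from the PR.

```go
package main

import "fmt"

// File names from the PR; everything else in this sketch is hypothetical.
const (
	fullTranscript    = "full.jsonl"
	compactTranscript = "transcript.jsonl"
)

// transcriptPointer picks the file sessions[].transcript should target,
// given the entry names present in the session directory after a write.
func transcriptPointer(sessionDir string, entries map[string]bool) string {
	if entries[compactTranscript] {
		return sessionDir + "/" + compactTranscript
	}
	return sessionDir + "/" + fullTranscript
}

func main() {
	// Initial write: compaction failed, so only full.jsonl exists.
	initial := map[string]bool{fullTranscript: true}
	fmt.Println(transcriptPointer("0", initial))

	// Finalize: the compact transcript was generated this time, so the
	// pointer should be recomputed and written back to the summary.
	finalized := map[string]bool{fullTranscript: true, compactTranscript: true}
	fmt.Println(transcriptPointer("0", finalized))
}
```

Calling the same helper from both the initial write and finalize would keep the pointer decision in one place.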
    // Regenerate the compact transcript from the new content. Best-effort: on
    // generation failure the previous transcript.jsonl entry (if any) is left
    // in place so the metadata.json transcript pointer never dangles.
    s.writeCompactTranscript(ctx, agentType, startLine, transcript.Bytes(), sessionPath, entries)
Codex sanitize skipped on finalize
Medium Severity
Initial committed writes run Codex transcripts through codex.SanitizePortableTranscript before compaction, but replaceTranscript passes raw bytes to writeCompactTranscript on finalize. Compact output after UpdateCommitted can diverge from the initial write path or fail where the initial write succeeded.
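A possible fix is to route both paths through a single preparation step, so finalize cannot skip the sanitizer. This is a sketch under stated assumptions: `prepareForCompaction`, the `sanitizers` map, and the `"codex"` agent-type string are hypothetical, and the placeholder sanitizer body only stands in for `codex.SanitizePortableTranscript`, whose real behavior is not shown in this PR page.

```go
package main

import (
	"bytes"
	"fmt"
)

// sanitizer is a hypothetical per-agent preprocessing hook.
type sanitizer func([]byte) []byte

// sanitizers maps an agent type to its hook. The "codex" entry stands in for
// codex.SanitizePortableTranscript; the body here is only a placeholder.
var sanitizers = map[string]sanitizer{
	"codex": func(b []byte) []byte {
		return bytes.ReplaceAll(b, []byte("\r\n"), []byte("\n"))
	},
}

// prepareForCompaction is the single entry point that both the initial write
// and finalize would call before handing bytes to the compactor.
func prepareForCompaction(agentType string, raw []byte) []byte {
	if s, ok := sanitizers[agentType]; ok {
		return s(raw)
	}
	return raw // agents without a sanitizer pass through unchanged
}

func main() {
	fmt.Printf("%q\n", prepareForCompaction("codex", []byte("a\r\nb\r\n")))
	fmt.Printf("%q\n", prepareForCompaction("other", []byte("a\r\nb\r\n")))
}
```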
Pull request overview
Adds a compact, checkpoint-scoped transcript artifact to v1 committed checkpoints and updates the checkpoint summary metadata to prefer that compact transcript when it is successfully generated, while preserving existing CLI read behavior that continues to read full.jsonl by filename.
Changes:
- Store `transcript.jsonl` (compact, pre-sliced to `checkpoint_transcript_start`) alongside `full.jsonl` for each committed v1 session, best-effort.
- Point `metadata.json` `sessions[].transcript` at `transcript.jsonl` when compact generation succeeds, otherwise fall back to `full.jsonl`.
- Add tests and documentation clarifying the dual-transcript layout and pointer semantics.
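The best-effort behavior in the list above can be sketched as follows. This is an illustration, not the code in `committed.go`: `regenerateCompact`, the in-memory `entries` map, and the two generator functions are hypothetical stand-ins for the tree entries and the compactor.

```go
package main

import (
	"errors"
	"fmt"
)

var errUnparseable = errors.New("unparseable transcript")

// failingGenerate and passthroughGenerate are hypothetical compactors used
// only to exercise the two outcomes.
func failingGenerate([]byte) ([]byte, error)       { return nil, errUnparseable }
func passthroughGenerate(b []byte) ([]byte, error) { return b, nil }

// regenerateCompact replaces entries["transcript.jsonl"] with freshly
// generated content. On failure it leaves any previous entry untouched.
// It reports whether a compact transcript is present afterwards, which is
// what the metadata pointer decision needs to know.
func regenerateCompact(entries map[string][]byte, content []byte,
	generate func([]byte) ([]byte, error)) bool {
	if compacted, err := generate(content); err == nil {
		entries["transcript.jsonl"] = compacted
	}
	// The error is deliberately dropped: the checkpoint write must not fail.
	_, present := entries["transcript.jsonl"]
	return present
}

func main() {
	entries := map[string][]byte{"transcript.jsonl": []byte("old")}
	ok := regenerateCompact(entries, []byte("new"), failingGenerate)
	fmt.Println(ok, string(entries["transcript.jsonl"]))
}
```

Returning presence rather than an error keeps the caller's choice simple: point at `transcript.jsonl` if it exists, otherwise at `full.jsonl`.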
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| docs/architecture/sessions-and-checkpoints.md | Documents the new dual-transcript checkpoint layout and metadata pointer behavior. |
| cmd/entire/cli/paths/paths.go | Introduces CompactTranscriptFileName constant for transcript.jsonl. |
| cmd/entire/cli/checkpoint/committed.go | Generates/stores compact transcripts on write/finalize and sets metadata pointer accordingly (best-effort). |
| cmd/entire/cli/checkpoint/committed_compact_transcript_test.go | Adds unit tests for compact transcript write, scoping, fallback, and regeneration on finalize. |
| cmd/entire/cli/checkpoint/checkpoint.go | Documents SessionFilePaths.Transcript pointer semantics and updates checkpoint tree shape comments. |
| CLAUDE.md | Updates architecture notes for manual-commit strategy to include compact transcript behavior. |


What
Committed checkpoints on `entire/checkpoints/v1` now carry a compact transcript (`transcript.jsonl`) next to `full.jsonl` in each session directory, so it is pushed with the v1 branch (and mirrored by the v1.1 ref, which points at the same commit). The root `metadata.json` `sessions[].transcript` pointer targets `transcript.jsonl` when it was generated, falling back to `full.jsonl` otherwise (unparseable/external-agent transcripts, or checkpoints written by older CLI versions).

How
- `GitStore.writeTranscript` (initial write) and `replaceTranscript` (finalization via `UpdateCommitted`) generate the compact form with the existing `transcript/compact` package and record it as a single blob. `writeTranscript` returns the filename the metadata pointer should target, so the pointer decision lives in one place.
- `compact.Compact` runs with `StartLine = checkpoint_transcript_start`, so the stored compact is pre-sliced to the checkpoint's own portion. This differs deliberately from the removed v2 (cumulative compact + compact-unit offset): v1's `checkpoint_transcript_start` must keep `full.jsonl` units for CLI readers, and a pre-sliced file needs no offset to consume.
- Generation is best-effort and never fails the checkpoint write; on generation failure, finalization keeps the previous `transcript.jsonl` so the pointer never dangles. Compacts cannot reuse the per-turn precomputed blobs since each checkpoint has its own start offset.
- CLI read paths (rewind/resume/explain) are unaffected: they read `full.jsonl` by exact filename, not through the metadata pointer.

Reviewer notes
- […] `transcript_identifier_at_start`) get numerically sliced at storage time with `full.jsonl`-unit offsets, which is wrong against compact content. The server should skip numeric slicing when the session path ends in `/transcript.jsonl` (it already detects the unified format by filename).
- `mise run fmt`, `mise run lint` (0 issues), `mise run test` (6222 tests), `mise run test:integration` (400 tests), `mise run test:e2e:canary` (vogon 59/59, roger-roger 4/4).

Note
Medium Risk
Changes committed checkpoint tree shape and metadata pointers (with best-effort fallback), affecting remote consumers that follow `sessions[].transcript`; CLI read paths stay on `full.jsonl`.

Overview
Committed v1 checkpoints now store `transcript.jsonl` next to `full.jsonl` in each session directory. `GitStore.writeTranscript` still writes chunked raw transcripts and content hashes, then best-effort builds a compact file via `transcript/compact` scoped with `checkpoint_transcript_start`. Root `sessions[].transcript` points at `transcript.jsonl` when generation succeeds, otherwise `full.jsonl` (older or non-compactable data).

`UpdateCommitted`/`replaceTranscript` load per-checkpoint start line and agent from session metadata, regenerate the compact blob on finalize, and leave an existing `transcript.jsonl` in place if regeneration fails so pointers stay valid. CLI rewind/resume/explain keep reading `full.jsonl` by filename.

Adds `paths.CompactTranscriptFileName`, unit tests for write/update/fallback/scoping, and architecture docs describing the dual-transcript layout.
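For remote consumers that follow `sessions[].transcript`, the reviewer note about skipping numeric slicing could look roughly like the sketch below. The server code is not part of this PR, so `sliceForDisplay` and its parameters are hypothetical; only the `/transcript.jsonl` suffix check comes from the description.

```go
package main

import (
	"fmt"
	"strings"
)

// sliceForDisplay returns the lines a consumer should show for a checkpoint.
// start is an offset in full.jsonl line units, so it is applied only when the
// pointer targets a full transcript. Hypothetical consumer-side helper.
func sliceForDisplay(path string, lines []string, start int) []string {
	if strings.HasSuffix(path, "/transcript.jsonl") {
		return lines // compact file is already pre-sliced to this checkpoint
	}
	// Clamp the offset so a stale or oversized value cannot panic.
	if start < 0 {
		start = 0
	}
	if start > len(lines) {
		start = len(lines)
	}
	return lines[start:]
}

func main() {
	lines := []string{"a", "b", "c"}
	fmt.Println(sliceForDisplay("0/transcript.jsonl", lines, 2))
	fmt.Println(sliceForDisplay("0/full.jsonl", lines, 2))
}
```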