
Store compact transcript.jsonl in v1 checkpoints, point metadata at it #1419

Draft: computermode wants to merge 1 commit into main from push-compact-for-v1



computermode (Contributor) commented Jun 11, 2026

What

Committed checkpoints on entire/checkpoints/v1 now carry a compact transcript (transcript.jsonl) next to full.jsonl in each session directory, so it is pushed with the v1 branch (and mirrored by the v1.1 ref, which points at the same commit). The root metadata.json sessions[].transcript pointer targets transcript.jsonl when it was generated, falling back to full.jsonl otherwise (unparseable/external-agent transcripts, or checkpoints written by older CLI versions).
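A minimal Go sketch of the pointer rule, assuming the pointer is stored as a path relative to the checkpoint root. The function and constant names and the session directory are hypothetical; only the two filenames come from the PR.

```go
package main

import (
	"fmt"
	"path"
)

// Filenames from the PR. The constant names are illustrative; the PR itself
// adds paths.CompactTranscriptFileName for "transcript.jsonl".
const (
	fullTranscriptName    = "full.jsonl"
	compactTranscriptName = "transcript.jsonl"
)

// transcriptPointer returns the value for sessions[].transcript in the root
// metadata.json: the compact transcript when it was generated, otherwise the
// full transcript (unparseable or external-agent transcripts, older CLIs).
func transcriptPointer(sessionDir string, compactGenerated bool) string {
	if compactGenerated {
		return path.Join(sessionDir, compactTranscriptName)
	}
	return path.Join(sessionDir, fullTranscriptName)
}

func main() {
	// "example-session" is a hypothetical session directory.
	fmt.Println(transcriptPointer("example-session", true))
	fmt.Println(transcriptPointer("example-session", false))
}
```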

How

  • GitStore.writeTranscript (initial write) and replaceTranscript (finalization via UpdateCommitted) generate the compact form with the existing transcript/compact package and record it as a single blob. writeTranscript returns the filename the metadata pointer should target, so the pointer decision lives in one place.
  • The stored compact is pre-sliced to the checkpoint's own portion: compact.Compact runs with StartLine = checkpoint_transcript_start. This deliberately differs from the removed v2 design (cumulative compact plus a compact-unit offset): v1's checkpoint_transcript_start must stay in full.jsonl units for CLI readers, and a pre-sliced file needs no offset to consume.
  • Generation is best-effort: failures are logged and never fail the checkpoint write. During finalization, a failed regeneration keeps the previous transcript.jsonl so the pointer never dangles. Compact transcripts cannot reuse the per-turn precomputed blobs because each checkpoint has its own start offset.
  • CLI read paths (rewind/resume/explain) are unaffected — they read full.jsonl by exact filename, not through the metadata pointer.

Reviewer notes

  • Server-side follow-up (entire.io repo, not this PR): for offset-only agents (no transcript_identifier_at_start), the server slices transcripts numerically at storage time using full.jsonl-unit offsets, which is wrong for compact content. The server should skip numeric slicing when the session path ends in /transcript.jsonl (it already detects the unified format by filename).
  • Verified locally: mise run fmt, mise run lint (0 issues), mise run test (6222 tests), mise run test:integration (400 tests), mise run test:e2e:canary (vogon 59/59, roger-roger 4/4).
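The proposed server-side check could look roughly like this. The server's language and code are not part of this PR, so this Go version is purely illustrative; the function name is hypothetical, and it assumes sessions that record transcript_identifier_at_start are located by identifier rather than by a numeric offset.

```go
package main

import (
	"fmt"
	"strings"
)

// shouldSliceNumerically is a hypothetical consumer-side check. hasIdentifier
// models a session with transcript_identifier_at_start; without it the
// consumer would fall back to a numeric full.jsonl-unit offset, which must
// not be applied to an already pre-sliced transcript.jsonl.
func shouldSliceNumerically(sessionPath string, hasIdentifier bool) bool {
	if hasIdentifier {
		return false // assumed: identifier-based lookup, not numeric slicing
	}
	return !strings.HasSuffix(sessionPath, "/transcript.jsonl")
}

func main() {
	for _, p := range []string{"s/full.jsonl", "s/transcript.jsonl"} {
		fmt.Printf("%s -> slice numerically: %v\n", p, shouldSliceNumerically(p, false))
	}
}
```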

Note

Medium Risk
Changes committed checkpoint tree shape and metadata pointers (with best-effort fallback), affecting remote consumers that follow sessions[].transcript; CLI read paths stay on full.jsonl.

Overview
Committed v1 checkpoints now store transcript.jsonl next to full.jsonl in each session directory. GitStore.writeTranscript still writes chunked raw transcripts and content hashes, then best-effort builds a compact file via transcript/compact scoped with checkpoint_transcript_start. Root sessions[].transcript points at transcript.jsonl when generation succeeds, otherwise full.jsonl (older or non-compactable data).

UpdateCommitted / replaceTranscript load the per-checkpoint start line and agent type from session metadata, regenerate the compact blob on finalize, and leave any existing transcript.jsonl in place if regeneration fails so pointers stay valid. CLI rewind/resume/explain keep reading full.jsonl by filename.

Adds paths.CompactTranscriptFileName, unit tests for write/update/fallback/scoping, and architecture docs describing the dual-transcript layout.

Reviewed by Cursor Bugbot for commit afd9428.

Committed checkpoint writes now generate a compact transcript
(transcript.jsonl) from the full transcript via transcript/compact and
store it in the session directory next to full.jsonl, so it is pushed
with the entire/checkpoints/v1 branch. The root metadata.json
sessions[].transcript pointer targets transcript.jsonl when it was
generated and falls back to full.jsonl otherwise (unparseable or
external-agent transcripts, checkpoints from older CLI versions).

Unlike the removed v2 (cumulative compact plus compact-unit offset),
the stored compact is pre-sliced to the checkpoint's own portion:
compact.Compact runs with StartLine = checkpoint_transcript_start.
v1's checkpoint_transcript_start must keep full.jsonl units for CLI
readers, so a pre-sliced file avoids a new offset field and is
self-describing for consumers. Generation is best-effort and never
fails the checkpoint write; finalization (UpdateCommitted) regenerates
the compact from the new content and keeps the previous one on
generation failure so the metadata pointer never dangles. CLI read
paths (rewind/resume/explain) are unaffected: they read full.jsonl by
filename, not through the metadata pointer.

Entire-Checkpoint: 63b41777384a
Copilot AI review requested due to automatic review settings June 11, 2026 07:12

cursor (bot) left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.


```go
agentType = sessionMeta.Agent
}
}
if err := s.replaceTranscript(ctx, opts.Transcript, agentType, startLine, opts.PrecomputedBlobs, sessionPath, entries); err != nil {
```


Summary pointer not updated

Medium Severity

UpdateCommitted regenerates transcript.jsonl via replaceTranscript but never updates the checkpoint root CheckpointSummary sessions[].transcript path. If initial WriteCommitted pointed at full.jsonl because compaction failed or the transcript was not yet compactable, finalize can add a valid transcript.jsonl while metadata still points at full.jsonl.
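One possible fix, sketched in Go: after replaceTranscript, re-derive the pointer from whether transcript.jsonl now exists and rewrite the summary only if it changed. The PR names CheckpointSummary and SessionFilePaths.Transcript, but the struct layouts and the helper below are assumptions for illustration.

```go
package main

import (
	"fmt"
	"path"
)

// Minimal stand-ins: the PR names CheckpointSummary and
// SessionFilePaths.Transcript, but these field layouts are assumptions.
type SessionFilePaths struct{ Transcript string }
type CheckpointSummary struct{ Sessions []SessionFilePaths }

// refreshTranscriptPointer re-derives sessions[idx].transcript after
// finalization from whether transcript.jsonl now exists in the session tree.
// It reports whether the summary changed and therefore needs rewriting.
func refreshTranscriptPointer(s *CheckpointSummary, idx int, hasCompact bool) bool {
	if idx < 0 || idx >= len(s.Sessions) {
		return false
	}
	cur := s.Sessions[idx].Transcript
	dir := path.Dir(cur) // the current pointer lives in the session directory
	want := path.Join(dir, "full.jsonl")
	if hasCompact {
		want = path.Join(dir, "transcript.jsonl")
	}
	if cur == want {
		return false
	}
	s.Sessions[idx].Transcript = want
	return true
}

func main() {
	// "example-session" is a hypothetical session directory.
	s := &CheckpointSummary{Sessions: []SessionFilePaths{{Transcript: "example-session/full.jsonl"}}}
	changed := refreshTranscriptPointer(s, 0, true)
	fmt.Println(changed, s.Sessions[0].Transcript)
}
```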


```go
// Regenerate the compact transcript from the new content. Best-effort: on
// generation failure the previous transcript.jsonl entry (if any) is left
// in place so the metadata.json transcript pointer never dangles.
s.writeCompactTranscript(ctx, agentType, startLine, transcript.Bytes(), sessionPath, entries)
```


Codex sanitize skipped on finalize

Medium Severity

Initial committed writes run Codex transcripts through codex.SanitizePortableTranscript before compaction, but replaceTranscript passes raw bytes to writeCompactTranscript on finalize. Compact output after UpdateCommitted can diverge from the initial write path or fail where the initial write succeeded.
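A sketch of one way to keep the two paths consistent: route both the initial write and replaceTranscript through a single preparation helper. The helper is hypothetical, sanitizeFunc stands in for codex.SanitizePortableTranscript (whose real signature is not shown in the review), and "codex" as the agent-type value is an assumption.

```go
package main

import (
	"bytes"
	"fmt"
)

// sanitizeFunc is a hypothetical stand-in for codex.SanitizePortableTranscript;
// its real signature is not shown in the review.
type sanitizeFunc func([]byte) ([]byte, error)

// prepareForCompaction is a hypothetical shared helper: if both the initial
// write and replaceTranscript call it, Codex transcripts are sanitized the
// same way before compaction on both paths. The "codex" agent-type value is
// an assumption.
func prepareForCompaction(agentType string, raw []byte, sanitize sanitizeFunc) ([]byte, error) {
	if agentType != "codex" {
		return raw, nil
	}
	return sanitize(raw)
}

func main() {
	// Toy sanitizer that strips carriage returns, for illustration only.
	toy := func(b []byte) ([]byte, error) { return bytes.ReplaceAll(b, []byte("\r"), nil), nil }
	out, err := prepareForCompaction("codex", []byte("line\r\n"), toy)
	fmt.Printf("%q %v\n", out, err)
}
```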


Copilot AI left a comment


Pull request overview

Adds a compact, checkpoint-scoped transcript artifact to v1 committed checkpoints and updates the checkpoint summary metadata to prefer that compact transcript when it is successfully generated, while preserving existing CLI read behavior that continues to read full.jsonl by filename.

Changes:

  • Store transcript.jsonl (compact, pre-sliced to checkpoint_transcript_start) alongside full.jsonl for each committed v1 session, best-effort.
  • Point metadata.json sessions[].transcript at transcript.jsonl when compact generation succeeds, otherwise fall back to full.jsonl.
  • Add tests and documentation clarifying the dual-transcript layout and pointer semantics.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| docs/architecture/sessions-and-checkpoints.md | Documents the new dual-transcript checkpoint layout and metadata pointer behavior. |
| cmd/entire/cli/paths/paths.go | Introduces CompactTranscriptFileName constant for transcript.jsonl. |
| cmd/entire/cli/checkpoint/committed.go | Generates/stores compact transcripts on write/finalize and sets metadata pointer accordingly (best-effort). |
| cmd/entire/cli/checkpoint/committed_compact_transcript_test.go | Adds unit tests for compact transcript write, scoping, fallback, and regeneration on finalize. |
| cmd/entire/cli/checkpoint/checkpoint.go | Documents SessionFilePaths.Transcript pointer semantics and updates checkpoint tree shape comments. |
| CLAUDE.md | Updates architecture notes for manual-commit strategy to include compact transcript behavior. |
