
[bug]Preserve Claude subagent token totals during condensation #1394

Open
stale2000 wants to merge 1 commit into entireio:main from stale2000:stale2000/condense-subagent-tokens


@stale2000 stale2000 commented Jun 9, 2026


Entire Logs: https://entire.io/gh/stale2000/cli/session/019eaa64-5270-74c0-9f8a-e4df88c7014e
SUMMARY:
After finding the bug and writing the fix, I noticed the log suggesting that the "bug" might be intentional, or at least that nobody knows why the code was written this way.

I analyzed why it might have been written this way. The response below implies that it is indeed a bug, and that the previous implementation did not know how to solve it. This PR does fix the issue.

Explanation/response to the TODO statement:
"
So the nuanced answer is:

For live transcript condensation: bug.
For shadow-only fallback with no live path: empty dir may be unavoidable.
The TODO was probably someone noticing the bug-prone gap during cleanup but not resolving the live-vs-shadow distinction.
Our fix handles exactly that: derive the dir when a live transcript path exists, otherwise helper returns "".
"

Summary

This fixes a metadata loss bug in checkpoint condensation for Claude Code sessions that spawn Task subagents.

Before this change, live session handling could see subagent token usage, but the later condensation step recalculated token usage without passing the session's subagent transcript directory. As a result, SubagentTokens were silently dropped from the committed checkpoint metadata written to entire/checkpoints/v1.

Issue

Claude Code stores Task subagent transcripts separately from the main transcript:

<transcriptDir>/<sessionID>/subagents/agent-<agentID>.jsonl

The lifecycle path already knows that convention and passes the subagent directory into token/file extraction. Condensation did not. Both condensation paths called:

agent.CalculateTokenUsage(ctx, ag, data.Transcript, checkpointTranscriptStart, "")

For subagent-aware agents, the empty directory intentionally disables subagent transcript reads. That means the main transcript token totals survived, but spawned subagent token totals disappeared when checkpoint metadata was committed.
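
The effect of the empty directory can be illustrated with a small model. This is only a sketch of the behavior described above, not the project's code: the `usage` type, the `calculate` function, and the `readTokens` callback are hypothetical names introduced for illustration, and the 50-token figure is made up.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// usage is a hypothetical, simplified stand-in for the real token usage type.
type usage struct {
	MainTokens     int
	SubagentTokens int
}

// calculate models the described behavior only: an empty subagentsDir skips
// every subagent transcript read. readTokens is a hypothetical callback that
// returns the token total found in one transcript file.
func calculate(mainTokens int, agentIDs []string, subagentsDir string, readTokens func(path string) int) usage {
	u := usage{MainTokens: mainTokens}
	if subagentsDir == "" {
		return u // subagent reads disabled, main totals survive
	}
	for _, id := range agentIDs {
		u.SubagentTokens += readTokens(filepath.Join(subagentsDir, "agent-"+id+".jsonl"))
	}
	return u
}

func main() {
	// Pretend each subagent transcript holds 50 tokens.
	read := func(string) int { return 50 }

	// Condensation before the fix: empty directory, subagent totals lost.
	fmt.Printf("%+v\n", calculate(100, []string{"sub1"}, "", read))

	// With a derived directory, the subagent totals are counted.
	dir := filepath.Join("transcripts", "session-1", "subagents")
	fmt.Printf("%+v\n", calculate(100, []string{"sub1"}, dir, read))
}
```

The first call mirrors what both condensation paths did with the hard-coded `""` argument; the second mirrors what the lifecycle path already does.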

How to reproduce

One reproducible shape is:

  1. Enable Entire in a repo and start a Claude Code session.

  2. Have Claude invoke the Task tool so the main transcript contains a Task tool_use and a corresponding tool_result with an agentId, for example agentId: sub1.

  3. Ensure the subagent transcript exists at:

    <main-transcript-dir>/<sessionID>/subagents/agent-sub1.jsonl

    and contains assistant usage metadata.

  4. Let the normal stop hook run. At this point live session state can include subagent token usage because the lifecycle path passes the subagent directory.

  5. Make a user commit so the post-commit hook condenses the session into committed checkpoint metadata.

  6. Inspect the committed checkpoint metadata from entire/checkpoints/v1.

Before this fix, the condensed checkpoint metadata had main token usage but no subagent_tokens field, even though the subagent transcript was present and readable.

The added regression test builds that shape directly: a main Claude transcript with a Task result referencing sub1, a matching agent-sub1.jsonl subagent transcript, and a condensation call that verifies SubagentTokens survive in committed metadata.
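
For readers who want to reproduce the on-disk shape by hand, the layout can be sketched as below. This is not the regression test from the PR. The helper names (`writeFixture`, `demoFixture`, `fileExists`), the main transcript file name, and the placeholder `{}` payloads are all assumptions; real transcripts would need a Task tool_result carrying the agentId and assistant usage metadata.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeFixture lays out a main transcript plus one subagent transcript using
// the <transcriptDir>/<sessionID>/subagents/agent-<agentID>.jsonl convention.
// The main transcript file name is an assumption made for this sketch.
func writeFixture(transcriptDir, sessionID, agentID string, mainBody, subBody []byte) (mainPath, subPath string, err error) {
	mainPath = filepath.Join(transcriptDir, sessionID+".jsonl") // assumed name
	subDir := filepath.Join(transcriptDir, sessionID, "subagents")
	subPath = filepath.Join(subDir, "agent-"+agentID+".jsonl")

	if err = os.MkdirAll(subDir, 0o755); err != nil {
		return "", "", err
	}
	if err = os.WriteFile(mainPath, mainBody, 0o644); err != nil {
		return "", "", err
	}
	if err = os.WriteFile(subPath, subBody, 0o644); err != nil {
		return "", "", err
	}
	return mainPath, subPath, nil
}

// fileExists reports whether path names an existing regular file.
func fileExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && info.Mode().IsRegular()
}

// demoFixture builds the layout in a fresh temp dir with placeholder payloads
// and returns a cleanup function that removes it again.
func demoFixture() (mainPath, subPath string, cleanup func(), err error) {
	dir, err := os.MkdirTemp("", "subagent-fixture")
	if err != nil {
		return "", "", nil, err
	}
	cleanup = func() { os.RemoveAll(dir) }
	mainPath, subPath, err = writeFixture(dir, "session-1", "sub1", []byte("{}\n"), []byte("{}\n"))
	if err != nil {
		cleanup()
		return "", "", nil, err
	}
	return mainPath, subPath, cleanup, nil
}

func main() {
	mainPath, subPath, cleanup, err := demoFixture()
	if err != nil {
		panic(err)
	}
	defer cleanup()
	fmt.Println(mainPath, fileExists(mainPath))
	fmt.Println(subPath, fileExists(subPath))
}
```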

Impact

This affects checkpoint metadata for Claude Code sessions that use Task subagents.

Observed effects:

  • entire/checkpoints/v1 underreports total work done by spawned subagents.
  • Later checkpoint consumers such as activity, explain/status views, or metadata sync consumers cannot recover those subagent token totals from the committed metadata.
  • The bug is silent: checkpoint writes succeed, but the committed metadata is incomplete.

The code path is shared through the SubagentAwareExtractor token interface, so passing the directory also keeps condensation aligned with other subagent-aware implementations that use the same transcript layout convention.

Fix

  • Add subagentsDirForTranscript(transcriptPath, sessionID) in the strategy package.
  • Pass the derived subagent directory into agent.CalculateTokenUsage from both condensation paths:
    • shadow-branch extraction with live-transcript preference
    • direct live-transcript extraction
  • Add TestCondenseSession_ClaudeSubagentTokenUsage to lock the end-to-end condensation behavior.
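
A minimal sketch of what such a helper could look like, assuming it only applies the `<transcriptDir>/<sessionID>/subagents` convention described in the Issue section. The body is a guess, not the code in this PR. Returning "" for an empty transcript path follows the description; returning "" for an empty session ID is an extra assumption.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// subagentsDirForTranscript derives the subagent transcript directory from a
// live transcript path. Sketch only: it returns "" when there is no live
// transcript path (the shadow-only case), and also when sessionID is empty,
// which is an assumption of this example.
func subagentsDirForTranscript(transcriptPath, sessionID string) string {
	if transcriptPath == "" || sessionID == "" {
		return ""
	}
	return filepath.Join(filepath.Dir(transcriptPath), sessionID, "subagents")
}

func main() {
	// Live transcript available: a directory is derived.
	fmt.Printf("%q\n", subagentsDirForTranscript(filepath.Join("transcripts", "main.jsonl"), "session-1"))

	// Shadow-only fallback: no live path, so subagent reads stay disabled.
	fmt.Printf("%q\n", subagentsDirForTranscript("", "session-1"))
}
```

Keeping the derivation in one helper means both condensation call sites pass the same value, and a change to the transcript layout needs to be mirrored in only one place besides the lifecycle hooks.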

Verification

go test ./cmd/entire/cli/strategy -run 'TestCondenseSession' -count=1
go test ./cmd/entire/cli/agent/claudecode -run 'TestCalculateTotalTokenUsage|TestExtractAllModifiedFiles' -count=1
mise run lint

stale2000 requested a review from a team as a code owner June 9, 2026 04:20
stale2000 force-pushed the stale2000/condense-subagent-tokens branch from 39234cb to a0b732a June 9, 2026 04:22
Condensation was recalculating Claude Code token usage without the session subagent transcript directory, so metadata written on user commits dropped Task-spawned token totals even though live session state could see them. The fix reuses the transcript-dir/session-id convention already used by lifecycle hooks and locks the behavior with a focused condensation regression.

Constraint: Condensation must keep full-session raw transcripts while scoping token usage to CheckpointTranscriptStart.

Rejected: Rework token parsing or checkpoint storage | the loss was caused by a missing subagent transcript directory argument, not the parser or metadata schema.

Confidence: high

Scope-risk: narrow

Directive: Keep condensation subagent path derivation aligned with lifecycle hook path derivation when changing transcript layouts.

Tested: go test ./cmd/entire/cli/strategy -run 'TestCondenseSession' -count=1

Tested: go test ./cmd/entire/cli/agent/claudecode -run 'TestCalculateTotalTokenUsage|TestExtractAllModifiedFiles' -count=1

Tested: mise run lint
Entire-Checkpoint: 6656450350ce
stale2000 force-pushed the stale2000/condense-subagent-tokens branch from a0b732a to 3a07223 June 9, 2026 04:22
// extract them from offset 0; consumers can filter by checkpoint_transcript_start
// if they only render the checkpoint-scoped slice.
if len(data.Transcript) > 0 {
data.TokenUsage = agent.CalculateTokenUsage(ctx, ag, data.Transcript, checkpointTranscriptStart, "") //TODO: why do we not use here subagents dir?


Answer:

"So the nuanced answer is:

For live transcript condensation: bug.

For shadow-only fallback with no live path: empty dir may be unavoidable.

The TODO was probably someone noticing the bug-prone gap during cleanup but not resolving the live-vs-shadow distinction.
Our fix handles exactly that: derive the dir when a live transcript path exists, otherwise helper returns "".

"

stale2000 changed the title from "Preserve Claude subagent token totals during condensation" to "[bug]Preserve Claude subagent token totals during condensation" Jun 9, 2026