feat: improve context management for large PRs by jwm4 · Pull Request #10 · ambient-code/review-roadmap

jwm4 · 2026-01-11T16:08:29Z

Summary

This PR fixes issues where the model would incorrectly flag files as 'missing' when they were either in the PR (but on page 2 of results) or pre-existing in the repository.

Changes

1. Fix GitHub API Pagination (root cause)

The PR files endpoint defaults to 30 files per page, but we weren't paginating. PRs with 31+ files were missing files from the analysis.

File: review_roadmap/github/client.py

Updated _fetch_file_diffs() to request 100 files per page and paginate through all results

2. Include Diff Content in LLM Context

The LLM was only seeing file names, not the actual code changes. This made it hard to understand imports and cross-file relationships.

File: review_roadmap/agent/nodes.py

Added _build_diffs_context() to include truncated diffs in the draft_roadmap prompt
Smart truncation: 1500 chars per file, 80K total budget

3. Better Context Expansion Prompt

Updated the prompt to explicitly tell the LLM it can fetch any file from the repo to verify imports exist.

Files: review_roadmap/agent/prompts.py, review_roadmap/agent/tools.py

Expanded CONTEXT_EXPANSION_SYSTEM_PROMPT with examples and clearer guidance
Updated read_file tool docstring to clarify scope

Testing

Tested against jwm4/temperature_alert#1 which has 39 files:

Before: Only fetched 30 files, model incorrectly warned about missing tools/temperature.py and tools/memory.py
After: Fetches all 39 files, model correctly includes all tool files in the roadmap

All 70 tests pass with 84% coverage.

- Fix GitHub API pagination for PR files endpoint (was missing files 31+) - Include actual diff content in draft_roadmap prompt for better code understanding - Enhance context_expansion prompt to verify imports from pre-existing files - Update read_file tool docstring to clarify it can fetch any repo file This fixes issues where the model would incorrectly flag files as 'missing' when they were either in the PR (but on page 2 of results) or pre-existing in the repository.

jwm4 · 2026-01-11T16:11:00Z

🗺️ Auto-Generated Review Roadmap

This roadmap was automatically generated by review-roadmap.
Model: anthropic-vertex/claude-opus-4-5

Review Roadmap: PR #10 - Improve Context Management for Large PRs

High-Level Summary

This PR addresses a critical bug where PRs with 31+ files were incomplete in analysis due to GitHub API pagination limits. It also enriches the LLM context by including actual diff content (not just file names) and improves the read_file tool documentation so the model understands it can fetch any repository file, not just PR files.

Three distinct changes:

Pagination fix in the GitHub client (root cause)
Diff content injection into the LLM prompt (context improvement)
Prompt/tool clarification for the context expansion phase

Recommended Review Order

Phase 1: Understand the Root Cause Fix

Start with the GitHub client since it's the foundational bug fix that the other changes build upon.

Order	File	Focus Area
1	review_roadmap/github/client.py	Pagination logic in `_fetch_file_diffs()`

What to verify:

Does the while True loop correctly break when receiving fewer results than per_page?
Is per_page=100 documented as GitHub's max?
Error handling: what happens if pagination fails mid-way?

Phase 2: Context Building Logic

Review how diffs are now formatted and injected into prompts.

Order	File	Focus Area
2	review_roadmap/agent/nodes.py	New `_build_diffs_context()` function starting around the new additions

What to verify:

Truncation constants: MAX_DIFF_CHARS = 1500 and MAX_TOTAL_DIFF_CHARS = 80000 — are these reasonable for typical LLM context windows?
Edge case: what happens when a single file exceeds the total budget?
The diff is truncated mid-content — does this risk cutting in the middle of a line/syntax?

Phase 3: Prompt & Tool Documentation

These are documentation-focused changes that tell the LLM how to behave.

Order	File	Focus Area
3	review_roadmap/agent/prompts.py	`CONTEXT_EXPANSION_SYSTEM_PROMPT` changes
4	review_roadmap/agent/tools.py	`read_file` docstring update

What to verify:

Prompt and tool docstring should be consistent (both now say "ANY file from the repository")
Examples in the prompt reference temperature_agent.tools.memory — is this from the test PR? Could be confusing if left in production

Watch Outs

🔴 Critical: Pagination Termination Logic

In client.py, verify the loop termination condition. The standard pattern is:

if len(files_data) < per_page:
    break

If this check is incorrect, you could either:

Exit too early (missing files again)
Loop infinitely on exact multiples of 100

🟡 Medium: Token Budget Accounting

In nodes.py, the 80K char budget (~20K tokens) is generous. Verify:

Is this budget only for diffs, or does it compete with other prompt sections?
What's the model's actual context limit (MAX_TOKENS = 4096 for output, but input could be much higher)?

🟡 Medium: Truncation Mid-Diff

When truncating at MAX_DIFF_CHARS = 1500, the code does:

diff = diff[:MAX_DIFF_CHARS] + f"\n... (truncated, {len(f.diff_content)} chars total)"

This could slice mid-line. Consider truncating at a newline boundary for cleaner context.

🟢 Low: Hardcoded Example in Prompt

The example temperature_agent.tools.memory in prompts.py is from the test PR. This is fine for illustrative purposes, but ensure it's clear this is just an example pattern.

Existing Discussions

No comments have been left on this PR yet. You'll be the first reviewer!

Quick Checklist

Pagination loop terminates correctly for PRs with exactly 100, 200, etc. files
Diffs context doesn't blow past model context limits when combined with other prompts
Error handling exists if GitHub API returns an error mid-pagination
read_file tool docstring matches CONTEXT_EXPANSION_SYSTEM_PROMPT guidance
Test coverage includes pagination edge cases (the PR claims 70 tests pass — check if pagination is covered)

jwm4 merged commit 194ca5f into main Jan 11, 2026
1 check passed

jwm4 deleted the feat/improve-context-management branch January 11, 2026 16:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: improve context management for large PRs#10

feat: improve context management for large PRs#10
jwm4 merged 1 commit intomainfrom
feat/improve-context-management

jwm4 commented Jan 11, 2026

Uh oh!

jwm4 commented Jan 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jwm4 commented Jan 11, 2026

Summary

Changes

1. Fix GitHub API Pagination (root cause)

2. Include Diff Content in LLM Context

3. Better Context Expansion Prompt

Testing

Uh oh!

jwm4 commented Jan 11, 2026

Review Roadmap: PR #10 - Improve Context Management for Large PRs

High-Level Summary

Recommended Review Order

Phase 1: Understand the Root Cause Fix

Phase 2: Context Building Logic

Phase 3: Prompt & Tool Documentation

Watch Outs

🔴 Critical: Pagination Termination Logic

🟡 Medium: Token Budget Accounting

🟡 Medium: Truncation Mid-Diff

🟢 Low: Hardcoded Example in Prompt

Existing Discussions

Quick Checklist

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant