feat: improve context management for large PRs#10
Conversation
- Fix GitHub API pagination for PR files endpoint (was missing files 31+) - Include actual diff content in draft_roadmap prompt for better code understanding - Enhance context_expansion prompt to verify imports from pre-existing files - Update read_file tool docstring to clarify it can fetch any repo file This fixes issues where the model would incorrectly flag files as 'missing' when they were either in the PR (but on page 2 of results) or pre-existing in the repository.
|
🗺️ Auto-Generated Review Roadmap
Review Roadmap: PR #10 - Improve Context Management for Large PRsHigh-Level SummaryThis PR addresses a critical bug where PRs with 31+ files were incomplete in analysis due to GitHub API pagination limits. It also enriches the LLM context by including actual diff content (not just file names) and improves the Three distinct changes:
Recommended Review OrderPhase 1: Understand the Root Cause FixStart with the GitHub client since it's the foundational bug fix that the other changes build upon.
What to verify:
Phase 2: Context Building LogicReview how diffs are now formatted and injected into prompts.
What to verify:
Phase 3: Prompt & Tool DocumentationThese are documentation-focused changes that tell the LLM how to behave.
What to verify:
Watch Outs🔴 Critical: Pagination Termination LogicIn client.py, verify the loop termination condition. The standard pattern is: if len(files_data) < per_page:
breakIf this check is incorrect, you could either:
🟡 Medium: Token Budget AccountingIn nodes.py, the 80K char budget (~20K tokens) is generous. Verify:
🟡 Medium: Truncation Mid-DiffWhen truncating at diff = diff[:MAX_DIFF_CHARS] + f"\n... (truncated, {len(f.diff_content)} chars total)"This could slice mid-line. Consider truncating at a newline boundary for cleaner context. 🟢 Low: Hardcoded Example in PromptThe example Existing DiscussionsNo comments have been left on this PR yet. You'll be the first reviewer! Quick Checklist
|
Summary
This PR fixes issues where the model would incorrectly flag files as 'missing' when they were either in the PR (but on page 2 of results) or pre-existing in the repository.
Changes
1. Fix GitHub API Pagination (root cause)
The PR files endpoint defaults to 30 files per page, but we weren't paginating. PRs with 31+ files were missing files from the analysis.
File:
review_roadmap/github/client.py_fetch_file_diffs()to request 100 files per page and paginate through all results2. Include Diff Content in LLM Context
The LLM was only seeing file names, not the actual code changes. This made it hard to understand imports and cross-file relationships.
File:
review_roadmap/agent/nodes.py_build_diffs_context()to include truncated diffs in the draft_roadmap prompt3. Better Context Expansion Prompt
Updated the prompt to explicitly tell the LLM it can fetch any file from the repo to verify imports exist.
Files:
review_roadmap/agent/prompts.py,review_roadmap/agent/tools.pyCONTEXT_EXPANSION_SYSTEM_PROMPTwith examples and clearer guidanceread_filetool docstring to clarify scopeTesting
Tested against jwm4/temperature_alert#1 which has 39 files:
tools/temperature.pyandtools/memory.pyAll 70 tests pass with 84% coverage.