
Replay 2906 commits from private repo history #549

Open
jiaminc-cmu wants to merge 2907 commits into main from replay-commit-history


@jiaminc-cmu
Contributor

Summary

  • Replayed 2,906 filtered commits from the private repo onto this branch
  • Only includes files matching the shared section of .sync-config.yml
  • Original author, date, and commit messages are preserved
  • Private-only files (architecture.json, _python.prompt in pdd/prompts/, etc.) were excluded

What's included

  • pdd/*.py, pdd/commands/*.py, pdd/core/*.py, pdd/server/**/*.py
  • tests/**/*.py
  • prompts/*_LLM.prompt (rewritten to pdd/prompts/*_LLM.prompt)
  • examples/, context/**/*_example.py
  • Root configs (README, requirements.txt, pyproject.toml, Makefile, etc.)
  • docs/, utils/vscode_prompt/

What's excluded

  • architecture.json
  • pdd/prompts/*_python.prompt (cap-only)
  • .github/workflows/ (not synced)
  • All non-shared internal files

Verification

  • Leak check passed — no private files introduced by replay
  • Review final file state matches expected public content
  • Verify commit authorship: git log --format="%an <%ae> %ad %s" | head -20

🤖 Generated with Claude Code

gltanaka and others added 30 commits January 26, 2026 12:32
- Fix format string injection: Escape curly braces in LLM outputs before
  storing in context to prevent KeyError when subsequent prompts contain
  {placeholders} from code/error analysis
- Fix silent error: Print KeyError messages to console before returning
- Fix resume message: Calculate actual start step (5.5) before displaying
  resume message instead of showing incorrect "step 6"
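
A minimal sketch of the escaping described in the first bullet, assuming the stored output is later passed through `str.format`. The helper name `escape_braces` and the sample strings are hypothetical and not taken from the PDD source.

```python
def escape_braces(text: str) -> str:
    """Double every brace so str.format treats it as a literal character."""
    return text.replace("{", "{{").replace("}", "}}")


# Hypothetical LLM output that happens to contain placeholder-like text.
llm_output = "KeyError raised while formatting {user_id}"

# Embedding the raw output in a template and formatting it again fails,
# because {user_id} is parsed as a replacement field with no matching key.
raw_template = "Previous analysis: " + llm_output
try:
    raw_template.format()
    raised = False
except KeyError:
    raised = True

# Escaping before storage keeps the braces literal through the next format call.
safe_template = "Previous analysis: " + escape_braces(llm_output)
rendered = safe_template.format()
```

After the second `format` call the doubled braces collapse back to single braces, so the rendered prompt shows the original text.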

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add 5 new tests covering:
- Format string injection: Verify curly braces in LLM outputs don't cause KeyError
- Restored context escaping: Verify curly braces in resumed state are escaped
- Error console output: Verify KeyError messages are printed to console
- Resume message for step 5.5: Verify correct step shown when resuming after step 5
- Resume message for step 6: Verify correct step shown when resuming after step 5.5

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Cherry-picked changes from PR #267:
- Add optional interactive steering to sync command
- Fix sync animation for horizontal terminal resizes
- Add --no-steer and --steer-timeout options
- Add sync_tui tests and example

Note: Excluded pdd/prompts/* (symlink) and project_dependencies.csv

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
## Non-Python Sync Fixes

- Skip test_extend for non-Python languages since code coverage tooling is Python-specific
- Fix sync returning success without generating tests for non-Python modules
  - Added check for test file existence before accepting workflow as complete
  - The synthetic RunReport from crash/verify was incorrectly triggering "all_synced"
- Add safety checks in sync_orchestration.py and pin_example_hack.py

## Frontend File Detection Fixes

- Support new .pddrc `outputs.code.path` format (Issue #237)
  - Previously only looked for legacy `generate_output_path`
  - Now checks `outputs.code.path`, `outputs.test.path`, `outputs.example.path` first
- Add .test.ts/.spec.ts patterns for Jest/TypeScript test file detection
  - Fixes detection of files like `test_prisma_schema.test.ts`
- Rebuild frontend with updated file detection logic

## Architecture Generation Fixes

- Add valid language suffixes guidance to prevent the LLM from using invalid suffixes like `_NextJS`
- Escape curly braces in architecture_json.prompt template to prevent .format() errors
- Add preprocessing in orchestrator before .format() calls

## Path Resolution Fixes

- Add `typescriptreact` -> `.tsx`, `javascriptreact` -> `.jsx`, `prisma` -> `.prisma` mappings
- Ensure example and test paths always have fallback defaults

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
## Workflow Changes

Reorganized the agentic architecture workflow for better modularity:
- Steps 1-6: Unchanged (analyze PRD, decompose, research, design, deps, generate)
- Step 7: NEW - Generate .pddrc configuration file
- Step 8: Renamed from step 7 - Generate individual prompts
- Step 9: Renamed from step 8 - Completeness validation
- Step 10: Renamed from step 9 - Sync prompts with architecture
- Step 11: NEW - Dependency resolution
- Step 12: Renamed from step 11 - Fix validation errors

## New Files

- `prompts/agentic_arch_step7_pddrc_LLM.prompt` - .pddrc generation step
- `prompts/agentic_arch_step11_deps_LLM.prompt` - Dependency resolution step
- `prompts/agentic_arch_step12_fix_LLM.prompt` - Enhanced fix step
- `pdd/templates/architecture/example_nextjs_task_notes.prompt` - Example for Next.js projects
- `pdd/templates/architecture/pdd_path_construction_guide.prompt` - Path construction reference

## Template Fixes

- Escape curly braces in docs/prompting_guide.md to prevent .format() errors
- Change {PLACEHOLDER} to [PLACEHOLDER] in generate_prompt.prompt to avoid confusion
- Updated step count references in step 1 and 2 prompts (8 -> 11)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
## Task Queue Panel Improvements

- Make task queue panel draggable by adding drag handle
- Save/restore panel position to localStorage
- Add reset position button to return to default (top-right corner)
- Keep panel within viewport bounds on window resize
- Separate collapse toggle from drag handle for better UX

## Generate Command

- Add --skip-prompts flag to skip prompt generation in agentic architecture mode
- Prompts are generated by default; flag allows skipping when not needed

## Logging

- Change generate_output_paths logging from INFO to DEBUG level
- Reduces noise since paths may be overridden by outputs.code.path config

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- test_agentic_architecture_orchestrator: Update all tests to reflect
  the new 11-step workflow (steps 1-8 linear + steps 9-11 validation)
- test_sync_determine_operation: Fix test_decision_test_on_low_coverage
  to create actual test file (required after test file existence check)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Update README and TUTORIALS.md to reflect the new 11-step agentic
architecture workflow:
- Steps 1-8: Analysis & generation (architecture.json, .pddrc, prompts)
- Steps 9-11: Validation (completeness, sync, dependencies)

Also document the new --skip-prompts option for faster architecture-only
generation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- test_commands_generate.py: Add skip_prompts=False to expected call
- code_generator_main.py: Handle both {{PLACEHOLDER}} (YAML-escaped)
  and {PLACEHOLDER} (single brace) in post_process_args substitution

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…ck.acquire()

This commit adds comprehensive unit and E2E tests that detect the file handle
resource leak in SyncLock.acquire() when non-IOError/OSError exceptions occur.

Tests added:
- Unit tests in test_sync_determine_operation.py (6 tests)
  - KeyboardInterrupt during lock acquisition
  - RuntimeError during lock acquisition
  - Exception during file operations
  - IOError/OSError regression tests
  - Normal operation regression tests
  - Context manager exception handling

- E2E tests in test_e2e_issue_403_file_handle_leak.py (4 tests)
  - Real-world KeyboardInterrupt scenario (Ctrl+C)
  - RuntimeError leak detection
  - Normal operation verification
  - Context manager interrupt handling

All tests correctly fail on current code, detecting the bug where file
descriptors remain open when exceptions other than IOError/OSError occur
during lock acquisition.

Related to #403

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Revert unnecessary code change to handle double braces in code_generator_main.py
- Update test_code_generator_main.py to normalize {{PLACEHOLDER}} to {PLACEHOLDER}
  when reading the template for testing purposes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Fix mock_httpx_client fixture to properly mock async context manager
  using AsyncMock for __aenter__ and __aexit__
- Fix test_first_heartbeat_sent_immediately to avoid orphan coroutines
  by using side_effect instead of reassigning the mock
- Fix test_heartbeat_refreshes_token_on_401 to use side_effect pattern
- Fix test_heartbeat_only_refreshes_once_per_cycle to use return_value
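
The async context manager mocking named in the first bullet could look roughly like the sketch below. It uses only `unittest.mock` from the standard library. The names `make_mock_async_client` and `send_heartbeat` and the `/heartbeat` path are illustrative assumptions, not the actual fixture or code under test.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


def make_mock_async_client() -> MagicMock:
    """Build a factory mock usable as `async with factory() as client:`."""
    client = MagicMock()
    # Awaiting client.post(...) yields a response-like object.
    client.post = AsyncMock(return_value=MagicMock(status_code=200))

    factory = MagicMock()
    # __aenter__ hands back the client; __aexit__ returns False so that
    # exceptions raised inside the block are not swallowed.
    factory.return_value.__aenter__ = AsyncMock(return_value=client)
    factory.return_value.__aexit__ = AsyncMock(return_value=False)
    return factory


async def send_heartbeat(client_factory) -> int:
    """Hypothetical code under test: post once inside an async context."""
    async with client_factory() as client:
        response = await client.post("/heartbeat")
        return response.status_code


status = asyncio.run(send_heartbeat(make_mock_async_client()))
```

Configuring `return_value` or `side_effect` on one `AsyncMock`, rather than swapping the mock out mid-test, avoids creating coroutines that are never awaited.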

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add explanation that core_dump files are created when PDD runs crash
or hit internal errors, per Copilot review suggestion.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…h expanded agentic architecture, various bug fixes, and refactors.
…gration

This update improves the verbose logging setup by allowing the LiteLLM library to toggle its debug output based on the verbose flag or environment variable. It ensures that logging levels are appropriately set for production and development modes, and adds error handling for potential attribute access issues in LiteLLM.
- Add batch detection using Union-Find algorithm to group modules by dependency
- Each batch is a connected component in the dependency graph
- Modules within a batch sync sequentially (by priority), different batches are independent
- Add BatchFilterDropdown component with expandable module list view
- Add batch color stripe indicator on graph nodes
- Add SyncOptionsModal for configuring sync options before execution
- Various UI improvements to PromptSelector, PromptSpace, and constants
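
A minimal Python sketch of the Union-Find grouping described in the first three bullets. The function names and the sample modules are assumptions for illustration; the frontend implementation itself is not shown on this page.

```python
from collections import defaultdict


def find(parent: dict, x: str) -> str:
    """Return the root of x, shortening the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def group_into_batches(modules, dependencies):
    """Group modules into connected components of the dependency graph."""
    parent = {m: m for m in modules}
    for a, b in dependencies:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two components
    batches = defaultdict(list)
    for m in modules:
        batches[find(parent, m)].append(m)
    # Sort for a deterministic result; real code might order by priority.
    return sorted(sorted(batch) for batch in batches.values())


batches = group_into_batches(
    ["auth", "db", "ui", "theme"],
    [("auth", "db"), ("ui", "theme")],
)
```

Here `auth` and `db` end up in one batch and `ui` and `theme` in another, so the two batches could sync independently.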

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add `agentic_mode` parameter to sync_orchestration for Python agentic path
- Change cmd_test_main return type from 3-tuple to 4-tuple with agentic_success flag
- Add run_agentic_test_generate 4-tuple return with success boolean
- Add _use_agentic_path() helper to determine agentic behavior
- Add _create_synthetic_run_report_for_agentic_success() for non-Python languages
- Fix sync_determine_operation to differentiate synthetic vs real run reports using test_hash
- Use sentinel value "agentic_test_success" when agent succeeds but file is at different path
- Trust agentic_success flag for non-Python test generation instead of file existence check
- Update prompts and examples to reflect API changes

Fixes issue where sync reported failure despite successful agentic test generation
for non-Python languages (CSS, TypeScript, etc.) where test files may be created
at different paths or with different extensions than expected.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…upport

Add agentic_mode parameter throughout the sync workflow:
- commands/maintenance.py: Add --agentic CLI flag to sync command
- sync_main.py: Pass agentic_mode parameter to sync_orchestration
- prompts/sync_main_python.prompt: Document agentic_mode parameter

Update agentic_test_generate return signature:
- prompts/agentic_test_generate_python.prompt: Update return type from
  tuple[str, float, str] to tuple[str, float, str, bool] to include
  success boolean, matching actual implementation

Fix cloud timeout handling:
- fix_verification_errors_loop.py: Use get_cloud_timeout() function
  instead of hardcoded CLOUD_REQUEST_TIMEOUT constant

Improve server job failure detection:
- server/jobs.py: Check stdout for sync failure indicators even when
  exit code is 0, since sync may return 0 but report failure in output

Extend language extension mappings:
- server/routes/files.py: Add HTML, CSS, and Makefile extensions;
  include "Dockerfile" without extension prefix

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Update test_sync_dry_run_mode to expect the new agentic_mode=False
parameter in sync_orchestration calls.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…er for the 3blue1brown demo, along with related dependency and changelog updates.
This commit adds comprehensive test coverage for the bug where commits
created by LLM agents during Step 1 of the agentic E2E fix workflow are
not pushed to the remote repository when the workflow exits early at Step 2.

Test files:
- tests/test_e2e_issue_419_unpushed_commits.py: Unit tests for _commit_and_push()
- tests/test_e2e_issue_419_cli_unpushed_commits.py: E2E integration test

The tests verify the expected behavior documented in CHANGELOG v0.0.121:
"pdd fix now automatically commits and pushes changes after successful completion"

These tests fail on the current buggy code and will pass once the fix
is implemented in pdd/agentic_e2e_fix_orchestrator.py lines 237-238.

Related to #419
gltanaka and others added 23 commits February 15, 2026 18:43
Reverts 53a9caa and 38d3ab3. The calculate_prompt_hash() fix is
correct at the fingerprint-calculation layer but incomplete end-to-end:
pdd sync's insert-includes step strips <include> tags from the original
.prompt file, so subsequent syncs cannot detect include dependency
changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…o-deps)

- Skip auto-deps in agentic mode (prompts already have explicit dependencies)
- Add 30s client-side timeout to Firecrawl scrape_url via ThreadPoolExecutor
  (works around SDK bug where timeout ms is passed to requests as seconds)
- Update Firecrawl method from scrape() to scrape_url() for current SDK
- Add 30s timeout to git ls-files subprocess call
- Add label parameter to _run_with_provider for future heartbeat support
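
The client-side timeout in the second bullet can be sketched as below. The names `call_with_timeout` and `slow_scrape` are hypothetical stand-ins. The Firecrawl SDK is deliberately not imported, so nothing here reflects its real signatures.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_timeout(func, timeout_s: float, *args, **kwargs):
    """Run func in a worker thread; return None if it exceeds timeout_s."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return None
    finally:
        # Do not wait for the worker: a hung call keeps running in the
        # background, but the caller is unblocked.
        executor.shutdown(wait=False)


def slow_scrape(url: str) -> str:
    """Stand-in for a network call that takes too long."""
    time.sleep(0.5)
    return f"content of {url}"


fast = call_with_timeout(lambda: "ok", 1.0)
timed_out = call_with_timeout(slow_scrape, 0.05, "https://example.com")
```

This pattern bounds how long the caller waits, but it does not cancel the underlying call, which is one reason it is only a workaround.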

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… auth

When Claude Code runs on subscription (not API key), total_cost_usd is
absent from JSON output. Add _calculate_anthropic_cost() with three-tier
fallback: (1) modelUsage per-model costUSD, (2) token-based estimation
from usage field with model-family-aware pricing (Opus/Sonnet/Haiku),
(3) zero. This matches the existing pattern used for Gemini and Codex
providers which always estimate from token counts.
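
The three-tier fallback might be sketched as follows. The rates are placeholder numbers, not real pricing. The `input_tokens` and `output_tokens` key names and the function name `estimate_cost` are assumptions; only `modelUsage`, `costUSD`, and `usage` come from the message above.

```python
# Placeholder (input, output) rates per million tokens. NOT real prices.
HYPOTHETICAL_RATES = {
    "opus": (10.0, 50.0),
    "sonnet": (2.0, 10.0),
    "haiku": (1.0, 5.0),
}


def estimate_cost(data: dict, model: str = "") -> float:
    """Estimate cost when total_cost_usd is missing from the JSON output."""
    # Tier 1: sum the per-model costUSD entries under modelUsage.
    model_usage = data.get("modelUsage") or {}
    reported = sum(float(m.get("costUSD", 0.0)) for m in model_usage.values())
    if reported > 0:
        return reported

    # Tier 2: estimate from token counts using the model family's rates.
    usage = data.get("usage") or {}
    family = next((f for f in HYPOTHETICAL_RATES if f in model.lower()), None)
    if usage and family:
        in_rate, out_rate = HYPOTHETICAL_RATES[family]
        tokens_in = usage.get("input_tokens", 0)    # assumed key name
        tokens_out = usage.get("output_tokens", 0)  # assumed key name
        return (tokens_in * in_rate + tokens_out * out_rate) / 1_000_000

    # Tier 3: nothing usable, report zero.
    return 0.0
```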

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…and Vertex AI ADC, various bug fixes, build improvements, and grounding experiment documentation.
- Fix result[-2] indexing bug in sync_orchestration.py and pin_example_hack.py
  that caused $0.00 cost for agentic test generation. For 4-tuple returns
  (content, cost, model, success), result[-2] gave model (str) not cost (float).
  Changed to result[1] which is always the cost index.

- Increase MODULE_TIMEOUT from 900s (15 min) to 1800s (30 min) in
  agentic_sync_runner.py. Complex modules (e.g. TypeScriptReact with <web> tags)
  need generate+crash+verify+test which can take 20+ min total.
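
The indexing bug in the first bullet can be reproduced with plain tuples. The values below are made up for illustration, and `extract_cost` is a hypothetical helper name.

```python
# Return shapes described in the commit: (content, cost, model) and
# (content, cost, model, success).
three_tuple = ("test code", 0.42, "model-a")
four_tuple = ("test code", 0.42, "model-a", True)

# Negative indexing counts from the end, so the meaning of [-2] shifts
# when a fourth element is appended.
cost_from_three = three_tuple[-2]   # the cost (float)
wrong_from_four = four_tuple[-2]    # the model name (str), not the cost


def extract_cost(result: tuple) -> float:
    """Index 1 is the cost in both tuple shapes."""
    return result[1]
```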

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove unused signal/threading imports from agentic_common.py
- Guard against UnboundLocalError in _save_state if mkstemp fails
- Clean up temp cost_file on Popen failure in _sync_one_module

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…state desynchronization (Issue #159), update LLM invocation logic and prompts, and add new grounding experiment results.
…e Claude model, and migrate pytest configuration to pyproject.toml.
Introduces user story tests as a first-class PDD feature:
- `pdd/user_story_tests.py`: core validation logic — discover story files,
  run detect_change against each story, and apply fixes via change_main
- `pdd detect --stories`: new mode that validates all user_stories/story__*.md
  files against current prompts (pass = no changes needed)
- `pdd fix user_stories/story__<name>.md`: auto-detects affected prompts,
  applies changes, then re-validates
- `pdd change`: auto-validates user stories post-change before finalizing,
  respecting `skip_user_stories` context flag to prevent recursion
- `user_stories/story__template.md`: standard story template
- Full test coverage (89 tests across 4 test files)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Set mock_proc.pid = 99999 in _make_mock_popen so that the timeout test
calls os.killpg(99999, SIGTERM) instead of resolving MagicMock.__index__()
to 1, which was sending SIGTERM to process group 1 and killing the entire
pytest-xdist worker mid-run, causing CI to fail at 26% every time.
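
The `MagicMock.__index__` default that the message refers to can be observed without sending any signal. This sketch only converts the mock to an integer with `operator.index`.

```python
import operator
from unittest.mock import MagicMock

mock_proc = MagicMock()

# An unset attribute on a MagicMock is itself a MagicMock, and MagicMock
# configures __index__ to return 1 by default.
unset_pid = operator.index(mock_proc.pid)

# Assigning a concrete integer removes the ambiguity.
mock_proc.pid = 99999
explicit_pid = operator.index(mock_proc.pid)
```

Any call that converts its argument to an integer would therefore see process group 1 unless the test sets `pid` explicitly.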

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace claude-sonnet-4-5 entries (both Vertex AI and Anthropic) with
claude-sonnet-4-6 in the canonical data file shipped with the package.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…dates for `sync_orchestration` to fix state desynchronization.
…hestration runs

All 5 runs used vertex_ai/claude-opus-4-6 (context-1m-2025-08-07 beta working),
achieving 100% test pass rate (108/108) and ref_sim=0.823 ± 0.031.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…architecture improvements (#482)

* feat: Add iterative fix-verify loop (steps 3-7) to checkup orchestrator

Steps 3-7 (build, interfaces, test, fix, verify) now run in a while loop
(max 3 iterations) instead of once. If step 7 finds remaining failures,
the workflow loops back to re-check/fix until "All Issues Fixed" or max
iterations. Worktree is created before the loop; step 8 runs after.
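
A rough sketch of the loop described above, assuming each step is a callable that returns its text output. `run_fix_verify_loop` and `fake_run_step` are hypothetical names; the real orchestrator also handles state, worktrees, and GitHub comments, which are omitted here.

```python
MAX_ITERATIONS = 3
LOOP_STEPS = (3, 4, 5, 6, 7)  # build, interfaces, test, fix, verify


def run_fix_verify_loop(run_step, max_iterations: int = MAX_ITERATIONS):
    """Repeat steps 3-7 until step 7 reports success or iterations run out.

    Returns (iterations_used, all_fixed).
    """
    for iteration in range(1, max_iterations + 1):
        output = ""
        for step in LOOP_STEPS:
            output = run_step(step, iteration)
        # After the inner loop, `output` holds the step 7 (verify) result.
        if "All Issues Fixed" in output:
            return iteration, True
    return max_iterations, False


def fake_run_step(step: int, iteration: int) -> str:
    """Stand-in for an LLM-backed step: verification passes on iteration 2."""
    if step == 7 and iteration >= 2:
        return "All Issues Fixed"
    return f"step {step} output (iteration {iteration})"


result = run_fix_verify_loop(fake_run_step)
```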

Prompts updated with iteration awareness, previous_fixes context, e2e test
instructions, and "All Issues Fixed" exit signal. 59 tests (12 new).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: Copy uncommitted/untracked files into worktree on creation

The worktree is created from HEAD, which only contains committed files.
If the user has uncommitted or untracked files (e.g. new CRM modules),
the worktree would be missing them, causing steps 3-7 to see different
files than steps 1-2 analyzed.

Now _setup_worktree calls _copy_uncommitted_changes which:
1. Applies uncommitted tracked changes via git diff HEAD | git apply
2. Copies untracked files (excluding .pdd/) into the worktree

Both operations are best-effort — failures are logged but don't block.
Added 7 tests for the new behavior. Reverted prompt workaround.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: Split step 6 into sub-steps, fix resume bugs, and fix iteration display

- Split monolithic step 6 into 6.1 (fix), 6.2 (regression tests), 6.3 (e2e tests)
  with separate prompts and 600s timeouts each
- Bug A fix: save worktree state immediately after creation so Ctrl+C during
  step 3 can resume without recreating
- Bug B fix: detect between-iterations resume (start_step > 7 with
  fix_verify_iteration > 0) and restart at step 3 with incremented iteration
- Fix iteration number always showing "1" in GitHub comments: add iteration
  suffix to steps 3-5 comment headers, add explicit instruction to all loop
  step prompts to use exact iteration number
- Fix total step count: "X of 7" -> "X of 8" across all prompts
- Add STEP_ORDER constant and _next_step() helper for fractional step arithmetic
- Add checkup command, agentic_checkup module, and comprehensive tests (108 total)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: Add frontend integration checks to checkup step 4 and step 6.1

Step 4 (Interface Check) now audits:
- Frontend navigation reachability: detect orphan pages with no nav link
- Frontend→Backend API call consistency: detect pages using different
  URL patterns than the rest of the codebase (e.g. relative vs base URL)

Step 6.1 (Fix) now handles:
- Adding missing nav links for orphan pages
- Updating inconsistent API call patterns to match codebase convention

Found via QA on the CRM app where the page existed but had no sidebar
link and used relative `/adminCrmActions` instead of the standard
`${NEXT_PUBLIC_API_BASE_URL}/adminCrmActions` pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: Increase step 7 (verify) timeout from 600s to 1200s

Step 7 re-runs the full test suite to verify fixes, which can exceed
10 minutes on larger projects (e.g. 4600+ tests). The 600s timeout
caused step 7 to time out after posting its GitHub comment but before
returning, leaving state stuck at step 6.3 and causing infinite
resume loops.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: Improve architecture generation with Strategy B support, register checkup command, and add gh timeout

- Add Strategy B (template-based group contexts) support to arch steps 7, 8, 10, 12
- Add pdd_path_construction_guide Strategy B documentation
- Add example_python_backend.prompt template
- Register checkup command in CLI
- Add timeout parameter to _run_gh_command()
- Update test durations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: Add PDD prompts, examples, and architecture entries for checkup modules

- Add agentic_checkup_python.prompt and agentic_checkup_orchestrator_python.prompt
- Add context/agentic_checkup_example.py and context/agentic_checkup_orchestrator_example.py
- Register checkup and orchestrator in architecture.json (priority 217, 218)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: Add README documentation for pdd checkup and pdd sync URL mode

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: trigger Cloud Build CI

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Greg Tanaka <glt@alumni.caltech.edu>
Align the replay branch's final file state with main so the PR
shows zero diff. The branch preserves the full commit history
while ending at the same tree as main.
@gltanaka requested a review from Copilot on February 21, 2026 07:44
Contributor

Copilot AI left a comment


Copilot wasn't able to review any files in this pull request.



Serhan-Asad added a commit to Serhan-Asad/pdd that referenced this pull request Mar 4, 2026
…c orchestrators (promptdriven#549) (promptdriven#565)

* fix: replace .format(**context) with safe str.replace() in all agentic orchestrators (promptdriven#549)

Root cause: orchestrators called prompt_template.format(**context) to
substitute step outputs into prompt templates. When a step's LLM output
contained JSON curly braces (e.g. {"error": "Insufficient role"}), the
format call either raised KeyError or, when values were pre-escaped with
.replace("{", "{{"), inserted doubled braces {{...}} into the LLM prompt
instead of the original single-brace JSON.

Fix: replace .format(**context) with iterative str.replace() — the same
safe pattern used in template_expander.py:155-156. This substitutes each
context key literally without interpreting curly braces in values. The
remaining {{ }} from preprocess(double_curly_brackets=True) are then
un-doubled with a final .replace("{{", "{").replace("}}", "}") call.
Value pre-escaping (.replace("{", "{{") in context storage) is also
removed as it is no longer needed.
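
The substitution described in the fix paragraph can be sketched as below. `render_prompt` is a hypothetical name, and the sketch is simplified relative to the orchestrators and `template_expander.py`.

```python
def render_prompt(template: str, context: dict) -> str:
    """Substitute {key} placeholders literally, then un-double {{ }}."""
    rendered = template
    for key, value in context.items():
        # Plain string replacement: braces inside `value` are never parsed.
        rendered = rendered.replace("{" + key + "}", str(value))
    # Collapse the doubled braces left over from template preprocessing.
    return rendered.replace("{{", "{").replace("}}", "}")


template = (
    "Step 1 output: {step1_output}\n"
    'Return JSON like {{"ok": true}}. Unknown: {not_provided}'
)
context = {"step1_output": '{"error": "Insufficient role"}'}
prompt = render_prompt(template, context)
```

The JSON value reaches the next prompt with single braces, and the unknown `{not_provided}` placeholder is left intact instead of raising `KeyError`. One limitation of this simplified form is that a value which itself contains doubled braces would also be un-doubled by the final step.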

Files changed:
- pdd/agentic_bug_orchestrator.py
- pdd/agentic_change_orchestrator.py
- pdd/agentic_checkup_orchestrator.py
- pdd/agentic_test_orchestrator.py
- pdd/agentic_e2e_fix_orchestrator.py
- pdd/agentic_architecture_orchestrator.py

Tests: 22 passing unit tests across both test files verify:
- JSON output from step N appears as single braces in step N+1 prompt
- Nested/multiple JSON blocks are preserved
- Unknown placeholders left intact (no KeyError)
- Structural assertions confirm the buggy patterns are removed from source

Fixes promptdriven#549

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: address copilot review comments on PR promptdriven#565

- Remove unused `ast` import from test_e2e_issue_549_format_double_escaping.py
- Remove unused `re` and `MagicMock` imports from test_e2e_issue_549_other_orchestrators.py
- Remove dead no-op `if` block in agentic_checkup_orchestrator.py that
  claimed to set a dotted alias for integer step keys but wrote to the
  same dict key as the line above it

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Serhan-Asad pushed a commit to Serhan-Asad/pdd that referenced this pull request Mar 16, 2026
* test: Add failing tests for E2E timeout retry bug promptdriven#791

- 8 unit tests in test_agentic_e2e_fix_orchestrator.py (2 pass prompt checks, 6 fail detecting missing behavior)
- 3 E2E tests in test_e2e_issue_791_e2e_timeout_retry.py (all fail detecting the bug)
- Prompt fix adding E2E pre-flight check and cross-cycle memory requirements

Root causes:
1. No environment pre-flight check before Step 2 E2E tests (line 660-726)
2. step_outputs cleared between cycles destroying failure memory (line 857-859)

Fixes promptdriven#791

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: PDD bug changes for promptdriven#791

* fix: pdd fix: E2E test step times out on every cycle, wasting cost and time

Fixes promptdriven#791

* fix: PDD fix changes for promptdriven#791

* fix: persist skipped_steps to state and remove artifact files

- Save skipped_steps in state_data so it survives resume across sessions
- Load skipped_steps from state on resume (with JSON string-to-int key conversion)
- Include skipped_steps in KeyboardInterrupt and Exception state saves
- Remove artifact files: error_output_791.txt, test_errors_791.txt,
  and agentic_e2e_fix_orchestrator_test_agentic_e2e_fix_orchestrator_fixed.py
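
The string-to-int key conversion mentioned in the second bullet is needed because JSON object keys are always strings. A small illustration, with a made-up skip reason:

```python
import json

# In-memory state keyed by integer step number (the reason text is hypothetical).
skipped_steps = {2: "E2E environment check failed"}

# Saving to JSON turns the integer key into the string "2".
saved = json.dumps({"skipped_steps": skipped_steps})
loaded = json.loads(saved)["skipped_steps"]

# On resume, convert the keys back so lookups by step number work again.
restored = {int(step): reason for step, reason in loaded.items()}
```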

Fixes promptdriven#791

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add _check_e2e_environment mock to issue promptdriven#419 tests

The new E2E environment preflight check skips Step 2 when no playwright
config exists. Existing tests that expect Step 2 to dispatch to the LLM
agent need to mock _check_e2e_environment to return (True, "").

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add _check_e2e_environment mock to issue promptdriven#545 tests

Same fix as promptdriven#419 tests: mock the E2E environment preflight check
so Step 2 dispatches to the LLM mock instead of being skipped.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add _check_e2e_environment mock to e2e_fix_deps fixture in promptdriven#549 tests

Without this mock, _check_e2e_environment skips Step 2 (no playwright
config in tmp_path), so the cycle1_step2 assertion fails.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add _check_e2e_environment mock to promptdriven#468, promptdriven#467, promptdriven#357 test fixtures

Same pattern as previous fixes — without this mock, _check_e2e_environment
skips Step 2 (no playwright in tmp_path), breaking step execution assertions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — narrow skip trigger to timeouts, remove last_completed_step advance, add check=True

- Narrow Step 2 skip to timeout-specific errors only (not transient
  provider outages like rate limits)
- Remove contradictory last_completed_step = step_num in skip path
  (skipped_steps dict already handles cross-cycle memory)
- Add check=True to git subprocess calls in test_issue_791 fixture

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: PDD Bot <pdd-bot@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Greg Tanaka <glt@alumni.caltech.edu>
Serhan-Asad pushed a commit to Serhan-Asad/pdd that referenced this pull request Mar 16, 2026
Run pdd update + pdd example on agentic_e2e_fix_orchestrator,
agentic_e2e_fix, commands/fix, and agentic_common to capture
accumulated bug fixes (promptdriven#338, promptdriven#468, promptdriven#549, promptdriven#791, #830) into prompts
and refresh few-shot examples before re-running pdd change on #822.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

7 participants