feat(agentic): thread --time into provider subprocess reasoning flags#787
feat(agentic): thread --time into provider subprocess reasoning flags#787Serhan-Asad wants to merge 5 commits intomainfrom
Conversation
The top-level CLI already stores --time on ctx.obj but the value was
dropped for agentic subprocess launches: LiteLLM-invoked steps honored
reasoning_effort via the CSV-driven mapping in llm_invoke, while the
subprocess calls that actually write code (Codex, Claude Code, Gemini
CLIs via run_agentic_task -> _run_with_provider) forwarded only --model
and never propagated a reasoning signal. A --time high job therefore
produced the same provider argv as an unspecified run.
- pdd/core/cli.py: when --time is explicitly provided, map the float
into PDD_REASONING_EFFORT={low,medium,high} using the same thresholds
llm_invoke already uses (>0.7 high, >0.3 medium, else low). When
--time is omitted, leave the env untouched so worker env.yaml values
still apply.
- pdd/agentic_common.py: _run_with_provider reads PDD_REASONING_EFFORT
and, for Codex, injects "-c model_reasoning_effort=<level>" BEFORE
the "exec" subcommand — Codex silently ignores -c / --config when
it appears after the subcommand. Invalid values are ignored so bad
env cannot poison argv. Anthropic Claude Code and Gemini CLIs do not
currently expose a reasoning flag; in verbose mode we log once that
the effort did not apply to the subprocess instead of silently
dropping the signal.
- tests/test_agentic_common.py: argv assertions across all three
providers, including Codex flag ordering, missing env no-op, invalid
value guard, and explicit "argv unchanged" coverage for Anthropic
and Gemini.
- tests/test_time_reasoning_effort_env.py: CliRunner-level coverage of
the --time -> PDD_REASONING_EFFORT mapping, boundary thresholds, and
the no-flag no-op.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ffort
Addresses review follow-ups on fix/agentic-reasoning-effort-threading:
- pdd/reasoning.py: new `time_to_effort_level(time)` helper. Single
source of truth for the {low,medium,high} thresholds that were
previously duplicated in core/cli.py and llm_invoke.py. Strict `>`
thresholds preserved (0.3 → "low", 0.7 → "medium").
- pdd/core/cli.py + pdd/llm_invoke.py: both now delegate to the helper.
- pdd/agentic_common.py: drop the `verbose` gate on the Anthropic and
Gemini "effort not applied" notices; now gated only by `quiet`. The
user asked for reasoning effort, provider CLI can't honor it — that
is signal, not noise. Wording also clarifies that LiteLLM-invoked
steps (analysis/verification) still apply the effort.
- tests/test_reasoning.py: parametrized boundary coverage for the helper
(0.0, 0.1, 0.3, 0.31, 0.5, 0.7, 0.71, 0.85, 1.0).
- tests/test_agentic_common.py: case/whitespace tolerance (`" High "`
/ `"HIGH"` / `"high\n"`); explicit tests that the Claude Code notice
fires without --verbose and stays silent under --quiet.
- tests/test_time_reasoning_effort_env.py: exact 0.0 / 0.3 / 0.7 / 1.0
boundaries plus a full-chain CliRunner → _run_with_provider integration
test that pins the env-var seam.
Suite: 216 passed in the three reasoning-related test files; 249 passed
in test_llm_invoke.py confirming the helper swap left that path intact.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ider
Complements the PDD_REASONING_EFFORT env-var plumbing with explicit
kwarg threading so the top-level ctx.obj["time"] flows all the way from
each issue-URL command into _run_with_provider's argv construction
without relying on a global env mutation.
Threading the kwarg gives provenance (the user's --time choice for
*this* call) that the env var cannot; the env path remains as the
fallback for the ~15 other run_agentic_task call sites in sync/split/
test_generate/update/verify/crash/etc. that aren't reached via the
five issue-URL commands.
Signature additions (all default to None — source-compatible):
- agentic_common.py:
* run_agentic_task(..., reasoning_time=None)
* _run_with_provider(..., reasoning_time=None)
Inside, explicit kwarg wins; env var is the fallback.
- Entry wrappers (commands → orchestrator):
* run_agentic_bug / run_agentic_change / run_agentic_checkup /
run_agentic_e2e_fix / run_agentic_sync each take reasoning_time
and forward it into their _orchestrator counterpart.
- Orchestrators and their helpers:
* run_agentic_bug_orchestrator — 5 run_agentic_task call sites
* run_agentic_change_orchestrator — 4 call sites
* run_agentic_checkup_orchestrator + _run_single_step — 1 call site
through the helper, 5 helper callers
* run_agentic_e2e_fix_orchestrator + _run_step11_code_cleanup — 4
direct call sites plus the step-11 helper
* run_agentic_sync + _run_dry_run_validation +
_llm_fix_dry_run_failure — 2 direct call sites plus a two-level
dry-run helper chain
- Commands:
* commands/fix.py — reasoning_time=ctx.obj.get("time") into run_agentic_e2e_fix
* commands/analysis.py — same into run_agentic_bug
* commands/modify.py — same into run_agentic_change
* commands/maintenance.py — same into run_agentic_sync (agentic dispatch)
* commands/checkup.py — same into run_agentic_checkup
Param named reasoning_time (not "time") to avoid shadowing the
``import time`` module in agentic_common.py — several code paths call
time.time() / time.sleep() inside run_agentic_task and would NameError
with a `time: Optional[float] = None` kwarg. reasoning_time makes the
parameter's purpose explicit in every signature it appears in.
Tests:
- tests/test_agentic_common.py
* test_codex_injects_reasoning_effort_from_time_kwarg —
parametrized low/medium/high/0.0/1.0 cases asserting the Codex
argv token comes from the kwarg with env UNSET.
* test_reasoning_time_kwarg_overrides_env_var — explicit kwarg
wins over a conflicting env var (env: low, kwarg: high → high).
* test_run_agentic_task_forwards_reasoning_time_to_provider —
pins the run_agentic_task → _run_with_provider seam.
- tests/test_time_reasoning_effort_env.py
* test_command_forwards_reasoning_time_to_orchestrator —
parametrized across all five commands (fix/bug/change/sync/checkup),
runs each via CliRunner with --time 0.85, patches the orchestrator,
and asserts it received reasoning_time=0.85.
Suite: 477 passed across test_agentic_common + test_reasoning +
test_time_reasoning_effort_env + test_llm_invoke.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The kwarg-threading commit added ``reasoning_time`` to run_agentic_task's
signature. Several test files mock run_agentic_task with hand-rolled
stub functions that declare fixed positional-only signatures, so the
new kwarg lands as an unexpected-argument TypeError at runtime.
- tests/test_e2e_issue_*: 21 ``mock_run_agentic_task`` stubs across 11
test files gain ``**kwargs`` so additive keyword arguments pass
through without breaking the mock contract.
- tests/test_agentic_change_orchestrator.py: 2 long-form ``side_effect``
helpers (full explicit signature) gain ``**kwargs`` as well.
- tests/commands/test_fix.py::test_agentic_mode_passes_all_options:
relaxed the strict ``kwargs == {...}`` assertion to a superset check so
additive options (reasoning_time now, future options later) don't
trip the enumeration. The test still pins every option value it sets,
and explicitly asserts reasoning_time is None when --time isn't passed.
No production code changes. Local: 180 pass across the previously-
failing test files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Greptile SummaryThis PR introduces
Confidence Score: 4/5Safe to merge after addressing the default-value propagation bug; no data loss or security concerns. One P1 finding: pdd/core/cli.py (ctx.obj["time"] default), pdd/commands/fix.py, analysis.py, modify.py, checkup.py, maintenance.py (all pass ctx.obj.get("time") which is never None)
|
| Filename | Overview |
|---|---|
| pdd/reasoning.py | New helper module with time_to_effort_level() mapping float→{low,medium,high}; clean single-responsibility unit, well-tested. |
| pdd/core/cli.py | Sets PDD_REASONING_EFFORT only when --time is explicit (correct), but also unconditionally stores DEFAULT_TIME (0.25) in ctx.obj["time"], causing plumbed commands to always forward a non-None reasoning_time. |
| pdd/agentic_common.py | Adds reasoning_time kwarg and env-var fallback in _run_with_provider; Codex -c injection is correct in isolation, but always fires due to the default value propagation bug; notice message fires in non-quiet rather than verbose-only mode. |
| pdd/commands/analysis.py | Forwards reasoning_time=obj.get("time") which is always non-None (DEFAULT_TIME=0.25 fallback), so Codex always receives -c model_reasoning_effort=low even without an explicit --time flag. |
| pdd/agentic_bug_orchestrator.py | Correctly threads reasoning_time through to run_agentic_task calls in the 11-step orchestrator; additive change, no regressions introduced. |
| tests/test_time_reasoning_effort_env.py | Comprehensive test coverage for threshold boundaries, env-var plumbing, full Codex chain, and per-command kwarg forwarding; test_no_time_flag_leaves_env_unset correctly validates the env-var path but does not catch the kwarg path regression. |
| tests/test_reasoning.py | Parametrized boundary tests for time_to_effort_level — thorough and correct. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["pdd --time 0.85 fix <url>"] --> B["cli.py: time is not None?"]
B -- "Yes (explicit --time)" --> C["os.environ['PDD_REASONING_EFFORT'] = 'high'"]
B -- "No (omitted)" --> D["env var left untouched ✓"]
A --> E["ctx.obj['time'] = time ?? DEFAULT_TIME\n(always float, never None)"]
E --> F["commands/fix.py:\nreasoning_time=ctx.obj.get('time')\n⚠️ always 0.25 when --time omitted"]
F --> G["run_agentic_e2e_fix(reasoning_time=0.25)"]
G --> H["run_agentic_task(reasoning_time=0.25)"]
H --> I["_run_with_provider(reasoning_time=0.25)"]
I --> J{"reasoning_time is not None?"}
J -- "Yes (always true)" --> K["time_to_effort_level(0.25) → 'low'"]
K --> L["Codex: -c model_reasoning_effort=low\n⚠️ injected even without --time"]
K --> M["Claude/Gemini: print notice\n⚠️ fires in non-quiet mode always"]
J -- "No (never reached)" --> N["read PDD_REASONING_EFFORT env var"]
N --> O["inject if set, skip if unset ✓"]
style F fill:#ffcccc
style K fill:#ffcccc
style L fill:#ffcccc
style M fill:#ffcccc
Prompt To Fix All With AI
This is a comment left during a code review.
Path: pdd/commands/analysis.py
Line: 247
Comment:
**`reasoning_time` is never `None` for plumbed callers — default always injects `low` effort**
`ctx.obj["time"]` is unconditionally set in `cli.py`:
```python
ctx.obj["time"] = time if time is not None else DEFAULT_TIME # always 0.25 when --time omitted
```
So `obj.get("time")` here returns `0.25` (not `None`) whenever `--time` is absent. That `0.25` propagates to `_run_with_provider`, which hits the `if reasoning_time is not None:` branch and maps it to `"low"`, injecting `-c model_reasoning_effort=low` into every Codex invocation — even when the user never passed `--time`.
This contradicts the PR's stated design: *"When `--time` is omitted, the env var is left untouched."* The env-var path in `cli.py` is correctly guarded by `if time is not None:`, but all five plumbed commands bypass that guard by forwarding `ctx.obj.get("time")` which is always a float.
The same issue applies to `pdd/commands/fix.py:156`, `pdd/commands/modify.py:266`, `pdd/commands/checkup.py:113`, and `pdd/commands/maintenance.py:208`.
A fix is to store the explicit-flag state separately and only forward a non-None value when the user actually typed `--time`:
```python
# in cli.py
ctx.obj["time"] = time # keep None when --time was not provided
ctx.obj["time_default"] = time if time is not None else DEFAULT_TIME # for LiteLLM steps
```
and downstream commands use `reasoning_time=ctx.obj.get("time")` (which is now `None` when omitted).
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: pdd/agentic_common.py
Line: 1190-1200
Comment:
**Notice fires in normal mode, not just verbose — and fires for every Codex default effort**
The comment says *"Always surface outside --quiet mode"*, but given the P1 above (default `0.25` → `"low"` always propagates), this message fires on every `pdd fix`/`pdd bug`/`pdd change`/`pdd checkup`/`pdd sync` invocation in normal mode, even when the user didn't touch `--time`. The PR description says *"logs once in verbose mode"*, but the guard is `not quiet`, so users who have never heard of `--time` will see this diagnostic in every run.
Once the `None`-propagation bug is fixed this becomes moot for the unintentional case; but the `not quiet` vs. `verbose` discrepancy with the PR description is worth aligning.
```suggestion
if reasoning_effort and verbose and not quiet:
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: pdd/agentic_common.py
Line: 784-787
Comment:
**Docstring parameter name mismatch** — the docstring here says `time:` but the parameter is `reasoning_time`. Same issue in `_run_with_provider` (line 1116). IDEs and generated docs will show the wrong name; trivial to align.
```suggestion
reasoning_time: Reasoning-allocation float in [0.0, 1.0] forwarded from the
top-level ``pdd --time`` flag. When provided, overrides the
``PDD_REASONING_EFFORT`` env var for argv injection. ``None``
means "fall back to env" so unplumbed call sites keep working.
```
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "test: accept additive reasoning_time kwa..." | Re-trigger Greptile
| quiet=obj.get("quiet", False), | ||
| timeout_adder=timeout_adder, | ||
| use_github_state=not no_github_state, | ||
| reasoning_time=obj.get("time"), |
There was a problem hiding this comment.
reasoning_time is never None for plumbed callers — default always injects low effort
ctx.obj["time"] is unconditionally set in cli.py:
ctx.obj["time"] = time if time is not None else DEFAULT_TIME # always 0.25 when --time omittedSo obj.get("time") here returns 0.25 (not None) whenever --time is absent. That 0.25 propagates to _run_with_provider, which hits the if reasoning_time is not None: branch and maps it to "low", injecting -c model_reasoning_effort=low into every Codex invocation — even when the user never passed --time.
This contradicts the PR's stated design: "When --time is omitted, the env var is left untouched." The env-var path in cli.py is correctly guarded by if time is not None:, but all five plumbed commands bypass that guard by forwarding ctx.obj.get("time") which is always a float.
The same issue applies to pdd/commands/fix.py:156, pdd/commands/modify.py:266, pdd/commands/checkup.py:113, and pdd/commands/maintenance.py:208.
A fix is to store the explicit-flag state separately and only forward a non-None value when the user actually typed --time:
# in cli.py
ctx.obj["time"] = time # keep None when --time was not provided
ctx.obj["time_default"] = time if time is not None else DEFAULT_TIME # for LiteLLM stepsand downstream commands use reasoning_time=ctx.obj.get("time") (which is now None when omitted).
Prompt To Fix With AI
This is a comment left during a code review.
Path: pdd/commands/analysis.py
Line: 247
Comment:
**`reasoning_time` is never `None` for plumbed callers — default always injects `low` effort**
`ctx.obj["time"]` is unconditionally set in `cli.py`:
```python
ctx.obj["time"] = time if time is not None else DEFAULT_TIME # always 0.25 when --time omitted
```
So `obj.get("time")` here returns `0.25` (not `None`) whenever `--time` is absent. That `0.25` propagates to `_run_with_provider`, which hits the `if reasoning_time is not None:` branch and maps it to `"low"`, injecting `-c model_reasoning_effort=low` into every Codex invocation — even when the user never passed `--time`.
This contradicts the PR's stated design: *"When `--time` is omitted, the env var is left untouched."* The env-var path in `cli.py` is correctly guarded by `if time is not None:`, but all five plumbed commands bypass that guard by forwarding `ctx.obj.get("time")` which is always a float.
The same issue applies to `pdd/commands/fix.py:156`, `pdd/commands/modify.py:266`, `pdd/commands/checkup.py:113`, and `pdd/commands/maintenance.py:208`.
A fix is to store the explicit-flag state separately and only forward a non-None value when the user actually typed `--time`:
```python
# in cli.py
ctx.obj["time"] = time # keep None when --time was not provided
ctx.obj["time_default"] = time if time is not None else DEFAULT_TIME # for LiteLLM steps
```
and downstream commands use `reasoning_time=ctx.obj.get("time")` (which is now `None` when omitted).
How can I resolve this? If you propose a fix, please make it concise.| if reasoning_effort and not quiet: | ||
| # Always surface outside --quiet mode — silently dropping the user's | ||
| # reasoning signal is a support-ticket generator. The Claude Code CLI | ||
| # has no --reasoning-effort flag today, so clarify that the effort | ||
| # applies to LiteLLM-invoked steps (analysis/verification) but not | ||
| # to this code-writing subprocess. | ||
| console.print( | ||
| f"[dim]PDD_REASONING_EFFORT={reasoning_effort} requested, but Claude Code CLI " | ||
| "has no reasoning-effort flag; applies to llm_invoke steps only, " | ||
| "not this subprocess.[/dim]" | ||
| ) |
There was a problem hiding this comment.
Notice fires in normal mode, not just verbose — and fires for every Codex default effort
The comment says "Always surface outside --quiet mode", but given the P1 above (default 0.25 → "low" always propagates), this message fires on every pdd fix/pdd bug/pdd change/pdd checkup/pdd sync invocation in normal mode, even when the user didn't touch --time. The PR description says "logs once in verbose mode", but the guard is not quiet, so users who have never heard of --time will see this diagnostic in every run.
Once the None-propagation bug is fixed this becomes moot for the unintentional case; but the not quiet vs. verbose discrepancy with the PR description is worth aligning.
| if reasoning_effort and not quiet: | |
| # Always surface outside --quiet mode — silently dropping the user's | |
| # reasoning signal is a support-ticket generator. The Claude Code CLI | |
| # has no --reasoning-effort flag today, so clarify that the effort | |
| # applies to LiteLLM-invoked steps (analysis/verification) but not | |
| # to this code-writing subprocess. | |
| console.print( | |
| f"[dim]PDD_REASONING_EFFORT={reasoning_effort} requested, but Claude Code CLI " | |
| "has no reasoning-effort flag; applies to llm_invoke steps only, " | |
| "not this subprocess.[/dim]" | |
| ) | |
| if reasoning_effort and verbose and not quiet: |
Prompt To Fix With AI
This is a comment left during a code review.
Path: pdd/agentic_common.py
Line: 1190-1200
Comment:
**Notice fires in normal mode, not just verbose — and fires for every Codex default effort**
The comment says *"Always surface outside --quiet mode"*, but given the P1 above (default `0.25` → `"low"` always propagates), this message fires on every `pdd fix`/`pdd bug`/`pdd change`/`pdd checkup`/`pdd sync` invocation in normal mode, even when the user didn't touch `--time`. The PR description says *"logs once in verbose mode"*, but the guard is `not quiet`, so users who have never heard of `--time` will see this diagnostic in every run.
Once the `None`-propagation bug is fixed this becomes moot for the unintentional case; but the `not quiet` vs. `verbose` discrepancy with the PR description is worth aligning.
```suggestion
if reasoning_effort and verbose and not quiet:
```
How can I resolve this? If you propose a fix, please make it concise.| ``PDD_REASONING_EFFORT`` env var for argv injection. ``None`` | ||
| means "fall back to env" so unplumbed call sites keep working. | ||
|
|
||
| Returns: |
There was a problem hiding this comment.
Docstring parameter name mismatch — the docstring here says
time: but the parameter is reasoning_time. Same issue in _run_with_provider (line 1116). IDEs and generated docs will show the wrong name; trivial to align.
| ``PDD_REASONING_EFFORT`` env var for argv injection. ``None`` | |
| means "fall back to env" so unplumbed call sites keep working. | |
| Returns: | |
| reasoning_time: Reasoning-allocation float in [0.0, 1.0] forwarded from the | |
| top-level ``pdd --time`` flag. When provided, overrides the | |
| ``PDD_REASONING_EFFORT`` env var for argv injection. ``None`` | |
| means "fall back to env" so unplumbed call sites keep working. |
Prompt To Fix With AI
This is a comment left during a code review.
Path: pdd/agentic_common.py
Line: 784-787
Comment:
**Docstring parameter name mismatch** — the docstring here says `time:` but the parameter is `reasoning_time`. Same issue in `_run_with_provider` (line 1116). IDEs and generated docs will show the wrong name; trivial to align.
```suggestion
reasoning_time: Reasoning-allocation float in [0.0, 1.0] forwarded from the
top-level ``pdd --time`` flag. When provided, overrides the
``PDD_REASONING_EFFORT`` env var for argv injection. ``None``
means "fall back to env" so unplumbed call sites keep working.
```
How can I resolve this? If you propose a fix, please make it concise.|
The Unit Tests failures here are pre-existing on Happy to merge as-is, or wait if you want main green first. |
… CODEX_REASONING_EFFORT precedence (xhigh) Mirror of gltanaka/pdd#1293 commit 27980dc, addressing Greg's PR review. Same fix as the private PR (gated kwarg forwarding + Codex-specific env var with xhigh support), prompt files omitted because this public repo does not carry pdd/prompts/agentic_common_python.prompt or pdd/prompts/core/cli_python.prompt — the prompt-side documentation lives only on the private repo. 13 new tests pass on this branch covering both blockers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Mirrors the CLI-side fix from
gltanaka/pdd#1293onto the publicpromptdriven/pddrepo, since the GitHub App worker image installs frompromptdriven/pdd@main.Closes the gap between the top-level
--timeflag and the agentic subprocess launches used bypdd fix <issue-url>,pdd analysis/bug <issue-url>, andpdd checkup <issue-url>.ctx.obj["time"]already routes into LiteLLM-driven steps (the analysis planner, verifier, classifier) via the CSV-drivenreasoning_effortmapping inllm_invoke.py. But the subprocess calls that actually write code (run_agentic_task→_run_with_provider→ Codex/Claude Code/Gemini CLIs) only ever forwarded--model, never a reasoning signal. A--time 0.85run produced the same provider argv as an unspecified run.What changed
pdd/core/cli.pyMaps
--time→PDD_REASONING_EFFORT={low,medium,high}using the same thresholdsllm_invoke.pyuses (>0.7 high, >0.3 medium, else low). When--timeis omitted, the env var is left untouched.pdd/agentic_common.py_run_with_providerreadsPDD_REASONING_EFFORT(validated against{low, medium, high}) and:-c model_reasoning_effort=<level>before theexecsubcommand.Tests
-cordered strictly beforeexec; Anthropic and Gemini argv unchanged when env set).CliRunnercoverage of--time→ env mapping including boundary values.Why this PR exists
The companion GitHub App PR (
promptdriven/pdd_cloud#1162) plumbspdd-effort-*labels →--timeargv position, but the worker installs the CLI frompromptdriven/pdd@main. Without this PR merged first, the worker would install a CLI where--timereaches argv but doesn't change provider reasoning behavior.Merge order: this PR first, then
pdd_cloud#1162(rebased onto main, Dockerfile stays at@main).Test plan
pdd_cloud#1162lands on top.🤖 Generated with Claude Code