feat: split localpi token status rates by osolmaz · Pull Request #14 · dutifuldev/localpi

osolmaz · 2026-06-26T05:04:35Z

Summary

Localpi was showing one token speed number for a whole turn.
That mixed prompt processing time with output generation time.
This change makes the Pi token status extension report generation speed separately, and adds a final prefill rate when usage data is available.
It uses the first streamed assistant output as the phase boundary, because that is the only boundary Pi exposes to this extension.

What Changed

The token status extension now tracks when assistant output first appears.
Before that point, it treats the turn as prefill/time-to-first-output; after that point, it reports generation speed using output tokens over generation elapsed time.

Added firstOutputAt to per-turn token status state.
Changed live status from one whole-turn tok/s value to gen ... tok/s after output begins.
Added final prefill ... tok/s when usage input/cache data is available.
Kept existing output, input, cache, elapsed, and context status fields.
Updated the runtime spec and extension regression test.
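
A minimal sketch of the phase split described above, not the extension's actual code. Only firstOutputAt comes from this PR; the TurnState shape, the function names, the startedAt field, and the exact status strings are assumptions made for illustration.

```typescript
// Hypothetical per-turn state; only firstOutputAt is named in this PR.
interface TurnState {
  startedAt: number; // ms timestamp when the turn began (assumed field)
  firstOutputAt?: number; // ms timestamp of the first streamed assistant output
}

// Record the phase boundary once, on the first streamed output event.
function markFirstOutput(state: TurnState, now: number): void {
  if (state.firstOutputAt === undefined) {
    state.firstOutputAt = now;
  }
}

// Live status: elapsed wait before the boundary, generation rate after it.
// The string formats here are illustrative, not the extension's real output.
function liveStatus(state: TurnState, outputTokens: number, now: number): string {
  if (state.firstOutputAt === undefined) {
    const waitedSec = (now - state.startedAt) / 1000;
    return `waiting ${waitedSec.toFixed(1)}s`;
  }
  const genSec = (now - state.firstOutputAt) / 1000;
  if (genSec <= 0) return "gen - tok/s"; // no elapsed time yet, so no rate
  return `gen ${(outputTokens / genSec).toFixed(1)} tok/s`;
}
```

The point of the split is that the denominator for the generation rate starts at first output rather than at turn start, so a long prompt no longer drags the reported tok/s down.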

Testing

The changed extension source transpiles, the focused extension tests pass, and the TypeScript project builds.
The full local check fails on this machine because an unrelated runtime test discovers live local model servers.

npm run format passed.
npm run typecheck passed.
npm test -- tests/extensions.test.ts tests/extension-source.test.ts passed.
npm run build passed.
npm run check failed only in tests/runtime.test.ts > runtime resolution > selects profile aliases for providers with discovery disabled; the failure shows live LM Studio/vLLM models from 127.0.0.1:1234 and 127.0.0.1:8000 being included in catalog data.

Risks

This is a display-only change in the generated Pi extension.
The main limitation is that Pi does not expose DS4's internal prompt-sync boundary here, so localpi uses first streamed output as the observable boundary.

If a provider does not stream message_update events, final generation timing falls back to whole-turn elapsed time.
Prefill speed is only shown when final usage data includes input token counts.
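
The fallback for non-streaming providers could look roughly like the sketch below. The TurnTiming shape and function names are hypothetical; the only behavior taken from this PR is that generation timing uses whole-turn elapsed time when no first-output timestamp was recorded.

```typescript
// Hypothetical timing record; only firstOutputAt is named in this PR.
interface TurnTiming {
  startedAt: number; // ms timestamp when the turn began (assumed field)
  firstOutputAt?: number; // unset if the provider never streamed output
  endedAt: number; // ms timestamp when the turn finished (assumed field)
}

// Generation elapsed time, falling back to the whole turn when there was
// no streamed output to mark the phase boundary.
function generationElapsedMs(t: TurnTiming): number {
  const from = t.firstOutputAt ?? t.startedAt;
  return Math.max(0, t.endedAt - from);
}

// Final generation rate in tokens per second, or undefined if no time elapsed.
function finalGenRate(t: TurnTiming, outputTokens: number): number | undefined {
  const sec = generationElapsedMs(t) / 1000;
  return sec > 0 ? outputTokens / sec : undefined;
}
```

With this fallback, a non-streaming provider reports the same whole-turn rate as before the change, so the display degrades to the old behavior rather than showing nothing.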

osolmaz · 2026-06-26T05:16:20Z

Final report:

Implemented the localpi token status phase split and pushed the branch.
The extension now reports live generation speed separately from prefill/time-to-first-output and reports final prefill rate from the uncached input buckets when usage data is available.
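
As a rough illustration of a prefill rate computed from uncached input buckets: the UsageBuckets field names and the choice of which buckets count as uncached (fresh input plus cache writes, excluding cache reads) are assumptions for this sketch, not details confirmed by this PR.

```typescript
// Hypothetical usage shape; these field names are assumptions.
interface UsageBuckets {
  input?: number; // fresh input tokens
  cacheRead?: number; // tokens served from cache (assumed to be excluded)
  cacheWrite?: number; // tokens processed and written to cache
}

// Tokens assumed to need real prefill work. Returns undefined when the
// usage data carries no input token counts at all.
function uncachedPrefillTokens(u: UsageBuckets): number | undefined {
  if (u.input === undefined && u.cacheWrite === undefined) return undefined;
  return (u.input ?? 0) + (u.cacheWrite ?? 0);
}

// Prefill rate in tokens per second, measured from turn start to first
// streamed output. Undefined when the inputs do not support a rate.
function prefillRate(
  u: UsageBuckets,
  startedAt: number,
  firstOutputAt?: number,
): number | undefined {
  const tokens = uncachedPrefillTokens(u);
  if (tokens === undefined || tokens <= 0 || firstOutputAt === undefined) {
    return undefined;
  }
  const sec = (firstOutputAt - startedAt) / 1000;
  return sec > 0 ? tokens / sec : undefined;
}
```

Excluding cache reads matters because counting them would inflate the rate on turns where most of the prompt was already cached.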

Validation:

npm run format passed.
npm run lint passed locally during npm run check before the unrelated runtime-test failure, and passed in Codex review.
npm run typecheck passed.
npm test -- tests/extensions.test.ts tests/extension-source.test.ts passed.
npm run build passed.
npm run check locally reached the full test suite but failed in tests/runtime.test.ts > runtime resolution > selects profile aliases for providers with discovery disabled because this machine has live local model servers on 127.0.0.1:1234 and 127.0.0.1:8000 contaminating catalog data.
codex review --base main first found a valid cache bucket issue; that was fixed in f0617d5.
codex review --base main after the fix found no actionable correctness issues.
GitHub CI ci / test passed.

PR is ready for human review/merge.

osolmaz added 2 commits June 26, 2026 13:03

93b5ccf feat: split localpi token status rates
f0617d5 fix: count uncached prefill token buckets

osolmaz wants to merge 2 commits into main from feat/token-status-phase-rates
