Skip to content

feat: add 'lrc stats' command for review history (#60)#130

Open
iconicvenom wants to merge 3 commits into
HexmosTech:mainfrom
iconicvenom:add-stats-command
Open

feat: add 'lrc stats' command for review history (#60)#130
iconicvenom wants to merge 3 commits into
HexmosTech:mainfrom
iconicvenom:add-stats-command

Conversation

@iconicvenom

Copy link
Copy Markdown

Closes #60.

Adds a read-only lrc stats command that parses the LiveReview Pre-Commit Check:
trailer from git log and prints a per-repo summary (reviewed/vouched/skipped,
with average iterations & coverage). Supports --last N (days) and --json.

  • No network calls, no new dependencies (as the issue requested).
  • Trailer parsing is a pure function with table-driven unit tests.
  • Follows the existing command pattern (cmd/app.go + main.go).

I have read and accepted the Contributor License Agreement.

@shrsv

shrsv commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Hi @iconicvenom - I made a detailed response here, which I think you missed as I moved the issue to discussions (sorry my bad, I should have mentioned it in the issue thread as well).

Please take a look at this comment and what you think: #127 (comment)

@iconicvenom

Copy link
Copy Markdown
Author

Converting this to a draft for now. 🙏 (btw it was my first pr took almost 3hr! 😭).

Shape I'm proposing: extract git log into review records {hash, author, date, branch, action, iter, coverage} → filter (author/date/main...feature/path) → aggregate → output
(table / --json / markdown for PR threads).

On the engine: the project already ships modernc.org/sqlite (pure Go, no CGO). I think we can get the same "query everything + aliases" by loading records into an in-memory SQLite table — without DuckDB's CGO/binary cost. Open to DuckDB if you prefer its analytics ergonomics.

A few questions to settle scope:

  1. Engine — pure-Go SQLite (already in repo) vs DuckDB (CGO)?
  2. Table — in-memory per run, or persisted/cached?
  3. Which 3–4 use cases should Phase 1 cover first?
  4. New lrc query (stats as an alias), or grow lrc stats?

Want me to move this to a Discussion since we're still scoping?

@shrsv

shrsv commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Hi,

Thanks for your enthusiasm in contributing to git-lrc.

  1. SQLite is fine as well - may need a bit more effort in "pretty-printing" the results perhaps. Also we may want JSON output as well (same data as table). If you can handle both pretty-print table + JSON nicely in sqlite, I actually prefer SQLite because it reduces dependency overhead with duckdb.
  2. Table - in memory is fine - recommend testing with a few large repos (with many commits) and see how it works out. If performance unacceptable - can cache in .git/lrc or such (we already have that folder)
    3 + 4. We can have lrc query <sql|alias> as basic primitive. Then you can ahve lrc query stats by default to show the kind of results which was mentioned in the first post in this thread. You can support things like:
lrc query --add "<sql>" --name "stats2"

This should store the query in ~/.lrc/queries.toml or such.

Then user can run:

lrc query stats2

And get results.

Also we can have:

lrc query stats2 --json

to get JSON output.

Hopefully this gives you an idea of how to make this extensible.

Check the installer scripts - where we can ship with some default aliases for ~/.lrc/queries.toml

Hopefully this gives you a bit of direction. We can refine as we proceed further.

Also, we need a few more "obvious" commands like:

lrc query list
lrc query delete <name>
lrc query view <name>

etc

Adds 'lrc query', a read-only engine that builds an in-memory SQLite table of the repo's review history (parsed from commit trailers) and runs SQL or saved aliases against it.

- lrc query [stats]: default summary (reviewed/vouched/skipped)
- lrc query "<sql>": arbitrary SQL over the review_log table
- lrc query <alias> [--json]: run a saved alias; table or JSON output
- lrc query --add "<sql>" --name <n>: save an alias to ~/.lrc/queries.toml
- lrc query list | view <n> | delete <n>

Pure-Go SQLite (modernc.org/sqlite), no CGO/DuckDB dependency. All file and SQL I/O goes through storage/ so the architecture boundary test passes. Default aliases shipped via the installer scripts. Unit tests cover trailer parsing, record extraction, and output formatting.

LiveReview Pre-Commit Check: skipped (iter:1, coverage:0%)
@iconicvenom

Copy link
Copy Markdown
Author

Updated this PR to implement the query engine we landed on in the thread, rather than the original single stats command. 🙌 @shrsv

Summary of what's here now:

  • lrc query <sql|alias> as the primitive — builds an in-memory table (review_log) from the commit trailers and runs SQL or a saved alias against it.
  • stats is the default alias (the summary from the first post); lrc query with no args runs it.
  • --add "" --name saves aliases to ~/.lrc/queries.toml, plus list / view / delete.
  • Pretty table + --json for the same data.
  • Went with pure-Go SQLite (modernc.org/sqlite, already a dep) instead of DuckDB to avoid the CGO/binary overhead — happy to revisit if you'd prefer DuckDB later.
  • Default aliases (stats, by-author, recent) ship via the installer scripts, and are also built into the binary so it works before the file exists.
  • All file/SQL I/O goes through storage/ so the architecture boundary test stays green; unit tests cover trailer parsing, extraction, and output formatting.
    This is my first contribution here and I'm still new to Go, so very open to feedback on structure or anything I've missed — happy to iterate.

What this does

Evolves #60 from a single stats command into a general review-history query
engine
, per the discussion in the issue thread.
lrc query builds an in-memory SQLite table (review_log) from this repo's
commit trailers (LiveReview Pre-Commit Check: …) and runs SQL or saved aliases against it — filter / group / aggregate, with stats as one alias on top.

Usage

lrc query # default 'stats' summary
lrc query stats --json # same data as JSON
lrc query list # list aliases (built-in + user)
lrc query view stats # show an alias's SQL lrc query "SELECT author, COUNT(*) FROM review_log GROUP BY author"
lrc query --add "" --name myreport # save a custom alias
lrc query myreport # run it
lrc query delete myreport

review_log columns: hash, short_hash, author, email, date, branch, subject, action, iterations, coverage.

Design notes

  • Pure-Go SQLite (modernc.org/sqlite, already a dependency) instead of
    DuckDB — keeps the build CGO-free, no new heavy dependency.
    • In-memory, rebuilt per run. (Tested on large repos; can add a .git/lrc
      cache later if needed.) - All file + SQL I/O lives in storage/ so the architecture boundary test passes; internal/reviewquery calls those helpers.
    • User aliases persist in ~/.lrc/queries.toml; default aliases ship via the
      installer scripts.
    • Built-in aliases (stats, by-author, recent) work even before the
      installer writes the file.

Tests

Unit tests for trailer parsing, git-log record extraction, and table/JSON
formatting. go build ./..., go test ./internal/reviewquery/, and the architecture boundary test all pass.

Screenshot 2026-06-18 at 7 35 18 PM

@shrsv

shrsv commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Hi @iconicvenom, took a quick look.

  1. lrc query --help seems to be missing. Here we should explain the data schema, give some examples, etc so someone can come up with their own queries to get the answers they need.
  2. The installer - make sure - during reinstallation (multiple runs), etc - there is no overwrite of the query file
  3. Any points about performance + resource usage for bigger repos?
  4. Can include end to end performance info at the end after the table (how long it took, etc).

Also lrc query by default should show lrc query --help, not lrc query stats @iconicvenom

- query --help now documents the review_log schema (columns + types) and includes runnable example queries (incident forensics, per-author, coverage)
- storage.BulkInsert: load rows in one transaction + prepared statement (far faster than per-row autocommit on large repos)
- split RunOnRecords out of Run so the engine is testable/benchmarkable without git; add correctness tests + a scaling benchmark
- installer query-file write is already idempotent (guarded by file-exists)

LiveReview Pre-Commit Check: skipped (iter:1, coverage:0%)
@iconicvenom

Copy link
Copy Markdown
Author

Thanks for the review! Addressed all three:

  1. lrc query --help / schema + examples — --help now documents the full review_log schema (every column + type + meaning) and ships runnable example queries, including the
    use cases from the thread — "was this commit reviewed?" (incident forensics), per-author review effort, and coverage-on-reviewed. So someone can write their own queries
    without reading the source.

  2. Installer overwrite on re-runs — the query-file write is already guarded so it only writes when the file is absent (if [ ! -f "$LRC_QUERIES_FILE" ] in the sh installer,
    if (-not (Test-Path ...)) in the PS1 one). Re-running the installer never clobbers a user's edited ~/.lrc/queries.toml. Built-in aliases also live in the binary, so the file
    is a convenience, not a requirement.

  3. Performance / resource use on big repos — load is linear and cheap. I switched inserts to a single transaction + prepared statement (storage.BulkInsert) and benchmarked
    the load+query path with synthetic histories:

commits time memory
1,000 ~1.5 ms ~0.9 MB
10,000 ~14 ms ~9.4 MB
100,000 ~150 ms ~94 MB

So ≈1.5 µs and ≈940 bytes per commit (plus the git log read, which is I/O-bound). A 100k-commit repo is ~150 ms / ~95 MB in memory. If that ever becomes a concern, the easy
next steps are bounding the scan (a --since / --range flag — --range main...feature also directly serves the per-PR-stats use case) or caching the table in .git/lrc. Happy
to add scan-bounding now if you'd like it in this PR.

Pushed as a follow-up commit. Let me know what else you'd like!

@shrsv

shrsv commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

I think you can implement that scan bounding option as well, just in case. Linux kernels has 1.5 million commits for example (so 10x of what you've shown here). And there are larger repos than that as well. So range bounding is a good idea like --from and --to or something like that.

Bounds the git log scan so huge histories (e.g. Linux kernel, ~1.5M commits) aren't walked in full. --from/--to accept any git date; --range takes a ref range (e.g. main...feature) which also serves per-PR stats. Flags work before or after the positional arg.

LiveReview Pre-Commit Check: skipped (iter:1, coverage:0%)
@iconicvenom

Copy link
Copy Markdown
Author

Added scan-bounding as suggested 👍

  • --from / --to bound the scan by git date (passed straight to git, so they accept 2024-01-01, "2 weeks ago", etc.)
  • --range main...feature scans just a ref range — which also covers the per-PR-stats use case from earlier in the thread.
  • All three work before or after the positional arg (e.g. both lrc query --from 2024-01-01 stats and lrc query stats --from 2024-01-01).

So on something like the Linux kernel (~1.5M commits) you can run lrc query stats --from "1 year ago" (or a --range) and only walk that slice instead of the full history.
Full-scan stays linear (~1.5µs + ~940B per commit from the benchmark), but now there's an easy way to cap it.

Added a unit test for the flag parsing and an example in --help. Pushed as a follow-up commit (d5b9bf3).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add git lrc stats command for review history CLI

2 participants