Skip to content

LLM-based optimizations for BatchCompleter#1293

Merged
brandur merged 1 commit into
masterfrom
brandur-completion-optimizations
Jun 29, 2026
Merged

LLM-based optimizations for BatchCompleter#1293
brandur merged 1 commit into
masterfrom
brandur-completion-optimizations

Conversation

@brandur

@brandur brandur commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

This one's just throwing Codex at BatchCompleter to see what
optimizations it can find in there as this seems to be by far our
slowest spot in River with respect to benchmarking at least.

It didn't do anything huge, but ended up putting in a variety of minor
optimizations:

  • Removed the separate setStateStartTimes map by storing StartTime
    in batchCompleterSetState.

  • Changed queued state storage from pointer values to map values,
    removing a wrapper allocation.

  • Collapsed backlog checking + enqueue into one lock acquisition on the
    common path.

  • Replaced sliceutil.Map with a direct event-building loop.

  • Use one batch completion timestamp instead of calling Now() once per
    completed row.

  • Avoid repeatedly assigning params.Schema inside the batch mapping
    loop.

Benchmarks seem to indicate a previous 12-13 us/op coming down to 10-11
us/op, and with one fewer allocation per op.

I'm seeing a definite speedup when I run the full benchmark. Our bench
has some consistency problems (can show considerably different results
per run), but whereas before ~46k jobs/sec was about the best I ever saw
on my commodity MBP here, I'm now seeing up to ~56k jobs/sec, so at best
a 10k jobs/sec improvement:

$ go run ./cmd/river bench --database-url $DATABASE_URL --num-total-jobs 1_000_000
bench: jobs worked [          0 ], inserted [    1000000 ], job/sec [        0.0 ] [0s]
bench: jobs worked [     106472 ], inserted [          0 ], job/sec [    53236.0 ] [2s]
bench: jobs worked [     108440 ], inserted [          0 ], job/sec [    54220.0 ] [2s]
bench: jobs worked [     114035 ], inserted [          0 ], job/sec [    57017.5 ] [2s]
bench: jobs worked [     107402 ], inserted [          0 ], job/sec [    53701.0 ] [2s]
bench: jobs worked [     114433 ], inserted [          0 ], job/sec [    57216.5 ] [2s]
bench: jobs worked [     105701 ], inserted [          0 ], job/sec [    52850.5 ] [2s]
bench: jobs worked [     116051 ], inserted [          0 ], job/sec [    58025.5 ] [2s]
bench: jobs worked [     108054 ], inserted [          0 ], job/sec [    54027.0 ] [2s]
bench: jobs worked [     119412 ], inserted [          0 ], job/sec [    59706.0 ] [2s]
bench: total jobs worked [    1000000 ], total jobs inserted [    1000000 ], overall job/sec [    55710.8 ], running 17.949838958s

The number should be even better on faster computers.

Nothing in the code gets any worse (and I think some of it is actually
an improvement?) so I think it's probably worthwhile to bring these in.

@brandur brandur force-pushed the brandur-smoother-completion branch from 168f657 to d177425 Compare June 20, 2026 16:07
@brandur brandur force-pushed the brandur-completion-optimizations branch 2 times, most recently from 1e5df34 to 4a982be Compare June 20, 2026 16:19
@brandur brandur force-pushed the brandur-smoother-completion branch 2 times, most recently from fb7ef2c to 4acc752 Compare June 20, 2026 19:17
@brandur brandur force-pushed the brandur-completion-optimizations branch from 4a982be to d165299 Compare June 20, 2026 19:48
@brandur brandur requested a review from bgentry June 20, 2026 19:58
@brandur brandur force-pushed the brandur-smoother-completion branch from 4acc752 to 2bb559c Compare June 20, 2026 20:01
@brandur brandur force-pushed the brandur-completion-optimizations branch from d165299 to 9b268fa Compare June 20, 2026 20:03

@bgentry bgentry left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Just one minor code quality concern from me but overall a nice batch of improvements, good stuff 🙌

Comment on lines +557 to +580
for {
// Keep the common enqueue path to one lock acquisition. If the
// completer is behind, wait for the current backlog gate to open and
// retry so the threshold is checked against fresh state.
c.setStateParamsMu.Lock()
if c.waitOnBacklogWaiting {
waitChan := c.waitOnBacklogChan
c.setStateParamsMu.Unlock()
<-waitChan
continue
}

statsSnapshot := *stats
waitAt := c.backlogWaitThresholdEffective()
backlogSize = len(c.setStateParams)
if backlogSize >= waitAt {
c.initBacklogWaitLocked(ctx, backlogSize, waitAt)
}

c.setStateParams[params.ID] = &batchCompleterSetState{params, &statsSnapshot}
c.setStateStartTimes[params.ID] = now
backlogSize := len(c.setStateParams)
c.setStateParamsMu.Unlock()
statsSnapshot := *stats
c.setStateParams[params.ID] = batchCompleterSetState{Params: params, StartTime: now, Stats: &statsSnapshot}
backlogSize = len(c.setStateParams)
c.setStateParamsMu.Unlock()
break
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section feels a little sketchier to me in terms of mutex handling fwiw, might be worth asking if a refactor could clean it up and make it easier to read.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. I had Codex refactor it into JobSetStateIfRunning plus a helper function and it's much cleaner now. All mutex handling happens with a single lock and a deferred unlock.

Base automatically changed from brandur-smoother-completion to master June 29, 2026 06:04
@brandur brandur force-pushed the brandur-completion-optimizations branch from 9b268fa to 5dbdca4 Compare June 29, 2026 06:04
This one's just throwing Codex at `BatchCompleter` to see what
optimizations it can find in there as this seems to be by far our
slowest spot in River with respect to benchmarking at least.

It didn't do anything huge, but ended up putting in a variety of minor
optimizations:

- Removed the separate `setStateStartTimes` map by storing `StartTime`
  in `batchCompleterSetState`.

- Changed queued state storage from pointer values to map values,
  removing a wrapper allocation.

- Collapsed backlog checking + enqueue into one lock acquisition on the
  common path.

- Replaced `sliceutil.Map` with a direct event-building loop.

- Use one batch completion timestamp instead of calling `Now()` once per
  completed row.

- Avoid repeatedly assigning `params.Schema` inside the batch mapping
  loop.

Benchmarks seem to indicate a previous 12-13 us/op coming down to 10-11
us/op, and with one fewer allocation per op.

I'm seeing a definite speedup when I run the full benchmark. Our bench
has some consistency problems (can show considerably different results
per run), but whereas before ~46k jobs/sec was about the best I ever saw
on my commodity MBP here, I'm now seeing up to ~56k jobs/sec, so at best
a 10k jobs/sec improvement:

    $ go run ./cmd/river bench --database-url $DATABASE_URL --num-total-jobs 1_000_000
    bench: jobs worked [          0 ], inserted [    1000000 ], job/sec [        0.0 ] [0s]
    bench: jobs worked [     106472 ], inserted [          0 ], job/sec [    53236.0 ] [2s]
    bench: jobs worked [     108440 ], inserted [          0 ], job/sec [    54220.0 ] [2s]
    bench: jobs worked [     114035 ], inserted [          0 ], job/sec [    57017.5 ] [2s]
    bench: jobs worked [     107402 ], inserted [          0 ], job/sec [    53701.0 ] [2s]
    bench: jobs worked [     114433 ], inserted [          0 ], job/sec [    57216.5 ] [2s]
    bench: jobs worked [     105701 ], inserted [          0 ], job/sec [    52850.5 ] [2s]
    bench: jobs worked [     116051 ], inserted [          0 ], job/sec [    58025.5 ] [2s]
    bench: jobs worked [     108054 ], inserted [          0 ], job/sec [    54027.0 ] [2s]
    bench: jobs worked [     119412 ], inserted [          0 ], job/sec [    59706.0 ] [2s]
    bench: total jobs worked [    1000000 ], total jobs inserted [    1000000 ], overall job/sec [    55710.8 ], running 17.949838958s

The number should be even better on faster computers.

Nothing in the code gets any worse (and I think some of it is actually
an improvement?) so I think it's probably worthwhile to bring these in.
@brandur brandur force-pushed the brandur-completion-optimizations branch from 5dbdca4 to 6bb88ad Compare June 29, 2026 19:30
@brandur brandur merged commit 944f077 into master Jun 29, 2026
15 checks passed
@brandur brandur deleted the brandur-completion-optimizations branch June 29, 2026 19:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants