Skip to content

Process shake restarts asynchronously#4879

Open
crtschin wants to merge 11 commits into
haskell:masterfrom
crtschin:crtschin/deduplicate-shake-restarts
Open

Process shake restarts asynchronously#4879
crtschin wants to merge 11 commits into
haskell:masterfrom
crtschin:crtschin/deduplicate-shake-restarts

Conversation

@crtschin
Copy link
Copy Markdown
Collaborator

@crtschin crtschin commented Mar 31, 2026

Would close #4725.

Implements the ideas from #4725 (comment). Makes the following changes:

  1. Processes notifications asynchronously keeping with the logic that notifications preceding any request, also finish before that request is processed. But does so asynchronously.
  2. The above opens up the possibility that there could be multiple shake restarts in-flight because multiple notifications arriving in quick succession. Instead of storing these restarts in a queue, accumulate all restart changes in a single slot, to be processed at once.

@crtschin crtschin requested review from fendor and wz1000 as code owners March 31, 2026 19:53
@crtschin crtschin marked this pull request as draft March 31, 2026 19:56
@crtschin crtschin force-pushed the crtschin/deduplicate-shake-restarts branch 4 times, most recently from cc251cd to 88e0f88 Compare April 1, 2026 23:22
@fendor fendor requested a review from soulomoon April 7, 2026 09:37
@crtschin crtschin force-pushed the crtschin/deduplicate-shake-restarts branch from 0a7302b to 4090ec3 Compare April 9, 2026 17:27
@crtschin
Copy link
Copy Markdown
Collaborator Author

crtschin commented Apr 9, 2026

I can't get tests to pass with notifications fully async. Kinda makes sense looking at what these notifications indicate. I thought I could get away with only LSP handling everything synchronously of the VFS-side, but it appears not.

I'm now trying out a small variant where notifications are still synchronously, but that the session restarts are initiated async, while still blocking subsequent LSP requests. So restarts still get squashed.

@crtschin crtschin force-pushed the crtschin/deduplicate-shake-restarts branch 2 times, most recently from 42b60e4 to 69e0d70 Compare April 9, 2026 22:30
@crtschin crtschin force-pushed the crtschin/deduplicate-shake-restarts branch 5 times, most recently from 1dfd24e to 5978cad Compare May 2, 2026 22:54
@soulomoon soulomoon added the performance Issues about memory consumption, responsiveness, etc. label May 3, 2026
Copy link
Copy Markdown
Collaborator

@soulomoon soulomoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good work on the restart squash.
But if notifications are still handled synchronously, #4725 won't be close since consective typing emits consective notifications and their restarts won't be able to merge

sessionRestartTQueue <- withWorkerQueueSimple (cmapWithPrio Session.LogSessionWorkerThread recorder) "RestartTQueue"
sessionRestartTQueue <- liftIO $ newRestartSlot
sessionLoaderTQueue <- withWorkerQueueSimple (cmapWithPrio Session.LogSessionWorkerThread recorder) "SessionLoaderTQueue"
ContT $ \action -> withRestartWorker ideMVar $ action ()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to extend the logic of workerThread module so it can be reused here.

there is a slight chance that downstream code would swallow the async exception and hang the shutdown. workerThread handled that by setting up a shutdown flag .

@crtschin
Copy link
Copy Markdown
Collaborator Author

crtschin commented May 4, 2026

But if notifications are still handled synchronously, #4725 won't be close since consective typing emits consective notifications and their restarts won't be able to merge

Thanks for taking a peak at the PR! The idea I'm going for now here is to still have notifications be synchronous, but that they don't wait for the shake restart to finish, unlike now. So multiple subsequent edits will issue multiple synchronous notifications which will all start asynchronous shake restarts. This allows the restarts to merge.

Note that this is still me experimenting, I'm not sure if I'm breaking preconditions. This is all quite nuancy.

@soulomoon
Copy link
Copy Markdown
Collaborator

soulomoon commented May 4, 2026

So multiple subsequent edits will issue multiple synchronous notifications which will all start asynchronous shake restarts. This allows the restarts to merge.

good idea

they don't wait for the shake restart to finish

I’ve experienced this before: if we don’t wait for Shake to fully restart, requests may observe stale state. I think we should ensure any pending restarts have completed before processing new requests but keep the notifications side of the story as you suggested, It might work.

Also it might be a good idea to go through the occurences of mkPluginNotificationHandler and find out if bug would occur.

----- update ----

Also we should move the "updatePositionMapping" in the notification handler into restart so every restart(merged or not) would see consistent mapping state ?

@crtschin crtschin changed the title Process notifications asynchronously Process shake restarts asynchronously May 4, 2026
@crtschin crtschin changed the title Process shake restarts asynchronously WIP: Process shake restarts asynchronously May 4, 2026
@crtschin crtschin force-pushed the crtschin/deduplicate-shake-restarts branch from 7e4c4c7 to 224e74f Compare May 13, 2026 20:38
@crtschin
Copy link
Copy Markdown
Collaborator Author

crtschin commented May 15, 2026

Alright, report time. Tests now pass. It now runs the notification handlers synchronously but any shake restart is done asynchronously through a worker thread. Some notes.

  • I refactored WorkerThread so I could swap out the queue for a single slot that merges elements.
  • While shake restarts are asynchronously, most notifications have side-effects they want to have happen inbetween triggering shake restarts, e.g. changing the set of FOI. Tripped me up quite a bit what to do with this in the context of merging shake restarts, the current version runs these prior to even submitting the shake restart.
  • The eval plugin contains a request that triggers a shake restart and subsequently waited on it. As the shake restart is now async, I had to add a blocking wait on the shake restart finishing.
  • Other variations I tried but didn't pass tests:
    • Running notifications async.
    • Running notifications sync, with async shake restarts, but accumulating and calling the notification actions in between shake restarts.

I ran the benchmarks in #4934, with smaller (very small) typing delays, which is optimal for observing the effect of this PR, but not super realistic. I couldn't observe a difference with the default delay of 0.25s.

Running the typing burst benchmarks with very small typing delays Comparison against upstream
Example, Version, Configuration, name, success, samples, startup, setup, userT, delayedT, 1stBuildT, avgPerRespT, totalT, rulesBuilt, rulesChanged, rulesVisited, rulesTotal, ruleEdges, ghcRebuilds, maxResidency, allocatedBytes
cabal, upstream, All, hover after typing burst, True, 100, 4.07, 0.00, 6.49, 14.59, 0.55, 0.03, 21.09, 18, 15, 1164, 4042, 19942, 134, 286MB, 90219MB
cabal, HEAD, All, hover after typing burst, True, 100, 3.99, 0.00, 5.95, 14.93, 0.67, 0.03, 20.89, 18, 15, 1181, 4041, 19942, 114, 277MB, 88055MB
cabal, upstream, All, getDefinition after typing burst, True, 100, 0.66, 0.00, 7.51, 14.49, 0.64, 0.03, 22.02, 17, 14, 1183, 4045, 19949, 137, 271MB, 85329MB
cabal, HEAD, All, getDefinition after typing burst, True, 100, 0.57, 0.00, 6.10, 12.72, 0.75, 0.03, 18.83, 18, 15, 1181, 4043, 19949, 119, 279MB, 79739MB
cabal, upstream, All, completions after typing burst, True, 100, 0.67, 0.00, 9.62, 12.36, 0.76, 0.04, 21.99, 19, 16, 1182, 4043, 19941, 127, 348MB, 90711MB
cabal, HEAD, All, completions after typing burst, True, 100, 0.56, 0.00, 8.96, 11.11, 0.88, 0.04, 20.09, 19, 16, 1182, 4042, 19941, 117, 317MB, 88975MB

@crtschin crtschin changed the title WIP: Process shake restarts asynchronously Process shake restarts asynchronously May 15, 2026
@crtschin crtschin marked this pull request as ready for review May 15, 2026 23:14
@crtschin crtschin requested a review from soulomoon May 15, 2026 23:19
@soulomoon
Copy link
Copy Markdown
Collaborator

soulomoon commented May 15, 2026

any shake restart is done asynchronously through a worker thread

@crtschin consider this, notification A then request B. How do we ensure B run after A's session restart in this PR ?

@crtschin
Copy link
Copy Markdown
Collaborator Author

@crtschin consider this, notification A then request B. How do we ensure B run after A's session restart in this PR ?

Good question, I'll walk you through an example timeline.

  1. Notification A adds a lock identifying itself here.
  2. Notification A gets executed.
  3. Notification A queues up a shakeRestart via one of the set*Modified helpers.
  4. Notification A spawns a thread waiting on the restart being finished and fills the lock it created in step (1) when the shake restart is finished.
  5. Notification A finishes
  6. Request B starts being processed asynchronously
  7. Request B waits on all notification locks that were submitted prior.
  8. Worker thread finishes the shake restart submitted by notification A.
  9. Request B proceeds and gets handled.

Critically it's the locks that are added by notifications, which are waited on by follow-up requests, that ensures the ordering.

Copy link
Copy Markdown
Collaborator

@soulomoon soulomoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@crtschin Good work and thanks for the explaination.
There is a problem about ioActionBetweenShakeSession.

-- old session and @GetLinkable@ errors.
-- See Note [Notification vs request restart ordering]
(modifyForEvaluate queueForEvaluation >> waitForLastRestart st)
(modifyForEvaluate unqueueForEvaluation)
Copy link
Copy Markdown
Collaborator

@soulomoon soulomoon May 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we not need to waitForLastRestart for unqueueForEvaluation as well ?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good one, I was thinking waiting was only needed for the session setup, but it makes sense for the eval itself as well.

-- do need to occur here, the keys they invalidate need to propagate to the
-- worker so it can be used during the concrete restart.
-- See Note [Housekeeping rule cache and dirty key outside of hls-graph].
!newDirty <- fromListKeySet <$> ioActionBetweenShakeSession
Copy link
Copy Markdown
Collaborator

@soulomoon soulomoon May 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ioActionBetweenShakeSession should be run after all the shake rules that rely on these states fully stop.
If it happpens here, there is chance that the newer state being picked up by an old step rule run and result in inconsistancy of the shake database. It is used to be an old bug stay in hls for years until it was fixed.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I tried this, but some tests failed. I'll try this again, as some of the failures may have been eval-related.

Moves in-between shake actions to be accumulated and ran on the worker
thread again, instead of prior to restart submission.
@crtschin crtschin force-pushed the crtschin/deduplicate-shake-restarts branch from e2a5d19 to fad54eb Compare May 17, 2026 19:59
@crtschin crtschin force-pushed the crtschin/deduplicate-shake-restarts branch from ca33b02 to 26d319d Compare May 17, 2026 20:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Issues about memory consumption, responsiveness, etc.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Reduce lag when editing by merging sequential Shake restarts

2 participants