fix(core): defer MLS epoch recovery until notification replay finishes [WPB-21916] by thisisamir98 · Pull Request #21390 · wireapp/wire-webapp

thisisamir98 · 2026-05-27T09:30:57Z

WPB-21916 [Web/Desktop] : Missing messages in Web - Prod

Problem
After login, the client replays a large backlog of legacy notifications while local MLS state may already be ahead, behind, or incomplete. When decrypting an MLS message-add failed with WrongEpoch, the recovery orchestrator ran an external-commit rejoin immediately. That could happen in the middle of replay, before later notifications had a chance to apply missing commits.

This led to unnecessary MLSConversationRecovered system messages and made replay noisier. Pausing the rejoin queue did not help, because orchestrator recovery bypassed that queue entirely.

Approach

- Defer epoch-mismatch recovery for inbound handleMessageAdd while the connection is not LIVE. The failed message returns null and replay continues.
- After catch-up finishes, runDeferredEpochRecovery() reconciles any conversations that still need external commit (once per conversation).
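
The two steps above can be sketched roughly as follows. This is a simplified model, not the code in this PR: only runDeferredEpochRecovery is a name from the description, while the class, its fields, the `Recover` callback and the string conversation key are hypothetical. The sketch also assumes the callback itself decides whether a conversation still needs an external commit.

```typescript
// Simplified model of the deferral. Only runDeferredEpochRecovery is a name
// from the PR; the class, its fields and the callback type are hypothetical.
type Recover = (conversationKey: string) => Promise<void>;

class DeferredEpochRecovery {
  // A Set holds at most one pending recovery per conversation.
  private readonly pending = new Set<string>();
  private isLive = false;
  private readonly recover: Recover;

  constructor(recover: Recover) {
    this.recover = recover;
  }

  // Called when decrypting an inbound message fails with an epoch mismatch.
  async onEpochMismatch(conversationKey: string): Promise<'deferred' | 'recovered'> {
    if (!this.isLive) {
      // Replay is still running; a later notification may carry the missing commit.
      this.pending.add(conversationKey);
      return 'deferred';
    }
    await this.recover(conversationKey);
    return 'recovered';
  }

  // Called when replay has finished and the connection becomes LIVE.
  async runDeferredEpochRecovery(): Promise<void> {
    this.isLive = true;
    // Iterate over a snapshot so entries can be removed while looping.
    for (const key of Array.from(this.pending)) {
      await this.recover(key);
      this.pending.delete(key);
    }
  }
}
```

Keying the pending set by conversation is what gives the "once per conversation" behaviour: repeated failures for the same conversation during replay collapse into a single recovery attempt.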

github-actions · 2026-05-27T09:43:14Z

🧪 Playwright Test Summary

✅ Passed: 15
❌ Failed: 0
⏭ Skipped: 0
🔁 Flaky: 0
📊 Total: 15
⏱ Total Runtime: 105.7s (~ 1 min 46 sec)

PR checkout only included the head commit, so nx format:check could not resolve the dev base ref and logged a misleading git error before falling back to checking the whole repo. Fetch the base branch in CI so format failures are reported clearly.

sonarqubecloud · 2026-05-29T13:12:48Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

zskhan · 2026-06-02T13:01:07Z

+
+    for (const {conversationId, subconvId, trigger} of entries) {
+      try {
+        await this.recoverMLSGroupFromEpochMismatch(conversationId, subconvId, trigger, {force: true});


If the entries have already been cleared and recoverMLSGroupFromEpochMismatch then fails for some reason, there will be no way to retry that entry.

What do you think?
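
One way to address this concern is to remove an entry only after its recovery has succeeded. The following is a minimal sketch under assumptions: `DeferredEntry`, `drainDeferred` and the `recover` callback are hypothetical names, and the real entries also carry a subconversation ID.

```typescript
// Hypothetical stand-ins: DeferredEntry, drainDeferred and the recover
// callback are illustrative and not taken from the PR.
interface DeferredEntry {
  conversationId: string;
  trigger: string;
}

async function drainDeferred(
  deferred: Map<string, DeferredEntry>,
  recover: (entry: DeferredEntry) => Promise<void>,
): Promise<string[]> {
  const failedKeys: string[] = [];
  // Iterate over a snapshot so deleting from the map during the loop is safe.
  for (const [key, entry] of Array.from(deferred.entries())) {
    try {
      await recover(entry);
      // Remove the entry only once recovery has succeeded.
      deferred.delete(key);
    } catch {
      // Keep the entry so that a later drain can retry it.
      failedKeys.push(key);
    }
  }
  return failedKeys;
}
```

Failed entries stay in the map, so calling `drainDeferred` again retries exactly those. Whether a retry is desirable, and when it should run, is a design decision for the PR author.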

e-maad · 2026-06-02T14:13:05Z

      const firstEventPayload = notification.data.event.payload[0];
      const notificationTime = firstEventPayload ? this.getNotificationEventTime(firstEventPayload) : null;
-      if (this.connectionState !== ConnectionState.LIVE && notificationTime !== null && notificationTime.length > 0) {
+      if (!this.connectionStateTracker.isLive() && notificationTime !== null && notificationTime.length > 0) {


Suggested change

if (!this.connectionStateTracker.isLive() && notificationTime !== null && notificationTime.length > 0) {

if (!this.connectionStateTracker.isLive() && is.nonEmptyString(notificationTime)) {
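
The suggested guard folds the null check and the length check into one type guard. The sketch below uses a local stand-in, `isNonEmptyString`, because the origin of the `is` helper is not shown in this thread; it is assumed to behave like the `nonEmptyString` guard of a utility library such as @sindresorhus/is. `describeNotificationTime` is a hypothetical caller.

```typescript
// Local stand-in for is.nonEmptyString; the name and this implementation are
// assumptions made for illustration.
function isNonEmptyString(value: unknown): value is string {
  return typeof value === 'string' && value.length > 0;
}

// Hypothetical caller: after the guard, notificationTime is narrowed to string.
function describeNotificationTime(notificationTime: string | null): string {
  if (isNonEmptyString(notificationTime)) {
    return `last notification at ${notificationTime}`;
  }
  return 'no notification time';
}
```

Both forms reject null and the empty string, so the change is about readability rather than behaviour.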

e-maad · 2026-06-02T14:13:55Z

        .push(async () => {
          this.logger.info(`Resuming message sending. ${getQueueLength()} messages to be sent`);
          await this.rehydrateMlsPendingProposalsTasksOnLiveTransition();
+          await this.service!.conversation.runDeferredEpochRecovery();


Do we need the non-null assertion here?

e-maad · 2026-06-02T14:16:33Z

    trigger?: MlsEpochRecoveryTrigger,
+    options: {force: boolean} = {force: false},
  ) {
+    if (options.force === false && this.shouldDeferEpochRecovery(trigger)) {


If options.force is a boolean (and controlled by us), then please rename it to isForced.

Suggested change

if (options.force === false && this.shouldDeferEpochRecovery(trigger)) {

if (!options.force && this.shouldDeferEpochRecovery(trigger)) {

e-maad · 2026-06-02T14:18:22Z

  }

+  private getDeferredEpochRecoveryKey(conversationId: QualifiedId, subconvId?: SUBCONVERSATION_ID): string {
+    return `${conversationId.id}@${conversationId.domain}:${subconvId ?? 'none'}`;


Can this 'none' value be provided by an enum?
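
As a possible answer, the sentinel could at least be given a name. This is a sketch under assumptions: `NO_SUBCONVERSATION` is a hypothetical constant, and `QualifiedId` and the subconversation ID are reduced to plain shapes rather than the real Wire types.

```typescript
// Hypothetical, reduced shape; the real QualifiedId type lives in the Wire code base.
interface QualifiedId {
  id: string;
  domain: string;
}

// A named constant documents the sentinel used when there is no subconversation.
const NO_SUBCONVERSATION = 'none';

function getDeferredEpochRecoveryKey(conversationId: QualifiedId, subconvId?: string): string {
  return `${conversationId.id}@${conversationId.domain}:${subconvId ?? NO_SUBCONVERSATION}`;
}
```

A plain constant avoids extending the subconversation enum with a member that is not a real subconversation, which would be the main drawback of the enum route.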

e-maad · 2026-06-02T14:24:11Z

+   * notification replay. Called when the connection transitions to LIVE.
+   */
+  public async runDeferredEpochRecovery(): Promise<void> {
+    if (this.deferredEpochRecoveries.size === 0) {


I would suggest using is.emptySet(deferredEpochRecoveries).

e-maad · 2026-06-02T14:30:46Z

+    options: {force: boolean} = {force: false},
  ) {
+    if (options.force === false && this.shouldDeferEpochRecovery(trigger)) {
+      this.deferEpochRecovery(conversationId, subconversationId, trigger);


You are throwing an error from deferEpochRecovery. Are we handling it somewhere?

e-maad · 2026-06-02T14:38:31Z

    private readonly subconversationService: SubconversationService,
    private readonly isMLSConversationRecoveryEnabled: () => Promise<boolean>,
    private readonly _mlsService?: MLSService,
+    private readonly connectionStateTracker: ConnectionStateTracker = createConnectionStateTracker(


Why are we maintaining two state managers for the connection?
I can see another one in libraries/core/src/account.ts. These states will not be synced. Is this intentional?
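
The drift described here can be illustrated with a small model. Everything below is an assumption made for illustration: the state names, the tracker interface and the signature of `createConnectionStateTracker` are hypothetical and may differ from the PR, and `ConversationService` is a reduced stand-in.

```typescript
// Hypothetical minimal tracker; the PR's ConnectionStateTracker may differ.
type ConnectionState = 'CLOSED' | 'PROCESSING_NOTIFICATIONS' | 'LIVE';

interface ConnectionStateTracker {
  isLive(): boolean;
  setState(state: ConnectionState): void;
}

function createConnectionStateTracker(initial: ConnectionState = 'CLOSED'): ConnectionStateTracker {
  let state: ConnectionState = initial;
  return {
    isLive: () => state === 'LIVE',
    setState: (next: ConnectionState) => {
      state = next;
    },
  };
}

class ConversationService {
  private readonly tracker: ConnectionStateTracker;

  // No default argument: the caller must pass the tracker it already owns.
  constructor(tracker: ConnectionStateTracker) {
    this.tracker = tracker;
  }

  shouldDeferEpochRecovery(): boolean {
    return !this.tracker.isLive();
  }
}

// The account owns one tracker and shares it with the service.
const accountTracker = createConnectionStateTracker();
const sharedService = new ConversationService(accountTracker);
// A service with its own tracker never learns about the transition.
const detachedService = new ConversationService(createConnectionStateTracker());

accountTracker.setState('LIVE');
```

After the transition, `sharedService.shouldDeferEpochRecovery()` returns false while `detachedService.shouldDeferEpochRecovery()` still returns true. A default-constructed tracker in the service constructor carries exactly this risk unless the caller always overrides it.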

thisisamir98 requested review from arjita-mitra, e-maad, otto-the-bot, screendriver and zskhan as code owners May 27, 2026 09:30

thisisamir98 added 2 commits May 27, 2026 12:17

fix prettier lint issues

3979a22

thisisamir98 force-pushed the WPB-21916-epoch-fix branch from f952354 to 5a0c6d2 Compare May 27, 2026 10:17

github-advanced-security AI found potential problems May 27, 2026

Comment thread .github/workflows/ci.yml Fixed

Comment thread .github/workflows/ci.yml Fixed

thisisamir98 added 4 commits May 27, 2026 13:12

fix(ci): pass PR base ref via env in format-check fetch step

31f6c17

Avoid expanding github.base_ref directly in the shell script so zizmor does not flag code injection via template expansion.

Fix lint issue in account.s

2f6f762

add default value for epoch mismatch recovery options

9307916

Merge branch 'dev' of github.com:wireapp/wire-webapp into WPB-21916-e…

38de871

…poch-fix

zskhan reviewed Jun 2, 2026

zskhan approved these changes Jun 2, 2026

e-maad reviewed Jun 2, 2026

thisisamir98 wants to merge 7 commits into dev from WPB-21916-epoch-fix

4 participants
