@JayPanoz
Contributor

@JayPanoz JayPanoz commented Nov 12, 2025

This fixes known issues:

There is nothing fancy on the Android side: it simply pauses, then resumes the utterance from the start. It does not try to track the current progress through boundary events, since boundary is not well supported on Android, or at least not available for a significant number of voices. A possible improvement would be to detect when it is supported, so that it behaves more like a pause and resume than a pause and restart of the utterance.

That implies creating a new utterance and replacing the existing one in the loaded utterances though, as this should not leak to Navigator, which is non-trivial when you cannot even test it.
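The possible improvement mentioned above, detecting whether boundary is actually reported for a voice, could look roughly like the sketch below. It is only an illustration: `BoundarySupportTracker` and its methods are hypothetical names that do not exist in the Navigator, and the `charIndex` values are assumed to come from the Web Speech API `boundary` event.

```typescript
// Hypothetical helper, not part of the Navigator: remembers per voice whether
// a `boundary` event was ever observed, and the last reported character index.
class BoundarySupportTracker {
  private readonly voicesWithBoundary = new Set<string>();
  private lastCharIndex = 0;

  // Call when a new utterance starts, so stale progress is never reused.
  onStart(): void {
    this.lastCharIndex = 0;
  }

  // Call from the utterance's `boundary` listener with the event's charIndex.
  onBoundary(voiceName: string, charIndex: number): void {
    this.voicesWithBoundary.add(voiceName);
    this.lastCharIndex = charIndex;
  }

  // True once the voice has reported at least one boundary.
  supportsBoundary(voiceName: string): boolean {
    return this.voicesWithBoundary.has(voiceName);
  }

  // Offset to resume from: the last boundary if the voice reports them,
  // otherwise 0, which is the current "pause and restart" behaviour.
  resumeOffset(voiceName: string): number {
    return this.supportsBoundary(voiceName) ? this.lastCharIndex : 0;
  }
}
```

With such a tracker, voices that never emit boundary keep restarting from the beginning, so nothing changes for them.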

An interesting consequence of handling the navigation methods in a smarter way is that I am not sure how best to signal it, because it does not really map to an existing event. At the moment, I added a positionchanged so that it is isolated, as I’m not sure end is what users would expect. But it is also kind of problematic in the sense that it adds a Playback event specifically to handle this smarter logic.

Maybe that should not be implemented in the Navigator, and should be the user’s concern. But at least we have it implemented and can measure its impact, and whether we should even handle that.

Revert "Android polyfill for pause/resume"

This reverts commit ad463d2027b6a40ae6727c643b84d8e29e5c4ece.

Navigator should not care about this WebSpeech API idiosyncrasy
Previously we were not taking paused into account. An issue is that we need a new event for positionchange though…
New build does not need this so let’s get rid of it right away.
@JayPanoz
Contributor Author

cc @renevanderark if you want to give it a look and some additional thoughts

@danielweck
Member

Thorium Desktop POV: by analogy with filesystem seek/tell (a bidirectional model), the TTS word boundary events are one-way only (tell); there is no seek operator. The Web Speech API simply doesn't allow the reading system to resume a TTS utterance at the paused point, i.e. to start playing at an arbitrary location within the utterance other than the very beginning of the generated stream of audio samples.

@JayPanoz
Contributor Author

JayPanoz commented Nov 13, 2025

@danielweck is this raising a specific concern with the temporary standalone navigator? It is expected to be unstable so do not hesitate to point things out if you think they may be problematic.

The Web Speech API is kind of an outlier across the platforms that were initially discussed (Kotlin, Swift, Web) in how it handles things, i.e. audio (either natively or through third-party services), so it's perhaps not the ideal engine to start with. I also have to keep the other platforms in mind so that the temporary Navigator does not diverge from them too much, otherwise I'll face tough challenges further down the line.

If that was additional input for the Android workaround, which would effectively be an utterance.pause() polyfill of sorts, I can perhaps clarify that yes, the limitation you are highlighting makes things very complex: you can use boundary to know where Android stopped, but you cannot resume the utterance there. You have to cancel it and create another one to replace it, containing only the remaining text.
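To make the "only the remaining text" step concrete, here is a minimal sketch. `planResume` and `ResumePlan` are hypothetical names used for illustration only, and the offset is assumed to be the `charIndex` of the last `boundary` event received before the pause.

```typescript
// Hypothetical shape describing how an utterance's text is split on resume.
interface ResumePlan {
  skipped: string;   // text assumed to have been spoken before the pause
  remaining: string; // text for the replacement utterance
}

// Splits the utterance text at the last known boundary offset.
// An invalid offset falls back to 0, i.e. restarting from the beginning.
function planResume(text: string, charIndex: number): ResumePlan {
  const safeIndex = Number.isFinite(charIndex) ? Math.floor(charIndex) : 0;
  const offset = Math.min(Math.max(0, safeIndex), text.length);
  return {
    skipped: text.slice(0, offset),
    remaining: text.slice(offset),
  };
}
```

In a browser, the engine would then call `speechSynthesis.cancel()` and speak a new `SpeechSynthesisUtterance` built from `remaining`. Whether boundary offsets are reliable enough for this on Android is exactly the open question raised in this thread.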

Since we effectively load an array of utterances into the engine, this can become messy real quick because all of a sudden, you have an extra utterance to mix in for a given index, then dispose of when it is no longer needed.
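One way to picture the bookkeeping described above is a queue that keeps the loaded utterances untouched and stores the temporary replacement separately, keyed by index. This is a sketch under assumptions, not how the Navigator does it: `UtteranceQueue`, `setOverride` and `clearOverride` are hypothetical names, and the generic `T` stands in for whatever utterance type the engine uses.

```typescript
// Hypothetical queue: loaded items are never mutated; a temporary
// replacement for a given index lives in a separate map.
class UtteranceQueue<T> {
  private readonly loaded: readonly T[];
  private readonly overrides = new Map<number, T>();

  constructor(loaded: readonly T[]) {
    this.loaded = loaded;
  }

  // What the engine should speak for this index: the temporary
  // replacement if one exists, otherwise the originally loaded item.
  get(index: number): T | undefined {
    return this.overrides.has(index)
      ? this.overrides.get(index)
      : this.loaded[index];
  }

  // Register a partial utterance (e.g. the remaining text after a pause).
  setOverride(index: number, partial: T): void {
    if (index < 0 || index >= this.loaded.length) {
      throw new RangeError(`No loaded utterance at index ${index}`);
    }
    this.overrides.set(index, partial);
  }

  // Dispose of the temporary utterance once it has ended or was skipped.
  clearOverride(index: number): void {
    this.overrides.delete(index);
  }
}
```

Keeping the override outside the loaded array means jumping back to the same index later plays the full original utterance again, at the cost of one more piece of state to clean up.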

Labels

None yet

Projects

Status: Draft

Development

Successfully merging this pull request may close these issues.

Navigator: pause is broken on Android
Navigator: smart playback and additional controls

3 participants