Playback fixes #20
Revert "Android polyfill for pause/resume"
This reverts commit ad463d2027b6a40ae6727c643b84d8e29e5c4ece.
Navigator should not care about this Web Speech API idiosyncrasy
Previously we were not taking paused into account. An issue is that we need a new event for positionchange though…
New build does not need this so let’s get rid of it right away.
cc @renevanderark if you want to give it a look and share some additional thoughts
Thorium Desktop POV: in analogy to filesystem seek/tell (bidirectional model), the TTS word boundary events are one-way only (tell), there is no seek operator. The Web Speech API simply doesn't allow the reading system to resume a TTS utterance at the paused point, i.e. to start playing at an arbitrary location within the utterance duration other than at the very beginning of the generated stream of audio samples.
@danielweck is this raising a specific concern with the temporary standalone navigator? It is expected to be unstable so do not hesitate to point things out if you think they may be problematic.

The Web Speech API is kind of an outlier across the platforms that were initially discussed (Kotlin, Swift, Web) and how they will handle things, i.e. audio (either natively or through third-party services), so it's perhaps not the ideal engine to start with. I also have to keep the others in mind so that it does not make the temporary Navigator diverge too much, otherwise I'll have tough challenges further down the line.

If that was an additional input for the Android workaround, which would effectively be an …

Since we effectively load an array of utterances into the engine, this can become messy real quick because all of a sudden, you have an extra utterance to mix in for a given index, then dispose of when it is no longer needed.
This fixes known issues:
There is nothing fancy as regards Android: it simply pauses and resumes the utterance from the start. It does not try to keep track of the current progress through boundaries since boundary is not well supported on Android, or at least not available for a significant number of voices. A possible improvement would be to track when it is supported so that it behaves more like a pause and resume than a pause and restart of the utterance.

That implies creating an utterance and replacing the existing one in the loaded utterances though, as it should not leak to Navigator. Which is non-trivial when you cannot even test it.
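To make the possible improvement concrete, here is a minimal sketch of tracking boundary support per utterance. It is not what this PR implements: `BoundaryTracker`, `BoundaryLike` and their methods are hypothetical names, and the only thing borrowed from the Web Speech API is the `charIndex` field that boundary events expose. When no boundary was ever observed, it falls back to the current behavior (restart from the beginning).

```typescript
// Hypothetical helper, not part of this PR.
// Minimal shape of a boundary event; mirrors SpeechSynthesisEvent.charIndex.
interface BoundaryLike {
  charIndex: number;
}

class BoundaryTracker {
  // null means no boundary event was seen for the current utterance.
  private lastCharIndex: number | null = null;

  // Call this from the utterance's boundary handler, if the voice fires it.
  record(event: BoundaryLike): void {
    this.lastCharIndex = event.charIndex;
  }

  // True only once at least one boundary event has been observed.
  isSupported(): boolean {
    return this.lastCharIndex !== null;
  }

  // Text for the replacement utterance: from the last boundary when known,
  // otherwise the full text (pause and restart).
  textToResume(fullText: string): string {
    if (this.lastCharIndex === null) {
      return fullText;
    }
    return fullText.slice(this.lastCharIndex);
  }

  // Call when moving on to another utterance.
  reset(): void {
    this.lastCharIndex = null;
  }
}
```

The text returned by `textToResume` would still have to go into a new utterance that replaces the loaded one internally, which is the non-trivial part mentioned above.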
An interesting consequence of handling the navigation methods in a smarter way is that I am not sure how best to handle this, because it does not really map to an existing event. At the moment, I added a positionchanged event so that it is isolated, as I’m not sure end is what users would expect. But it is also kind of problematic in the sense that it adds a Playback event specifically to handle this smarter logic.

Maybe that should not be implemented in the Navigator, and should be the user’s concern. But at least we have it implemented and can measure its impact, and decide whether we should even handle that.
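For discussion, a rough sketch of what keeping the two events isolated could look like. Everything here is an assumption for illustration: `PlaybackSketch`, `jumpTo` and `finishCurrent` are hypothetical names and do not reflect the actual Navigator code. It only shows that a navigation-driven position change can be reported without firing end.

```typescript
// Hypothetical sketch, not the Navigator's actual API.
type PlaybackEventType = "end" | "positionchanged";
type PlaybackListener = (index: number) => void;

class PlaybackSketch {
  private listeners: Record<PlaybackEventType, PlaybackListener[]> = {
    end: [],
    positionchanged: [],
  };
  private index = 0;
  private count: number;

  constructor(count: number) {
    this.count = count;
  }

  on(type: PlaybackEventType, listener: PlaybackListener): void {
    this.listeners[type].push(listener);
  }

  private emit(type: PlaybackEventType, index: number): void {
    for (const listener of this.listeners[type]) {
      listener(index);
    }
  }

  // The current utterance finished on its own.
  finishCurrent(): void {
    this.emit("end", this.index);
  }

  // A navigation method moved the position: report it separately,
  // so listeners on "end" are not triggered. Out-of-range jumps are ignored.
  jumpTo(index: number): void {
    if (index < 0 || index >= this.count) {
      return;
    }
    this.index = index;
    this.emit("positionchanged", index);
  }
}
```

If this ends up being the user’s concern instead, the same split could live outside the Navigator with only end being emitted.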