Summary
VoiceFlow provides local voice dictation and meeting recording on Windows and Linux. Since it already supports hotkey-triggered dictation, I'd like to suggest SenseVoice as an alternative speech-recognition backend to shorten the delay between finishing speaking and seeing the text.
Why SenseVoice for dictation?
For voice typing, latency is everything. SenseVoice's non-autoregressive architecture predicts all output tokens in one parallel pass instead of one token at a time, which gives you:
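- Much faster inference than Whisper: the SenseVoice authors report over 5x faster than Whisper-Small and about 15x faster than Whisper-Large, so text appears almost immediately after you stop speaking
- 234M parameters (vs. 1.55B for Whisper-Large), so lower memory usage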
- 50+ languages including strong CJK support
- Emotion & audio event detection as bonus features
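To check the speedup on VoiceFlow's own hardware rather than relying on published numbers, a small timing harness could wrap any backend. This is a minimal sketch: `measure_latency` and the sleeping `fake_backend` are hypothetical names, and a backend is assumed to be a plain callable that takes 16 kHz float samples and returns text. The real-time factor (RTF) is processing time divided by audio duration, so lower is better.

```python
import statistics
import time
from typing import Callable, Sequence


def measure_latency(
    transcribe: Callable[[Sequence[float]], str],
    samples: Sequence[float],
    sample_rate: int = 16_000,
    runs: int = 5,
) -> dict:
    """Time a transcription callable; report median latency and real-time factor."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        transcribe(samples)
        timings.append(time.perf_counter() - start)
    median_s = statistics.median(timings)
    audio_s = len(samples) / sample_rate
    # RTF < 1 means faster than real time; smaller is better for dictation
    return {"median_s": median_s, "audio_s": audio_s, "rtf": median_s / audio_s}


if __name__ == "__main__":
    # Hypothetical stand-in backend that takes ~10 ms per call
    def fake_backend(samples: Sequence[float]) -> str:
        time.sleep(0.01)
        return "hello"

    stats = measure_latency(fake_backend, [0.0] * 32_000)  # 2 s of silence
    print(f"{stats['median_s'] * 1000:.1f} ms for {stats['audio_s']:.1f} s, RTF {stats['rtf']:.3f}")
```

Replacing `fake_backend` with a Whisper call and a SenseVoice call on the same recording gives a like-for-like comparison.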
Integration options
Option 1: Via FunASR Python (simplest)
from funasr import AutoModel
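from funasr.utils.postprocess_utils import rich_transcription_postprocess
model = AutoModel(model="iic/SenseVoiceSmall", device="cuda:0")
# audio_buffer: 16 kHz mono audio (file path or NumPy array) from the hotkey recording
result = model.generate(input=audio_buffer, language="auto", use_itn=True)
# Raw text carries tags such as <|en|><|NEUTRAL|>; this helper cleans them up
text = rich_transcription_postprocess(result[0]["text"])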
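One way to add SenseVoice without disturbing the existing Whisper path is a small backend interface selected by a config value. The sketch below assumes VoiceFlow can hand over 16 kHz mono float samples in [-1, 1]; `Transcriber`, `SenseVoiceTranscriber`, `BACKENDS`, and `make_backend` are hypothetical names, not existing VoiceFlow code, and it assumes FunASR's `generate()` accepts a float32 NumPy array as `input`.

```python
from typing import Callable, Dict, Protocol, Sequence


class Transcriber(Protocol):
    """Hypothetical interface a VoiceFlow dictation backend could implement."""

    def transcribe(self, samples: Sequence[float], sample_rate: int) -> str: ...


class SenseVoiceTranscriber:
    """SenseVoiceSmall via FunASR; expects 16 kHz mono float samples."""

    def __init__(self, device: str = "cuda:0") -> None:
        # Lazy imports keep FunASR optional for users of other backends
        from funasr import AutoModel
        from funasr.utils.postprocess_utils import rich_transcription_postprocess

        self._model = AutoModel(model="iic/SenseVoiceSmall", device=device)
        self._postprocess = rich_transcription_postprocess

    def transcribe(self, samples: Sequence[float], sample_rate: int) -> str:
        import numpy as np

        if sample_rate != 16_000:
            raise ValueError("SenseVoiceSmall expects 16 kHz audio; resample first")
        audio = np.asarray(samples, dtype=np.float32)
        result = self._model.generate(input=audio, language="auto", use_itn=True)
        return self._postprocess(result[0]["text"])


# Registry keyed by a hypothetical config value such as backend = "sensevoice"
BACKENDS: Dict[str, Callable[[], Transcriber]] = {"sensevoice": SenseVoiceTranscriber}


def make_backend(name: str) -> Transcriber:
    """Instantiate the backend registered under `name`."""
    try:
        factory = BACKENDS[name]
    except KeyError:
        raise ValueError(f"unknown backend {name!r}; choose from {sorted(BACKENDS)}") from None
    return factory()
```

A Whisper backend would register under its own key, so switching engines becomes a config change, and FunASR is only imported when SenseVoice is actually selected.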
Option 2: Via Sherpa-ONNX (no Python, native C/C++)
- Pre-built binaries for Windows/Linux
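- An int8-quantized ONNX model of roughly 230 MB (234M parameters at about one byte each)
- Near-real-time dictation by pairing the (non-streaming) SenseVoice model with sherpa-onnx's voice activity detection (VAD)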
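SenseVoice decodes whole utterances, so live dictation usually means cutting the microphone stream into speech segments and transcribing each one as it ends. sherpa-onnx's examples use a neural VAD (Silero) for this; the sketch below swaps in a much cruder RMS-energy splitter purely to illustrate the idea. `split_on_silence`, the threshold, and the hangover length are hypothetical choices, not sherpa-onnx API.

```python
import math
from typing import Iterable, Iterator, List


def split_on_silence(
    frames: Iterable[List[float]],
    threshold: float = 0.01,
    hangover_frames: int = 10,
) -> Iterator[List[float]]:
    """Yield speech segments, closing one after `hangover_frames` quiet frames in a row.

    Each frame is a list of float samples in [-1, 1] (e.g. 20 ms = 320 samples at 16 kHz).
    """
    segment: List[float] = []
    silent_run = 0
    for frame in frames:
        # Root-mean-square level of this frame
        rms = math.sqrt(sum(x * x for x in frame) / max(len(frame), 1))
        if rms >= threshold:
            segment.extend(frame)
            silent_run = 0
        elif segment:
            segment.extend(frame)  # keep short pauses inside the utterance
            silent_run += 1
            if silent_run >= hangover_frames:
                yield segment
                segment, silent_run = [], 0
        # Quiet frames before any speech are simply dropped
    if segment:
        yield segment  # flush speech still open when the stream ends
```

Each yielded segment could then be passed to whichever SenseVoice runtime is in use. A real VAD copes with background noise far better than a fixed RMS threshold.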
Option 3: Via OpenAI-compatible API
funasr-server --device cuda
# Same API as Whisper: POST /v1/audio/transcriptions
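If the server mirrors Whisper's OpenAI-style endpoint, VoiceFlow could talk to it with nothing beyond the standard library. This sketch assumes the usual OpenAI request shape (a multipart form with `file` and `model` fields, and a JSON reply with a `text` key); the host, port, and model name are placeholders, not values documented for `funasr-server`, and `build_transcription_request` is a hypothetical helper.

```python
import json
import urllib.request
import uuid


def build_transcription_request(
    wav_bytes: bytes,
    base_url: str = "http://127.0.0.1:8000",  # hypothetical host/port
    model: str = "sensevoice",  # hypothetical model name
) -> urllib.request.Request:
    """Build an OpenAI-style multipart POST to /v1/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="file"; filename="dictation.wav"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=head + wav_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(wav_bytes: bytes, **kwargs) -> str:
    """Send the request and return the transcript from a {"text": ...} JSON reply."""
    request = build_transcription_request(wav_bytes, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.loads(resp.read())["text"]
```

Because the request shape matches Whisper's, the same client could in principle target either server by changing `base_url`.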