Feature request: Add SenseVoice as alternative to Whisper #28

@LauraGPT

Summary

VoiceFlow provides local voice dictation and meeting recording for Windows and Linux. Since it already supports hotkey-triggered dictation, I'd like to suggest SenseVoice as an alternative transcription backend to Whisper, for significantly faster response times.

Why SenseVoice for dictation?

For voice typing, latency is everything. SenseVoice's non-autoregressive architecture gives you:

  • 5x faster inference than Whisper — near-instant text appearance after speaking
  • 234M parameters (vs 1.5B for Whisper large), for lower memory usage
  • 50+ languages including strong CJK support
  • Emotion & audio event detection as bonus features
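
To check the latency claim on real dictation clips before committing to a backend, a small timing harness could be used. This is a minimal sketch, assuming each backend (Whisper or SenseVoice) is wrapped as a callable that takes audio samples and returns text. `measure_latency` and its parameters are hypothetical names, not part of VoiceFlow or FunASR.

```python
import statistics
import time
from typing import Callable, Sequence


def measure_latency(
    transcribe: Callable[[Sequence[float]], str],
    samples: Sequence[float],
    sample_rate: int = 16000,
    runs: int = 5,
) -> dict:
    """Time a transcription callable over several runs.

    Returns the median wall-clock latency and the real-time factor
    (latency divided by audio duration; below 1 means faster than real time).
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        transcribe(samples)
        timings.append(time.perf_counter() - start)

    audio_seconds = len(samples) / sample_rate
    median = statistics.median(timings)
    return {
        "median_latency_s": median,
        "audio_s": audio_seconds,
        "rtf": median / audio_seconds,
    }
```

Running the same clips through both backends on the same machine gives a like-for-like comparison. Discarding the first run may be sensible, since model warm-up can dominate it.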

Integration options

Option 1: Via FunASR Python (simplest)

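from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess
# Load once at startup so each dictation only pays for inference
model = AutoModel(model="iic/SenseVoiceSmall", device="cuda")
# audio_buffer: the recorded utterance (e.g. a path to a WAV file)
result = model.generate(input=audio_buffer, language="auto", use_itn=True)
# Remove the <|...|> tags SenseVoice prepends to the raw text
text = rich_transcription_postprocess(result[0]["text"])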
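
One thing to be aware of: as far as I know, the raw `text` field from SenseVoice starts with a run of tags such as `<|en|><|NEUTRAL|><|Speech|><|withitn|>` (language, emotion, audio event, text normalization) before the transcript. A minimal sketch of splitting them off, so the dictation output stays clean while the emotion/event tags remain available. The helper names are hypothetical and the tag layout is my assumption about the output format:

```python
import re
from typing import List, NamedTuple

# Matches one <|...|> tag, e.g. <|en|> or <|NEUTRAL|>
_TAG_RE = re.compile(r"<\|([^|]*)\|>")


class Transcript(NamedTuple):
    tags: List[str]  # tag names in order of appearance, without the <| |> wrapper
    text: str        # transcript with all tags removed


def split_tags(raw: str) -> Transcript:
    """Separate SenseVoice-style <|...|> tags from the transcript text."""
    tags = _TAG_RE.findall(raw)
    text = _TAG_RE.sub("", raw).strip()
    return Transcript(tags, text)
```

For dictation only `text` would be typed out; the tags could be logged or shown in the meeting-recording view.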

Option 2: Via Sherpa-ONNX (no Python, native C/C++)

  • Pre-built binaries for Windows/Linux
  • ~50MB ONNX model
  • Real-time streaming support

Option 3: Via OpenAI-compatible API

funasr-server --device cuda
# Same API as Whisper: POST /v1/audio/transcriptions
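
On the client side, this option would only need an HTTP call. Below is a minimal standard-library sketch, assuming the server follows the OpenAI transcription request shape (multipart form with `file` and `model` fields, JSON response with a `text` field). The base URL, the `model` value and the function names are placeholders I made up for illustration, not confirmed server defaults.

```python
import json
import urllib.request
import uuid


def build_transcription_request(base_url, wav_bytes,
                                model="SenseVoiceSmall",  # placeholder model name
                                filename="audio.wav"):
    """Build a multipart/form-data POST for /v1/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Plain form field: model
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode(),
        # File field: the recorded audio
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         f"Content-Type: audio/wav\r\n\r\n").encode() + wav_bytes + b"\r\n",
        # Closing boundary
        f"--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(base_url, wav_bytes):
    """Send the audio and return the transcript from the JSON response."""
    req = build_transcription_request(base_url, wav_bytes)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["text"]
```

Usage would look like `transcribe("http://localhost:8000", wav_bytes)`, where the host and port are assumptions. Since the request shape matches Whisper-style servers, the same client could be pointed at either backend by changing the URL.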
