Feature request: Add SenseVoice as alternative to Whisper #28

## Summary

VoiceFlow provides local voice dictation and meeting recording on Windows and Linux. Since it already supports hotkey-triggered dictation, I'd like to suggest **SenseVoice** as an alternative speech-recognition backend to Whisper, mainly to shorten the delay between finishing speaking and seeing the transcribed text.

## Why SenseVoice for dictation?

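For voice typing, latency is everything. SenseVoice is non-autoregressive: it predicts the whole transcript in a single decoding pass, whereas Whisper's decoder generates text one token at a time. Compared with Whisper, it offers: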

- **5x faster inference** than Whisper, so text appears almost immediately after you stop speaking
- **234M parameters** (vs ~1.55B for Whisper large), so lower memory usage
- **50+ languages** including strong CJK support
- **Emotion & audio event detection** as bonus features
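To put the parameter counts in perspective, weight memory is roughly parameters times bytes per parameter. A back-of-the-envelope sketch, assuming standard fp32/fp16/int8 widths and counting weights only (activations, audio buffers, and runtime overhead are ignored):

```python
# Bytes per parameter for common inference precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_mb(num_params: float, dtype: str) -> float:
    """Approximate size of the model weights alone, in megabytes (10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


for name, params in [("SenseVoice-Small", 234e6), ("Whisper large", 1.55e9)]:
    sizes = ", ".join(f"{d}: {weight_memory_mb(params, d):,.0f} MB" for d in BYTES_PER_PARAM)
    print(f"{name:<17} {sizes}")
```

At fp16 this works out to about 468 MB of weights for SenseVoice-Small versus about 3,100 MB for Whisper large.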

## Integration options

**Option 1: Via FunASR Python** (simplest)
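```python
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model = AutoModel(model="iic/SenseVoiceSmall", device="cuda:0")
# audio_buffer: a file path, or 16 kHz mono samples as a NumPy float32 array
result = model.generate(input=audio_buffer, language="auto", use_itn=True)
# Raw output carries <|lang|><|emotion|><|event|> tags; strip them for dictation
text = rich_transcription_postprocess(result[0]["text"])
```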
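The emotion and event labels mentioned above arrive as tags in front of the raw transcript. A minimal parser sketch, assuming the raw string looks like `<|en|><|NEUTRAL|><|Speech|><|woitn|>Hello world.` (language, emotion, event, ITN flag, in that order); `parse_sensevoice` and `SenseVoiceSegment` are hypothetical names. It handles one tagged segment; merged VAD output may contain several.

```python
import re
from dataclasses import dataclass
from typing import List

# Matches one "<|TAG|>" marker; applied repeatedly at the front of the string.
_TAG = re.compile(r"<\|([^|]*)\|>")


@dataclass
class SenseVoiceSegment:
    text: str
    tags: List[str]  # assumed order: language, emotion, event, ITN flag

    @property
    def language(self) -> str:
        return self.tags[0] if len(self.tags) > 0 else ""

    @property
    def emotion(self) -> str:
        return self.tags[1] if len(self.tags) > 1 else ""

    @property
    def event(self) -> str:
        return self.tags[2] if len(self.tags) > 2 else ""


def parse_sensevoice(raw: str) -> SenseVoiceSegment:
    """Split the leading <|...|> markers from the transcript text."""
    tags: List[str] = []
    pos = 0
    while (m := _TAG.match(raw, pos)):
        tags.append(m.group(1))
        pos = m.end()
    return SenseVoiceSegment(text=raw[pos:].strip(), tags=tags)
```

For plain dictation `rich_transcription_postprocess` is enough; a parser like this only matters if VoiceFlow wants to surface the emotion or event labels, for example in meeting notes.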

**Option 2: Via Sherpa-ONNX** (no Python, native C/C++)
- Pre-built binaries for Windows/Linux
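- Roughly 230 MB as an int8 ONNX model (234M parameters at one byte each)
- Streaming-style dictation by decoding VAD-detected speech segments (SenseVoice itself decodes complete utterances)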
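With an utterance-level model, a dictation loop usually cuts the microphone stream into speech segments and decodes each one as soon as it ends. The sketch below uses a simple RMS-energy threshold as a stand-in for a trained VAD such as Silero VAD; `segment_speech`, the frame size (e.g. 30 ms), the threshold, and the silence length are illustrative assumptions.

```python
import math
from typing import Iterable, Iterator, List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0


def segment_speech(
    frames: Iterable[Sequence[float]],
    threshold: float = 0.02,
    max_silence_frames: int = 10,
) -> Iterator[List[float]]:
    """Yield one list of samples per utterance.

    A frame counts as speech if its RMS exceeds `threshold`; an utterance ends
    after `max_silence_frames` consecutive quiet frames, and trailing silence
    is trimmed before the utterance is yielded.
    """
    current: List[float] = []
    speech_end = 0  # index just past the last loud frame in `current`
    silent = 0
    for frame in frames:
        if rms(frame) > threshold:
            current.extend(frame)
            speech_end = len(current)
            silent = 0
        elif current:
            current.extend(frame)  # short pauses stay inside the utterance
            silent += 1
            if silent >= max_silence_frames:
                yield current[:speech_end]
                current, speech_end, silent = [], 0, 0
    if current:
        yield current[:speech_end]  # flush an utterance still open at end of stream
```

Each yielded segment would then be handed to the recognizer, so text appears one utterance at a time rather than only after the hotkey is released.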

**Option 3: Via OpenAI-compatible API**
```bash
funasr-server --device cuda
# Same OpenAI-style endpoint used for Whisper: POST /v1/audio/transcriptions
```
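Because the endpoint follows the shape of OpenAI's transcription API, a client only needs a multipart upload with `file` and `model` fields and reads `text` from the JSON response. A stdlib-only sketch, assuming the server honors that request shape; the base URL and the model name (e.g. `sensevoice-small`) are placeholders, and `build_transcription_request`/`transcribe` are hypothetical helper names.

```python
import json
import urllib.request
import uuid


def build_transcription_request(
    base_url: str, wav_bytes: bytes, model: str, filename: str = "speech.wav"
) -> urllib.request.Request:
    """Build a multipart/form-data POST for /v1/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode(),
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: audio/wav\r\n\r\n").encode(),
        wav_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(base_url: str, wav_bytes: bytes, model: str) -> str:
    """Send the request and return the `text` field of the JSON response."""
    req = build_transcription_request(base_url, wav_bytes, model)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]
```

Since the request shape matches what a Whisper server expects, switching backends could come down to changing the base URL and model name in VoiceFlow's settings.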

## References

- SenseVoice: https://github.com/FunAudioLLM/SenseVoice (8.3K stars)
- FunASR: https://github.com/modelscope/FunASR (16.6K stars)
- Sherpa-ONNX: https://github.com/k2-fsa/sherpa-onnx