Turn your voice into text with a triple-tap — minimal, fast, and macOS-native.
ctrlSPEAK is your set-it-and-forget-it speech-to-text companion. Triple-tap Ctrl, speak your mind, and watch your words appear wherever your cursor blinks — effortlessly copied and pasted. Built for macOS, it's lightweight, low-overhead, and stays out of your way until you call it.
[Demo video: ctrlspeak-demo.mp4]
- 🖥️ Minimal Interface: Runs quietly in the background via the command line
- ⚡ Triple-Tap Magic: Start/stop recording with a quick `Ctrl` triple-tap
- 📋 Auto-Paste: Text lands right where you need it, no extra clicks
- 🔊 Audio Cues: Hear when recording begins and ends
- 🍎 Mac Optimized: Harnesses Apple Silicon's MPS for blazing performance
- 🌟 Top-Tier Models: Powered by NVIDIA NeMo and OpenAI Whisper
- System: macOS 12.3+ (MPS acceleration supported)
- Python: 3.10
- Permissions:
- 🎤 Microphone (for recording)
- ⌨️ Accessibility (for shortcuts)
Grant these on first launch and you're good to go!
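If a permission prompt never appears, or you dismissed it by accident, macOS's built-in `tccutil` can clear the stored decision so you are prompted again. A minimal sketch, assuming you run ctrlSPEAK from a terminal (the permission is attached to the terminal app itself):

```bash
# Clear stored permission decisions so macOS prompts again on next launch.
# Note: this resets the permission for ALL apps, not just your terminal.
tccutil reset Microphone
tccutil reset Accessibility
```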
```bash
# Basic installation (MLX models only)
brew tap patelnav/ctrlspeak
brew install ctrlspeak

# Recommended: Full installation with all model support
brew install ctrlspeak --with-nvidia --with-whisper

# Check what models are available after installation
ctrlspeak --list-models
```

What each option does:
- `--with-nvidia`: Enables NVIDIA Parakeet and Canary models (recommended for best performance)
- `--with-whisper`: Enables OpenAI Whisper models (optional)
If you get "No module named 'nemo'" errors:

```bash
# Reinstall with NVIDIA support
brew reinstall ctrlspeak --with-nvidia
```
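After reinstalling, you can confirm the NVIDIA models are actually available using the `--list-models` flag described above:

```bash
# The Parakeet/Canary (NVIDIA) entries should now appear in the output
ctrlspeak --list-models
```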
For manual installation:

1. Clone the repository:

```bash
git clone https://github.com/patelnav/ctrlspeak.git
cd ctrlspeak
```

2. Create and activate a virtual environment:

```bash
# Create a virtual environment
python -m venv .venv

# Activate it on macOS/Linux
source .venv/bin/activate
```

3. Install dependencies:

```bash
# Install core dependencies
pip install -r requirements.txt

# For NVIDIA model support (optional)
pip install -r requirements-nvidia.txt

# For Whisper model support (optional)
pip install -r requirements-whisper.txt
```

The repository ships three scripts:
- `ctrlspeak.py`: The full-featured star of the show
- `live_transcribe.py`: Continuous transcription for testing vibes
- `test_transcription.py`: Debug or benchmark with ease
- Run ctrlSPEAK in a terminal window:

```bash
# If installed with Homebrew
ctrlspeak

# If installed manually (from the project directory with the venv activated)
python ctrlspeak.py
```
- Triple-tap Ctrl to start recording
- Speak clearly into your microphone
- Triple-tap Ctrl again to stop recording
- The transcribed text will be automatically pasted at your cursor position
ctrlSPEAK uses open-source speech recognition models:
- Parakeet 0.6B (MLX) (default): The `mlx-community/parakeet-tdt-0.6b-v3` model, optimized for Apple Silicon. Recommended for most users on M1/M2/M3 Macs.
- Canary: NVIDIA NeMo's `nvidia/canary-1b-flash` multilingual model (En, De, Fr, Es) with punctuation, but it can be slower. Requires `requirements-nvidia.txt`.
- Canary (180M): NVIDIA NeMo's `nvidia/canary-180m-flash` multilingual model, smaller and less accurate. Requires `requirements-nvidia.txt`.
- Whisper (optional): OpenAI's `openai/whisper-large-v3` model. Fast, accurate, and powerful, with excellent punctuation and capitalization. Requires `requirements-whisper.txt`.

Note: The `nvidia/parakeet-tdt-1.1b` model is also available for testing, but it is not recommended for general use: it lacks punctuation and is slower than the 0.6b model. Requires `requirements-nvidia.txt`.
The models are automatically downloaded from HuggingFace the first time you use them.
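If you want to fetch a model ahead of time (for example, before going offline), the standard Hugging Face CLI can pre-populate the cache. A sketch, assuming ctrlSPEAK reads from the default Hugging Face cache location:

```bash
# Install the Hugging Face CLI and pre-download the default model
pip install -U huggingface_hub
huggingface-cli download mlx-community/parakeet-tdt-0.6b-v3
```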
To see a list of all supported models, use the `--list-models` flag:

```bash
ctrlspeak --list-models
```

This will output a list of the available model aliases and their corresponding Hugging Face model names.
For users on Apple Silicon (M1/M2/M3 Macs), an optimized version of the Parakeet model is available using Apple's MLX framework. This is the default model and provides a significant performance boost.
You can select a model using the --model flag. You can use either the full model name from HuggingFace or a short alias.
Short Names:
- `parakeet`: Parakeet 0.6B optimized for Apple Silicon (MLX). (Default)
- `canary`: NVIDIA's Canary 1B Flash model.
- `canary-180m`: NVIDIA's Canary 180M Flash model.
- `whisper`: OpenAI's Whisper v3 model.

Full Model URL:

You can also provide a full model URL from Hugging Face. For example:

```bash
ctrlspeak --model nvidia/parakeet-tdt-1.1b
```

This will download and use the specified model.
```bash
# Using Homebrew installation
ctrlspeak --model parakeet      # Default
ctrlspeak --model canary        # Multilingual with punctuation
ctrlspeak --model canary-180m   # The smaller Canary model
ctrlspeak --model canary-v2     # Canary 1B v2
ctrlspeak --model whisper       # OpenAI's model
ctrlspeak --model parakeet-mlx  # MLX-accelerated model

# Using manual installation
python ctrlspeak.py --model parakeet
python ctrlspeak.py --model canary
python ctrlspeak.py --model canary-180m
python ctrlspeak.py --model canary-v2
python ctrlspeak.py --model whisper
python ctrlspeak.py --model parakeet-mlx
```

For debugging, you can use the `--debug` flag:

```bash
ctrlspeak --debug
```

Supported models:
- Parakeet 0.6B (NVIDIA) - `nvidia/parakeet-tdt-0.6b-v3` (Default)
- Parakeet 1.1B (NVIDIA) - `nvidia/parakeet-tdt-1.1b`
- Canary 1B Flash (NVIDIA) - `nvidia/canary-1b-flash`
- Canary 180M Flash (NVIDIA) - `nvidia/canary-180m-flash`
- Canary 1B v2 (NVIDIA) - `nvidia/canary-1b-v2`
- Whisper (OpenAI) - `openai/whisper-large-v3`
| Model | Framework | Load Time (s) | Transcription Time (s) | Output Example (test.wav) |
|---|---|---|---|---|
| `parakeet-tdt-0.6b-v3` | MLX (Apple Silicon) | 0.97 | 0.53 | "Well, I don't wish to see it any more, observed Phoebe, turning away her eyes. It is certainly very like the old portrait." |
| `parakeet-tdt-0.6b-v3` | NeMo (NVIDIA) | 15.52 | 1.68 | |
| `parakeet-tdt-0.6b-v2` | MLX (Apple Silicon) | 0.99 | 0.56 | "Well, I don't wish to see it any more, observed Phebe, turning away her eyes. It is certainly very like the old portrait." |
| `parakeet-tdt-0.6b-v2` | NeMo (NVIDIA) | 8.23 | 1.61 | |
| `canary-1b-flash` | NeMo (NVIDIA) | 32.06 | 3.20 | "Well, I don't wish to see it any more, observed Phoebe, turning away her eyes. It is certainly very like the old portrait." |
| `canary-180m-flash` | NeMo (NVIDIA) | 6.16 | 3.20 | "Well, I don't wish to see it any more, observed Phoebe, turning away her eyes. It is certainly very like the old portrait." |
| `whisper-large-v3` | Transformers (OpenAI) | 5.44 | 2.53 | "Well, I don't wish to see it any more, observed Phoebe, turning away her eyes. It is certainly very like the old portrait." |
Testing performed on a MacBook Pro (M2 Max) with a 7-second audio file (test.wav). Your results may vary.
Note: Whisper model uses translate mode to enable proper punctuation and capitalization for English transcription.
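To try reproducing these numbers on your own machine, the bundled `test_transcription.py` script is the natural starting point. The invocation below is illustrative only; the exact arguments are an assumption, so check the script itself for the real interface:

```bash
# Illustrative sketch: the argument names are assumptions, not the script's documented interface
python test_transcription.py test.wav --model parakeet
```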
The app requires:
- Microphone access (for recording audio)
- Accessibility permissions (for global keyboard shortcuts)
You'll be prompted to grant these permissions on first run.
- No sound on recording start/stop: Ensure your system volume is not muted
- Keyboard shortcuts not working: Grant accessibility permissions in System Settings
- Transcription errors: Try speaking more clearly or switching to a different model
- Start sound: "Notification Pluck On" from Pixabay
- Stop sound: "Notification Pluck Off" from Pixabay
This section outlines the steps to create a new release and update the associated Homebrew tap.
1. Prepare the Release:
- Ensure the code is stable and tests pass.
- Update the version number in the following files:
  - `VERSION` (e.g., `1.2.0`)
  - `__init__.py` (`__version__ = "1.2.0"`)
  - `pyproject.toml` (`version = "1.2.0"`)
- Commit these version changes:
```bash
git add VERSION __init__.py pyproject.toml
git commit -m "Bump version to X.Y.Z"
```
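Before committing, a quick check can confirm that all three files agree (assuming the formats shown above):

```bash
# All three lines should print the same version string
cat VERSION
grep '__version__' __init__.py
grep '^version' pyproject.toml
```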
2. Tag and Push:
- Create a git tag matching the version:

```bash
git tag vX.Y.Z
```

- Push the commits and the tag to the remote repository:

```bash
git push && git push origin vX.Y.Z
```
3. Update Homebrew Tap:
- The source code tarball URL is automatically generated based on the tag (usually `https://github.com/<your-username>/ctrlspeak/archive/refs/tags/vX.Y.Z.tar.gz`).
- Download the tarball using its URL and calculate its SHA256 checksum:

```bash
# Replace URL with the actual tarball link based on the tag
curl -sL https://github.com/<your-username>/ctrlspeak/archive/refs/tags/vX.Y.Z.tar.gz | shasum -a 256
```

- Clone or navigate to your Homebrew tap repository (e.g., `../homebrew-ctrlspeak`).
- Edit the formula file (e.g., `Formula/ctrlspeak.rb`); a scripted sketch follows this list:
  - Update the `url` line with the tag tarball URL.
  - Update the `sha256` line with the checksum you calculated.
  - Optional: Update the `version` line if necessary (though it's often inferred).
  - Optional: If `requirements.txt` or dependencies changed, update the `depends_on` and `install` steps accordingly.
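If you script the formula edit, something like the following can rewrite both lines in place. This is a sketch, not part of the release tooling; it assumes the formula keeps `url` and `sha256` on their own two-space-indented lines, as Homebrew formulas conventionally do:

```bash
# Hypothetical helper: fill in VERSION and SHA from the steps above
VERSION="X.Y.Z"
SHA="<checksum from the shasum command above>"
sed -i '' \
  -e "s|^  url .*|  url \"https://github.com/<your-username>/ctrlspeak/archive/refs/tags/v${VERSION}.tar.gz\"|" \
  -e "s|^  sha256 .*|  sha256 \"${SHA}\"|" \
  Formula/ctrlspeak.rb
```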
- Commit and push the changes in the tap repository:

```bash
cd ../path/to/homebrew-ctrlspeak  # Or wherever your tap repo is
git add Formula/ctrlspeak.rb
git commit -m "Update ctrlspeak to vX.Y.Z"
git push
```
4. Verify (Optional):
- Run `brew update` locally to fetch the updated formula.
- Run `brew upgrade ctrlspeak` to install the new version.
- Test the installed version.
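For example, a quick smoke test after updating the tap (using the `--list-models` flag documented above):

```bash
brew update
brew upgrade ctrlspeak
ctrlspeak --list-models  # confirms the new install launches and finds its models
```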