
🎙️ ctrlSPEAK

Homebrew · Python 3.11+ · License: MIT

Turn your voice into text with a triple-tap — minimal, fast, and macOS-native.

🚀 Overview

ctrlSPEAK is your set-it-and-forget-it speech-to-text companion. Triple-tap Ctrl, speak your mind, and watch your words appear wherever your cursor blinks — effortlessly copied and pasted. Built for macOS, it's lightweight, low-overhead, and stays out of your way until you call it.

Demo video: ctrlspeak-demo.mp4

✨ Features

  • 🖥️ Minimal Interface: Runs quietly in the background via the command line
  • 👆 Triple-Tap Magic: Start/stop recording with a quick Ctrl triple-tap (see the sketch after this list)
  • 📋 Auto-Paste: Text lands right where you need it, no extra clicks
  • 🔊 Audio Cues: Hear when recording begins and ends
  • 🍎 Mac Optimized: Harnesses Apple Silicon's MPS for blazing performance
  • 🌟 Top-Tier Models: Powered by NVIDIA NeMo and OpenAI Whisper
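Under the hood, the triple-tap is just a global key listener with a short time window. A minimal sketch of the idea using pynput (an illustration, not the exact ctrlspeak.py implementation):

# Toggle on three Ctrl presses within a one-second window.
# Requires Accessibility permission on macOS, like ctrlSPEAK itself.
import time
from pynput import keyboard

TAP_WINDOW = 1.0  # seconds within which three Ctrl taps must land
taps = []

def on_press(key):
    if key in (keyboard.Key.ctrl, keyboard.Key.ctrl_l, keyboard.Key.ctrl_r):
        now = time.monotonic()
        taps.append(now)
        # Drop taps that fell out of the window
        while taps and now - taps[0] > TAP_WINDOW:
            taps.pop(0)
        if len(taps) >= 3:
            taps.clear()
            print("Triple-tap detected: toggle recording")

with keyboard.Listener(on_press=on_press) as listener:
    listener.join()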

🛠️ Get Started

  • System: macOS 12.3+ (MPS acceleration supported; see the quick check after this list)
  • Python: 3.10
  • Permissions:
    • 🎤 Microphone (for recording)
    • ⌨️ Accessibility (for shortcuts)
      Grant these on first launch and you're good to go!
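To confirm MPS acceleration is actually available on your machine, here is a quick check from Python (assuming PyTorch is installed, as it is with the NVIDIA and Whisper extras):

# Prints True on Apple Silicon with macOS 12.3+ and a recent PyTorch.
import torch
print("MPS available:", torch.backends.mps.is_available())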

📦 Installation

Using Homebrew (Recommended)

# Basic installation (MLX models only)
brew tap patelnav/ctrlspeak
brew install ctrlspeak

# Recommended: Full installation with all model support
brew install ctrlspeak --with-nvidia --with-whisper

# Check what models are available after installation
ctrlspeak --list-models

What each option does:

  • --with-nvidia: Enables NVIDIA Parakeet and Canary models (recommended for best performance)
  • --with-whisper: Enables OpenAI Whisper models (optional)

If you get "No module named 'nemo'" errors:

# Reinstall with NVIDIA support
brew reinstall ctrlspeak --with-nvidia

Manual Installation

Clone the repository:

git clone https://github.com/patelnav/ctrlspeak.git
cd ctrlspeak

Create and activate a virtual environment:

# Create a virtual environment
python -m venv .venv

# Activate it on macOS/Linux
source .venv/bin/activate

Install dependencies:

# Install core dependencies
pip install -r requirements.txt

# For NVIDIA model support (optional)
pip install -r requirements-nvidia.txt

# For Whisper model support (optional)
pip install -r requirements-whisper.txt

🧰 Entry Points

  • ctrlspeak.py: The full-featured star of the show
  • live_transcribe.py: Continuous transcription for testing vibes
  • test_transcription.py: Debug or benchmark with ease

Workflow

  1. Run ctrlSPEAK in a terminal window:
    # If installed with Homebrew
    ctrlspeak
    
    # If installed manually (from the project directory with activated venv)
    python ctrlspeak.py
  2. Triple-tap Ctrl to start recording
  3. Speak clearly into your microphone
  4. Triple-tap Ctrl again to stop recording
  5. The transcribed text will be automatically pasted at your cursor position (a sketch of this paste step follows)
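Step 5 relies on the clipboard plus a synthesized Cmd+V. A hedged sketch of how such a paste step can work on macOS (illustrative only, not the exact ctrlSPEAK code path):

# Copy text with pbcopy, then simulate Cmd+V at the current cursor.
import subprocess
from pynput.keyboard import Controller, Key

def paste_text(text):
    subprocess.run(["pbcopy"], input=text.encode("utf-8"), check=True)
    kb = Controller()
    with kb.pressed(Key.cmd):  # hold Cmd while tapping V
        kb.press("v")
        kb.release("v")

paste_text("hello from ctrlSPEAK")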

Models

ctrlSPEAK uses open-source speech recognition models:

  • Parakeet 0.6B (MLX) (default): mlx-community/parakeet-tdt-0.6b-v3 model optimized for Apple Silicon. Recommended for most users on M1/M2/M3 Macs.
  • Canary: NVIDIA NeMo's nvidia/canary-1b-flash multilingual model (En, De, Fr, Es) with punctuation support, though it can be slower. Requires requirements-nvidia.txt.
  • Canary (180M): NVIDIA NeMo's nvidia/canary-180m-flash multilingual model, smaller and less accurate. Requires requirements-nvidia.txt.
  • Whisper (optional): OpenAI's openai/whisper-large-v3 model. A fast, accurate, and powerful model that includes excellent punctuation and capitalization. Requires requirements-whisper.txt.

Note: The nvidia/parakeet-tdt-1.1b model is also available for testing, but it is not recommended for general use as it lacks punctuation and is slower than the 0.6b model. Requires requirements-nvidia.txt.

The models are automatically downloaded from HuggingFace the first time you use them.
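If you prefer to warm the cache ahead of time (for example, before going offline), the same download can be triggered manually with huggingface_hub; a sketch, using the default MLX model as an example:

# Pre-fetch the default model into the local Hugging Face cache.
from huggingface_hub import snapshot_download

snapshot_download("mlx-community/parakeet-tdt-0.6b-v3")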

Listing Supported Models

To see a list of all supported models, use the --list-models flag:

ctrlspeak --list-models

This will output a list of the available model aliases and their corresponding Hugging Face model names.

Apple Silicon (MLX) Acceleration

For users on Apple Silicon (M1/M2/M3 Macs), an optimized version of the Parakeet model is available using Apple's MLX framework. This is the default model and provides a significant performance boost.

Model Selection

You can select a model using the --model flag. You can use either the full model name from HuggingFace or a short alias; a sketch of the alias lookup follows the list below.

Short Names:

  • parakeet: Parakeet 0.6B optimized for Apple Silicon (MLX). (Default)
  • canary: NVIDIA's Canary 1B Flash model.
  • canary-180m: NVIDIA's Canary 180M Flash model.
  • whisper: OpenAI's Whisper v3 model.
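Conceptually, alias resolution is a lookup from short name to Hugging Face repo, falling back to the raw name so full model URLs keep working. A hypothetical sketch (the real table lives in the ctrlSPEAK source):

# Illustrative alias table; repo names taken from this README.
MODEL_ALIASES = {
    "parakeet": "mlx-community/parakeet-tdt-0.6b-v3",
    "canary": "nvidia/canary-1b-flash",
    "canary-180m": "nvidia/canary-180m-flash",
    "whisper": "openai/whisper-large-v3",
}

def resolve_model(name):
    # Unknown names pass through unchanged (e.g. nvidia/parakeet-tdt-1.1b).
    return MODEL_ALIASES.get(name, name)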

Full Model URL:

You can also provide a full model URL from Hugging Face. For example:

ctrlspeak --model nvidia/parakeet-tdt-1.1b

This will download and use the specified model.

# Using Homebrew installation
ctrlspeak --model parakeet       # Default (MLX, Apple Silicon)
ctrlspeak --model canary         # Multilingual with punctuation
ctrlspeak --model canary-180m    # The smaller Canary model
ctrlspeak --model canary-v2      # The newer Canary 1B v2 model
ctrlspeak --model whisper        # OpenAI's model
ctrlspeak --model parakeet-mlx   # MLX-accelerated Parakeet

# Using manual installation
python ctrlspeak.py --model parakeet
python ctrlspeak.py --model canary
python ctrlspeak.py --model canary-180m
python ctrlspeak.py --model canary-v2
python ctrlspeak.py --model whisper
python ctrlspeak.py --model parakeet-mlx

For debugging, you can use the --debug flag:

ctrlspeak --debug

Models Tested

  1. Parakeet 0.6B (NVIDIA) - nvidia/parakeet-tdt-0.6b-v3 (default, used via its MLX conversion)
  2. Parakeet 1.1B (NVIDIA) - nvidia/parakeet-tdt-1.1b
  3. Canary (NVIDIA) - nvidia/canary-1b-flash
  4. Canary (NVIDIA) - nvidia/canary-180m-flash
  5. Canary (NVIDIA) - nvidia/canary-1b-v2
  6. Whisper (OpenAI) - openai/whisper-large-v3

Performance Comparison

| Model | Framework | Load Time (s) | Transcription Time (s) | Output Example (test.wav) |
| --- | --- | --- | --- | --- |
| parakeet-tdt-0.6b-v3 | MLX (Apple Silicon) | 0.97 | 0.53 | "Well, I don't wish to see it any more, observed Phoebe, turning away her eyes. It is certainly very like the old portrait." |
| parakeet-tdt-0.6b-v3 | NeMo (NVIDIA) | 15.52 | 1.68 | |
| parakeet-tdt-0.6b-v2 | MLX (Apple Silicon) | 0.99 | 0.56 | "Well, I don't wish to see it any more, observed Phebe, turning away her eyes. It is certainly very like the old portrait." |
| parakeet-tdt-0.6b-v2 | NeMo (NVIDIA) | 8.23 | 1.61 | |
| canary-1b-flash | NeMo (NVIDIA) | 32.06 | 3.20 | "Well, I don't wish to see it any more, observed Phoebe, turning away her eyes. It is certainly very like the old portrait." |
| canary-180m-flash | NeMo (NVIDIA) | 6.16 | 3.20 | "Well, I don't wish to see it any more, observed Phoebe, turning away her eyes. It is certainly very like the old portrait." |
| whisper-large-v3 | Transformers (OpenAI) | 5.44 | 2.53 | "Well, I don't wish to see it any more, observed Phoebe, turning away her eyes. It is certainly very like the old portrait." |

Testing performed on a MacBook Pro (M2 Max) with a 7-second audio file (test.wav). Your results may vary.

Note: The Whisper model uses translate mode to enable proper punctuation and capitalization for English transcription.
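In transformers terms, that corresponds to passing the translate task at generation time. A sketch of the idea (illustrative, assuming requirements-whisper.txt is installed; not the exact ctrlSPEAK code path):

# Run Whisper with the translate task for punctuated English output.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")
result = asr("test.wav", generate_kwargs={"task": "translate"})
print(result["text"])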

Permissions

The app requires:

  • Microphone access (for recording audio)
  • Accessibility permissions (for global keyboard shortcuts)

You'll be prompted to grant these permissions on first run.

Troubleshooting

  • No sound on recording start/stop: Ensure your system volume is not muted
  • Keyboard shortcuts not working: Grant accessibility permissions in System Settings
  • Transcription errors: Try speaking more clearly or switching to a different model

Credits

Sound Effects

License

MIT License

Release Process

This outlines the steps to create a new release and update the associated Homebrew tap.

1. Prepare the Release:

  • Ensure the code is stable and tests pass.
  • Update the version number in the following files:
    • VERSION (e.g., 1.2.0)
    • __init__.py (__version__ = "1.2.0")
    • pyproject.toml (version = "1.2.0")
  • Commit these version changes:
    git add VERSION __init__.py pyproject.toml
    git commit -m "Bump version to X.Y.Z"
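Because the version string lives in three files, it is easy for them to drift. A small hypothetical pre-tag sanity check (file layout assumed from the list above):

# check_version.py - confirm VERSION, __init__.py, and pyproject.toml agree.
from pathlib import Path

version = Path("VERSION").read_text().strip()
init_text = Path("__init__.py").read_text()
pyproject_text = Path("pyproject.toml").read_text()

assert f'__version__ = "{version}"' in init_text, "__init__.py is out of date"
assert f'version = "{version}"' in pyproject_text, "pyproject.toml is out of date"
print(f"All version strings agree: {version}")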

2. Tag and Push:

  • Create a git tag matching the version:
    git tag vX.Y.Z
  • Push the commits and the tag to the remote repository:
    git push && git push origin vX.Y.Z

3. Update Homebrew Tap:

  • The source code tarball URL is automatically generated based on the tag (usually https://github.com/<your-username>/ctrlspeak/archive/refs/tags/vX.Y.Z.tar.gz).
  • Download the tarball using its URL and calculate its SHA256 checksum:
    # Replace URL with the actual tarball link based on the tag
    curl -sL https://github.com/<your-username>/ctrlspeak/archive/refs/tags/vX.Y.Z.tar.gz | shasum -a 256
  • Clone or navigate to your Homebrew tap repository (e.g., ../homebrew-ctrlspeak).
  • Edit the formula file (e.g., Formula/ctrlspeak.rb):
    • Update the url line with the tag tarball URL.
    • Update the sha256 line with the checksum you calculated.
    • Optional: Update the version line if necessary (though it's often inferred).
    • Optional: If requirements.txt or dependencies changed, update the depends_on and install steps accordingly.
  • Commit and push the changes in the tap repository:
    cd ../path/to/homebrew-ctrlspeak # Or wherever your tap repo is
    git add Formula/ctrlspeak.rb
    git commit -m "Update ctrlspeak to vX.Y.Z"
    git push

4. Verify (Optional):

  • Run brew update locally to fetch the updated formula.
  • Run brew upgrade ctrlspeak to install the new version.
  • Test the installed version.
