🎙️ Simple Speeh To Text With AI

A modern, feature-rich audio transcription application powered by OpenAI Whisper with real-time audio monitoring and intelligent state management.

✨ Features

Real-time Audio Testing: Test your microphone with live audio level visualization
Multiple Audio Devices: Select from all available input devices with detailed information
Global Hotkey Support: Use Ctrl+Shift+Space to start/stop recording from anywhere
Multiple Whisper Models: Choose from tiny, base, small, medium, large models
Language Selection: Auto-detect or specify target language for better accuracy
Clean State Management: No audio conflicts between testing and recording
Text History: View all transcriptions with timestamps in a scrollable display
Clipboard Integration: Automatic copying to clipboard with manual copy option
Professional UI: Clean, modern interface with status indicators and error handling

🖼️ Application Demo

What you see in the interface:

Status Bar (top): Shows current state ("Idle") and global hotkey (Ctrl+Shift+Space)
Audio Device Section:
- Device dropdown showing "Headset (Boe) (2)" - your selected microphone
- Device info displaying "Channels: 1, Sample Rate: 44100 Hz"
- "Test Audio" button for real-time microphone testing
- Audio level indicator (shows "--" when not testing)
Model Selection:
- Language: Set to "Autodetect" (or choose specific language)
- Model: Set to "base" (balance of speed and accuracy)
Control Buttons:
- Start Recording: Begin voice capture
- Clear Text: Clear transcription history
- Copy to Clipboard: Manually copy all text
- Refresh Devices: Update audio device list
Transcription Display: Shows your complete session history including:
- Welcome message with usage instructions
- Device switching notifications (blue text)
- Recording sessions with timestamps [14:17:46] and [14:18:00]
- Audio statistics (Duration, Max level, RMS) for debugging
- Transcription results showing your actual spoken words
- Error handling suggestions when audio issues occur

📋 Requirements

Python: 3.13 or higher
Operating System: Windows, macOS, or Linux
Audio: Working microphone/audio input device
CUDA (optional): For GPU acceleration on NVIDIA cards

🚀 Installation

Install uv (if not already installed):

# On Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# On macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

Clone and setup:

git clone <your-repo-url>
cd transcribe-whisper
uv sync

Run the application:
```
python main.py
```

That's it! The application will automatically download the required Whisper models on first use.

🎯 How It Works

Step-by-Step Workflow

Launch & Setup:
- Run python main.py to start the application
- The app automatically detects your audio devices
- Select your preferred microphone from the dropdown
Test Your Audio (Recommended):
- Click "Test Audio" to verify your microphone
- Speak normally and watch for audio level bars
- This prevents "no audio detected" issues later
Choose Your Settings:
- Language: Use "Autodetect" or specify your language
- Model: base is good for most users (larger = more accurate but slower)
Start Recording:
- Method 1: Press Ctrl+Shift+Space (works from any application)
- Method 2: Click "Start Recording" button
- Status changes to red "Listening..." when active
Speak Your Content:
- Speak clearly and naturally
- The app shows which device is recording
- Press the hotkey or click "Stop Recording" when done
Get Your Transcription:
- Status shows "Transcribing..." while processing
- Your text appears with timestamp in the display area
- Text is automatically copied to your clipboard
- You can see audio quality stats for debugging

Interface Features Explained

Real-time Feedback: The demo shows multiple recording sessions with timestamps, demonstrating how the app tracks your transcription history.

Error Prevention: Notice how the app shows audio statistics ("Duration: 6.5s, Max: 0.000") - this helps diagnose microphone issues before they become problems.

Device Management: The demo shows device switching notifications, proving the app handles audio device changes gracefully.

Professional Output: Each transcription includes timing, device info, and quality metrics - perfect for debugging and professional use.

🔧 Configuration

Customize the app by modifying TranscriberConfig in main.py:

config = TranscriberConfig(
    hotkey="ctrl+alt+r",          # Change global hotkey
    default_model="large",         # Use larger model by default
    test_duration=3,              # Shorter audio tests
    min_recording_duration=1.0,   # Require longer recordings
)

🐛 Troubleshooting

No audio detected?

Use "Test Audio" first - you should see level bars when speaking
Check microphone permissions in your OS
Try different devices from the dropdown
Look for audio stats showing "Max: 0.000" (indicates silence)

Poor transcription quality?

Use larger Whisper models (small, medium, large)
Improve audio environment (quiet room, good microphone)
Specify your language instead of auto-detect
Check the audio statistics - low RMS values indicate poor audio

App conflicts?

The modern architecture prevents audio device conflicts
State management ensures testing and recording don't interfere
Error messages in the text display guide you to solutions

📚 Technical Excellence

Built with modern Python architecture:

Pydantic models for configuration and validation
Thread-safe audio operations with proper locking
Clean state management preventing conflicts
Type hints throughout for better development experience
Modular design for easy maintenance and extension

Ready to transcribe? Just uv sync and python main.py! 🎉

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
.vscode		.vscode
img		img
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
main.py		main.py
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
ruff.toml		ruff.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

🎙️ Simple Speeh To Text With AI

✨ Features

🖼️ Application Demo

📋 Requirements

🚀 Installation

🎯 How It Works

Step-by-Step Workflow

Interface Features Explained

🔧 Configuration

🐛 Troubleshooting

📚 Technical Excellence

About

Uh oh!

Releases 1

Packages

Uh oh!

Languages

License

zhangxingeng/Easy-AI-speech-to-text

Folders and files

Latest commit

History

Repository files navigation

🎙️ Simple Speeh To Text With AI

✨ Features

🖼️ Application Demo

📋 Requirements

🚀 Installation

🎯 How It Works

Step-by-Step Workflow

Interface Features Explained

🔧 Configuration

🐛 Troubleshooting

📚 Technical Excellence

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Languages

Packages