live speech-2-text recognition is a real-time speech-to-text captioning system built on a Wav2Vec model fine-tuned on the LJSpeech dataset. The project aims to provide accurate live transcriptions, making it well suited for accessibility features, live events, and other applications where real-time captioning is essential.
To set up the project, follow these steps:
- Clone the repository and install the dependencies: `pip install -r requirements.txt`
- Download the Model and Processor
If you prefer to train the model yourself, run `cd src` followed by `python3 train.py`. This script trains the Wav2Vec model on the LJSpeech dataset and saves the fine-tuned model to the specified directory.
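The fine-tuning details live in `train.py`; as a rough illustration of one common piece of such a pipeline, here is a sketch of building a character-level vocabulary for the CTC head that Wav2Vec 2.0 fine-tuning uses. The function name and details below are illustrative, not the actual `train.py` code:

```python
def build_ctc_vocab(transcripts):
    """Map each character in the corpus to an integer ID.

    Common CTC conventions: word boundaries use '|' instead of ' ',
    and index 0 is reserved for the CTC blank token.
    """
    chars = set()
    for text in transcripts:
        chars.update(text.lower().replace(" ", "|"))
    vocab = {"<blank>": 0}
    for i, ch in enumerate(sorted(chars), start=1):
        vocab[ch] = i
    return vocab

# Toy LJSpeech-style transcripts:
vocab = build_ctc_vocab(["Printing, in the only sense", "with which we are concerned"])
print(len(vocab))  # blank token plus one ID per distinct character
```

The blank token is what lets CTC align variable-length audio frames to the much shorter character sequence during training.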
To process transcriptions and evaluate them:
- Process Transcriptions: run `python3 process.py` to process the transcriptions using both the default and fine-tuned models.
- Evaluate Transcriptions: run `python3 check.py` to calculate the BLEU, WER, and CER scores for the transcriptions placed in the `src/` directory.
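When comparing the default and fine-tuned models' outputs, transcripts are typically normalized first so that scores are not skewed by case or punctuation differences. A minimal sketch of such a step (illustrative, not the actual `process.py` code):

```python
import re

def normalize(text):
    """Lowercase, strip punctuation/digits, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z' ]", " ", text)  # keep letters, apostrophes, spaces
    return " ".join(text.split())          # collapse runs of whitespace

print(normalize("Speak now!  It's LIVE."))  # "speak now it's live"
```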
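WER and CER are both edit-distance metrics: the number of insertions, deletions, and substitutions needed to turn the hypothesis into the reference, divided by the reference length, counted over words for WER and characters for CER. A minimal self-contained sketch of how they can be computed (not the actual `check.py` implementation):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    m, n = len(ref), len(hyp)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[m][n]

def wer(ref, hyp):
    """Word error rate: word-level edits over reference word count."""
    words = ref.split()
    return edit_distance(words, hyp.split()) / len(words)

def cer(ref, hyp):
    """Character error rate: character-level edits over reference length."""
    return edit_distance(ref, hyp) / len(ref)

print(round(wer("speak now please", "speak now"), 2))  # one deletion over three words
```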
For live speech recognition:
Run `python3 speech.py`. Ensure your microphone is set up and calibrated when prompted, then speak after the "Speak now!" prompt for live captioning.
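Live captioning generally works by buffering microphone samples into fixed-size windows and transcribing each window as it fills. The sketch below shows only that buffering step in pure Python, independent of any audio library; it is an illustration of the idea, not `speech.py`'s actual capture code:

```python
def chunk_samples(samples, chunk_size=16000):
    """Split a stream of audio samples into fixed-size chunks.

    At 16 kHz (the sample rate Wav2Vec 2.0 expects), chunk_size=16000
    yields one-second windows. The trailing partial chunk is kept so
    the final words of an utterance are not dropped.
    """
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]

# Toy stream: 10 samples split into windows of 4.
chunks = list(chunk_samples(list(range(10)), chunk_size=4))
print([len(c) for c in chunks])  # [4, 4, 2]
```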
Acknowledgements:
- Facebook AI Research for the Wav2Vec 2.0 model.
- The LJSpeech dataset contributors.