Closed
Labels: bug (Something isn't working), speech-to-text (issues related to transcription/asr)
Description
Transcribed this video using Parakeet v2 and v3.
# Clone repo
git clone https://github.com/FluidInference/FluidAudio.git
cd FluidAudio
# Download and convert to 16 kHz 16-bit mono WAV
uvx "yt-dlp[default]" -f bestaudio GT_sXIUJPUo -o "temp_GT_sXIUJPUo.%(ext)s"
ffmpeg -i "temp_GT_sXIUJPUo."* -acodec pcm_s16le -ac 1 -ar 16000 cowen.wav
rm "temp_GT_sXIUJPUo."*
# Build
swift build -c release
# Transcribe
swift run fluidaudio transcribe cowen.wav --model-version v2
swift run fluidaudio transcribe cowen.wav --model-version v3
I also transcribed the same file using parakeet-mlx to compare.
All results can be found here: cowen_results.zip
Inspected the first ~1.5 pages of all transcripts and noticed:
- Fluid Audio parakeet v3 drops a lot of sentences. See fluid_audio_parakeet_v3/cowen_fluidaudio_parakeet_v3_errors.pdf in the zip file.
- Fluid Audio parakeet v2 still drops sentences but far fewer (only 1 vs 8 with v3). See fluid_audio_parakeet_v2/cowen_fluidaudio_parakeet_v2_errors.pdf in the zip file.
- parakeet-mlx drops no sentences at all. See parakeet_mlx_parakeet_v3/cowen_parakeet_mlx_parakeet_v3 in the zip file.
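The comparison above was done by eye on the first ~1.5 pages. A rough way to extend it to the whole file could look like the sketch below. It treats one transcript as the reference and flags reference sentences whose words are mostly missing from the other transcript. This is not part of FluidAudio or parakeet-mlx: the function names, the naive sentence splitter and the 0.5 coverage threshold are all assumptions made for illustration, and it uses only Python's standard `difflib`.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> list[str]:
    """Lowercase and keep only word tokens, so punctuation and casing are ignored."""
    return re.findall(r"[a-z0-9']+", text.lower())


def split_sentences(text: str) -> list[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def dropped_sentences(reference: str, candidate: str, min_coverage: float = 0.5) -> list[str]:
    """Return reference sentences whose words are mostly unmatched in the candidate.

    min_coverage is a hypothetical threshold: a sentence counts as dropped when
    fewer than this fraction of its words align with the candidate transcript.
    """
    ref_sentences = split_sentences(reference)

    # Flatten the reference into one word list, remembering each sentence's span.
    ref_words: list[str] = []
    spans: list[tuple[int, int]] = []
    for sentence in ref_sentences:
        words = normalize(sentence)
        spans.append((len(ref_words), len(ref_words) + len(words)))
        ref_words.extend(words)

    cand_words = normalize(candidate)

    # Align the two word sequences; autojunk=False avoids discarding frequent words.
    matcher = SequenceMatcher(None, ref_words, cand_words, autojunk=False)
    matched = [False] * len(ref_words)
    for block in matcher.get_matching_blocks():
        for i in range(block.a, block.a + block.size):
            matched[i] = True

    dropped = []
    for sentence, (start, end) in zip(ref_sentences, spans):
        if end == start:
            continue  # sentence had no word tokens
        coverage = sum(matched[start:end]) / (end - start)
        if coverage < min_coverage:
            dropped.append(sentence)
    return dropped


if __name__ == "__main__":
    # Toy strings, not taken from the actual transcripts.
    reference = "Hello and welcome to the show. Today I am talking with a guest. Let us get started."
    candidate = "Hello and welcome to the show. Let us get started."
    for sentence in dropped_sentences(reference, candidate):
        print("possibly dropped:", sentence)
```

To use it on real output, read the parakeet-mlx transcript as `reference` and a FluidAudio transcript as `candidate`. The word alignment can get slow on very long transcripts, and genuine recognition differences (not just dropped audio) will also lower coverage, so flagged sentences still need a manual look.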
Wondering if this is a known issue? Or is it just this one audio file causing a malfunction somehow?
Unfortunate, because the FluidAudio parakeet-tdt implementations are otherwise the fastest on Apple Silicon, faster than parakeet-mlx. Hopefully fixable; I will try to understand the model architecture and code and see if I'm able to find the cause as well.