Commit f50c4f2

docs: updated README to specify translation model limitation (#2547)
Updated README given info from #2483
1 parent 8689924 commit f50c4f2

1 file changed: +18 -8 lines changed

README.md

Lines changed: 18 additions & 8 deletions
@@ -77,25 +77,35 @@ Whisper's performance varies widely depending on the language. The figure below
 
 ![WER breakdown by language](https://github.com/openai/whisper/assets/266841/f4619d66-1058-4005-8f67-a9d811b77c62)
 
-
-
 ## Command-line usage
 
 The following command will transcribe speech in audio files, using the `turbo` model:
 
-whisper audio.flac audio.mp3 audio.wav --model turbo
+```bash
+whisper audio.flac audio.mp3 audio.wav --model turbo
+```
+
+The default setting (which selects the `turbo` model) works well for transcribing English. However, **the `turbo` model is not trained for translation tasks**. If you need to **translate non-English speech into English**, use one of the **multilingual models** (`tiny`, `base`, `small`, `medium`, `large`) instead of `turbo`.
+
+For example, to transcribe an audio file containing non-English speech, you can specify the language:
 
-The default setting (which selects the `turbo` model) works well for transcribing English. To transcribe an audio file containing non-English speech, you can specify the language using the `--language` option:
+```bash
+whisper japanese.wav --language Japanese
+```
 
-whisper japanese.wav --language Japanese
+To **translate** speech into English, use:
 
-Adding `--task translate` will translate the speech into English:
+```bash
+whisper japanese.wav --model medium --language Japanese --task translate
+```
 
-whisper japanese.wav --language Japanese --task translate
+> **Note:** The `turbo` model will return the original language even if `--task translate` is specified. Use `medium` or `large` for the best translation results.
 
 Run the following to view all available options:
 
-whisper --help
+```bash
+whisper --help
+```
 
 See [tokenizer.py](https://github.com/openai/whisper/blob/main/whisper/tokenizer.py) for the list of all available languages.
 
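
The rule this change documents (keep `turbo` for transcription, switch to a multilingual model for `--task translate`) could be captured in a small wrapper script. The sketch below is only an illustration: `pick_model` is a hypothetical helper, not part of the `whisper` CLI, and choosing `medium` for translation is an assumption based on the note above, which recommends `medium` or `large`.

```bash
# pick_model is a hypothetical helper, not something the whisper CLI provides.
# It maps a task name to a model name following the README guidance:
#   transcribe -> turbo   (the default model)
#   translate  -> medium  (assumed choice; turbo is not trained for translation)
pick_model() {
  task="${1:-transcribe}"   # default to transcription when no task is given
  case "$task" in
    translate)  echo "medium" ;;
    transcribe) echo "turbo" ;;
    *)          echo "unknown task: $task" >&2; return 1 ;;
  esac
}

# Example invocation (requires whisper to be installed and the audio file to exist):
# whisper japanese.wav --model "$(pick_model translate)" --language Japanese --task translate
```

Keeping the mapping in one function means a script that sometimes transcribes and sometimes translates does not accidentally pass `--task translate` to `turbo`, which, per the note in this change, would return text in the original language.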
