Commit 1e8aeb4

Add voice and transcriber fallback docs (#847)
* docs: add transcriber fallback configuration guide

  Introduced a new documentation file detailing the configuration of fallback transcribers for speech-to-text services. The guide covers the benefits, setup instructions via both the dashboard and API, provider-specific settings, best practices, and FAQs to ensure call continuity during provider outages.

* docs: update voice fallback configuration guide

  Revised the documentation for voice fallback configuration, including a title change, expanded sections on configuration via dashboard and API, and added provider-specific settings. Enhanced clarity with new best practices and FAQs to improve user understanding and implementation of fallback voices.

* docs: update navigation for voice and transcriber fallback configurations

  Renamed "Voice fallback plan" to "Voice fallback configuration" and added a new entry for "Transcriber fallback configuration" in the documentation navigation. This enhances clarity and accessibility for users seeking information on fallback configurations.
1 parent 498a17c commit 1e8aeb4

File tree: 3 files changed (+324 −28 lines changed)

fern/customization/transcriber-fallback-plan.mdx

Lines changed: 174 additions & 0 deletions
@@ -0,0 +1,174 @@
---
title: Transcriber fallback configuration
subtitle: Configure fallback transcribers that activate automatically if your primary transcriber fails.
slug: customization/transcriber-fallback-plan
---

## Overview

Transcriber fallback configuration ensures your calls continue even if your primary speech-to-text provider experiences issues. Your assistant will sequentially fall back to the transcribers you configure, in the exact order you specify.

**Key benefits:**
- **Call continuity** during provider outages
- **Automatic failover** with no user intervention required
- **Provider diversity** to protect against single points of failure

<Note>
Without a fallback plan configured, your call will end with an error if your chosen transcription provider fails.
</Note>

## How it works

When a transcriber failure occurs, Vapi will:
1. Detect the failure of the primary transcriber
2. Switch to the first fallback transcriber in your plan
3. Continue through your specified list if subsequent failures occur
4. Terminate only if all transcribers in your plan have failed

## Configure via Dashboard

<Steps>
  <Step title="Open Transcriber tab">
    Navigate to your assistant and select the **Transcriber** tab.
  </Step>
  <Step title="Expand Fallback Transcribers section">
    Scroll down to find the **Fallback Transcribers** collapsible section. A warning indicator appears if no fallback transcribers are configured.
  </Step>
  <Step title="Add a fallback transcriber">
    Click **Add Fallback Transcriber** to configure your first fallback:
    - Select a **provider** from the dropdown
    - Choose a **model** (if the provider offers multiple models)
    - Select a **language** for transcription
  </Step>
  <Step title="Configure provider-specific settings (optional)">
    Expand **Additional Configuration** to access provider-specific settings like numerals formatting, VAD settings, and confidence thresholds.
  </Step>
  <Step title="Add more fallbacks">
    Repeat to add additional fallback transcribers. Order matters—the first fallback in your list is tried first.
  </Step>
</Steps>

<Note>
If HIPAA or PCI compliance is enabled on your account or assistant, only **Deepgram** and **Azure** transcribers will be available as fallback options.
</Note>

## Configure via API

Add the `fallbackPlan` property to your assistant's transcriber configuration, and specify the fallback transcribers within the `transcribers` property.

```json
{
  "transcriber": {
    "provider": "deepgram",
    "model": "nova-3",
    "language": "en",
    "fallbackPlan": {
      "transcribers": [
        {
          "provider": "assembly-ai",
          "speechModel": "universal-streaming-multilingual",
          "language": "en"
        },
        {
          "provider": "azure",
          "language": "en-US"
        }
      ]
    }
  }
}
```

## Provider-specific settings

Each transcriber provider supports different configuration options. Expand the accordion below to see available settings for each provider.

<AccordionGroup>
  <Accordion title="Deepgram">
    - **model**: Model selection (`nova-3`, `nova-3-general`, `nova-3-medical`, `nova-2`, `flux-general-en`, etc.).
    - **language**: Language code for transcription.
    - **keywords**: Keywords with optional boost values for improved recognition (e.g., `["companyname", "productname:2"]`).
    - **keyterm**: Keyterm prompting for up to 90% keyword recall rate improvement.
    - **smartFormat** (boolean): Enable smart formatting for numbers and dates.
    - **eotThreshold** (0.5-0.9): End-of-turn confidence threshold. Only available with Flux models.
    - **eotTimeoutMs** (500-10000): Maximum time to wait after speech before finalizing a turn. Only available with Flux models. Default is 5000ms.
  </Accordion>
  <Accordion title="AssemblyAI">
    - **language**: Language code (`multi` for multilingual, `en` for English).
    - **speechModel**: Streaming speech model (`universal-streaming-english` or `universal-streaming-multilingual`).
    - **wordBoost**: Custom vocabulary array (up to 2500 characters total).
    - **keytermsPrompt**: Array of keyterms for improved recognition (up to 100 terms, 50 characters each). Costs an additional $0.04/hour.
    - **endUtteranceSilenceThreshold**: Duration of silence in milliseconds to detect end of utterance.
    - **disablePartialTranscripts** (boolean): Set to `true` to disable partial transcripts.
    - **confidenceThreshold** (0-1): Minimum confidence threshold for accepting transcriptions. Default is 0.4.
    - **vadAssistedEndpointingEnabled** (boolean): Enable VAD-based endpoint detection.
  </Accordion>
  <Accordion title="Azure">
    - **language**: Language code in BCP-47 format (e.g., `en-US`, `es-MX`, `fr-FR`).
    - **segmentationSilenceTimeoutMs** (100-5000): Duration of silence after which a phrase is finalized. Configure to adjust sensitivity to pauses.
    - **segmentationMaximumTimeMs** (20000-70000): Maximum duration a segment can reach before being cut off.
    - **segmentationStrategy**: Controls phrase boundary detection. Options: `Default`, `Time`, or `Semantic`.
  </Accordion>
  <Accordion title="Gladia">
    - **model**: Model selection (`fast`, `accurate`, or `solaria-1`).
    - **language**: Language code.
    - **confidenceThreshold** (0-1): Minimum confidence for transcription acceptance. Default is 0.4.
    - **endpointing** (0.01-10): Time in seconds to wait before considering speech ended.
    - **speechThreshold** (0-1): Speech detection sensitivity.
    - **prosody** (boolean): Enable prosody detection (laugh, giggle, music, etc.).
    - **audioEnhancer** (boolean): Pre-process audio for improved accuracy (increases latency).
    - **transcriptionHint**: Hint text to guide transcription.
    - **customVocabularyEnabled** (boolean): Enable custom vocabulary.
    - **customVocabularyConfig**: Custom vocabulary configuration with a vocabulary array and default intensity.
    - **region**: Processing region (`us-west` or `eu-west`).
    - **receivePartialTranscripts** (boolean): Enable partial transcript delivery.
  </Accordion>
  <Accordion title="Speechmatics">
    - **model**: Model selection (currently only `default`).
    - **language**: Language code.
    - **operatingPoint**: Accuracy level. `standard` for faster turnaround, `enhanced` for highest accuracy. Default is `enhanced`.
    - **region**: Processing region (`eu` for Europe, `us` for United States). Default is `eu`.
    - **enableDiarization** (boolean): Enable speaker identification for multi-speaker conversations.
    - **maxDelayMs**: Maximum delay in milliseconds for partial transcripts. Balances latency and accuracy.
  </Accordion>
  <Accordion title="Google">
    - **model**: Gemini model selection.
    - **language**: Language selection (e.g., `Multilingual`, `English`, `Spanish`, `French`).
  </Accordion>
  <Accordion title="OpenAI">
    - **model**: OpenAI Realtime STT model selection (required).
    - **language**: Language code for transcription.
  </Accordion>
  <Accordion title="ElevenLabs">
    - **model**: Model selection (currently only `scribe_v1`).
    - **language**: ISO 639-1 language code.
  </Accordion>
  <Accordion title="Cartesia">
    - **model**: Model selection (currently only `ink-whisper`).
    - **language**: ISO 639-1 language code.
  </Accordion>
</AccordionGroup>
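
These options sit on each transcriber entry, whether it is the primary or a fallback. Below is a minimal sketch combining a few of the fields listed above, assuming they go at the top level of the transcriber object the same way `speechModel` does in the earlier API example; the values are illustrative, not recommendations:

```json
{
  "transcriber": {
    "provider": "deepgram",
    "model": "nova-3",
    "language": "en",
    "smartFormat": true,
    "keywords": ["companyname", "productname:2"],
    "fallbackPlan": {
      "transcribers": [
        {
          "provider": "gladia",
          "model": "solaria-1",
          "language": "en",
          "confidenceThreshold": 0.4,
          "audioEnhancer": true
        }
      ]
    }
  }
}
```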

## Best practices

- Use **different providers** for fallbacks to protect against provider-wide outages.
- Consider **language compatibility** when selecting fallbacks—ensure all fallback transcribers support your required languages.
- Test your fallback configuration to ensure smooth transitions between transcribers.
- For **HIPAA/PCI compliance**, ensure all fallbacks are compliant providers (Deepgram or Azure); see the sketch below.
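
For the HIPAA/PCI case, a minimal sketch of a compliance-friendly plan that keeps both the primary and the fallback on the two permitted providers (values are illustrative):

```json
{
  "transcriber": {
    "provider": "deepgram",
    "model": "nova-3-medical",
    "language": "en",
    "fallbackPlan": {
      "transcribers": [
        {
          "provider": "azure",
          "language": "en-US"
        }
      ]
    }
  }
}
```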

## FAQ

<AccordionGroup>
  <Accordion title="Which providers support fallback?">
    All major transcriber providers are supported: Deepgram, AssemblyAI, Azure, Gladia, Google, Speechmatics, Cartesia, ElevenLabs, and OpenAI.
  </Accordion>
  <Accordion title="Does fallback affect pricing?">
    No. There are no additional fees for using fallback transcribers; you are only billed for the transcriber that processes the audio.
  </Accordion>
  <Accordion title="How fast is the failover?">
    Failover typically occurs within milliseconds of detecting a failure, ensuring minimal disruption to the call.
  </Accordion>
  <Accordion title="Can I use different languages for fallbacks?">
    Yes, each fallback transcriber can have its own language configuration. However, for the best user experience, we recommend using the same or similar languages across all fallbacks (see the sketch below).
  </Accordion>
</AccordionGroup>
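
To illustrate the language question above, a sketch of an English-first plan that falls back to a multilingual model, using only values listed in the provider settings in this guide:

```json
{
  "transcriber": {
    "provider": "deepgram",
    "model": "nova-2",
    "language": "en",
    "fallbackPlan": {
      "transcribers": [
        {
          "provider": "assembly-ai",
          "speechModel": "universal-streaming-multilingual",
          "language": "multi"
        }
      ]
    }
  }
}
```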

fern/docs.yml

Lines changed: 3 additions & 1 deletion
@@ -160,8 +160,10 @@ navigation:
         path: customization/speech-configuration.mdx
       - page: Voice pipeline configuration
         path: customization/voice-pipeline-configuration.mdx
-      - page: Voice fallback plan
+      - page: Voice fallback configuration
         path: voice-fallback-plan.mdx
+      - page: Transcriber fallback configuration
+        path: customization/transcriber-fallback-plan.mdx
       - page: OpenAI realtime speech-to-speech
         path: openai-realtime.mdx
       - page: Provider keys

fern/voice-fallback-plan.mdx

Lines changed: 147 additions & 27 deletions
@@ -1,62 +1,182 @@
 ---
-title: Voice Fallback Plan
+title: Voice fallback configuration
 subtitle: Configure fallback voices that activate automatically if your primary voice fails.
 slug: voice-fallback-plan
 ---

-<Note>
-Voice fallback plans can currently only be configured through the API. We are working on making this available through our dashboard.
-</Note>
-
-## Introduction
+## Overview

-Voice fallback plans give you the ability to continue your call in the event that your primary voice fails. Your assistant will sequentially fallback to only the voices you configure within your plan, in the exact order you specify.
+Voice fallback configuration gives you the ability to continue your call in the event that your primary voice fails. Your assistant will sequentially fall back to only the voices you configure within your plan, in the exact order you specify.

 <Note>
 Without a fallback plan configured, your call will end with an error in the event that your chosen voice provider fails.
 </Note>

-## How It Works
+## How it works

 When a voice failure occurs, Vapi will:
 1. Detect the failure of the primary voice
 2. If a custom fallback plan exists:
-   - Switch to the first fallback voice in your plan
-   - Continue through your specified list if subsequent failures occur
-   - Terminate only if all voices in your plan have failed
+   - Switch to the first fallback voice in your plan
+   - Continue through your specified list if subsequent failures occur
+   - Terminate only if all voices in your plan have failed

-## Configuration
+## Configure via Dashboard
+
+<Steps>
+  <Step title="Open Voice tab">
+    Navigate to your assistant and select the **Voice** tab.
+  </Step>
+  <Step title="Expand Fallback Voices section">
+    Scroll down to find the **Fallback Voices** collapsible section. A warning indicator appears if no fallback voices are configured.
+  </Step>
+  <Step title="Add a fallback voice">
+    Click **Add Fallback Voice** to configure your first fallback:
+    - Select a **provider** from the dropdown (supports 20+ voice providers)
+    - Choose a **voice** from the searchable popover (shows gender, language, and deprecated status)
+    - The **model** is automatically selected based on your voice choice
+  </Step>
+  <Step title="Configure provider-specific settings (optional)">
+    Expand **Additional Configuration** to access provider-specific settings like stability, speed, and emotion controls.
+  </Step>
+  <Step title="Add more fallbacks">
+    Repeat to add additional fallback voices. Order matters—the first fallback in your list is tried first.
+  </Step>
+</Steps>
+
+## Configure via API

 Add the `fallbackPlan` property to your assistant's voice configuration, and specify the fallback voices within the `voices` property.
-- Please note that fallback voices must be valid JSON configurations, and not strings.
-- The order matters. Vapi will choose fallback voices starting from the beginning of the list.
+
+<Note>
+Fallback voices must be valid JSON configurations, not strings. The order matters—Vapi will choose fallback voices starting from the beginning of the list.
+</Note>

 ```json
 {
   "voice": {
     "provider": "openai",
     "voiceId": "shimmer",
     "fallbackPlan": {
-      "voices": [
-        {
-          "provider": "cartesia",
-          "voiceId": "248be419-c632-4f23-adf1-5324ed7dbf1d"
-        },
-        {
-          "provider": "11labs",
-          "voiceId": "cgSgspJ2msm6clMCkdW9"
-        }
-      ]
+      "voices": [
+        {
+          "provider": "cartesia",
+          "voiceId": "248be419-c632-4f23-adf1-5324ed7dbf1d"
+        },
+        {
+          "provider": "11labs",
+          "voiceId": "cgSgspJ2msm6clMCkdW9",
+          "stability": 0.5,
+          "similarityBoost": 0.75
+        }
+      ]
     }
   }
 }
 ```

+## Provider-specific settings
+
+Each voice provider supports different configuration options. Expand the accordion below to see available settings for each provider.
+
+<AccordionGroup>
+  <Accordion title="ElevenLabs">
+    - **stability** (0-1): Controls voice consistency. Lower values allow more emotional range; higher values produce more stable output.
+    - **similarityBoost** (0-1): Enhances similarity to the original voice. Higher values make the voice more similar to the reference.
+    - **style** (0-1): Voice style intensity. Higher values amplify the speaker's style.
+    - **useSpeakerBoost** (boolean): Enable to boost similarity to the original speaker.
+    - **speed** (0.7-1.2): Speech speed multiplier. Default is 1.0.
+    - **optimizeStreamingLatency** (0-4): Controls streaming latency optimization. Default is 3.
+    - **enableSsmlParsing** (boolean): Enable SSML pronunciation support.
+    - **model**: Select from `eleven_multilingual_v2`, `eleven_turbo_v2`, `eleven_turbo_v2_5`, `eleven_flash_v2`, `eleven_flash_v2_5`, or `eleven_monolingual_v1`.
+  </Accordion>
+  <Accordion title="Cartesia">
+    - **model**: Model selection (`sonic-english`, `sonic-3`, etc.).
+    - **language**: Language code for the voice.
+    - **experimentalControls.speed**: Speech speed adjustment (-1 to 1). Negative values slow down; positive values speed up.
+    - **experimentalControls.emotion**: Array of emotion configurations (e.g., `["happiness:high", "curiosity:medium"]`).
+    - **generationConfig** (sonic-3 only):
+      - **speed** (0.6-1.5): Fine-grained speed control.
+      - **volume** (0.5-2.0): Volume adjustment.
+    - **experimental.accentLocalization** (0 or 1): Toggle accent localization.
+  </Accordion>
+  <Accordion title="Azure">
+    - **speed** (0.5-2): Speech rate multiplier. Default is 1.0.
+  </Accordion>
+  <Accordion title="OpenAI">
+    - **speed** (0.25-4): Speech speed multiplier. Default is 1.0.
+    - **model**: Select from `tts-1`, `tts-1-hd`, or realtime models.
+    - **instructions**: Voice prompt to control the generated audio style. Does not work with `tts-1` or `tts-1-hd` models.
+  </Accordion>
+  <Accordion title="LMNT">
+    - **speed** (0.25-2): Speech rate multiplier. Default is 1.0.
+    - **language**: Two-letter ISO 639-1 language code, or `auto` for auto-detection.
+  </Accordion>
+  <Accordion title="Rime AI">
+    - **model**: Select from `arcana`, `mistv2`, or `mist`. Defaults to `arcana`.
+    - **speed** (0.1+): Speech speed multiplier.
+    - **pauseBetweenBrackets** (boolean): Enable pause control using angle brackets (e.g., `<200>` for a 200ms pause).
+    - **phonemizeBetweenBrackets** (boolean): Enable phonemization using curly brackets (e.g., `{h'El.o}`).
+    - **reduceLatency** (boolean): Optimize for reduced streaming latency.
+    - **inlineSpeedAlpha**: Inline speed control using alpha notation.
+  </Accordion>
+  <Accordion title="PlayHT">
+    - **speed** (0.1-5): Speech rate multiplier.
+    - **temperature** (0.1-2): Controls voice variance. Lower values are more predictable; higher values allow more variation.
+    - **emotion**: Emotion preset (e.g., `female_happy`, `male_sad`, `female_angry`, `male_surprised`).
+    - **voiceGuidance** (1-6): Controls voice uniqueness. Lower values reduce uniqueness.
+    - **styleGuidance** (1-30): Controls emotion intensity. Higher values create a more emotional performance.
+    - **textGuidance** (1-2): Controls text adherence. Higher values are more accurate to the input text.
+    - **model**: Select from `PlayHT2.0`, `PlayHT2.0-turbo`, `Play3.0-mini`, or `PlayDialog`.
+  </Accordion>
+  <Accordion title="Deepgram">
+    - **model**: Select from `aura` or `aura-2`. Defaults to `aura-2`.
+    - **mipOptOut** (boolean): Opt out of the Deepgram Model Improvement Partnership program.
+  </Accordion>
+  <Accordion title="Hume">
+    - **model**: Model selection (e.g., `octave2`).
+    - **description**: Natural language instructions describing how the speech should sound (tone, intonation, pacing, accent).
+    - **isCustomHumeVoice** (boolean): Indicates whether a custom Hume voice is used.
+  </Accordion>
+  <Accordion title="Minimax">
+    - **model**: Select from `speech-02-hd` (high fidelity) or `speech-02-turbo` (low latency). Defaults to `speech-02-turbo`.
+    - **emotion**: Emotion preset (`happy`, `sad`, `angry`, `fearful`, `surprised`, `disgusted`, `neutral`).
+    - **pitch** (-12 to 12): Voice pitch adjustment in semitones.
+    - **speed** (0.5-2): Speech speed adjustment.
+    - **volume** (0.5-2): Volume adjustment.
+  </Accordion>
+  <Accordion title="WellSaid">
+    - **model**: Model selection.
+    - **enableSsml** (boolean): Enable limited SSML translation for input text.
+    - **libraryIds**: Array of library IDs to use for voice synthesis.
+  </Accordion>
+  <Accordion title="Neuphonic">
+    - **model**: Model selection (e.g., `neu_fast`).
+    - **language**: Language code (required).
+    - **speed** (0.25-2): Speech speed multiplier.
+  </Accordion>
+  <Accordion title="SmallestAI">
+    - **model**: Model selection (e.g., `lightning`).
+    - **speed**: Speech speed multiplier.
+  </Accordion>
+</AccordionGroup>
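
As with the 11labs `stability` and `similarityBoost` fields in the diff above, these options sit directly on each voice entry. A minimal sketch combining a few of the settings listed in this accordion, assuming `experimentalControls.speed` and `.emotion` nest under an `experimentalControls` object; values are illustrative, not recommendations:

```json
{
  "voice": {
    "provider": "11labs",
    "voiceId": "cgSgspJ2msm6clMCkdW9",
    "stability": 0.5,
    "fallbackPlan": {
      "voices": [
        {
          "provider": "cartesia",
          "voiceId": "248be419-c632-4f23-adf1-5324ed7dbf1d",
          "experimentalControls": {
            "speed": 0.5,
            "emotion": ["happiness:high"]
          }
        },
        {
          "provider": "openai",
          "voiceId": "shimmer",
          "speed": 1.0
        }
      ]
    }
  }
}
```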
+
 ## Best practices

-- Use <b>different providers</b> for your fallback voices to protect against provider-wide outages.
+- Use **different providers** for your fallback voices to protect against provider-wide outages.
 - Select voices with **similar characteristics** (tone, accent, gender) to maintain consistency in the user experience.
+- Test your fallback configuration to ensure smooth transitions between voices.

-## How will pricing work?
+## FAQ

-There is no change to the pricing of the voices. Your call will not incur any extra fees while using fallback voices, and you will be able to see the cost for each voice in your end-of-call report.
+<AccordionGroup>
+  <Accordion title="How will pricing work?">
+    There is no change to the pricing of the voices. Your call will not incur any extra fees while using fallback voices, and you will be able to see the cost for each voice in your end-of-call report.
+  </Accordion>
+  <Accordion title="How many fallback voices can I configure?">
+    You can configure as many fallback voices as you need. However, we recommend 2-3 fallbacks from different providers for optimal reliability.
+  </Accordion>
+  <Accordion title="Will users notice when a fallback is activated?">
+    Users may notice a brief pause and a change in voice characteristics when switching to a fallback voice. Selecting voices with similar properties helps minimize this disruption.
+  </Accordion>
+</AccordionGroup>
