
Commit 55a6618

Update --max-batch-total-tokens description (#3083)
* Update `--max-batch-total-tokens` description
* Update docstring in `launcher/src/main.rs` instead
1 parent 036d802 commit 55a6618

File tree

2 files changed: +3 -3 lines

* docs/source/reference/launcher.md
* launcher/src/main.rs


docs/source/reference/launcher.md

Lines changed: 1 addition & 1 deletion
@@ -198,7 +198,7 @@ Options:

   For `max_batch_total_tokens=1000`, you could fit `10` queries of `total_tokens=100` or a single query of `1000` tokens.

-  Overall this number should be the largest possible amount that fits the remaining memory (after the model is loaded). Since the actual memory overhead depends on other parameters like if you're using quantization, flash attention or the model implementation, text-generation-inference cannot infer this number automatically.
+  Overall this number should be the largest possible amount that fits the remaining memory (after the model is loaded). Since the actual memory overhead depends on other parameters like if you're using quantization, flash attention or the model implementation, text-generation-inference infers this number automatically if not provided ensuring that the value is as large as possible.

   [env: MAX_BATCH_TOTAL_TOKENS=]
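As an aside on the arithmetic in the docs line above: a batch fits as long as the summed `total_tokens` of its queries stays within `max_batch_total_tokens`. Below is a minimal Rust sketch of that bookkeeping; the helper name is illustrative and is not part of text-generation-inference.

```rust
// Illustrative only: this helper is not part of text-generation-inference,
// it just restates the budget arithmetic from the documentation line above.
fn fits_in_batch(max_batch_total_tokens: u32, query_total_tokens: &[u32]) -> bool {
    // A batch fits if the summed token counts of all queries stay within the budget.
    let used: u32 = query_total_tokens.iter().sum();
    used <= max_batch_total_tokens
}

fn main() {
    // With max_batch_total_tokens = 1000: ten queries of 100 tokens fit,
    // as does a single 1000-token query, but an eleventh 100-token query would not.
    assert!(fits_in_batch(1000, &vec![100; 10]));
    assert!(fits_in_batch(1000, &[1000]));
    assert!(!fits_in_batch(1000, &vec![100; 11]));
    println!("budget arithmetic checks out");
}
```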

launcher/src/main.rs

Lines changed: 2 additions & 2 deletions
@@ -702,8 +702,8 @@ struct Args {
     /// Overall this number should be the largest possible amount that fits the
     /// remaining memory (after the model is loaded). Since the actual memory overhead
     /// depends on other parameters like if you're using quantization, flash attention
-    /// or the model implementation, text-generation-inference cannot infer this number
-    /// automatically.
+    /// or the model implementation, text-generation-inference infers this number automatically
+    /// if not provided ensuring that the value is as large as possible.
     #[clap(long, env)]
     max_batch_total_tokens: Option<u32>,
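For context on the hunk above, here is a minimal, self-contained sketch (not the launcher's actual `Args` struct) of how an optional flag with an environment-variable fallback is declared with clap's derive API, assuming a clap version with the `derive` and `env` features enabled:

```rust
use clap::Parser;

/// Minimal stand-in for the launcher's argument parsing (sketch, not the real struct).
#[derive(Parser, Debug)]
struct Args {
    /// Total token budget for a batch; when left unset, the server infers a value.
    /// With `env`, the flag also reads the MAX_BATCH_TOTAL_TOKENS environment variable.
    #[clap(long, env)]
    max_batch_total_tokens: Option<u32>,
}

fn main() {
    let args = Args::parse();
    match args.max_batch_total_tokens {
        Some(n) => println!("--max-batch-total-tokens explicitly set to {n}"),
        None => println!("--max-batch-total-tokens not set; the server would infer it"),
    }
}
```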
