perf: share few-shot preamble across prompts via PromptParts#447
perf: share few-shot preamble across prompts via PromptParts#447dan504512 wants to merge 6 commits into
Conversation
|
Your branch is 1 commits behind git fetch origin main
git merge origin/main
git pushNote: Enable "Allow edits by maintainers" to allow automatic updates. |
…ompts Add a frozen PromptParts dataclass that splits rendered prompts into prefix (description + context), examples (large, shared by reference), and suffix (question + answer prefix). QAPromptGenerator caches the formatted examples text in __post_init__ and exposes render_parts() which returns a PromptParts whose examples field is always the same string object. render() is reimplemented as str(render_parts(...)). PromptBuilder.build_prompt() and ContextAwarePromptBuilder.build_prompt() now return PromptParts instead of str, so downstream consumers receive structured prompts that share the large examples allocation. Closes google#446
When _build_request receives a PromptParts with non-empty examples, emit three text parts in contents[0].parts instead of one concatenated string. The middle part holds the shared examples reference, so 10,000 requests share one ~300 KB string instead of duplicating it per request. Gemini single-prompt, OpenAI, and Ollama providers convert PromptParts to str at their entry points; since they process prompts one at a time (or in small thread pools), the temporary string has negligible memory impact.
Update PromptBuilder, ContextAwarePromptBuilder, Annotator, and extract() tests to work with PromptParts instead of plain strings. Add test_build_prompt_shares_examples_reference and test_context_aware_shares_examples_reference to verify the memory- sharing invariant (all prompts from the same generator share a single examples string object via `assertIs`).
Convert PromptParts to str before inserting into cache key_data dicts so that the SHA256 hash matches the old string-based format. This avoids a full cache miss on upgrade. The str() call creates one temporary string per prompt, processed sequentially, so peak memory is unchanged.
b9c1238 to
221917c
Compare
…ptParts" The str(prompt) conversion in key_data dicts negates the memory optimization from PromptParts by materializing 10,000 × ~640 KB concatenated strings in key_data_list (6.4 GB). Without it, PromptParts serializes via dataclasses.asdict in _json_default, keeping the shared examples reference intact. Cache keys will differ from pre-PromptParts entries, but those expire via GCS lifecycle (retention_days) anyway. This reverts commit b9c1238.
…hash Convert non-primitive values (e.g. PromptParts) to str inside _compute_hash rather than at key_data construction time. This keeps PromptParts references in key_data_list (shared examples, ~0.4 MB) while producing hashes identical to the old string-based format (cache compat preserved). Only one transient str copy exists at a time during sequential hash computation. Replaces the reverted str(prompt) approach which materialized all prompts upfront in key_data_list, negating the PromptParts memory optimization (10,000 × 640 KB = 6.4 GB).
|
Your branch is 6 commits behind git fetch origin main
git merge origin/main
git pushNote: Enable "Allow edits by maintainers" to allow automatic updates. |
|
Your branch is 13 commits behind git fetch origin main
git merge origin/main
git pushNote: Enable "Allow edits by maintainers" to allow automatic updates. |
1 similar comment
|
Your branch is 13 commits behind git fetch origin main
git merge origin/main
git pushNote: Enable "Allow edits by maintainers" to allow automatic updates. |
|
Your branch is 15 commits behind git fetch origin main
git merge origin/main
git pushNote: Enable "Allow edits by maintainers" to allow automatic updates. |
|
Your branch is 20 commits behind git fetch origin main
git merge origin/main
git pushNote: Enable "Allow edits by maintainers" to allow automatic updates. |
4 similar comments
|
Your branch is 20 commits behind git fetch origin main
git merge origin/main
git pushNote: Enable "Allow edits by maintainers" to allow automatic updates. |
|
Your branch is 20 commits behind git fetch origin main
git merge origin/main
git pushNote: Enable "Allow edits by maintainers" to allow automatic updates. |
|
Your branch is 20 commits behind git fetch origin main
git merge origin/main
git pushNote: Enable "Allow edits by maintainers" to allow automatic updates. |
|
Your branch is 20 commits behind git fetch origin main
git merge origin/main
git pushNote: Enable "Allow edits by maintainers" to allow automatic updates. |
Fixes #446
Description
Share few-shot preamble across prompts via a new
PromptPartsdataclass,reducing batch prompt memory from O(N × preamble_size) to O(1 × preamble_size + N × small_parts).
PromptParts(prefix, examples, suffix)frozen dataclass inprompting.pywith__str__()for backward-compatible string conversionQAPromptGeneratorcaches formatted examples in__post_init__and exposesrender_parts()returningPromptParts;render()reimplemented viastr(render_parts(...))PromptBuilder.build_prompt()andContextAwarePromptBuilder.build_prompt()returnPromptPartsinstead ofstr_build_request()ingemini_batch.pyemits 3 text parts incontents[0].partswhen givenPromptPartswith non-empty examples, keeping the shared reference intactPromptPartstostrat entry point (negligible memory impact since they process one at a time)GCSBatchCache._compute_hash()resolves non-primitive values (e.g.PromptParts) tostrtransiently at hash time, producing hashes identical to the old string-based format. Only one temporary concatenated string exists at a time during sequential hash computation, so peak memory is unaffected. Cache compatibility is preserved — no cache misses on upgrade.Expected impact at
batch_length=10000with ~300 KB examples:10,000 × 640 KB = 6.4 GB → 1 × 640 KB + 10,000 × ~1 KB = ~10 MB (640× reduction)
Memory benchmarks
Tested with 1000 prompts, ~154 KB examples per prompt:
main(str only)How Has This Been Tested?
test_build_prompt_shares_examples_referenceandtest_context_aware_shares_examples_referenceverify the memory-sharing invariant viaassertIs(all prompts from the same generator share a single examples string object)str(PromptParts)produces byte-identical output to oldrender()for all 4 cases (examples/no-examples × context/no-context)_compute_hashproduces identical hashes forPromptPartsvs plainstrprompts, confirming cache key stabilityChecklist:
pylintover the affected code.