feat: GeminiEmbedding rate-limit handling #2237
Conversation
Google Gemini rate-limits embeddings at 3,000 vectors per minute. Right now, if we hit that limit while generating embeddings via LlamaIndex (e.g. with `VectorStoreIndex.init` or similar methods), the call just errors out, with no way to wait and retry. This proposed addition retries any embed call that fails with a rate-limit error, waiting 5 s between attempts, up to 20 times. That up-to-100-second wait is enough to get past the per-minute window, so the rate limit is handled seamlessly for applications using LlamaIndex.
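A minimal sketch of that retry behavior might look like the following. The helper names (`withRateLimitRetry`, `isRateLimitError`) and the error-matching heuristic are illustrative assumptions, not the identifiers actually used in this patch.

```ts
const RETRY_DELAY_MS = 5_000;
const MAX_RETRIES = 20;

// Assumption: the Google AI library surfaces rate-limit failures with a 429
// status code or a RESOURCE_EXHAUSTED / "rate limit" message; the real check
// in the patch may differ.
function isRateLimitError(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  return /429|RESOURCE_EXHAUSTED|rate limit/i.test(message);
}

async function withRateLimitRetry<T>(call: () => Promise<T>): Promise<T> {
  let lastError: unknown;
  // One initial attempt plus up to MAX_RETRIES retries.
  for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (!isRateLimitError(err)) throw err;
      lastError = err;
      // 20 retries x 5 s gives up to ~100 s of waiting, enough to roll past
      // a per-minute quota window before giving up.
      await new Promise((resolve) => setTimeout(resolve, RETRY_DELAY_MS));
    }
  }
  throw lastError;
}
```

The embedding request inside `GeminiEmbedding` would then be wrapped in this helper, so non-rate-limit errors still surface immediately while quota errors just pause and retry.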
I've added an example file that fails with the existing main branch of LlamaIndexTS but succeeds with this patch. You can run it with `ts-node examples/models/gemini/embedding_ratelimits.ts`. (I didn't really know how to write a proper jest test for this without actually hitting the Gemini API and without faking the way the Google AI library throws errors. Rather than tightly couple the test to the current behavior of the Google AI library, I wrote an example that does hit the Gemini API.)
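For reference, the kind of loop such an example might run looks roughly like the sketch below (this is not the actual contents of `embedding_ratelimits.ts`): embed enough chunks back-to-back to cross the per-minute quota so the retry path gets exercised. It assumes `GeminiEmbedding` is importable from `@llamaindex/google` and that `GOOGLE_API_KEY` is set in the environment; the real example may differ in both respects.

```ts
import { GeminiEmbedding } from "@llamaindex/google";

async function main() {
  // Assumes GOOGLE_API_KEY is available in the environment.
  const embedModel = new GeminiEmbedding();
  const texts = Array.from({ length: 4000 }, (_, i) => `document chunk ${i}`);

  // On main this loop errors out once the per-minute quota is exhausted;
  // with the patch it waits and eventually completes.
  for (const [i, text] of texts.entries()) {
    await embedModel.getTextEmbedding(text);
    if (i % 500 === 0) console.log(`embedded ${i} chunks`);
  }
}

main().catch(console.error);
```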