Research papers are everywhere, but finding answers across multiple papers is painful. You either:
- Spend hours manually reading through PDFs (slow)
- Use keyword search and get irrelevant results (noisy)
- Copy-paste text into ChatGPT and lose source attribution (dangerous)
PaperPro solves this. It understands research context semantically, finds the most relevant papers, and generates answers grounded in real citations.
- Semantic querying over 20+ research papers
- Hybrid retrieval:
  - BM25 keyword retrieval
  - Dense vector similarity retrieval
- Cohere reranking for relevance optimization
- Query rewriting using LLMs
- Citation-aware answer generation
- ChromaDB vector database with HNSW indexing
- Gemini 2.5 Flash integration
- DeepEval for RAG evaluation
- Streamlit frontend for interactive querying
- Optimized low-latency inference pipeline
- Source-aware contextual responses
User Query
↓
Query Rewriting
↓
Hybrid Retrieval
├── BM25 Retrieval
└── Dense Vector Retrieval
↓
Document Deduplication
↓
Cohere Reranking
↓
Context Construction
↓
Gemini 2.5 Flash
↓
Grounded Answer + Citations
- Python
- LangChain
- ChromaDB
- Sentence Transformers
- BM25 Retriever
- Gemini 2.5 Flash
- Cohere Rerank API
- Streamlit
- DeepEval
PaperPro uses a hybrid retrieval pipeline combining BM25 keyword retrieval and dense vector retrieval.
BM25 keyword retrieval captures:
- exact keyword matches
- acronyms
- paper-specific terminology
- lexical similarity
Dense vector retrieval uses Sentence Transformers embeddings to capture:
- semantic meaning
- paraphrased queries
- contextual similarity
Combining the two improves retrieval quality over either method used alone.
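As a rough illustration of the hybrid step, the sketch below scores tokenized chunks with a standard Okapi BM25 formula (using a smoothed, non-negative IDF) and then merges a BM25 ranking with a dense ranking, dropping duplicates. It is not PaperPro's actual code: the function names, the `k1` and `b` defaults, and the sample chunk IDs are all assumptions, and the dense side is represented only by a ready-made list of ranked IDs.

```python
import math
from collections import Counter
from itertools import zip_longest


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(doc) for doc in docs_tokens) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in docs_tokens for term in set(doc))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def merge_and_deduplicate(bm25_ids, dense_ids):
    """Interleave two ranked ID lists, keeping the first occurrence of each ID."""
    merged, seen = [], set()
    for pair in zip_longest(bm25_ids, dense_ids):
        for doc_id in pair:
            if doc_id is not None and doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


# Hypothetical chunks, tokenized by whitespace for simplicity.
docs = [
    "bm25 ranks documents by keyword overlap".split(),
    "dense embeddings capture semantic meaning".split(),
    "hnsw indexing speeds up vector search".split(),
]
scores = bm25_scores("keyword bm25".split(), docs)
candidates = merge_and_deduplicate(["a", "b", "c"], ["b", "d"])
```

Interleaving keeps the top hits from both retrievers near the front of the candidate list, which matters less here than in a pure fusion setup because a reranker reorders the candidates afterwards.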
Dense embeddings are stored inside ChromaDB using:
- persistent vector storage
- HNSW indexing
- efficient similarity search
Benefits:
- fast retrieval
- scalable vector search
- persistent storage
Before retrieval, ambiguous or vague user queries are rewritten using Gemini.
Example:
Input: How can models become more reliable?
Rewritten: What techniques improve factual reliability in large language models during inference and deployment?
This improves:
- retrieval relevance
- semantic precision
- reranking quality
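The rewriting step can be sketched as a thin wrapper around any text-in, text-out model call. This is only a sketch under stated assumptions: the prompt wording, the `rewrite_query` name, and the `llm` callable are hypothetical, and the real Gemini call is left out so the example stays self-contained. The fallback to the original question is a design choice of this sketch, not something the project documents.

```python
from typing import Callable

# Hypothetical prompt; the wording PaperPro actually uses is not documented here.
REWRITE_TEMPLATE = (
    "Rewrite the user's question as one specific, self-contained research "
    "query suitable for retrieving passages from research papers. "
    "Return only the rewritten query.\n\n"
    "Question: {question}\nRewritten query:"
)


def rewrite_query(question: str, llm: Callable[[str], str]) -> str:
    """Return an LLM-rewritten query, falling back to the original on failure."""
    question = question.strip()
    try:
        rewritten = llm(REWRITE_TEMPLATE.format(question=question)).strip()
    except Exception:
        # A failed rewrite should not block retrieval.
        return question
    # An empty response is treated the same way as a failure.
    return rewritten or question


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, used only to exercise the wrapper."""
    return "  What techniques improve factual reliability in large language models?  "


result = rewrite_query("How can models become more reliable?", fake_llm)
```

In the app, `llm` would be a function that sends the prompt to Gemini and returns the response text.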
Retrieved documents are reranked using the Cohere Rerank API.
Purpose:
- reorder retrieved chunks by semantic relevance
- filter noisy retrievals
- improve context quality before generation
This improves final answer quality by keeping low-relevance chunks out of the generation prompt.
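The reorder-and-filter logic can be shown independently of the reranking service. In the minimal sketch below, `score_fn` is a hypothetical stand-in for whatever produces relevance scores (in PaperPro, the Cohere Rerank API), and `top_n` and `min_score` are illustrative parameters rather than the project's settings. The word-overlap scorer exists only to make the example runnable.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    chunks: List[str],
    score_fn: Callable[[str, str], float],
    top_n: int = 5,
    min_score: float = 0.0,
) -> List[Tuple[float, str]]:
    """Score chunks, drop those below min_score, and return the best top_n."""
    scored = [(score_fn(query, chunk), chunk) for chunk in chunks]
    kept = [(score, chunk) for score, chunk in scored if score >= min_score]
    # Highest score first; ties keep their original retrieval order.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_n]


def overlap_score(query: str, chunk: str) -> float:
    """Toy scorer: fraction of query words that appear in the chunk."""
    query_words = set(query.lower().split())
    chunk_words = set(chunk.lower().split())
    return len(query_words & chunk_words) / len(query_words)


chunks = [
    "reliability of language models",
    "image segmentation benchmarks",
    "factual reliability during inference",
]
ranked = rerank("factual reliability", chunks, overlap_score, top_n=2, min_score=0.1)
```

Filtering by a score threshold as well as by rank means a query with few good matches sends fewer chunks to the generator instead of padding the context with noise.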
The final response is generated using:
- Gemini 2.5 Flash
- context-aware prompting
- source-grounded synthesis
The model:
- synthesizes across multiple papers
- cites relevant sources
- avoids hallucination when context is insufficient
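One common way to get citation-aware answers is to number each retrieved chunk in the prompt and ask the model to cite those numbers. The sketch below assumes each chunk is a dict with hypothetical `source` and `text` keys; the instruction wording is illustrative and not the prompt PaperPro ships with.

```python
from typing import Dict, List


def build_prompt(question: str, chunks: List[Dict[str, str]]) -> str:
    """Build a grounded prompt in which each chunk is numbered for citation."""
    context = "\n\n".join(
        f"[{i}] ({chunk['source']})\n{chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the numbered context below. "
        "Cite sources inline as [n]. If the context does not contain the "
        "answer, say that the provided papers do not cover it.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# Hypothetical chunks; file names and text are placeholders.
prompt = build_prompt(
    "What improves factual reliability?",
    [
        {"source": "paper_a.pdf", "text": "Retrieval grounding reduces unsupported claims."},
        {"source": "paper_b.pdf", "text": "Reranking improves context quality."},
    ],
)
```

The explicit "say that the provided papers do not cover it" instruction is what gives the model a safe way out when the retrieved context is insufficient.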
Initial end-to-end query latency: ~2 minutes
Optimized latency: ~9 seconds
Optimizations used:
- `@st.cache_resource` to avoid reloading vector stores on every rerun.
- BM25 indexes are built once instead of being reconstructed for each query.
- Paper metadata and sidebar information are cached to reduce repeated work on Streamlit reruns.
- The retrieval flow was optimized to minimize unnecessary generation overhead.
- Models are initialized once and reused across requests.
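The load-once pattern behind these optimizations can be illustrated without Streamlit. In the sketch below, `functools.lru_cache` stands in for `@st.cache_resource`, which plays the same role inside a Streamlit app; the `load_retriever` function, the `./chroma_db` path, and the counter are hypothetical and exist only to show that the expensive load runs once.

```python
from functools import lru_cache

LOAD_COUNT = 0  # Counts how many times the expensive load actually runs.


@lru_cache(maxsize=None)
def load_retriever(index_path: str) -> dict:
    """Simulate an expensive load; the body runs once per distinct index_path."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    # A real app would open the vector store and build the BM25 index here.
    return {"index_path": index_path}


# Repeated calls, as on each Streamlit rerun, reuse the cached object.
first = load_retriever("./chroma_db")
second = load_retriever("./chroma_db")
```

Because the cached object is shared across calls, it should be treated as read-only; mutating it would affect every later request.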
PaperPro includes a DeepEval-based evaluation framework that measures:
- how well generated answers address the query
- whether generated responses remain grounded in the retrieved context
- whether the retrieval system fetched sufficient relevant information
git clone https://github.com/yourusername/MLResearchRAG.git
cd MLResearchRAG
python -m venv venv
Activate:
venv\Scripts\activate (Windows)
source venv/bin/activate (macOS/Linux)
pip install -r requirements.txt
Create .env
GEMINI_API_KEY=your_key
COHERE_API_KEY=your_key
streamlit run app.py
Planned enhancements:
- Multi-paper comparative reasoning
- Conversational memory
- PDF upload support
- Research graph visualization
- Agentic retrieval workflows
- Citation export
- Multi-modal paper understanding
- RAGAS evaluation integration
- Streaming token generation
- Redis caching layer
- GPU inference optimization
The application is deployed using:
- Streamlit Cloud
It can also be deployed on:
- HuggingFace Spaces
- Render
- Railway
- AWS/GCP/Azure
Through this project, I:
- implemented production-style RAG architectures
- optimized low-latency retrieval systems
- worked with vector databases and semantic search
- designed evaluation pipelines for generative AI systems
- improved retrieval grounding and hallucination reduction
- built scalable AI-assisted research workflows
MIT License
Roopasree, Computer Science Engineering, NIT Trichy
Made with <3 for Researchers