Oncology Literature RAG Engine — PubMed-grounded retrieval-augmented generation with evidence scoring, contradiction detection, and research gap analysis.
LitRAG is Module 1 of the three-part ONCO-LENS platform. It ingests PubMed articles, embeds them into a pgvector index, and answers clinical oncology questions with graded, citable evidence. Modules 2 (TrialRadar) and 3 (EvidenceGraph) consume LitRAG's NER output and trial-match results.
- Architecture Overview
- Prerequisites
- Quick Start (Docker Compose)
- Local Development Setup
- Configuration Reference
- Database Migrations
- Running the Ingestion Pipeline
- API Reference
- Running Tests
- Kubernetes Deployment
- Helm Deployment
- Monitoring and Observability
- Integration Contracts (Modules 2 & 3)
- Project Structure
- Troubleshooting
┌─────────────────────────────────────────────────────────────┐
│ User / Client │
└───────────────────────────┬─────────────────────────────────┘
│ POST /litrag/v1/query
▼
┌─────────────────────────────────────────────────────────────┐
│ FastAPI Service (port 8001) │
│ QueryEngine (10-step pipeline): │
│ 1. QueryRouter → classify intent (evidence/contradiction/│
│ gap/trial) │
│ 2. QueryExpander → 2-3 Claude-generated variants │
│ 3. EmbeddingClient → OpenAI text-embedding-3-small │
│ 4. PgVectorRetriever → HNSW ANN (top-k=50) │
│ 5. RerankerClient → BAAI/bge-reranker-v2-m3 (top-n=10) │
│ 6. EvidenceScorerClient → 4-dim regression (top-m=5) │
│ 7. ContradictionDetector → Claude pair-wise check │
│ 8. GapRecommender → Claude gap analysis │
│ 9. Synthesis → Claude grounded answer │
│ 10. Cache → Redis-like TTL per query hash │
└────────────┬──────────────────────────┬─────────────────────┘
│ │
▼ ▼
┌─────────────────────┐ ┌─────────────────────────────────┐
│ PostgreSQL 15 │ │ TorchServe (port 8080) │
│ + pgvector │ │ • bge-reranker-v2-m3 │
│ Schema: onco_lens │ │ • PubMedBERT evidence scorer │
│ HNSW index (m=16) │ └─────────────────────────────────┘
└─────────────────────┘
┌─────────────────────┐ ┌─────────────────────────────────┐
│ Ingestion Pipeline │ │ MLflow (port 5000) │
│ (Prefect flow) │ │ Experiment: onco-lens/evidence- │
│ PubMed → chunks → │ │ scorer │
│ embeddings → NER │ └─────────────────────────────────┘
└─────────────────────┘
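
The three cut-offs in the pipeline (top-k, top-n, top-m) form a narrowing funnel. The sketch below illustrates that idea only. The `Candidate` class, its field names, and the assumption that stage 6 keeps the five highest evidence scores are hypothetical and not taken from the LitRAG code.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical stand-in for a retrieved chunk and its three scores."""
    chunk_id: str
    ann_similarity: float   # stage 4: similarity from the pgvector ANN search
    rerank_score: float     # stage 5: cross-encoder relevance
    evidence_score: float   # stage 6: evidence quality


def funnel(candidates, top_k=50, top_n=10, top_m=5):
    """Narrow candidates through the three cut-offs, best first at each stage."""
    retrieved = sorted(candidates, key=lambda c: c.ann_similarity, reverse=True)[:top_k]
    reranked = sorted(retrieved, key=lambda c: c.rerank_score, reverse=True)[:top_n]
    return sorted(reranked, key=lambda c: c.evidence_score, reverse=True)[:top_m]


# Synthetic pool of 60 candidates with deterministic, made-up scores.
pool = [
    Candidate(
        chunk_id=f"c{i}",
        ann_similarity=1 - i / 100,
        rerank_score=(i * 7 % 60) / 60,
        evidence_score=(i * 13 % 60) / 60,
    )
    for i in range(60)
]
final = funnel(pool)
print(len(final))  # → 5
```

Each stage is more expensive per item than the one before it, which is why the cheap ANN search sees the most candidates and Claude sees the fewest.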
| Requirement | Version | Notes |
|---|---|---|
| Python | 3.11+ | 3.13 supported |
| Docker & Docker Compose | 24+ / v2+ | For the full stack |
| PostgreSQL | 15+ with pgvector | Provided via compose |
| API keys | — | Anthropic + OpenAI (required) |
| NCBI API key | — | Optional; raises rate limit from 3 to 10 req/s |
Docker Compose is the fastest way to get the full five-service stack running.
git clone https://github.com/viche72/OncoLens.git
cd OncoLens
cp .env.example .env

Edit .env and add your API keys:
ANTHROPIC_API_KEY=sk-ant-your-key-here
OPENAI_API_KEY=sk-your-key-here
# Optional — raises PubMed rate limit from 3 to 10 req/s
NCBI_API_KEY=your-ncbi-key-here

docker compose up --build

All five services start with health checks. Typical startup time is 60–90 seconds. You can watch readiness with:
docker compose ps # wait for (healthy)
docker compose logs -f litrag-svc

curl http://localhost:8001/internal/health
# → {"status":"healthy","version":"1.0.0","db":"ok"}

curl -X POST http://localhost:8001/litrag/v1/query \
-H "Content-Type: application/json" \
-d '{
"query": "What is the efficacy of pembrolizumab in metastatic NSCLC?",
"top_m": 5
}'

Navigate to http://localhost:8001/litrag/v1/docs in your browser.
Use this path when you want to run tests, iterate on code, or run the service without Docker.
python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS/Linux
source .venv/bin/activate

pip install -r requirements.txt

The NER extractor requires the en_core_sci_sm model. It is downloaded automatically during the Docker build, but for local development run:
pip install https://s3-us-west-2.amazonaws.com/ai2-s3-public/scispacy/releases/v0.5.4/en_core_sci_sm-0.5.4.tar.gz

You can start only the infrastructure services without building the litrag image:
docker compose up postgres torchserve mlflow prometheus -d

Set the required environment variables:
export DATABASE_URL="postgresql+asyncpg://postgres:postgres@localhost:5432/onco_lens"
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
Or create a .env file; the app uses pydantic-settings, which reads it automatically.

Apply the migrations, then start the API:
alembic upgrade head
uvicorn api.main:app --host 0.0.0.0 --port 8001 --reload

All settings are read from environment variables (or .env). Required variables have no default and must be set before startup.
| Variable | Description |
|---|---|
| ANTHROPIC_API_KEY | Anthropic API key for Claude synthesis, contradiction detection, gap analysis, and query expansion |
| OPENAI_API_KEY | OpenAI API key for text-embedding-3-small embeddings |

| Variable | Default | Description |
|---|---|---|
| DATABASE_URL | postgresql+asyncpg://postgres:postgres@localhost:5432/onco_lens | Async PostgreSQL connection string |
| DB_POOL_SIZE | 10 | SQLAlchemy async pool size |
| DB_MAX_OVERFLOW | 20 | Pool overflow limit |

| Variable | Default | Description |
|---|---|---|
| LITRAG_TOP_K | 50 | Candidates retrieved from pgvector (ANN stage) |
| LITRAG_TOP_N | 10 | Candidates kept after cross-encoder reranking |
| LITRAG_TOP_M | 5 | Final chunks sent to Claude for synthesis |
| LITRAG_EMBEDDING_MODEL | text-embedding-3-small | OpenAI embedding model |
| LITRAG_EMBEDDING_BATCH_SIZE | 100 | Chunk batch size for embedding calls |
| LITRAG_DEVICE | cpu | Compute device: cpu, cuda, cuda:0, mps |

| Variable | Default | Description |
|---|---|---|
| LITRAG_CLAUDE_MODEL | claude-sonnet-4-20250514 | Claude model for synthesis and analysis |
| LITRAG_CLAUDE_MAX_TOKENS | 4096 | Maximum tokens in Claude responses |
| LITRAG_CLAUDE_TEMPERATURE | 0.0 | Temperature for Claude calls |

| Variable | Default | Description |
|---|---|---|
| TORCHSERVE_HOST | localhost | TorchServe hostname (use torchserve in compose) |
| TORCHSERVE_INFERENCE_PORT | 8080 | Inference API port |
| TORCHSERVE_RERANKER_MODEL_NAME | bge-reranker-v2-m3 | Registered reranker model name |
| TORCHSERVE_SCORER_MODEL_NAME | evidence-scorer | Registered evidence scorer model name |
| TORCHSERVE_TIMEOUT_SECONDS | 30 | Per-request HTTP timeout |
The four dimension weights must sum to 1.0 (±0.01):
| Variable | Default | Description |
|---|---|---|
| EV_WEIGHT_STUDY_DESIGN | 0.40 | RCT/meta-analysis vs case report |
| EV_WEIGHT_SAMPLE_SIZE | 0.25 | Cohort size regression score |
| EV_WEIGHT_RECENCY | 0.20 | Publication date decay |
| EV_WEIGHT_JOURNAL_CREDIBILITY | 0.15 | Journal Q1–Q3 tier score |
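
The sum-to-1.0 constraint on the four weights can be checked with a few lines of Python. This is a minimal sketch, assuming the overall evidence score is a plain weighted sum of the four dimension scores; the source does not confirm that the scorer combines them this way. The function names and the example breakdown values are hypothetical.

```python
# Defaults copied from the EV_WEIGHT_* variables in the table.
DEFAULT_WEIGHTS = {
    "study_design": 0.40,
    "sample_size": 0.25,
    "recency": 0.20,
    "journal_credibility": 0.15,
}


def validate_weights(weights, tolerance=0.01):
    """Raise if the weights do not sum to 1.0 within the stated tolerance."""
    total = sum(weights.values())
    if abs(total - 1.0) > tolerance:
        raise ValueError(
            f"evidence weights sum to {total:.3f}, expected 1.0 +/- {tolerance}"
        )
    return weights


def combine(breakdown, weights=DEFAULT_WEIGHTS):
    """Weighted sum of per-dimension scores (assumed combination rule)."""
    validate_weights(weights)
    return sum(weights[dim] * breakdown[dim] for dim in weights)


# Made-up per-dimension scores for illustration.
example = {
    "study_design": 1.0,
    "sample_size": 0.8,
    "recency": 0.5,
    "journal_credibility": 0.6,
}
print(round(combine(example), 2))  # → 0.79
```

Validating at startup rather than per request would surface a misconfigured weight before any query is scored.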
| Variable | Default | Description |
|---|---|---|
| NCBI_API_KEY | (none) | Optional; raises the PubMed rate limit from 3 to 10 req/s |
| NCBI_RATE_LIMIT_RPS | 10 | Requests per second to NCBI |
| NCBI_BACKOFF_MAX_RETRIES | 5 | Max exponential backoff retries |
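
To make the rate limit and retry settings concrete, here is a minimal sketch of an exponential backoff schedule. Only the retry count (5) and the request rates (3 and 10 req/s) come from this README. The base delay, the cap, the jitter factor, and both function names are hypothetical and may differ from what pubmed_client.py does.

```python
import random


def backoff_delays(max_retries=5, base_seconds=0.5, cap_seconds=30.0,
                   jitter=0.0, rng=None):
    """Return the wait before each retry: the base doubles per attempt, up to a cap.

    jitter is the maximum fraction of random extra delay added to each wait.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_retries):
        delay = min(cap_seconds, base_seconds * 2 ** attempt)
        delay += rng.uniform(0, jitter * delay)
        delays.append(delay)
    return delays


def min_request_interval(rate_limit_rps):
    """Smallest gap between requests that respects a requests-per-second limit."""
    return 1.0 / rate_limit_rps


print(backoff_delays())          # → [0.5, 1.0, 2.0, 4.0, 8.0]
print(min_request_interval(10))  # → 0.1
```

Without an NCBI API key the same interval calculation gives roughly 0.33 s between requests, so NCBI_RATE_LIMIT_RPS may need lowering to 3 in that case.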
| Variable | Default | Description |
|---|---|---|
| CHUNK_TOKEN_WINDOW | 512 | Tokens per chunk |
| CHUNK_TOKEN_OVERLAP | 64 | Overlap between adjacent chunks |
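
The window and overlap settings imply a stride of 512 - 64 = 448 tokens between chunk starts. The sketch below shows plain sliding-window chunking over a token list. It is an illustration only: the real chunking engine is described as section-aware, which this sketch ignores, and `chunk_tokens` is a hypothetical name.

```python
def chunk_tokens(tokens, window=512, overlap=64):
    """Split a token list into windows that share `overlap` tokens with their neighbour."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    stride = window - overlap
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # this window already reaches the end of the document
        start += stride
    return chunks


tokens = list(range(1000))  # stand-in for 1000 token ids
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])  # → [512, 512, 104]
```

The last 64 tokens of each chunk reappear at the start of the next one, so a sentence that straddles a boundary is still fully contained in at least one chunk.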
| Variable | Default | Description |
|---|---|---|
| QUERY_CACHE_TTL_HOURS | 1 | Full query response cache |
| RERANKER_CACHE_TTL_HOURS | 24 | Cross-encoder score cache |
| CONTRADICTION_CACHE_TTL_DAYS | 7 | Contradiction detection cache |
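
The architecture diagram describes the query cache as a TTL store keyed by query hash. A minimal in-memory sketch of that pattern follows. The class, the SHA-256 hashing of a canonical JSON payload, and the injectable clock are all assumptions made for illustration, not LitRAG's actual cache implementation.

```python
import hashlib
import json
import time


def query_hash(payload):
    """Stable hash of a request body: key order does not change the result."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # drop the stale entry on read
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)


# A fake clock makes expiry visible without waiting an hour.
now = [0.0]
cache = TTLCache(ttl_seconds=1 * 3600, clock=lambda: now[0])  # QUERY_CACHE_TTL_HOURS=1
key = query_hash({"query": "pembrolizumab NSCLC", "top_m": 5})
cache.set(key, {"evidence_grade": "A"})
print(cache.get(key))  # → {'evidence_grade': 'A'}
now[0] = 3600.0
print(cache.get(key))  # → None
```

Hashing the whole request body, not just the query text, keeps responses for different filters or top_m values from colliding.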
| Variable | Default | Description |
|---|---|---|
| LITRAG_LOG_LEVEL | INFO | Log level: DEBUG, INFO, WARNING, ERROR, CRITICAL |
| LITRAG_ENABLE_METRICS | true | Expose Prometheus metrics at /metrics |
| MLFLOW_TRACKING_URI | http://mlflow-svc:5000 | MLflow server for experiment tracking |
LitRAG uses Alembic for schema management. The initial migration creates the pgvector extension, all 13 tables, the HNSW vector index, and the v_retrievable_documents view.
# Apply all pending migrations (run this on first setup and after upgrades)
alembic upgrade head
# Check current migration state
alembic current
# Roll back one step
alembic downgrade -1
# Auto-generate a new migration after ORM model changes
alembic revision --autogenerate -m "add new table"

Alembic connects using the DATABASE_URL environment variable. Ensure it is set before running any Alembic commands.
The ingestion pipeline fetches articles from PubMed, chunks them, embeds them, and stores NER annotations — all as a Prefect flow.
# Ingest articles matching a PubMed search query
curl -X POST http://localhost:8001/litrag/v1/ingest/run \
-H "Content-Type: application/json" \
-d '{
"query": "pembrolizumab NSCLC clinical trial",
"max_results": 100,
"date_from": "2020-01-01",
"date_to": "2026-12-31"
}'
# Ingest a specific list of PMIDs
curl -X POST http://localhost:8001/litrag/v1/ingest/run \
-H "Content-Type: application/json" \
-d '{
"query": "oncology",
"pmids": ["36871102", "36871103", "36871104"]
}'
# Check ingestion run status
curl "http://localhost:8001/litrag/v1/ingest/runs?limit=5"

To run the flow directly from Python:
import asyncio
from ingestion.pipeline import ingestion_flow
from shared.models import IngestionConfig
config = IngestionConfig(
query="BRCA1 breast cancer treatment",
max_results=200,
date_from="2018-01-01",
)
result = asyncio.run(ingestion_flow(config))
print(result)
# {
# "run_id": "...",
# "articles_fetched": 200,
# "articles_stored": 198,
# "chunks_created": 1423,
# "error_count": 2,
# "status": "COMPLETED"
# }

Before scoring, seed the journal_tiers table with Q1/Q2/Q3 tier data:
python -m ingestion.seed_journal_tiers

Base URL: http://localhost:8001
Interactive docs: http://localhost:8001/litrag/v1/docs
POST /litrag/v1/query: submit a question and receive a graded, citable answer.
Request body:
{
"query": "What is the efficacy of nivolumab in renal cell carcinoma?",
"top_k": 50,
"top_n": 10,
"top_m": 5,
"include_gaps": false,
"filters": {
"date_from": "2018-01-01",
"date_to": "2026-12-31",
"min_evidence_score": 0.6,
"study_design": ["RCT", "SYSTEMATIC_REVIEW"],
"journals": ["New England Journal of Medicine", "Lancet"],
"cancer_subtype": ["Kidney Neoplasms"],
"include_trials": true
}
}

Response:
{
"query_id": "a1b2c3...",
"query": "What is the efficacy of nivolumab...",
"answer": "Nivolumab demonstrates significant OS benefit...",
"evidence_grade": "A",
"citations": [
{
"pmid": "28552485",
"title": "Nivolumab versus Everolimus in Advanced Renal-Cell Carcinoma",
"authors": ["Motzer, Robert J.", "Escudier, Bernard"],
"journal": "New England Journal of Medicine",
"year": 2015,
"chunk_id": "...",
"chunk_text_snippet": "...",
"evidence_score": 0.91,
"score_breakdown": {
"study_design": 0.95,
"sample_size": 0.88,
"recency": 0.72,
"journal_credibility": 1.0
},
"pubmed_url": "https://pubmed.ncbi.nlm.nih.gov/28552485/"
}
],
"contradictions": [],
"research_gaps": [],
"query_intent": "evidence_lookup",
"latency_breakdown": {
"retrieval_ms": 42,
"reranking_ms": 310,
"scoring_ms": 87,
"synthesis_ms": 1250,
"total_ms": 1689
}
}

Evidence grades: A (≥0.8), B (≥0.6), C (≥0.4), D (<0.4)
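
The grade thresholds translate directly into a lookup function. This is a minimal sketch: `evidence_grade` is a hypothetical name, and the README does not say which score is graded (for example the top citation's score or an aggregate), so the input is simply "a score".

```python
def evidence_grade(score):
    """Map an evidence score to a letter grade using the documented thresholds."""
    if score >= 0.8:
        return "A"
    if score >= 0.6:
        return "B"
    if score >= 0.4:
        return "C"
    return "D"


for s in (0.91, 0.8, 0.65, 0.4, 0.39):
    print(s, evidence_grade(s))
# 0.91 A
# 0.8 A
# 0.65 B
# 0.4 C
# 0.39 D
```

Note that every boundary is inclusive on the upper grade: a score of exactly 0.8 is an A, not a B.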
POST /litrag/v1/query/batch: submit up to 10 queries in a single request. Returns an array of LitRAGResponse objects.
| Method | Path | Description |
|---|---|---|
| POST | /litrag/v1/ingest/run | Start an ingestion run |
| GET | /litrag/v1/ingest/runs | List recent ingestion runs |
| GET | /litrag/v1/ingest/runs/{run_id} | Get a specific run's status |

| Method | Path | Description |
|---|---|---|
| POST | /litrag/v1/scorer/score | Score a single chunk |
| POST | /litrag/v1/scorer/score/batch | Score a batch of chunks |

| Method | Path | Description |
|---|---|---|
| GET | /internal/health | Health check (DB + dependencies) |
| GET | /internal/chunks/{chunk_id}/ner | Fetch NER annotations for a chunk (Module 3) |
| GET | /metrics | Prometheus metrics exposition |
pytest tests/ -v \
--ignore=tests/litrag/test_ingestion_integration.py \
--ignore=tests/litrag/test_query_integration.py

pytest tests/ \
--cov=litrag \
--cov-report=term-missing \
--cov-fail-under=80 \
--ignore=tests/litrag/test_ingestion_integration.py \
--ignore=tests/litrag/test_query_integration.py

Spin up Postgres first:
docker compose up postgres -d

Then run:
TEST_DATABASE_URL="postgresql+asyncpg://postgres:postgres@localhost:5432/onco_lens_test" \
pytest tests/litrag/test_ingestion_integration.py \
tests/litrag/test_query_integration.py \
-v -m integration

pytest tests/litrag/test_acceptance.py -v
# Expected: 31 passed, 2 skipped

The 2 skipped tests require external tooling:
- AC-007 (P95 latency): run locust -f tests/load/locustfile.py --host http://localhost:8001
- AC-010 (K8s stack): requires a live Kubernetes cluster

Full suite result: 288 passed, 2 skipped (litrag/ coverage: 86%)
Six manifests are provided in k8s/. Apply them in order:
# 1. Create the namespace (if not exists)
kubectl create namespace onco-lens
# 2. Create the secret (fill in real values first)
kubectl apply -f k8s/litrag-secret.yaml -n onco-lens
# 3. Apply all remaining manifests
kubectl apply -f k8s/litrag-configmap.yaml -n onco-lens
kubectl apply -f k8s/litrag-deployment.yaml -n onco-lens
kubectl apply -f k8s/litrag-service.yaml -n onco-lens
kubectl apply -f k8s/litrag-hpa.yaml -n onco-lens
kubectl apply -f k8s/torchserve-deployment.yaml -n onco-lens

Key defaults (override in litrag-configmap.yaml / litrag-secret.yaml):
| Resource | Replicas | CPU Request | Memory Request | HPA max |
|---|---|---|---|---|
| litrag-svc | 2 | 1 | 2Gi | 6 |
| torchserve | 1 | 2 | 4Gi | — |
Check rollout status:
kubectl rollout status deployment/litrag -n onco-lens
kubectl get pods -n onco-lens

A Helm chart is provided in helm/ for parameterised deployments.
# Install with default values
helm install litrag ./helm \
--namespace onco-lens \
--create-namespace \
--set secrets.anthropicApiKey="sk-ant-..." \
--set secrets.openaiApiKey="sk-..."
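# Override retrieval parameters at install time
helm install litrag ./helm \
--namespace onco-lens \
--create-namespace \
--set litrag.topK=100 \
--set litrag.topN=20 \
--set litrag.topM=10 \
--set replicaCount=3
# Upgrade an existing release (--reuse-values keeps earlier --set overrides such as the API keys)
helm upgrade litrag ./helm \
--namespace onco-lens \
--reuse-values \
--set litrag.logLevel=DEBUG
# Uninstall
helm uninstall litrag -n onco-lens

See helm/values.yaml for the full list of configurable values.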
Metrics are scraped from /metrics on port 8001 and exposed via the Prometheus container at http://localhost:9090.
Key metrics:
| Metric | Type | Description |
|---|---|---|
| litrag_requests_total | Counter | Total HTTP requests by method, path, and status |
| litrag_request_duration_seconds | Histogram | Request latency (P50/P95/P99) |
| litrag_retrieval_duration_seconds | Histogram | pgvector query duration |
| litrag_reranking_duration_seconds | Histogram | TorchServe reranker duration |
| litrag_scoring_duration_seconds | Histogram | TorchServe scorer duration |
| litrag_synthesis_duration_seconds | Histogram | Claude synthesis duration |
The evidence scorer training experiments are tracked in MLflow at http://localhost:5000.
Experiment name: onco-lens/evidence-scorer
import mlflow
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("onco-lens/evidence-scorer")

All logs are emitted as structured JSON via Loguru:
{
"timestamp": "2026-03-18T12:00:00.000Z",
"level": "INFO",
"message": "Query completed",
"query_id": "a1b2c3...",
"latency_ms": 1689,
"evidence_grade": "A",
"service": "litrag"
}

Set LITRAG_LOG_LEVEL=DEBUG to see per-step pipeline timing.
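
Because the logs are structured JSON, they are easy to filter with a short script. The sketch below picks out slow queries. It assumes one JSON object per line and the field names from the example record; the `slow_queries` helper and the 1500 ms threshold are hypothetical.

```python
import json


def slow_queries(log_lines, threshold_ms):
    """Return (query_id, latency_ms) for completed queries slower than the threshold."""
    slow = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log record
        if not isinstance(record, dict):
            continue
        if record.get("message") != "Query completed":
            continue
        latency = record.get("latency_ms")
        if isinstance(latency, (int, float)) and latency > threshold_ms:
            slow.append((record.get("query_id"), latency))
    return slow


sample = [
    '{"level": "INFO", "message": "Query completed", "query_id": "q1", "latency_ms": 1689}',
    '{"level": "INFO", "message": "Query completed", "query_id": "q2", "latency_ms": 640}',
    '{"level": "DEBUG", "message": "Reranking done", "query_id": "q2"}',
    "not json at all",
]
print(slow_queries(sample, threshold_ms=1500))  # → [('q1', 1689)]
```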
These interfaces are immutable per SRS Section 7. Do not alter the table schemas or endpoint signatures without coordinating with dependent modules.
- Table: onco_lens.ner_annotations. LitRAG writes, Module 3 reads. Rows must never be deleted.
- Endpoint: GET /internal/chunks/{chunk_id}/ner returns NERAnnotation[] for a given chunk UUID.

- Table: onco_lens.trial_digests. Module 2 writes trial summaries here.
- View: onco_lens.v_retrievable_documents. LitRAG queries this view, which unions chunks (literature) and trial_digests (trials).
- Embedding format: Module 2 must use text-embedding-3-small with vector(1536) format. Any other model will produce incorrect cosine distances.
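
A writer such as Module 2 could guard the embedding contract with a cheap check before inserting rows. The sketch below is hypothetical (neither function exists in the codebase as far as this README shows). It verifies the 1536-dimension shape and rejects all-zero vectors, and it computes cosine distance as 1 minus cosine similarity, which is how pgvector's cosine operator defines it.

```python
import math

EXPECTED_DIM = 1536  # dimension of text-embedding-3-small vectors per the contract


def validate_embedding(vector, expected_dim=EXPECTED_DIM):
    """Reject vectors that cannot be valid rows in a vector(1536) column."""
    if len(vector) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(vector)}")
    if not all(math.isfinite(x) for x in vector):
        raise ValueError("embedding contains NaN or infinite values")
    if not any(vector):
        raise ValueError("embedding is a zero vector")
    return vector


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


e1 = [1.0] + [0.0] * 1535
e2 = [0.0, 1.0] + [0.0] * 1534
validate_embedding(e1)
print(cosine_distance(e1, e1))  # → 0.0
print(cosine_distance(e1, e2))  # → 1.0
```

A dimension check cannot detect a different model that also happens to emit 1536 dimensions, so the model name still has to be enforced in configuration.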
.
├── api/ # FastAPI application layer
│ ├── main.py # App factory, lifespan, middleware
│ ├── dependencies.py # Dependency injection (QueryEngine, scorer)
│ ├── logging_config.py # Loguru JSON setup
│ ├── metrics.py # Prometheus counter/histogram setup
│ └── routers/
│ ├── litrag.py # /query, /query/batch
│ ├── ingest.py # /ingest/run, /ingest/runs
│ ├── internal.py # /internal/health, /internal/chunks/.../ner
│ └── scorer.py # /scorer/score, /scorer/score/batch
├── litrag/ # Core RAG pipeline
│ ├── config.py # LitRAGConfig (40+ env vars)
│ ├── query_engine.py # 10-step QueryEngine orchestrator
│ ├── query_router.py # Intent classification (heuristic + Claude)
│ ├── query_expander.py # Claude query variant generation
│ ├── retriever.py # pgvector ANN + metadata filter builder
│ ├── reranker.py # TorchServe cross-encoder client
│ ├── evidence_scorer.py # TorchServe 4-dim scorer + fallback
│ ├── contradiction_detector.py # Claude contradiction detection
│ ├── gap_recommender.py # Claude research gap analysis
│ ├── embedding_client.py # OpenAI batched embedding client
│ ├── article_parser.py # PubMed XML → Article model
│ ├── chunking_engine.py # Token-window section-aware chunking
│ ├── ner_extractor.py # spaCy en_core_sci_sm NER
│ ├── pubmed_client.py # NCBI E-utilities async client
│ ├── indexer.py # IngestionPipeline entry point
│ ├── scorer_model/ # PubMedBERT regression model + TorchServe handler
│ ├── reranker_model/ # BGE reranker TorchServe handler
│ └── prompts/ # Jinja2 prompt templates
│ ├── synthesis.jinja2
│ ├── contradiction.jinja2
│ ├── gap_analysis.jinja2
│ └── query_expansion.jinja2
├── shared/ # Cross-module contracts (stable API)
│ ├── models.py # 14 Pydantic v2 models
│ ├── db.py # SQLAlchemy 2.0 async ORM + session factory
│ └── config.py # BaseConfig
├── ingestion/ # Prefect ingestion pipeline
│ ├── pipeline.py # litrag-ingestion flow + 11 tasks
│ ├── seed_journal_tiers.py # Q1/Q2/Q3 journal tier seeder
│ └── expectations/
│ └── article_suite.json # Great Expectations validation suite
├── serving/ # TorchServe deployment config
│ ├── config.properties # TorchServe settings
│ ├── reranker_handler.py # BGE reranker handler
│ └── evidence_scorer_handler.py
├── alembic/ # Database migrations
│ ├── alembic.ini
│ ├── env.py
│ └── versions/
│ └── 20260317_0001_initial_schema.py
├── k8s/ # Kubernetes manifests
├── helm/ # Helm chart
├── monitoring/
│ └── prometheus.yml # Prometheus scrape config
├── tests/ # pytest suite (288 tests, 86% coverage)
│ ├── conftest.py
│ └── litrag/
│ ├── test_acceptance.py # AC-001–AC-010
│ ├── test_coverage_boost.py # Pure-logic unit tests
│ ├── test_ingestion_integration.py # Real PostgreSQL integration
│ ├── test_query_integration.py # Real pgvector retrieval
│ ├── test_api_contracts.py # FastAPI endpoint contracts
│ ├── test_query_engine.py
│ ├── test_ingestion.py
│ ├── test_embedding.py
│ ├── test_evidence_scorer.py
│ ├── test_reranker.py
│ ├── test_contradiction.py
│ └── test_gap_recommender.py
├── .env.example # Environment variable template
├── .github/workflows/ci.yml # GitHub Actions CI
├── docker-compose.yml # 5-service dev stack
├── Dockerfile # Multi-stage builder + runtime
├── requirements.txt # Pinned Python dependencies
├── pyproject.toml # pytest + coverage config
└── ACCEPTANCE_REPORT.md # AC-001–AC-010 test evidence
pip install pgvector

The database schema has not been initialised. Run:
alembic upgrade head

The reranker and evidence scorer models must be registered in TorchServe before the API will use them. The fallback scorer activates automatically when TorchServe is unavailable, so queries still work, but with reduced evidence quality.
To register models:
curl -X POST "http://localhost:8081/models?url=bge-reranker-v2-m3.mar&initial_workers=1"
curl -X POST "http://localhost:8081/models?url=evidence-scorer.mar&initial_workers=1"

Ensure the torchserve service is healthy before litrag-svc starts. In compose this is enforced by depends_on: condition: service_healthy. If TorchServe is still unhealthy after 2 minutes, check that serving/config.properties contains disable_token_authorization=true.
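
The automatic fallback when TorchServe is unavailable can be pictured as a try-primary-then-fallback wrapper. This is a sketch of the general pattern only. The function names, the exceptions caught, and the constant fallback score are assumptions; evidence_scorer.py may handle this differently.

```python
def score_with_fallback(chunk, primary, fallback):
    """Try the primary scorer; on a connection problem use the fallback instead.

    Returns (score, source) so callers can tell which scorer produced the value.
    """
    try:
        return primary(chunk), "torchserve"
    except (ConnectionError, TimeoutError):
        return fallback(chunk), "fallback"


def unreachable_torchserve(chunk):
    """Stand-in for a TorchServe call that cannot connect."""
    raise ConnectionError("TorchServe unreachable")


def heuristic_scorer(chunk):
    """Hypothetical stand-in fallback: returns a fixed neutral score."""
    return 0.5


print(score_with_fallback("Phase III RCT, n=821", unreachable_torchserve, heuristic_scorer))
# → (0.5, 'fallback')
```

Returning the source alongside the score makes it possible to log or flag answers that were graded without the trained model.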
Verify ANTHROPIC_API_KEY is set and valid. The key must have access to claude-sonnet-4-20250514.
This usually means embeddings were stored as zero vectors. Check that OPENAI_API_KEY is valid and the embedding client can reach api.openai.com.
The integration tests require TEST_DATABASE_URL to be set:
TEST_DATABASE_URL="postgresql+asyncpg://postgres:postgres@localhost:5432/onco_lens_test" \
pytest tests/litrag/test_ingestion_integration.py -v -m integration

Proprietary - ONCO-LENS Platform v1.0 DRAFT. All rights reserved.