ONCO-LENS LitRAG — Module 1

Oncology Literature RAG Engine — PubMed-grounded retrieval-augmented generation with evidence scoring, contradiction detection, and research gap analysis.

LitRAG is Module 1 of the three-part ONCO-LENS platform. It ingests PubMed articles, embeds them into a pgvector index, and answers clinical oncology questions with graded, citable evidence. Module 3 (EvidenceGraph) consumes LitRAG's NER annotations, and Module 2 (TrialRadar) writes trial digests that LitRAG retrieves alongside the literature.


Table of Contents

  1. Architecture Overview
  2. Prerequisites
  3. Quick Start (Docker Compose)
  4. Local Development Setup
  5. Configuration Reference
  6. Database Migrations
  7. Running the Ingestion Pipeline
  8. API Reference
  9. Running Tests
  10. Kubernetes Deployment
  11. Helm Deployment
  12. Monitoring and Observability
  13. Integration Contracts (Modules 2 & 3)
  14. Project Structure
  15. Troubleshooting

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                        User / Client                        │
└───────────────────────────┬─────────────────────────────────┘
                            │ POST /litrag/v1/query
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                   FastAPI Service (port 8001)                │
│  QueryEngine (10-step pipeline):                            │
│  1. QueryRouter   → classify intent (evidence/contradiction/│
│                     gap/trial)                              │
│  2. QueryExpander → 2-3 Claude-generated variants           │
│  3. EmbeddingClient → OpenAI text-embedding-3-small         │
│  4. PgVectorRetriever → HNSW ANN (top-k=50)                 │
│  5. RerankerClient → BAAI/bge-reranker-v2-m3 (top-n=10)    │
│  6. EvidenceScorerClient → 4-dim regression (top-m=5)       │
│  7. ContradictionDetector → Claude pair-wise check          │
│  8. GapRecommender → Claude gap analysis                    │
│  9. Synthesis → Claude grounded answer                      │
│  10. Cache → Redis-like TTL per query hash                  │
└────────────┬──────────────────────────┬─────────────────────┘
             │                          │
             ▼                          ▼
┌─────────────────────┐    ┌─────────────────────────────────┐
│ PostgreSQL 15        │    │ TorchServe (port 8080)          │
│ + pgvector          │    │  • bge-reranker-v2-m3           │
│ Schema: onco_lens   │    │  • PubMedBERT evidence scorer   │
│ HNSW index (m=16)   │    └─────────────────────────────────┘
└─────────────────────┘
┌─────────────────────┐    ┌─────────────────────────────────┐
│ Ingestion Pipeline  │    │ MLflow (port 5000)               │
│ (Prefect flow)      │    │  Experiment: onco-lens/evidence- │
│ PubMed → chunks →   │    │  scorer                         │
│ embeddings → NER    │    └─────────────────────────────────┘
└─────────────────────┘
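
The narrowing in steps 4 to 6 (top-k, then top-n, then top-m) can be sketched in a few lines. This is only an illustration of the funnel shape, not the project's QueryEngine: the `Candidate` class, its score fields, and the `funnel` function are hypothetical, and the real stages call pgvector and TorchServe rather than sorting a list in memory.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    chunk_id: str
    ann_score: float       # similarity from the ANN stage (hypothetical field)
    rerank_score: float    # cross-encoder score (hypothetical field)
    evidence_score: float  # evidence scorer output (hypothetical field)


def funnel(candidates, top_k=50, top_n=10, top_m=5):
    """Narrow candidates in three stages: ANN top-k, rerank top-n, score top-m."""
    stage1 = sorted(candidates, key=lambda c: c.ann_score, reverse=True)[:top_k]
    stage2 = sorted(stage1, key=lambda c: c.rerank_score, reverse=True)[:top_n]
    stage3 = sorted(stage2, key=lambda c: c.evidence_score, reverse=True)[:top_m]
    return stage3
```

Each stage only sees what the previous stage kept, which is why the expensive cross-encoder and scorer run on at most 50 and 10 candidates respectively with the default settings.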

Prerequisites

| Requirement | Version | Notes |
| --- | --- | --- |
| Python | 3.11+ | 3.13 supported |
| Docker & Docker Compose | 24+ / v2+ | For the full stack |
| PostgreSQL | 15+ with pgvector | Provided via compose |
| API keys | | Anthropic + OpenAI (required) |
| NCBI API key | | Optional; raises rate limit from 3 to 10 req/s |

Quick Start (Docker Compose)

The fastest way to get the full five-service stack running.

1. Clone and configure

git clone https://github.com/viche72/OncoLens.git
cd OncoLens
cp .env.example .env

Edit .env and add your API keys:

ANTHROPIC_API_KEY=sk-ant-your-key-here
OPENAI_API_KEY=sk-your-key-here

# Optional — raises PubMed rate limit from 3 to 10 req/s
NCBI_API_KEY=your-ncbi-key-here

2. Start the stack

docker compose up --build

All five services start with health checks. Typical startup time is 60–90 seconds. You can watch readiness with:

docker compose ps          # wait for (healthy)
docker compose logs -f litrag-svc

3. Verify the service is up

curl http://localhost:8001/internal/health
# → {"status":"healthy","version":"1.0.0","db":"ok"}

4. Run your first query

curl -X POST http://localhost:8001/litrag/v1/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the efficacy of pembrolizumab in metastatic NSCLC?",
    "top_m": 5
  }'

5. Open the interactive API docs

Navigate to http://localhost:8001/litrag/v1/docs in your browser.
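
The step 4 request can also be issued from Python. The sketch below uses only the standard library and assumes the compose stack is listening on localhost:8001; the helper names `build_query_request` and `run_query` are illustrative and not part of the project.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8001"  # assumes the compose stack from this guide


def build_query_request(query: str, top_m: int = 5) -> urllib.request.Request:
    """Build (but do not send) a POST request for /litrag/v1/query."""
    body = json.dumps({"query": query, "top_m": top_m}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/litrag/v1/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query: str, top_m: int = 5, timeout: float = 60.0) -> dict:
    """Send the query and return the decoded JSON response."""
    request = build_query_request(query, top_m)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `run_query("What is the efficacy of pembrolizumab in metastatic NSCLC?")` should return the same JSON document as the curl command, provided the service is healthy.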


Local Development Setup

Use this path when you want to run tests, iterate on code, or run the service without Docker.

1. Create a virtual environment

python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS/Linux
source .venv/bin/activate

2. Install dependencies

pip install -r requirements.txt

3. Download the scispaCy model

The NER extractor requires the en_core_sci_sm model. It is downloaded automatically during the Docker build, but for local dev run:

pip install https://s3-us-west-2.amazonaws.com/ai2-s3-public/scispacy/releases/v0.5.4/en_core_sci_sm-0.5.4.tar.gz

4. Start supporting services (PostgreSQL, TorchServe, MLflow, Prometheus)

You can start only the infrastructure services without building the litrag image:

docker compose up postgres torchserve mlflow prometheus -d

5. Set environment variables

export DATABASE_URL="postgresql+asyncpg://postgres:postgres@localhost:5432/onco_lens"
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."

Alternatively, create a .env file; the app uses pydantic-settings, which reads it automatically.

6. Apply database migrations

alembic upgrade head

7. Start the API server

uvicorn api.main:app --host 0.0.0.0 --port 8001 --reload

Configuration Reference

All settings are read from environment variables (or .env). Required variables have no default and must be set before startup.

Required

| Variable | Description |
| --- | --- |
| ANTHROPIC_API_KEY | Anthropic API key for Claude synthesis, contradiction detection, gap analysis, and query expansion |
| OPENAI_API_KEY | OpenAI API key for text-embedding-3-small embeddings |

Database

| Variable | Default | Description |
| --- | --- | --- |
| DATABASE_URL | postgresql+asyncpg://postgres:postgres@localhost:5432/onco_lens | Async PostgreSQL connection string |
| DB_POOL_SIZE | 10 | SQLAlchemy async pool size |
| DB_MAX_OVERFLOW | 20 | Pool overflow limit |

Retrieval Pipeline

| Variable | Default | Description |
| --- | --- | --- |
| LITRAG_TOP_K | 50 | Candidates retrieved from pgvector (ANN stage) |
| LITRAG_TOP_N | 10 | Candidates kept after cross-encoder reranking |
| LITRAG_TOP_M | 5 | Final chunks sent to Claude for synthesis |
| LITRAG_EMBEDDING_MODEL | text-embedding-3-small | OpenAI embedding model |
| LITRAG_EMBEDDING_BATCH_SIZE | 100 | Chunk batch size for embedding calls |
| LITRAG_DEVICE | cpu | Compute device: cpu, cuda, cuda:0, mps |
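
As a rough illustration of how the three funnel sizes relate, the sketch below reads them from the environment with the documented defaults. The project itself loads settings through pydantic-settings; this standard-library version, the `load_retrieval_settings` name, and the ordering check (top_k >= top_n >= top_m) are assumptions made for the example.

```python
import os


def load_retrieval_settings(env=None) -> dict:
    """Read the three funnel sizes, falling back to the documented defaults."""
    env = os.environ if env is None else env
    settings = {
        "top_k": int(env.get("LITRAG_TOP_K", "50")),
        "top_n": int(env.get("LITRAG_TOP_N", "10")),
        "top_m": int(env.get("LITRAG_TOP_M", "5")),
    }
    # Assumption: each stage can only narrow the output of the previous one.
    if not settings["top_k"] >= settings["top_n"] >= settings["top_m"] >= 1:
        raise ValueError(f"expected top_k >= top_n >= top_m >= 1, got {settings}")
    return settings
```

Passing a plain dict instead of `os.environ` makes it easy to try overrides, for example `load_retrieval_settings({"LITRAG_TOP_K": "100"})`.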

LLM

| Variable | Default | Description |
| --- | --- | --- |
| LITRAG_CLAUDE_MODEL | claude-sonnet-4-20250514 | Claude model for synthesis and analysis |
| LITRAG_CLAUDE_MAX_TOKENS | 4096 | Maximum tokens in Claude responses |
| LITRAG_CLAUDE_TEMPERATURE | 0.0 | Temperature for Claude calls |

TorchServe

| Variable | Default | Description |
| --- | --- | --- |
| TORCHSERVE_HOST | localhost | TorchServe hostname (use torchserve in compose) |
| TORCHSERVE_INFERENCE_PORT | 8080 | Inference API port |
| TORCHSERVE_RERANKER_MODEL_NAME | bge-reranker-v2-m3 | Registered model name |
| TORCHSERVE_SCORER_MODEL_NAME | evidence-scorer | Registered model name |
| TORCHSERVE_TIMEOUT_SECONDS | 30 | Per-request HTTP timeout |

Evidence Scorer Weights

The four dimension weights must sum to 1.0 (±0.01):

| Variable | Default | Description |
| --- | --- | --- |
| EV_WEIGHT_STUDY_DESIGN | 0.40 | RCT/meta-analysis vs case report |
| EV_WEIGHT_SAMPLE_SIZE | 0.25 | Cohort size regression score |
| EV_WEIGHT_RECENCY | 0.20 | Publication date decay |
| EV_WEIGHT_JOURNAL_CREDIBILITY | 0.15 | Journal Q1–Q3 tier score |
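
The sum-to-one constraint can be checked with a small helper. The sketch below also shows one plausible way to combine the four dimension scores, namely a plain weighted sum. That combination rule is an assumption: this README does not state that the final evidence score is exactly this linear combination, and the function names are illustrative.

```python
import math

DEFAULT_WEIGHTS = {
    "study_design": 0.40,
    "sample_size": 0.25,
    "recency": 0.20,
    "journal_credibility": 0.15,
}


def validate_weights(weights: dict, tolerance: float = 0.01) -> None:
    """Raise ValueError unless the weights sum to 1.0 within the tolerance."""
    total = sum(weights.values())
    if not math.isclose(total, 1.0, abs_tol=tolerance):
        raise ValueError(f"weights sum to {total:.3f}, expected 1.0 ± {tolerance}")


def weighted_score(dimensions: dict, weights: dict = DEFAULT_WEIGHTS) -> float:
    """Combine per-dimension scores (each in [0, 1]) into one weighted score.

    Assumption: a plain weighted sum; the real scorer may combine differently.
    """
    validate_weights(weights)
    return sum(weights[name] * dimensions[name] for name in weights)
```

Because the weights sum to 1.0, dimension scores in [0, 1] always produce a combined score in [0, 1].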

PubMed / NCBI

| Variable | Default | Description |
| --- | --- | --- |
| NCBI_API_KEY | (none) | Optional; raises rate limit to 10 req/s |
| NCBI_RATE_LIMIT_RPS | 10 | Requests per second to NCBI |
| NCBI_BACKOFF_MAX_RETRIES | 5 | Max exponential backoff retries |
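
To make the two NCBI settings concrete, here is a minimal sketch of an exponential backoff schedule and the request spacing implied by a rate limit. The base delay of 1 second and the 30 second cap are hypothetical values chosen for the example; the README specifies only the retry count and the requests-per-second limit.

```python
def backoff_delays(max_retries: int = 5, base: float = 1.0, cap: float = 30.0) -> list:
    """Return the wait in seconds before each retry: base * 2**attempt, capped."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]


def min_interval(rate_limit_rps: float) -> float:
    """Minimum spacing in seconds between requests that respects the rate limit."""
    if rate_limit_rps <= 0:
        raise ValueError("rate_limit_rps must be positive")
    return 1.0 / rate_limit_rps
```

With the defaults, five retries wait 1, 2, 4, 8 and 16 seconds, and a limit of 10 req/s means requests are spaced at least 0.1 seconds apart. Production clients usually add random jitter to the delays, which this sketch leaves out.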

Chunking

| Variable | Default | Description |
| --- | --- | --- |
| CHUNK_TOKEN_WINDOW | 512 | Tokens per chunk |
| CHUNK_TOKEN_OVERLAP | 64 | Overlap between adjacent chunks |
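
The interaction between window and overlap is easiest to see in code. The sketch below is a generic sliding-window chunker over a pre-tokenized list; it is not the project's chunking_engine.py, which is described as section-aware, and the `chunk_tokens` name is illustrative.

```python
def chunk_tokens(tokens: list, window: int = 512, overlap: int = 64) -> list:
    """Split tokens into windows of `window` tokens sharing `overlap` tokens."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    stride = window - overlap  # 448 with the defaults
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks
```

With the defaults, each new chunk starts 448 tokens after the previous one, so a 1,000-token article yields three chunks and the last 64 tokens of one chunk are repeated at the start of the next.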

Caching TTLs

| Variable | Default | Description |
| --- | --- | --- |
| QUERY_CACHE_TTL_HOURS | 1 | Full query response cache |
| RERANKER_CACHE_TTL_HOURS | 24 | Cross-encoder score cache |
| CONTRADICTION_CACHE_TTL_DAYS | 7 | Contradiction detection cache |
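
A TTL cache keyed by a query hash, as mentioned in the architecture overview, might look roughly like the following. This is a sketch under stated assumptions: the key derivation (SHA-256 over canonical JSON), the `TTLCache` class, and the in-memory dict are all illustrative, and the project's actual cache backend is not described here.

```python
import hashlib
import json
import time


def query_cache_key(payload: dict) -> str:
    """Hash a canonical JSON form so equal payloads map to the same key."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._store = {}

    def set(self, key, value):
        self._store[key] = (self.clock() + self.ttl_seconds, value)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: evict lazily on read
            return default
        return value
```

A one-hour query cache corresponds to `TTLCache(ttl_seconds=3600)`. Sorting the keys before hashing means two requests that differ only in field order share one cache entry.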

Observability

| Variable | Default | Description |
| --- | --- | --- |
| LITRAG_LOG_LEVEL | INFO | Log level: DEBUG, INFO, WARNING, ERROR, CRITICAL |
| LITRAG_ENABLE_METRICS | true | Expose Prometheus metrics at /metrics |
| MLFLOW_TRACKING_URI | http://mlflow-svc:5000 | MLflow server for experiment tracking |

Database Migrations

LitRAG uses Alembic for schema management. The initial migration creates the pgvector extension, all 13 tables, the HNSW vector index, and the v_retrievable_documents view.

# Apply all pending migrations (run this on first setup and after upgrades)
alembic upgrade head

# Check current migration state
alembic current

# Roll back one step
alembic downgrade -1

# Auto-generate a new migration after ORM model changes
alembic revision --autogenerate -m "add new table"

The migration connects using the DATABASE_URL environment variable. Ensure it is set before running any Alembic commands.
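
A quick pre-flight check of DATABASE_URL can catch a missing or malformed value before Alembic runs. The helper below is a hypothetical convenience, not part of the project, and it assumes the postgresql+asyncpg scheme shown in the default value is the one expected.

```python
from urllib.parse import urlsplit


def check_database_url(url: str) -> dict:
    """Validate the shape of an async PostgreSQL URL and return its parts."""
    parts = urlsplit(url)
    # Assumption: the asyncpg driver scheme from the documented default is required.
    if parts.scheme != "postgresql+asyncpg":
        raise ValueError(f"expected postgresql+asyncpg scheme, got {parts.scheme!r}")
    database = parts.path.lstrip("/")
    if not parts.hostname or not database:
        raise ValueError("DATABASE_URL must include a host and a database name")
    return {"host": parts.hostname, "port": parts.port or 5432, "database": database}
```

For example, `check_database_url(os.environ.get("DATABASE_URL", ""))` (after `import os`) raises a ValueError with a readable message when the variable is unset.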


Running the Ingestion Pipeline

The ingestion pipeline fetches articles from PubMed, chunks them, embeds them, and stores NER annotations — all as a Prefect flow.

Via the REST API

# Ingest articles matching a PubMed search query
curl -X POST http://localhost:8001/litrag/v1/ingest/run \
  -H "Content-Type: application/json" \
  -d '{
    "query": "pembrolizumab NSCLC clinical trial",
    "max_results": 100,
    "date_from": "2020-01-01",
    "date_to": "2026-12-31"
  }'

# Ingest a specific list of PMIDs
curl -X POST http://localhost:8001/litrag/v1/ingest/run \
  -H "Content-Type: application/json" \
  -d '{
    "query": "oncology",
    "pmids": ["36871102", "36871103", "36871104"]
  }'

# Check ingestion run status
curl "http://localhost:8001/litrag/v1/ingest/runs?limit=5"

Programmatically (Python)

import asyncio
from ingestion.pipeline import ingestion_flow
from shared.models import IngestionConfig

config = IngestionConfig(
    query="BRCA1 breast cancer treatment",
    max_results=200,
    date_from="2018-01-01",
)

result = asyncio.run(ingestion_flow(config))
print(result)
# {
#   "run_id": "...",
#   "articles_fetched": 200,
#   "articles_stored": 198,
#   "chunks_created": 1423,
#   "error_count": 2,
#   "status": "COMPLETED"
# }

Seeding Journal Tiers

Before scoring, seed the journal_tiers table with Q1/Q2/Q3 tier data:

python -m ingestion.seed_journal_tiers

API Reference

Base URL: http://localhost:8001

Interactive docs: http://localhost:8001/litrag/v1/docs

Query Endpoints

POST /litrag/v1/query

Submit a question and receive a graded, citable answer.

Request body:

{
  "query": "What is the efficacy of nivolumab in renal cell carcinoma?",
  "top_k": 50,
  "top_n": 10,
  "top_m": 5,
  "include_gaps": false,
  "filters": {
    "date_from": "2018-01-01",
    "date_to": "2026-12-31",
    "min_evidence_score": 0.6,
    "study_design": ["RCT", "SYSTEMATIC_REVIEW"],
    "journals": ["New England Journal of Medicine", "Lancet"],
    "cancer_subtype": ["Kidney Neoplasms"],
    "include_trials": true
  }
}

Response:

{
  "query_id": "a1b2c3...",
  "query": "What is the efficacy of nivolumab...",
  "answer": "Nivolumab demonstrates significant OS benefit...",
  "evidence_grade": "A",
  "citations": [
    {
      "pmid": "28552485",
      "title": "Nivolumab versus Everolimus in Advanced Renal-Cell Carcinoma",
      "authors": ["Motzer, Robert J.", "Escudier, Bernard"],
      "journal": "New England Journal of Medicine",
      "year": 2015,
      "chunk_id": "...",
      "chunk_text_snippet": "...",
      "evidence_score": 0.91,
      "score_breakdown": {
        "study_design": 0.95,
        "sample_size": 0.88,
        "recency": 0.72,
        "journal_credibility": 1.0
      },
      "pubmed_url": "https://pubmed.ncbi.nlm.nih.gov/28552485/"
    }
  ],
  "contradictions": [],
  "research_gaps": [],
  "query_intent": "evidence_lookup",
  "latency_breakdown": {
    "retrieval_ms": 42,
    "reranking_ms": 310,
    "scoring_ms": 87,
    "synthesis_ms": 1250,
    "total_ms": 1689
  }
}

Evidence grades: A (≥0.8), B (≥0.6), C (≥0.4), D (<0.4)
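
The grade thresholds translate directly into a lookup function. The sketch below assumes scores lie in [0, 1]; the function name is illustrative, and how the service derives the single score that gets graded (for example from several citations) is not specified here.

```python
def evidence_grade(score: float) -> str:
    """Map an evidence score in [0, 1] to a letter grade from A to D."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score >= 0.8:
        return "A"
    if score >= 0.6:
        return "B"
    if score >= 0.4:
        return "C"
    return "D"
```

Each threshold is inclusive at its lower bound, so a score of exactly 0.8 is graded A and exactly 0.6 is graded B.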

POST /litrag/v1/query/batch

Submit up to 10 queries in a single request. Returns an array of LitRAGResponse objects.
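
Clients with more than 10 queries need to split them across several batch requests. A minimal sketch of that splitting step follows; the helper name is illustrative, and the request body shape of the batch endpoint is deliberately not shown because it is not documented in this section.

```python
def split_into_batches(queries: list, max_batch: int = 10) -> list:
    """Split queries into consecutive groups of at most max_batch items."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    return [queries[i:i + max_batch] for i in range(0, len(queries), max_batch)]
```

For 23 queries this produces three groups of 10, 10 and 3, each small enough for one call to the batch endpoint.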

Ingestion Endpoints

| Method | Path | Description |
| --- | --- | --- |
| POST | /litrag/v1/ingest/run | Start an ingestion run |
| GET | /litrag/v1/ingest/runs | List recent ingestion runs |
| GET | /litrag/v1/ingest/runs/{run_id} | Get a specific run's status |

Evidence Scorer Endpoints

| Method | Path | Description |
| --- | --- | --- |
| POST | /litrag/v1/scorer/score | Score a single chunk |
| POST | /litrag/v1/scorer/score/batch | Score a batch of chunks |

Internal Endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | /internal/health | Health check (DB + dependencies) |
| GET | /internal/chunks/{chunk_id}/ner | Fetch NER annotations for a chunk (Module 3) |
| GET | /metrics | Prometheus metrics exposition |

Running Tests

Unit and acceptance tests (no database required)

pytest tests/ -v \
  --ignore=tests/litrag/test_ingestion_integration.py \
  --ignore=tests/litrag/test_query_integration.py

With coverage gate (≥80% required)

pytest tests/ \
  --cov=litrag \
  --cov-report=term-missing \
  --cov-fail-under=80 \
  --ignore=tests/litrag/test_ingestion_integration.py \
  --ignore=tests/litrag/test_query_integration.py

Integration tests (requires PostgreSQL 15 + pgvector)

Spin up Postgres first:

docker compose up postgres -d

Then run:

TEST_DATABASE_URL="postgresql+asyncpg://postgres:postgres@localhost:5432/onco_lens_test" \
  pytest tests/litrag/test_ingestion_integration.py \
         tests/litrag/test_query_integration.py \
  -v -m integration

Acceptance tests only (AC-001 through AC-010)

pytest tests/litrag/test_acceptance.py -v
# Expected: 31 passed, 2 skipped

The 2 skipped tests require external tooling:

  • AC-007 (P95 latency): run locust -f tests/load/locustfile.py --host http://localhost:8001
  • AC-010 (K8s stack): requires a live Kubernetes cluster

Current test results

288 passed, 2 skipped — litrag/ coverage: 86%

Kubernetes Deployment

Six manifests are provided in k8s/. Apply them in order:

# 1. Create the namespace (idempotent: safe to re-run if it already exists)
kubectl create namespace onco-lens --dry-run=client -o yaml | kubectl apply -f -

# 2. Create the secret (fill in real values first)
kubectl apply -f k8s/litrag-secret.yaml -n onco-lens

# 3. Apply all remaining manifests
kubectl apply -f k8s/litrag-configmap.yaml -n onco-lens
kubectl apply -f k8s/litrag-deployment.yaml -n onco-lens
kubectl apply -f k8s/litrag-service.yaml -n onco-lens
kubectl apply -f k8s/litrag-hpa.yaml -n onco-lens
kubectl apply -f k8s/torchserve-deployment.yaml -n onco-lens

Key defaults (override in litrag-configmap.yaml / litrag-secret.yaml):

| Resource | Replicas | CPU Request | Memory Request | HPA max |
| --- | --- | --- | --- | --- |
| litrag-svc | 2 | 1 | 2Gi | 6 |
| torchserve | 1 | 2 | 4Gi | |

Check rollout status:

kubectl rollout status deployment/litrag -n onco-lens
kubectl get pods -n onco-lens

Helm Deployment

A Helm chart is provided in helm/ for parameterised deployments.

# Install with default values
helm install litrag ./helm \
  --namespace onco-lens \
  --create-namespace \
  --set secrets.anthropicApiKey="sk-ant-..." \
  --set secrets.openaiApiKey="sk-..."

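# Alternatively, install with overridden retrieval parameters
# (use this instead of the command above; a release name can only be installed once)
helm install litrag ./helm \
  --namespace onco-lens \
  --create-namespace \
  --set litrag.topK=100 \
  --set litrag.topN=20 \
  --set litrag.topM=10 \
  --set replicaCount=3

# Upgrade an existing release, keeping previously set values
helm upgrade litrag ./helm \
  --namespace onco-lens \
  --reuse-values \
  --set litrag.logLevel=DEBUG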

# Uninstall
helm uninstall litrag -n onco-lens

See helm/values.yaml for the full list of configurable values.


Monitoring and Observability

Prometheus

Metrics are scraped from /metrics on port 8001 and exposed via the Prometheus container at http://localhost:9090.

Key metrics:

| Metric | Type | Description |
| --- | --- | --- |
| litrag_requests_total | Counter | Total HTTP requests by method, path, and status |
| litrag_request_duration_seconds | Histogram | Request latency (P50/P95/P99) |
| litrag_retrieval_duration_seconds | Histogram | pgvector query duration |
| litrag_reranking_duration_seconds | Histogram | TorchServe reranker duration |
| litrag_scoring_duration_seconds | Histogram | TorchServe scorer duration |
| litrag_synthesis_duration_seconds | Histogram | Claude synthesis duration |
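
Percentiles such as P95 are not stored in a histogram; they are estimated from cumulative bucket counts. The sketch below shows the linear interpolation idea that PromQL's histogram_quantile() is based on. The function is a simplified stand-in written for this example, and the bucket boundaries in the usage note are hypothetical.

```python
def histogram_quantile(q: float, buckets: list) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    Interpolates linearly inside the bucket that contains the target rank.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if upper_bound == float("inf"):
                return lower_bound  # cannot interpolate into the +Inf bucket
            in_bucket = count - lower_count
            if in_bucket == 0:
                return upper_bound
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound
```

Given hypothetical buckets `[(0.5, 50), (1.0, 80), (2.0, 95), (5.0, 100), (float("inf"), 100)]`, the estimated P95 is 2.0 seconds, because 95 of the 100 observations fall at or below the 2.0 bucket boundary. The estimate is only as precise as the bucket layout allows.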

MLflow

The evidence scorer training experiments are tracked in MLflow at http://localhost:5000.

Experiment name: onco-lens/evidence-scorer

import mlflow
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("onco-lens/evidence-scorer")

Structured Logging

All logs are emitted as structured JSON via Loguru:

{
  "timestamp": "2026-03-18T12:00:00.000Z",
  "level": "INFO",
  "message": "Query completed",
  "query_id": "a1b2c3...",
  "latency_ms": 1689,
  "evidence_grade": "A",
  "service": "litrag"
}

Set LITRAG_LOG_LEVEL=DEBUG to see per-step pipeline timing.
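
Because every record is a JSON object, logs can be filtered with a few lines of Python. The sketch below picks out slow queries using the field names from the sample record above; the `slow_queries` helper and the 2,000 ms threshold are illustrative choices.

```python
import json


def slow_queries(log_lines, threshold_ms: int = 2000) -> list:
    """Return (query_id, latency_ms) for 'Query completed' records above the threshold."""
    slow = []
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as stray tracebacks
        if not isinstance(record, dict):
            continue
        latency = record.get("latency_ms", 0)
        if record.get("message") == "Query completed" and latency > threshold_ms:
            slow.append((record.get("query_id"), latency))
    return slow
```

The function accepts any iterable of lines, so it works equally on an open log file or on the captured output of `docker compose logs litrag-svc`, as long as each line holds one JSON record.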


Integration Contracts (Modules 2 & 3)

These interfaces are immutable per SRS Section 7. Do not alter the table schemas or endpoint signatures without coordinating with dependent modules.

Module 3 (EvidenceGraph) reads from LitRAG

  • Table: onco_lens.ner_annotations — LitRAG writes, Module 3 reads. Rows must never be deleted.
  • Endpoint: GET /internal/chunks/{chunk_id}/ner — returns NERAnnotation[] for a given chunk UUID.

Module 2 (TrialRadar) writes to LitRAG

  • Table: onco_lens.trial_digests — Module 2 writes trial summaries here.
  • View: onco_lens.v_retrievable_documents — LitRAG queries this view, which unions chunks (literature) and trial_digests (trials).
  • Embedding format: Module 2 must embed with text-embedding-3-small and store the result as vector(1536). Embeddings from any other model live in a different vector space, so their cosine distances to LitRAG's chunks are not meaningful.
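
A writer on the Module 2 side could guard the embedding contract with a check like the one below before inserting rows. This is a sketch with hypothetical helper names; the cosine distance function mirrors what pgvector's `<=>` operator computes, written out in plain Python for clarity.

```python
import math

EXPECTED_DIM = 1536  # dimension of text-embedding-3-small, per the contract above


def validate_embedding(vector) -> None:
    """Reject vectors with the wrong dimension or with all-zero components."""
    if len(vector) != EXPECTED_DIM:
        raise ValueError(f"expected {EXPECTED_DIM} dimensions, got {len(vector)}")
    if not any(vector):
        raise ValueError("zero vector: the embedding call probably failed")


def cosine_distance(a, b) -> float:
    """Return 1 - cosine similarity for two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm
```

The dimension check catches a wrong model that outputs a different vector size, but it cannot detect a different model that also happens to produce 1536 dimensions, so the model name should still be pinned in configuration.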

Project Structure

.
├── api/                        # FastAPI application layer
│   ├── main.py                 # App factory, lifespan, middleware
│   ├── dependencies.py         # Dependency injection (QueryEngine, scorer)
│   ├── logging_config.py       # Loguru JSON setup
│   ├── metrics.py              # Prometheus counter/histogram setup
│   └── routers/
│       ├── litrag.py           # /query, /query/batch
│       ├── ingest.py           # /ingest/run, /ingest/runs
│       ├── internal.py         # /internal/health, /internal/chunks/.../ner
│       └── scorer.py           # /scorer/score, /scorer/score/batch
├── litrag/                     # Core RAG pipeline
│   ├── config.py               # LitRAGConfig (40+ env vars)
│   ├── query_engine.py         # 10-step QueryEngine orchestrator
│   ├── query_router.py         # Intent classification (heuristic + Claude)
│   ├── query_expander.py       # Claude query variant generation
│   ├── retriever.py            # pgvector ANN + metadata filter builder
│   ├── reranker.py             # TorchServe cross-encoder client
│   ├── evidence_scorer.py      # TorchServe 4-dim scorer + fallback
│   ├── contradiction_detector.py  # Claude contradiction detection
│   ├── gap_recommender.py      # Claude research gap analysis
│   ├── embedding_client.py     # OpenAI batched embedding client
│   ├── article_parser.py       # PubMed XML → Article model
│   ├── chunking_engine.py      # Token-window section-aware chunking
│   ├── ner_extractor.py        # spaCy en_core_sci_sm NER
│   ├── pubmed_client.py        # NCBI E-utilities async client
│   ├── indexer.py              # IngestionPipeline entry point
│   ├── scorer_model/           # PubMedBERT regression model + TorchServe handler
│   ├── reranker_model/         # BGE reranker TorchServe handler
│   └── prompts/                # Jinja2 prompt templates
│       ├── synthesis.jinja2
│       ├── contradiction.jinja2
│       ├── gap_analysis.jinja2
│       └── query_expansion.jinja2
├── shared/                     # Cross-module contracts (stable API)
│   ├── models.py               # 14 Pydantic v2 models
│   ├── db.py                   # SQLAlchemy 2.0 async ORM + session factory
│   └── config.py               # BaseConfig
├── ingestion/                  # Prefect ingestion pipeline
│   ├── pipeline.py             # litrag-ingestion flow + 11 tasks
│   ├── seed_journal_tiers.py   # Q1/Q2/Q3 journal tier seeder
│   └── expectations/
│       └── article_suite.json  # Great Expectations validation suite
├── serving/                    # TorchServe deployment config
│   ├── config.properties       # TorchServe settings
│   ├── reranker_handler.py     # BGE reranker handler
│   └── evidence_scorer_handler.py
├── alembic/                    # Database migrations
│   ├── alembic.ini
│   ├── env.py
│   └── versions/
│       └── 20260317_0001_initial_schema.py
├── k8s/                        # Kubernetes manifests
├── helm/                       # Helm chart
├── monitoring/
│   └── prometheus.yml          # Prometheus scrape config
├── tests/                      # pytest suite (288 tests, 86% coverage)
│   ├── conftest.py
│   └── litrag/
│       ├── test_acceptance.py          # AC-001–AC-010
│       ├── test_coverage_boost.py      # Pure-logic unit tests
│       ├── test_ingestion_integration.py  # Real PostgreSQL integration
│       ├── test_query_integration.py   # Real pgvector retrieval
│       ├── test_api_contracts.py       # FastAPI endpoint contracts
│       ├── test_query_engine.py
│       ├── test_ingestion.py
│       ├── test_embedding.py
│       ├── test_evidence_scorer.py
│       ├── test_reranker.py
│       ├── test_contradiction.py
│       └── test_gap_recommender.py
├── .env.example                # Environment variable template
├── .github/workflows/ci.yml   # GitHub Actions CI
├── docker-compose.yml          # 5-service dev stack
├── Dockerfile                  # Multi-stage builder + runtime
├── requirements.txt            # Pinned Python dependencies
├── pyproject.toml              # pytest + coverage config
└── ACCEPTANCE_REPORT.md        # AC-001–AC-010 test evidence

Troubleshooting

ModuleNotFoundError: No module named 'pgvector'

pip install pgvector

UndefinedTable: relation "onco_lens.chunks" does not exist

The database schema has not been initialised. Run:

alembic upgrade head

TorchServe returns 404 / model not found

The reranker and evidence scorer models must be registered in TorchServe before the API will use them. The fallback scorer activates automatically when TorchServe is unavailable, so queries still work — but with reduced evidence quality.

To register models:

curl -X POST "http://localhost:8081/models?url=bge-reranker-v2-m3.mar&initial_workers=1"
curl -X POST "http://localhost:8081/models?url=evidence-scorer.mar&initial_workers=1"

TORCHSERVE_HOST connection refused in docker-compose

Ensure the torchserve service is healthy before litrag-svc starts. In compose this is enforced by depends_on: condition: service_healthy. If TorchServe is still unhealthy after 2 minutes, check that serving/config.properties contains disable_token_authorization=true.

anthropic.AuthenticationError

Verify ANTHROPIC_API_KEY is set and valid. The key must have access to claude-sonnet-4-20250514.

pgvector cosine distance always returns 1.0

This usually means embeddings were stored as zero vectors. Check that OPENAI_API_KEY is valid and the embedding client can reach api.openai.com.

Integration tests skipped

The integration tests require TEST_DATABASE_URL to be set:

TEST_DATABASE_URL="postgresql+asyncpg://postgres:postgres@localhost:5432/onco_lens_test" \
  pytest tests/litrag/test_ingestion_integration.py -v -m integration

Licence

Proprietary — ONCO-LENS Platform v1.0 DRAFT. All rights reserved.
