Immersion Project — CanHeal AI Chatbot

A RAG-based AI chatbot toolkit built for the CanHeal Cancer oncology platform, helping patients and caregivers navigate cancer resources through natural language.

Overview

CanHeal needed an intelligent assistant that could answer questions about oncology resources, treatment options, and support networks — without hallucinating or going off-topic. This project delivers a retrieval-augmented generation pipeline that grounds every response in trusted source documents.

Features

RAG Pipeline — embeds and indexes CanHeal's resource corpus; retrieves relevant chunks before generation
Conversational Memory — multi-turn context management so follow-up questions resolve correctly
Source Attribution — every answer cites the specific documents it drew from
Guardrails — topic filtering to keep responses within the oncology/support domain
Streaming Responses — token-by-token output for a responsive UX

Architecture

User Query
    ↓
Query Embedding (text-embedding-ada-002)
    ↓
Vector Search (FAISS / Chroma)
    ↓
Top-K Chunk Retrieval
    ↓
Prompt Assembly [system + context + history + query]
    ↓
LLM Generation (GPT-4)
    ↓
Streamed Response + Source Citations

Tech Stack

Python · LangChain · OpenAI API · FAISS · FastAPI · React

Setup

git clone https://github.com/Harsh7115/Immersion-project
cd Immersion-project
pip install -r requirements.txt
cp .env.example .env  # add OPENAI_API_KEY
python ingest.py       # embed and index documents
uvicorn main:app --reload

Key Design Decisions

Chunk size 512 tokens with 64-token overlap — balances retrieval precision vs. context richness
Top-3 retrieval — enough context without overwhelming the prompt
Temperature 0.3 — factual, grounded answers over creative generation
System prompt enforces domain scope and mandates source citation

Built for the CanHeal Cancer platform to make oncology information more accessible through conversational AI.

Name		Name	Last commit message	Last commit date
Latest commit History 77 Commits
data		data
.gitattributes		.gitattributes
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
Toolkit_Content_results.json		Toolkit_Content_results.json
Toolkit_Resources_results.json		Toolkit_Resources_results.json
app.py		app.py
compare.py		compare.py
preprocess.py		preprocess.py
rag_pipeline.py		rag_pipeline.py
rag_pipeline_lc.py		rag_pipeline_lc.py
requirements.txt		requirements.txt
retriever.py		retriever.py
retriver_lc.py		retriver_lc.py
server.py		server.py
spellcheck.py		spellcheck.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Immersion Project — CanHeal AI Chatbot

Overview

Features

Architecture

Tech Stack

Setup

Key Design Decisions

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Immersion Project — CanHeal AI Chatbot

Overview

Features

Architecture

Tech Stack

Setup

Key Design Decisions

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages