NG12 Cancer Risk Assessor

A clinical decision support system that uses retrieval-augmented generation (RAG) to assess cancer risk against the NICE NG12 guidelines.

Overview

This application ingests the NICE NG12 PDF (Suspected Cancer: Recognition and Referral), parses it into structured chunks, stores them in ChromaDB vector collections, and provides two AI-powered interfaces:

Part 1 — Patient Assessment: Given a patient's clinical data (age, symptoms, smoking history, gender), the system retrieves relevant guideline passages and uses Gemini 2.0 Flash to assess cancer risk, producing structured recommendations with citations.
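As a rough illustration of the first step in Part 1, the sketch below flattens a patient's clinical data into a single text query that could be embedded for retrieval. The `Patient` class, its field names, and `build_retrieval_query` are hypothetical, and the project's own query builder (described as a tiered strategy in `query_builder.py`) is likely more elaborate than this.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    """Hypothetical patient record; the field names are illustrative only."""
    patient_id: str
    age: int
    gender: str
    smoking_history: str
    symptoms: list[str] = field(default_factory=list)


def build_retrieval_query(patient: Patient) -> str:
    """Flatten the clinical data into one text query for vector retrieval."""
    if patient.symptoms:
        symptoms = ", ".join(patient.symptoms)
    else:
        symptoms = "no recorded symptoms"
    return (
        f"{patient.age}-year-old {patient.gender.lower()}, "
        f"smoking history: {patient.smoking_history.lower()}, "
        f"presenting with {symptoms}"
    )
```

The resulting string would then be embedded and matched against the guideline chunks. Keeping the query as plain text makes it easy to log alongside the retrieved passages.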

Part 2 — Conversational Chat: A multi-turn Q&A interface where clinicians can ask questions about NG12 guidelines. Features dual guardrails (input classification + output quality assessment), context-aware follow-up handling, and grounded answers with citations.
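The dual-guardrail flow can be pictured as a check before generation and a check after it. The sketch below is a minimal illustration under stated assumptions: `guarded_answer`, the three callables, and both fallback messages are hypothetical stand-ins, not the project's actual LangGraph nodes or prompt text.

```python
from typing import Callable

# Hypothetical fallback messages; the real refusal templates live in the prompts module.
REFUSAL = "I can only answer questions about the NICE NG12 guideline."
NOT_GROUNDED = "I could not find enough support in NG12 to answer that reliably."


def guarded_answer(
    message: str,
    is_in_scope: Callable[[str], bool],
    generate: Callable[[str], tuple[str, list[str]]],
    is_grounded: Callable[[str, list[str]], bool],
) -> str:
    """Run an input guardrail, generate an answer, then run an output guardrail."""
    # Input guardrail: classify the message before any retrieval or generation.
    if not is_in_scope(message):
        return REFUSAL

    # Generation step: returns the answer text and the citations that support it.
    answer, citations = generate(message)

    # Output guardrail: reject answers that are not backed by retrieved passages.
    if not is_grounded(answer, citations):
        return NOT_GROUNDED
    return answer
```

Passing the checks in as callables keeps the control flow testable without calling a model. In the real application each step would presumably be a workflow node backed by Gemini.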

Tech Stack

FastAPI | Backend framework with async support
LangGraph | Workflow orchestration (state machines)
ChromaDB | Vector database (2 collections)
Vertex AI | text-embedding-004 for embeddings
Gemini 2.0 Flash | LLM for reasoning & assessment
PyMuPDF | PDF parsing & text extraction

Quick Start

1. Configure credentials

cp .env.example .env
# Edit .env and set: GOOGLE_APPLICATION_CREDENTIALS=your-credentials.json

2a. Run with Docker (recommended)

docker compose up --build

2b. Run locally

pip install -r requirements.txt
python -m uvicorn app.main:app --port 8000

Then open http://localhost:8000

API Endpoints

Method | Endpoint | Description
GET | /assess/patients | List all test patients for quick-select
POST | /assess/{patient_id} | Run clinical risk assessment for a patient
POST | /chat | Send a chat message (session_id, message)
GET | /chat/history/{session_id} | Retrieve conversation history
DELETE | /chat/history/{session_id} | Clear conversation history
GET | /admin/stats | ChromaDB collection statistics
GET | /admin/chunks | Browse chunks with filters & pagination
POST | /admin/refresh | Re-ingest PDF and rebuild collections
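For orientation, here is a small standard-library client sketch for two of these endpoints. It assumes the server is running at the Quick Start address, that `/chat` accepts a JSON body with `session_id` and `message`, and that responses are JSON. None of this is confirmed by the endpoint list itself, and the helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:8000"  # address from the Quick Start section


def assess_request(patient_id: str) -> urllib.request.Request:
    """Build a POST /assess/{patient_id} request."""
    return urllib.request.Request(
        f"{BASE_URL}/assess/{quote(patient_id, safe='')}", method="POST"
    )


def chat_request(session_id: str, message: str) -> urllib.request.Request:
    """Build a POST /chat request; a JSON body is an assumption."""
    body = json.dumps({"session_id": session_id, "message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request):
    """Send a request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(assess_request("PT-101"))` would trigger an assessment, where `PT-101` is a made-up ID. Real IDs can be listed with `GET /assess/patients`.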

Project Structure

ng12_assessor/
    app/
        main.py                      # FastAPI entry point
        config.py                    # Environment config
        agents/
            assessment_workflow.py   # Part 1: LangGraph assessment (4 nodes)
            chat_workflow.py         # Part 2: LangGraph chat (12 nodes)
        core/
            rag_pipeline.py          # RAG retrieval + reranking
            vector_store.py          # ChromaDB wrapper (2 collections)
            embeddings.py            # Vertex AI embedding client
            gemini_client.py         # Gemini 2.0 Flash client
            query_builder.py         # A+C+B tiered query strategy
            patient_db.py            # Test patient data loader
        ingestion/
            chunker.py               # PDF parser & state machine chunker
        prompts/
            assessment.py            # Part 1 prompts & formatting
            chat.py                  # Part 2 prompts, guardrails, citations
        routers/
            assess.py                # /assess/* endpoints
            chat.py                  # /chat/* endpoints
            admin.py                 # /admin/* endpoints
        memory/
            session_store.py         # In-memory session & topic store
        static/
            index.html               # Main app (3 tabs)
            architecture.html        # Architecture diagrams
            gallery.html             # Diagram gallery
            notes.html               # Design notes
            readme.html              # This page
            image/                   # Architecture diagram images
    data/
        patients.json                # 10 test patient records

Prompt Files

app/prompts/assessment.py — Part 1: Clinical Decision Support prompts. Contains the system and user prompts used by the assessment workflow (/assess/{patient_id}) to instruct Gemini on how to evaluate patient data against NG12 guideline passages and return structured JSON results.
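Because the assessment prompt asks for structured JSON, the caller needs to parse and validate the model's reply. The sketch below shows one defensive way to do that. The required keys are hypothetical (this README does not document the schema), and `parse_assessment` is not a function from the project.

```python
import json

FENCE = "`" * 3  # Markdown code fence that models sometimes wrap JSON in
REQUIRED_KEYS = {"risk_level", "recommendation", "citations"}  # hypothetical schema


def parse_assessment(raw: str) -> dict:
    """Parse a model reply into a dict, tolerating a surrounding code fence."""
    text = raw.strip()
    if text.startswith(FENCE) and text.endswith(FENCE):
        text = text[len(FENCE):-len(FENCE)]
        if text.startswith("json"):  # optional language tag after the fence
            text = text[len("json"):]

    data = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data
```

Raising `ValueError` for both malformed JSON and missing keys gives the workflow a single failure type to catch, for example to retry the model call.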

app/prompts/chat.py — Part 2: Conversational Chat prompts. Contains the system and user prompts for the chat workflow (/chat), including query rewriting, qualification, refusal templates, and citation formatting helpers.

ChromaDB Collections

Collection | Contents | Purpose | Query Method
ng12_canonical | rule_canonical | Verbatim PDF text for citation display | ID-based lookup
ng12_guidelines | rule_search | Template-enriched text with synonym expansion for better vector retrieval | Vector similarity (filtered by doc_type=rule_search)
ng12_guidelines | symptom_index | Symptom-to-cancer mapping with cross-references back to Part A rules | Vector similarity (filtered by doc_type=symptom_index)
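The two collections suggest a two-step lookup: a filtered vector search over `ng12_guidelines`, followed by an ID-based fetch of the verbatim text from `ng12_canonical`. The sketch below illustrates that pattern using ChromaDB's `Collection.query` and `Collection.get` calls. The function name and the `canonical_id` metadata field linking the two collections are assumptions, not details taken from the project.

```python
def retrieve_with_citations(guidelines, canonical, query_embedding,
                            doc_type="rule_search", k=5):
    """Vector-search the enriched chunks, then fetch verbatim text by ID.

    `guidelines` and `canonical` are expected to behave like ChromaDB
    collections (anything with compatible query/get methods will work).
    """
    hits = guidelines.query(
        query_embeddings=[query_embedding],
        n_results=k,
        where={"doc_type": doc_type},  # restrict to one chunk type
    )
    # query() returns one result list per query embedding; we sent one.
    metadatas = hits["metadatas"][0]

    # Collect canonical IDs in ranked order, skipping duplicates.
    canonical_ids = []
    for meta in metadatas:
        cid = meta.get("canonical_id")  # hypothetical linking field
        if cid and cid not in canonical_ids:
            canonical_ids.append(cid)
    if not canonical_ids:
        return []

    # get() does not promise to preserve the requested order, so map by ID.
    found = canonical.get(ids=canonical_ids)
    text_by_id = dict(zip(found["ids"], found["documents"]))
    return [(cid, text_by_id[cid]) for cid in canonical_ids if cid in text_by_id]
```

Separating search text from citation text lets the search chunks be rewritten for recall while citations still show the verbatim PDF text. Passing `doc_type="symptom_index"` would run the same lookup over the symptom index, assuming those chunks carry the same linking field.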