NG12 - Notes

PDF Ingestion & Chunking

Limitation Chunk boundaries

PDF parsing and chunk boundaries are not always perfectly aligned with clinical structure (e.g. criteria vs rationale vs action). Table extraction from the symptom index can be lossy, especially for age and duration thresholds.

Improvement Structure-aware ingestion

With more time, ingestion would be made more structure-aware and metadata-driven.

RAG Retrieval Limitations

Trade-off Flat retrieval & cross-references

Current retrieval is flat — it does not follow cross-references between chunks (e.g. "see section on…").

Improvement Soft reference expansion

During indexing, detect phrases like "see section on…", "see recommendation…"
Store referenced chunk IDs in chunk metadata
At retrieval time, automatically include referenced chunks alongside the primary result

Improvement Deep cross-referencing

GraphRAG and Elasticsearch / hybrid search
Multi-step / agentic RAG to validate thresholds and exceptions

Embedding Gap

Trade-off Lay language vs clinical terminology

User queries may use non-clinical language that does not match guideline terminology, reducing retrieval accuracy.

Memory & Multi-turn Coherence

Trade-off Multi-turn coherence limitations

Topic tracking, follow-up detection, and summary extraction across multi-turn conversations have room for improvement. Topic drift, history contamination, and ambiguous follow-up messages remain challenging areas.

Improvement Separate memory layers

Global-level memory: symptom-focused memory for better multi-turn reasoning — track the full clinical picture (all symptoms, age, risk factors) as structured state rather than relying on LLM extraction each turn
User-level memory: per-user context and preferences, persistent across sessions
Topic-aware reranking: use the session topic to rerank retrieved chunks, not just prepend to queries, reducing drift when strong off-topic keywords appear in follow-ups
Structured clinical state: replace free-text summary extraction with a structured state object (age, sex, symptoms list, durations) that updates incrementally per turn rather than re-extracting from full history each time

Guardrails

Assumption Input guardrail coverage

Input classification uses deterministic regex/keyword matching (no LLM). Fast and predictable, but cannot catch nuanced edge cases.

Improvement Output guardrails

Currently limited to citation validation and qualified response tiers. A dedicated output guardrail pass would better catch hallucinated recommendations and unsupported clinical claims.

Guideline Document Control

Improvement Source & version tracking

Track document source, version history, and update status to ensure the system always reflects the latest published guidelines.