Notes
Assumptions, trade-offs, and future improvements
PDF Ingestion & Chunking
Limitation Chunk boundaries
Chunk boundaries produced by PDF parsing do not always align with the guideline's clinical structure (e.g. criteria vs rationale vs action). Table extraction from the symptom index can be lossy, especially for age and duration thresholds.
Improvement Structure-aware ingestion
With more time, ingestion would be made structure-aware and metadata-driven, for example by tagging each chunk with the kind of content it holds (criteria, rationale, or action).
RAG Retrieval Limitations
Trade-off Flat retrieval & cross-references
Current retrieval is flat: it does not follow cross-references between chunks (e.g. "see section on…").
Improvement Soft reference expansion
- During indexing, detect phrases like "see section on…", "see recommendation…"
- Store referenced chunk IDs in chunk metadata
- At retrieval time, automatically include referenced chunks alongside the primary result
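The three steps above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the `Chunk` fields, the regular expression, and the idea of resolving a reference by matching it against chunk titles are all assumptions made for the example, and only one hop of references is followed.

```python
import re
from dataclasses import dataclass, field

# Hypothetical phrasing; the real guideline wording may need other patterns.
REFERENCE_PATTERN = re.compile(
    r"see (?:the )?(?:section on|recommendation) ([^.;,)]+)", re.IGNORECASE
)


@dataclass
class Chunk:
    chunk_id: str
    title: str
    text: str
    references: list[str] = field(default_factory=list)  # referenced chunk IDs


def link_references(chunks: list[Chunk]) -> None:
    """Indexing step: resolve 'see section on X' phrases to chunk IDs by title."""
    by_title = {c.title.lower(): c.chunk_id for c in chunks}
    for chunk in chunks:
        for match in REFERENCE_PATTERN.finditer(chunk.text):
            target = by_title.get(match.group(1).strip().lower())
            if target and target != chunk.chunk_id and target not in chunk.references:
                chunk.references.append(target)


def expand(primary: list[Chunk], index: dict[str, Chunk]) -> list[Chunk]:
    """Retrieval step: append one hop of referenced chunks after the primary hits."""
    seen = {c.chunk_id for c in primary}
    result = list(primary)
    for chunk in primary:
        for ref_id in chunk.references:
            if ref_id in seen or ref_id not in index:
                continue
            seen.add(ref_id)
            result.append(index[ref_id])
    return result
```

Referenced chunks are appended after the primary results, so the original ranking is preserved. References that cannot be resolved to a known chunk are silently dropped, which is why the expansion is "soft".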
Improvement Deep cross-referencing
- GraphRAG and Elasticsearch / hybrid search
- Multi-step / agentic RAG to validate thresholds and exceptions
Embedding Gap
Trade-off Lay language vs clinical terminology
User queries may use non-clinical language that does not match guideline terminology, reducing retrieval accuracy.
Memory & Multi-turn Coherence
Trade-off Multi-turn coherence limitations
Topic tracking, follow-up detection, and summary extraction across multi-turn conversations have room for improvement. Topic drift, history contamination, and ambiguous follow-up messages remain challenging areas.
Improvement Separate memory layers
- Global-level memory: symptom-focused memory for better multi-turn reasoning. Track the full clinical picture (all symptoms, age, risk factors) as structured state rather than relying on LLM extraction each turn
- User-level memory: per-user context and preferences, persistent across sessions
- Topic-aware reranking: use the session topic to rerank retrieved chunks rather than only prepending it to queries, reducing drift when strong off-topic keywords appear in follow-ups
- Structured clinical state: replace free-text summary extraction with a structured state object (age, sex, symptoms list, durations) that updates incrementally per turn rather than re-extracting from full history each time
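The structured clinical state described in the last bullet might look something like the sketch below. The class name, field names, and merge rules are assumptions chosen for illustration. The point is that each turn contributes only newly extracted facts, which are merged into the existing state instead of being re-derived from the full history.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ClinicalState:
    """Running clinical picture for one conversation (field names are illustrative)."""
    age: Optional[int] = None
    sex: Optional[str] = None
    symptoms: dict[str, Optional[str]] = field(default_factory=dict)  # symptom -> duration
    risk_factors: set[str] = field(default_factory=set)

    def update(self, age=None, sex=None, symptoms=None, risk_factors=None) -> None:
        """Merge facts extracted from the latest turn; unset fields are left alone."""
        if age is not None:
            self.age = age
        if sex is not None:
            self.sex = sex
        for name, duration in (symptoms or {}).items():
            # Keep a known duration if this turn mentions the symptom without one.
            if duration is not None or name not in self.symptoms:
                self.symptoms[name] = duration
        self.risk_factors.update(risk_factors or ())
```

A first turn could call `state.update(age=52, symptoms={"cough": "3 weeks"})` and a follow-up `state.update(symptoms={"weight loss": None})`. The earlier age and cough duration survive the second call, which is the behaviour that free-text re-extraction does not guarantee.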
Guardrails
Assumption Input guardrail coverage
Input classification uses deterministic regex/keyword matching (no LLM). This is fast and predictable, but it cannot catch nuanced edge cases.
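A deterministic classifier of this kind could be sketched as below. The category labels, the patterns, and the default label are hypothetical and are not taken from the project. They are there only to show the shape of the approach and the kind of input it misses.

```python
import re

# Hypothetical categories and patterns, for illustration only.
RULES = [
    ("emergency", re.compile(r"\b(chest pain|can'?t breathe|unconscious)\b", re.I)),
    ("out_of_scope", re.compile(r"\b(weather|football|recipe)\b", re.I)),
]


def classify_input(message: str) -> str:
    """Return the first matching category, or 'clinical_query' by default (assumed)."""
    for label, pattern in RULES:
        if pattern.search(message):
            return label
    return "clinical_query"
```

With these rules, "I have chest pain" is flagged as an emergency, but "my chest feels tight and heavy" falls through to the default because no keyword matches. That is the nuanced-edge-case gap noted above.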
Improvement Output guardrails
Output guardrails are currently limited to citation validation and qualified response tiers. A dedicated output guardrail pass would better catch hallucinated recommendations and unsupported clinical claims.
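As a rough illustration of what citation validation can mean, the sketch below checks that every citation in an answer points at a chunk that was actually retrieved. The bracketed `[chunk_id]` citation format and the function name are assumptions, and the project's real check may work differently.

```python
import re

# Assumed citation format: a chunk ID in square brackets, e.g. [c12].
CITATION_PATTERN = re.compile(r"\[([A-Za-z0-9_.-]+)\]")


def unsupported_citations(answer: str, retrieved_ids: set[str]) -> list[str]:
    """Return cited IDs that do not appear among the retrieved chunks."""
    cited = CITATION_PATTERN.findall(answer)
    return sorted({c for c in cited if c not in retrieved_ids})
```

A check like this only confirms that a cited chunk exists. It says nothing about whether the chunk supports the claim attached to it, which is the gap a dedicated output guardrail pass would target.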
Guideline Document Control
Improvement Source & version tracking
Track document source, version history, and update status to ensure the system always reflects the latest published guidelines.
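One possible shape for this tracking is a small version record per guideline plus a staleness check, sketched below. The record fields and function names are hypothetical, and how new versions are discovered is left out.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class GuidelineVersion:
    source: str      # publisher or origin of the PDF (hypothetical field)
    version: str     # version label as published
    published: date  # publication date of this version


def latest(versions: list[GuidelineVersion]) -> GuidelineVersion:
    """Pick the most recently published version."""
    return max(versions, key=lambda v: v.published)


def needs_reindex(indexed: GuidelineVersion, versions: list[GuidelineVersion]) -> bool:
    """True when the indexed version is no longer the latest known one."""
    return latest(versions) != indexed
```

Storing the same record in each chunk's metadata would also let a response state which guideline version it was drawn from.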