Clinical Workflow Assistant
A critical care specialist needed a clinical workflow tool, not a general-purpose chatbot. Built solo as the lead developer across ten months and 31 sprints, and shipped to both app stores.
Clinical workflows are documentation-heavy. A critical care specialist receives DEXA scans, pulmonary function tests, renal function tests, and patient records — each requiring structured summarisation and risk scoring before any clinical decision can be made. The time cost is significant.
ChatGPT became a workaround. It could summarise a document and extract some structure — until the limitations became the workflow. It has no memory of the patient. It requires repetitive prompting for the same tasks. It fails in low-connectivity environments. And it has no concept of a clinical encounter — the fundamental unit of doctor-patient interaction. Generic AI chat was not built around clinical workflows. Every session started from scratch.
The product began as a focused solution: structured document summarisation and automated risk scoring. Through real use, it evolved into a full clinical decision assistant — conversational AI with patient-scoped memory, encounter management, and multi-format note generation across iOS, Android, and a provider web portal.
The defining constraint throughout was trust. In a clinical workflow, an incorrect value is not a minor UX issue. Every architecture, data modelling, and orchestration decision was made with that constraint first.
02 / Decisions
Migrating from serverless to EC2
The initial architecture was AWS Lambda — stateless, cheap, appropriate for the summarisation phase. Chat changed the equation. WebSockets require persistent, stateful connections. Lambda does not support this reliably. The move to Fargate solved statefulness but introduced disproportionate cost at our scale. A single EC2 instance gave equivalent capability with more control and lower overhead. Two infrastructure migrations, both driven by what the product required at each stage.
Migrating from custom tool infrastructure to LangChain and LangGraph
The first chat implementation was built entirely on the raw GenAI SDK — including a custom tool registry and tool-calling infrastructure written from scratch. It worked, but as clinical flows became more complex, the maintenance burden compounded. Retry logic, state management, and error handling all had to be manually coordinated across an expanding codebase.
LangGraph provided graph-based flow management with built-in retries, checkpointing, and idempotency. The migration replaced the custom infrastructure with framework-level primitives. What followed — multi-step encounter generation pipelines, conditional flow branching, two-pass AI orchestration — would not have been maintainable in the original approach.
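The shape of such a pipeline can be sketched without the framework itself. Below is a minimal TypeScript model of what LangGraph provides as primitives: typed state flowing through nodes, per-node retries, a checkpoint after each successful step, and a conditional edge. All names here (the state fields, the two passes) are illustrative, not the production pipeline.

```typescript
// Minimal sketch of a graph-style pipeline: typed state, nodes with
// built-in retry, a checkpoint after each step, and a conditional edge.
// LangGraph supplies these as framework primitives; this only models the shape.

type EncounterState = {
  transcript?: string;
  documents: string[];
  draftNote?: string;
  finalNote?: string;
};

type GraphNode = (s: EncounterState) => Promise<EncounterState>;

const checkpoints: EncounterState[] = [];

async function runNode(
  node: GraphNode,
  state: EncounterState,
  maxRetries = 2,
): Promise<EncounterState> {
  for (let attempt = 0; ; attempt++) {
    try {
      const next = await node(state);
      checkpoints.push(structuredClone(next)); // checkpoint each successful step
      return next;
    } catch (err) {
      if (attempt >= maxRetries) throw err; // retries exhausted
    }
  }
}

// Two-pass orchestration: a draft pass, then a refinement pass.
const draftPass: GraphNode = async (s) => ({
  ...s,
  draftNote: `draft from ${s.documents.length} docs`,
});
const refinePass: GraphNode = async (s) => ({
  ...s,
  finalNote: `${s.draftNote} (refined)`,
});

async function runPipeline(initial: EncounterState): Promise<EncounterState> {
  let state = await runNode(draftPass, initial);
  // Conditional edge: only refine when a draft was produced.
  if (state.draftNote) state = await runNode(refinePass, state);
  return state;
}
```

Hand-coordinating the retry loop and checkpoint list above across every flow is exactly the maintenance burden the migration removed.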
Designing the Encounters feature
Most ambient scribe tools are web-based. A mobile-native encounter module filled an underserved gap in the clinical workflow space. The client needed something that fit how clinicians actually work — interrupted, context-switching, time-pressured.
My first framing was a sequential wizard: select patient → record → generate notes. Logically coherent, wrong for clinical reality. The reframe: everything optional, everything additive.
The final structure has three screens. An encounter screen where patient selection, document attachment, and recording are all independent — a doctor in a rush can skip straight to recording. A recording screen where audio can be recorded live or uploaded, documents can be attached as context, and generation can happen from documents alone. A review screen where additional inputs can be added and notes regenerated across multiple templates — SOAP, DAP, H&P, and others — as many times as needed.
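The "everything optional, everything additive" model can be expressed as a draft type where every input is independent and generation needs only one source. This is a sketch under assumed names; the field names and template list are illustrative, not the production schema.

```typescript
// Sketch of the "everything optional, everything additive" encounter model.
// Every input is independent; generation needs at least one source.

type NoteTemplate = "SOAP" | "DAP" | "H&P";

interface EncounterDraft {
  patientId?: string;    // optional: a rushed doctor can skip selection
  audioUri?: string;     // recorded live or uploaded
  documentIds: string[]; // attached as context, possibly empty
}

// Generation is possible from documents alone, audio alone, or both.
function canGenerate(draft: EncounterDraft): boolean {
  return Boolean(draft.audioUri) || draft.documentIds.length > 0;
}

// Additive: new inputs merge into the existing draft rather than replacing it.
function addInputs(draft: EncounterDraft, extra: Partial<EncounterDraft>): EncounterDraft {
  return {
    ...draft,
    ...extra,
    documentIds: [...draft.documentIds, ...(extra.documentIds ?? [])],
  };
}

// Regeneration never mutates the draft, so it can run as many times as needed.
function regenerate(draft: EncounterDraft, template: NoteTemplate): string {
  return `${template} note from ${draft.documentIds.length} documents`;
}
```

The key property is that no function requires the wizard's original ordering: any input can arrive at any time, and regeneration is always available once a single source exists.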
The client approved it without revision. Automatic patient detection and document attachment support were added in the two weeks following initial delivery.
Moving correction handling out of the model
The patient memory system needed to handle corrections — when a clinician updates information that contradicts what was previously recorded. Every prompt-level technique failed identically: flat prohibitions, XML sectioning, synonym banlists, scratchpad reasoning, few-shot examples, and pipeline splitting. The model kept treating corrections as clinical timeline events rather than resolving them.
The fix was to stop fighting the model weights. Correction logic moved entirely into deterministic application-layer code via LangGraph node interception — corrections are stripped before the model ever sees them, resolved directly in the application, and applied as targeted field patches. The principle generalises: prompt engineering has hard limits for deeply trained behaviours. When every surface-level technique fails identically, the fix is architectural, not instructional.
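The deterministic path can be sketched in a few lines: detect correction utterances, remove them from the text the model will see, and apply them directly as field patches. The detection convention, field names, and record shape below are illustrative assumptions, not the production implementation.

```typescript
// Sketch of intercepting corrections before the model sees them.
// Illustrative convention: a correction arrives as "correct: field=value".

interface PatientRecord {
  weightKg?: number;
  allergies: string[];
}

const CORRECTION = /^correct:\s*(\w+)=(.+)$/;

function interceptCorrections(
  lines: string[],
  record: PatientRecord,
): { modelInput: string[]; record: PatientRecord } {
  const patched = { ...record, allergies: [...record.allergies] };
  const modelInput: string[] = [];
  for (const line of lines) {
    const m = line.match(CORRECTION);
    if (!m) {
      modelInput.push(line); // non-corrections pass through to the prompt
      continue;
    }
    const [, field, value] = m;
    // Resolve the correction in application code as a targeted field patch.
    // The model never sees it, so it cannot mis-read it as a timeline event.
    if (field === "weightKg") patched.weightKg = Number(value);
    else if (field === "allergies") patched.allergies = value.split(",").map((s) => s.trim());
  }
  return { modelInput, record: patched };
}
```

Because the patch is applied in code, the outcome is deterministic regardless of how the model would have interpreted the same sentence.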
Patient memory in chat
Previously, patient-scoped chat only carried demographic context. The model knew who the patient was, not what their case was.
Patient memory restructured this by maintaining a clinical context object per patient, synthesised entirely from encounter documents, audio transcripts, and facts provided directly in the chat. The model now enters each session with grounded clinical knowledge, which also populates the patient profile to give clinicians an instant clinical snapshot.
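One way to picture the per-patient context object is as a synthesis over heterogeneous sources: conditions extracted from documents and transcripts, plus facts stated directly in chat. The shape and merge rules below are illustrative only; the production system rebuilds this object through its own pipeline.

```typescript
// Sketch of a per-patient clinical context synthesised from encounter
// documents, audio transcripts, and chat-provided facts.

interface ClinicalContext {
  patientId: string;
  conditions: string[];
  facts: Record<string, string>; // e.g. { smoker: "former" }
  lastUpdated: string;
}

type ContextSource =
  | { kind: "document" | "transcript"; conditions: string[] }
  | { kind: "chatFact"; key: string; value: string };

function synthesiseContext(patientId: string, sources: ContextSource[]): ClinicalContext {
  const ctx: ClinicalContext = {
    patientId,
    conditions: [],
    facts: {},
    lastUpdated: new Date().toISOString(),
  };
  for (const src of sources) {
    if (src.kind === "chatFact") {
      ctx.facts[src.key] = src.value; // later facts overwrite earlier ones
    } else {
      for (const c of src.conditions) {
        if (!ctx.conditions.includes(c)) ctx.conditions.push(c); // de-duplicate
      }
    }
  }
  return ctx;
}
```

The same object serves double duty: it grounds each chat session and backs the instant clinical snapshot on the patient profile.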
Aligning UX friction with action priority
A comprehensive UX audit aligned UI friction and contextual clarity directly with the stakes of each operation.
Destructive actions and errors now demand explicit modal confirmations or blocking acknowledgements, while low-stakes updates use non-intrusive toasts. Similarly, high-stakes operations feature immediate inline hint text, whereas low-stakes operations rely on unobtrusive info icons, ensuring critical information is impossible to ignore without cluttering the core workflow.
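The audit's outcome amounts to a single mapping from action severity to UI treatment, which could be centralised roughly as follows. The category names and mappings are illustrative, not the app's actual design tokens.

```typescript
// Sketch of aligning UI friction with action stakes: severity alone
// determines the confirmation, feedback, and hint treatment.

type Severity = "destructive" | "error" | "highStakes" | "lowStakes";

interface Treatment {
  confirmation: "blockingModal" | "none";
  feedback: "acknowledgement" | "toast";
  hint: "inline" | "infoIcon";
}

const TREATMENTS: Record<Severity, Treatment> = {
  destructive: { confirmation: "blockingModal", feedback: "acknowledgement", hint: "inline" },
  error:       { confirmation: "blockingModal", feedback: "acknowledgement", hint: "inline" },
  highStakes:  { confirmation: "none",          feedback: "toast",           hint: "inline" },
  lowStakes:   { confirmation: "none",          feedback: "toast",           hint: "infoIcon" },
};

function treatmentFor(severity: Severity): Treatment {
  return TREATMENTS[severity];
}
```

Centralising the mapping keeps the friction rules consistent across screens: a new feature picks a severity, and the treatment follows.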
The App Store and Play Store submission process required as much engineering discipline as the product itself. Apple rejected the initial submission for missing EULA links, a test account credential mismatch causing in-app errors, screenshot non-compliance, and the absence of Apple Sign-In — required whenever any third-party authentication exists. Google Play flagged incorrect permission declarations, a missing account deletion feature, and incorrect feature declarations. Each rejection required policy analysis, targeted fixes, and resubmission. RevenueCat subscription integration added further complexity to an already pressured timeline. The app went live on both stores in December 2025.
The most sustained challenge was requirement volatility. The product's scope expanded significantly through active clinical use — chat, encounters, patient memory — each addition surfacing new architectural decisions with downstream consequences. Database design choices made in the first phase shaped what was possible months later. Getting those decisions right under uncertainty, without the ability to fully anticipate what the product would become, was the consistent pressure throughout. Fifty-three database migrations over ten months trace that evolution directly.
04 / Outcome
Clinicians now have a workflow assistant that operates reliably where general-purpose AI tools do not, including in low-connectivity environments. The encounter module, delivered in two weeks without a brief, shipped without revision and has since been extended with automatic patient detection and document support.
The product is in active daily use across iOS and Android. The backend supports the full platform: 22 API route modules, 33 services, and three BullMQ worker pipelines for transcription, document extraction, and patient profile rebuilds. An independent assessment of the codebase concluded that a team of 5–7 engineers — backend, AI/ML, mobile, web, and DevOps — would typically be required to build and maintain a system of this scope.
Patient memory, expanded note templates, document support across encounter types, and a full-featured web app are live or in active development as of early 2026.