The request envelope
action— the quick-action hint, orfree-chatwhen the student just typed.body— the student’s message.context.conversation_id— the thread to continue. Omit it on the first message; the response returns one to reuse.context.page— what the student is looking at, used as a routing/answering hint.
The pipeline
Context card
Identity is read from the JWT into a context card (student, tenant, role, and any programme scope present in the claims). Academic signals (level, performance) come from a read-only profile store and are simply absent until that store is wired — never guessed.
Daily quota
A per-student daily counter. When a cap is configured and exceeded, the turn ends immediately with a
message event — no model call. Unlimited by default until a cap is set.Input guardrails
Run before any model call: an assessment lock (MIND pauses during an active assessment) and a light prompt-injection check. A block ends the turn with a
message event.Routing
A cheap, fast model classifies the message into a lane. An explicit quick action is trusted as-is (no model call);
free-chat is classified and may be upgraded — e.g. “How do I submit my assignment?” becomes submission-guide. The resolved lane is emitted as the meta event.Conversational graph
The message enters a stateful graph that loads the conversation history, composes the system prompt and context, and generates the reply. History is persisted per conversation (see Memory).
Memory and resumption
The conversation lives in a checkpoint keyed byconversation_id. Each turn appends the student’s message and MIND’s reply; the next request with the same conversation_id automatically loads the full history — no transcript needs to be replayed by the client.
This makes resumption a property of the application, not the transport. If the network drops mid-stream, the client simply re-issues the request with the same conversation_id and the thread continues. To start fresh, omit conversation_id and a new one is returned.
Only a recent window of the history is sent to the model each turn, so cost stays bounded even on long conversations. The full history is always retained in the checkpoint.
The core services
The pipeline is assembled from a small set of channel-agnostic services. Each one has a single implementation today and a clean seam to swap later, so the chat surface never changes when an internal provider does.| Service | Today | Designed to become |
|---|---|---|
| Gateway | Google Gemini, with separate router and reasoning tiers | A self-hosted LiteLLM gateway |
| Prompts | Managed in Langfuse, with an in-repo fallback | — |
| Context | Identity from the JWT | + academic signals from the profile store |
| Guardrails | Assessment lock, injection, PII | + LLM-judged answer-leak, full PII |
| Quota | Daily per-student counter | Shared/Redis-backed limits |
| Retrieval | Single payload-partitioned collection for course materials | Grounded answers as a tool |