Architecture - MIND API

Every message runs through the same pipeline. The early stages are guards or routing decisions that run before any expensive work happens, which keeps MIND predictable and cheap.

The request envelope

{
  "action": "free-chat",
  "body": "How do I submit my assignment?",
  "context": {
    "conversation_id": "1f0c…",
    "page": { "title": "Assignment 2", "course_code": "CSC201" }
  }
}

action — the quick-action hint, or free-chat when the student just typed.
body — the student’s message.
context.conversation_id — the thread to continue. Omit it on the first message; the response returns one to reuse.
context.page — what the student is looking at, used as a routing/answering hint.

Identity is taken only from the verified JWT — tenant, student, and role. Any user object in the request body is ignored. The client cannot assert who it is.
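As a rough illustration of the envelope rules above, the sketch below builds a first-message and a follow-up envelope, and shows one way a server could discard fields it does not trust. The helper names `build_envelope` and `strip_untrusted` are hypothetical and are not part of the MIND API.

```python
from typing import Any, Optional

# Fields shown in the envelope above; anything else (such as `user`) is dropped.
ENVELOPE_FIELDS = ("action", "body", "context")


def build_envelope(
    body: str,
    action: str = "free-chat",
    conversation_id: Optional[str] = None,
    page: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Client side: omit conversation_id on the first message."""
    context: dict[str, Any] = {}
    if conversation_id is not None:
        context["conversation_id"] = conversation_id
    if page is not None:
        context["page"] = page
    return {"action": action, "body": body, "context": context}


def strip_untrusted(raw: dict[str, Any]) -> dict[str, Any]:
    """Server side (hypothetical): keep only the documented envelope fields."""
    return {key: value for key, value in raw.items() if key in ENVELOPE_FIELDS}
```

A client would call `build_envelope(...)` without `conversation_id` on the first turn, then pass the returned id on every later turn.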

The pipeline

Context card

Identity is read from the JWT into a context card (student, tenant, role, and any programme scope present in the claims). Academic signals (level, performance) come from a read-only profile store and are simply absent until that store is wired — never guessed.
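A minimal sketch of a context card, assuming the JWT has already been verified upstream. The claim names (`sub`, `tenant_id`, `role`, `programme`) and the profile keys are assumptions made for illustration; the point is that academic signals stay `None` when no profile is available rather than being filled with a guess.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class ContextCard:
    student: str
    tenant: str
    role: str
    programme: Optional[str] = None    # only set when present in the claims
    level: Optional[str] = None        # academic signal, absent until a profile exists
    performance: Optional[str] = None  # academic signal, absent until a profile exists


def build_context_card(
    claims: Mapping[str, Any],
    profile: Optional[Mapping[str, Any]] = None,
) -> ContextCard:
    """Build the card from verified claims plus an optional read-only profile."""
    profile = profile or {}
    return ContextCard(
        student=claims["sub"],          # hypothetical claim name
        tenant=claims["tenant_id"],     # hypothetical claim name
        role=claims["role"],
        programme=claims.get("programme"),
        level=profile.get("level"),
        performance=profile.get("performance"),
    )
```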

Daily quota

A per-student daily counter. When a cap is configured and exceeded, the turn ends immediately with a message event and no model call is made. Usage is unlimited until a cap is set.
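The quota check can be sketched as an in-memory counter keyed by student and day. This is an illustration only: the class name `DailyQuota` is hypothetical, and the boundary behaviour (exactly `cap` turns allowed per day, the next one refused) is an assumption.

```python
from collections import defaultdict
from datetime import date
from typing import Optional


class DailyQuota:
    """In-memory per-student daily counter; unlimited when cap is None."""

    def __init__(self, cap: Optional[int] = None) -> None:
        self.cap = cap
        self._counts: dict[tuple[str, date], int] = defaultdict(int)

    def allow(self, student_id: str, today: Optional[date] = None) -> bool:
        """Return True and count the turn, or False when the cap is used up."""
        if self.cap is None:
            return True  # no cap configured: never block
        key = (student_id, today or date.today())
        if self._counts[key] >= self.cap:
            return False  # caller ends the turn with a message event, no model call
        self._counts[key] += 1
        return True
```

Because the key includes the date, counters reset naturally when the day changes; a shared store would be needed once more than one process serves requests.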

Input guardrails

Run before any model call: an assessment lock (MIND pauses during an active assessment) and a light prompt-injection check. A block ends the turn with a message event.
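The two input checks could look roughly like the sketch below. The regular expressions and the block messages are hypothetical examples chosen for illustration, not MIND's actual rules; how an active assessment is detected is also left to the caller.

```python
import re
from typing import Optional

# Hypothetical patterns for a light prompt-injection check.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your |the )?system prompt", re.IGNORECASE),
]


def check_input(body: str, assessment_active: bool) -> Optional[str]:
    """Return a block message, or None when the turn may proceed to routing."""
    if assessment_active:
        # Assessment lock: checked first, independent of the message content.
        return "MIND is paused while an assessment is in progress."
    if any(pattern.search(body) for pattern in INJECTION_PATTERNS):
        return "That message can't be processed."
    return None
```

A non-`None` result would be sent as the message event and the turn would stop there.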

Routing

A cheap, fast model classifies the message into a lane. An explicit quick action is trusted as-is (no model call); free-chat is classified and may be upgraded — e.g. “How do I submit my assignment?” becomes submission-guide. The resolved lane is emitted as the meta event.
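The routing rule can be sketched with the classifier passed in as a plain callable, so the explicit-action path visibly makes no model call. The `keyword_classifier` stand-in and the shape of the meta event are assumptions for illustration; in MIND the classification is done by a model.

```python
from typing import Callable


def keyword_classifier(body: str) -> str:
    """Stand-in for the router model (hypothetical keyword rule)."""
    return "submission-guide" if "submit" in body.lower() else "free-chat"


def resolve_lane(action: str, body: str, classify: Callable[[str], str]) -> dict[str, str]:
    """Trust an explicit quick action; classify only free-chat messages."""
    if action != "free-chat":
        lane = action          # explicit quick action: trusted as-is, no model call
    else:
        lane = classify(body)  # free-chat: may be upgraded to a specific lane
    return {"event": "meta", "lane": lane}
```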

Conversational graph

The message enters a stateful graph that loads the conversation history, composes the system prompt and context, and generates the reply. History is persisted per conversation (see Memory).

Streaming

Tokens stream out as delta events the moment the model produces them.

Output guardrails

The complete answer is checked once streaming finishes (PII redaction today; an answer-leak judge is planned). Any correction is surfaced on the done event as revised.
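To make the event order concrete, here is a sketch of a turn that emits delta events as tokens arrive and then a single done event. The event dictionaries and the email-only redaction rule are assumptions for illustration; real PII handling would cover far more than one pattern.

```python
import re
from typing import Any, Iterable, Iterator

# Hypothetical, deliberately narrow PII rule: email addresses only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact_pii(text: str) -> str:
    return EMAIL.sub("[redacted]", text)


def stream_turn(tokens: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield one delta per token, then a done event carrying any correction."""
    parts: list[str] = []
    for token in tokens:
        parts.append(token)
        yield {"event": "delta", "text": token}  # sent as soon as it is produced
    answer = "".join(parts)
    checked = redact_pii(answer)                 # output guardrail on the full answer
    done: dict[str, Any] = {"event": "done"}
    if checked != answer:
        done["revised"] = checked                # client replaces the streamed text
    yield done
```

Since deltas have already been delivered by the time the check runs, the correction can only arrive afterwards, which is why it rides on the done event.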

Memory and resumption

The conversation lives in a checkpoint keyed by conversation_id. Each turn appends the student’s message and MIND’s reply; the next request with the same conversation_id automatically loads the full history — no transcript needs to be replayed by the client. This makes resumption a property of the application, not the transport. If the network drops mid-stream, the client simply re-issues the request with the same conversation_id and the thread continues. To start fresh, omit conversation_id and a new one is returned.

Only a recent window of the history is sent to the model each turn, so cost stays bounded even on long conversations. The full history is always retained in the checkpoint.
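The checkpoint behaviour described above can be sketched as a store keyed by conversation_id that retains everything but exposes only a recent window to the model. The class `CheckpointStore`, its method names, and the window size are hypothetical; an in-memory dict stands in for whatever persistence MIND uses.

```python
import uuid
from typing import Optional


class CheckpointStore:
    """Keeps the full history per conversation; hands the model a recent window."""

    def __init__(self, window: int = 6) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        self.window = window
        self._threads: dict[str, list[dict[str, str]]] = {}

    def open(self, conversation_id: Optional[str] = None) -> str:
        """Return the id to continue with, creating a new thread when omitted."""
        cid = conversation_id or uuid.uuid4().hex
        self._threads.setdefault(cid, [])
        return cid

    def append_turn(self, cid: str, student_message: str, reply: str) -> None:
        self._threads[cid].append({"role": "student", "text": student_message})
        self._threads[cid].append({"role": "mind", "text": reply})

    def history(self, cid: str) -> list[dict[str, str]]:
        return list(self._threads[cid])          # full history, always retained

    def model_window(self, cid: str) -> list[dict[str, str]]:
        return self._threads[cid][-self.window:]  # bounded slice sent to the model
```

Re-issuing a request with the same id simply calls `open(cid)` again and finds the existing thread, so nothing has to be replayed by the client.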

The core services

The pipeline is assembled from a small set of channel-agnostic services. Each one has a single implementation today and a clean seam to swap later, so the chat surface never changes when an internal provider does.

Service	Today	Designed to become
Gateway	Google Gemini, with separate router and reasoning tiers	A self-hosted LiteLLM gateway
Prompts	Managed in Langfuse, with an in-repo fallback	—
Context	Identity from the JWT	+ academic signals from the profile store
Guardrails	Assessment lock, injection, PII	+ LLM-judged answer-leak, full PII
Quota	Daily per-student counter	Shared/Redis-backed limits
Retrieval	Single payload-partitioned collection for course materials	Grounded answers as a tool
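One common way to get the kind of seam described above is a structural interface that callers depend on, with the provider chosen at wiring time. The sketch below uses `typing.Protocol`; the `Gateway` interface, the `complete` method, and both stand-in providers are hypothetical and do not call any real model API.

```python
from typing import Protocol


class Gateway(Protocol):
    """Hypothetical seam: anything with this method can serve as the gateway."""

    def complete(self, tier: str, prompt: str) -> str: ...


class EchoGateway:
    """Stand-in provider used here instead of a real model call."""

    def complete(self, tier: str, prompt: str) -> str:
        return f"[{tier}] {prompt}"


class UppercaseGateway:
    """A second stand-in, to show that callers do not change when it is swapped in."""

    def complete(self, tier: str, prompt: str) -> str:
        return prompt.upper()


def generate_reply(gateway: Gateway, prompt: str) -> str:
    # The caller names a tier and never a concrete provider.
    return gateway.complete("reasoning", prompt)
```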

Prompts are versioned

System and router prompts are managed in Langfuse and resolved at request time, with an in-repo copy as a fallback if Langfuse is unavailable. Every prompt carries a version stamp, so a given reply can always be traced back to the exact prompt that produced it. Editing a prompt in Langfuse changes behaviour without a deploy — which is the primary way MIND’s tone and guardrails are tightened.
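The resolve-with-fallback behaviour can be sketched as follows. The remote lookup is injected as a callable so the example does not depend on any particular Langfuse SDK call; `resolve_prompt`, `ResolvedPrompt`, the fallback table, and the version labels are all hypothetical.

```python
from typing import Callable, NamedTuple


class ResolvedPrompt(NamedTuple):
    text: str
    version: str  # stamped on every prompt so a reply can be traced back to it
    source: str   # "langfuse" or "repo"


# Hypothetical in-repo copies used when the remote store is unavailable.
FALLBACK_PROMPTS = {
    "system": ResolvedPrompt("(in-repo system prompt)", "repo-fallback", "repo"),
    "router": ResolvedPrompt("(in-repo router prompt)", "repo-fallback", "repo"),
}


def resolve_prompt(
    name: str,
    fetch_remote: Callable[[str], tuple[str, str]],
) -> ResolvedPrompt:
    """Resolve at request time; fall back to the in-repo copy on any failure."""
    try:
        text, version = fetch_remote(name)  # expected to return (text, version)
        return ResolvedPrompt(text, version, "langfuse")
    except Exception:
        # Broad catch is deliberate in this sketch: any remote failure falls back.
        return FALLBACK_PROMPTS[name]
```

Logging the returned `version` alongside each reply is what makes the trace-back possible.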
