Knowledge Base - MIND API

Breaking changes in v2.2: GET /v1/console/knowledge-base/collections response shape.

The list endpoint now returns a paginated envelope with collection metadata, not a flat array of names. Clients that consumed the old {name}-only items will keep working (the name field is still there), but the envelope shape and the new required fields will trip strict-schema clients.

Old: APIResponse[list[{ name }]] — array under data.
New: PaginatedResponse[CollectionResponse] — data is still an array, but the items now carry collection_id, name, description, size_bytes, document_count, created_by, created_at, and the response itself adds total, page, page_size, total_pages.
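A client that has to tolerate both shapes during a rollout could normalize the response before using it. The sketch below assumes the body has already been parsed into a Python dict; the function name `normalize_collections` and the fallback values used when the pagination fields are absent are illustrative choices, not part of the API.

```python
from typing import Any


def normalize_collections(body: dict[str, Any]) -> dict[str, Any]:
    """Return a uniform view of a list-collections response (old or new shape)."""
    items = body.get("data") or []
    return {
        # `name` is present on every item in both shapes.
        "names": [item["name"] for item in items],
        "items": items,
        # Pagination fields exist only in the v2.2 envelope. When they are
        # missing, treat the response as a single page holding everything.
        "total": body.get("total", len(items)),
        "page": body.get("page", 1),
        "page_size": body.get("page_size", len(items)),
        "total_pages": body.get("total_pages", 1),
    }
```

Reading optional fields with `.get()` rather than validating against a closed schema is what lets the same code run against both versions.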

POST /v1/console/knowledge-base/collections now also returns the full metadata shape (additive; old fields preserved).

KBDocument responses across the board (list, get, delete, reingest, upload) gained an optional collection_id field. Additive, not breaking.

Lazy backfill: the first call to the list endpoint after deployment will create Mongo metadata rows for any Qdrant collection that existed before this change. Backfilled rows get created_by: "system" and a created_at of the backfill time (Qdrant doesn’t track real creation time). If you need accurate creation dates for known legacy collections, update the Mongo row by hand.

What changed in v2.1. The knowledge base used to be a per-tenant CRUD entity; each KB had its own Qdrant collection and a case study pointed at one KB. The model is simpler now:

A collection is the unit. Create as many as you need (case-studies, assessments, etc.). Documents live in a collection.
A document has a stable source_id (its document id). Other domains reference documents by id, not by KB.
The legacy KB CRUD endpoints (POST /v1/console/knowledge-bases, GET /…/{kb_id}, etc.) are gone. The base path is now /v1/console/knowledge-base (singular). Document upload moved under /collections/{name}/documents.
Embeddings use gemini-embedding-2. The v1 embedding space is incompatible, so existing documents must be re-ingested to regenerate their vectors.

If you’re migrating an environment that ran the legacy shape, see the Migration section at the bottom.

The knowledge base stores reference documents the AI agent retrieves during conversations. Documents are uploaded to a named collection, automatically processed (extraction → chunking → embedding → vector indexing), and made addressable by their source_id. Domains that need retrieval — case studies today, others later — attach documents by id. There is one Qdrant collection per named collection, not per case study. Retrieval scopes by source_id, so a single collection happily holds many overlapping document sets.

Collections

Create Collection

POST /v1/console/knowledge-base/collections

Idempotent — if the collection already exists, the call succeeds and returns its current metadata. Creating a collection prepares the underlying Qdrant collection with the configured embedding dimension and writes a metadata row that tracks created_by and created_at. Requires KNOWLEDGE_BASES.can_create permission.

Request body

name

string

required

Lowercase letters, digits, - or _; 3–64 characters; cannot start or end with separators. Examples: case-studies, assessments, training_v2.

description

string

Optional. Free-text description for humans. Max 500 characters.

Example request

curl -X POST https://staging-be.mind.miva.university/v1/console/knowledge-base/collections \
  -H "Authorization: Bearer <access_token>" \
  -H "Content-Type: application/json" \
  -d '{ "name": "assessments", "description": "Past-paper bank for grading prompts" }'

Response

{
  "success": true,
  "data": {
    "collection_id": "6650f7a8b9c0d1e2f3a4b5c6",
    "name": "assessments",
    "description": "Past-paper bank for grading prompts",
    "size_bytes": 0,
    "document_count": 0,
    "created_by": "6650a1b2c3d4e5f6a7b8c9d0",
    "created_at": "2026-05-18T10:00:00Z"
  },
  "message": "Collection ready"
}

List Collections

GET /v1/console/knowledge-base/collections

Paginated list of every Qdrant collection visible to this instance, with metadata and per-tenant stats. Requires KNOWLEDGE_BASES.can_view. Collections are global within the instance (one Qdrant index per name). The size_bytes and document_count columns are tenant-scoped — they only count documents the caller’s tenant uploaded into each collection. Two tenants viewing the same collection will see different counts. The first call to this endpoint after deployment lazily creates Mongo metadata rows for any Qdrant collection that existed before v2.2; see the breaking-changes note above for the implications.

Query parameters

search

string

Case-insensitive substring filter on collection name. Example: search=case matches case-studies and case-archive.

skip

integer

default:"0"

Records to skip.

limit

integer

default:"20"

Max records to return (1–100).

Example request

curl "https://staging-be.mind.miva.university/v1/console/knowledge-base/collections?search=case&limit=10" \
  -H "Authorization: Bearer <access_token>"

Example response

{
  "success": true,
  "data": [
    {
      "collection_id": "6650f7a8b9c0d1e2f3a4b5c6",
      "name": "case-studies",
      "description": "Clinical and business case PDFs",
      "size_bytes": 8472394,
      "document_count": 12,
      "created_by": "6650a1b2c3d4e5f6a7b8c9d0",
      "created_at": "2026-05-01T09:14:00Z"
    },
    {
      "collection_id": "6650f8b9c0d1e2f3a4b5c6d7",
      "name": "case-archive",
      "description": null,
      "size_bytes": 0,
      "document_count": 0,
      "created_by": "system",
      "created_at": "2026-05-18T10:00:00Z"
    }
  ],
  "total": 2,
  "page": 1,
  "page_size": 10,
  "total_pages": 1
}
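To walk every page, a client can advance skip by the number of items received until total is reached. The sketch below keeps the HTTP call out of scope: `fetch_page` is a hypothetical callable that takes (skip, limit) and returns the parsed response body shown above.

```python
from typing import Any, Callable, Iterator

Page = dict[str, Any]


def iter_collections(
    fetch_page: Callable[[int, int], Page], limit: int = 20
) -> Iterator[dict[str, Any]]:
    """Yield every collection by paging with skip/limit until `total` is reached."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    skip = 0
    while True:
        page = fetch_page(skip, limit)
        items = page.get("data") or []
        yield from items
        skip += len(items)
        # Stop on an empty page (defensive) or once all records were seen.
        if not items or skip >= page.get("total", 0):
            break
```

Stopping on an empty page as well as on total guards against looping forever if the collection set shrinks while the client is paging.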

Field reference

collection_id

string

Stable Mongo _id of the collection’s metadata row. Use this when other endpoints ask for collection_id.

size_bytes

integer

Sum of source file bytes (KBDocument.file_size) for documents in this collection that belong to the caller’s tenant. Returns 0 when the collection holds nothing for this tenant.

document_count

integer

Count of KBDocument rows in this collection that belong to the caller’s tenant. Returns 0 when the collection is empty for this tenant.

created_by

string

The user_id of the admin who created the collection. "system" for collections that pre-date v2.2 (backfilled on first list).

created_at

datetime

ISO 8601 timestamp. For backfilled (legacy) collections, this is the time of backfill — Qdrant doesn’t track real creation time.
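Because backfilled rows are marked with created_by: "system", a client can pick out the collections whose created_at reflects backfill time rather than real creation time. A small sketch, assuming `items` is the data array from a List Collections response; the function name is illustrative.

```python
from typing import Any


def backfilled_collection_names(items: list[dict[str, Any]]) -> list[str]:
    """Return names of collections whose metadata row was created by backfill."""
    return [item["name"] for item in items if item.get("created_by") == "system"]
```

Applied to the example response above, this returns only case-archive, the one entry whose creation date would need a manual correction.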

Errors

Status	Code	Condition
`401`	`UNAUTHORIZED`	Missing or invalid bearer token.
`403`	`FORBIDDEN`	Caller lacks `KNOWLEDGE_BASES.can_view`.
`400`	`VALIDATION_ERROR`	`skip`/`limit` out of bounds or `search` longer than 100 chars.

Delete Collection

DELETE /v1/console/knowledge-base/collections/{name}

Drops the Qdrant collection, soft-deletes every KBDocument recorded in it, and $pulls those document ids out of every case study’s document_ids. Requires KNOWLEDGE_BASES.can_delete. This is destructive. Vectors are gone after this call; the documents would have to be re-uploaded and re-ingested to restore retrieval.

Example response

{
  "success": true,
  "message": "Collection deleted; documents detached from any case studies"
}

Documents

A document is a file (PDF, DOCX, DOC, PPTX, PPT, TXT, or Markdown) stored in S3 and indexed in a single collection. Each upload creates one KBDocument row whose id doubles as the source_id carried in every Qdrant vector belonging to that document.

Upload Document

POST /v1/console/knowledge-base/collections/{name}/documents

Requires KNOWLEDGE_BASES.can_create. The collection must exist (create it first if needed). The upload stores the file in S3 and queues a Celery ingestion task; the response is returned immediately with status: "pending".

Path parameters

name

string

required

The target collection name.

Request body

Content-Type: multipart/form-data

file

required

The document file to upload.

Supported types: PDF, DOCX, DOC, PPTX, PPT, TXT, Markdown. Max size: 50 MiB.
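A client can reject obviously invalid files before uploading them. The sketch below checks the documented type list and size limit; the mapping from types to file extensions (for instance, treating Markdown as .md only) and the assumption that a file of exactly 50 MiB is accepted are guesses, and the server remains the authority.

```python
from pathlib import Path

MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # 50 MiB

# Assumed extensions for the documented types; the server may check the
# content type rather than the extension.
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".doc", ".pptx", ".ppt", ".txt", ".md"}


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_UPLOAD_BYTES}")
    return problems
```

Returning a list of problems instead of raising on the first one lets a UI report a wrong type and an oversized file together.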

Example request

curl -X POST https://staging-be.mind.miva.university/v1/console/knowledge-base/collections/case-studies/documents \
  -H "Authorization: Bearer <access_token>" \
  -F "file=@diabetes-guidelines.pdf"

Example response

{
  "success": true,
  "data": {
    "id": "6650f6a7b8c9d0e1f2a3b4c5",
    "collection": "case-studies",
    "collection_id": "6650f7a8b9c0d1e2f3a4b5c6",
    "filename": "diabetes-guidelines.pdf",
    "file_size": 2048576,
    "content_type": "application/pdf",
    "status": "pending",
    "error_message": null,
    "chunk_count": 0,
    "created_at": "2026-04-28T10:00:00Z"
  },
  "message": "Document uploaded and queued for ingestion"
}

Ingestion status

Status	Meaning
`pending`	Uploaded, awaiting the worker.
`processing`	Extraction, chunking, embedding in progress.
`completed`	Vectors indexed, document is searchable.
`failed`	Ingestion failed — check `error_message`.
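Since upload returns immediately with status pending, callers typically poll until the document reaches completed or failed. A minimal polling sketch follows; `get_document` is a hypothetical callable that returns the document object for an id (for example, the data field of a Get Document response), and the timeout and interval defaults are arbitrary.

```python
import time
from typing import Any, Callable

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_ingestion(
    get_document: Callable[[str], dict[str, Any]],
    document_id: str,
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, Any]:
    """Poll until the document is completed or failed, or raise TimeoutError."""
    deadline = clock() + timeout_s
    while True:
        document = get_document(document_id)
        if document["status"] in TERMINAL_STATUSES:
            return document
        if clock() >= deadline:
            raise TimeoutError(f"document {document_id} still {document['status']}")
        sleep(interval_s)
```

The caller should still inspect the returned status: a failed document comes back normally, with the reason in error_message.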

List Collection Documents

GET /v1/console/knowledge-base/collections/{name}/documents

Paginated. Requires KNOWLEDGE_BASES.can_view.

Path parameters

name

string

required

The collection name.

Query parameters

skip

integer

default:"0"

Records to skip.

limit

integer

default:"20"

Max records (1–100).

Example response

{
  "success": true,
  "data": [
    {
      "id": "6650f6a7b8c9d0e1f2a3b4c5",
      "collection": "case-studies",
      "collection_id": "6650f7a8b9c0d1e2f3a4b5c6",
      "filename": "diabetes-guidelines.pdf",
      "status": "completed",
      "chunk_count": 42,
      "created_at": "2026-04-28T10:00:00Z"
    }
  ],
  "total": 1,
  "page": 1,
  "page_size": 20,
  "total_pages": 1
}

Every item in data shares the same collection_id (the parent collection — same as the one returned by List Collections for this name). collection_id is null only in the rare case where Qdrant has the collection but the Mongo metadata row hasn’t been backfilled yet — see the breaking-changes note at the top.

Get Document

GET /v1/console/knowledge-base/documents/{document_id}

Returns one document with its full status. Requires KNOWLEDGE_BASES.can_view. The id alone identifies the document; you don’t need to know its collection.

Delete Document

DELETE /v1/console/knowledge-base/documents/{document_id}

Removes the document’s vectors from Qdrant, deletes the file from S3, soft-deletes the row, and $pulls the id out of every case study that referenced it. Requires KNOWLEDGE_BASES.can_delete.

Example response

{
  "success": true,
  "data": {
    "id": "6650f6a7b8c9d0e1f2a3b4c5",
    "collection": "case-studies",
    "collection_id": "6650f7a8b9c0d1e2f3a4b5c6",
    "filename": "diabetes-guidelines.pdf",
    "file_size": 2048576,
    "content_type": "application/pdf",
    "status": "completed",
    "error_message": null,
    "chunk_count": 42,
    "created_at": "2026-04-28T10:00:00Z"
  },
  "message": "Document deleted; detached from any case studies that referenced it"
}

Re-ingest Document

POST /v1/console/knowledge-base/documents/{document_id}/reingest

Drops the existing vectors for this document, resets status to pending, and re-queues the ingestion task. Use after the source file has been replaced in-place, or after switching embedding models. Requires KNOWLEDGE_BASES.can_edit.

How retrieval works

Each ingested chunk is stored with a payload that includes:

{
  "text": "...",
  "headings": ["..."],
  "page": 4,
  "source_id": "6650f6a7b8c9d0e1f2a3b4c5",
  "filename": "diabetes-guidelines.pdf",
  "chunk_index": 12,
  "tenant_id": "..."
}

At session start, the agent receives a list of document_ids from the case study (or whichever domain owns retrieval). On every user turn, the agent embeds the question with gemini-embedding-2 and queries the relevant collection with payload.source_id IN [document_ids]. An empty list means no retrieval at all; there is no fallback to searching the whole collection.

As a result, one collection can safely hold documents for many case studies. Cross-pollination only happens when an admin explicitly attaches a document to multiple case studies.
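The scoping rule can be sketched as a function that turns document_ids into a search filter. The dict below uses the shape of a Qdrant REST filter with a match-any condition, which is one plausible way to express the IN clause; the actual query the agent issues is not shown in this document, and the function name is hypothetical.

```python
from typing import Any, Optional


def build_source_filter(document_ids: list[str]) -> Optional[dict[str, Any]]:
    """Build a filter restricting search to the given documents.

    Returns None for an empty list, signalling that retrieval should be
    skipped entirely instead of searching the whole collection.
    """
    if not document_ids:
        return None
    return {
        "must": [
            # Keep only chunks whose payload source_id is one of the ids.
            {"key": "source_id", "match": {"any": list(document_ids)}}
        ]
    }
```

Returning None instead of an empty filter matters: an empty filter would match every chunk in the collection, which is exactly the fallback the design rules out.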

Settings

Variable	Default	Purpose
`KB_COLLECTION_NAME`	`case-studies`	Default collection for case-study uploads.
`KB_EMBEDDING_MODEL`	`gemini-embedding-2`	Embedding model used for ingestion + queries.
`KB_EMBEDDING_DIMENSION`	`1536`	Output dimension for new collections. Don’t change without a re-ingest plan.
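The sketch below shows one way these settings could be read with the defaults from the table. It assumes the variables are environment variables; the `KBSettings` class and `load_kb_settings` function are illustrative names, not the service's actual configuration code.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class KBSettings:
    collection_name: str = "case-studies"
    embedding_model: str = "gemini-embedding-2"
    embedding_dimension: int = 1536


def load_kb_settings(env: Mapping[str, str] = os.environ) -> KBSettings:
    """Read KB_* variables from `env`, falling back to the documented defaults."""
    defaults = KBSettings()
    return KBSettings(
        collection_name=env.get("KB_COLLECTION_NAME", defaults.collection_name),
        embedding_model=env.get("KB_EMBEDDING_MODEL", defaults.embedding_model),
        # Environment values are strings, so the dimension is parsed as an int.
        embedding_dimension=int(
            env.get("KB_EMBEDDING_DIMENSION", str(defaults.embedding_dimension))
        ),
    )
```

Passing a plain dict as `env` makes the loader easy to exercise in tests without touching the process environment.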
