Breaking changes in v2.2: GET /v1/console/knowledge-base/collections response shape. The list endpoint now returns a paginated envelope with collection metadata, not a flat array of names. Clients that read only the name field of each item will keep working (the field is still there), but strict-schema clients will trip on the new envelope fields and the new required item fields.
  • Old: APIResponse[list[{ name }]] — array under data.
  • New: PaginatedResponse[CollectionResponse]. data is still an array, but each item now carries collection_id, name, description, size_bytes, document_count, created_by, and created_at, and the response itself adds total, page, page_size, and total_pages.
POST /v1/console/knowledge-base/collections now also returns the full metadata shape (additive; old fields preserved).
KBDocument responses across the board (list, get, delete, reingest, upload) gained an optional collection_id field. Additive, not breaking.
Lazy backfill: the first call to the list endpoint after deployment will create Mongo metadata rows for any Qdrant collection that existed before this change. Backfilled rows get created_by: "system" and a created_at of the backfill time (Qdrant doesn’t track real creation time). If you need accurate creation dates for known legacy collections, update the Mongo row by hand.
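For clients that need to run against both the old and the new list response during a rollout, a tolerant parser is one option. The sketch below is illustrative only: the Collection and CollectionPage classes and the parse_collection_list function are hypothetical names, not part of the API, and the fallback values for a missing envelope are an assumption.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Collection:
    # Only name existed before v2.2, so every other field is optional here.
    name: str
    collection_id: Optional[str] = None
    description: Optional[str] = None
    size_bytes: int = 0
    document_count: int = 0
    created_by: Optional[str] = None
    created_at: Optional[str] = None


@dataclass
class CollectionPage:
    items: list[Collection]
    total: int
    page: int
    page_size: int
    total_pages: int


def parse_collection_list(body: dict[str, Any]) -> CollectionPage:
    """Parse either the pre-v2.2 or the v2.2 list response into one shape."""
    known = set(Collection.__dataclass_fields__)
    items = [
        # Drop fields this client does not know, so later additive changes
        # to the item shape do not break parsing.
        Collection(**{k: v for k, v in raw.items() if k in known})
        for raw in body.get("data", [])
    ]
    return CollectionPage(
        items=items,
        # Assumption: when the envelope fields are absent (old shape), treat
        # the response as a single page that holds everything.
        total=body.get("total", len(items)),
        page=body.get("page", 1),
        page_size=body.get("page_size", len(items)),
        total_pages=body.get("total_pages", 1),
    )
```

Ignoring unknown keys rather than rejecting them is the design choice that keeps a client from breaking on additive changes like the ones described above.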
What changed in v2.1. The knowledge base used to be a per-tenant CRUD entity; each KB had its own Qdrant collection and a case study pointed at one KB. The model is simpler now:
  • A collection is the unit. Create as many as you need (case-studies, assessments, etc.). Documents live in a collection.
  • A document has a stable source_id (its document id). Other domains reference documents by id, not by KB.
  • The legacy KB CRUD endpoints (POST /v1/console/knowledge-bases, GET /…/{kb_id}, etc.) are gone. The base path is now /v1/console/knowledge-base (singular). Document upload moved under /collections/{name}/documents.
  • Embeddings use gemini-embedding-2. The v1 embedding space is incompatible — existing vectors must be re-ingested.
If you’re migrating an environment that ran the legacy shape, see the Migration section at the bottom.
The knowledge base stores reference documents the AI agent retrieves during conversations. Documents are uploaded to a named collection, automatically processed (extraction → chunking → embedding → vector indexing), and made addressable by their source_id. Domains that need retrieval — case studies today, others later — attach documents by id. There is one Qdrant collection per named collection, not per case study. Retrieval scopes by source_id, so a single collection happily holds many overlapping document sets.

Collections

Create Collection

POST /v1/console/knowledge-base/collections
Idempotent — if the collection already exists, the call succeeds and returns its current metadata. Creating a collection prepares the underlying Qdrant collection with the configured embedding dimension and writes a metadata row that tracks created_by and created_at. Requires KNOWLEDGE_BASES.can_create permission.

Request body

name
string
required
Lowercase letters, digits, - or _; 3–64 characters; cannot start or end with separators. Examples: case-studies, assessments, training_v2.
description
string
Optional. Free-text description for humans. Max 500 characters.
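The naming rule can be checked on the client before sending the request. This is a minimal sketch of the rule as stated above, not the server's validator; NAME_RE and validate_collection_payload are hypothetical names, and the server may apply further checks.

```python
import re
from typing import Optional

# First and last characters must be a lowercase letter or digit; the middle
# may also contain "-" or "_". Total length is 3 to 64 characters.
NAME_RE = re.compile(r"[a-z0-9][a-z0-9_-]{1,62}[a-z0-9]")

MAX_DESCRIPTION_LENGTH = 500


def validate_collection_payload(name: str, description: Optional[str] = None) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []
    if not NAME_RE.fullmatch(name):
        problems.append(
            "name must be 3-64 characters of lowercase letters, digits, "
            "'-' or '_', and must not start or end with a separator"
        )
    if description is not None and len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append("description must be at most 500 characters")
    return problems
```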

Example request

curl -X POST https://staging-be.mind.miva.university/v1/console/knowledge-base/collections \
  -H "Authorization: Bearer <access_token>" \
  -H "Content-Type: application/json" \
  -d '{ "name": "assessments", "description": "Past-paper bank for grading prompts" }'

Response

{
  "success": true,
  "data": {
    "collection_id": "6650f7a8b9c0d1e2f3a4b5c6",
    "name": "assessments",
    "description": "Past-paper bank for grading prompts",
    "size_bytes": 0,
    "document_count": 0,
    "created_by": "6650a1b2c3d4e5f6a7b8c9d0",
    "created_at": "2026-05-18T10:00:00Z"
  },
  "message": "Collection ready"
}

List Collections

GET /v1/console/knowledge-base/collections
Paginated list of every Qdrant collection visible to this instance, with metadata and per-tenant stats. Requires KNOWLEDGE_BASES.can_view.
Collections are global within the instance (one Qdrant index per name). The size_bytes and document_count fields are tenant-scoped: they count only the documents the caller’s tenant uploaded into each collection, so two tenants viewing the same collection will see different counts.
The first call to this endpoint after deployment lazily creates Mongo metadata rows for any Qdrant collection that existed before v2.2; see the breaking-changes note above for the implications.

Query parameters

search
string
Case-insensitive substring filter on collection name. Max 100 characters. Example: search=case matches case-studies and case-archive.
skip
integer
default:"0"
Records to skip.
limit
integer
default:"20"
Max records to return (1–100).

Example request

curl "https://staging-be.mind.miva.university/v1/console/knowledge-base/collections?search=case&limit=10" \
  -H "Authorization: Bearer <access_token>"

Example response

{
  "success": true,
  "data": [
    {
      "collection_id": "6650f7a8b9c0d1e2f3a4b5c6",
      "name": "case-studies",
      "description": "Clinical and business case PDFs",
      "size_bytes": 8472394,
      "document_count": 12,
      "created_by": "6650a1b2c3d4e5f6a7b8c9d0",
      "created_at": "2026-05-01T09:14:00Z"
    },
    {
      "collection_id": "6650f8b9c0d1e2f3a4b5c6d7",
      "name": "case-archive",
      "description": null,
      "size_bytes": 0,
      "document_count": 0,
      "created_by": "system",
      "created_at": "2026-05-18T10:00:00Z"
    }
  ],
  "total": 2,
  "page": 1,
  "page_size": 10,
  "total_pages": 1
}
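To walk every page, a client can advance skip by the number of items received until total is reached. The sketch below assumes that total counts all records matching the query; iter_collections and the fetch_page callable are hypothetical, with fetch_page standing in for whatever HTTP client issues the GET with the given skip and limit and returns the decoded JSON body.

```python
from typing import Any, Callable, Iterator


def iter_collections(
    fetch_page: Callable[[int, int], dict[str, Any]],
    limit: int = 20,
) -> Iterator[dict[str, Any]]:
    """Yield every collection, requesting one page at a time."""
    # Because this is a generator, the check runs on first iteration.
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    skip = 0
    while True:
        body = fetch_page(skip, limit)
        items = body.get("data", [])
        yield from items
        skip += len(items)
        # Stop on an empty page as well, so a shrinking result set cannot
        # cause an endless loop.
        if not items or skip >= body.get("total", 0):
            return
```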

Field reference

collection_id
string
Stable Mongo _id of the collection’s metadata row. Use this when other endpoints ask for collection_id.
size_bytes
integer
Sum of source file bytes (KBDocument.file_size) for documents in this collection that belong to the caller’s tenant. Returns 0 when the collection holds nothing for this tenant.
document_count
integer
Count of KBDocument rows in this collection that belong to the caller’s tenant. Returns 0 when the collection is empty for this tenant.
created_by
string
The user_id of the admin who created the collection. "system" for collections that pre-date v2.2 (backfilled on first list).
created_at
datetime
ISO 8601 timestamp. For backfilled (legacy) collections, this is the time of backfill — Qdrant doesn’t track real creation time.

Errors

Status | Code | Condition
401 | UNAUTHORIZED | Missing or invalid bearer token.
403 | FORBIDDEN | Caller lacks KNOWLEDGE_BASES.can_view.
400 | VALIDATION_ERROR | skip/limit out of bounds or search longer than 100 chars.
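A client might map these statuses to typed exceptions so callers can handle each case separately. The sketch below relies only on the HTTP status and the codes listed above. The exception classes, raise_for_status, and the "UNKNOWN" placeholder code are all hypothetical, and the shape of the error response body is not documented here, so it is not parsed.

```python
class KnowledgeBaseError(Exception):
    """Base class for errors returned by the list endpoint."""

    def __init__(self, status: int, code: str, detail: str = "") -> None:
        self.status = status
        self.code = code
        message = f"{status} {code}"
        if detail:
            message += f": {detail}"
        super().__init__(message)


class Unauthorized(KnowledgeBaseError):
    """Missing or invalid bearer token."""


class Forbidden(KnowledgeBaseError):
    """Caller lacks the required permission."""


class ValidationFailed(KnowledgeBaseError):
    """skip/limit out of bounds, or search too long."""


_ERRORS = {
    401: ("UNAUTHORIZED", Unauthorized),
    403: ("FORBIDDEN", Forbidden),
    400: ("VALIDATION_ERROR", ValidationFailed),
}


def raise_for_status(status: int, detail: str = "") -> None:
    """Raise a typed exception for an error status; do nothing otherwise."""
    if status < 400:
        return
    # "UNKNOWN" is a placeholder for statuses the table does not list.
    code, exc_type = _ERRORS.get(status, ("UNKNOWN", KnowledgeBaseError))
    raise exc_type(status, code, detail)
```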

Delete Collection

DELETE /v1/console/knowledge-base/collections/{name}
Drops the Qdrant collection, soft-deletes every KBDocument recorded in it, and $pulls those document ids out of every case study’s document_ids. Requires KNOWLEDGE_BASES.can_delete. This is destructive. Vectors are gone after this call; the documents would have to be re-uploaded and re-ingested to restore retrieval.

Example response

{
  "success": true,
  "message": "Collection deleted; documents detached from any case studies"
}

Documents

A document is a file (PDF, DOCX, DOC, PPTX, PPT, TXT, or Markdown) stored in S3 and indexed in a single collection. Each upload creates one KBDocument row whose id doubles as the source_id carried in every Qdrant vector belonging to that document.

Upload Document

POST /v1/console/knowledge-base/collections/{name}/documents
Requires KNOWLEDGE_BASES.can_create. The collection must exist (create it first if needed). The upload stores the file in S3 and queues a Celery ingestion task; the response is returned immediately with status: "pending".

Path parameters

name
string
required
The target collection name.

Request body

Content-Type: multipart/form-data
file
file
required
The document file to upload.
Supported types: PDF, DOCX, DOC, PPTX, PPT, TXT, Markdown. Max size: 50 MiB.
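A client-side pre-check can reject obviously unsuitable files before the upload starts. The sketch below is an approximation under stated assumptions: it checks the file extension only (the server may inspect the content type instead), it takes .md as the Markdown extension, and it treats a file of exactly 50 MiB as allowed. check_upload and the two constants are hypothetical names.

```python
from pathlib import Path

MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # 50 MiB

# Assumption: these extensions correspond to the supported types listed above.
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".doc", ".pptx", ".ppt", ".txt", ".md"}


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file is larger than 50 MiB")
    return problems
```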

Example request

curl -X POST https://staging-be.mind.miva.university/v1/console/knowledge-base/collections/case-studies/documents \
  -H "Authorization: Bearer <access_token>" \
  -F "file=@diabetes-guidelines.pdf"

Example response

{
  "success": true,
  "data": {
    "id": "6650f6a7b8c9d0e1f2a3b4c5",
    "collection": "case-studies",
    "collection_id": "6650f7a8b9c0d1e2f3a4b5c6",
    "filename": "diabetes-guidelines.pdf",
    "file_size": 2048576,
    "content_type": "application/pdf",
    "status": "pending",
    "error_message": null,
    "chunk_count": 0,
    "created_at": "2026-04-28T10:00:00Z"
  },
  "message": "Document uploaded and queued for ingestion"
}

Ingestion status

Status | Meaning
pending | Uploaded, awaiting the worker.
processing | Extraction, chunking, embedding in progress.
completed | Vectors indexed, document is searchable.
failed | Ingestion failed; check error_message.
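Because upload returns before ingestion finishes, a client that needs the document to be searchable has to poll until the status is completed or failed. The sketch below shows one way to do that. wait_for_ingestion is a hypothetical helper, get_document stands in for a call to the Get Document endpoint that returns the decoded data object, and the interval and timeout defaults are arbitrary.

```python
import time
from typing import Any, Callable

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_ingestion(
    get_document: Callable[[str], dict[str, Any]],
    document_id: str,
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the document reaches a terminal status, then return it."""
    deadline = time.monotonic() + timeout
    while True:
        document = get_document(document_id)
        if document["status"] in TERMINAL_STATUSES:
            # The caller should check error_message when status is "failed".
            return document
        if time.monotonic() >= deadline:
            raise TimeoutError(f"document {document_id} is still {document['status']}")
        sleep(interval)
```

The sleep function is a parameter so that tests can replace it with a no-op.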

List Collection Documents

GET /v1/console/knowledge-base/collections/{name}/documents
Paginated. Requires KNOWLEDGE_BASES.can_view.

Path parameters

name
string
required
The collection name.

Query parameters

skip
integer
default:"0"
Records to skip.
limit
integer
default:"20"
Max records (1–100).

Example response

{
  "success": true,
  "data": [
    {
      "id": "6650f6a7b8c9d0e1f2a3b4c5",
      "collection": "case-studies",
      "collection_id": "6650f7a8b9c0d1e2f3a4b5c6",
      "filename": "diabetes-guidelines.pdf",
      "status": "completed",
      "chunk_count": 42,
      "created_at": "2026-04-28T10:00:00Z"
    }
  ],
  "total": 1,
  "page": 1,
  "page_size": 20,
  "total_pages": 1
}
Every item in data shares the same collection_id (the parent collection — same as the one returned by List Collections for this name). collection_id is null only in the rare case where Qdrant has the collection but the Mongo metadata row hasn’t been backfilled yet — see the breaking-changes note at the top.

Get Document

GET /v1/console/knowledge-base/documents/{document_id}
Returns one document with its full status. Requires KNOWLEDGE_BASES.can_view. The id alone identifies the document; you don’t need to know its collection.

Delete Document

DELETE /v1/console/knowledge-base/documents/{document_id}
Removes the document’s vectors from Qdrant, deletes the file from S3, soft-deletes the row, and $pulls the id out of every case study that referenced it. Requires KNOWLEDGE_BASES.can_delete.

Example response

{
  "success": true,
  "data": {
    "id": "6650f6a7b8c9d0e1f2a3b4c5",
    "collection": "case-studies",
    "collection_id": "6650f7a8b9c0d1e2f3a4b5c6",
    "filename": "diabetes-guidelines.pdf",
    "file_size": 2048576,
    "content_type": "application/pdf",
    "status": "completed",
    "error_message": null,
    "chunk_count": 42,
    "created_at": "2026-04-28T10:00:00Z"
  },
  "message": "Document deleted; detached from any case studies that referenced it"
}

Re-ingest Document

POST /v1/console/knowledge-base/documents/{document_id}/reingest
Drops the existing vectors for this document, resets status to pending, and re-queues the ingestion task. Use after the source file has been replaced in-place, or after switching embedding models. Requires KNOWLEDGE_BASES.can_edit.

How retrieval works

Each ingested chunk is stored with a payload that includes:
{
  "text": "...",
  "headings": ["..."],
  "page": 4,
  "source_id": "6650f6a7b8c9d0e1f2a3b4c5",
  "filename": "diabetes-guidelines.pdf",
  "chunk_index": 12,
  "tenant_id": "..."
}
At session start, the agent receives a list of document_ids from the case study (or whichever domain owns retrieval). On every user turn, the agent embeds the question with gemini-embedding-2 and queries the relevant collection with payload.source_id IN [document_ids]. An empty list disables RAG for the session; there is no fallback to searching the whole collection. As a result, one collection can safely hold documents for many case studies, and cross-pollination only happens when an admin explicitly attaches a document to multiple case studies.
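The source_id scoping can be expressed as a Qdrant payload filter. The sketch below builds the filter in Qdrant's REST JSON form (a must clause with match.any) as a plain dict. It illustrates the rule described above and is not the backend's actual code, which may add further conditions such as tenant_id; build_retrieval_filter is a hypothetical name.

```python
from typing import Any, Optional


def build_retrieval_filter(document_ids: list[str]) -> Optional[dict[str, Any]]:
    """Return a Qdrant filter limited to the given documents, or None for no RAG."""
    if not document_ids:
        # An empty list means retrieval is skipped entirely. Returning None
        # here must not be read as "search the whole collection".
        return None
    return {
        "must": [
            {"key": "source_id", "match": {"any": list(document_ids)}},
        ]
    }
```

A caller would skip the vector search when the result is None and otherwise pass the dict as the filter of the search request.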

Settings

Variable | Default | Purpose
KB_COLLECTION_NAME | case-studies | Default collection for case-study uploads.
KB_EMBEDDING_MODEL | gemini-embedding-2 | Embedding model used for ingestion + queries.
KB_EMBEDDING_DIMENSION | 1536 | Output dimension for new collections. Don’t change without a re-ingest plan.
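As an illustration of how these settings and their defaults fit together, the sketch below reads them from environment variables. That the backend reads them from the process environment is an assumption; it may use a settings library instead. KBSettings and load_kb_settings are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class KBSettings:
    collection_name: str
    embedding_model: str
    embedding_dimension: int


def load_kb_settings(env: Mapping[str, str] = os.environ) -> KBSettings:
    """Read the knowledge-base settings, falling back to the documented defaults."""
    return KBSettings(
        collection_name=env.get("KB_COLLECTION_NAME", "case-studies"),
        embedding_model=env.get("KB_EMBEDDING_MODEL", "gemini-embedding-2"),
        # int() raises ValueError on a non-numeric value, which surfaces a
        # misconfiguration at startup rather than at ingestion time.
        embedding_dimension=int(env.get("KB_EMBEDDING_DIMENSION", "1536")),
    )
```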