Indexes & Documents

An index is an app-scoped container of RAG-ready documents. Upload PDFs, Markdown, HTML, DOCX, XLSX, or text into an index, and the platform chunks and embeds them so the agent can recall passages by semantic search inside its think step.

Indexes are for knowledge: content you want the agent to retrieve passages from. Customer file blobs and structured records belong in your backend (PocketBase / Supabase / your own DB), exposed to the agent via MCP.

index, _ := client.CreateIndex(ctx, tavora.CreateIndexInput{
	Name:        "Support docs",
	Description: "FAQ + product manuals",
})
indexes, _ := client.ListIndexes(ctx)
got, _ := client.GetIndex(ctx, index.ID)
_, _ = client.UpdateIndex(ctx, index.ID, tavora.UpdateIndexInput{
	Name: "Support knowledge",
})
_ = client.DeleteIndex(ctx, index.ID)

Documents are uploaded as multipart bytes plus optional provenance metadata. Indexable file types (PDF, DOCX, XLSX, CSV, MD, HTML, TXT, images) are chunked and embedded; other types are stored as opaque files (status: "stored") and are listable but not searchable.

doc, _ := client.UploadDocument(ctx, tavora.UploadDocumentInput{
	IndexID:  index.ID,
	FilePath: "./faq.md",
	// Optional provenance; round-trips through document metadata.
	Source: "https://example.com/faq",
	Task:   "support-handbook",
	Type:   "faq",
	Tags:   []string{"v2026.05", "public"},
})

Re-uploading with the same name to the same index bumps the document’s version; older versions stay queryable (is_latest=false) and fetchable via ?version=N. Pass if_version for optimistic concurrency: a mismatch returns 409, which asVersionConflict lets you detect so you can re-read the latest version and retry the rewrite.

Non-markdown indexable types (PDF, DOCX, XLSX, …) generate an extracted markdown sibling on upload — a second document row with content_type=text/markdown, parent_id pointing at the original, and metadata.derived_from="extraction". Chunks attach to the sibling so search hits cite the editable form.
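Because chunks attach to the extracted sibling, a UI that wants to link back to the uploaded file has to follow parent_id. The sketch below models only the fields named above; the Go struct, the helper names, and the sample IDs and file names are assumptions made for illustration.

```go
package main

import "fmt"

// document mirrors only the fields mentioned in the text. The Go field names
// are assumptions for this sketch, not the SDK's types.
type document struct {
	ID          string
	Name        string
	ContentType string            // content_type
	ParentID    string            // parent_id, empty for originals
	Metadata    map[string]string // metadata
}

// isExtractedSibling reports whether d looks like an extracted markdown
// sibling: it has a parent and is marked derived_from="extraction".
func isExtractedSibling(d document) bool {
	return d.ParentID != "" && d.Metadata["derived_from"] == "extraction"
}

// originalOf returns the uploaded original for a search hit. Hits that are
// not siblings, or whose parent is unknown, are returned unchanged.
func originalOf(hit document, byID map[string]document) document {
	if !isExtractedSibling(hit) {
		return hit
	}
	if parent, ok := byID[hit.ParentID]; ok {
		return parent
	}
	return hit
}

func main() {
	// Made-up rows: one uploaded PDF and its extracted markdown sibling.
	pdf := document{ID: "doc_1", Name: "manual.pdf", ContentType: "application/pdf"}
	md := document{
		ID:          "doc_2",
		Name:        "manual.md",
		ContentType: "text/markdown",
		ParentID:    "doc_1",
		Metadata:    map[string]string{"derived_from": "extraction"},
	}
	byID := map[string]document{pdf.ID: pdf, md.ID: md}

	fmt.Println(originalOf(md, byID).Name)
	fmt.Println(originalOf(pdf, byID).Name)
}
```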

Every uploaded document is hashed server-side; the hex sha256 is returned as content_sha256. Find duplicates with ?content_sha256=<hex> or the sugar ?duplicate_of=<id>.
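To check for a duplicate before uploading, you can compute the same kind of digest locally. This is a minimal sketch using only the Go standard library; it assumes the server hashes the raw file bytes exactly as uploaded, and contentSHA256 and duplicateQuery are hypothetical helper names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/url"
)

// contentSHA256 returns the lowercase hex SHA-256 digest of data.
// Assumption: this matches content_sha256 when data is the exact uploaded bytes.
func contentSHA256(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// duplicateQuery builds the ?content_sha256=<hex> query string for a digest.
func duplicateQuery(hexDigest string) string {
	q := url.Values{}
	q.Set("content_sha256", hexDigest)
	return "?" + q.Encode()
}

func main() {
	digest := contentSHA256([]byte("abc"))
	fmt.Println(digest)
	// ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
	fmt.Println(duplicateQuery(digest))
}
```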

chunks, _ := client.Search(ctx, tavora.SearchInput{
	Query:   "what document formats are supported?",
	IndexID: index.ID,
	Limit:   5,
})
// One row per matched chunk.
docs, _ := client.SearchDocuments(ctx, tavora.SearchInput{
	Query:   "what artifacts cover refund policy?",
	IndexID: index.ID,
	Limit:   5,
})
// One row per distinct document, best chunk inlined as best_chunk.preview.

search() is the agent’s default (chunks). searchDocuments() is the right call when the question is “what artifacts are about X” rather than “what passages are about X”.
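The difference between the two result shapes can be illustrated by collapsing chunk hits into one row per document. This is only a sketch of the idea, not how the platform ranks results: the chunkHit and documentHit types, the Score field, and the sample data are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// chunkHit is a hypothetical chunk-level search result.
type chunkHit struct {
	DocumentID string
	Preview    string
	Score      float64 // assumed relevance score, higher is better
}

// documentHit is a hypothetical document-level result with its best chunk.
type documentHit struct {
	DocumentID string
	BestChunk  chunkHit
}

// collapseByDocument keeps the highest-scoring chunk per document and returns
// documents ordered by that score (ties broken by document ID).
func collapseByDocument(hits []chunkHit) []documentHit {
	best := make(map[string]chunkHit)
	for _, h := range hits {
		cur, seen := best[h.DocumentID]
		if !seen || h.Score > cur.Score {
			best[h.DocumentID] = h
		}
	}
	out := make([]documentHit, 0, len(best))
	for id, h := range best {
		out = append(out, documentHit{DocumentID: id, BestChunk: h})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].BestChunk.Score != out[j].BestChunk.Score {
			return out[i].BestChunk.Score > out[j].BestChunk.Score
		}
		return out[i].DocumentID < out[j].DocumentID
	})
	return out
}

func main() {
	hits := []chunkHit{
		{DocumentID: "refunds.md", Preview: "Refunds within 30 days", Score: 0.91},
		{DocumentID: "faq.md", Preview: "How do I get a refund?", Score: 0.84},
		{DocumentID: "refunds.md", Preview: "Exceptions to the policy", Score: 0.72},
	}
	for _, d := range collapseByDocument(hits) {
		fmt.Printf("%s: %s\n", d.DocumentID, d.BestChunk.Preview)
	}
}
```

Three chunk rows become two document rows here, which is why the document-level call suits "which artifacts" questions while the chunk-level call suits "which passages" questions.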

deleteDocument is soft by default (sets deleted_at, drops is_latest, idempotent — 204 whether the row existed or was already gone). Use deleteDocumentHard to remove the row and the on-disk file.

Pass index_ids on createAgentSession to scope the agent’s retrieval to a subset of an app’s indexes — useful for per-tenant knowledge slicing (one index per tenant, pinned at session create). Omit it and the agent can search every enabled index in the app.
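For per-tenant slicing, the important detail is what happens when a tenant has no index: since omitting index_ids lets the agent search every enabled index, a lookup miss should fail rather than silently drop the field. The sketch below builds only the index_ids part of a session request; the sessionRequest type, the helper names, and the tenant and index IDs are hypothetical, and any other fields the real request needs are left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sessionRequest models only the index_ids field of a session-create payload.
// The struct itself is an assumption for this sketch.
type sessionRequest struct {
	IndexIDs []string `json:"index_ids,omitempty"`
}

// JSON renders the request body as a string.
func (r sessionRequest) JSON() (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

// sessionRequestFor pins a session to the tenant's index. It returns an error
// for unknown tenants instead of an unscoped request, because an omitted
// index_ids would let the agent search every enabled index in the app.
func sessionRequestFor(tenant string, tenantIndex map[string]string) (sessionRequest, error) {
	id, ok := tenantIndex[tenant]
	if !ok {
		return sessionRequest{}, fmt.Errorf("no index registered for tenant %q", tenant)
	}
	return sessionRequest{IndexIDs: []string{id}}, nil
}

func main() {
	// Made-up tenant-to-index mapping.
	tenantIndex := map[string]string{
		"acme":   "idx_acme",
		"globex": "idx_globex",
	}

	req, err := sessionRequestFor("acme", tenantIndex)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	body, _ := req.JSON()
	fmt.Println(body) // {"index_ids":["idx_acme"]}

	_, err = sessionRequestFor("initech", tenantIndex)
	fmt.Println(err != nil) // true
}
```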