Ingest and Index Documents into a Knowledge Base

Feather’s knowledge base pipeline accepts documents from virtually any source — local files, raw text, and third-party platforms like Notion, Google Drive, and S3. Once ingested, documents are chunked, embedded, and indexed so your agents can perform semantic search at inference time. This guide covers the full ingestion lifecycle: uploading content, tracking job status, connecting external sources, and validating results with a live search query.

Ingest files and text

All file and text uploads go through a single unified endpoint that accepts multipart/form-data. You describe what you’re uploading with a JSON manifest and attach the actual file bytes separately.

Endpoint: POST /v1/knowledge-base/knowledge-bases/{kb_id}/ingest

The request has two parts:

Field	Type	Description
`manifest`	JSON string	Array of item descriptors — one object per document
`files`	File parts	The actual file bytes, referenced by `file_index` in the manifest

Manifest item formats:

[
  {
    "type": "text",
    "title": "FAQ",
    "content": "What is Feather? Feather is an AI-powered customer experience platform."
  },
  {
    "type": "file",
    "file_index": 0,
    "title": "Product Guide"
  }
]
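When the manifest is assembled programmatically, the main thing to get right is that each file item's file_index matches the position of its file part in the upload. The Python sketch below is one way to keep the two in step. The helper name build_manifest and the "path" key are conventions of this sketch, not part of the Feather API.

```python
import json


def build_manifest(items):
    """Return (manifest_json, file_paths) for a list of item descriptors.

    Input items are either {"title", "content"} for inline text or
    {"title", "path"} for a file. "path" is local to this sketch and is
    never sent to the API; only type, title, content and file_index are.
    """
    manifest, file_paths = [], []
    for item in items:
        if "content" in item:
            manifest.append(
                {"type": "text", "title": item["title"], "content": item["content"]}
            )
        else:
            # file_index is the zero-based position of this file among the
            # file parts, so it equals the number of files collected so far.
            manifest.append(
                {"type": "file", "file_index": len(file_paths), "title": item["title"]}
            )
            file_paths.append(item["path"])
    return json.dumps(manifest), file_paths


manifest_json, file_paths = build_manifest([
    {"title": "FAQ", "content": "What is Feather? Feather is an AI-powered customer experience platform."},
    {"title": "Product Guide", "path": "./product-guide.pdf"},
])
```

manifest_json is the string to send as the manifest field, and file_paths lists the files to attach as files parts in the same order.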

Each item has a type of either text (inline content) or file (references an uploaded file part by its zero-based index). The title is stored with the document and surfaces in search results.

Full example, uploading a file and a text snippet together:

curl -X POST https://api-sandbox.featherhq.com/v1/knowledge-base/knowledge-bases/kb_01hx4mno5pqrstuvwxyz0001/ingest \
  -H "x-api-key: $FEATHER_API_KEY" \
  -F 'manifest=[{"type":"text","title":"FAQ","content":"What is Feather? Feather is an AI-powered customer experience platform."},{"type":"file","file_index":0,"title":"Product Guide"}]' \
  -F 'files=@./product-guide.pdf'

{
  "ingestion_job_id": "job_01hx7ingest9876543abcde",
  "items": [
    {
      "manifest_index": 0,
      "result": "queued",
      "document_id": "doc_01hx7faq000000000000001"
    },
    {
      "manifest_index": 1,
      "result": "queued",
      "document_id": "doc_01hx7pdf000000000000002"
    }
  ]
}

Each item in the response reports either queued (accepted for processing) or duplicate (an identical document already exists in the knowledge base and was skipped). Save the ingestion_job_id to track progress.
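A client will usually want to separate accepted items from skipped ones before it starts polling. The following Python sketch does that for a parsed response body. It assumes a duplicate item carries at least manifest_index and result, mirroring the queued items shown above; the function name is hypothetical.

```python
def split_ingest_results(response):
    """Return (job_id, queued_indexes, duplicate_indexes) from an ingest response."""
    queued, duplicates = [], []
    for item in response["items"]:
        if item["result"] == "queued":
            queued.append(item["manifest_index"])
        elif item["result"] == "duplicate":
            duplicates.append(item["manifest_index"])
    return response["ingestion_job_id"], queued, duplicates


# Illustrative response: the second item is marked duplicate for demonstration.
sample = {
    "ingestion_job_id": "job_01hx7ingest9876543abcde",
    "items": [
        {"manifest_index": 0, "result": "queued", "document_id": "doc_01hx7faq000000000000001"},
        {"manifest_index": 1, "result": "duplicate"},
    ],
}
job_id, queued, duplicates = split_ingest_results(sample)
```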

Duplicate detection is hash-based. If you re-upload the same file after editing it, the content hash changes and it will be ingested as a new version.

Track ingestion status

Ingestion is asynchronous. Documents move through a pipeline: queued → processing → completed (or failed). Poll the job endpoint until status reaches a terminal state.

Poll a specific job:

curl https://api-sandbox.featherhq.com/v1/knowledge-base/ingestion-jobs/job_01hx7ingest9876543abcde \
  -H "x-api-key: $FEATHER_API_KEY"

{
  "id": "job_01hx7ingest9876543abcde",
  "knowledge_base_id": "kb_01hx4mno5pqrstuvwxyz0001",
  "status": "processing",
  "total_documents": 2,
  "completed": 1,
  "failed": 0,
  "progress_pct": 50,
  "created_at": "2024-05-14T11:00:00Z",
  "updated_at": "2024-05-14T11:00:08Z"
}

Field	Description
`status`	`queued` · `processing` · `completed` · `failed`
`total_documents`	Total items submitted in this job
`completed`	Documents successfully embedded and indexed
`failed`	Documents that encountered an error
`progress_pct`	Integer 0–100 derived from completed / total
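The polling described above can be sketched in Python as a loop that stops on a terminal status or after a timeout. The function names, the two-second interval, and the five-minute timeout are assumptions of this sketch; the URL, header, and status values come from the examples above.

```python
import json
import os
import time
import urllib.request

BASE_URL = "https://api-sandbox.featherhq.com"
TERMINAL_STATUSES = {"completed", "failed"}


def fetch_job(job_id):
    """GET the job resource. Reads FEATHER_API_KEY at call time."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/knowledge-base/ingestion-jobs/{job_id}",
        headers={"x-api-key": os.environ["FEATHER_API_KEY"]},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_job(job_id, fetch=fetch_job, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Poll until the job is completed or failed; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} is still {job['status']}")
        sleep(interval)
```

The fetch and sleep parameters are injectable so the loop can be exercised without network access. In real use, wait_for_job("job_01hx7ingest9876543abcde") would poll the sandbox API with the key from the environment.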

Check aggregate status across the whole knowledge base:

curl https://api-sandbox.featherhq.com/v1/knowledge-base/knowledge-bases/kb_01hx4mno5pqrstuvwxyz0001/ingestion-status \
  -H "x-api-key: $FEATHER_API_KEY"

{
  "knowledge_base_id": "kb_01hx4mno5pqrstuvwxyz0001",
  "pending": 0,
  "processing": 1,
  "active": 40,
  "failed": 1,
  "total": 42
}

The aggregate counts documents by lifecycle state: pending, processing, active (indexed and live), draft, archived, failed, and quarantined, with total summing them. The example response above shows only some of these fields. Use this endpoint in a dashboard or health check to confirm that your knowledge base is fully indexed (that is, all documents are active) before going live.
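A health check along these lines can be reduced to comparing active with total. The Python sketch below also reports which other states still hold documents. Treating an empty knowledge base as not ready is a choice made here, not something the API dictates, and the function name is hypothetical.

```python
def indexing_report(status):
    """Return (ready, non_active) for an ingestion-status response body."""
    # Every count other than active and total represents documents that
    # are not live; keep only the non-zero ones for reporting.
    non_active = {
        key: value
        for key, value in status.items()
        if key not in ("knowledge_base_id", "active", "total") and value
    }
    ready = status["total"] > 0 and status["active"] == status["total"]
    return ready, non_active


example = {
    "knowledge_base_id": "kb_01hx4mno5pqrstuvwxyz0001",
    "pending": 0,
    "processing": 1,
    "active": 40,
    "failed": 1,
    "total": 42,
}
ready, non_active = indexing_report(example)
```

For the example response, ready is False because one document is still processing and one has failed.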

Retry failed documents without re-uploading them. Send a POST to /v1/knowledge-base/ingestion-jobs/{job_id}/retry and Feather will re-queue only the failed items from that job.
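As a minimal Python sketch, the retry call can be built with the standard library as shown below. It assumes the endpoint needs no request body and uses the same x-api-key header as the other examples; neither detail is confirmed for this endpoint by the text above, and build_retry_request is a hypothetical helper.

```python
import urllib.request

BASE_URL = "https://api-sandbox.featherhq.com"


def build_retry_request(job_id, api_key):
    """Build (but do not send) the POST that re-queues a job's failed items."""
    url = f"{BASE_URL}/v1/knowledge-base/ingestion-jobs/{job_id}/retry"
    return urllib.request.Request(url, method="POST", headers={"x-api-key": api_key})


request = build_retry_request("job_01hx7ingest9876543abcde", "YOUR_API_KEY")
# Send it with urllib.request.urlopen(request) once a real key is supplied.
```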

Connect external sources

For Notion pages, Google Drive folders, and S3 buckets, Feather syncs content automatically rather than requiring manual uploads. The flow is: create a connection → add the source → define roots → trigger a sync → optionally schedule recurring syncs.

External source integrations require an active integration connection for your provider. See Connect an Integration to set one up before continuing.

Add the source to your knowledge base

curl -X POST https://api-sandbox.featherhq.com/v1/knowledge-base/knowledge-bases/kb_01hx4mno5pqrstuvwxyz0001/sources \
  -H "x-api-key: $FEATHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "notion",
    "integration_connection_id": "conn_01hx8notion123456789abc"
  }'

{
  "id": "src_01hx8source9876543210ab",
  "knowledge_base_id": "kb_01hx4mno5pqrstuvwxyz0001",
  "provider": "notion",
  "integration_connection_id": "conn_01hx8notion123456789abc",
  "auto_sync_enabled": false,
  "created_at": "2024-05-14T11:10:00Z"
}

Supported values for provider: notion, google_drive, s3.

Add roots

Roots define the top-level pages, folders, or prefixes that Feather will crawl. Everything nested beneath a root is included in the sync.

curl -X POST https://api-sandbox.featherhq.com/v1/knowledge-base/knowledge-bases/kb_01hx4mno5pqrstuvwxyz0001/sources/src_01hx8source9876543210ab/roots \
  -H "x-api-key: $FEATHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "provider_item_id": "1a2b3c4d-notion-page-id-here"
  }'

You can add multiple roots to the same source — for example, different top-level Notion pages for different product areas.

Trigger a manual sync

curl -X POST https://api-sandbox.featherhq.com/v1/knowledge-base/knowledge-bases/kb_01hx4mno5pqrstuvwxyz0001/sources/src_01hx8source9876543210ab/sync \
  -H "x-api-key: $FEATHER_API_KEY"

{
  "job_id": "job_01hx8sync0000001234abcd",
  "status": "queued",
  "message": "Sync has been queued"
}

Track this job using the ingestion-jobs/{job_id} endpoint covered in Track ingestion status.

Enable automatic recurring syncs

Keep your knowledge base fresh by scheduling periodic syncs. Set sync_interval_seconds to control how often Feather re-crawls the source.

curl -X PATCH https://api-sandbox.featherhq.com/v1/knowledge-base/knowledge-bases/kb_01hx4mno5pqrstuvwxyz0001/sources/src_01hx8source9876543210ab/schedule \
  -H "x-api-key: $FEATHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "auto_sync_enabled": true,
    "sync_interval_seconds": 3600
  }'

The example above syncs the source every hour (3600 seconds). Set auto_sync_enabled: false to pause automatic syncing without deleting the schedule configuration.
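Taken together, the source steps form a short sequence in which each call depends on the ID returned by the first. The Python sketch below makes that ordering explicit. It takes a call(method, path, body) function supplied by the caller (a hypothetical thin HTTP wrapper that returns the parsed JSON response), so the flow itself stays independent of any HTTP library.

```python
SUPPORTED_PROVIDERS = {"notion", "google_drive", "s3"}


def set_up_source(call, kb_id, provider, connection_id, root_ids, interval_seconds=None):
    """Add a source, attach roots, trigger a sync, and optionally schedule syncs.

    Returns (source_id, sync_job_id).
    """
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider}")

    base = f"/v1/knowledge-base/knowledge-bases/{kb_id}/sources"
    source = call("POST", base, {
        "provider": provider,
        "integration_connection_id": connection_id,
    })
    source_path = f"{base}/{source['id']}"

    # One request per root; everything nested beneath each root is synced.
    for root_id in root_ids:
        call("POST", f"{source_path}/roots", {"provider_item_id": root_id})

    sync = call("POST", f"{source_path}/sync", None)

    if interval_seconds is not None:
        call("PATCH", f"{source_path}/schedule", {
            "auto_sync_enabled": True,
            "sync_interval_seconds": interval_seconds,
        })
    return source["id"], sync["job_id"]
```

The returned sync job ID can then be polled like any other ingestion job.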

Search your knowledge base

Before connecting a knowledge base to an agent, validate that your content is indexed and returning relevant results.

curl -X POST https://api-sandbox.featherhq.com/v1/knowledge-base/search \
  -H "x-api-key: $FEATHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "How do I reset my password?",
    "kb_ids": ["kb_01hx4mno5pqrstuvwxyz0001"],
    "limit": 5
  }'

{
  "results": [
    {
      "chunk_id": "chk_01hx9chunk111111111aaaa",
      "kb_id": "kb_01hx4mno5pqrstuvwxyz0001",
      "document_id": "doc_01hx7pdf000000000000002",
      "document_title": "Product Guide",
      "source_type": "file",
      "content": "To reset your password, visit the login page and click 'Forgot password'. Enter your registered email address and follow the link sent to your inbox.",
      "score": 0.94,
      "chunk_index": 3
    },
    {
      "chunk_id": "chk_01hx9chunk222222222bbbb",
      "kb_id": "kb_01hx4mno5pqrstuvwxyz0001",
      "document_id": "doc_01hx7faq000000000000001",
      "document_title": "FAQ",
      "source_type": "text",
      "content": "Password resets expire after 30 minutes. If your link has expired, request a new one from the login page.",
      "score": 0.87,
      "chunk_index": 0
    }
  ],
  "total_results": 2,
  "ranking_mode": "global_score",
  "answerability": {
    "label": "answerable",
    "reason_code": "above_high_threshold",
    "raw_top_score": 0.94,
    "score_basis": "raw_dotproduct"
  }
}

Field	Description
`total_results`	Total candidate count before the flat `results` list is truncated by `limit`
`ranking_mode`	How results were ordered: `global_score`, `reranked`, or `grouped_unranked`
`answerability`	Object describing whether the knowledge base likely contains a sufficient answer. Its `label` is one of `answerable`, `low_confidence`, `not_answerable`, or `unknown`, alongside `reason_code`, `raw_top_score`, and `score_basis`. Present only when the answerability gate is enabled for your org; otherwise `null`
`chunk_id`	Unique ID of the retrieved text chunk
`document_title`	Title assigned during ingestion
`content`	The raw text of the chunk
`score`	Semantic similarity score (0–1). Higher is more relevant

A high score means the chunk is semantically similar to the query — not that it’s factually correct. Always review new knowledge bases manually before enabling them in production agents.
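When validating results in a script, a simple gate is to drop everything if answerability says not_answerable and otherwise keep chunks above a score cutoff. The Python sketch below does this. The 0.8 default cutoff and the function name are arbitrary choices for illustration, not recommended or documented values.

```python
def usable_chunks(response, min_score=0.8):
    """Return result chunks worth showing, given a parsed search response."""
    # answerability is null (None) when the gate is not enabled for the org.
    answerability = response.get("answerability")
    if answerability is not None and answerability["label"] == "not_answerable":
        return []
    return [chunk for chunk in response["results"] if chunk["score"] >= min_score]


# Trimmed version of the example response above.
sample = {
    "results": [
        {"chunk_id": "chk_01hx9chunk111111111aaaa", "document_title": "Product Guide", "score": 0.94},
        {"chunk_id": "chk_01hx9chunk222222222bbbb", "document_title": "FAQ", "score": 0.87},
    ],
    "total_results": 2,
    "answerability": {"label": "answerable", "raw_top_score": 0.94},
}
strong = usable_chunks(sample, min_score=0.9)
```

A score filter of this kind only narrows candidates; it does not replace the manual review recommended above.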

What’s next?

Knowledge Base API Reference

Full endpoint reference for knowledge bases, ingestion jobs, sources, and search.

