Feather’s knowledge base pipeline accepts documents from virtually any source — local files, raw text, and third-party platforms like Notion, Google Drive, and S3. Once ingested, documents are chunked, embedded, and indexed so your agents can perform semantic search at inference time. This guide covers the full ingestion lifecycle: uploading content, tracking job status, connecting external sources, and validating results with a live search query.

Ingest files and text

All file and text uploads go through a single unified endpoint that accepts multipart/form-data. You describe what you’re uploading with a JSON manifest and attach the actual file bytes separately.

Endpoint: POST /v1/knowledge-base/knowledge-bases/{kb_id}/ingest

The request has two parts:
| Field | Type | Description |
| --- | --- | --- |
| manifest | JSON string | Array of item descriptors, one object per document |
| files | File parts | The actual file bytes, referenced by file_index in the manifest |
Manifest item formats:
[
  {
    "type": "text",
    "title": "FAQ",
    "content": "What is Feather? Feather is an AI-powered customer experience platform."
  },
  {
    "type": "file",
    "file_index": 0,
    "title": "Product Guide"
  }
]
Each item has a type of either text (inline content) or file (references an uploaded file part by its zero-based index). The title is stored with the document and surfaces in search results.

Full example, uploading a file and a text snippet together:
curl -X POST https://api-sandbox.featherhq.com/v1/knowledge-base/knowledge-bases/kb_01hx4mno5pqrstuvwxyz0001/ingest \
  -H "x-api-key: $FEATHER_API_KEY" \
  -F 'manifest=[{"type":"text","title":"FAQ","content":"What is Feather? Feather is an AI-powered customer experience platform."},{"type":"file","file_index":0,"title":"Product Guide"}]' \
  -F 'files=@./product-guide.pdf'
{
  "ingestion_job_id": "job_01hx7ingest9876543abcde",
  "items": [
    {
      "manifest_index": 0,
      "result": "queued",
      "document_id": "doc_01hx7faq000000000000001"
    },
    {
      "manifest_index": 1,
      "result": "queued",
      "document_id": "doc_01hx7pdf000000000000002"
    }
  ]
}
Each item in the response reports either queued (accepted for processing) or duplicate (an identical document already exists in the knowledge base and was skipped). Save the ingestion_job_id to track progress.
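As a rough illustration of reading that response, the sketch below groups manifest indexes by their result value so skipped duplicates are easy to spot. The helper name group_by_result is hypothetical and not part of any Feather SDK; it only assumes the response shape shown above.

```python
from collections import defaultdict


def group_by_result(response):
    """Map each result value ("queued" or "duplicate") to the manifest indexes that received it."""
    groups = defaultdict(list)
    for item in response.get("items", []):
        groups[item["result"]].append(item["manifest_index"])
    return dict(groups)


# Hypothetical response in which the second manifest item was skipped as a duplicate.
sample_response = {
    "ingestion_job_id": "job_01hx7ingest9876543abcde",
    "items": [
        {"manifest_index": 0, "result": "queued"},
        {"manifest_index": 1, "result": "duplicate"},
    ],
}
summary = group_by_result(sample_response)
```

Indexes listed under duplicate point back at the manifest entries that were not re-ingested.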
Duplicate detection is hash-based. If you re-upload the same file after editing it, the content hash changes and it will be ingested as a new version.
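For larger uploads it can help to generate the manifest rather than write it by hand. The following is a minimal sketch, assuming that file_index refers to the order in which file parts are attached to the request (the guide states the index is zero-based but does not spell out the ordering). The function name build_manifest and its tuple input format are illustrative only.

```python
import json


def build_manifest(items):
    """Build the manifest JSON string and the ordered list of file paths to attach.

    items is a list of (kind, title, value) tuples, where value is the inline
    content for kind "text" or a local path for kind "file".
    """
    manifest, paths = [], []
    for kind, title, value in items:
        if kind == "text":
            manifest.append({"type": "text", "title": title, "content": value})
        elif kind == "file":
            # Assumption: file_index matches the position of the attached file part.
            manifest.append({"type": "file", "file_index": len(paths), "title": title})
            paths.append(value)
        else:
            raise ValueError(f"unsupported item type: {kind!r}")
    return json.dumps(manifest), paths


manifest_json, file_paths = build_manifest([
    ("text", "FAQ", "What is Feather? Feather is an AI-powered customer experience platform."),
    ("file", "Product Guide", "./product-guide.pdf"),
])
```

The returned string goes in the manifest form field, and each path in file_paths would be attached as a files part in the same order.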

Track ingestion status

Ingestion is asynchronous. Documents move through a pipeline: queued → processing → completed (or failed). Poll the job endpoint until status reaches a terminal state (completed or failed).

Poll a specific job:
curl https://api-sandbox.featherhq.com/v1/knowledge-base/ingestion-jobs/job_01hx7ingest9876543abcde \
  -H "x-api-key: $FEATHER_API_KEY"
{
  "id": "job_01hx7ingest9876543abcde",
  "knowledge_base_id": "kb_01hx4mno5pqrstuvwxyz0001",
  "status": "processing",
  "total_documents": 2,
  "completed": 1,
  "failed": 0,
  "progress_pct": 50,
  "created_at": "2024-05-14T11:00:00Z",
  "updated_at": "2024-05-14T11:00:08Z"
}
| Field | Description |
| --- | --- |
| status | One of queued, processing, completed, failed |
| total_documents | Total items submitted in this job |
| completed | Documents successfully embedded and indexed |
| failed | Documents that encountered an error |
| progress_pct | Integer 0–100 derived from completed / total |
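A polling loop for this endpoint might look like the sketch below. It is deliberately transport-agnostic: fetch_job is a hypothetical callable you supply that performs the GET shown above and returns the parsed JSON. The interval and timeout defaults are arbitrary choices, not documented recommendations.

```python
import time

# Terminal states as listed in the status field description.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(fetch_job, job_id, interval_s=2.0, timeout_s=300.0, sleep=time.sleep):
    """Call fetch_job(job_id) repeatedly until the job status is terminal.

    Returns the final job dict, or raises TimeoutError if the job is still
    queued or processing once timeout_s has elapsed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job(job_id)
        if job["status"] in TERMINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']!r} after {timeout_s}s")
        sleep(interval_s)
```

Passing sleep as a parameter keeps the loop easy to exercise in tests without real delays.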
Check aggregate status across the whole knowledge base:
curl https://api-sandbox.featherhq.com/v1/knowledge-base/knowledge-bases/kb_01hx4mno5pqrstuvwxyz0001/ingestion-status \
  -H "x-api-key: $FEATHER_API_KEY"
{
  "knowledge_base_id": "kb_01hx4mno5pqrstuvwxyz0001",
  "pending": 0,
  "processing": 1,
  "active": 40,
  "failed": 1,
  "total": 42
}
The aggregate counts documents by lifecycle state: pending, processing, active (indexed and live), draft, archived, failed, and quarantined, with total summing them. Use this endpoint on the dashboard or during health checks to confirm your knowledge base is fully indexed — i.e. all documents are active — before going live.
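A go-live check built on this response could be sketched as follows. Both helper names are hypothetical, and treating an empty knowledge base (total of 0) as not ready is a design choice of this sketch rather than documented behavior.

```python
def is_fully_indexed(status):
    """True when every document counted in total is in the active state."""
    return status["total"] > 0 and status["active"] == status["total"]


def blocking_states(status):
    """Return the non-zero counts for every lifecycle state other than active."""
    skip = {"knowledge_base_id", "total", "active"}
    return {
        key: value
        for key, value in status.items()
        if key not in skip and isinstance(value, int) and value > 0
    }


# The aggregate response from the example above.
status = {
    "knowledge_base_id": "kb_01hx4mno5pqrstuvwxyz0001",
    "pending": 0,
    "processing": 1,
    "active": 40,
    "failed": 1,
    "total": 42,
}
ready = is_fully_indexed(status)
blockers = blocking_states(status)
```

With the example counts, the knowledge base is not yet ready because one document is still processing and one has failed.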
Retry failed documents without re-uploading them. Send a POST to /v1/knowledge-base/ingestion-jobs/{job_id}/retry and Feather will re-queue only the failed items from that job.
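The retry call can be prepared with Python's standard library as in this sketch. It only constructs the request and sends nothing. The sandbox base URL and x-api-key header are taken from the curl examples in this guide, while the helper names and the failed-count guard are illustrative assumptions.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://api-sandbox.featherhq.com"


def needs_retry(job):
    """True when the job reports at least one failed document."""
    return job.get("failed", 0) > 0


def build_retry_request(job_id, api_key):
    """Build (but do not send) the POST request that re-queues a job's failed items."""
    path = f"/v1/knowledge-base/ingestion-jobs/{urllib.parse.quote(job_id, safe='')}/retry"
    return urllib.request.Request(
        BASE_URL + path,
        method="POST",
        headers={"x-api-key": api_key},
    )
```

To actually send the request, pass the returned object to urllib.request.urlopen once needs_retry reports failures for the job.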

Connect external sources

For Notion pages, Google Drive folders, and S3 buckets, Feather syncs content automatically rather than requiring manual uploads. The flow is: create a connection → add the source → define roots → trigger a sync → optionally schedule recurring syncs.
External source integrations require an active integration connection for your provider. See Connect an Integration to set one up before continuing.
1. Add the source to your knowledge base

curl -X POST https://api-sandbox.featherhq.com/v1/knowledge-base/knowledge-bases/kb_01hx4mno5pqrstuvwxyz0001/sources \
  -H "x-api-key: $FEATHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "notion",
    "integration_connection_id": "conn_01hx8notion123456789abc"
  }'
{
  "id": "src_01hx8source9876543210ab",
  "knowledge_base_id": "kb_01hx4mno5pqrstuvwxyz0001",
  "provider": "notion",
  "integration_connection_id": "conn_01hx8notion123456789abc",
  "auto_sync_enabled": false,
  "created_at": "2024-05-14T11:10:00Z"
}
Supported values for provider: notion, google_drive, s3.
2. Add roots

Roots define the top-level pages, folders, or prefixes that Feather will crawl. Everything nested beneath a root is included in the sync.
curl -X POST https://api-sandbox.featherhq.com/v1/knowledge-base/knowledge-bases/kb_01hx4mno5pqrstuvwxyz0001/sources/src_01hx8source9876543210ab/roots \
  -H "x-api-key: $FEATHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "provider_item_id": "1a2b3c4d-notion-page-id-here"
  }'
You can add multiple roots to the same source — for example, different top-level Notion pages for different product areas.
3. Trigger a manual sync

curl -X POST https://api-sandbox.featherhq.com/v1/knowledge-base/knowledge-bases/kb_01hx4mno5pqrstuvwxyz0001/sources/src_01hx8source9876543210ab/sync \
  -H "x-api-key: $FEATHER_API_KEY"
{
  "job_id": "job_01hx8sync0000001234abcd",
  "status": "queued",
  "message": "Sync has been queued"
}
Track this job using the ingestion-jobs/{job_id} endpoint covered in the previous section.
4. Enable automatic recurring syncs

Keep your knowledge base fresh by scheduling periodic syncs. Set sync_interval_seconds to control how often Feather re-crawls the source.
curl -X PATCH https://api-sandbox.featherhq.com/v1/knowledge-base/knowledge-bases/kb_01hx4mno5pqrstuvwxyz0001/sources/src_01hx8source9876543210ab/schedule \
  -H "x-api-key: $FEATHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "auto_sync_enabled": true,
    "sync_interval_seconds": 3600
  }'
The example above syncs the source every hour (3600 seconds). Set auto_sync_enabled: false to pause automatic syncing without deleting the schedule configuration.
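If you prefer to express intervals as durations rather than raw seconds, the schedule body could be generated like this. This is a sketch under one assumption: that the PATCH endpoint accepts a body containing only auto_sync_enabled when pausing. The function name schedule_payload is hypothetical.

```python
from datetime import timedelta
from typing import Optional


def schedule_payload(interval: Optional[timedelta]) -> dict:
    """Build the JSON body for the schedule endpoint.

    Pass a timedelta to enable recurring syncs at that interval, or None to
    pause syncing (assumed to leave the stored interval untouched).
    """
    if interval is None:
        return {"auto_sync_enabled": False}
    seconds = int(interval.total_seconds())
    if seconds <= 0:
        raise ValueError("sync interval must be positive")
    return {"auto_sync_enabled": True, "sync_interval_seconds": seconds}


hourly = schedule_payload(timedelta(hours=1))
paused = schedule_payload(None)
```

Here hourly matches the request body in the curl example above, and paused carries only the auto_sync_enabled flag.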

Search your knowledge base

Before connecting a knowledge base to an agent, validate that your content is indexed and returning relevant results.
curl -X POST https://api-sandbox.featherhq.com/v1/knowledge-base/search \
  -H "x-api-key: $FEATHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "How do I reset my password?",
    "kb_ids": ["kb_01hx4mno5pqrstuvwxyz0001"],
    "limit": 5
  }'
{
  "results": [
    {
      "chunk_id": "chk_01hx9chunk111111111aaaa",
      "kb_id": "kb_01hx4mno5pqrstuvwxyz0001",
      "document_id": "doc_01hx7pdf000000000000002",
      "document_title": "Product Guide",
      "source_type": "file",
      "content": "To reset your password, visit the login page and click 'Forgot password'. Enter your registered email address and follow the link sent to your inbox.",
      "score": 0.94,
      "chunk_index": 3
    },
    {
      "chunk_id": "chk_01hx9chunk222222222bbbb",
      "kb_id": "kb_01hx4mno5pqrstuvwxyz0001",
      "document_id": "doc_01hx7faq000000000000001",
      "document_title": "FAQ",
      "source_type": "text",
      "content": "Password resets expire after 30 minutes. If your link has expired, request a new one from the login page.",
      "score": 0.87,
      "chunk_index": 0
    }
  ],
  "total_results": 2,
  "ranking_mode": "global_score",
  "answerability": {
    "label": "answerable",
    "reason_code": "above_high_threshold",
    "raw_top_score": 0.94,
    "score_basis": "raw_dotproduct"
  }
}
| Field | Description |
| --- | --- |
| total_results | Total candidate count before the flat results list is truncated by limit |
| ranking_mode | How results were ordered: global_score, reranked, or grouped_unranked |
| answerability | Object describing whether the knowledge base likely contains a sufficient answer. Its label is one of answerable, low_confidence, not_answerable, or unknown, alongside reason_code, raw_top_score, and score_basis. Present only when the answerability gate is enabled for your org; otherwise null |
| chunk_id | Unique ID of the retrieved text chunk |
| document_title | Title assigned during ingestion |
| content | The raw text of the chunk |
| score | Semantic similarity score (0–1). Higher is more relevant |
A high score means the chunk is semantically similar to the query — not that it’s factually correct. Always review new knowledge bases manually before enabling them in production agents.
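When spot-checking results in a script, one option is to filter the response before reviewing it. The sketch below drops everything when the answerability label is not_answerable and otherwise keeps chunks at or above a score cutoff. The function name usable_chunks and the 0.8 default are arbitrary illustrations, not recommended values.

```python
def usable_chunks(response, min_score=0.8):
    """Return result chunks worth reviewing, based on answerability and score.

    answerability may be null (None) when the gate is disabled, in which case
    only the score cutoff applies.
    """
    answerability = response.get("answerability")
    if answerability is not None and answerability.get("label") == "not_answerable":
        return []
    return [chunk for chunk in response.get("results", []) if chunk["score"] >= min_score]


# Trimmed version of the search response shown above.
search_response = {
    "results": [
        {"chunk_id": "chk_01hx9chunk111111111aaaa", "document_title": "Product Guide", "score": 0.94},
        {"chunk_id": "chk_01hx9chunk222222222bbbb", "document_title": "FAQ", "score": 0.87},
    ],
    "total_results": 2,
    "answerability": {"label": "answerable", "raw_top_score": 0.94},
}
strong = usable_chunks(search_response, min_score=0.9)
```

A filter like this narrows what you review by hand; it does not replace the manual check, since score measures similarity rather than correctness.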

What’s next?

Knowledge Base API Reference

Full endpoint reference for knowledge bases, ingestion jobs, sources, and search.