Quickstart
Upload a file, parse it, extract against a schema, and poll async jobs.
What is Docspeed?
Docspeed is a document API for turning PDFs and images into structured JSON with source-linked evidence. It is built for AP invoices, tax invoices, tables, review workflows, and multi-document question answering where the output must point back to the page.
Unlike a plain OCR API, Docspeed keeps OCR, layout, extraction, grounding, and async job handling in one contract. The response is designed for applications that need both machine-readable fields and reviewer-visible citations.
Key Features & Use Cases
- Grounded extraction: field and table values can include
region_idsfor page evidence. - AP invoice automation: extract invoice fields, GST/tax fields, and repeated line items.
- Schema builder: generate a reusable extraction schema from instructions or sample documents.
- Table review: preserve cell-level structure and page provenance.
- Async workloads: queue longer jobs and poll for completion with stable job IDs.
- Multi-document QA: ask questions across a document set with cited answers.
Quick Example
The first integration path has four steps: upload a document, parse it, extract against a schema, and fetch the async result.
The Basic Workflow
- Upload: send a PDF or image and receive a
file_id. - Parse: inspect OCR-backed page structure and markdown.
- Extract: submit a schema for grounded JSON output.
- Fetch: poll the job and retrieve the final result.
Document -> file_id -> parsed structure -> job_id -> grounded extraction result
Step 1: Upload
Purpose: stores the source file and returns the identifier used by the rest of the API.
Request:
curl -sS -X POST "https://api.docspeed.ai/api/v1/upload" \
-H "Authorization: Bearer ${DOCSPEED_API_KEY}" \
-F "file=@sample-invoice.pdf"
Response:
{
"file_id": "file_123",
"files": [
{
"file_id": "file_123",
"filename": "sample-invoice.pdf"
}
]
}
Step 2: Parse
Purpose: runs OCR/layout normalization and returns page-level structure that can be displayed or inspected before extraction.
Request:
curl -sS -X POST "https://api.docspeed.ai/api/v1/parse" \
-H "Authorization: Bearer ${DOCSPEED_API_KEY}" \
-H "Content-Type: application/json" \
-d '{
"input": {"file_id": "file_123"},
"execution_mode": "sync",
"grounding": "ocr_lines",
"include_artifacts": true
}'
Response fragment:
{
"markdown": "# Invoice\n\nTotal Due: 1240.00",
"pages": [
{
"page_index": 0,
"markdown": "# Invoice",
"ocr_lines": []
}
]
}
Step 3: Create an Extract Job
Purpose: applies your schema to the document and returns a job ID for longer grounded extraction work.
Request:
curl -sS -X POST "https://api.docspeed.ai/v1/extract" \
-H "Authorization: Bearer ${DOCSPEED_API_KEY}" \
-H "Content-Type: application/json" \
-d '{
"input": {"file_id": "file_123"},
"execution_mode": "async",
"grounding": "cell",
"schema": {
"doc_class": "invoice",
"invoice_level_fields": {
"invoice_number": "string: invoice identifier",
"supplier_name": "string: supplier name",
"invoice_total": "number: total amount due"
},
"line_item_structures": {
"line_items": {
"description": "Invoice line items",
"target_fields": [
"description (string): line item description",
"amount (number): line item amount"
]
}
}
}
}'
Response:
{
"job_id": "job_123",
"operation": "extract",
"status": "queued",
"execution_mode": "async"
}
Step 4: Poll for Status
Purpose: checks whether the async job is queued, processing, completed, or failed.
Request:
curl -sS \
-H "Authorization: Bearer ${DOCSPEED_API_KEY}" \
"https://api.docspeed.ai/v1/jobs/job_123"
Response:
{
"job_id": "job_123",
"operation": "extract",
"status": "processing",
"execution_mode": "async",
"created_at": 1776054000.0,
"updated_at": 1776054003.0
}
Step 5: Get Results
Purpose: retrieves the grounded extraction result after the job completes.
Request:
curl -sS \
-H "Authorization: Bearer ${DOCSPEED_API_KEY}" \
"https://api.docspeed.ai/v1/jobs/job_123/result"
Response fragment:
{
"invoice_level_fields": {
"invoice_number": {
"value": "GST-2026-0042",
"region_ids": ["p1:l11"]
}
},
"line_item_structures": {
"line_items": []
},
"grounding_regions": []
}
cURL
export DOCSPEED_API_KEY="YOUR_API_KEY"
export DOCSPEED_BASE_URL="https://api.docspeed.ai"
FILE_ID=$(
curl -sS -X POST "${DOCSPEED_BASE_URL}/api/v1/upload" \
-H "Authorization: Bearer ${DOCSPEED_API_KEY}" \
-F "file=@sample-invoice.pdf" | jq -r '.file_id'
)
curl -sS -X POST "${DOCSPEED_BASE_URL}/api/v1/parse" \
-H "Authorization: Bearer ${DOCSPEED_API_KEY}" \
-H "Content-Type: application/json" \
-d "{
\"input\": {\"file_id\": \"${FILE_ID}\"},
\"execution_mode\": \"sync\",
\"grounding\": \"ocr_lines\",
\"include_artifacts\": true
}"
JOB_ID=$(
curl -sS -X POST "${DOCSPEED_BASE_URL}/v1/extract" \
-H "Authorization: Bearer ${DOCSPEED_API_KEY}" \
-H "Content-Type: application/json" \
-d '{
"input": {"file_id": "'"${FILE_ID}"'"},
"execution_mode": "async",
"grounding": "cell",
"schema": {
"doc_class": "invoice",
"invoice_level_fields": {
"invoice_number": "string: invoice identifier",
"supplier_name": "string: supplier name",
"invoice_total": "number: total amount due"
},
"line_item_structures": {
"line_items": {
"description": "Invoice line items",
"target_fields": [
"description (string): line item description",
"amount (number): line item amount"
]
}
}
}
}' | jq -r '.job_id'
)
curl -sS -H "Authorization: Bearer ${DOCSPEED_API_KEY}" \
"${DOCSPEED_BASE_URL}/v1/jobs/${JOB_ID}"
curl -sS -H "Authorization: Bearer ${DOCSPEED_API_KEY}" \
"${DOCSPEED_BASE_URL}/v1/jobs/${JOB_ID}/result"
Python
import requests
BASE_URL = "https://api.docspeed.ai"
headers = {"Authorization": "Bearer YOUR_API_KEY"}
with open("sample-invoice.pdf", "rb") as handle:
upload = requests.post(
f"{BASE_URL}/api/v1/upload",
headers=headers,
files={"file": ("sample-invoice.pdf", handle, "application/pdf")},
timeout=120,
)
upload.raise_for_status()
file_id = upload.json()["file_id"]
parse = requests.post(
f"{BASE_URL}/api/v1/parse",
headers={**headers, "Content-Type": "application/json"},
json={
"input": {"file_id": file_id},
"execution_mode": "sync",
"grounding": "ocr_lines",
"include_artifacts": True,
},
timeout=300,
)
parse.raise_for_status()
extract = requests.post(
f"{BASE_URL}/v1/extract",
headers={**headers, "Content-Type": "application/json"},
json={
"input": {"file_id": file_id},
"execution_mode": "async",
"grounding": "cell",
"schema": {
"doc_class": "invoice",
"invoice_level_fields": {
"invoice_number": "string: invoice identifier",
"supplier_name": "string: supplier name",
"invoice_total": "number: total amount due",
},
"line_item_structures": {
"line_items": {
"description": "Invoice line items",
"target_fields": [
"description (string): line item description",
"amount (number): line item amount",
],
}
},
},
},
timeout=300,
)
extract.raise_for_status()
job_id = extract.json()["job_id"]
status = requests.get(f"{BASE_URL}/v1/jobs/{job_id}", headers=headers, timeout=60)
status.raise_for_status()
result = requests.get(f"{BASE_URL}/v1/jobs/{job_id}/result", headers=headers, timeout=60)
result.raise_for_status()
print(result.json())
TypeScript
const baseUrl = "https://api.docspeed.ai";
const headers = { Authorization: "Bearer YOUR_API_KEY" };
const formData = new FormData();
formData.append("file", new File([await Bun.file("sample-invoice.pdf").arrayBuffer()], "sample-invoice.pdf"));
const upload = await fetch(`${baseUrl}/api/v1/upload`, {
method: "POST",
headers,
body: formData,
});
const uploadJson = await upload.json();
const fileId = uploadJson.file_id as string;
await fetch(`${baseUrl}/api/v1/parse`, {
method: "POST",
headers: { ...headers, "Content-Type": "application/json" },
body: JSON.stringify({
input: { file_id: fileId },
execution_mode: "sync",
grounding: "ocr_lines",
include_artifacts: true,
}),
});
const extract = await fetch(`${baseUrl}/v1/extract`, {
method: "POST",
headers: { ...headers, "Content-Type": "application/json" },
body: JSON.stringify({
input: { file_id: fileId },
execution_mode: "async",
grounding: "cell",
schema: {
doc_class: "invoice",
invoice_level_fields: {
invoice_number: "string: invoice identifier",
supplier_name: "string: supplier name",
invoice_total: "number: total amount due",
},
line_item_structures: {
line_items: {
description: "Invoice line items",
target_fields: [
"description (string): line item description",
"amount (number): line item amount",
],
},
},
},
}),
});
const { job_id } = await extract.json();
const status = await fetch(`${baseUrl}/v1/jobs/${job_id}`, { headers });
const result = await fetch(`${baseUrl}/v1/jobs/${job_id}/result`, { headers });
console.log(await status.json());
console.log(await result.json());
Troubleshooting
Run returns an empty extraction result:
- Confirm the uploaded
file_idis the same file used for extraction. - Inspect the parse response to verify OCR text and page structure are present.
- Make field descriptions more explicit in the schema.
- Use
grounding: "cell"when line items or table cells need review evidence.
Job remains queued or processing:
- Poll
GET /v1/jobs/{job_id}with backoff instead of tight loops. - Check for a failed status before reading the result endpoint.
- Use sync mode only when the workload fits an interactive request budget.
Next Steps
- Authentication & API Keys: configure bearer auth.
- Schema Builder: generate an extraction schema.
- Extract: understand grounded extraction response shapes.
- Grounding: render evidence regions in review UIs.
- API Reference: inspect every public endpoint.
Rate Limits
Default limits depend on account configuration and workload shape. For production traffic, contact support with expected document volume, page count, and sync versus async mix.