Quickstart

Upload a file, parse it, extract against a schema, and poll async jobs.

What is Docspeed?

Docspeed is a document API for turning PDFs and images into structured JSON with source-linked evidence. It is built for AP invoices, tax invoices, tables, review workflows, and multi-document question answering where the output must point back to the page.

Unlike a plain OCR API, Docspeed keeps OCR, layout, extraction, grounding, and async job handling in one contract. The response is designed for applications that need both machine-readable fields and reviewer-visible citations.

Key Features & Use Cases

Grounded extraction: field and table values can include region_ids for page evidence.
AP invoice automation: extract invoice fields, GST/tax fields, and repeated line items.
Schema builder: generate a reusable extraction schema from instructions or sample documents.
Table review: preserve cell-level structure and page provenance.
Async workloads: queue longer jobs and poll for completion with stable job IDs.
Multi-document QA: ask questions across a document set with cited answers.

Quick Example

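The first integration path has four stages: upload a document, parse it, create an extract job against a schema, and fetch the async result. Fetching takes two calls (poll for status, then get the result), so the walkthrough below has five numbered steps.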

The Basic Workflow

Upload: send a PDF or image and receive a file_id.
Parse: inspect OCR-backed page structure and markdown.
Extract: submit a schema for grounded JSON output.
Fetch: poll the job and retrieve the final result.

Document -> file_id -> parsed structure -> job_id -> grounded extraction result

Step 1: Upload

Purpose: stores the source file and returns the identifier used by the rest of the API.

Request:

curl -sS -X POST "https://api.docspeed.ai/api/v1/upload" \
  -H "Authorization: Bearer ${DOCSPEED_API_KEY}" \
  -F "file=@sample-invoice.pdf"

Response:

{
  "file_id": "file_123",
  "files": [
    {
      "file_id": "file_123",
      "filename": "sample-invoice.pdf"
    }
  ]
}

Step 2: Parse

Purpose: runs OCR/layout normalization and returns page-level structure that can be displayed or inspected before extraction.

Request:

curl -sS -X POST "https://api.docspeed.ai/api/v1/parse" \
  -H "Authorization: Bearer ${DOCSPEED_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {"file_id": "file_123"},
    "execution_mode": "sync",
    "grounding": "ocr_lines",
    "include_artifacts": true
  }'

Response fragment:

{
  "markdown": "# Invoice\n\nTotal Due: 1240.00",
  "pages": [
    {
      "page_index": 0,
      "markdown": "# Invoice",
      "ocr_lines": []
    }
  ]
}
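
Before creating an extract job, it can help to confirm that parsing actually produced text. The sketch below assumes the parse response has the shape of the fragment above (pages, page_index, markdown, ocr_lines). The helper name pages_without_text is hypothetical and is not part of the Docspeed API.

```python
from typing import Any


def pages_without_text(parse_response: dict[str, Any]) -> list[int]:
    """Return page_index values for pages with no markdown and no OCR lines.

    Assumes the response shape shown in the fragment above; other fields are ignored.
    """
    empty_pages = []
    for page in parse_response.get("pages", []):
        has_markdown = bool((page.get("markdown") or "").strip())
        has_ocr_lines = bool(page.get("ocr_lines"))
        if not has_markdown and not has_ocr_lines:
            empty_pages.append(page.get("page_index"))
    return empty_pages


# Sample data: page 0 mirrors the fragment above, page 1 is a made-up empty page.
sample = {
    "markdown": "# Invoice\n\nTotal Due: 1240.00",
    "pages": [
        {"page_index": 0, "markdown": "# Invoice", "ocr_lines": []},
        {"page_index": 1, "markdown": "", "ocr_lines": []},
    ],
}
print(pages_without_text(sample))
```

A non-empty list suggests the scan quality or file type is worth checking before spending an extract call on the document.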

Step 3: Create an Extract Job

Purpose: applies your schema to the document. With execution_mode set to async, the call returns a job ID right away and the grounded extraction runs in the background.

Request:

curl -sS -X POST "https://api.docspeed.ai/v1/extract" \
  -H "Authorization: Bearer ${DOCSPEED_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {"file_id": "file_123"},
    "execution_mode": "async",
    "grounding": "cell",
    "schema": {
      "doc_class": "invoice",
      "invoice_level_fields": {
        "invoice_number": "string: invoice identifier",
        "supplier_name": "string: supplier name",
        "invoice_total": "number: total amount due"
      },
      "line_item_structures": {
        "line_items": {
          "description": "Invoice line items",
          "target_fields": [
            "description (string): line item description",
            "amount (number): line item amount"
          ]
        }
      }
    }
  }'

Response:

{
  "job_id": "job_123",
  "operation": "extract",
  "status": "queued",
  "execution_mode": "async"
}
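
A schema like the one in the request can also be assembled in code rather than written by hand. This is a minimal sketch that mirrors the string conventions seen in the example request ("type: description" for invoice-level fields and "name (type): description" for line-item target fields). Those conventions are inferred from the example only, and build_invoice_schema is a hypothetical helper, not a Docspeed function.

```python
def build_invoice_schema(invoice_fields, line_item_fields,
                         line_items_description="Invoice line items"):
    """Build a schema dict shaped like the example extract request.

    invoice_fields and line_item_fields map a field name to a
    (type, description) tuple.
    """
    return {
        "doc_class": "invoice",
        "invoice_level_fields": {
            name: f"{type_}: {desc}"
            for name, (type_, desc) in invoice_fields.items()
        },
        "line_item_structures": {
            "line_items": {
                "description": line_items_description,
                "target_fields": [
                    f"{name} ({type_}): {desc}"
                    for name, (type_, desc) in line_item_fields.items()
                ],
            }
        },
    }


# Rebuilds the schema used in the request above.
schema = build_invoice_schema(
    invoice_fields={
        "invoice_number": ("string", "invoice identifier"),
        "supplier_name": ("string", "supplier name"),
        "invoice_total": ("number", "total amount due"),
    },
    line_item_fields={
        "description": ("string", "line item description"),
        "amount": ("number", "line item amount"),
    },
)
```

Keeping field definitions as data makes it easier to tighten a description later, which is one of the suggested fixes when extraction comes back empty.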

Step 4: Poll for Status

Purpose: checks whether the async job is queued, processing, completed, or failed.

Request:

curl -sS \
  -H "Authorization: Bearer ${DOCSPEED_API_KEY}" \
  "https://api.docspeed.ai/v1/jobs/job_123"

Response:

{
  "job_id": "job_123",
  "operation": "extract",
  "status": "processing",
  "execution_mode": "async",
  "created_at": 1776054000.0,
  "updated_at": 1776054003.0
}

Step 5: Get Results

Purpose: retrieves the grounded extraction result after the job completes.

Request:

curl -sS \
  -H "Authorization: Bearer ${DOCSPEED_API_KEY}" \
  "https://api.docspeed.ai/v1/jobs/job_123/result"

Response fragment:

{
  "invoice_level_fields": {
    "invoice_number": {
      "value": "GST-2026-0042",
      "region_ids": ["p1:l11"]
    }
  },
  "line_item_structures": {
    "line_items": []
  },
  "grounding_regions": []
}
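
In a review workflow it is often useful to separate fields that cite page evidence from fields that do not. The sketch below assumes the result has the shape of the fragment above, where each invoice-level field is an object with value and region_ids. The helper name split_by_evidence and the second sample field are illustrative assumptions.

```python
from typing import Any


def split_by_evidence(result: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Separate invoice-level fields that cite regions from those that do not.

    Returns (grounded, ungrounded): grounded maps a field name to its value
    and region_ids; ungrounded lists field names with no region_ids.
    """
    grounded: dict[str, Any] = {}
    ungrounded: list[str] = []
    for name, field in result.get("invoice_level_fields", {}).items():
        field = field or {}
        region_ids = field.get("region_ids") or []
        if region_ids:
            grounded[name] = {"value": field.get("value"), "region_ids": region_ids}
        else:
            ungrounded.append(name)
    return grounded, ungrounded


# invoice_number mirrors the fragment above; supplier_name is a made-up
# field without evidence, included only to show the second branch.
sample_result = {
    "invoice_level_fields": {
        "invoice_number": {"value": "GST-2026-0042", "region_ids": ["p1:l11"]},
        "supplier_name": {"value": "Example Supplier", "region_ids": []},
    },
    "line_item_structures": {"line_items": []},
    "grounding_regions": [],
}
grounded, ungrounded = split_by_evidence(sample_result)
print(grounded)
print(ungrounded)
```

Fields in the ungrounded list are natural candidates for manual review, since a reviewer has no region to jump to.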

cURL

export DOCSPEED_API_KEY="YOUR_API_KEY"
export DOCSPEED_BASE_URL="https://api.docspeed.ai"

FILE_ID=$(
  curl -sS -X POST "${DOCSPEED_BASE_URL}/api/v1/upload" \
    -H "Authorization: Bearer ${DOCSPEED_API_KEY}" \
    -F "file=@sample-invoice.pdf" | jq -r '.file_id'
)

curl -sS -X POST "${DOCSPEED_BASE_URL}/api/v1/parse" \
  -H "Authorization: Bearer ${DOCSPEED_API_KEY}" \
  -H "Content-Type: application/json" \
  -d "{
    \"input\": {\"file_id\": \"${FILE_ID}\"},
    \"execution_mode\": \"sync\",
    \"grounding\": \"ocr_lines\",
    \"include_artifacts\": true
  }"

JOB_ID=$(
  curl -sS -X POST "${DOCSPEED_BASE_URL}/v1/extract" \
    -H "Authorization: Bearer ${DOCSPEED_API_KEY}" \
    -H "Content-Type: application/json" \
    -d '{
      "input": {"file_id": "'"${FILE_ID}"'"},
      "execution_mode": "async",
      "grounding": "cell",
      "schema": {
        "doc_class": "invoice",
        "invoice_level_fields": {
          "invoice_number": "string: invoice identifier",
          "supplier_name": "string: supplier name",
          "invoice_total": "number: total amount due"
        },
        "line_item_structures": {
          "line_items": {
            "description": "Invoice line items",
            "target_fields": [
              "description (string): line item description",
              "amount (number): line item amount"
            ]
          }
        }
      }
    }' | jq -r '.job_id'
)

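# Poll until the job reaches a terminal status before reading the result.
while :; do
  STATUS=$(
    curl -sS -H "Authorization: Bearer ${DOCSPEED_API_KEY}" \
      "${DOCSPEED_BASE_URL}/v1/jobs/${JOB_ID}" | jq -r '.status'
  )
  case "${STATUS}" in
    completed|failed) break ;;
  esac
  sleep 2
done
echo "Job ${JOB_ID} finished with status: ${STATUS}"

# Only a completed job has a result to fetch.
if [ "${STATUS}" = "completed" ]; then
  curl -sS -H "Authorization: Bearer ${DOCSPEED_API_KEY}" \
    "${DOCSPEED_BASE_URL}/v1/jobs/${JOB_ID}/result"
fi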

Python

import time

import requests

BASE_URL = "https://api.docspeed.ai"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

# Step 1: upload the source file and keep its file_id.
with open("sample-invoice.pdf", "rb") as handle:
    upload = requests.post(
        f"{BASE_URL}/api/v1/upload",
        headers=headers,
        files={"file": ("sample-invoice.pdf", handle, "application/pdf")},
        timeout=120,
    )
upload.raise_for_status()
file_id = upload.json()["file_id"]

# Step 2: parse synchronously to inspect OCR-backed page structure.
parse = requests.post(
    f"{BASE_URL}/api/v1/parse",
    headers={**headers, "Content-Type": "application/json"},
    json={
        "input": {"file_id": file_id},
        "execution_mode": "sync",
        "grounding": "ocr_lines",
        "include_artifacts": True,
    },
    timeout=300,
)
parse.raise_for_status()

# Step 3: create an async extract job against the schema.
extract = requests.post(
    f"{BASE_URL}/v1/extract",
    headers={**headers, "Content-Type": "application/json"},
    json={
        "input": {"file_id": file_id},
        "execution_mode": "async",
        "grounding": "cell",
        "schema": {
            "doc_class": "invoice",
            "invoice_level_fields": {
                "invoice_number": "string: invoice identifier",
                "supplier_name": "string: supplier name",
                "invoice_total": "number: total amount due",
            },
            "line_item_structures": {
                "line_items": {
                    "description": "Invoice line items",
                    "target_fields": [
                        "description (string): line item description",
                        "amount (number): line item amount",
                    ],
                }
            },
        },
    },
    timeout=300,
)
extract.raise_for_status()
job_id = extract.json()["job_id"]

# Step 4: poll until the job reaches a terminal status.
while True:
    status = requests.get(f"{BASE_URL}/v1/jobs/{job_id}", headers=headers, timeout=60)
    status.raise_for_status()
    state = status.json()["status"]
    if state in ("completed", "failed"):
        break
    time.sleep(2)  # fixed delay for brevity; prefer backoff for long jobs
if state == "failed":
    raise RuntimeError(f"extract job {job_id} failed")

# Step 5: fetch the grounded extraction result.
result = requests.get(f"{BASE_URL}/v1/jobs/{job_id}/result", headers=headers, timeout=60)
result.raise_for_status()
print(result.json())

TypeScript

const baseUrl = "https://api.docspeed.ai";
const headers = { Authorization: "Bearer YOUR_API_KEY" };

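// Bun.file is specific to the Bun runtime; on Node.js, read the file with
// node:fs/promises and wrap the bytes in a Blob instead.
const formData = new FormData();
formData.append("file", new File([await Bun.file("sample-invoice.pdf").arrayBuffer()], "sample-invoice.pdf"));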

const upload = await fetch(`${baseUrl}/api/v1/upload`, {
  method: "POST",
  headers,
  body: formData,
});
const uploadJson = await upload.json();
const fileId = uploadJson.file_id as string;

await fetch(`${baseUrl}/api/v1/parse`, {
  method: "POST",
  headers: { ...headers, "Content-Type": "application/json" },
  body: JSON.stringify({
    input: { file_id: fileId },
    execution_mode: "sync",
    grounding: "ocr_lines",
    include_artifacts: true,
  }),
});

const extract = await fetch(`${baseUrl}/v1/extract`, {
  method: "POST",
  headers: { ...headers, "Content-Type": "application/json" },
  body: JSON.stringify({
    input: { file_id: fileId },
    execution_mode: "async",
    grounding: "cell",
    schema: {
      doc_class: "invoice",
      invoice_level_fields: {
        invoice_number: "string: invoice identifier",
        supplier_name: "string: supplier name",
        invoice_total: "number: total amount due",
      },
      line_item_structures: {
        line_items: {
          description: "Invoice line items",
          target_fields: [
            "description (string): line item description",
            "amount (number): line item amount",
          ],
        },
      },
    },
  }),
});

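const { job_id } = await extract.json();

// Poll until the job reaches a terminal status before reading the result.
let job = await (await fetch(`${baseUrl}/v1/jobs/${job_id}`, { headers })).json();
while (job.status !== "completed" && job.status !== "failed") {
  // Fixed delay for brevity; prefer backoff for long jobs.
  await new Promise((resolve) => setTimeout(resolve, 2000));
  job = await (await fetch(`${baseUrl}/v1/jobs/${job_id}`, { headers })).json();
}
console.log(job);

if (job.status === "failed") {
  throw new Error(`extract job ${job_id} failed`);
}

const result = await fetch(`${baseUrl}/v1/jobs/${job_id}/result`, { headers });
console.log(await result.json());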

Troubleshooting

Extraction returns an empty result:

Confirm the file_id sent to the extract endpoint is the one returned by the upload call.
Inspect the parse response to verify OCR text and page structure are present.
Make field descriptions more explicit in the schema.
Use grounding: "cell" when line items or table cells need review evidence.

Job remains queued or processing:

Poll GET /v1/jobs/{job_id} with backoff instead of tight loops.
Check for a failed status before reading the result endpoint.
Use sync mode only when the workload fits an interactive request budget.
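
The polling advice above can be sketched as a small helper that doubles the delay between checks and stops on a terminal status. The status names completed and failed come from the job status description in this guide. The helper name poll_job and its default delays and timeout are assumptions chosen for illustration, not documented Docspeed recommendations.

```python
import time
from typing import Any, Callable

TERMINAL_STATUSES = {"completed", "failed"}


def poll_job(
    fetch_status: Callable[[], dict[str, Any]],
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call fetch_status until the job is completed or failed.

    The delay doubles after each check, capped at max_delay. The timeout
    counts time spent sleeping only, not time spent in fetch_status.
    """
    delay = initial_delay
    waited = 0.0
    while True:
        job = fetch_status()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if waited >= timeout:
            raise TimeoutError(f"job still {job.get('status')!r} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)


# Simulated status sequence so the sketch runs without network access.
responses = iter([{"status": "queued"}, {"status": "processing"}, {"status": "completed"}])
delays: list[float] = []
final = poll_job(lambda: next(responses), sleep=delays.append)
print(final["status"], delays)
```

In a real integration, fetch_status would wrap the GET /v1/jobs/{job_id} call and return its parsed JSON. Check the returned status for failed before calling the result endpoint.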

Next Steps

Authentication & API Keys: configure bearer auth.
Schema Builder: generate an extraction schema.
Extract: understand grounded extraction response shapes.
Grounding: render evidence regions in review UIs.
API Reference: inspect every public endpoint.

Rate Limits

Default limits depend on account configuration and workload shape. For production traffic, contact support with expected document volume, page count, and sync versus async mix.