ExtractAPI is a document extraction API. You POST a document (PDF, image, DOCX, or TXT) plus a JSON schema defining the fields you want, and get back validated structured JSON. It needs no training for new document types: invoices, contracts, receipts, forms, medical records, and more.

How does schema-enforced extraction work?

You define a JSON schema with the fields you want extracted (e.g. invoice_number, total, line_items). ExtractAPI uses Claude AI to read the document and return structured JSON matching your schema exactly. No training and no templates are needed, so a document type you have never sent before works on the first request.
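
The schema shorthand on this page maps field names to type names ("string", "number"), with a one-element list standing for an array of objects. A client may still want to double-check a response against that shape. The sketch below shows one way to do so; `conforms` is a hypothetical helper, not part of ExtractAPI, and it assumes only the two type names that appear on this page. It does not handle null values for fields missing from a document.

```python
# Hypothetical client-side check; not part of ExtractAPI itself.
# Assumes the shorthand schema shown on this page: "string" / "number"
# leaves, dicts for nested objects, and a one-element list for arrays.

def conforms(value, schema):
    """Return True if `value` has the shape described by `schema`."""
    if schema == "string":
        return isinstance(value, str)
    if schema == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if isinstance(schema, list):
        # A one-element list means "array of items shaped like schema[0]".
        return isinstance(value, list) and all(conforms(v, schema[0]) for v in value)
    if isinstance(schema, dict):
        # Exactly the requested keys, each with a conforming value.
        return (
            isinstance(value, dict)
            and value.keys() == schema.keys()
            and all(conforms(value[k], s) for k, s in schema.items())
        )
    return False  # unknown type name in the schema


schema = {
    "invoice_number": "string",
    "total": "number",
    "line_items": [{"description": "string", "amount": "number"}],
}
good = {
    "invoice_number": "INV-2026-0142",
    "total": 1250.0,
    "line_items": [{"description": "Consulting services", "amount": 1000.0}],
}
bad = {"invoice_number": "INV-2026-0142", "total": "1,250.00", "line_items": []}

print(conforms(good, schema))  # True
print(conforms(bad, schema))   # False
```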

What file types does ExtractAPI support?

ExtractAPI supports PDF, PNG, JPEG, GIF, WEBP, DOCX, and TXT files. Multi-page PDFs are supported and billed per page.
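
A client can reject unsupported files before spending a request on them. The sketch below checks a filename against the formats listed above; the extension set is an assumption derived from that list (for example, treating `.jpg` as JPEG), and `is_supported` is a hypothetical helper.

```python
from pathlib import Path

# Extensions derived from the formats listed above. Treating ".jpg" as JPEG
# is an assumption; the page names the formats, not the extensions.
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpeg", ".jpg", ".gif", ".webp", ".docx", ".txt"}


def is_supported(filename: str) -> bool:
    """Case-insensitive check of the file extension."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported("invoice.PDF"))  # True
print(is_supported("report.xlsx"))  # False
```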

Is there a free tier?

Yes. The free tier includes 50 pages per month, 10MB max file size, and up to 20 schema fields. No credit card required — just sign up with your email to get an API key.
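
The free-tier limits can be checked locally before uploading. This is a rough sketch under two assumptions that the page does not confirm: that 1 MB means 1024 * 1024 bytes, and that nested fields are counted by their leaf entries. `count_fields` and `fits_free_tier` are hypothetical helpers.

```python
FREE_TIER_MAX_BYTES = 10 * 1024 * 1024  # assumes 1 MB = 1024 * 1024 bytes
FREE_TIER_MAX_FIELDS = 20


def count_fields(schema) -> int:
    """Count leaf fields. How ExtractAPI counts nested fields is an assumption."""
    if isinstance(schema, dict):
        return sum(count_fields(v) for v in schema.values())
    if isinstance(schema, list):
        return sum(count_fields(v) for v in schema)
    return 1


def fits_free_tier(size_bytes: int, schema: dict) -> bool:
    """True if both the file size and the field count are within the limits."""
    return size_bytes <= FREE_TIER_MAX_BYTES and count_fields(schema) <= FREE_TIER_MAX_FIELDS


schema = {
    "invoice_number": "string",
    "vendor_name": "string",
    "total_amount": "number",
    "due_date": "string",
    "line_items": [{"description": "string", "amount": "number"}],
}

print(count_fields(schema))                      # 6
print(fits_free_tier(2 * 1024 * 1024, schema))   # True
print(fits_free_tier(12 * 1024 * 1024, schema))  # False
```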

Can AI agents use ExtractAPI?

Yes. ExtractAPI is designed for AI agents and automation. It has a simple REST API with Bearer token auth, returns clean JSON, and includes an OpenAPI spec at /openapi.json for automatic tool integration.
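
An agent framework typically turns an OpenAPI document into a list of callable tools. The sketch below shows that step on a small stand-in spec; `sample_spec` is illustrative only (the actual contents of /openapi.json are not reproduced on this page), and `list_operations` is a hypothetical helper.

```python
# Sketch: turn an OpenAPI document into a flat list of callable operations.
# The sample spec is illustrative; only the two paths come from this page.

HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def list_operations(spec: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs for every operation in the spec."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for method in item:
            if method.lower() in HTTP_METHODS:
                operations.append((method.upper(), path))
    return sorted(operations)


sample_spec = {
    "openapi": "3.0.0",
    "paths": {
        "/v1/extract": {"post": {"summary": "Extract fields from a document"}},
        "/v1/usage": {"get": {"summary": "Usage and remaining quota"}},
    },
}

print(list_operations(sample_spec))
# [('GET', '/v1/usage'), ('POST', '/v1/extract')]
```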

ExtractAPI — Schema-Enforced Document Extraction API

Request

curl -X POST https://extractapi.net/v1/extract \
  -H "Authorization: Bearer ex_your_api_key" \
  -F "file=@invoice.pdf" \
  -F 'schema={
    "invoice_number": "string",
    "vendor_name": "string",
    "total_amount": "number",
    "due_date": "string",
    "line_items": [{"description": "string", "amount": "number"}]
  }'
        

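import requests

# Shorthand schema: field names mapped to the type each value should have.
schema = '''{
    "invoice_number": "string",
    "vendor_name": "string",
    "total_amount": "number",
    "due_date": "string",
    "line_items": [{"description": "string", "amount": "number"}]
}'''

# Open the file in a context manager so the handle is closed after the upload.
with open("invoice.pdf", "rb") as f:
    response = requests.post(
        "https://extractapi.net/v1/extract",
        headers={"Authorization": "Bearer ex_your_api_key"},
        files={"file": f},
        data={"schema": schema},
    )

response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
data = response.json()["data"]
print(data["total_amount"])  # 1250.0 for the sample response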

import { readFile } from 'node:fs/promises';

// Node 18+: the built-in FormData takes a Blob, not a read stream.
const form = new FormData();
form.append('file', new Blob([await readFile('invoice.pdf')]), 'invoice.pdf');
form.append('schema', JSON.stringify({
    invoice_number: "string",
    vendor_name: "string",
    total_amount: "number",
    due_date: "string",
    line_items: [{description: "string", amount: "number"}]
}));

const res = await fetch('https://extractapi.net/v1/extract', {
    method: 'POST',
    headers: { Authorization: 'Bearer ex_your_api_key' },
    body: form
});

const { data } = await res.json();
console.log(data.total_amount); // 1250

↓

Response — validated JSON

{
  "request_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "data": {
    "invoice_number": "INV-2026-0142",
    "vendor_name": "Acme Corporation",
    "total_amount": 1250.00,
    "due_date": "2026-04-15",
    "line_items": [
      { "description": "Consulting services", "amount": 1000.00 },
      { "description": "Travel expenses", "amount": 250.00 }
    ]
  },
  "meta": {
    "pages_processed": 1,
    "latency_ms": 2340,
    "pages_remaining": 49
  }
}
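
Once the response arrives, a client can run its own sanity checks on the extracted values. The sketch below uses the sample response above (trimmed to the fields it needs) and checks that the line items add up to the total; `line_items_match_total` is a hypothetical helper, and real invoices with tax or discounts may legitimately fail this check.

```python
import math

# The sample response from this page, trimmed to the fields used here.
response = {
    "data": {
        "total_amount": 1250.00,
        "line_items": [
            {"description": "Consulting services", "amount": 1000.00},
            {"description": "Travel expenses", "amount": 250.00},
        ],
    },
    "meta": {"pages_processed": 1, "latency_ms": 2340, "pages_remaining": 49},
}


def line_items_match_total(data: dict) -> bool:
    """Compare the sum of line item amounts with the total, to half a cent."""
    items_sum = sum(item["amount"] for item in data["line_items"])
    return math.isclose(items_sum, data["total_amount"], abs_tol=0.005)


print(line_items_match_total(response["data"]))  # True
print(response["meta"]["pages_remaining"])       # 49
```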
        

Built for developers and AI agents

One endpoint. Any document. Structured output.

📄

Any document type

PDF, images, DOCX, TXT. Invoices, contracts, receipts, medical forms, tax documents — no pre-training required.

🧩

Schema-enforced output

Define your JSON schema, get exactly those fields back. Nested objects, arrays, typed values — all validated.

⚡

Zero-shot extraction

No templates, no training data, no config files. Send a document you've never seen before and get structured data back.

🤖

Agent-native

REST API with Bearer auth. OpenAPI spec at /openapi.json. Works with LangChain, CrewAI, AutoGen, or any HTTP client.

📦

Batch processing

Upload up to 20 files with one schema in a single request. Useful for processing folders of invoices, statements, or forms.
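
A folder with more files than the batch limit has to be split before upload. The sketch below covers only the client-side grouping; the batch endpoint and its request format are not described on this page, so none are assumed. `batches` is a hypothetical helper.

```python
from typing import Iterator, Sequence

BATCH_LIMIT = 20  # files per request, per the batch limit stated on this page


def batches(paths: Sequence[str], size: int = BATCH_LIMIT) -> Iterator[list[str]]:
    """Yield consecutive groups of at most `size` paths."""
    for start in range(0, len(paths), size):
        yield list(paths[start:start + size])


invoices = [f"invoice_{i:03}.pdf" for i in range(45)]
print([len(group) for group in batches(invoices)])  # [20, 20, 5]
```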

📊

Usage metering

Track pages processed, latency, and remaining quota via the /v1/usage endpoint. Build billing into your own product.
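
A usage check might look like the sketch below. It assumes that /v1/usage is a GET endpoint on the same host as /v1/extract and accepts the same Bearer token; the page does not confirm either detail, and it does not document the response fields, so the code only prepares the request. `build_usage_request` is a hypothetical helper.

```python
import urllib.request

BASE_URL = "https://extractapi.net"  # host taken from the request examples on this page


def build_usage_request(api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for /v1/usage."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/usage",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",  # assumption: the endpoint is read-only
    )


req = build_usage_request("ex_your_api_key")
print(req.get_method(), req.full_url)  # GET https://extractapi.net/v1/usage

# To send it (response fields are not documented here, so none are assumed):
# import json
# with urllib.request.urlopen(req, timeout=10) as resp:
#     usage = json.load(resp)
```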

What developers extract

📑 Invoices & receipts

Vendor, amount, line items, dates, tax — from any format or layout.

📝 Contracts & legal docs

Parties, clauses, obligations, effective dates, termination terms.

🏦 Bank statements

Transactions, balances, account numbers, date ranges.

🏥 Medical records

Patient info, diagnoses, medications, lab values, dates.

📋 Government forms

W-2s, 1099s, tax returns, permits, applications — any form.

🏠 Real estate docs

Leases, appraisals, inspection reports, title documents.

🚢 Shipping & logistics

Bills of lading, packing slips, customs declarations.

🔧 Your use case

If it's a document with data, ExtractAPI can extract it. No limits on document types.

Simple, usage-based pricing

Private onboarding and controlled rollout.

Pilot

By approval

invite-only

Manual onboarding
Dedicated API key provisioning
Usage guardrails enabled
Abuse protection + billing controls
Production rollout after QA

Request Access

Starter

$49/mo

~$0.025/page

2,000 pages/month
50MB max file size
100 schema fields
Batch extraction (20 files)
Email support

Start with Starter

Pro

$199/mo

~$0.02/page

10,000 pages/month
100MB max file size
500 schema fields
Batch extraction (20 files)
Priority support

Start with Pro

Scale

$499/mo

~$0.01/page

50,000 pages/month
200MB max file size
1,000 schema fields
Batch extraction (20 files)
Dedicated support

Start with Scale
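
The per-page figures above follow from dividing each plan's monthly price by its page quota. The sketch below reproduces that arithmetic and picks the lowest-priced plan that covers a given volume. It assumes the full quota is used and ignores overage, which this page does not describe. `cost_per_page` and `cheapest_plan` are hypothetical helpers.

```python
# Plan prices (USD per month) and page quotas as listed on this page.
PLANS = {
    "Starter": (49, 2_000),
    "Pro": (199, 10_000),
    "Scale": (499, 50_000),
}


def cost_per_page(price_usd: float, pages: int) -> float:
    """Effective price per page when the whole quota is used."""
    return price_usd / pages


def cheapest_plan(pages_needed: int):
    """Lowest-priced plan whose quota covers the volume, or None."""
    eligible = [(price, name) for name, (price, pages) in PLANS.items() if pages >= pages_needed]
    return min(eligible)[1] if eligible else None


for name, (price, pages) in PLANS.items():
    print(f"{name}: ${cost_per_page(price, pages):.4f}/page")
# Starter: $0.0245/page
# Pro: $0.0199/page
# Scale: $0.0100/page

print(cheapest_plan(8_000))  # Pro
```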

Document → JSON in one API call