Document Extraction API
A reliable REST API for extracting structured data from PDFs, emails, and documents. Define your schema once, get consistent JSON output every time — with webhooks, retries, and GDPR compliance built in.
Upload a document, get structured JSON
Request
curl -X POST https://api.airparser.com/inboxes/INBOX_ID/upload-sync \
-H "X-API-Key: YOUR_API_KEY" \
-F "file=@invoice.pdf"
Response
{
"doc_id": "64abc123def456...",
"parsing_in_progress": false,
"status": "parsed",
"name": "invoice.pdf",
"content_type": "application/pdf",
"created_at": "2026-03-10T12:00:00.000Z",
"processed_at": "2026-03-10T12:00:04.321Z",
"json": {
"invoice_number": "INV-2024-0042",
"invoice_date": "2024-03-15",
"vendor_name": "Acme Supplies Ltd",
"total_amount": 1284.50,
"currency": "USD",
"due_date": "2024-04-14",
"line_items": [
{
"description": "Office supplies",
"quantity": 3,
"unit_price": 428.17,
"total": 1284.50
}
]
}
}
Built for production, not prototyping
Guaranteed schema
Define your output schema once. Every document returns the exact same JSON structure — no format variations, no parsing surprises in production.
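A client can still verify the shape of each result before passing it downstream. The following is a minimal sketch, assuming the invoice fields from the sample response above; `EXPECTED_FIELDS` and `schema_errors` are hypothetical names for illustration, not part of the Airparser API.

```python
# Hypothetical client-side check; field names are taken from the sample response.
EXPECTED_FIELDS = {
    "invoice_number": str,
    "invoice_date": str,
    "vendor_name": str,
    "total_amount": (int, float),
    "currency": str,
    "due_date": str,
    "line_items": list,
}


def schema_errors(extracted: dict) -> list[str]:
    """Return readable problems; an empty list means the shape matches."""
    errors = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in extracted:
            errors.append(f"missing field: {field}")
        elif not isinstance(extracted[field], expected_type):
            actual = type(extracted[field]).__name__
            errors.append(f"wrong type for {field}: {actual}")
    # Fields outside the schema suggest the inbox schema and the client disagree.
    for field in extracted:
        if field not in EXPECTED_FIELDS:
            errors.append(f"unexpected field: {field}")
    return errors
```

Running a check like this on the `json` object of each response lets a pipeline reject a mismatched record early instead of failing later in business logic.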
Webhook delivery
Results are pushed to your endpoint the moment extraction completes. Automatic retries with exponential backoff — no polling required.
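Because deliveries are retried, a receiving endpoint may see the same result more than once, so it helps to make the handler idempotent. Here is a minimal sketch of that idea. It assumes the webhook payload carries `doc_id` and `json` like the sample response above, which may not match the real payload; `handle_webhook` and the in-memory `store` are hypothetical.

```python
import json


def handle_webhook(raw_body: bytes, store: dict) -> int:
    """Process one webhook delivery and return the HTTP status to respond with.

    Assumption: the payload has `doc_id` and `json` keys, as in the sample
    response. `store` maps doc_id to extracted data (a database in practice).
    """
    try:
        payload = json.loads(raw_body)
        doc_id = payload["doc_id"]
        extracted = payload["json"]
    except (ValueError, KeyError, TypeError):
        return 400  # malformed payload; retrying the same body would not help

    if doc_id in store:
        return 200  # duplicate delivery from a retry; acknowledge, do not reprocess

    store[doc_id] = extracted
    return 200
```

Returning a 2xx status for duplicates matters here: a non-2xx reply would presumably trigger yet another retry for a document that has already been stored.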
Multi-engine fallback
Text LLM → Vision LLM → AI OCR. Scanned documents, image-based PDFs, and handwritten text are handled automatically without extra configuration.
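The fallback order can be pictured as a simple chain that tries each engine until one returns data. The sketch below only illustrates that general pattern; it is not Airparser's internal implementation, and the three engine functions are placeholder stubs invented for the example.

```python
from typing import Callable, Optional

Engine = Callable[[bytes], Optional[dict]]


def extract_with_fallback(
    document: bytes, engines: list[tuple[str, Engine]]
) -> tuple[str, dict]:
    """Try each engine in order; return (engine_name, result) for the first success."""
    for name, engine in engines:
        try:
            result = engine(document)
        except Exception:
            result = None  # treat an engine error like an empty result and move on
        if result:
            return name, result
    raise RuntimeError("no engine could extract data from the document")


# Placeholder engines standing in for the three stages named above.
def text_llm(document: bytes) -> Optional[dict]:
    return None  # e.g. a scanned PDF has no text layer to read


def vision_llm(document: bytes) -> Optional[dict]:
    return {"invoice_number": "INV-2024-0042"}


def ai_ocr(document: bytes) -> Optional[dict]:
    return {"invoice_number": "INV-2024-0042"}


ENGINES = [("text_llm", text_llm), ("vision_llm", vision_llm), ("ai_ocr", ai_ocr)]
```

With these stubs, `extract_with_fallback(b"%PDF", ENGINES)` skips the text engine and returns the result from `vision_llm`, so the OCR stage is never invoked.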
GDPR by default
AES-256 encryption, configurable data retention, no training on your data, EU-based processing available. Compliance built in, not bolted on.
Python post-processing
Run custom Python code on extracted data before it's delivered — normalize values, apply business rules, enrich fields, or filter results.
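As an illustration of what such a step might do, here is a minimal sketch that normalizes, enriches, and filters one record. The function name `post_process`, its signature, and the review threshold are assumptions made for this example; the actual hook that Airparser exposes for post-processing code may look different.

```python
from typing import Optional


def post_process(data: dict) -> Optional[dict]:
    """Normalize one extracted record; return None to drop it."""
    # Filter: skip records where no total was extracted.
    if data.get("total_amount") is None:
        return None

    cleaned = dict(data)  # shallow copy so the input stays untouched
    # Normalize values.
    cleaned["vendor_name"] = str(data.get("vendor_name", "")).strip()
    cleaned["currency"] = str(data.get("currency", "")).upper()
    cleaned["total_amount"] = round(float(data["total_amount"]), 2)
    # Business rule (hypothetical threshold): flag large invoices for review.
    cleaned["needs_review"] = cleaned["total_amount"] >= 1000
    return cleaned
```

For example, a record with `"currency": "usd"` and `"total_amount": "1284.50"` comes back with `"USD"`, the number `1284.5`, and `needs_review` set to `True`.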
60+ languages
Extract data from documents in more than 60 languages. The AI interprets context and field meaning, not just character patterns, so accuracy holds across languages.
How the API works
Define your extraction schema
Describe the fields you want to extract in plain English — field name, type, and a brief description. No templates, no training data. Airparser uses this to instruct the AI on what to extract and how to format it.
Upload documents via API or email
POST documents to /api/v1/inboxes/:id/documents via multipart upload or base64. Alternatively, forward emails to your Airparser inbox address; attachments are parsed automatically.
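For callers working from Python rather than curl, a multipart upload could be built roughly as follows. This is a sketch under stated assumptions: it reuses the `upload-sync` URL and `X-API-Key` header from the curl example at the top of this page, the form field name `file` is an assumption, and `build_upload_request` is a hypothetical helper. It only constructs the request, so nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import urllib.request
import uuid


def build_upload_request(
    inbox_id: str,
    api_key: str,
    filename: str,
    content: bytes,
    content_type: str = "application/pdf",
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data upload request."""
    boundary = uuid.uuid4().hex
    body = b"".join(
        [
            f"--{boundary}\r\n".encode(),
            # Assumption: the API expects the document in a form field named "file".
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
            f"Content-Type: {content_type}\r\n\r\n".encode(),
            content,
            f"\r\n--{boundary}--\r\n".encode(),
        ]
    )
    return urllib.request.Request(
        url=f"https://api.airparser.com/inboxes/{inbox_id}/upload-sync",
        data=body,
        method="POST",
        headers={
            "X-API-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Keeping construction separate from sending makes the request easy to inspect or unit-test without touching the network.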
Receive structured JSON
Results are delivered to your webhook URL as a JSON payload matching your schema. Alternatively, poll the document endpoint, or export to Google Sheets, Airtable, HubSpot, or any Zapier-connected app.
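If polling is preferred over webhooks, a loop with increasing delays keeps request volume low. The sketch below assumes the document resource exposes the `parsing_in_progress` flag seen in the sample response. The polling endpoint itself is not specified here, so the caller supplies a `fetch_document` callable; that parameter and `wait_until_parsed` are hypothetical names.

```python
import time
from typing import Callable


def wait_until_parsed(
    fetch_document: Callable[[], dict],
    timeout_s: float = 60.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_document until parsing has finished, doubling the wait each time."""
    delay = initial_delay_s
    waited = 0.0
    while True:
        doc = fetch_document()
        # Assumption: `parsing_in_progress` turns false once extraction is done.
        if not doc.get("parsing_in_progress", True):
            return doc
        if waited + delay > timeout_s:
            raise TimeoutError("document was still being parsed when the timeout expired")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay_s)
```

The `sleep` parameter exists so the loop can be exercised in tests without real waiting; in production the default `time.sleep` applies.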
Supported document types
Every format, automatic engine selection
PDF
Native & scanned
Email
Body & attachments
Images
JPG, PNG, TIFF
Word / DOCX
All versions
CSV / Excel
Tables & sheets
HTML
Web pages & emails
Handwritten
Via Vision LLM
Plain text
TXT, RTF, MD
Common API use cases
Invoice processing pipeline
Resume / CV screening
Lead capture from forms and emails
Shipping & logistics documents
AI agent document tool (via MCP)
Start extracting in minutes
Free trial — 30 documents included. No credit card required.
API keys available immediately after signup.