REST API · Webhooks · JSON

Document Extraction API

A reliable REST API for extracting structured data from PDFs, emails, and documents. Define your schema once, get consistent JSON output every time — with webhooks, retries, and GDPR compliance built in.

Upload a document, get structured JSON

Request

BASH
curl -X POST https://api.airparser.com/inboxes/INBOX_ID/upload-sync \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "file=@invoice.pdf"

Response

JSON
{
  "doc_id": "64abc123def456...",
  "parsing_in_progress": false,
  "status": "parsed",
  "name": "invoice.pdf",
  "content_type": "application/pdf",
  "created_at": "2026-03-10T12:00:00.000Z",
  "processed_at": "2026-03-10T12:00:04.321Z",
  "json": {
    "invoice_number": "INV-2024-0042",
    "invoice_date": "2024-03-15",
    "vendor_name": "Acme Supplies Ltd",
    "total_amount": 1284.50,
    "currency": "USD",
    "due_date": "2024-04-14",
    "line_items": [
      {
        "description": "Office supplies",
        "quantity": 3,
        "unit_price": 428.17,
        "total": 1284.50
      }
    ]
  }
}
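
For callers who prefer Python over curl, the same request and response handling can be sketched with the standard library alone. This is a minimal sketch, not an official client: the host, path, and `X-API-Key` header come from the curl example above, while the form field name `file` and the helper names `build_upload_request`, `extract_fields`, and `upload_sync` are assumptions made for illustration.

```python
import json
import os
import urllib.request
import uuid

API_BASE = "https://api.airparser.com"  # host taken from the curl example


def build_upload_request(inbox_id, api_key, filename, content,
                         content_type="application/pdf"):
    """Build (but do not send) a multipart/form-data POST for upload-sync.

    The form field name "file" is an assumption; change it if the API
    reference names the field differently.
    """
    boundary = uuid.uuid4().hex
    name = os.path.basename(filename)
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{name}"\r\n'.encode(),
        f"Content-Type: {content_type}\r\n\r\n".encode(),
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{API_BASE}/inboxes/{inbox_id}/upload-sync",
        data=body,
        method="POST",
        headers={
            "X-API-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def extract_fields(response_body):
    """Return the extracted fields, or raise if parsing has not finished."""
    doc = json.loads(response_body)
    if doc.get("parsing_in_progress") or doc.get("status") != "parsed":
        raise RuntimeError(f"document {doc.get('doc_id')} is not parsed yet")
    return doc["json"]


def upload_sync(inbox_id, api_key, path):
    """Send the file at `path` and return its extracted fields."""
    with open(path, "rb") as fh:
        request = build_upload_request(inbox_id, api_key, path, fh.read())
    with urllib.request.urlopen(request, timeout=60) as resp:
        return extract_fields(resp.read())
```

Calling `upload_sync("INBOX_ID", "YOUR_API_KEY", "invoice.pdf")` would then return the contents of the `json` object from the response, for example `fields["invoice_number"]`.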

Built for production, not prototyping

Guaranteed schema

Define your output schema once. Every document returns the exact same JSON structure — no format variations, no parsing surprises in production.

Webhook delivery

Results are pushed to your endpoint the moment extraction completes. Automatic retries with exponential backoff — no polling required.
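
Because deliveries are retried, a receiving endpoint may see the same result more than once, so it helps to make the handler idempotent. The sketch below assumes the webhook payload carries `doc_id` and `json` keys like the upload response; the actual payload shape may differ. The names `handle_delivery`, `WebhookHandler`, and `serve` are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

_seen_doc_ids = set()  # in-memory for the sketch; use durable storage in production


def handle_delivery(raw_body, seen):
    """Process one webhook delivery and return (http_status, message).

    Assumes "doc_id" and "json" keys in the payload (an assumption,
    modeled on the upload response).
    """
    try:
        payload = json.loads(raw_body)
        doc_id = payload["doc_id"]
    except (ValueError, KeyError, TypeError):
        return 400, "malformed payload"
    if not isinstance(doc_id, str):
        return 400, "malformed payload"
    if doc_id in seen:
        return 200, "duplicate ignored"  # a retry of a delivery already handled
    seen.add(doc_id)
    # ... hand payload["json"] to your own pipeline here ...
    return 200, "accepted"


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, message = handle_delivery(self.rfile.read(length), _seen_doc_ids)
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(message.encode())


def serve(port=8000):
    """Start a blocking HTTP server that accepts webhook POSTs."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Answering a duplicate with a 200 rather than an error is a deliberate choice: it tells the sender the result has been received, so it can stop retrying.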

Multi-engine fallback

Text LLM → Vision LLM → AI OCR. Scanned documents, image-based PDFs, and handwritten text are handled automatically without extra configuration.

GDPR by default

AES-256 encryption, configurable data retention, no training on your data, EU-based processing available. Compliance built in, not bolted on.

Python post-processing

Run custom Python code on extracted data before it's delivered — normalize values, apply business rules, enrich fields, or filter results.
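
As an illustration of what such a step might look like, here is a minimal sketch that normalizes values and applies two business rules to the invoice fields from the sample response. How Airparser hands the extracted data to your code is not described here, so the `post_process(data)` signature is an assumption, and the `payment_terms_days` and `needs_approval` fields are hypothetical rules invented for the example.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP


def post_process(data):
    """Return a normalized copy of the extracted invoice fields."""
    out = dict(data)
    # Normalize values: collapse whitespace in the vendor name and
    # upper-case the currency code.
    out["vendor_name"] = " ".join(str(data.get("vendor_name", "")).split())
    out["currency"] = str(data.get("currency", "")).strip().upper()
    # Keep money as a two-decimal string to avoid float rounding downstream.
    total = Decimal(str(data.get("total_amount", 0)))
    out["total_amount"] = str(total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))
    # Business rule (hypothetical): derive payment terms in days.
    issued = date.fromisoformat(data["invoice_date"])
    due = date.fromisoformat(data["due_date"])
    out["payment_terms_days"] = (due - issued).days
    # Business rule (hypothetical): flag large invoices for manual approval.
    out["needs_approval"] = total >= Decimal("1000")
    return out
```

The function returns a copy instead of mutating its input, which keeps the raw extraction available for debugging.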

60+ languages

Extract data from documents in more than 60 languages. The AI understands context and field meaning, not just character patterns, so accuracy holds across languages.

How the API works

1. Define your extraction schema

Describe the fields you want to extract in plain English — field name, type, and a brief description. No templates, no training data. Airparser uses this to instruct the AI on what to extract and how to format it.
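
One way to keep such a schema under version control is to mirror it in code and check incoming results against it. The sketch below is an assumption about how a client might do that, not Airparser's own schema format: the `Field` class, `INVOICE_SCHEMA`, and `schema_problems` are hypothetical names, and the field descriptions are examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    py_type: object      # type (or tuple of types) the extracted value should have
    description: str     # the plain-English hint given to the extractor


INVOICE_SCHEMA = [
    Field("invoice_number", str, "Invoice identifier as printed on the document"),
    Field("invoice_date", str, "Issue date in YYYY-MM-DD format"),
    Field("vendor_name", str, "Name of the company issuing the invoice"),
    Field("total_amount", (int, float), "Grand total of the invoice"),
    Field("currency", str, "Three-letter currency code"),
]


def schema_problems(payload, schema):
    """Return human-readable problems; an empty list means the payload fits."""
    problems = []
    for field in schema:
        if field.name not in payload:
            problems.append(f"missing field: {field.name}")
            continue
        value = payload[field.name]
        if not isinstance(value, field.py_type):
            problems.append(f"{field.name}: unexpected type {type(value).__name__}")
    return problems
```

Running `schema_problems(result, INVOICE_SCHEMA)` on each result before it enters downstream systems gives an early warning if the schema and the consuming code drift apart.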

2. Upload documents via API or email

POST documents to /api/v1/inboxes/:id/documents via multipart upload or base64. Or forward emails to your Airparser inbox address — attachments are parsed automatically.

3. Receive structured JSON

Results are delivered to your webhook URL as a JSON payload matching your schema. Alternatively, poll the document endpoint, or export to Google Sheets, Airtable, HubSpot, or any Zapier-connected app.
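
If you choose polling over webhooks, a loop with a growing delay avoids hammering the API. The following is a minimal sketch: it assumes the document endpoint returns the same `parsing_in_progress`, `status`, and `json` keys as the upload response, and it takes the HTTP call as a caller-supplied `fetch` function so that no endpoint URL is guessed. `wait_for_result` and its parameters are hypothetical names.

```python
import time


def wait_for_result(fetch, timeout=60.0, first_delay=1.0, max_delay=10.0,
                    sleep=time.sleep):
    """Call `fetch` until the document is parsed, then return its fields.

    `fetch` performs the GET against the document endpoint and returns
    the decoded JSON body as a dict.
    """
    deadline = time.monotonic() + timeout
    delay = first_delay
    while True:
        doc = fetch()
        if not doc.get("parsing_in_progress") and doc.get("status") == "parsed":
            return doc["json"]
        if time.monotonic() + delay > deadline:
            raise TimeoutError("document was not parsed before the timeout")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # back off between polls
```

Passing `sleep` as a parameter makes the loop easy to unit-test without real waiting.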

Supported document types

Every format, automatic engine selection

PDF: native and scanned
Email: body and attachments
Images: JPG, PNG, TIFF
Word / DOCX: all versions
CSV / Excel: tables and sheets
HTML: web pages and emails
Handwritten: via Vision LLM
Plain text: TXT, RTF, MD

Common API use cases

Invoice processing pipeline
Forward vendor invoices to your Airparser inbox or POST them via API. Extract invoice number, date, vendor, line items, and totals as structured JSON. Deliver to your accounting software via webhook or direct integration with QuickBooks, Xero, or Google Sheets.
Resume / CV screening
Parse incoming CVs and extract candidate name, contact details, skills, work history, and education as structured data. Push to your ATS or CRM via webhook. Works with PDF, DOCX, and plain text resumes in any language.
Lead capture from forms and emails
Extract lead information from inquiry emails, contact form notifications, or uploaded documents. Normalize the data and push structured leads to HubSpot, Salesforce, or your CRM of choice via webhook.
Shipping & logistics documents
Parse bills of lading, packing slips, customs declarations, and delivery confirmations. Extract shipment IDs, origins, destinations, item counts, and status into structured JSON for your logistics platform.
AI agent document tool (via MCP)
Connect Airparser to Claude, Cursor, or any MCP-compatible AI agent as a document extraction tool. Your agent can upload documents and receive structured data without building custom parsing logic — fully GDPR-compliant extraction inside any agentic workflow.

Start extracting in minutes

Free trial — 30 documents included. No credit card required.

API keys available immediately after signup.
