How to Use Python Post-Processing in Airparser

A hands-on guide to Airparser's built-in Python post-processing sandbox: what you can do, what's restricted, and copy-paste code for normalizing dates, cleaning currency, processing line items, and more.

Camille H.

Jun 2, 2026 — 8 min read

Last updated: 2026-06-02

TL;DR: Airparser has a built-in Python post-processing editor that runs after extraction and before delivery. It runs in a restricted sandbox — only re, decimal, and datetime are available, no external imports, no classes, no while loops. This tutorial explains exactly what you can do, with copy-paste code for the most common tasks: normalizing dates, cleaning currency, processing line items, and conditional field logic.

What Is Post-Processing in Airparser?

After Airparser extracts structured data from a document, you can run a Python script on that data before it gets sent to your webhook, Zapier, Make, Google Sheets, or any other destination. This script runs inside Airparser — you write it directly in the parser settings, no server or infrastructure needed.

Post-processing is the right tool for things like:

Reformatting dates to a consistent ISO format
Stripping currency symbols from price fields and converting to numbers
Creating derived fields (full_name from first_name + last_name)
Removing fields you don't want delivered downstream
Calculating totals or summaries across line items
Preventing certain documents from being exported at all

To find the post-processing editor: open your parser, go to Post-processing in the left sidebar. The editor shows your extracted data on the left and accepts Python code on the right. Press Ctrl+S to save and run against the sample document.

The Sandbox: What's Available and What Isn't

Airparser's post-processing runs in a restricted Python environment powered by RestrictedPython. You write standard Python, but within a specific set of constraints you need to know up front.

Available Libraries (pre-imported, no import needed)

re — regular expressions
decimal — precise decimal arithmetic (Decimal class)
datetime — date and time parsing (datetime class)

These are already available without any import statement. You cannot import anything else — no requests, no json, no os, no third-party packages.

Available Built-in Functions

str(), int(), float(), len(), abs(), max(), min(), pow(), range(), filter(), enumerate() — plus the special helpers json_loads() and json_dumps() for working with JSON strings.

Restrictions

No while loops — use for loops with range() instead
No print() — the function is not available in the sandbox
No classes — you cannot define classes or use dataclasses
No variables starting with underscores — _x will cause an error
No additional imports — only the three pre-imported libraries work

Everything else — for loops, if/elif/else, list comprehensions, string methods, dictionary operations, arithmetic — works exactly as you'd expect in Python.

How Data Flows: The `data` Dictionary

Your extracted fields are available as the data dictionary. Each key is a field name from your parser schema; each value is what Airparser extracted from the document.

For an invoice parser, data might look like this when your script runs:

{
  "vendor_name": "Acme Supplies Ltd",
  "invoice_number": "INV-2026-00441",
  "invoice_date": "17/05/2026",
  "due_date": "16/06/2026",
  "total": "\u20ac2,808.00",
  "line_items": [
    {"description": "Widget A", "quantity": 100, "unit_price": "\u20ac18.50"},
    {"description": "Widget B", "quantity": 20,  "unit_price": "\u20ac24.50"}
  ]
}

Your script modifies data however you need, then returns it. Whatever you return is what gets delivered to your integrations.

# Minimal post-processing script: return data unchanged
return data

If you return None instead, the document is suppressed — nothing gets delivered downstream. This is useful for filtering out documents you don't want to export.
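The script ends with a top-level `return`, so it behaves like the body of a function that receives `data`. One way to experiment outside Airparser is to wrap the same body in an ordinary Python function and call it on a sample dictionary. This is a minimal local sketch under that assumption. `post_process` and `simulate_delivery` are hypothetical names, not Airparser APIs, and the `import` line is only needed locally.

```python
import copy  # only needed for this local harness

def post_process(data):
    # The editor script would be the body of this function
    if not data.get('invoice_number'):
        return None  # suppressed: nothing is delivered downstream
    data['status'] = 'pending_review'
    return data

def simulate_delivery(document):
    # Hypothetical stand-in for Airparser's delivery step
    result = post_process(copy.deepcopy(document))
    if result is None:
        return 'suppressed'
    return result
```

Working on a deep copy keeps the sample document unchanged between runs, which loosely mirrors re-running a script against the same sample in the editor.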

Basic Field Operations

Create a Field

# Add a new field with a fixed value
data['status'] = 'pending_review'

# Add a field with a default if the extracted value is missing or empty
# (data.get('currency', 'USD') would keep None if the key exists with a null value)
data['currency'] = data.get('currency') or 'USD'

Delete a Field

# Remove a field before delivery
del data['internal_notes']
# Or, to avoid a KeyError when the field may be absent:
data.pop('internal_notes', None)

Rename a Field

# Rename 'email' to 'customer_email' (the None default avoids a KeyError if 'email' is absent)
data['customer_email'] = data.pop('email', None)

Merge Fields Into One

# Combine first and last name
data['full_name'] = data['first_name'] + ' ' + data['last_name']

# f-string syntax also works
data['full_name'] = f"{data['first_name']} {data['last_name']}"
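If either name is missing, the `+` version raises a TypeError, and the f-string version inserts the literal text "None". Here is a null-safe sketch that joins only the parts with a value. It assumes missing fields arrive as None or empty strings, and `join_name` and the `middle_name` field are hypothetical.

```python
def join_name(data):
    # Collect name parts, turning None into '' and trimming whitespace
    parts = [str(data.get(key) or '').strip() for key in ['first_name', 'middle_name', 'last_name']]
    # Keep only the non-empty parts so no stray spaces appear
    present = [part for part in parts if part]
    return ' '.join(present)

data = {'first_name': ' Ada ', 'middle_name': None, 'last_name': 'Lovelace'}
data['full_name'] = join_name(data)
```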

Check If a Field Exists

# Conditional based on field presence (True whenever the key exists, even if its value is None)
if 'discount' in data:
    data['has_discount'] = True
else:
    data['has_discount'] = False

# Fall back to a default when the value is absent, None, or empty
data['tax_rate'] = data.get('tax_rate') or '0%'

Normalizing Dates

Documents arrive with dates in many formats: "17/05/2026", "May 17, 2026", "2026-05-17". Here's a normalizer that handles the most common patterns and converts everything to ISO 8601 (YYYY-MM-DD):

def normalize_date(value):
    if not value:
        return None
    value = str(value).strip()
    formats = [
        '%d/%m/%Y',    # 17/05/2026
        '%m/%d/%Y',    # 05/17/2026
        '%Y-%m-%d',    # 2026-05-17
        '%d-%m-%Y',    # 17-05-2026
        '%d %B %Y',    # 17 May 2026
        '%B %d, %Y',   # May 17, 2026
        '%d-%b-%Y',    # 17-May-2026
        '%d-%b-%y',    # 17-May-26
        '%d.%m.%Y',    # 17.05.2026
    ]
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt).strftime('%Y-%m-%d')
        except ValueError:
            continue
    return value  # Return original if no format matched

data['invoice_date'] = normalize_date(data.get('invoice_date'))
data['due_date'] = normalize_date(data.get('due_date'))

return data

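The datetime class is pre-imported, so datetime.strptime() works directly with no import needed. The function tries each format in order and returns the original string if none match, so you never lose data. Order matters for ambiguous dates. Because '%d/%m/%Y' comes first, 05/06/2026 is read as 5 June 2026, and a US-style date is parsed month-first only when its day is greater than 12 (for example, 05/17/2026). If your documents use US dates, swap the first two formats.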
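If your documents come from mixed regions, one option is to make the day-first/month-first order a parameter. This is a sketch. The `day_first` parameter is our own addition, not an Airparser setting, and the `import` line is only needed outside Airparser, where `datetime` is not pre-imported.

```python
from datetime import datetime  # pre-imported in Airparser; needed only locally

def normalize_date(value, day_first=True):
    if not value:
        return None
    value = str(value).strip()
    day_first_formats = ['%d/%m/%Y', '%d-%m-%Y', '%d.%m.%Y']
    month_first_formats = ['%m/%d/%Y']
    # Formats with a 4-digit year first or a month name cannot be misread
    unambiguous_formats = ['%Y-%m-%d', '%d %B %Y', '%B %d, %Y', '%d-%b-%Y', '%d-%b-%y']
    if day_first:
        formats = day_first_formats + month_first_formats + unambiguous_formats
    else:
        formats = month_first_formats + day_first_formats + unambiguous_formats
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt).strftime('%Y-%m-%d')
        except ValueError:
            continue
    return value  # return the original if no format matched
```

A date whose first number is above 12 (17/05/2026) still parses correctly in both modes. The first candidate format fails, and the other one matches.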

Cleaning Numbers and Currency

Extracted currency values often include symbols and separators: "€2,808.00", "$1,234", "1.234,56" (European convention). Use re and Decimal (both pre-imported) to clean them:

def clean_currency(value):
    if value is None:
        return None
    cleaned = re.sub(r'[\u20ac$\u00a3\u00a5\u20b9A-Za-z\s]', '', str(value)).strip()
    # Detect a European decimal comma (1.234,56 or 18,50): a comma followed by one or two final digits
    if re.search(r',\d{1,2}$', cleaned):
        cleaned = cleaned.replace('.', '').replace(',', '.')
    else:
        cleaned = cleaned.replace(',', '')
    try:
        return str(Decimal(cleaned))
    except Exception:
        return value  # Return original if parsing fails

data['total'] = clean_currency(data.get('total'))
data['subtotal'] = clean_currency(data.get('subtotal'))
data['tax'] = clean_currency(data.get('tax'))

return data

The function returns a string representation of the Decimal — clean, no floating-point errors, and safe to pass to downstream systems. Note: Decimal is available directly (from the pre-imported decimal module) without any import statement.
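Computed or extracted amounts can carry more decimal places than you want to deliver. This sketch rounds a cleaned amount string to cents with `Decimal.quantize()`. Without an explicit rounding mode, `quantize()` uses the context default of round-half-even, so 2.345 becomes 2.34. `to_cents` is a hypothetical helper name.

```python
from decimal import Decimal  # pre-imported in Airparser; needed only locally

def to_cents(value):
    # Round a clean numeric string (e.g. the output of clean_currency) to 2 decimal places
    if value is None:
        return None
    try:
        return str(Decimal(str(value)).quantize(Decimal('0.01')))
    except Exception:
        return value  # leave non-numeric values unchanged
```

If a destination needs a JSON number rather than a string, you can call `float()` on the result, at the cost of binary floating-point representation.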

For a simple integer or float field, int() and float() are available as built-ins:

# Coerce quantity to integer
if data.get('quantity'):
    try:
        data['quantity'] = int(str(data['quantity']).strip())
    except Exception:
        pass  # Leave as-is if conversion fails

return data
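The `int()` call above rejects values like "2.0" or "1,000", which extracted tables often contain. This more forgiving sketch strips thousands separators and accepts whole-number decimals, leaving everything else untouched. `parse_quantity` is a hypothetical helper, and it assumes ',' is always a thousands separator in quantities.

```python
from decimal import Decimal  # pre-imported in Airparser; needed only locally

def parse_quantity(value):
    # Assumes ',' is a thousands separator (1,000 -> 1000)
    text = str(value).strip().replace(',', '').replace(' ', '')
    try:
        number = Decimal(text)
    except Exception:
        return value  # leave unparseable values unchanged
    if not number.is_finite():
        return value  # reject 'inf' and 'nan'
    if number == number.to_integral_value():
        return int(number)
    return value  # fractional quantities are left for manual review
```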

Conditional Logic

You can branch on any field value to apply different transformations or route documents:

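# Tag document by total amount
# (this quick strip assumes '.' is the decimal separator; use clean_currency() for European-format amounts like 2.808,00)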
total_raw = data.get('total', '0')
try:
    total = Decimal(re.sub(r'[^\d.]', '', str(total_raw)))
except Exception:
    total = Decimal('0')

if total >= Decimal('10000'):
    data['approval_tier'] = 'requires_manager_approval'
elif total >= Decimal('1000'):
    data['approval_tier'] = 'standard'
else:
    data['approval_tier'] = 'auto_approve'

return data
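If the thresholds change often, you can keep them in a list of (threshold, tier) pairs ordered from highest to lowest, so that adding a tier means editing one line. This sketch uses the same thresholds and tier names as the example above. `TIERS` and `approval_tier` are hypothetical names.

```python
from decimal import Decimal  # pre-imported in Airparser; needed only locally

# Ordered from highest threshold to lowest
TIERS = [
    (Decimal('10000'), 'requires_manager_approval'),
    (Decimal('1000'), 'standard'),
    (Decimal('0'), 'auto_approve'),
]

def approval_tier(total):
    # Return the first tier whose threshold the total meets or exceeds
    for threshold, tier in TIERS:
        if total >= threshold:
            return tier
    return 'auto_approve'  # negative totals (e.g. credit notes) end up here
```

Inside the script, you would then write `data['approval_tier'] = approval_tier(total)`.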

Preventing Export Based on Conditions

Return None to suppress a document entirely — nothing will be sent to webhooks or integrations:

# Don't export documents marked as cancelled
if (data.get('status') or '').lower() == 'cancelled':  # "or ''" guards against a None status
    return None

# Don't export email attachments (only parse the email body)
if data.get('_content_type_') == 'message/rfc822':
    return None

return data
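When several rules decide whether a document is exported, you can collect them in one helper to keep the top-level script short. This is a sketch. The `BLOCKED_VENDORS` list and the convention that test uploads have invoice numbers starting with "test" are hypothetical examples, not Airparser features.

```python
BLOCKED_VENDORS = ['internal test vendor', 'sandbox supplier']

def should_export(data):
    # Normalize each value so None, case, and stray spaces don't matter
    status = str(data.get('status') or '').strip().lower()
    vendor = str(data.get('vendor_name') or '').strip().lower()
    invoice_number = str(data.get('invoice_number') or '').strip().lower()
    if status == 'cancelled':
        return False
    if vendor in BLOCKED_VENDORS:
        return False
    if invoice_number.startswith('test'):
        return False
    return True

def post_process(data):
    # Local stand-in for the editor script
    if not should_export(data):
        return None
    return data
```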

Processing Line Items With For Loops

When your parser extracts a table (like invoice line items), data contains a list of objects. Use for loops to process each row. while loops are not supported — use for with range() or enumerate() instead.

Calculate Line Totals and an Order Total

order_total = Decimal('0')

for item in data.get('line_items') or []:  # "or []" also covers a null table
    qty_raw = item.get('quantity', 0)
    price_raw = item.get('unit_price', '0')

    try:
        qty = int(str(qty_raw).strip())
    except Exception:
        qty = 0

    price_str = re.sub(r'[^\d.,]', '', str(price_raw))
    # Handle European decimal convention (1.234,56 or 18,50)
    if re.search(r',\d{1,2}$', price_str):
        price_str = price_str.replace('.', '').replace(',', '.')
    else:
        price_str = price_str.replace(',', '')

    try:
        price = Decimal(price_str)
    except Exception:
        price = Decimal('0')

    item['total'] = str(price * qty)
    item['formatted_total'] = f"{price * qty:.2f}"
    order_total = order_total + (price * qty)

data['order_total'] = str(order_total)
data['order_total_formatted'] = f"{order_total:.2f}"

return data
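Because `order_total` is computed from the line items, you can compare it against the extracted subtotal to flag documents where a row may have been missed or misread. This sketch assumes that a `subtotal` field exists and has already been cleaned to a plain numeric string. The 0.01 tolerance and the `totals_match` field name are illustrative choices.

```python
from decimal import Decimal  # pre-imported in Airparser; needed only locally

def check_totals(data, order_total):
    # Flag whether the sum of line totals matches the extracted subtotal
    try:
        subtotal = Decimal(str(data.get('subtotal')))
    except Exception:
        data['totals_match'] = None  # subtotal missing or not numeric
        return data
    data['totals_match'] = abs(subtotal - order_total) <= Decimal('0.01')
    return data
```

For the sample invoice, the two line totals sum to 2340.00 (100 × 18.50 + 20 × 24.50).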

Modify Items Using enumerate()

# Add a sequential line number to each item
for index, item in enumerate(data.get('line_items', [])):
    item['line_number'] = index + 1

return data

Filter Out Rows You Don't Need

# Keep only line items that have a description
cleaned_items = []
for item in data.get('line_items') or []:
    if (item.get('description') or '').strip():
        cleaned_items.append(item)
data['line_items'] = cleaned_items

return data
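The same filter fits in a list comprehension, which the sandbox supports. This sketch also treats a None description as empty, so rows with null values are dropped instead of raising an error. `keep_described` is a hypothetical helper name.

```python
def keep_described(items):
    # Keep rows whose description is non-blank after stripping
    return [item for item in items if str(item.get('description') or '').strip()]

data = {'line_items': [
    {'description': 'Widget A', 'quantity': 100},
    {'description': '   ', 'quantity': 1},
    {'description': None, 'quantity': 2},
]}
data['line_items'] = keep_described(data.get('line_items') or [])
```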

Iterating Over All Fields

In some cases you want to apply a transformation to every field — for example, stripping whitespace from all string values:

for field_name, field_value in data.items():
    if isinstance(field_value, str):
        data[field_name] = field_value.strip()

return data
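The loop above only touches top-level strings, so values inside `line_items` keep their whitespace. This sketch also strips strings one level down, in lists of dictionaries. Like the loop above, it assumes `isinstance` is available. `strip_strings` is a hypothetical helper name.

```python
def strip_strings(data):
    for field_name, field_value in data.items():
        if isinstance(field_value, str):
            data[field_name] = field_value.strip()
        elif isinstance(field_value, list):
            # Table fields: strip string cells in each row dictionary
            for row in field_value:
                if isinstance(row, dict):
                    for key, value in row.items():
                        if isinstance(value, str):
                            row[key] = value.strip()
    return data
```

Reassigning existing keys while iterating over `.items()` is safe in Python, because the dictionary's size does not change.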

Working With JSON Strings

If one of your extracted fields contains a JSON string, use the built-in json_loads() and json_dumps() helpers (no import needed):

# Parse a JSON string field into a dict
if data.get('metadata_json'):
    parsed = json_loads(data['metadata_json'])
    data['account_id'] = parsed.get('account_id')
    del data['metadata_json']

return data
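If `json_loads()` raises on malformed input the way Python's `json.loads()` does, one bad field would stop the whole script. This defensive sketch runs outside Airparser by aliasing the helper to the standard `json` module, on the assumption that the two behave alike. Drop those lines in the editor. `extract_account_id` and the `metadata_error` field are hypothetical.

```python
import json

# Local stand-in for Airparser's built-in helper (assumed equivalent)
json_loads = json.loads

def extract_account_id(data):
    raw = data.get('metadata_json')
    if not raw:
        return data
    try:
        parsed = json_loads(raw)
    except Exception:
        data['metadata_error'] = 'invalid JSON'  # keep the raw string for inspection
        return data
    if isinstance(parsed, dict):
        data['account_id'] = parsed.get('account_id')
    del data['metadata_json']
    return data
```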

Putting It Together: A Complete Invoice Example

Here's a single post-processing script that combines date normalization, currency cleaning, line item processing, and a required-field check — all within the sandbox constraints:

def normalize_date(value):
    if not value:
        return None
    value = str(value).strip()
    formats = [
        '%d/%m/%Y', '%m/%d/%Y', '%Y-%m-%d',
        '%d-%m-%Y', '%d %B %Y', '%B %d, %Y',
        '%d-%b-%Y', '%d-%b-%y', '%d.%m.%Y',
    ]
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt).strftime('%Y-%m-%d')
        except ValueError:
            continue
    return value

def clean_amount(value):
    if value is None:
        return None
    cleaned = re.sub(r'[^\d.,]', '', str(value)).strip()
    if re.search(r',\d{1,2}$', cleaned):  # European decimal comma: 1.234,56 or 18,50
        cleaned = cleaned.replace('.', '').replace(',', '.')
    else:
        cleaned = cleaned.replace(',', '')
    try:
        return str(Decimal(cleaned))
    except Exception:
        return None

# --- Normalize dates ---
data['invoice_date'] = normalize_date(data.get('invoice_date'))
data['due_date'] = normalize_date(data.get('due_date'))

# --- Clean monetary fields ---
data['subtotal'] = clean_amount(data.get('subtotal'))
data['tax'] = clean_amount(data.get('tax'))
data['total'] = clean_amount(data.get('total'))

# --- Process line items ---
order_total = Decimal('0')
for item in data.get('line_items') or []:
    price_cleaned = clean_amount(item.get('unit_price'))
    try:
        qty = int(str(item.get('quantity', 0)).strip())
        price = Decimal(price_cleaned) if price_cleaned else Decimal('0')
    except Exception:
        qty = 0
        price = Decimal('0')
    item['line_total'] = str(price * qty)
    order_total = order_total + (price * qty)

data['calculated_total'] = str(order_total)

# --- Suppress if required fields are missing ---
required = ['vendor_name', 'invoice_number', 'invoice_date', 'total']
for field in required:
    if not data.get(field):
        return None  # Suppress document, send nothing downstream

return data

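This script assumes your parser schema uses the field names shown (vendor_name, invoice_number, invoice_date, due_date, subtotal, tax, total, line_items). Rename them to match your own schema before pasting it in. Every date and number conversion is wrapped in a try/except, so a malformed value falls back to None, 0, or the original string instead of raising an error. The script returns None only when one of the required fields is empty.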

Editor Shortcuts

Inside the post-processing editor:

Ctrl+S — save and run against the sample document
Ctrl+/ — toggle line comment
Ctrl+Z — undo

Frequently Asked Questions

Why can't I import additional Python libraries?

Post-processing runs in a sandboxed environment. Allowing arbitrary imports would mean allowing arbitrary code execution — including network calls, file access, or anything else a Python package might do. The sandbox gives you re, decimal, and datetime, which cover the vast majority of field-level transformation needs. For tasks that genuinely require additional libraries (database writes, API calls, complex business logic), use Airparser's webhook delivery to send the clean extracted data to your own server, where you run unrestricted Python.

Why are `while` loops not supported?

A while loop with a faulty exit condition can run forever and hang the sandbox. A for loop is bounded by the collection it iterates over, so it always terminates. As long as you can set an upper limit on the number of iterations, you can express a while loop as a for loop over range() with a break:

# Simulate a while loop with a for loop and break
for attempt in range(10):
    if data.get('vendor_name'):
        break
    data['vendor_name'] = data.get('supplier_name', '')
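A more typical while pattern is "repeat until nothing changes." This sketch collapses runs of spaces with a bounded loop. Each pass roughly halves the longest run of spaces, so the arbitrary bound of 20 passes is far more than real text needs. `collapse_spaces` is a hypothetical helper name.

```python
def collapse_spaces(text):
    # Equivalent of: while '  ' in text: text = text.replace('  ', ' ')
    for attempt in range(20):
        if '  ' not in text:
            break
        text = text.replace('  ', ' ')
    return text
```

For this particular task, `re.sub(r' {2,}', ' ', text)` does the same thing in one call. The loop form is useful when no single regular expression fits.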

How do I debug my post-processing code without `print()`?

The editor runs your script against the sample document and shows the resulting data object on the left side. Add temporary fields to capture intermediate values:

# Temporary debug field: remove it once you have checked the output
data['debug_date_parsed'] = normalize_date(data.get('invoice_date'))

Check the output panel to see what the field resolved to, then remove the debug field once you're satisfied.

What's the difference between returning `data` and returning `None`?

Returning data (or a modified copy) passes the document through to all configured integrations — webhooks, Zapier, Make, Google Sheets, etc. Returning None suppresses the document entirely: nothing is delivered, no webhook is fired. Use return None when you want to filter out documents that don't meet certain criteria — for example, emails you're not interested in, documents where required fields are missing, or test uploads you don't want in your downstream systems.

I want to use a dataclass or Pydantic model — can I?

No — class definitions are not supported in the sandbox. Structure your logic using plain dictionaries and functions instead. Everything you can do with a dataclass can be done with a dict:

# Instead of a class, build a plain dict
# (normalize_date() and clean_amount() are the helpers from the complete invoice example above)
result = {
    'vendor_name': str(data.get('vendor_name') or '').strip(),
    'invoice_date': normalize_date(data.get('invoice_date')),
    'total': clean_amount(data.get('total')),
}
return result
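When you build the same structure in several places, such as one record per line item, a plain function that returns a dict can play the role of a dataclass constructor. This is a sketch. `make_line_item` and its fields are hypothetical, and it expects `unit_price` to be already cleaned of currency symbols.

```python
from decimal import Decimal  # pre-imported in Airparser; needed only locally

def make_line_item(description, quantity, unit_price):
    # Acts like a constructor: validates inputs and returns a plain dict
    try:
        price = Decimal(str(unit_price))
    except Exception:
        price = Decimal('0')
    try:
        qty = int(quantity)
    except Exception:
        qty = 0
    return {
        'description': str(description or '').strip(),
        'quantity': qty,
        'unit_price': str(price),
        'line_total': str(price * qty),
    }
```

A list comprehension over `data.get('line_items') or []` can then rebuild every row with the same shape.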

Can I call an external API or look up a database from post-processing?

No — the sandbox has no network access. Post-processing is designed for field-level transformations on the data that's already been extracted. If you need to enrich extracted data with external lookups (vendor IDs from your ERP, account codes from your chart of accounts), the right pattern is: let Airparser deliver the clean extracted fields to your webhook, then do the enrichment in your own server-side code before writing to your database.

Why do I get an error when I use a variable like `_temp`?

Variable names starting with underscores are reserved by the RestrictedPython runtime. Rename your variable — use temp, buf, or any name that doesn't start with _.
