How to Use Python Post-Processing in Airparser
A hands-on guide to Airparser's built-in Python post-processing sandbox: what you can do, what's restricted, and copy-paste code for normalizing dates, cleaning currency, processing line items, and more.
Last updated: 2026-06-02
TL;DR: Airparser has a built-in Python post-processing editor that runs after extraction and before delivery. It runs in a restricted sandbox — only re, decimal, and datetime are available, no external imports, no classes, no while loops. This tutorial explains exactly what you can do, with copy-paste code for the most common tasks: normalizing dates, cleaning currency, processing line items, and conditional field logic.
What Is Post-Processing in Airparser?
After Airparser extracts structured data from a document, you can run a Python script on that data before it gets sent to your webhook, Zapier, Make, Google Sheets, or any other destination. This script runs inside Airparser — you write it directly in the parser settings, no server or infrastructure needed.
Post-processing is the right tool for things like:
- Reformatting dates to a consistent ISO format
- Stripping currency symbols from price fields and converting to numbers
- Creating derived fields (full_name from first_name + last_name)
- Removing fields you don't want delivered downstream
- Calculating totals or summaries across line items
- Preventing certain documents from being exported at all
To find the post-processing editor: open your parser, go to Post-processing in the left sidebar. The editor shows your extracted data on the left and accepts Python code on the right. Press Ctrl+S to save and run against the sample document.
The Sandbox: What's Available and What Isn't
Airparser's post-processing runs in a restricted Python environment powered by RestrictedPython. This means standard Python — but with a specific set of constraints you need to know upfront.
Available Libraries (pre-imported, no import needed)
- re: regular expressions
- decimal: precise decimal arithmetic (Decimal class)
- datetime: date and time parsing (datetime class)
These are already available without any import statement. You cannot import anything else — no requests, no json, no os, no third-party packages.
Available Built-in Functions
str(), int(), float(), len(), abs(), max(), min(), pow(), range(), filter(), enumerate() — plus the special helpers json_loads() and json_dumps() for working with JSON strings.
Restrictions
- No while loops: use for loops with range() instead
- No print(): the function is not available in the sandbox
- No classes: you cannot define classes or use dataclasses
- No variables starting with underscores: _x will cause an error
- No additional imports: only the three pre-imported libraries work
Everything else — for loops, if/elif/else, list comprehensions, string methods, dictionary operations, arithmetic — works exactly as you'd expect in Python.
How Data Flows: The data Dictionary
Your extracted fields are available as the data dictionary. Each key is a field name from your parser schema; each value is what Airparser extracted from the document.
For an invoice parser, data might look like this when your script runs:
{
  "vendor_name": "Acme Supplies Ltd",
  "invoice_number": "INV-2026-00441",
  "invoice_date": "17/05/2026",
  "due_date": "16/06/2026",
  "total": "€2,808.00",
  "line_items": [
    {"description": "Widget A", "quantity": 100, "unit_price": "€18.50"},
    {"description": "Widget B", "quantity": 20, "unit_price": "€24.50"}
  ]
}
Your script modifies data however you need, then returns it. Whatever you return is what gets delivered to your integrations.
# Minimal post-processing script: return data unchanged
return data
If you return None instead, the document is suppressed — nothing gets delivered downstream. This is useful for filtering out documents you don't want to export.
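To try a script outside the editor first, one option is to wrap the script body in an ordinary function. The sketch below is only a local stand-in, not Airparser's own runner: post_process is a hypothetical wrapper name, and the sample dictionaries are made up for illustration.

```python
# Local stand-in for the Airparser editor. post_process is a hypothetical name;
# inside Airparser you would paste only the function body.
def post_process(data):
    # Returning None suppresses the document: nothing is delivered
    if not data.get('invoice_number'):
        return None
    data['status'] = 'pending_review'
    return data


delivered = post_process({'invoice_number': 'INV-2026-00441'})
suppressed = post_process({'invoice_number': ''})
```

Here delivered is the modified dictionary and suppressed is None, which mirrors the two outcomes described above.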
Basic Field Operations
Create a Field
# Add a new field with a fixed value
data['status'] = 'pending_review'
# Add a field with a default if the extracted value is missing or empty
data['currency'] = data.get('currency') or 'USD'
Delete a Field
# Remove a field before delivery
del data['internal_notes']
Rename a Field
# Rename 'email' to 'customer_email'
data['customer_email'] = data.pop('email')
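Both del and pop() with a single argument raise KeyError when the field is absent. A minimal sketch of more forgiving versions follows; rename_field is a hypothetical helper name and the sample data is invented.

```python
def rename_field(data, old_key, new_key):
    # Move the value only when the source key exists, so a missing field
    # does not raise KeyError
    if old_key in data:
        data[new_key] = data.pop(old_key)
    return data


sample = {'email': 'ap@example.com'}
rename_field(sample, 'email', 'customer_email')
rename_field(sample, 'phone', 'customer_phone')  # no-op: 'phone' is absent
sample.pop('internal_notes', None)               # safe delete of a missing field
```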
Merge Fields Into One
# Combine first and last name
data['full_name'] = data['first_name'] + ' ' + data['last_name']
# f-string syntax also works
data['full_name'] = f"{data['first_name']} {data['last_name']}"
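Plain concatenation fails when one of the parts is None and leaves a stray space when one is empty. The following sketch joins only the parts that are present; join_name is a hypothetical helper, and the field names are the ones used in the example above.

```python
def join_name(data):
    # Collect only the parts that are present and non-blank
    parts = []
    for key in ['first_name', 'last_name']:
        part = str(data.get(key) or '').strip()
        if part:
            parts.append(part)
    return ' '.join(parts)


sample = {'first_name': ' Ada ', 'last_name': None}
sample['full_name'] = join_name(sample)
```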
Check If a Field Exists
# Conditional based on field presence
if 'discount' in data:
    data['has_discount'] = True
else:
    data['has_discount'] = False
# Using .get() with a default
data['tax_rate'] = data.get('tax_rate', '0%')
Normalizing Dates
Documents arrive with dates in many formats: "17/05/2026", "May 17, 2026", "2026-05-17". Here's a normalizer that handles the most common patterns and converts everything to ISO 8601 (YYYY-MM-DD):
def normalize_date(value):
    if not value:
        return None
    value = str(value).strip()
    formats = [
        '%d/%m/%Y',   # 17/05/2026
        '%m/%d/%Y',   # 05/17/2026
        '%Y-%m-%d',   # 2026-05-17
        '%d-%m-%Y',   # 17-05-2026
        '%d %B %Y',   # 17 May 2026
        '%B %d, %Y',  # May 17, 2026
        '%d-%b-%Y',   # 17-May-2026
        '%d-%b-%y',   # 17-May-26
        '%d.%m.%Y',   # 17.05.2026
    ]
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt).strftime('%Y-%m-%d')
        except ValueError:
            continue
    return value  # Return original if no format matched

data['invoice_date'] = normalize_date(data.get('invoice_date'))
data['due_date'] = normalize_date(data.get('due_date'))
return data
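The datetime class is pre-imported, so datetime.strptime() works directly with no import. The function tries each format in order and returns the original string if none match, so you never lose data. Order matters for ambiguous values: because %d/%m/%Y comes before %m/%d/%Y, a date such as 05/06/2026 is read as 5 June 2026. If your documents use US-style dates, swap the first two entries.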
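Slash-separated dates are ambiguous whenever the day is 12 or lower, so the order of the format list decides the result. The sketch below makes that choice explicit with a day_first parameter, which is an illustrative addition and not part of Airparser. The import is needed only when running locally; inside the sandbox datetime is already available.

```python
from datetime import datetime  # pre-imported in the sandbox; needed only locally


def normalize_date(value, day_first=True):
    if not value:
        return None
    value = str(value).strip()
    # Put the preferred slash format first so ambiguous dates resolve predictably
    if day_first:
        formats = ['%d/%m/%Y', '%m/%d/%Y', '%Y-%m-%d']
    else:
        formats = ['%m/%d/%Y', '%d/%m/%Y', '%Y-%m-%d']
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt).strftime('%Y-%m-%d')
        except ValueError:
            continue
    return value  # Unrecognized input is passed through unchanged


day_first_result = normalize_date('05/06/2026')
month_first_result = normalize_date('05/06/2026', day_first=False)
```

A value such as 17/05/2026 still parses with day_first=False, because the month-first attempt fails and the loop falls through to the day-first format.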
Cleaning Numbers and Currency
Extracted currency values often include symbols and separators: "€2,808.00", "$1,234", "1.234,56" (European convention). Use re and Decimal (both pre-imported) to clean them:
def clean_currency(value):
    if value is None:
        return None
    cleaned = re.sub(r'[\u20ac$\u00a3\u00a5\u20b9A-Za-z\s]', '', str(value)).strip()
    # Detect European decimal convention: 1.234,56
    if re.search(r'\d{1,3}(\.\d{3})+,\d{2}$', cleaned):
        cleaned = cleaned.replace('.', '').replace(',', '.')
    else:
        cleaned = cleaned.replace(',', '')
    try:
        return str(Decimal(cleaned))
    except Exception:
        return value  # Return original if parsing fails

data['total'] = clean_currency(data.get('total'))
data['subtotal'] = clean_currency(data.get('subtotal'))
data['tax'] = clean_currency(data.get('tax'))
return data
The function returns a string representation of the Decimal — clean, no floating-point errors, and safe to pass to downstream systems. Note: Decimal is available directly (from the pre-imported decimal module) without any import statement.
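The European check above only fires when a dot thousands separator is present, so a value like "12,50" would be read as 1250. One possible variant, sketched here under the assumption that a trailing comma followed by one or two digits is always a decimal mark, looks at which separator comes last. clean_currency_loose is a hypothetical name, and the imports are needed only outside the sandbox.

```python
import re                    # pre-imported in the sandbox; needed only locally
from decimal import Decimal  # likewise


def clean_currency_loose(value):
    if value is None:
        return None
    # Keep digits, separators, and a minus sign; drop symbols and letters
    cleaned = re.sub(r'[^\d.,-]', '', str(value))
    last_dot = cleaned.rfind('.')
    last_comma = cleaned.rfind(',')
    if last_comma > last_dot and re.search(r',\d{1,2}$', cleaned):
        # Comma is the decimal mark: drop dots, turn the comma into a dot
        cleaned = cleaned.replace('.', '').replace(',', '.')
    else:
        # Comma is a thousands separator (or absent)
        cleaned = cleaned.replace(',', '')
    try:
        return str(Decimal(cleaned))
    except Exception:
        return value  # Return original if parsing fails
```

The heuristic is a trade-off: a value such as "1,5" is treated as one and a half, which is wrong for sources that write thousands with unusual grouping.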
For a simple integer or float field, int() and float() are available as built-ins:
# Coerce quantity to integer
if data.get('quantity'):
    try:
        data['quantity'] = int(str(data['quantity']).strip())
    except Exception:
        pass  # Leave as-is if conversion fails
return data
Conditional Logic
You can branch on any field value to apply different transformations or route documents:
# Tag document by total amount
total_raw = data.get('total', '0')
try:
    total = Decimal(re.sub(r'[^\d.]', '', str(total_raw)))
except Exception:
    total = Decimal('0')
if total >= Decimal('10000'):
    data['approval_tier'] = 'requires_manager_approval'
elif total >= Decimal('1000'):
    data['approval_tier'] = 'standard'
else:
    data['approval_tier'] = 'auto_approve'
return data
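When the number of tiers grows, the if/elif chain can be replaced by a list of thresholds. This is a sketch of that idea, not an Airparser feature: TIERS and approval_tier are hypothetical names, and the thresholds simply mirror the example above.

```python
from decimal import Decimal  # pre-imported in the sandbox; needed only locally

# Thresholds are checked from highest to lowest
TIERS = [
    (Decimal('10000'), 'requires_manager_approval'),
    (Decimal('1000'), 'standard'),
]


def approval_tier(total):
    for threshold, label in TIERS:
        if total >= threshold:
            return label
    return 'auto_approve'  # Below every threshold


sample = {'total': '2808.00'}
sample['approval_tier'] = approval_tier(Decimal(sample['total']))
```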
Preventing Export Based on Conditions
Return None to suppress a document entirely — nothing will be sent to webhooks or integrations:
# Don't export documents marked as cancelled
if str(data.get('status') or '').lower() == 'cancelled':
    return None
# Don't export email attachments (only parse the email body)
if data.get('_content_type_') == 'message/rfc822':
    return None
return data
Processing Line Items With For Loops
When your parser extracts a table (like invoice line items), data contains a list of objects. Use for loops to process each row. while loops are not supported — use for with range() or enumerate() instead.
Calculate Line Totals and an Order Total
order_total = Decimal('0')
for item in data.get('line_items', []):
    qty_raw = item.get('quantity', 0)
    price_raw = item.get('unit_price', '0')
    try:
        qty = int(str(qty_raw).strip())
    except Exception:
        qty = 0
    price_str = re.sub(r'[^\d.,]', '', str(price_raw))
    # Handle European decimal convention
    if re.search(r'\d{1,3}(\.\d{3})+,\d{2}$', price_str):
        price_str = price_str.replace('.', '').replace(',', '.')
    else:
        price_str = price_str.replace(',', '')
    try:
        price = Decimal(price_str)
    except Exception:
        price = Decimal('0')
    item['total'] = str(price * qty)
    item['formatted_total'] = f"{price * qty:.2f}"
    order_total = order_total + (price * qty)
data['order_total'] = str(order_total)
data['order_total_formatted'] = f"{order_total:.2f}"
return data
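A calculated total is also useful as a sanity check against the amount printed on the document. The sketch below flags a mismatch; totals_match, the tolerance value, and the total_mismatch field are illustrative assumptions. If the extracted total includes tax, compare against the subtotal instead.

```python
from decimal import Decimal  # pre-imported in the sandbox; needed only locally


def totals_match(extracted_total, calculated_total, tolerance=Decimal('0.01')):
    # Compare as Decimal so '2340.00' and '2340.0' count as equal
    try:
        diff = abs(Decimal(str(extracted_total)) - Decimal(str(calculated_total)))
    except Exception:
        return False  # An unparseable amount is treated as a mismatch
    return diff <= tolerance


sample = {'subtotal': '2340.00', 'order_total': '2340.0'}
sample['total_mismatch'] = not totals_match(sample['subtotal'], sample['order_total'])
```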
Modify Items Using enumerate()
# Add a sequential line number to each item
for index, item in enumerate(data.get('line_items', [])):
    item['line_number'] = index + 1
return data
Filter Out Rows You Don't Need
# Keep only line items that have a description
cleaned_items = []
for item in data.get('line_items', []):
    if item.get('description', '').strip():
        cleaned_items.append(item)
data['line_items'] = cleaned_items
return data
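Since list comprehensions work in the sandbox, the same filter can be written in one expression. This sketch also guards against a description that was extracted as None, which has no strip() method; the sample rows are invented.

```python
line_items = [
    {'description': 'Widget A', 'quantity': 100},
    {'description': '   ', 'quantity': 1},
    {'description': None, 'quantity': 2},
]

# str(... or '') turns None into an empty string before stripping
kept = [item for item in line_items if str(item.get('description') or '').strip()]
```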
Iterating Over All Fields
In some cases you want to apply a transformation to every field — for example, stripping whitespace from all string values:
for field_name, field_value in data.items():
    if isinstance(field_value, str):
        data[field_name] = field_value.strip()
return data
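The loop above only touches top-level fields, so strings inside line items keep their whitespace. A recursive variant is sketched below. It assumes that isinstance (used above) accepts the dict and list type names and that recursion is permitted in the sandbox, neither of which is confirmed here; strip_strings is a hypothetical helper.

```python
def strip_strings(value):
    # Recurse through dicts and lists, trimming every string found
    if isinstance(value, str):
        return value.strip()
    if isinstance(value, dict):
        return {key: strip_strings(item) for key, item in value.items()}
    if isinstance(value, list):
        return [strip_strings(item) for item in value]
    return value  # Numbers, booleans, and None pass through unchanged


sample = {
    'vendor_name': '  Acme Supplies Ltd ',
    'line_items': [{'description': ' Widget A ', 'quantity': 100}],
}
cleaned = strip_strings(sample)
```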
Working With JSON Strings
If one of your extracted fields contains a JSON string, use the built-in json_loads() and json_dumps() helpers (no import needed):
# Parse a JSON string field into a dict
if data.get('metadata_json'):
    parsed = json_loads(data['metadata_json'])
    data['account_id'] = parsed.get('account_id')
    del data['metadata_json']
return data
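Extracted text is not always valid JSON, so it can help to guard the parse. In the sketch below, json_loads is defined as a local stand-in built on the standard json module so the example runs outside Airparser; inside the sandbox the helper already exists and the first two definitions should be dropped. Which exception the sandbox helper raises is not documented here, so the code catches Exception broadly.

```python
import json  # local stand-in only; the sandbox does not allow this import


def json_loads(text):
    # Stand-in for Airparser's built-in helper of the same name
    return json.loads(text)


def extract_account_id(data):
    raw = data.get('metadata_json')
    if not raw:
        return data
    try:
        parsed = json_loads(raw)
    except Exception:
        # Leave the original string in place when it is not valid JSON
        return data
    if isinstance(parsed, dict):
        data['account_id'] = parsed.get('account_id')
        del data['metadata_json']
    return data


good = extract_account_id({'metadata_json': '{"account_id": "A-17"}'})
bad = extract_account_id({'metadata_json': 'not json'})
```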
Putting It Together: A Complete Invoice Example
Here's a single post-processing script that combines date normalization, currency cleaning, line item processing, and a required-field check — all within the sandbox constraints:
def normalize_date(value):
    if not value:
        return None
    value = str(value).strip()
    formats = [
        '%d/%m/%Y', '%m/%d/%Y', '%Y-%m-%d',
        '%d-%m-%Y', '%d %B %Y', '%B %d, %Y',
        '%d-%b-%Y', '%d-%b-%y', '%d.%m.%Y',
    ]
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt).strftime('%Y-%m-%d')
        except ValueError:
            continue
    return value

def clean_amount(value):
    if value is None:
        return None
    cleaned = re.sub(r'[^\d.,]', '', str(value)).strip()
    if re.search(r'\d{1,3}(\.\d{3})+,\d{2}$', cleaned):
        cleaned = cleaned.replace('.', '').replace(',', '.')
    else:
        cleaned = cleaned.replace(',', '')
    try:
        return str(Decimal(cleaned))
    except Exception:
        return None

# --- Normalize dates ---
data['invoice_date'] = normalize_date(data.get('invoice_date'))
data['due_date'] = normalize_date(data.get('due_date'))

# --- Clean monetary fields ---
data['subtotal'] = clean_amount(data.get('subtotal'))
data['tax'] = clean_amount(data.get('tax'))
data['total'] = clean_amount(data.get('total'))

# --- Process line items ---
order_total = Decimal('0')
for item in data.get('line_items', []):
    price_cleaned = clean_amount(item.get('unit_price'))
    try:
        qty = int(str(item.get('quantity', 0)).strip())
        price = Decimal(price_cleaned) if price_cleaned else Decimal('0')
    except Exception:
        qty = 0
        price = Decimal('0')
    item['line_total'] = str(price * qty)
    order_total = order_total + (price * qty)
data['calculated_total'] = str(order_total)

# --- Suppress if required fields are missing ---
required = ['vendor_name', 'invoice_number', 'invoice_date', 'total']
for field in required:
    if not data.get(field):
        return None  # Suppress document, send nothing downstream
return data
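This script is a reasonable starting point for any invoice parser. Every numeric and date conversion is wrapped in try/except, so a malformed value does not stop the script, and it returns None only when a required field is absent or empty. For total, that includes an amount that could not be parsed, because clean_amount returns None on failure.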
Editor Shortcuts
Inside the post-processing editor:
- Ctrl+S — save and run against the sample document
- Ctrl+/ — toggle line comment
- Ctrl+Z — undo
Frequently Asked Questions
Why can't I import additional Python libraries?
Post-processing runs in a sandboxed environment. Allowing arbitrary imports would mean allowing arbitrary code execution — including network calls, file access, or anything else a Python package might do. The sandbox gives you re, decimal, and datetime, which cover the vast majority of field-level transformation needs. For tasks that genuinely require additional libraries (database writes, API calls, complex business logic), use Airparser's webhook delivery to send the clean extracted data to your own server, where you run unrestricted Python.
Why are while loops not supported?
Infinite loops would hang the sandbox indefinitely. for loops are bounded by the collection they iterate over, so they're safe. In practice you can cover most while-loop use cases with a for loop over range() and a break:
# Simulate a while loop with a for loop and break
for attempt in range(10):
    if data.get('vendor_name'):
        break
    data['vendor_name'] = data.get('supplier_name', '')
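A more typical while-loop job is "repeat until nothing changes". The sketch below does that with a capped for loop, using repeated reply prefixes on an email subject as a made-up example; strip_reply_prefixes and max_rounds are hypothetical names. The import is needed only outside the sandbox.

```python
import re  # pre-imported in the sandbox; needed only locally


def strip_reply_prefixes(subject, max_rounds=10):
    # Equivalent of "while the subject starts with a prefix: remove it",
    # capped at max_rounds iterations
    for attempt in range(max_rounds):
        stripped = re.sub(r'^(re|fwd?):\s*', '', subject, flags=re.IGNORECASE)
        if stripped == subject:
            break  # Nothing left to remove
        subject = stripped
    return subject


clean_subject = strip_reply_prefixes('RE: Fwd: re: Invoice 441')
```

The cap means a pathological input stops after max_rounds passes instead of looping forever, which is the guarantee the sandbox is enforcing.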
How do I debug my post-processing code without print()?
The editor runs your script against the sample document and shows the resulting data object on the left side. Add temporary fields to capture intermediate values:
# Temporary debug field — remove before saving final version
data['debug_date_parsed'] = normalize_date(data.get('invoice_date'))
Check the output panel to see what the field resolved to, then remove the debug field once you're satisfied.
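When several intermediate values matter, a single list field can stand in for a series of print() calls. This is a sketch of that pattern rather than a built-in feature: debug_log is an arbitrary field name, and post_process is a hypothetical wrapper used so the example runs outside the editor.

```python
def post_process(data):
    debug_log = []
    raw_total = str(data.get('total', ''))
    debug_log.append('raw total = ' + raw_total)
    digits = raw_total.replace(',', '')
    debug_log.append('after removing commas = ' + digits)
    data['total'] = digits
    # Temporary: remove this field once the script behaves as expected
    data['debug_log'] = debug_log
    return data


result = post_process({'total': '2,808.00'})
```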
What's the difference between returning data and returning None?
Returning data (or a modified copy) passes the document through to all configured integrations — webhooks, Zapier, Make, Google Sheets, etc. Returning None suppresses the document entirely: nothing is delivered, no webhook is fired. Use return None when you want to filter out documents that don't meet certain criteria — for example, emails you're not interested in, documents where required fields are missing, or test uploads you don't want in your downstream systems.
I want to use a dataclass or Pydantic model — can I?
No — class definitions are not supported in the sandbox. Structure your logic using plain dictionaries and functions instead. Everything you can do with a dataclass can be done with a dict:
# Instead of a class, build a plain dict
result = {
    'vendor_name': str(data.get('vendor_name', '')).strip(),
    'invoice_date': normalize_date(data.get('invoice_date')),
    'total': clean_amount(data.get('total')),
}
return result
Can I call an external API or look up a database from post-processing?
No — the sandbox has no network access. Post-processing is designed for field-level transformations on the data that's already been extracted. If you need to enrich extracted data with external lookups (vendor IDs from your ERP, account codes from your chart of accounts), the right pattern is: let Airparser deliver the clean extracted fields to your webhook, then do the enrichment in your own server-side code before writing to your database.
Why do I get an error when I use a variable like _temp?
Variable names starting with underscores are reserved by the RestrictedPython runtime. Rename your variable — use temp, buf, or any name that doesn't start with _.