
Consider a typical ESG reporting scenario: a business needs to ingest thousands of utility bills (electricity, gas, water) from hundreds of different providers. Each provider uses a different layout. Template-based OCR setups (for example, per-provider AWS Textract queries) are brittle: they tend to break whenever a layout changes.
Instead of one giant "Extract Everything" prompt, a more reliable approach breaks the problem down into a chain of specialised agents.
The first agent is a classifier. Role: Look at the file and determine whether it is an electricity bill, a gas bill, or junk mail. Result: Routes the document to the correct specialised extractor.
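The routing step can be sketched in a few lines of TypeScript. This is a minimal illustration, not the article's implementation: the `DocumentType` values, the `extractors` map, and the `route` function are hypothetical names, and each extractor is a placeholder standing in for a call to a specialised agent.

```typescript
// Hypothetical categories, based on the bill types the article mentions.
type DocumentType = "electricity" | "gas" | "water" | "junk";

interface ClassifiedDocument {
  fileName: string;
  type: DocumentType; // assumed to be the classifier agent's output
}

// Placeholder for a specialised extraction agent; returns a label for illustration.
type Extractor = (fileName: string) => string;

const extractors: Record<Exclude<DocumentType, "junk">, Extractor> = {
  electricity: (f) => `electricity-extractor:${f}`,
  gas: (f) => `gas-extractor:${f}`,
  water: (f) => `water-extractor:${f}`,
};

// Route a classified document to its extractor; junk mail is dropped.
function route(doc: ClassifiedDocument): string | null {
  const { type, fileName } = doc;
  if (type === "junk") return null;
  return extractors[type](fileName);
}
```

Keeping the router as a plain lookup means that supporting a new bill type is a matter of adding one entry to the map, and the compiler flags any category that has no extractor.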
We use GPT-4o-mini (Vision) to transcribe the document. Unlike standard OCR, it understands tables and column relationships, preserving the semantic structure of the bill.
The extraction agent works from the transcription. Role: Extract specific fields (kWh usage, billing period, meter number) into an object validated against a Zod schema. Constraint: If usage is missing, check the second page.
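The "check the second page" constraint could look something like the sketch below. To stay dependency-free it uses a plain TypeScript interface where the article uses a Zod schema, and it treats the model call as an injected function. The `BillFields` shape, `PageExtractor`, and `extractBill` are assumptions made for illustration.

```typescript
// Fields named in the article; the exact shape is an assumption.
interface BillFields {
  usageKwh: number | null;
  billingPeriodStart: string; // ISO date
  billingPeriodEnd: string; // ISO date
  meterNumber: string;
}

// Stand-in for the model call that extracts fields from one transcribed page.
type PageExtractor = (pageText: string) => Partial<BillFields>;

function extractBill(
  pages: string[],
  extractPage: PageExtractor
): Partial<BillFields> {
  const [first, second] = pages;
  if (first === undefined) return {};

  const result: Partial<BillFields> = { ...extractPage(first) };

  // Constraint from the article: if usage is missing, check the second page.
  if (result.usageKwh == null && second !== undefined) {
    const fallback = extractPage(second);
    if (fallback.usageKwh != null) result.usageKwh = fallback.usageKwh;
  }
  return result;
}
```

Only the missing field is taken from the second page, so values already found on the first page are not overwritten.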
The validation agent doesn't look at the document. It looks at the extraction result. "Does the Start Date come before the End Date?" "Do the line items sum up to the total?" If not, it sends the job back to the Extraction Agent with feedback.
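The two checks and the feedback loop might be sketched as follows. The `Extraction` shape, the `validate` and `extractWithRetries` names, and the cap of three attempts are all assumptions for illustration; the article does not say how many retries are allowed or how feedback is encoded.

```typescript
interface LineItem {
  description: string;
  amount: number;
}

// Hypothetical shape of an extraction result.
interface Extraction {
  startDate: string; // ISO 8601 date, e.g. "2025-01-01"
  endDate: string;
  lineItems: LineItem[];
  total: number;
}

// Returns human-readable problems; an empty list means the result passed.
function validate(e: Extraction): string[] {
  const problems: string[] = [];
  // Unparseable dates yield NaN, which also fails this comparison.
  if (!(Date.parse(e.startDate) < Date.parse(e.endDate))) {
    problems.push("Start date must come before end date.");
  }
  // Compare in cents to avoid floating-point drift.
  const sumCents = e.lineItems.reduce(
    (acc, item) => acc + Math.round(item.amount * 100),
    0
  );
  if (sumCents !== Math.round(e.total * 100)) {
    problems.push(
      `Line items sum to ${(sumCents / 100).toFixed(2)}, but total is ${e.total.toFixed(2)}.`
    );
  }
  return problems;
}

// Stand-in for the Extraction Agent: receives feedback from the previous attempt.
type ExtractFn = (feedback: string[]) => Extraction;

// Retry cap is an assumption; it keeps a persistently failing job from looping forever.
function extractWithRetries(extract: ExtractFn, maxAttempts = 3): Extraction {
  let feedback: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = extract(feedback);
    feedback = validate(result);
    if (feedback.length === 0) return result;
  }
  throw new Error(
    `Validation failed after ${maxAttempts} attempts: ${feedback.join(" ")}`
  );
}
```

Because the checks are deterministic code rather than another prompt, a failed validation produces specific feedback the extraction step can act on, and a job that never passes surfaces as an error instead of silently entering the dataset.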
Based on industry implementations of multi-agent document processing, this architecture pattern typically delivers:
| Metric | Template-Based OCR | Multi-Agent AI |
|---|---|---|
| Error Rate | 15-25% extraction errors | 2-5% extraction errors |
| New Provider Handling | Developer intervention required | Automatic adaptation |
| Processing Speed | Minutes per document | Seconds per document |
| Layout Change Response | System breaks, needs update | Handles automatically |
| Scalability | Limited by template library | Unlimited document variety |
Specialised agents outperform generalist prompts. Breaking complex extraction into distinct phases - classification, transcription, extraction, validation - produces more reliable results than attempting everything in a single prompt.
This pattern applies beyond ESG reporting to any domain requiring extraction from varied document formats: invoice processing, contract analysis, medical records, and more.