
Consider a typical ESG reporting scenario: a business needs to ingest thousands of utility bills (electricity, gas, water) from hundreds of different providers, each with its own layout. Template-based OCR setups (such as per-layout AWS Textract queries) are brittle: they break whenever a layout changes.
Instead of one giant "Extract Everything" prompt, a more reliable approach breaks the problem down into a chain of specialised agents: classification, transcription, extraction, and validation.
The classification agent comes first. Role: look at the file and determine whether it is an electricity bill, a gas bill, or junk mail. Result: the document is routed to the correct specialised extractor.
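The routing step can be sketched as a small pure function. This is a minimal illustration, not the production router: the extractor names, the `confidence` field, and the 0.8 review threshold are all assumptions introduced here, and the classifier itself (the model call) is left out.

```typescript
// Hypothetical document types and extractor names, for illustration only.
type DocumentType = "electricity" | "gas" | "water" | "junk";

interface Classification {
  type: DocumentType;
  confidence: number; // 0..1, assumed to be reported by the classifier
}

type Route =
  | { action: "extract"; extractor: string }
  | { action: "discard"; reason: string }
  | { action: "review"; reason: string };

const EXTRACTORS: Record<Exclude<DocumentType, "junk">, string> = {
  electricity: "electricityExtractor",
  gas: "gasExtractor",
  water: "waterExtractor",
};

// Decide what happens to a document once it has been classified.
function routeDocument(c: Classification, minConfidence = 0.8): Route {
  const { type, confidence } = c;
  if (confidence < minConfidence) {
    // Assumption: uncertain classifications go to a human instead of an extractor.
    return { action: "review", reason: `low confidence (${confidence})` };
  }
  if (type === "junk") {
    return { action: "discard", reason: "not a utility bill" };
  }
  return { action: "extract", extractor: EXTRACTORS[type] };
}
```

Keeping the routing table separate from the classifier means a new bill type only needs a new entry and a new extractor, not a change to the classification prompt.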
For transcription, we use GPT-4o-mini (Vision). Unlike standard OCR, it can follow tables and column relationships, preserving the semantic structure of the bill.
The Extraction Agent comes next. Role: extract specific fields (kWh usage, billing period, meter number) into a structure validated by a Zod schema. Constraint: if usage is missing from the first page, check the second page.
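To make the target structure concrete, here is a dependency-free sketch of the kind of shape check a Zod schema would perform. It uses hand-written checks as a stand-in for Zod so it runs without any packages; the field names (`meterNumber`, `usageKwh`, `periodStart`, `periodEnd`) and the ISO date format are assumptions, since the pipeline above only names the fields informally.

```typescript
// Illustrative field names; not taken from a real schema.
interface ElectricityBill {
  meterNumber: string;
  usageKwh: number;
  periodStart: string; // ISO date, e.g. "2024-01-01"
  periodEnd: string;   // ISO date
}

type ParseResult =
  | { ok: true; bill: ElectricityBill }
  | { ok: false; errors: string[] };

const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

// Check the raw model output field by field and collect every problem found.
function parseElectricityBill(raw: unknown): ParseResult {
  const errors: string[] = [];
  const r = (typeof raw === "object" && raw !== null ? raw : {}) as Record<string, unknown>;
  const { meterNumber, usageKwh, periodStart, periodEnd } = r;

  if (typeof meterNumber !== "string" || meterNumber.trim() === "") {
    errors.push("meterNumber: expected a non-empty string");
  }
  if (typeof usageKwh !== "number" || !Number.isFinite(usageKwh) || usageKwh < 0) {
    errors.push("usageKwh: expected a non-negative number");
  }
  if (typeof periodStart !== "string" || !ISO_DATE.test(periodStart)) {
    errors.push("periodStart: expected YYYY-MM-DD");
  }
  if (typeof periodEnd !== "string" || !ISO_DATE.test(periodEnd)) {
    errors.push("periodEnd: expected YYYY-MM-DD");
  }

  if (errors.length > 0) return { ok: false, errors };
  return { ok: true, bill: { meterNumber, usageKwh, periodStart, periodEnd } as ElectricityBill };
}
```

Returning all errors at once, instead of failing on the first, gives the agent specific feedback to act on, for example a missing `usageKwh` can trigger the "check the second page" constraint.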
The validation agent doesn't look at the document; it looks at the extraction result and asks questions such as "Does the start date come before the end date?" and "Do the line items sum to the total?" If a check fails, it sends the job back to the Extraction Agent with feedback.
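The two checks and the feedback loop can be sketched as follows. This is an illustration under stated assumptions: the `Extraction` shape, the 0.01 rounding tolerance, and the cap of three attempts are hypothetical, and `extract` is a stand-in for the Extraction Agent, not a real model call.

```typescript
// Hypothetical shape of an extraction result.
interface Extraction {
  periodStart: string; // ISO date
  periodEnd: string;   // ISO date
  lineItems: number[]; // amounts in currency units
  total: number;
}

// Return a list of human-readable issues; an empty list means the result passed.
function validateExtraction(e: Extraction, tolerance = 0.01): string[] {
  const issues: string[] = [];
  // An unparseable date yields NaN, which fails this comparison and is reported too.
  if (!(Date.parse(e.periodStart) < Date.parse(e.periodEnd))) {
    issues.push("Start date must come before end date.");
  }
  const sum = e.lineItems.reduce((a, b) => a + b, 0);
  if (Math.abs(sum - e.total) > tolerance) {
    issues.push(`Line items sum to ${sum.toFixed(2)}, but total is ${e.total.toFixed(2)}.`);
  }
  return issues;
}

// Stand-in for the Extraction Agent: receives the previous round's feedback.
type Extractor = (feedback: string[]) => Extraction;

type Outcome =
  | { status: "accepted"; result: Extraction; attempts: number }
  | { status: "needs_review"; issues: string[]; attempts: number };

// Re-run extraction with validator feedback until it passes or attempts run out.
function extractWithValidation(extract: Extractor, maxAttempts = 3): Outcome {
  let feedback: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = extract(feedback);
    feedback = validateExtraction(result);
    if (feedback.length === 0) return { status: "accepted", result, attempts: attempt };
  }
  return { status: "needs_review", issues: feedback, attempts: maxAttempts };
}
```

Capping the retries keeps a document that never validates from looping indefinitely; what happens to it afterwards (here, a `needs_review` status) is a design choice not specified above.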
Based on industry implementations of multi-agent document processing, this architecture pattern typically delivers:
| Metric | Template-Based OCR | Multi-Agent AI |
|---|---|---|
| Extraction Error Rate | 15-25% | 2-5% |
| New Provider Handling | Developer intervention required | Automatic adaptation |
| Processing Speed | Minutes per document | Seconds per document |
| Layout Change Response | System breaks, needs update | Handles automatically |
| Scalability | Limited by template library | No template library to maintain |
Specialised agents outperform generalist prompts. Breaking complex extraction into distinct phases - classification, transcription, extraction, validation - produces more reliable results than attempting everything in a single prompt.
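One way to picture the four phases is as independent functions composed by a thin orchestrator. The sketch below is hypothetical: the `Stages` interface and the result statuses are invented for illustration, and each stage would wrap a model call in a real system.

```typescript
// Each phase is a plain function here; names and signatures are illustrative.
interface Stages {
  classify: (file: string) => "bill" | "junk";
  transcribe: (file: string) => string;
  extract: (text: string) => Record<string, unknown>;
  validate: (fields: Record<string, unknown>) => string[]; // list of issues
}

type PipelineResult =
  | { status: "skipped" }
  | { status: "ok"; fields: Record<string, unknown> }
  | { status: "invalid"; issues: string[] };

// Run the phases in order, stopping early for junk documents.
function runPipeline(file: string, s: Stages): PipelineResult {
  if (s.classify(file) === "junk") return { status: "skipped" };
  const text = s.transcribe(file);
  const fields = s.extract(text);
  const issues = s.validate(fields);
  return issues.length === 0 ? { status: "ok", fields } : { status: "invalid", issues };
}
```

Because each phase sits behind its own function, it can be tested with stubs and replaced (a different model, a stricter validator) without touching the others.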
This pattern applies beyond ESG reporting to any domain requiring extraction from varied document formats: invoice processing, contract analysis, medical records, and more.