
A pattern repeats itself across Australian mid-market businesses. A company builds a single AI agent, it works brilliantly for the demo, and then reality sets in. The agent that handled customer inquiries beautifully starts hallucinating when asked about billing. The document processor that extracted data flawlessly chokes when it encounters a new form type.
The uncomfortable truth? A single AI agent will fail you when tasks cross domains, require specialisation, or demand parallel processing.
According to recent industry analysis, organisations using multi-agent architectures achieve 45% faster problem resolution and 60% more accurate outcomes compared to single-agent systems. The AI agents market is projected to grow from $5.25 billion in 2024 to $52.62 billion by 2030, with multi-agent systems representing the fastest-growing segment.
This article is a technical deep dive for decision-makers who need to understand when and how to architect multi-agent systems. It uses a 7-phase document processing architecture (the "Carbonly" pattern) as a concrete example throughout.
Before discussing multi-agent systems, you need to understand precisely where single agents break down. Three failure modes appear consistently across deployments.
Single-agent systems bottleneck as task volume and data grow. They effectively work on a single thread, limited to one task at a time. In environments that demand rapid multitasking or high-volume processing, this becomes crippling.
Consider a Melbourne logistics company with a single agent handling shipment tracking queries. At 50 concurrent users, response times are acceptable. At 200 users during peak season, the system collapses to 45-second response times.
Research from Google in late 2024 revealed a critical insight: there is a potential trade-off within single models between strong memorisation (needed for precise tool use) and effective in-context learning (needed for adapting to novel situations). You cannot optimise for both in a single agent.
IBM experts summarise it bluntly: "You are going to hit a limit on what single agents can do, and then you are going to go back to multi-agent collaboration again."
Every prompt you send to an LLM has a finite context window. When a single agent must understand billing systems, customer history, product catalogues, and compliance rules simultaneously, you exhaust that window rapidly. The agent loses critical context and starts making errors.
The Rule of Thumb: If your task requires expertise in more than two distinct domains, or if you need to process multiple requests concurrently, a single agent will fail you.
Here's a multi-agent architecture that works for complex document processing. The Carbonly pattern emerged from work with a carbon accounting firm processing thousands of supplier invoices.
Phase 1 - Intake Agent: Receives documents, classifies type, extracts metadata. Uses a lightweight model (GPT-4o-mini equivalent) for speed.
Phase 2 - Validation Agent: Checks document completeness, identifies missing fields, flags anomalies. Specialised prompts for each document type.
Phase 3 - Extraction Agent: Deep extraction using a more capable model. OCR integration, table parsing, entity recognition.
Phase 4 - Enrichment Agent: Cross-references extracted data with external systems (ABN lookup, supplier databases, pricing catalogues).
Phase 5 - Compliance Agent: Checks against business rules, regulatory requirements, approval thresholds.
Phase 6 - Review Agent: Confidence scoring, exception flagging, human escalation decisions.
Phase 7 - Integration Agent: Formats output for downstream systems (MYOB, Xero, custom ERPs), handles API calls.
```
Document Input
      |
      v
[Phase 1: Intake Agent] --classify--> [Phase 2: Validation Agent]
      |                                         |
      |                                         v
      |                                [Phase 3: Extraction Agent]
      |                                         |
      |                                         v
      |                                [Phase 4: Enrichment Agent]
      |                                         |
      |                                         v
      |                                [Phase 5: Compliance Agent]
      |                                         |
      | escalate                                v
      |                                [Phase 6: Review Agent]
      |                                         |
      |                                         v
      |                                [Phase 7: Integration Agent]
      v                                         |
[Exception Queue]                               v
                                       [Completed Output]
```
Each agent has a single responsibility, uses a model optimised for its task, and communicates through structured handoffs. The system processes 400% more documents per hour than the single-agent version it replaced.
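In code, the core sequential flow reduces to a pipeline of phase functions passing a structured handoff from one to the next. The stub agents and `Handoff` structure below are an illustrative sketch, not the production implementation:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Handoff:
    """Structured state passed between phases."""
    document: str
    context: dict = field(default_factory=dict)

# Each phase is a function that consumes and returns a Handoff (stubs here).
def intake_agent(h: Handoff) -> Handoff:
    h.context["document_type"] = "invoice"   # classify + extract metadata
    return h

def validation_agent(h: Handoff) -> Handoff:
    h.context["validated"] = True            # completeness checks
    return h

PIPELINE: list[Callable[[Handoff], Handoff]] = [intake_agent, validation_agent]

def run_pipeline(document: str) -> Handoff:
    h = Handoff(document=document)
    for phase in PIPELINE:                   # strict ordering: each phase
        h = phase(h)                         # consumes the previous handoff
    return h

result = run_pipeline("supplier_invoice.pdf")
```

Extending to all seven phases is just a longer `PIPELINE` list; the single-responsibility boundary stays at the function level.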
According to Microsoft's AI architecture guidance, there are five primary orchestration patterns. Choosing the wrong one is the most common mistake.
Structure: Linear pipeline where each agent processes the previous agent's output.
Best For: linear workflows with strict stage dependencies, where each step consumes the previous step's output.
Avoid When: subtasks are independent of one another; a linear chain serialises work that could run in parallel and makes end-to-end latency the sum of every stage.
The Carbonly architecture uses sequential orchestration for its core flow because document processing has inherent dependencies. You cannot validate what you have not classified.
Structure: Multiple agents run simultaneously on the same task, then aggregate results.
Best For: independent analyses of the same input, where breadth of perspective and wall-clock speed matter more than strict ordering.
Example: A stock analysis system runs fundamental, technical, sentiment, and ESG analysis agents concurrently, then aggregates recommendations.
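The fan-out/aggregate shape of that example can be sketched with `asyncio`; the analyser stubs and the majority-vote aggregation rule are assumptions for illustration:

```python
import asyncio

# Stub analysers: in practice each would call its own model and data sources.
async def fundamental(ticker: str) -> dict:
    return {"agent": "fundamental", "recommendation": "buy"}

async def technical(ticker: str) -> dict:
    return {"agent": "technical", "recommendation": "hold"}

async def sentiment(ticker: str) -> dict:
    return {"agent": "sentiment", "recommendation": "buy"}

async def analyse(ticker: str) -> str:
    # Fan out: all agents work on the same task concurrently.
    results = await asyncio.gather(
        fundamental(ticker), technical(ticker), sentiment(ticker)
    )
    # Aggregate: simple majority vote across recommendations.
    votes = [r["recommendation"] for r in results]
    return max(set(votes), key=votes.count)

decision = asyncio.run(analyse("BHP"))  # "buy": two of three agents agree
```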
Structure: Dynamic delegation where agents assess tasks and transfer to specialists.
Best For: triage-style workflows where the right specialist cannot be known up front and must be determined from the request itself.
Critical Warning: Microsoft's guidance specifically recommends limiting group chat patterns to 3 or fewer agents to prevent infinite loops and maintain control.
Hub-and-spoke uses a central orchestrator managing all interactions. Predictable, but creates a bottleneck and single point of failure.
Mesh architectures let agents communicate directly. More resilient (agents route around failures), but harder to debug and monitor.
Recommendation: Start with hub-and-spoke for simplicity. Move to hybrid patterns (high-level orchestrators with local mesh networks for tactical execution) only when you have the observability infrastructure to support it.
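A minimal hub-and-spoke skeleton makes the trade-off concrete: every message passes through one orchestrator, which is easy to log and monitor but is also the bottleneck and single point of failure noted above. The registry API here is illustrative:

```python
class Hub:
    """Central orchestrator: all agent-to-agent traffic passes through it."""

    def __init__(self):
        self.agents = {}  # name -> handler function

    def register(self, name, handler):
        self.agents[name] = handler

    def send(self, target, message):
        # Single choke point: one place to add logging, metrics, and
        # access control -- and one place whose failure stops everything.
        if target not in self.agents:
            raise KeyError(f"unknown agent: {target}")
        return self.agents[target](message)

hub = Hub()
hub.register("billing", lambda msg: f"billing handled: {msg}")
hub.register("shipping", lambda msg: f"shipping handled: {msg}")

reply = hub.send("billing", "invoice #123 query")
```

A mesh variant would let agents hold references to each other directly, trading that central visibility for resilience.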
This is where implementations succeed or fail. The technical details matter enormously.
An agentic handoff occurs when one agent directly and dynamically passes control to another after finishing its work. The critical element is context transfer: the receiving agent must have sufficient state to act appropriately.
In technical terms, handoffs involve:
```python
# Conceptual handoff structure
handoff_payload = {
    "source_agent": "validation_agent",
    "target_agent": "extraction_agent",
    "context": {
        "document_type": "invoice",
        "confidence": 0.94,
        "validated_fields": ["vendor_name", "date", "total"],
        "flagged_anomalies": []
    },
    "instructions": "Extract line items and payment terms"
}
```
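On the receiving side, the target agent should refuse a handoff whose context is incomplete rather than guess. A minimal guard, with the required keys and the 0.8 acceptance threshold assumed for illustration:

```python
REQUIRED_CONTEXT = {"document_type", "confidence", "validated_fields"}

def accept_handoff(payload: dict) -> bool:
    """Accept only handoffs that carry enough state to act on."""
    context = payload.get("context", {})
    if REQUIRED_CONTEXT - context.keys():
        # Reject rather than guess: a handoff with missing state is a
        # common root cause of silent multi-agent failures.
        return False
    return context["confidence"] >= 0.8  # illustrative acceptance threshold

# A complete, high-confidence handoff passes the guard...
ok = accept_handoff({"context": {
    "document_type": "invoice", "confidence": 0.94,
    "validated_fields": ["vendor_name", "date", "total"]}})
# ...while one missing its document_type is rejected.
bad = accept_handoff({"context": {"confidence": 0.94, "validated_fields": []}})
```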
Four major protocols have emerged for agent communication:
| Protocol | Purpose | Use Case |
|---|---|---|
| MCP (Model Context Protocol) | Tool and context sharing | Agents sharing access to databases, APIs |
| A2A (Agent-to-Agent) | Direct agent negotiation | Peer-to-peer workflows without central orchestration |
| ACP (Agent Communication Protocol) | Structured message passing | Enterprise systems with strict data contracts |
| AG-UI | Agent-user interaction | Handling human-in-the-loop touchpoints |
For most Australian mid-market implementations, MCP provides the right balance of standardisation and flexibility.
Here is what vendors will not tell you: accumulated context across multiple agents can exhaust token budgets rapidly. The Microsoft architecture guide explicitly warns about "growing context windows" leading to "token exhaustion."
Practical Solutions: summarise accumulated context at each handoff rather than forwarding full transcripts; pass references (document IDs, record keys) instead of raw payloads; give each agent only the context slice its task requires; and track context size per handoff so growth is caught before the budget is exhausted.
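A simple mitigation, trimming accumulated history to a fixed token budget before each handoff, can be sketched as follows. The four-characters-per-token ratio is a rough heuristic, not a real tokeniser:

```python
MAX_CONTEXT_TOKENS = 2000
CHARS_PER_TOKEN = 4  # rough heuristic; use a real tokeniser in production

def trim_context(history: list[str]) -> list[str]:
    """Keep the most recent messages that fit within the token budget."""
    budget = MAX_CONTEXT_TOKENS * CHARS_PER_TOKEN
    kept, used = [], 0
    for message in reversed(history):       # newest first
        if used + len(message) > budget:
            break
        kept.append(message)
        used += len(message)
    return list(reversed(kept))             # restore chronological order

history = ["old note " * 3000, "recent validation summary", "extraction result"]
trimmed = trim_context(history)
# The oversized old message is dropped; the recent messages survive.
```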
When multiple agents analyse the same problem, they will sometimes disagree. This is not a bug; it is often a feature. But you need mechanisms to resolve conflicts.
Each agent provides a confidence score with its output. The system weights votes by confidence.
```python
# Simplified confidence-weighted voting mechanism
agent_outputs = [
    {"agent": "fundamental", "recommendation": "buy", "confidence": 0.82},
    {"agent": "technical", "recommendation": "hold", "confidence": 0.71},
    {"agent": "sentiment", "recommendation": "buy", "confidence": 0.68},
]

def calculate_weighted_consensus(outputs):
    # Sum confidence per recommendation, pick the heaviest option,
    # then report the average confidence of that option's supporters.
    totals = {}
    for o in outputs:
        totals[o["recommendation"]] = totals.get(o["recommendation"], 0.0) + o["confidence"]
    winner = max(totals, key=totals.get)
    supporters = [o["confidence"] for o in outputs if o["recommendation"] == winner]
    return winner, sum(supporters) / len(supporters)

weighted_vote = calculate_weighted_consensus(agent_outputs)
# Result: "buy" with aggregated confidence 0.75
```
Designate certain agents as authoritative for specific domains. If the compliance agent flags a risk, it overrides the efficiency recommendations from other agents.
When agent confidence falls below a threshold, or when agents disagree beyond a tolerance level, escalate to human review. This is not failure; it is appropriate system design.
In the Carbonly implementation, documents where agents disagree by more than 20% on extracted values automatically route to a human reviewer. This typically catches approximately 3% of documents and prevents costly errors.
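A relative-disagreement guard like this takes only a few lines; the 20% threshold matches the figure above, while the function and example values are illustrative:

```python
ESCALATION_THRESHOLD = 0.20  # 20% relative disagreement, as above

def needs_human_review(values: list[float]) -> bool:
    """Escalate when agents' extracted values disagree by more than 20%."""
    lo, hi = min(values), max(values)
    if hi == 0:
        return lo != hi
    return (hi - lo) / hi > ESCALATION_THRESHOLD

# Two agents extract the same invoice total: 1480 vs 1100 is ~26% apart,
# so the document routes to a human reviewer.
flag = needs_human_review([1480.0, 1100.0])
```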
Errors in multi-agent systems are fundamentally different from traditional software errors. Because agents develop dynamic, context-dependent relationships, a single agent's failure cascades unpredictably: other agents have come to depend on its specific knowledge and decision-making patterns, and state synchronisation becomes nearly impossible at scale.
Traditional circuit breakers assume stateless services. AI agents violate this assumption. Deploy circuit breakers at the cluster level rather than individual connections:
```
Agent Cluster A (Intake + Validation)
        |
[Circuit Breaker - monitors cluster health]
        |
Agent Cluster B (Extraction + Enrichment)
        |
[Circuit Breaker - monitors cluster health]
        |
Agent Cluster C (Compliance + Review + Integration)
```
Use adaptive triggers monitoring interaction success rates, response times, and behavioural anomalies rather than fixed thresholds.
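A cluster-level breaker driven by a rolling success rate, rather than a fixed failure count, might look like this sketch (the window size and trip threshold are illustrative):

```python
from collections import deque

class ClusterCircuitBreaker:
    """Opens when a cluster's recent success rate drops below a floor."""

    def __init__(self, window: int = 50, min_success_rate: float = 0.8):
        self.results = deque(maxlen=window)   # rolling window, adapts over time
        self.min_success_rate = min_success_rate

    def record(self, success: bool) -> None:
        self.results.append(success)

    @property
    def open(self) -> bool:
        # Breaker opens (blocks traffic to the cluster) once the rolling
        # success rate across the whole cluster falls below the floor.
        if len(self.results) < 10:            # not enough signal yet
            return False
        rate = sum(self.results) / len(self.results)
        return rate < self.min_success_rate

breaker = ClusterCircuitBreaker()
for outcome in [True] * 8 + [False] * 4:      # 8 successes, then 4 failures
    breaker.record(outcome)
# 8/12 success rate is below the 0.8 floor, so the breaker opens.
```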
A common early mistake: using average response times for timeout configuration. LLM inference varies dramatically. Use 95th percentile response times to capture realistic worst-case behaviour. This prevents premature timeouts and false failure signals.
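Deriving that timeout from observed latencies is straightforward; the sample values and the 1.5x headroom multiplier below are illustrative:

```python
import statistics

# Observed end-to-end latencies (seconds) for one agent, e.g. from tracing.
latencies = [1.2, 1.4, 1.1, 9.8, 1.3, 1.5, 2.0, 1.2, 7.5, 1.6,
             1.3, 1.4, 1.8, 1.2, 8.9, 1.5, 1.3, 1.7, 1.4, 1.6]

# statistics.quantiles with n=20 returns 19 cut points at 5% steps;
# the last one (index 18) is the 95th percentile.
p95 = statistics.quantiles(latencies, n=20)[18]

# Timeout = p95 plus headroom, not the mean: the slow outliers that
# dominate the p95 barely move the average.
timeout = p95 * 1.5
mean = statistics.mean(latencies)  # misleadingly low next to the p95
```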
For GPT-4 class models, typical configurations include:
When systems fail, you cannot simply restart everything. Map explicit dependencies (data flow) and implicit ones (learned coordination patterns). Implement staged recovery: restore foundational agents first, verify their outputs against known-good baselines, then bring dependent clusters back online one at a time.
Traditional APM tools will not tell you why your multi-agent system is misbehaving. You need specialised observability.
LangSmith and similar platforms provide nested spans for fine-grained debugging across multi-agent environments. Every agent-level decision and sub-action, including LLM generations, tool calls, and data retrievals, gets captured.
LangSmith's tracing operates asynchronously, adding virtually no measurable overhead, which makes it suitable for performance-critical production environments.
| Metric | Why It Matters |
|---|---|
| Per-agent latency (p50, p95, p99) | Identifies bottleneck agents |
| Handoff success rate | Detects communication failures |
| Context size per handoff | Warns of token exhaustion |
| Agent confidence distributions | Catches model degradation |
| Error rate by agent type | Focuses debugging effort |
| Human escalation rate | Measures system confidence |
LangSmith offers enterprise-grade alerting via PagerDuty and webhooks. Configure alerts for: error-rate spikes by agent type, p95/p99 latency regressions, handoff failures, and sustained rises in the human escalation rate.
For Australian enterprises requiring data sovereignty, LangSmith offers self-hosted deployments on your Kubernetes cluster where data never leaves your environment.
Based on implementations across Australian businesses, here is practical guidance.
Google research found that if a task is sequential and a single agent could perform it accurately at least 45% of the time, using multiple agents actually reduced performance by 39% to 70%. The coordination overhead overwhelms the benefits.
Only introduce multi-agent complexity when you have: tasks spanning more than two distinct domains, genuine concurrency requirements, measured evidence that a single agent is hitting its limits, and the observability infrastructure to debug agent interactions.
Microsoft's architecture guidance recommends limiting agent groups to 3 or fewer to maintain control. Start there. You can always add complexity; removing it is much harder.
Multi-agent systems multiply your inference costs. The Carbonly implementation counters this with a tiered model strategy: a lightweight model (GPT-4o-mini class) for high-volume phases such as intake, with a more capable model reserved for the phases that need deep reasoning, such as extraction.
This tiered approach reduced inference costs by 65% compared to using capable models throughout.
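The exact saving depends on your document mix and model prices; with hypothetical per-phase costs the arithmetic looks like this (both prices and the phase split are assumptions, not vendor quotes):

```python
# Hypothetical per-document inference costs (illustrative, not vendor pricing).
CAPABLE_COST = 0.050      # capable model, per document, per phase
LIGHT_COST = 0.005        # lightweight model, per document, per phase

PHASES = 7
HEAVY_PHASES = 2          # phases that keep the capable model (assumed split)

# Baseline: the capable model in every phase.
baseline = PHASES * CAPABLE_COST

# Tiered: the capable model only where it earns its keep.
tiered = HEAVY_PHASES * CAPABLE_COST + (PHASES - HEAVY_PHASES) * LIGHT_COST

saving = 1 - tiered / baseline
# Roughly 64% with these assumptions, in line with the reduction above.
```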
For Australian businesses processing sensitive data: keep inference and observability data onshore (or self-hosted, as LangSmith supports), maintain audit trails of every agent decision and handoff, and confirm your deployment meets Australian Privacy Principles obligations.
Multi-agent AI is not about having more agents; it is about having the right agents, doing the right things, communicating effectively.
Use multi-agent architectures when: tasks cross more than two domains, demand parallel processing, require deep specialisation, or exhaust a single agent's context window.
Stay with single agents when: the workflow is sequential, narrow in scope, and already handled accurately; Google's research above shows that adding agents to such tasks reduces performance.
The Carbonly 7-phase architecture works because each agent has clear responsibility, uses an appropriate model, and communicates through well-defined handoffs. The orchestration layer handles failures gracefully. The monitoring infrastructure provides visibility into every decision.
Start small. Measure ruthlessly. Add complexity only when the data demands it.
Ready to evaluate multi-agent architecture for your business? Book a technical consultation with our engineering team. We will assess your specific workflows and recommend whether multi-agent complexity is justified for your use case.
Sources: Research synthesised from Microsoft Azure AI Agent Design Patterns, Towards Data Science on Agent Handoffs, Galileo Multi-Agent Failure Recovery, LangChain LangSmith Observability, and IBM AI Agents 2025, with Australian enterprise implementation experience.