
MIT's NANDA initiative found that roughly 95% of generative AI pilots at enterprises have no measurable impact on profit and loss (MIT, "The GenAI Divide: State of AI in Business 2025"). Gartner predicted that 30% of generative AI projects would be abandoned after proof of concept by the end of 2025, citing poor data quality, escalating costs, and unclear business value (Gartner, July 2024). And an S&P Global survey found that 42% of companies abandoned most of their AI initiatives in 2025, up from 17% in 2024.
These numbers are sobering. But they are also instructive.
Most AI pilots do not fail because the technology does not work. They fail because organisations treat them as technology experiments rather than business validation exercises. They scope too broadly, measure the wrong things, and lack executive commitment to act on the results. The MIT research specifically noted that the principal barrier is integration issues, not weaknesses in the underlying AI models.
The 4-week framework below is designed around common patterns identified in research from MIT, Gartner, McKinsey, and Deloitte, as well as practical experience from enterprise data platform programmes at companies like BHP and Rio Tinto. It answers one question: Should we invest further in this AI solution, or pivot to something else?
Four weeks is enough time to validate feasibility and business value without burning through budget or patience.
Each week has specific deliverables, checkpoints, and go/no-go criteria. Skip any of these, and you risk the common failure modes that derail the majority of AI pilots.
Do not start the 4-week clock until you can answer these questions:
| Requirement | Question | Ready? |
|---|---|---|
| Executive Sponsor | Who has authority to approve budget for production? | [ ] |
| Problem Owner | Who lives with this problem daily and will test the solution? | [ ] |
| Data Access | Can you extract 3-6 months of representative data this week? | [ ] |
| Success Definition | Can you define success in one measurable sentence? | [ ] |
| Resource Commitment | Do you have 10-15 hours/week from key stakeholders? | [ ] |
| Budget Clarity | Is there approved budget for production if the POC succeeds? | [ ] |
If you cannot tick all six boxes, spend time on these first. A POC without clear success criteria is a science experiment. A POC without executive sponsorship will stall at the decision point.
Deep Dive: If you are still building your broader AI roadmap, see our step-by-step AI strategy guide before committing to a POC.
Objective: Lock down exactly what you are testing and how you will measure success, and confirm that the data is available.
Time commitment: 15-20 hours total (stakeholder time)
Gather the executive sponsor, problem owner, IT representative, and end users. In 2-3 hours, answer:
What specific problem are we solving?
What is this problem costing us today?
What does "good enough" look like?
What are the boundaries?
Write down exactly how you will measure success. Be specific:
Primary Metric:
Secondary Metrics:
Qualitative Criteria:
Go/No-Go Threshold:
This is where most POCs fail before they start. Industry research consistently shows that data preparation consumes 60-80% of AI project timelines, yet organisations routinely underestimate this phase.
Data audit checklist:
| Data Requirement | Status | Notes |
|---|---|---|
| Can we access the source system? | [ ] | |
| Can we extract 3-6 months of data? | [ ] | |
| Is the data labelled (outcomes known)? | [ ] | |
| What is the data format? | | |
| What cleaning is required? | | |
| What edge cases exist? | | |
| Who approves data use for testing? | | |
Red flags in data audit:
If any of these emerge, pause the POC and resolve data access first. From our work on data platform programmes at major mining operations, one pattern was clear: the organisations that invested in data readiness before starting AI work consistently achieved better outcomes than those that tried to fix data problems mid-project.
| Deliverable | Description | Owner |
|---|---|---|
| Problem Statement | 1-paragraph definition of the problem | Project Lead |
| Success Criteria | Documented metrics with thresholds | Project Lead + Sponsor |
| Data Inventory | List of data sources, formats, access confirmed | IT + Data Owner |
| Stakeholder Matrix | Who does what during the POC | Project Lead |
| Risk Register | Top 5 risks with mitigation plans | Project Lead |
At the end of Week 1, the sponsor must decide: Proceed or pause.
Proceed if:
Pause if:
Objective: Build a working prototype that processes real data and produces usable output.
Time commitment: 20-30 hours development, 5-10 hours stakeholder involvement
Build the minimum viable solution. Not the "nice to have" solution. The question this week answers is: "Can this technology do what we need at a basic level?"
MVP scope rules:
| Feature | Full Solution | MVP for POC | Rationale |
|---|---|---|---|
| Invoice formats | Multi-format upload | PDF upload only | Focused |
| Matching | Auto-matching to POs | Manual verification | Simpler |
| Integration | Direct Xero/MYOB API | CSV export to import | Faster |
| Notifications | Email + SMS alerts | Dashboard only | Minimal |
| Supplier coverage | 50+ suppliers | 10 representative suppliers | Targeted |
Connect the MVP to real data. This is the integration test, not a production build.
For the POC, manual steps are acceptable. The goal is to prove the AI component works, not to build production automation.
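To make "manual steps are acceptable" concrete, here is a minimal sketch of what a POC-grade pipeline can look like in Python. The folder paths, field names, and the `extract_fields()` stub are assumptions for illustration; the real AI call depends on the tool you are evaluating.

```python
# Minimal sketch of a POC-grade pipeline: read source files, run the AI step,
# and export a CSV for manual import into the finance system. The folder paths,
# field names, and the extract_fields() stub are illustrative assumptions, not
# a real integration.
import csv
from pathlib import Path


def extract_fields(pdf_path: Path) -> dict:
    # Placeholder for the AI step being evaluated (e.g. an OCR or LLM call).
    # A real POC would call the candidate model or API here.
    return {"file": pdf_path.name, "supplier": "", "invoice_number": "", "total": ""}


def run_batch(input_dir: str, output_csv: str) -> None:
    rows = []
    for pdf in sorted(Path(input_dir).glob("*.pdf")):
        try:
            rows.append(extract_fields(pdf))
        except Exception as exc:  # record the failure instead of crashing the batch
            rows.append({"file": pdf.name, "error": str(exc)})
    with open(output_csv, "w", newline="") as fh:
        writer = csv.DictWriter(
            fh, fieldnames=["file", "supplier", "invoice_number", "total", "error"]
        )
        writer.writeheader()
        writer.writerows(rows)  # the finance team imports this file manually


if __name__ == "__main__":
    run_batch("poc_invoices", "poc_results.csv")
```

The CSV hand-off is deliberately crude: it proves the AI component on real data while deferring the accounting-system integration to the production build.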
End users need to interact with the solution, even in a POC. This does not need to be polished, but it needs to be usable.
Acceptable POC interfaces:
Not acceptable:
| Deliverable | Description | Owner |
|---|---|---|
| Working MVP | Functional prototype on real data | Development Team |
| Data Pipeline | Documented extraction and transformation | Development + IT |
| User Interface | Basic but usable interaction method | Development |
| Initial Results | First batch processed with accuracy noted | Development |
| Issue Log | Known bugs and limitations documented | Development |
Mid-week check-in with sponsor (30 minutes):
Objective: Put the MVP in front of real users, test edge cases, and iterate based on feedback.
Time commitment: 15-20 hours user testing, 15-20 hours development refinement
The problem owner and 2-3 end users test the solution on routine cases.
Testing protocol:
What to capture:
| Example | Time (AI) | Time (Manual) | Accurate? | User Comments |
|---|---|---|---|---|
| INV-001 | 45 sec | 8 min | Yes | Clear output |
| INV-002 | 1 min | 7 min | Yes | Needed clarification |
| INV-003 | Failed | N/A | N/A | Missing supplier |
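If the log is captured in a structured form, turning it into decision-ready numbers takes only a few lines. The sketch below mirrors the table above; the records and field names are assumptions for illustration.

```python
# Sketch of summarising the user-testing log into the metrics the Week 4
# decision needs. Records mirror the table above; field names are assumptions.
test_log = [
    {"case": "INV-001", "ai_seconds": 45, "manual_seconds": 480, "accurate": True},
    {"case": "INV-002", "ai_seconds": 60, "manual_seconds": 420, "accurate": True},
    {"case": "INV-003", "ai_seconds": None, "manual_seconds": None, "accurate": False},  # failed case
]

completed = [r for r in test_log if r["ai_seconds"] is not None]
accurate_count = sum(r["accurate"] for r in test_log)
avg_saving_min = sum(r["manual_seconds"] - r["ai_seconds"] for r in completed) / len(completed) / 60

print(f"Accuracy: {accurate_count}/{len(test_log)} cases ({accurate_count / len(test_log):.0%})")
print(f"Average time saved per completed case: {avg_saving_min:.1f} minutes")
```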
Now test the scenarios that break things:
Edge case log:
| Edge Case | Result | Severity | Resolution |
|---|---|---|---|
| Missing ABN | Failed silently | High | Add validation message |
| Handwritten notes | 30% accuracy | Medium | Flag for manual review |
| 100 invoices at once | Timeout after 50 | High | Batch processing needed |
| Non-PDF format | Rejected | Medium | Accept PNG/JPG |
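The resolutions in the last column follow one pattern: never fail silently; validate the output and route anything incomplete or low-confidence to manual review. A small illustration of that triage pattern, with assumed field names and thresholds:

```python
# Illustrative triage for extraction results: validate required fields and route
# low-confidence or incomplete output to manual review rather than failing
# silently. Field names and the 0.80 threshold are assumptions for the sketch.
def triage(result: dict, confidence: float) -> str:
    if not result.get("abn"):
        return "manual review: missing ABN"      # surface the gap instead of a silent failure
    if confidence < 0.80:
        return "manual review: low confidence"   # e.g. handwritten notes
    return "auto-approve"


print(triage({"abn": "12 345 678 901", "total": 1200.00}, confidence=0.92))  # auto-approve
print(triage({"total": 950.00}, confidence=0.95))                            # manual review: missing ABN
```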
Based on testing feedback, make targeted improvements:
Scope discipline: Do not add new features in Week 3. Fix what is broken. Document what needs future work.
| Deliverable | Description | Owner |
|---|---|---|
| User Testing Report | Time savings, accuracy rates, usability scores | Problem Owner |
| Edge Case Analysis | Documented failures with severity ratings | Development |
| Iteration Log | Changes made based on feedback | Development |
| Known Limitations | Documented constraints for production | Development |
| Updated Accuracy | Refined accuracy metrics post-iteration | Development |
End-of-week review with sponsor and problem owner (1 hour):
Objective: Analyse results, make a go/no-go decision, and define next steps.
Time commitment: 10-15 hours analysis and documentation, 2-3 hour decision meeting
Compile all testing data and measure against Week 1 success criteria. A typical results summary might look like this:
Primary Metric:
Secondary Metrics:
Qualitative Assessment:
Based on POC results, project production ROI. Consider a typical mid-market business processing 500 invoices per month:
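A minimal sketch of that projection is below. Apart from the 500-invoice volume, every figure is an assumption for illustration; substitute the time savings and costs you measured during the POC.

```python
# Illustrative ROI projection for ~500 invoices per month. All figures other
# than the volume are assumptions; replace them with your POC measurements.
invoices_per_month = 500
minutes_saved_per_invoice = 7        # assumed, from POC timing data
hourly_cost_aud = 60.0               # assumed fully loaded labour cost per hour
production_build_cost = 40_000.0     # assumed one-off build cost
monthly_running_cost = 800.0         # assumed hosting, licences and support

monthly_saving = invoices_per_month * minutes_saved_per_invoice / 60 * hourly_cost_aud
net_monthly_benefit = monthly_saving - monthly_running_cost
payback_months = production_build_cost / net_monthly_benefit

print(f"Gross monthly saving: ${monthly_saving:,.0f}")
print(f"Net monthly benefit:  ${net_monthly_benefit:,.0f}")
print(f"Payback period:       {payback_months:.1f} months")
```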
See our AI ROI Calculator for detailed ROI frameworks tailored to Australian businesses.
Document everything for the decision meeting. The POC Final Report should cover:
Present findings to the sponsor and key stakeholders. This meeting should result in a clear decision.
| Deliverable | Description | Owner |
|---|---|---|
| Final Report | Comprehensive POC documentation | Project Lead |
| ROI Analysis | Financial projection for production | Project Lead + Finance |
| Decision | Documented Go/No-Go/Pivot | Sponsor |
| Production Plan | If Go: timeline, budget, resources | Project Lead |
| Lessons Learned | If Stop/Pivot: what to do differently | Project Lead |
Research from MIT, Gartner, and Deloitte consistently identifies the same patterns that derail AI pilots. Here are six to watch for:
What happens: "While we are at it, can we also add..." Feature requests expand scope until the POC becomes a full project.
How to avoid: Lock scope in Week 1. Any new requests go on a "Phase 2" list. The POC answers one question: Does this core capability work?
What happens: The team spends 3 weeks cleaning data before testing. The POC runs out of time.
How to avoid: Test with 80% clean data. Document data quality issues as production requirements, not POC blockers.
What happens: The executive sponsors the project, IT builds it, but the team lead who will actually use it is consulted once for 30 minutes.
How to avoid: The problem owner must be involved 5-10 hours per week. They test, they provide feedback, they validate results. MIT's research found that empowering line managers -- not just central AI labs -- was a key factor in the 5% of pilots that succeeded.
What happens: "The model achieved 97% accuracy!" But nobody asked if that accuracy translates to business value.
How to avoid: Always measure business outcomes (time saved, errors reduced, revenue impact), not just technical metrics. This is one of the central findings in why AI projects fail.
What happens: POC finishes, the report sits on a desk, no decision is made, and the project drifts.
How to avoid: Schedule the decision meeting before the POC starts. The sponsor must commit to attending and deciding.
What happens: Week 1 reveals data is scattered across 14 systems with no consistent format. The POC stalls.
How to avoid: The data audit in Week 1 is non-negotiable. If data is not accessible, pause the POC clock until it is. This pattern is also explored in our guide on why 70% of AI projects fail in Australia.
A successful POC is not the finish line. It is the starting point for production planning.
| Capability | POC State | Production Requirement | Priority |
|---|---|---|---|
| Data extraction | Manual export | Automated API integration | Essential |
| Data sources | Single source | Multi-source consolidation | Essential |
| Edge cases | Happy path only | Full edge case handling | Essential |
| Interface | Basic UI | User-friendly interface | High |
| Triggers | Manual triggers | Scheduled automation | High |
| Error handling | Minimal | Comprehensive logging and alerts | Essential |
| Code quality | Prototype code | Production-grade architecture | Essential |
Understanding realistic costs is critical for Australian mid-market businesses. According to Australian AI development consultancies (Dataclysm, 2025; Lanex, 2025), typical investment ranges are:
These are Australian mid-market figures. Enterprise scales higher; smaller deployments using off-the-shelf AI tools can be significantly lower. For a detailed analysis of build vs. buy economics, see our complete TCO guide.
Deloitte Australia reports that only 65% of Australian respondents plan to increase AI investment in the next financial year, nearly 20% lower than the global average. This suggests many Australian organisations are still cautious -- making a well-structured POC even more important for securing ongoing investment (Deloitte, "State of AI in the Enterprise", 2026).
Deloitte's 2026 State of AI report found that while 28% of Australian respondents have moved at least 40% of their AI pilots into production, most have yet to see broad enterprise-wide impact. Over half expect to reach this milestone within the next six months.
For Australian SMBs specifically, the AI strategy challenge is compounded by:
The 4-week framework accounts for these realities by keeping scope tight, requiring real data from Day 1, and building in decision points that prevent open-ended spending.
Ready to run a 4-week POC? Here is your immediate action plan:
This week:
Do not do:
Need guidance on your POC?
We run AI POC engagements for Australian businesses. Fixed scope, fixed timeline, fixed price. At the end of 4 weeks, you know whether to invest further.
No lock-in. No upsell. Just answers.
Related Reading:
Sources: Research synthesised from MIT NANDA initiative "The GenAI Divide: State of AI in Business 2025", Gartner press release on generative AI project abandonment (July 2024), S&P Global AI project survey (2025), Deloitte Australia "State of AI in the Enterprise" (2026), Australian Department of Industry "AI Adoption in Australian Businesses Q1 2025", Dataclysm AI development cost analysis (2025), and HBR "Most AI Initiatives Fail: A 5-Part Framework" (November 2025).