    Business Strategy

    Are Your Staff Leaking Business Data Into ChatGPT? An Australian Risk Guide

    Apr 22, 2026 | By Solve8 Team | 12 min read

    [Image: Illustration of data flowing from a business laptop into an external AI model, representing accidental data leakage]

    Your Sales Manager Just Pasted a Client Contract Into ChatGPT

    Right now, somewhere in your business, a well-meaning staff member is pasting something sensitive into a free AI tool. A draft contract. A spreadsheet of customer emails. A supplier price list. A board paper. They are trying to get work done faster, and on the surface, it looks harmless.

    The problem is that on the default settings of many consumer AI tools, that input may be retained, reviewed by humans for quality checks, and in some cases used to improve future models. For a midsize Australian business, that is not just a productivity question. It is a data governance, contractual, and Privacy Act question.

    This guide explains, in plain language, how the leak happens, what the real risk is under Australian law, how consumer and enterprise AI tiers actually differ, and what a practical control framework looks like for a business of 50 to 500 employees.

    The core problem: Most "free" AI tools are not free. You are paying with your inputs. The question is whether the inputs you are paying with belong to you or to your customers.


    How the Leak Actually Happens

    The typical leakage pattern has nothing to do with hackers or sophisticated attacks. It is simply staff using the wrong tier of tool for work data.

    How Business Data Ends Up in a Public Model

    1. Staff has a task: summarise a contract, draft an email, clean up a spreadsheet.
    2. Pastes into free tool: ChatGPT Free, Gemini Free, Copilot consumer tier.
    3. Input is retained: default settings retain conversations, often offshore.
    4. Used for model improvement: depending on tier and settings, content may train future models.
    5. Data is no longer yours: you cannot pull it back, and you may not know it left.

    In 2023, Samsung publicly confirmed that engineers had pasted sensitive source code and internal meeting notes into ChatGPT while trying to debug and summarise work. The incident was widely reported by Bloomberg and other outlets, and Samsung responded by restricting generative AI use on corporate devices. The technical failure was not a breach of OpenAI. It was a policy failure at Samsung: staff used a consumer-tier tool for confidential work.

    That same pattern plays out quietly across Australian midsize businesses every week. The difference is that Samsung had the scale to detect it and publish it. Most businesses do not.


    What the Australian Privacy Act Actually Says

    Under the Privacy Act 1988 and the Australian Privacy Principles (APPs), if your business handles personal information, you have specific obligations around how it is stored, disclosed, and sent overseas. The Office of the Australian Information Commissioner (OAIC) has published guidance in 2024 and 2025 specifically addressing generative AI.

    The short version: pasting personal information into an offshore AI service can constitute a cross-border disclosure under APP 8. If that service can then use the data to train a model, you may also be breaching APP 6 (use and disclosure) and APP 11 (security of personal information). If the incident is likely to result in serious harm to the individuals affected, it is notifiable under the Notifiable Data Breaches (NDB) scheme. Separately, under the 2022 amendments, the maximum civil penalty for a body corporate for serious or repeated interference with privacy is now the greater of 50 million dollars, three times the value of any benefit obtained, or 30 percent of adjusted turnover in the relevant period.

    The risk is not theoretical. It is a direct, already-legislated exposure. And "we didn't know the tool kept the data" is not a defence.

    | Privacy Principle | Typical AI Tool Risk |
    |---|---|
    | APP 6 (Use and disclosure) | Data pasted into AI may be used beyond original purpose |
    | APP 8 (Cross-border disclosure) | Most consumer AI tools process data in the US or EU |
    | APP 11 (Security) | Retention on third-party servers outside your control |
    | NDB Scheme | Serious breaches must be notified to OAIC and affected individuals |

    Consumer Tier vs Enterprise Tier: The Real Differences

    This is where most businesses get it wrong. They either assume all AI tools are dangerous, or they assume the paid version fixes everything. Neither is true. The picture is more specific.

    Major AI vendors now offer clearly separated consumer and enterprise tiers with different data handling commitments. These are public, published policies, and they matter.

    Consumer vs Enterprise AI Tier Data Handling (as published by vendors)

    | Metric | Consumer Tier (Free / Personal) | Enterprise Tier (Business / Work) | Improvement |
    |---|---|---|---|
    | Default training use | May be used for training (opt-out available) | Not used for training by default | Contractual |
    | Data retention | Conversations retained indefinitely in account | Configurable retention, often 30 days | Controllable |
    | Admin controls | None, per-user settings only | Centralised admin, SSO, audit logs | Governable |
    | Data residency | Usually US-hosted, no choice | Regional options available on some plans | Selectable |
    | Contractual protections | Consumer T&Cs only | Enterprise DPA, indemnities, SOC 2 reports | Enforceable |

    The enterprise tiers of ChatGPT (Enterprise and Team), Microsoft 365 Copilot, Gemini for Google Workspace, and Claude for Work all publish commitments that inputs and outputs will not be used to train their foundation models. That is a real, contractual difference, and for many midsize businesses it is enough.

    What it does not solve is data residency. Most of these enterprise tiers still process data in the US or EU by default. For businesses working with highly regulated customers (government, health, defence, critical infrastructure), that offshore processing can still be a problem even when training is off the table.


    Where Data Sovereignty Comes In

    For workloads where offshore processing is not acceptable, there are three broad technical options, each with different trade-offs.

    Choosing the Right AI Approach for Your Data

    What type of data will the AI process?
    • Public / marketing content, no PII → Enterprise-tier tools are fine (ChatGPT Enterprise, Microsoft 365 Copilot)
    • Customer PII or financial data, especially at scale → Cloud AI in AU region (Azure OpenAI AU, Claude via AU-hosted API)
    • Regulated data (health, government, defence, IP) → On-premise or sovereign cloud (Ollama, self-hosted, private tenancy)
    • You genuinely do not know yet → Do a data classification audit first

    Option 1: Enterprise cloud AI, offshore processing. Fast to deploy, well supported. Suitable for general productivity work with low sensitivity data.

    Option 2: Enterprise cloud AI with Australian region processing. Microsoft offers Azure OpenAI in Australia East. Anthropic offers Claude via enterprise channels with regional hosting. AWS Bedrock offers models in Sydney. These give you strong models with processing kept in Australia, provided you confirm that the specific model and deployment type you choose is pinned to the Australian region rather than routed globally.

    Option 3: On-premise or self-hosted open models. Ollama, LM Studio, or private deployments of open-weight models (Llama, Mistral, Qwen) running inside your own infrastructure. No data ever leaves your environment. The trade-off is lower capability on frontier tasks, plus infrastructure and maintenance cost.
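To make the self-hosted option concrete, here is a minimal sketch of sending a prompt to a model served by Ollama on the same machine, using only the Python standard library. It assumes an Ollama server is running on its default local address and that a model has already been pulled; the model name `llama3` and the helper names `build_request` and `summarise_locally` are placeholders for illustration.

```python
import json
import urllib.request

# Default address of a locally running Ollama server; adjust if yours differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="llama3"):
    """Build (but do not send) a request to the local Ollama generate endpoint.

    The model name is a placeholder; use whichever model you have pulled.
    """
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarise_locally(text, model="llama3"):
    """Send the text to the local model and return its reply as a string."""
    request = build_request(f"Summarise the following text:\n\n{text}", model)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]
```

With the server running, calling `summarise_locally(contract_text)` returns the model's summary. Because the request goes to `localhost`, the text stays on the machine that hosts the model, which is the property this option is chosen for.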

    For a deeper comparison of how these options stack up in practice, see our guide OpenAI vs Claude vs Ollama for Australian business and our Data Sovereignty Australia guide.

    In regulated industries, this is not a new problem. Across enterprise data platforms in mining, resources, and energy (the kind of environment where BHP, Rio Tinto, and Senex Energy operate), the governance pattern is always the same: classify the data first, then pick the tool. Generative AI does not change that pattern. It just makes it urgent, because the tooling is now in every staff member's browser.


    The Practical Control Framework

    Here is the sequence that actually works for a midsize Australian business. It is not theoretical. It is what a sensible rollout looks like.

    AI Governance Rollout for a Midsize Business

    | Step | Timing | Activity | Detail |
    |---|---|---|---|
    | 1 | Week 1 | Shadow AI audit | Survey staff, check DNS logs, identify which AI tools are already in use |
    | 2 | Week 2 | Data classification | Agree what is public, internal, confidential, and restricted |
    | 3 | Week 3 | Acceptable use policy | Publish a clear, one-page policy tied to the classifications |
    | 4 | Weeks 4-5 | Provision enterprise tools | Roll out licensed enterprise AI so staff have a sanctioned path |
    | 5 | Week 6 | Network and endpoint controls | Block consumer tiers where appropriate, enable logging |
    | 6 | Weeks 7-8 | Staff training | Train staff on what they can and cannot paste, and where |

    1. Run a Shadow AI Audit

    Most leaders significantly underestimate how much AI is already being used in their business. Before any policy, find out what is actually happening. Survey staff directly (anonymously is fine), check DNS or proxy logs for traffic to openai.com, chatgpt.com, anthropic.com, claude.ai, gemini.google.com, and similar domains, and check browser extensions installed on corporate devices.
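As a rough illustration of the log check described above, the sketch below counts lookups of AI-related domains in a DNS or proxy log export. The log format (whitespace-separated fields with the queried hostname somewhere on each line), the `AI_DOMAINS` watch list, and the function names are all assumptions for illustration; adapt them to whatever your resolver or gateway actually exports.

```python
from collections import Counter

# Hypothetical watch list; extend it with the tools relevant to your business.
AI_DOMAINS = ("openai.com", "chatgpt.com", "anthropic.com", "claude.ai", "gemini.google.com")


def matched_ai_domain(hostname):
    """Return the watched domain that hostname belongs to, or None."""
    host = hostname.lower().rstrip(".")
    for domain in AI_DOMAINS:
        # Match the domain itself or any subdomain, but not lookalikes
        # such as "notopenai.com".
        if host == domain or host.endswith("." + domain):
            return domain
    return None


def count_ai_lookups(log_lines):
    """Count log lines per watched AI domain.

    Assumes each line is whitespace-separated and contains the queried
    hostname as one of its fields (an assumption about the log format).
    """
    counts = Counter()
    for line in log_lines:
        for field in line.split():
            domain = matched_ai_domain(field)
            if domain is not None:
                counts[domain] += 1
                break  # count each log line at most once
    return counts


if __name__ == "__main__":
    # Made-up sample lines, only to show the expected input shape.
    sample = [
        "2026-04-01T09:12:03 10.0.4.17 chat.openai.com A",
        "2026-04-01T09:12:09 10.0.4.22 claude.ai A",
        "2026-04-01T09:13:40 10.0.4.17 example.com A",
    ]
    for domain, hits in count_ai_lookups(sample).most_common():
        print(domain, hits)
```

Counts like these show which tools are in use and roughly how heavily, not what was pasted into them, so they complement the staff survey rather than replace it.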

    2. Classify Your Data First

    You cannot write a sensible policy without knowing what you are protecting. Four buckets is usually enough:

    • Public. Marketing content, published reports, public web pages.
    • Internal. Internal documents that would be awkward but not damaging if leaked.
    • Confidential. Customer data, financial data, contracts, PII.
    • Restricted. Regulated data (health, government, defence), IP that is core to the business.

    3. Publish a One-Page Acceptable Use Policy

    Tie the policy to the classifications. Public and internal data can go into sanctioned enterprise AI tools. Confidential data only goes into sanctioned AI tools with contractual training-off commitments. Restricted data does not go into any third-party AI, only sovereign or on-premise tools.

    Keep it one page. If it is longer than one page, staff will not read it.
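The rule in the policy is simple enough to express as a lookup, which can help when wiring it into onboarding checklists or internal tooling. The sketch below is one possible encoding. The tier names and the `is_allowed` helper are hypothetical labels, and treating the tiers as a strict ordering is a simplifying assumption rather than something the policy itself requires.

```python
from enum import IntEnum


class ToolTier(IntEnum):
    """Hypothetical tool tiers, ordered from least to most controlled."""
    CONSUMER = 0                # free or personal accounts
    ENTERPRISE = 1              # sanctioned enterprise tool
    ENTERPRISE_NO_TRAINING = 2  # sanctioned, with contractual training-off commitment
    SOVEREIGN = 3               # sovereign cloud or on-premise


# Minimum tier required for each data classification, following the policy text.
MINIMUM_TIER = {
    "public": ToolTier.ENTERPRISE,
    "internal": ToolTier.ENTERPRISE,
    "confidential": ToolTier.ENTERPRISE_NO_TRAINING,
    "restricted": ToolTier.SOVEREIGN,
}


def is_allowed(classification, tier):
    """Return True if data of this classification may go into a tool of this tier."""
    try:
        return tier >= MINIMUM_TIER[classification.lower()]
    except KeyError:
        # Refuse to guess for unclassified data; it needs classifying first.
        raise ValueError(f"unknown classification: {classification!r}")
```

For example, `is_allowed("confidential", ToolTier.ENTERPRISE)` is false under this encoding, while `is_allowed("confidential", ToolTier.ENTERPRISE_NO_TRAINING)` is true. Raising an error for an unknown label is deliberate: data nobody has classified should not default to "allowed".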

    4. Provide a Sanctioned Path

    Blocking tools without providing alternatives does not stop shadow AI. It just pushes it to personal devices, which is worse. Provision an enterprise tier tool (ChatGPT Enterprise, Microsoft 365 Copilot, Gemini for Workspace, or Claude for Work) so staff have a compliant, fast path for their work.

    5. Add Network and Endpoint Controls

    Once the sanctioned path exists, you can reasonably block consumer tiers at the network layer. Most secure web gateways (Zscaler, Netskope, Cloudflare Gateway, and even simpler options) can block specific domains or categories. Consider blocking browser extensions that paste data to AI services unless explicitly approved.

    6. Train Staff

    The single highest-return control is a 30-minute training session showing staff exactly what they can paste, what they cannot, and where. Most staff are not trying to cause a breach. They just do not know where the lines are.


    What This Looks Like in Practice

    Scenario: a 180-person professional services firm. They discover that 70 percent of staff have used ChatGPT Free at work in the last month, about 15 percent have pasted client data into it, and there is no policy in place. A typical remediation path runs six to eight weeks: audit, classify, policy, deploy Microsoft 365 Copilot for general work, deploy Azure OpenAI in Australia East for workloads touching client PII, block consumer tiers, train staff. The ongoing licence cost is real, but so is the pre-existing exposure.

    What Responsible AI Governance Actually Costs

    | Item | Indicative cost |
    |---|---|
    | Enterprise AI licences (180 users, blended) | ~$60,000/yr |
    | Azure OpenAI AU processing (usage-based, indicative) | ~$15,000/yr |
    | One-off setup, policy, training | ~$25,000 |
    | Cost of one notifiable data breach (industry average, OAIC data) | $3M+ |

    The economics are clear. The annual cost of getting this right across a 180-person firm is typically less than the cost of a single moderate breach, and far less than the cost of a serious one.
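Using only the indicative figures listed above, the comparison works out as follows. This is a back-of-the-envelope sketch: the variable names are illustrative, the amounts are assumed to be in Australian dollars, and the breach figure is treated as a floor because it is given as "$3M+".

```python
# Indicative figures from the cost breakdown above (AUD assumed).
licences_per_year = 60_000
au_processing_per_year = 15_000
one_off_setup = 25_000
breach_cost_lower_bound = 3_000_000  # given as "$3M+", so treated as a floor

recurring_per_year = licences_per_year + au_processing_per_year
first_year_total = recurring_per_year + one_off_setup

# How many years of recurring spend one breach at the floor figure would cover.
years_covered = breach_cost_lower_bound / recurring_per_year

print(f"Recurring cost per year: ${recurring_per_year:,}")
print(f"First-year cost:         ${first_year_total:,}")
print(f"Breach floor / first-year cost: {breach_cost_lower_bound / first_year_total:.0f}x")
print(f"Years of recurring spend per breach: {years_covered:.0f}")
```

On these numbers the first year costs $100,000 in total, so a single breach at the floor figure is about 30 times the first-year spend, or roughly 40 years of the recurring cost.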


    Getting Started

    The hardest part of AI governance is not technical. It is deciding to treat AI as a governed capability rather than a personal productivity tool staff install on the side.

    A sensible starting sequence for a midsize business:

    1. Run a shadow AI audit this month.
    2. Draft a one-page acceptable use policy within four weeks.
    3. Decide which enterprise tier (or tiers) match your data classifications.
    4. Block consumer tiers only after the sanctioned path is live.
    5. Review quarterly, because the vendor landscape changes fast.

    For a broader view of how AI governance fits into a wider strategy, see our AI Strategy service and our related post on AI agent governance, data access, privacy, and human override. For workflows where you want to replace risky copy-and-paste habits with proper integrated automation, see our Process Automation service.

    If you would like help auditing your current AI exposure, classifying data, or planning the right enterprise and sovereign stack for your business, book a 30-minute call. We work with Australian midsize businesses on exactly this problem, and the first call is a conversation, not a pitch.



    Sources: Research synthesised from the Office of the Australian Information Commissioner (OAIC) generative AI guidance (2024), the Privacy Act 1988 and 2022 amendments, published enterprise data handling policies from OpenAI, Microsoft, Google, and Anthropic, and the publicly reported Samsung ChatGPT incident (Bloomberg, 2023).