Automate Report Generation with Python

The Report That Stole 15 Hours Every Week

Consider a typical finance manager at a Melbourne manufacturing company. Every Friday afternoon, she spends three hours pulling numbers from four different systems, copying them into Excel, making charts, converting to PDF, and emailing to the leadership team.

Fifteen hours a month. On a single report.

Here's the thing—according to research from McKinsey, knowledge workers spend 19% of their time searching for and gathering information, and another 14% communicating and collaborating internally. A significant chunk of that is report generation and distribution.

The good news? This is exactly the kind of repetitive, structured task that Python can automate completely. Automated reporting systems work well for accounting firms, logistics companies, and manufacturers across Australia. The pattern is always the same: what takes humans hours takes Python minutes.

This tutorial walks you through building a complete report automation pipeline—from database to inbox—with real code you can adapt to your needs.

What Does "Automated Reporting" Actually Mean?

Before we dive into code, let's understand what we're building in plain English.

Think of it like a robot assistant that does exactly what you do when creating a report—but without getting tired, making typos, or forgetting steps:

Opens your data files (like you opening Excel or your accounting software)
Does the calculations (totals, averages, comparisons)
Creates the charts (bar graphs, pie charts, trend lines)
Formats everything nicely into a professional PDF
Sends it to the right people via email
Does this automatically at the same time every day/week/month

Real-world analogy: Imagine you hired a junior assistant who's incredibly reliable but only knows how to follow exact instructions. You'd write them a detailed checklist: "Every Monday at 7am, open this spreadsheet, calculate these totals, make this chart, save as PDF, email to these people." That's essentially what we're doing—but the "assistant" is a Python script that never calls in sick.

How Automated Reporting Works

Collect Data

Pull from Excel, CSV, or database

Process

Calculate totals & metrics

Visualise

Generate charts & graphs

Create PDF

Format into professional report

Distribute

Email to stakeholders

Collect Data

Pull from Excel, CSV, or database

Process

Calculate totals & metrics

Visualise

Generate charts & graphs

Create PDF

Format into professional report

Distribute

Email to stakeholders

Is This Tutorial Right for You?

Should You Automate Your Reports?

What's your current situation?

I spend 2+ hours weekly on the same report

→ Strong candidate for automation

My reports pull from Excel/CSV files

→ Perfect fit for this tutorial

I need reports from complex ERP systems

→ May need custom integration help

My reports change format every time

→ Focus on standardising first

This tutorial is for you if:

You create the same report repeatedly (weekly, monthly, quarterly)
Your data lives in Excel, CSV files, or a database
You're comfortable following technical instructions (or have someone who is)
You want to understand how automation works, not just buy a tool

You might want professional help if:

Your data is in complex systems (SAP, Oracle, custom ERP)
You need real-time dashboards rather than scheduled reports
You have strict compliance requirements (financial services, healthcare)

What You'll Build

By the end of this tutorial, you'll have a system that:

Pulls data from a database or CSV files
Aggregates and transforms the data into meaningful metrics
Generates visualisations (charts, tables)
Creates a professional PDF using templates
Schedules automated runs (daily, weekly, monthly)
Emails the finished report to stakeholders

What We'll Build Together

Part 1

Data Collection

Connect to your data sources and pull information

Part 2

Chart Generation

Create professional visualisations automatically

Part 3

PDF Creation

Build beautiful, branded reports

Part 4

Email Distribution

Send reports to the right people

Part 5

Scheduling

Make it run automatically on a schedule

Part 1

Data Collection

Connect to your data sources and pull information

Part 2

Chart Generation

Create professional visualisations automatically

Part 3

PDF Creation

Build beautiful, branded reports

Part 4

Email Distribution

Send reports to the right people

Part 5

Scheduling

Make it run automatically on a schedule

Let's get into it.

First, The Basics: What is Python?

If you've never written code before, start here.

Python is a programming language—a way to give instructions to your computer. Think of it like writing a very detailed recipe that your computer follows exactly.

Why Python for reports?

It's readable (looks almost like English)
It's free and runs on any computer
It has thousands of pre-built tools (called "libraries") for working with data

Setting Up Your Computer (Step-by-Step)

Before you can run any code, you need to set up your computer. This takes about 15-20 minutes, and you only do it once.

Step 1: Install Python

What is Python? The software that reads and runs your code.

Download links:

Windows: python.org/downloads/windows - Click "Download Python 3.x.x" (the big yellow button)
Mac: python.org/downloads/macos - Click "Download Python 3.x.x"

Important during installation (Windows):

CHECK the box that says "Add Python to PATH" - this is crucial!
Click "Install Now"

To verify it worked:

Open Command Prompt (Windows) or Terminal (Mac)
Type: python --version
You should see something like: Python 3.12.1

Step 2: Install a Code Editor

What is a code editor? Like Microsoft Word, but for writing code. It highlights syntax and catches errors.

Recommended: Visual Studio Code (Free)

Download: code.visualstudio.com
Works on Windows, Mac, and Linux
Click the big download button, run the installer

Alternative options:

PyCharm Community - More features, slightly steeper learning curve
Sublime Text - Lightweight and fast
Even Notepad works (but no helpful features)

Step 3: Open Your Terminal

What is a terminal? A text-based way to give commands to your computer. You'll use it to install libraries and run your scripts.

How to open it:

Windows: Press Windows key + R, type cmd, press Enter
Mac: Press Cmd + Space, type Terminal, press Enter

Step 4: Create a Project Folder

Create a folder for your report automation project:

# Windows
mkdir C:\reports-automation
cd C:\reports-automation

# Mac/Linux
mkdir ~/reports-automation
cd ~/reports-automation

What you need to get started:

Python installed - Follow steps above
A code editor - VS Code recommended
Your data - An Excel or CSV file with the information you want to report on

How code works (simplified):

# Lines starting with # are comments (notes to yourself, computer ignores them)

# This tells Python to open a file called "sales.csv"
data = open_file("sales.csv")

# This tells Python to add up all the values in the "revenue" column
total = add_up(data, "revenue")

# This tells Python to show you the result
print(total)  # Output: 125000

That's it. You write instructions, Python follows them. The code in this tutorial is more complex, but the principle is the same.

The Technical Stack (Don't Worry, We'll Explain)

Before we write any code, here's what tools we're using and why. Think of these as the "ingredients" in our recipe:

Component	Library	Why This Choice
Data handling	pandas	Industry standard, handles any data source
Visualisation	matplotlib	Reliable, extensive customisation
PDF generation	FPDF2	Lightweight, fast, no external dependencies
Templating	Jinja2	Clean separation of data and presentation
Scheduling	schedule	Simple syntax, good for single-machine deployments
Email	smtplib	Built into Python, works with any SMTP server

In plain English:

pandas = Your data Swiss Army knife. It can read Excel, CSV, databases—pretty much anything with rows and columns
matplotlib = The chart maker. Turns numbers into bar graphs, line charts, pie charts
FPDF2 = The PDF creator. Takes your text and charts and arranges them into a professional document
schedule = The alarm clock. Tells your script "run every Monday at 7am"
smtplib = The postman. Sends your finished report via email

Installation (Step-by-Step)

Step 1: Open your terminal (Command Prompt on Windows, Terminal on Mac)

Step 2: Navigate to your project folder:

# Windows
cd C:\reports-automation

# Mac
cd ~/reports-automation

Step 3: Install all the required libraries with one command:

pip install pandas matplotlib fpdf2 jinja2 schedule python-dotenv

What this command does: It tells Python's package manager (pip) to download and install all these libraries from the internet. It's like installing apps on your phone—you only need to do this once.

You should see: A bunch of text scrolling past, ending with "Successfully installed..." - that means it worked!

If you get an error: Try pip3 instead of pip:

pip3 install pandas matplotlib fpdf2 jinja2 schedule python-dotenv

Official Documentation Links

Want to learn more about each library? Here are the official docs:

Library	What It Does	Documentation
pandas	Data manipulation	pandas.pydata.org
matplotlib	Chart creation	matplotlib.org
FPDF2	PDF generation	py-pdf.github.io/fpdf2
Jinja2	Templates	jinja.palletsprojects.com
schedule	Task scheduling	schedule.readthedocs.io
python-dotenv	Environment variables	pypi.org/project/python-dotenv

FPDF2 hits the sweet spot between simplicity and capability. ReportLab is more powerful but overkill for most business reports. WeasyPrint is great for HTML-to-PDF but requires system dependencies that can be painful on Windows servers.

Step 1: Data Aggregation

Let's start with the foundation—pulling and preparing your data.

What This Step Does (Plain English)

Imagine you're creating a sales report. First, you'd:

Open your sales spreadsheet
Filter it to show only this month's data
Calculate totals (total sales, average order value, etc.)

That's exactly what this code does—automatically. Instead of you clicking through menus and typing formulas, the script handles it in milliseconds.

Here's the pattern:

import pandas as pd
from datetime import datetime, timedelta
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)

class ReportDataPipeline:
    """Handles data extraction and transformation for reports."""

    def __init__(self, data_source: str):
        self.data_source = data_source
        self.raw_data = None
        self.processed_data = None

    def load_data(self) -> pd.DataFrame:
        """Load data from CSV, Excel, or database."""
        try:
            if self.data_source.endswith('.csv'):
                self.raw_data = pd.read_csv(self.data_source)
            elif self.data_source.endswith(('.xlsx', '.xls')):
                self.raw_data = pd.read_excel(self.data_source)
            else:
                # For database connections, you'd use sqlalchemy here
                raise ValueError(f"Unsupported data source: {self.data_source}")

            logging.info(f"Loaded {len(self.raw_data)} rows from {self.data_source}")
            return self.raw_data

        except Exception as e:
            logging.error(f"Failed to load data: {e}")
            raise

    def filter_date_range(self, date_column: str, days: int = 30) -> pd.DataFrame:
        """Filter data to a specific date range."""
        end_date = datetime.now()
        start_date = end_date - timedelta(days=days)

        self.raw_data[date_column] = pd.to_datetime(self.raw_data[date_column])
        mask = (self.raw_data[date_column] >= start_date) & \
               (self.raw_data[date_column] <= end_date)

        self.processed_data = self.raw_data[mask].copy()
        logging.info(f"Filtered to {len(self.processed_data)} rows in date range")
        return self.processed_data

    def calculate_summary_metrics(self, value_column: str,
                                   group_column: str = None) -> dict:
        """Calculate key business metrics."""
        data = self.processed_data if self.processed_data is not None \
               else self.raw_data

        metrics = {
            'total': data[value_column].sum(),
            'average': data[value_column].mean(),
            'count': len(data),
            'max': data[value_column].max(),
            'min': data[value_column].min(),
            'period_start': data.index.min() if hasattr(data.index, 'min') else None,
            'period_end': data.index.max() if hasattr(data.index, 'max') else None,
        }

        if group_column:
            metrics['by_group'] = data.groupby(group_column)[value_column].sum().to_dict()

        return metrics

Breaking Down the Code (For Non-Programmers)

Don't let the code intimidate you. Here's what each part does:

class ReportDataPipeline - Think of this as creating a "report-making machine". We're defining all the steps this machine can perform.

load_data() - Opens your data file. It's smart enough to know if you're giving it an Excel file (.xlsx) or a CSV file, and handles each appropriately.

filter_date_range() - Keeps only the data from a specific time period. For example, "give me only the last 30 days of sales."

calculate_summary_metrics() - Does the maths:

Total: Add up all the values (like summing a column in Excel)
Average: The mean value
Count: How many records there are
Max/Min: The highest and lowest values

Example in action:

# This is how you'd actually use the code above
pipeline = ReportDataPipeline("sales_data.csv")
pipeline.load_data()  # Opens the file
pipeline.filter_date_range("order_date", days=30)  # Last 30 days only
metrics = pipeline.calculate_summary_metrics("revenue", "region")  # Calculate!

# Now 'metrics' contains:
# - total revenue for the month
# - average order value
# - number of orders
# - revenue broken down by region

A Common Pitfall Learned the Hard Way

Notice the logging throughout? That's not optional. When your scheduled report fails at 3am on a Sunday, you need to know exactly where and why. I once spent four hours debugging a report failure that turned out to be a single row with a malformed date. Add logging from the start—you'll thank yourself later.

Step 2: Chart Generation

Charts make reports scannable. A wall of numbers is hard to digest, but a well-designed chart tells the story instantly.

What This Step Does (Plain English)

You know how in Excel you select data and click "Insert Chart"? This code does that automatically—but with more control over colours, formatting, and style. It creates three types of charts:

Line charts - Great for showing trends over time ("How have sales changed this quarter?")
Bar charts - Perfect for comparing categories ("Which region sold the most?")
Pie charts - Shows proportions ("What percentage of revenue comes from each product?")

Here's a reusable pattern for generating consistent visualisations:

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from io import BytesIO

class ReportChartGenerator:
    """Generates charts for PDF reports."""

    def __init__(self, style: str = 'seaborn-v0_8-whitegrid'):
        plt.style.use(style)
        # Corporate colours - adjust to match your branding
        self.colors = ['#2563eb', '#10b981', '#f59e0b', '#ef4444', '#8b5cf6']

    def create_line_chart(self, data: pd.DataFrame,
                          x_column: str,
                          y_column: str,
                          title: str) -> BytesIO:
        """Create a line chart and return as bytes for PDF embedding."""
        fig, ax = plt.subplots(figsize=(10, 5))

        ax.plot(data[x_column], data[y_column],
                color=self.colors[0], linewidth=2)
        ax.fill_between(data[x_column], data[y_column],
                        alpha=0.3, color=self.colors[0])

        ax.set_title(title, fontsize=14, fontweight='bold', pad=20)
        ax.set_xlabel(x_column.replace('_', ' ').title())
        ax.set_ylabel(y_column.replace('_', ' ').title())

        # Format dates nicely if x-axis is datetime
        if pd.api.types.is_datetime64_any_dtype(data[x_column]):
            ax.xaxis.set_major_formatter(mdates.DateFormatter('%d %b'))
            plt.xticks(rotation=45)

        ax.grid(True, alpha=0.3)
        plt.tight_layout()

        # Save to bytes buffer instead of file
        buffer = BytesIO()
        plt.savefig(buffer, format='png', dpi=150, bbox_inches='tight')
        buffer.seek(0)
        plt.close()

        return buffer

    def create_bar_chart(self, data: dict,
                         title: str,
                         xlabel: str = '',
                         ylabel: str = '') -> BytesIO:
        """Create a bar chart from a dictionary."""
        fig, ax = plt.subplots(figsize=(10, 5))

        categories = list(data.keys())
        values = list(data.values())

        bars = ax.bar(categories, values, color=self.colors[:len(categories)])

        # Add value labels on bars
        for bar, value in zip(bars, values):
            height = bar.get_height()
            ax.annotate(f'${value:,.0f}' if value > 1000 else f'{value:,.0f}',
                       xy=(bar.get_x() + bar.get_width() / 2, height),
                       xytext=(0, 3),
                       textcoords="offset points",
                       ha='center', va='bottom', fontsize=10)

        ax.set_title(title, fontsize=14, fontweight='bold', pad=20)
        ax.set_xlabel(xlabel)
        ax.set_ylabel(ylabel)

        plt.xticks(rotation=45, ha='right')
        plt.tight_layout()

        buffer = BytesIO()
        plt.savefig(buffer, format='png', dpi=150, bbox_inches='tight')
        buffer.seek(0)
        plt.close()

        return buffer

    def create_pie_chart(self, data: dict, title: str) -> BytesIO:
        """Create a pie chart for distribution analysis."""
        fig, ax = plt.subplots(figsize=(8, 8))

        labels = list(data.keys())
        sizes = list(data.values())

        wedges, texts, autotexts = ax.pie(
            sizes,
            labels=labels,
            autopct='%1.1f%%',
            colors=self.colors[:len(labels)],
            startangle=90
        )

        ax.set_title(title, fontsize=14, fontweight='bold', pad=20)

        buffer = BytesIO()
        plt.savefig(buffer, format='png', dpi=150, bbox_inches='tight')
        buffer.seek(0)
        plt.close()

        return buffer

Understanding the Chart Code (Plain English)

What each function does:

create_line_chart() - Draws a line connecting data points over time. Perfect for showing "sales this month vs last month" or "website visitors per day"
create_bar_chart() - Creates vertical bars where height = value. Great for "revenue by region" or "orders by product category"
create_pie_chart() - Divides a circle into slices showing percentages. Use for "market share" or "expense breakdown"

Key settings explained:

figsize=(10, 5) - The size of your chart (width x height in inches)
self.colors - Your brand colours. The code uses professional blues, greens, and purples by default
dpi=150 - Image quality. Higher = crisper but larger file size

Example: Creating a sales trend chart

charts = ReportChartGenerator()

# Your data might look like this (simplified):
# Date: Jan 1, Jan 2, Jan 3...
# Revenue: $5000, $6200, $4800...

trend_chart = charts.create_line_chart(
    data=my_sales_data,
    x_column='date',
    y_column='revenue',
    title='Daily Revenue - January 2025'
)
# Result: A professional line chart saved in memory, ready for your PDF

Pro Tip: Memory Management

Notice that I'm using BytesIO buffers and calling plt.close() after each chart? This prevents memory leaks when generating multiple charts. Report jobs can crash after generating 50+ charts when matplotlib figures aren't closed properly. In production environments, this is critical.

Step 3: PDF Generation with Templates

This is where we bring it together—turning your data and charts into a professional PDF document.

What This Step Does (Plain English)

Imagine you're laying out a document in Word or InDesign:

You add a header at the top of each page
You insert your company name and the date
You add titles and subtitles
You paste in your charts
You create tables with your data
You add a page number footer

This code does all of that automatically. Every time the script runs, it creates a fresh PDF with the latest data, perfectly formatted, without you touching anything.

Real-world example: Instead of copying numbers from Excel, making charts, pasting into Word, adjusting formatting, exporting to PDF... the script produces a polished 10-page report in under 5 seconds.

FPDF2 lets you build PDFs programmatically:

from fpdf import FPDF
from datetime import datetime

class BusinessReportPDF(FPDF):
    """Custom PDF class for business reports."""

    def __init__(self, company_name: str = "Your Company"):
        super().__init__()
        self.company_name = company_name
        self.set_auto_page_break(auto=True, margin=15)

    def header(self):
        """Add header to each page."""
        self.set_font('Helvetica', 'B', 12)
        self.cell(0, 10, self.company_name, align='L')
        self.set_font('Helvetica', '', 10)
        self.cell(0, 10, datetime.now().strftime('%d %B %Y'), align='R', ln=True)
        self.line(10, 25, 200, 25)
        self.ln(10)

    def footer(self):
        """Add footer with page numbers."""
        self.set_y(-15)
        self.set_font('Helvetica', 'I', 8)
        self.cell(0, 10, f'Page {self.page_no()}/{{nb}}', align='C')

    def add_title(self, title: str):
        """Add report title."""
        self.set_font('Helvetica', 'B', 24)
        self.cell(0, 20, title, ln=True, align='C')
        self.ln(10)

    def add_subtitle(self, subtitle: str):
        """Add section subtitle."""
        self.set_font('Helvetica', 'B', 14)
        self.set_fill_color(240, 240, 240)
        self.cell(0, 10, subtitle, ln=True, fill=True)
        self.ln(5)

    def add_metric_box(self, label: str, value: str,
                       change: str = None, is_positive: bool = True):
        """Add a highlighted metric box."""
        self.set_font('Helvetica', '', 10)
        self.cell(50, 8, label, ln=False)
        self.set_font('Helvetica', 'B', 14)
        self.cell(40, 8, value, ln=False)

        if change:
            if is_positive:
                self.set_text_color(0, 150, 0)
            else:
                self.set_text_color(200, 0, 0)
            self.set_font('Helvetica', '', 10)
            self.cell(0, 8, change, ln=True)
            self.set_text_color(0, 0, 0)
        else:
            self.ln()

    def add_chart(self, chart_buffer: BytesIO, width: int = 180):
        """Add a chart image from a BytesIO buffer."""
        # Save buffer to temp file (FPDF2 requires file path)
        import tempfile
        with tempfile.NamedTemporaryFile(suffix='.png', delete=False) as tmp:
            tmp.write(chart_buffer.getvalue())
            tmp_path = tmp.name

        self.image(tmp_path, x=15, w=width)
        self.ln(10)

        # Clean up temp file
        import os
        os.unlink(tmp_path)

    def add_table(self, headers: list, data: list, col_widths: list = None):
        """Add a data table."""
        if col_widths is None:
            col_widths = [190 / len(headers)] * len(headers)

        # Header row
        self.set_font('Helvetica', 'B', 10)
        self.set_fill_color(50, 50, 50)
        self.set_text_color(255, 255, 255)

        for header, width in zip(headers, col_widths):
            self.cell(width, 8, header, border=1, fill=True, align='C')
        self.ln()

        # Data rows
        self.set_font('Helvetica', '', 9)
        self.set_text_color(0, 0, 0)

        for row_idx, row in enumerate(data):
            if row_idx % 2 == 0:
                self.set_fill_color(245, 245, 245)
            else:
                self.set_fill_color(255, 255, 255)

            for cell, width in zip(row, col_widths):
                self.cell(width, 7, str(cell), border=1, fill=True, align='C')
            self.ln()

        self.ln(5)

    def add_paragraph(self, text: str):
        """Add a text paragraph."""
        self.set_font('Helvetica', '', 10)
        self.multi_cell(0, 6, text)
        self.ln(5)

Understanding the PDF Code (Plain English)

This might look intimidating, but each function does one simple thing:

Function	What It Does	Like This in Word/PDF
`header()`	Adds company name + date at top	Your letterhead
`footer()`	Adds page numbers at bottom	"Page 1 of 5"
`add_title()`	Big, bold heading	Report title
`add_subtitle()`	Section heading with grey background	Chapter headings
`add_metric_box()`	Shows a key number with label	"Total Revenue: $125,000 +12%"
`add_chart()`	Inserts a chart image	Your graphs
`add_table()`	Creates a formatted data table	A nice table with headers
`add_paragraph()`	Adds regular text	Body text

The magic of PDF coordinates:

PDF pages are measured in millimetres from the top-left corner
self.set_font('Helvetica', 'B', 14) means "use Helvetica font, Bold, 14 point size"
self.cell(0, 10, text) means "create a text box full width, 10mm tall"
self.ln(5) means "move down 5mm" (like pressing Enter)

Customising for your brand:

# Change these colours to match your company:
self.set_fill_color(50, 50, 50)      # Dark grey for table headers
self.set_text_color(0, 150, 0)       # Green for positive numbers
self.set_text_color(200, 0, 0)       # Red for negative numbers

Step 4: Putting It All Together

Now we combine all the pieces into one script that does everything from start to finish.

What This Step Does (Plain English)

This is the "master script" that coordinates everything:

Reads your data file - Loads last 30 days of sales
Calculates the numbers - Total revenue, order count, by region
Creates the charts - Revenue trend + regional comparison
Builds the PDF - Title, metrics, charts, tables
Saves to disk - Ready to email

Think of it like a conductor leading an orchestra—each musician (function) knows their part, and the conductor (this code) tells them when to play.

Here's the complete pipeline that ties everything together:

from datetime import datetime
import os

def generate_sales_report(data_file: str, output_path: str) -> str:
    """Generate a complete sales report PDF."""

    # Step 1: Load and process data
    pipeline = ReportDataPipeline(data_file)
    pipeline.load_data()
    pipeline.filter_date_range('order_date', days=30)

    metrics = pipeline.calculate_summary_metrics(
        value_column='revenue',
        group_column='region'
    )

    # Step 2: Generate charts
    charts = ReportChartGenerator()

    # Daily revenue trend
    daily_data = pipeline.processed_data.groupby(
        pipeline.processed_data['order_date'].dt.date
    )['revenue'].sum().reset_index()
    daily_data.columns = ['date', 'revenue']

    trend_chart = charts.create_line_chart(
        daily_data, 'date', 'revenue',
        'Daily Revenue Trend (Last 30 Days)'
    )

    # Revenue by region
    region_chart = charts.create_bar_chart(
        metrics['by_group'],
        'Revenue by Region',
        ylabel='Revenue ($)'
    )

    # Step 3: Build PDF
    pdf = BusinessReportPDF(company_name="Acme Manufacturing")
    pdf.alias_nb_pages()
    pdf.add_page()

    # Title
    pdf.add_title("Monthly Sales Report")
    pdf.add_paragraph(
        f"Report generated: {datetime.now().strftime('%d %B %Y at %H:%M')}\n"
        f"Period: Last 30 days"
    )

    # Key metrics section
    pdf.add_subtitle("Key Metrics")
    pdf.add_metric_box("Total Revenue", f"${metrics['total']:,.2f}", "+12.3%", True)
    pdf.add_metric_box("Total Orders", f"{metrics['count']:,}", "+8.1%", True)
    pdf.add_metric_box("Average Order", f"${metrics['average']:,.2f}", "-2.1%", False)
    pdf.ln(10)

    # Charts
    pdf.add_subtitle("Revenue Trend")
    pdf.add_chart(trend_chart)

    pdf.add_page()
    pdf.add_subtitle("Regional Performance")
    pdf.add_chart(region_chart)

    # Data table
    pdf.add_subtitle("Top 10 Orders")
    top_orders = pipeline.processed_data.nlargest(10, 'revenue')[
        ['order_id', 'customer', 'region', 'revenue']
    ].values.tolist()

    pdf.add_table(
        headers=['Order ID', 'Customer', 'Region', 'Revenue'],
        data=[[row[0], row[1], row[2], f"${row[3]:,.2f}"] for row in top_orders],
        col_widths=[30, 70, 40, 40]
    )

    # Save
    output_file = os.path.join(output_path,
        f"sales_report_{datetime.now().strftime('%Y%m%d')}.pdf")
    pdf.output(output_file)

    logging.info(f"Report generated: {output_file}")
    return output_file

Line-by-Line Walkthrough (Plain English)

Here's what each section of that code does:

Loading and processing data:

pipeline = ReportDataPipeline(data_file)  # Create our data handler
pipeline.load_data()                       # Open the file
pipeline.filter_date_range('order_date', days=30)  # Keep only last 30 days
metrics = pipeline.calculate_summary_metrics(...)  # Do the maths

Creating the charts:

charts = ReportChartGenerator()  # Create our chart maker
trend_chart = charts.create_line_chart(...)  # Make the trend line
region_chart = charts.create_bar_chart(...)  # Make the bar chart

Building the PDF:

pdf = BusinessReportPDF(company_name="Acme Manufacturing")  # Start new PDF
pdf.add_page()           # Create first page
pdf.add_title(...)       # Add the big title
pdf.add_metric_box(...)  # Add key numbers
pdf.add_chart(...)       # Insert the charts
pdf.add_table(...)       # Add data tables
pdf.output(output_file)  # Save the PDF file

How to run it:

Save all the code above to a file called report_generator.py
Make sure you have a CSV file with columns: order_date, revenue, region, order_id, customer
In terminal, run: python report_generator.py
Your PDF appears in the output folder!

Step 5: Email Distribution

Reports are useless if they sit on a server. Let's make them land in the right inboxes automatically.

What This Step Does (Plain English)

This code does exactly what you'd do manually:

Opens a new email
Fills in the recipients (To:)
Adds a subject line
Writes a message body
Attaches your PDF report
Clicks "Send"

The difference? It does this in milliseconds, at 3am if you want, without you being awake.

Here's how to email them automatically:

import smtplib
import ssl
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.application import MIMEApplication
from dotenv import load_dotenv

load_dotenv()

def send_report_email(report_path: str, recipients: list,
                      subject: str = "Your Automated Report"):
    """Send the generated report via email."""

    smtp_server = os.getenv('SMTP_SERVER', 'smtp.gmail.com')
    smtp_port = int(os.getenv('SMTP_PORT', '587'))
    sender_email = os.getenv('SENDER_EMAIL')
    sender_password = os.getenv('SENDER_PASSWORD')

    # Create message
    msg = MIMEMultipart()
    msg['From'] = sender_email
    msg['To'] = ', '.join(recipients)
    msg['Subject'] = subject

    # Email body
    body = """
    Hi,

    Please find attached your automated report.

    This report was generated automatically. If you have questions about
    the data, please contact the analytics team.

    Regards,
    Automated Reporting System
    """
    msg.attach(MIMEText(body, 'plain'))

    # Attach PDF
    with open(report_path, 'rb') as f:
        attachment = MIMEApplication(f.read(), _subtype='pdf')
        attachment.add_header(
            'Content-Disposition',
            'attachment',
            filename=os.path.basename(report_path)
        )
        msg.attach(attachment)

    # Send
    try:
        context = ssl.create_default_context()
        with smtplib.SMTP(smtp_server, smtp_port) as server:
            server.starttls(context=context)
            server.login(sender_email, sender_password)
            server.sendmail(sender_email, recipients, msg.as_string())

        logging.info(f"Report emailed to {', '.join(recipients)}")
        return True

    except Exception as e:
        logging.error(f"Failed to send email: {e}")
        return False

Understanding the Email Code (Plain English)

Here's what each part does:

# These lines read your email settings from a secure file
smtp_server = os.getenv('SMTP_SERVER')  # Your email provider's server
sender_email = os.getenv('SENDER_EMAIL')  # Your email address
sender_password = os.getenv('SENDER_PASSWORD')  # Your app password

# This creates the email message
msg = MIMEMultipart()  # Start a new email
msg['From'] = sender_email  # Who it's from
msg['To'] = ', '.join(recipients)  # Who it's going to
msg['Subject'] = subject  # Email subject line
msg.attach(MIMEText(body, 'plain'))  # The message body

# This attaches your PDF
with open(report_path, 'rb') as f:  # Open the PDF file
    attachment = MIMEApplication(f.read())  # Read it into the email
    msg.attach(attachment)  # Attach it

Setting Up Email (Step-by-Step)

Option 1: Gmail (Most Common)

Gmail requires a special "App Password" for scripts—you can't use your regular password.

How to create a Gmail App Password:

Go to myaccount.google.com
Click Security in the left menu
Under "Signing in to Google", click 2-Step Verification (enable it if you haven't)
Scroll down and click App passwords
Select "Mail" and "Windows Computer" (or Mac)
Click Generate
Copy the 16-character password it shows you

Create a .env file in your project folder with your settings:

SMTP_SERVER=smtp.gmail.com
SMTP_PORT=587
SENDER_EMAIL=yourname@gmail.com
SENDER_PASSWORD=your-16-character-app-password

Security warning: Never share your .env file or commit it to version control. Add .env to your .gitignore file.

Option 2: Microsoft 365 / Outlook

SMTP_SERVER=smtp.office365.com
SMTP_PORT=587
SENDER_EMAIL=yourname@company.com.au
SENDER_PASSWORD=your-password

Option 3: Your Company's Email Server

Ask your IT department for:

SMTP server address
SMTP port (usually 587 or 465)
Whether authentication is required

Step 6: Scheduling

The final piece—making it run automatically, even when you're asleep.

What This Step Does (Plain English)

Right now, you have to manually run your script. Scheduling makes it run automatically at specific times:

"Run every Monday at 7am" - Weekly sales report
"Run on the 1st of every month" - Monthly financial summary
"Run every day at 6pm" - Daily operations snapshot

It's like setting an alarm clock for your computer.

Here's the code:

import schedule
import time

def run_weekly_report():
    """Generate and distribute the weekly sales report."""
    try:
        report_file = generate_sales_report(
            data_file='/path/to/sales_data.csv',
            output_path='/path/to/reports/'
        )

        send_report_email(
            report_path=report_file,
            recipients=['manager@company.com', 'finance@company.com'],
            subject=f"Weekly Sales Report - {datetime.now().strftime('%d %b %Y')}"
        )

    except Exception as e:
        logging.error(f"Weekly report failed: {e}")
        # Send alert email to IT
        send_report_email(
            report_path=None,
            recipients=['it@company.com'],
            subject="ALERT: Weekly Report Generation Failed"
        )

# Schedule for Monday mornings at 7:00 AM
schedule.every().monday.at("07:00").do(run_weekly_report)

# Keep the script running
if __name__ == "__main__":
    logging.info("Report scheduler started")
    while True:
        schedule.run_pending()
        time.sleep(60)  # Check every minute

For production deployments, I'd recommend using cron (Linux/Mac) or Task Scheduler (Windows) rather than keeping a Python script running. Here's the cron entry for Monday 7am:

0 7 * * 1 /usr/bin/python3 /path/to/report_generator.py

Common Pitfalls and How to Avoid Them

After implementing automated reporting for dozens of Australian businesses, here are the issues that come up repeatedly:

1. Data Quality Assumptions

Your report will only be as good as your data. Build in validation checks:

def validate_data(df: pd.DataFrame) -> tuple[bool, list]:
    """Validate data before report generation."""
    issues = []

    if df.empty:
        issues.append("Dataset is empty")

    if df.isnull().sum().sum() > len(df) * 0.1:
        issues.append("More than 10% missing values")

    # Check for duplicates
    if df.duplicated().sum() > 0:
        issues.append(f"{df.duplicated().sum()} duplicate rows found")

    return len(issues) == 0, issues

2. Timezone Issues

Australian businesses often have offices across multiple timezones. Always store datetimes in UTC and convert for display:

from zoneinfo import ZoneInfo

# Store in UTC
utc_time = datetime.now(ZoneInfo('UTC'))

# Display in local timezone
local_time = utc_time.astimezone(ZoneInfo('Australia/Sydney'))

3. File Path Handling

Windows and Mac/Linux handle paths differently. Use pathlib for cross-platform compatibility:

from pathlib import Path

output_dir = Path('/path/to/reports')
output_dir.mkdir(parents=True, exist_ok=True)
output_file = output_dir / f"report_{datetime.now():%Y%m%d}.pdf"

The ROI Case: Is This Worth Your Time?

Let's do the maths honestly. According to Business Dasher's 2024 research, businesses using automation report 10-50% cost reduction through decreased manual processing.

Here's a realistic breakdown for a typical weekly report:

Manual vs Automated Reporting Costs

Metric	Manual Process	Automated	Improvement
Time per report	3 hours	0 minutes	100%
Annual staff time	156 hours	0 hours	156 hrs saved
Error rate	2-5%	<0.1%	~98%
Delivery time	Variable	Always on time	Consistent

Financial Breakdown (Single Weekly Report)

Manual cost: 3hrs × 52 weeks × $60/hr$9,360/year

Automation build (one-time)$3,000

Annual maintenance (2hrs/month)$3,600/year

First year savings$2,760

Year 2+ annual savings$5,760/year

The multiplier effect: Most businesses have 5-10 regular reports. Once you've built the infrastructure for one, adding more reports costs a fraction of the original build. The same code patterns work for:

Weekly sales reports
Monthly financial summaries
Daily operations snapshots
Quarterly board packs
Customer account statements

Common Problems and How to Fix Them

New to coding? Here are the issues you'll likely hit:

"Python is not recognized" or "command not found"

Problem: Your computer doesn't know where Python is installed. Fix (Windows): Reinstall Python and make sure to check "Add Python to PATH" Fix (Mac): Try using python3 instead of python

"ModuleNotFoundError: No module named 'pandas'"

Problem: The library isn't installed. Fix: Run pip install pandas (replace pandas with whatever module is missing)

"Permission denied" when saving files

Problem: You're trying to save somewhere you don't have access. Fix: Save to a folder in your Documents or Desktop instead

"SMTP Authentication Error"

Problem: Your email credentials are wrong or you need an App Password. Fix: For Gmail, create an App Password (see email setup section above)

The PDF looks wrong or charts are cut off

Problem: Page layout issues. Fix: Adjust figsize for charts and use pdf.add_page() before sections that don't fit

"FileNotFoundError: [Errno 2] No such file"

Problem: Python can't find your data file. Fix: Use the full path: C:\Users\YourName\Documents\sales.csv not just sales.csv

Next Steps

Your Implementation Roadmap

Day 1

Setup & Test

Install Python, VS Code, and run a simple test script

Day 2-3

Data Pipeline

Get data loading working with your actual files

Day 4-5

Charts & PDF

Create your first automated report

Day 6-7

Email & Schedule

Set up automatic delivery

Ongoing

Expand

Add more reports using the same infrastructure

Day 1

Setup & Test

Install Python, VS Code, and run a simple test script

Day 2-3

Data Pipeline

Get data loading working with your actual files

Day 4-5

Charts & PDF

Create your first automated report

Day 6-7

Email & Schedule

Set up automatic delivery

Ongoing

Expand

Add more reports using the same infrastructure

If you're ready to automate your reporting:

Start simple: Pick your most time-consuming report first
Document the current process: What data? What calculations? Who receives it?
Build incrementally: Get data extraction working, then charts, then PDF, then email
Test with real data: Edge cases always surprise you
Add error handling: Things will fail; plan for it

Resources for learning more:

Real Python Tutorials - Excellent beginner-friendly Python guides
Pandas Getting Started - Official pandas tutorials
Automate the Boring Stuff - Free book on Python automation

Need Help?

Want help implementing automated reporting for your business? We've deployed these systems across accounting firms, manufacturers, and logistics companies throughout Australia.

What's the Right Next Step for You?

How comfortable are you with technical implementation?

I can follow this tutorial myself

→ Bookmark this page and get started!

I'd like someone to build it for me

→ Book a free consultation below

I need ongoing support and maintenance

→ Ask about our retainer packages

I'm not sure what I need yet

→ Let's chat about your requirements

Book a Free Assessment

Solve8 helps Australian businesses automate repetitive processes. Based in Brisbane, working nationally. No buzzwords, just systems that work.

Sources:

How to Automate Report Generation: From Data to PDF in Minutes

The Report That Stole 15 Hours Every Week

What Does "Automated Reporting" Actually Mean?

How Automated Reporting Works

Is This Tutorial Right for You?

Should You Automate Your Reports?

What You'll Build

What We'll Build Together

First, The Basics: What is Python?

Setting Up Your Computer (Step-by-Step)

Step 1: Install Python

Step 2: Install a Code Editor

Step 3: Open Your Terminal

Step 4: Create a Project Folder

The Technical Stack (Don't Worry, We'll Explain)

Installation (Step-by-Step)

Official Documentation Links

Step 1: Data Aggregation

What This Step Does (Plain English)

Breaking Down the Code (For Non-Programmers)

A Common Pitfall Learned the Hard Way

Step 2: Chart Generation

What This Step Does (Plain English)

Understanding the Chart Code (Plain English)

Pro Tip: Memory Management

Step 3: PDF Generation with Templates

What This Step Does (Plain English)

Understanding the PDF Code (Plain English)

Step 4: Putting It All Together

What This Step Does (Plain English)

Line-by-Line Walkthrough (Plain English)

Step 5: Email Distribution

What This Step Does (Plain English)

Understanding the Email Code (Plain English)

Setting Up Email (Step-by-Step)

Step 6: Scheduling

What This Step Does (Plain English)

Common Pitfalls and How to Avoid Them

1. Data Quality Assumptions

2. Timezone Issues

3. File Path Handling

The ROI Case: Is This Worth Your Time?

Manual vs Automated Reporting Costs

Financial Breakdown (Single Weekly Report)

Common Problems and How to Fix Them

"Python is not recognized" or "command not found"

"ModuleNotFoundError: No module named 'pandas'"

"Permission denied" when saving files

"SMTP Authentication Error"

The PDF looks wrong or charts are cut off

"FileNotFoundError: [Errno 2] No such file"

Next Steps

Your Implementation Roadmap

Need Help?

What's the Right Next Step for You?