Back to Blog
    Tutorial

    How to Automate Report Generation: From Data to PDF in Minutes

    Dec 17, 2024By Solve8 Team15 min read

    Automate Report Generation with Python

    The Report That Stole 15 Hours Every Week

    Consider a typical finance manager at a Melbourne manufacturing company. Every Friday afternoon, she spends three hours pulling numbers from four different systems, copying them into Excel, making charts, converting to PDF, and emailing to the leadership team.

    Fifteen hours a month. On a single report.

    Here's the thing—according to research from McKinsey, knowledge workers spend 19% of their time searching for and gathering information, and another 14% communicating and collaborating internally. A significant chunk of that is report generation and distribution.

    The good news? This is exactly the kind of repetitive, structured task that Python can automate completely. Automated reporting systems work well for accounting firms, logistics companies, and manufacturers across Australia. The pattern is always the same: what takes humans hours takes Python minutes.

    This tutorial walks you through building a complete report automation pipeline—from database to inbox—with real code you can adapt to your needs.


    What Does "Automated Reporting" Actually Mean?

    Before we dive into code, let's understand what we're building in plain English.

    Think of it like a robot assistant that does exactly what you do when creating a report—but without getting tired, making typos, or forgetting steps:

    1. Opens your data files (like you opening Excel or your accounting software)
    2. Does the calculations (totals, averages, comparisons)
    3. Creates the charts (bar graphs, pie charts, trend lines)
    4. Formats everything nicely into a professional PDF
    5. Sends it to the right people via email
    6. Does this automatically at the same time every day/week/month

    Real-world analogy: Imagine you hired a junior assistant who's incredibly reliable but only knows how to follow exact instructions. You'd write them a detailed checklist: "Every Monday at 7am, open this spreadsheet, calculate these totals, make this chart, save as PDF, email to these people." That's essentially what we're doing—but the "assistant" is a Python script that never calls in sick.

    How Automated Reporting Works

    Collect Data
    Pull from Excel, CSV, or database
    Process
    Calculate totals & metrics
    Visualise
    Generate charts & graphs
    Create PDF
    Format into professional report
    Distribute
    Email to stakeholders

    Is This Tutorial Right for You?

    Should You Automate Your Reports?

    What's your current situation?
    I spend 2+ hours weekly on the same report
    → Strong candidate for automation
    My reports pull from Excel/CSV files
    → Perfect fit for this tutorial
    I need reports from complex ERP systems
    → May need custom integration help
    My reports change format every time
    → Focus on standardising first

    This tutorial is for you if:

    • You create the same report repeatedly (weekly, monthly, quarterly)
    • Your data lives in Excel, CSV files, or a database
    • You're comfortable following technical instructions (or have someone who is)
    • You want to understand how automation works, not just buy a tool

    You might want professional help if:

    • Your data is in complex systems (SAP, Oracle, custom ERP)
    • You need real-time dashboards rather than scheduled reports
    • You have strict compliance requirements (financial services, healthcare)

    What You'll Build

    By the end of this tutorial, you'll have a system that:

    1. Pulls data from a database or CSV files
    2. Aggregates and transforms the data into meaningful metrics
    3. Generates visualisations (charts, tables)
    4. Creates a professional PDF using templates
    5. Schedules automated runs (daily, weekly, monthly)
    6. Emails the finished report to stakeholders

    What We'll Build Together

    1
    Part 1
    Data Collection
    Connect to your data sources and pull information
    2
    Part 2
    Chart Generation
    Create professional visualisations automatically
    3
    Part 3
    PDF Creation
    Build beautiful, branded reports
    4
    Part 4
    Email Distribution
    Send reports to the right people
    5
    Part 5
    Scheduling
    Make it run automatically on a schedule

    Let's get into it.


    First, The Basics: What is Python?

    If you've never written code before, start here.

    Python is a programming language—a way to give instructions to your computer. Think of it like writing a very detailed recipe that your computer follows exactly.

    Why Python for reports?

    • It's readable (looks almost like English)
    • It's free and runs on any computer
    • It has thousands of pre-built tools (called "libraries") for working with data

    Setting Up Your Computer (Step-by-Step)

    Before you can run any code, you need to set up your computer. This takes about 15-20 minutes, and you only do it once.

    Step 1: Install Python

    What is Python? The software that reads and runs your code.

    Download links:

    Important during installation (Windows):

    • CHECK the box that says "Add Python to PATH" - this is crucial!
    • Click "Install Now"

    To verify it worked:

    1. Open Command Prompt (Windows) or Terminal (Mac)
    2. Type: python --version
    3. You should see something like: Python 3.12.1

    Step 2: Install a Code Editor

    What is a code editor? Like Microsoft Word, but for writing code. It highlights syntax and catches errors.

    Recommended: Visual Studio Code (Free)

    • Download: code.visualstudio.com
    • Works on Windows, Mac, and Linux
    • Click the big download button, run the installer

    Alternative options:

    • PyCharm Community - More features, slightly steeper learning curve
    • Sublime Text - Lightweight and fast
    • Even Notepad works (but no helpful features)

    Step 3: Open Your Terminal

    What is a terminal? A text-based way to give commands to your computer. You'll use it to install libraries and run your scripts.

    How to open it:

    • Windows: Press Windows key + R, type cmd, press Enter
    • Mac: Press Cmd + Space, type Terminal, press Enter

    Step 4: Create a Project Folder

    Create a folder for your report automation project:

    # Windows
    mkdir C:\reports-automation
    cd C:\reports-automation
    
    # Mac/Linux
    mkdir ~/reports-automation
    cd ~/reports-automation
    

    What you need to get started:

    1. Python installed - Follow steps above
    2. A code editor - VS Code recommended
    3. Your data - An Excel or CSV file with the information you want to report on

    How code works (simplified):

    # Lines starting with # are comments (notes to yourself, computer ignores them)
    
    # This tells Python to open a file called "sales.csv"
    data = open_file("sales.csv")
    
    # This tells Python to add up all the values in the "revenue" column
    total = add_up(data, "revenue")
    
    # This tells Python to show you the result
    print(total)  # Output: 125000
    

    That's it. You write instructions, Python follows them. The code in this tutorial is more complex, but the principle is the same.


    The Technical Stack (Don't Worry, We'll Explain)

    Before we write any code, here's what tools we're using and why. Think of these as the "ingredients" in our recipe:

    ComponentLibraryWhy This Choice
    Data handlingpandasIndustry standard, handles any data source
    VisualisationmatplotlibReliable, extensive customisation
    PDF generationFPDF2Lightweight, fast, no external dependencies
    TemplatingJinja2Clean separation of data and presentation
    SchedulingscheduleSimple syntax, good for single-machine deployments
    EmailsmtplibBuilt into Python, works with any SMTP server

    In plain English:

    • pandas = Your data Swiss Army knife. It can read Excel, CSV, databases—pretty much anything with rows and columns
    • matplotlib = The chart maker. Turns numbers into bar graphs, line charts, pie charts
    • FPDF2 = The PDF creator. Takes your text and charts and arranges them into a professional document
    • schedule = The alarm clock. Tells your script "run every Monday at 7am"
    • smtplib = The postman. Sends your finished report via email

    Installation (Step-by-Step)

    Step 1: Open your terminal (Command Prompt on Windows, Terminal on Mac)

    Step 2: Navigate to your project folder:

    # Windows
    cd C:\reports-automation
    
    # Mac
    cd ~/reports-automation
    

    Step 3: Install all the required libraries with one command:

    pip install pandas matplotlib fpdf2 jinja2 schedule python-dotenv
    

    What this command does: It tells Python's package manager (pip) to download and install all these libraries from the internet. It's like installing apps on your phone—you only need to do this once.

    You should see: A bunch of text scrolling past, ending with "Successfully installed..." - that means it worked!

    If you get an error: Try pip3 instead of pip:

    pip3 install pandas matplotlib fpdf2 jinja2 schedule python-dotenv
    

    Official Documentation Links

    Want to learn more about each library? Here are the official docs:

    LibraryWhat It DoesDocumentation
    pandasData manipulationpandas.pydata.org
    matplotlibChart creationmatplotlib.org
    FPDF2PDF generationpy-pdf.github.io/fpdf2
    Jinja2Templatesjinja.palletsprojects.com
    scheduleTask schedulingschedule.readthedocs.io
    python-dotenvEnvironment variablespypi.org/project/python-dotenv

    FPDF2 hits the sweet spot between simplicity and capability. ReportLab is more powerful but overkill for most business reports. WeasyPrint is great for HTML-to-PDF but requires system dependencies that can be painful on Windows servers.


    Step 1: Data Aggregation

    Let's start with the foundation—pulling and preparing your data.

    What This Step Does (Plain English)

    Imagine you're creating a sales report. First, you'd:

    1. Open your sales spreadsheet
    2. Filter it to show only this month's data
    3. Calculate totals (total sales, average order value, etc.)

    That's exactly what this code does—automatically. Instead of you clicking through menus and typing formulas, the script handles it in milliseconds.

    Here's the pattern:

    import pandas as pd
    from datetime import datetime, timedelta
    import logging
    
    logging.basicConfig(
        level=logging.INFO,
        format='%(asctime)s - %(levelname)s - %(message)s'
    )
    
    class ReportDataPipeline:
        """Handles data extraction and transformation for reports."""
    
        def __init__(self, data_source: str):
            self.data_source = data_source
            self.raw_data = None
            self.processed_data = None
    
        def load_data(self) -> pd.DataFrame:
            """Load data from CSV, Excel, or database."""
            try:
                if self.data_source.endswith('.csv'):
                    self.raw_data = pd.read_csv(self.data_source)
                elif self.data_source.endswith(('.xlsx', '.xls')):
                    self.raw_data = pd.read_excel(self.data_source)
                else:
                    # For database connections, you'd use sqlalchemy here
                    raise ValueError(f"Unsupported data source: {self.data_source}")
    
                logging.info(f"Loaded {len(self.raw_data)} rows from {self.data_source}")
                return self.raw_data
    
            except Exception as e:
                logging.error(f"Failed to load data: {e}")
                raise
    
        def filter_date_range(self, date_column: str, days: int = 30) -> pd.DataFrame:
            """Filter data to a specific date range."""
            end_date = datetime.now()
            start_date = end_date - timedelta(days=days)
    
            self.raw_data[date_column] = pd.to_datetime(self.raw_data[date_column])
            mask = (self.raw_data[date_column] >= start_date) & \
                   (self.raw_data[date_column] <= end_date)
    
            self.processed_data = self.raw_data[mask].copy()
            logging.info(f"Filtered to {len(self.processed_data)} rows in date range")
            return self.processed_data
    
        def calculate_summary_metrics(self, value_column: str,
                                       group_column: str = None) -> dict:
            """Calculate key business metrics."""
            data = self.processed_data if self.processed_data is not None \
                   else self.raw_data
    
            metrics = {
                'total': data[value_column].sum(),
                'average': data[value_column].mean(),
                'count': len(data),
                'max': data[value_column].max(),
                'min': data[value_column].min(),
                'period_start': data.index.min() if hasattr(data.index, 'min') else None,
                'period_end': data.index.max() if hasattr(data.index, 'max') else None,
            }
    
            if group_column:
                metrics['by_group'] = data.groupby(group_column)[value_column].sum().to_dict()
    
            return metrics
    

    Breaking Down the Code (For Non-Programmers)

    Don't let the code intimidate you. Here's what each part does:

    class ReportDataPipeline - Think of this as creating a "report-making machine". We're defining all the steps this machine can perform.

    load_data() - Opens your data file. It's smart enough to know if you're giving it an Excel file (.xlsx) or a CSV file, and handles each appropriately.

    filter_date_range() - Keeps only the data from a specific time period. For example, "give me only the last 30 days of sales."

    calculate_summary_metrics() - Does the maths:

    • Total: Add up all the values (like summing a column in Excel)
    • Average: The mean value
    • Count: How many records there are
    • Max/Min: The highest and lowest values

    Example in action:

    # This is how you'd actually use the code above
    pipeline = ReportDataPipeline("sales_data.csv")
    pipeline.load_data()  # Opens the file
    pipeline.filter_date_range("order_date", days=30)  # Last 30 days only
    metrics = pipeline.calculate_summary_metrics("revenue", "region")  # Calculate!
    
    # Now 'metrics' contains:
    # - total revenue for the month
    # - average order value
    # - number of orders
    # - revenue broken down by region
    

    A Common Pitfall Learned the Hard Way

    Notice the logging throughout? That's not optional. When your scheduled report fails at 3am on a Sunday, you need to know exactly where and why. I once spent four hours debugging a report failure that turned out to be a single row with a malformed date. Add logging from the start—you'll thank yourself later.


    Step 2: Chart Generation

    Charts make reports scannable. A wall of numbers is hard to digest, but a well-designed chart tells the story instantly.

    What This Step Does (Plain English)

    You know how in Excel you select data and click "Insert Chart"? This code does that automatically—but with more control over colours, formatting, and style. It creates three types of charts:

    1. Line charts - Great for showing trends over time ("How have sales changed this quarter?")
    2. Bar charts - Perfect for comparing categories ("Which region sold the most?")
    3. Pie charts - Shows proportions ("What percentage of revenue comes from each product?")

    Here's a reusable pattern for generating consistent visualisations:

    import matplotlib.pyplot as plt
    import matplotlib.dates as mdates
    from io import BytesIO
    
    class ReportChartGenerator:
        """Generates charts for PDF reports."""
    
        def __init__(self, style: str = 'seaborn-v0_8-whitegrid'):
            plt.style.use(style)
            # Corporate colours - adjust to match your branding
            self.colors = ['#2563eb', '#10b981', '#f59e0b', '#ef4444', '#8b5cf6']
    
        def create_line_chart(self, data: pd.DataFrame,
                              x_column: str,
                              y_column: str,
                              title: str) -> BytesIO:
            """Create a line chart and return as bytes for PDF embedding."""
            fig, ax = plt.subplots(figsize=(10, 5))
    
            ax.plot(data[x_column], data[y_column],
                    color=self.colors[0], linewidth=2)
            ax.fill_between(data[x_column], data[y_column],
                            alpha=0.3, color=self.colors[0])
    
            ax.set_title(title, fontsize=14, fontweight='bold', pad=20)
            ax.set_xlabel(x_column.replace('_', ' ').title())
            ax.set_ylabel(y_column.replace('_', ' ').title())
    
            # Format dates nicely if x-axis is datetime
            if pd.api.types.is_datetime64_any_dtype(data[x_column]):
                ax.xaxis.set_major_formatter(mdates.DateFormatter('%d %b'))
                plt.xticks(rotation=45)
    
            ax.grid(True, alpha=0.3)
            plt.tight_layout()
    
            # Save to bytes buffer instead of file
            buffer = BytesIO()
            plt.savefig(buffer, format='png', dpi=150, bbox_inches='tight')
            buffer.seek(0)
            plt.close()
    
            return buffer
    
        def create_bar_chart(self, data: dict,
                             title: str,
                             xlabel: str = '',
                             ylabel: str = '') -> BytesIO:
            """Create a bar chart from a dictionary."""
            fig, ax = plt.subplots(figsize=(10, 5))
    
            categories = list(data.keys())
            values = list(data.values())
    
            bars = ax.bar(categories, values, color=self.colors[:len(categories)])
    
            # Add value labels on bars
            for bar, value in zip(bars, values):
                height = bar.get_height()
                ax.annotate(f'${value:,.0f}' if value > 1000 else f'{value:,.0f}',
                           xy=(bar.get_x() + bar.get_width() / 2, height),
                           xytext=(0, 3),
                           textcoords="offset points",
                           ha='center', va='bottom', fontsize=10)
    
            ax.set_title(title, fontsize=14, fontweight='bold', pad=20)
            ax.set_xlabel(xlabel)
            ax.set_ylabel(ylabel)
    
            plt.xticks(rotation=45, ha='right')
            plt.tight_layout()
    
            buffer = BytesIO()
            plt.savefig(buffer, format='png', dpi=150, bbox_inches='tight')
            buffer.seek(0)
            plt.close()
    
            return buffer
    
        def create_pie_chart(self, data: dict, title: str) -> BytesIO:
            """Create a pie chart for distribution analysis."""
            fig, ax = plt.subplots(figsize=(8, 8))
    
            labels = list(data.keys())
            sizes = list(data.values())
    
            wedges, texts, autotexts = ax.pie(
                sizes,
                labels=labels,
                autopct='%1.1f%%',
                colors=self.colors[:len(labels)],
                startangle=90
            )
    
            ax.set_title(title, fontsize=14, fontweight='bold', pad=20)
    
            buffer = BytesIO()
            plt.savefig(buffer, format='png', dpi=150, bbox_inches='tight')
            buffer.seek(0)
            plt.close()
    
            return buffer
    

    Understanding the Chart Code (Plain English)

    What each function does:

    • create_line_chart() - Draws a line connecting data points over time. Perfect for showing "sales this month vs last month" or "website visitors per day"

    • create_bar_chart() - Creates vertical bars where height = value. Great for "revenue by region" or "orders by product category"

    • create_pie_chart() - Divides a circle into slices showing percentages. Use for "market share" or "expense breakdown"

    Key settings explained:

    • figsize=(10, 5) - The size of your chart (width x height in inches)
    • self.colors - Your brand colours. The code uses professional blues, greens, and purples by default
    • dpi=150 - Image quality. Higher = crisper but larger file size

    Example: Creating a sales trend chart

    charts = ReportChartGenerator()
    
    # Your data might look like this (simplified):
    # Date: Jan 1, Jan 2, Jan 3...
    # Revenue: $5000, $6200, $4800...
    
    trend_chart = charts.create_line_chart(
        data=my_sales_data,
        x_column='date',
        y_column='revenue',
        title='Daily Revenue - January 2025'
    )
    # Result: A professional line chart saved in memory, ready for your PDF
    

    Pro Tip: Memory Management

    Notice that I'm using BytesIO buffers and calling plt.close() after each chart? This prevents memory leaks when generating multiple charts. Report jobs can crash after generating 50+ charts when matplotlib figures aren't closed properly. In production environments, this is critical.


    Step 3: PDF Generation with Templates

    This is where we bring it together—turning your data and charts into a professional PDF document.

    What This Step Does (Plain English)

    Imagine you're laying out a document in Word or InDesign:

    • You add a header at the top of each page
    • You insert your company name and the date
    • You add titles and subtitles
    • You paste in your charts
    • You create tables with your data
    • You add a page number footer

    This code does all of that automatically. Every time the script runs, it creates a fresh PDF with the latest data, perfectly formatted, without you touching anything.

    Real-world example: Instead of copying numbers from Excel, making charts, pasting into Word, adjusting formatting, exporting to PDF... the script produces a polished 10-page report in under 5 seconds.

    FPDF2 lets you build PDFs programmatically:

    from fpdf import FPDF
    from datetime import datetime
    
    class BusinessReportPDF(FPDF):
        """Custom PDF class for business reports."""
    
        def __init__(self, company_name: str = "Your Company"):
            super().__init__()
            self.company_name = company_name
            self.set_auto_page_break(auto=True, margin=15)
    
        def header(self):
            """Add header to each page."""
            self.set_font('Helvetica', 'B', 12)
            self.cell(0, 10, self.company_name, align='L')
            self.set_font('Helvetica', '', 10)
            self.cell(0, 10, datetime.now().strftime('%d %B %Y'), align='R', ln=True)
            self.line(10, 25, 200, 25)
            self.ln(10)
    
        def footer(self):
            """Add footer with page numbers."""
            self.set_y(-15)
            self.set_font('Helvetica', 'I', 8)
            self.cell(0, 10, f'Page {self.page_no()}/{{nb}}', align='C')
    
        def add_title(self, title: str):
            """Add report title."""
            self.set_font('Helvetica', 'B', 24)
            self.cell(0, 20, title, ln=True, align='C')
            self.ln(10)
    
        def add_subtitle(self, subtitle: str):
            """Add section subtitle."""
            self.set_font('Helvetica', 'B', 14)
            self.set_fill_color(240, 240, 240)
            self.cell(0, 10, subtitle, ln=True, fill=True)
            self.ln(5)
    
        def add_metric_box(self, label: str, value: str,
                           change: str = None, is_positive: bool = True):
            """Add a highlighted metric box."""
            self.set_font('Helvetica', '', 10)
            self.cell(50, 8, label, ln=False)
            self.set_font('Helvetica', 'B', 14)
            self.cell(40, 8, value, ln=False)
    
            if change:
                if is_positive:
                    self.set_text_color(0, 150, 0)
                else:
                    self.set_text_color(200, 0, 0)
                self.set_font('Helvetica', '', 10)
                self.cell(0, 8, change, ln=True)
                self.set_text_color(0, 0, 0)
            else:
                self.ln()
    
        def add_chart(self, chart_buffer: BytesIO, width: int = 180):
            """Add a chart image from a BytesIO buffer."""
            # Save buffer to temp file (FPDF2 requires file path)
            import tempfile
            with tempfile.NamedTemporaryFile(suffix='.png', delete=False) as tmp:
                tmp.write(chart_buffer.getvalue())
                tmp_path = tmp.name
    
            self.image(tmp_path, x=15, w=width)
            self.ln(10)
    
            # Clean up temp file
            import os
            os.unlink(tmp_path)
    
        def add_table(self, headers: list, data: list, col_widths: list = None):
            """Add a data table."""
            if col_widths is None:
                col_widths = [190 / len(headers)] * len(headers)
    
            # Header row
            self.set_font('Helvetica', 'B', 10)
            self.set_fill_color(50, 50, 50)
            self.set_text_color(255, 255, 255)
    
            for header, width in zip(headers, col_widths):
                self.cell(width, 8, header, border=1, fill=True, align='C')
            self.ln()
    
            # Data rows
            self.set_font('Helvetica', '', 9)
            self.set_text_color(0, 0, 0)
    
            for row_idx, row in enumerate(data):
                if row_idx % 2 == 0:
                    self.set_fill_color(245, 245, 245)
                else:
                    self.set_fill_color(255, 255, 255)
    
                for cell, width in zip(row, col_widths):
                    self.cell(width, 7, str(cell), border=1, fill=True, align='C')
                self.ln()
    
            self.ln(5)
    
        def add_paragraph(self, text: str):
            """Add a text paragraph."""
            self.set_font('Helvetica', '', 10)
            self.multi_cell(0, 6, text)
            self.ln(5)
    

    Understanding the PDF Code (Plain English)

    This might look intimidating, but each function does one simple thing:

    FunctionWhat It DoesLike This in Word/PDF
    header()Adds company name + date at topYour letterhead
    footer()Adds page numbers at bottom"Page 1 of 5"
    add_title()Big, bold headingReport title
    add_subtitle()Section heading with grey backgroundChapter headings
    add_metric_box()Shows a key number with label"Total Revenue: $125,000 +12%"
    add_chart()Inserts a chart imageYour graphs
    add_table()Creates a formatted data tableA nice table with headers
    add_paragraph()Adds regular textBody text

    The magic of PDF coordinates:

    • PDF pages are measured in millimetres from the top-left corner
    • self.set_font('Helvetica', 'B', 14) means "use Helvetica font, Bold, 14 point size"
    • self.cell(0, 10, text) means "create a text box full width, 10mm tall"
    • self.ln(5) means "move down 5mm" (like pressing Enter)

    Customising for your brand:

    # Change these colours to match your company:
    self.set_fill_color(50, 50, 50)      # Dark grey for table headers
    self.set_text_color(0, 150, 0)       # Green for positive numbers
    self.set_text_color(200, 0, 0)       # Red for negative numbers
    

    Step 4: Putting It All Together

    Now we combine all the pieces into one script that does everything from start to finish.

    What This Step Does (Plain English)

    This is the "master script" that coordinates everything:

    1. Reads your data file - Loads last 30 days of sales
    2. Calculates the numbers - Total revenue, order count, by region
    3. Creates the charts - Revenue trend + regional comparison
    4. Builds the PDF - Title, metrics, charts, tables
    5. Saves to disk - Ready to email

    Think of it like a conductor leading an orchestra—each musician (function) knows their part, and the conductor (this code) tells them when to play.

    Here's the complete pipeline that ties everything together:

    from datetime import datetime
    import os
    
    def generate_sales_report(data_file: str, output_path: str) -> str:
        """Generate a complete sales report PDF."""
    
        # Step 1: Load and process data
        pipeline = ReportDataPipeline(data_file)
        pipeline.load_data()
        pipeline.filter_date_range('order_date', days=30)
    
        metrics = pipeline.calculate_summary_metrics(
            value_column='revenue',
            group_column='region'
        )
    
        # Step 2: Generate charts
        charts = ReportChartGenerator()
    
        # Daily revenue trend
        daily_data = pipeline.processed_data.groupby(
            pipeline.processed_data['order_date'].dt.date
        )['revenue'].sum().reset_index()
        daily_data.columns = ['date', 'revenue']
    
        trend_chart = charts.create_line_chart(
            daily_data, 'date', 'revenue',
            'Daily Revenue Trend (Last 30 Days)'
        )
    
        # Revenue by region
        region_chart = charts.create_bar_chart(
            metrics['by_group'],
            'Revenue by Region',
            ylabel='Revenue ($)'
        )
    
        # Step 3: Build PDF
        pdf = BusinessReportPDF(company_name="Acme Manufacturing")
        pdf.alias_nb_pages()
        pdf.add_page()
    
        # Title
        pdf.add_title("Monthly Sales Report")
        pdf.add_paragraph(
            f"Report generated: {datetime.now().strftime('%d %B %Y at %H:%M')}\n"
            f"Period: Last 30 days"
        )
    
        # Key metrics section
        pdf.add_subtitle("Key Metrics")
        pdf.add_metric_box("Total Revenue", f"${metrics['total']:,.2f}", "+12.3%", True)
        pdf.add_metric_box("Total Orders", f"{metrics['count']:,}", "+8.1%", True)
        pdf.add_metric_box("Average Order", f"${metrics['average']:,.2f}", "-2.1%", False)
        pdf.ln(10)
    
        # Charts
        pdf.add_subtitle("Revenue Trend")
        pdf.add_chart(trend_chart)
    
        pdf.add_page()
        pdf.add_subtitle("Regional Performance")
        pdf.add_chart(region_chart)
    
        # Data table
        pdf.add_subtitle("Top 10 Orders")
        top_orders = pipeline.processed_data.nlargest(10, 'revenue')[
            ['order_id', 'customer', 'region', 'revenue']
        ].values.tolist()
    
        pdf.add_table(
            headers=['Order ID', 'Customer', 'Region', 'Revenue'],
            data=[[row[0], row[1], row[2], f"${row[3]:,.2f}"] for row in top_orders],
            col_widths=[30, 70, 40, 40]
        )
    
        # Save
        output_file = os.path.join(output_path,
            f"sales_report_{datetime.now().strftime('%Y%m%d')}.pdf")
        pdf.output(output_file)
    
        logging.info(f"Report generated: {output_file}")
        return output_file
    

    Line-by-Line Walkthrough (Plain English)

    Here's what each section of that code does:

    Loading and processing data:

    pipeline = ReportDataPipeline(data_file)  # Create our data handler
    pipeline.load_data()                       # Open the file
    pipeline.filter_date_range('order_date', days=30)  # Keep only last 30 days
    metrics = pipeline.calculate_summary_metrics(...)  # Do the maths
    

    Creating the charts:

    charts = ReportChartGenerator()  # Create our chart maker
    trend_chart = charts.create_line_chart(...)  # Make the trend line
    region_chart = charts.create_bar_chart(...)  # Make the bar chart
    

    Building the PDF:

    pdf = BusinessReportPDF(company_name="Acme Manufacturing")  # Start new PDF
    pdf.add_page()           # Create first page
    pdf.add_title(...)       # Add the big title
    pdf.add_metric_box(...)  # Add key numbers
    pdf.add_chart(...)       # Insert the charts
    pdf.add_table(...)       # Add data tables
    pdf.output(output_file)  # Save the PDF file
    

    How to run it:

    1. Save all the code above to a file called report_generator.py
    2. Make sure you have a CSV file with columns: order_date, revenue, region, order_id, customer
    3. In terminal, run: python report_generator.py
    4. Your PDF appears in the output folder!

    Step 5: Email Distribution

    Reports are useless if they sit on a server. Let's make them land in the right inboxes automatically.

    What This Step Does (Plain English)

    This code does exactly what you'd do manually:

    1. Opens a new email
    2. Fills in the recipients (To:)
    3. Adds a subject line
    4. Writes a message body
    5. Attaches your PDF report
    6. Clicks "Send"

    The difference? It does this in milliseconds, at 3am if you want, without you being awake.

    Here's how to email them automatically:

    import smtplib
    import ssl
    from email.mime.multipart import MIMEMultipart
    from email.mime.text import MIMEText
    from email.mime.application import MIMEApplication
    from dotenv import load_dotenv
    
    load_dotenv()
    
    def send_report_email(report_path: str, recipients: list,
                          subject: str = "Your Automated Report"):
        """Send the generated report via email."""
    
        smtp_server = os.getenv('SMTP_SERVER', 'smtp.gmail.com')
        smtp_port = int(os.getenv('SMTP_PORT', '587'))
        sender_email = os.getenv('SENDER_EMAIL')
        sender_password = os.getenv('SENDER_PASSWORD')
    
        # Create message
        msg = MIMEMultipart()
        msg['From'] = sender_email
        msg['To'] = ', '.join(recipients)
        msg['Subject'] = subject
    
        # Email body
        body = """
        Hi,
    
        Please find attached your automated report.
    
        This report was generated automatically. If you have questions about
        the data, please contact the analytics team.
    
        Regards,
        Automated Reporting System
        """
        msg.attach(MIMEText(body, 'plain'))
    
        # Attach PDF
        with open(report_path, 'rb') as f:
            attachment = MIMEApplication(f.read(), _subtype='pdf')
            attachment.add_header(
                'Content-Disposition',
                'attachment',
                filename=os.path.basename(report_path)
            )
            msg.attach(attachment)
    
        # Send
        try:
            context = ssl.create_default_context()
            with smtplib.SMTP(smtp_server, smtp_port) as server:
                server.starttls(context=context)
                server.login(sender_email, sender_password)
                server.sendmail(sender_email, recipients, msg.as_string())
    
            logging.info(f"Report emailed to {', '.join(recipients)}")
            return True
    
        except Exception as e:
            logging.error(f"Failed to send email: {e}")
            return False
    

    Understanding the Email Code (Plain English)

    Here's what each part does:

    # These lines read your email settings from a secure file
    smtp_server = os.getenv('SMTP_SERVER')  # Your email provider's server
    sender_email = os.getenv('SENDER_EMAIL')  # Your email address
    sender_password = os.getenv('SENDER_PASSWORD')  # Your app password
    
    # This creates the email message
    msg = MIMEMultipart()  # Start a new email
    msg['From'] = sender_email  # Who it's from
    msg['To'] = ', '.join(recipients)  # Who it's going to
    msg['Subject'] = subject  # Email subject line
    msg.attach(MIMEText(body, 'plain'))  # The message body
    
    # This attaches your PDF
    with open(report_path, 'rb') as f:  # Open the PDF file
        attachment = MIMEApplication(f.read())  # Read it into the email
        msg.attach(attachment)  # Attach it
    

    Setting Up Email (Step-by-Step)

    Option 1: Gmail (Most Common)

    Gmail requires a special "App Password" for scripts—you can't use your regular password.

    How to create a Gmail App Password:

    1. Go to myaccount.google.com
    2. Click Security in the left menu
    3. Under "Signing in to Google", click 2-Step Verification (enable it if you haven't)
    4. Scroll down and click App passwords
    5. Select "Mail" and "Windows Computer" (or Mac)
    6. Click Generate
    7. Copy the 16-character password it shows you

    Create a .env file in your project folder with your settings:

    SMTP_SERVER=smtp.gmail.com
    SMTP_PORT=587
    SENDER_EMAIL=yourname@gmail.com
    SENDER_PASSWORD=your-16-character-app-password
    

    Security warning: Never share your .env file or commit it to version control. Add .env to your .gitignore file.

    Option 2: Microsoft 365 / Outlook

    SMTP_SERVER=smtp.office365.com
    SMTP_PORT=587
    SENDER_EMAIL=yourname@company.com.au
    SENDER_PASSWORD=your-password
    

    Option 3: Your Company's Email Server

    Ask your IT department for:

    • SMTP server address
    • SMTP port (usually 587 or 465)
    • Whether authentication is required

    Step 6: Scheduling

    The final piece—making it run automatically, even when you're asleep.

    What This Step Does (Plain English)

    Right now, you have to manually run your script. Scheduling makes it run automatically at specific times:

    • "Run every Monday at 7am" - Weekly sales report
    • "Run on the 1st of every month" - Monthly financial summary
    • "Run every day at 6pm" - Daily operations snapshot

    It's like setting an alarm clock for your computer.

    Here's the code:

    import schedule
    import time
    
    def run_weekly_report():
        """Generate and distribute the weekly sales report."""
        try:
            report_file = generate_sales_report(
                data_file='/path/to/sales_data.csv',
                output_path='/path/to/reports/'
            )
    
            send_report_email(
                report_path=report_file,
                recipients=['manager@company.com', 'finance@company.com'],
                subject=f"Weekly Sales Report - {datetime.now().strftime('%d %b %Y')}"
            )
    
        except Exception as e:
            logging.error(f"Weekly report failed: {e}")
            # Send alert email to IT
            send_report_email(
                report_path=None,
                recipients=['it@company.com'],
                subject="ALERT: Weekly Report Generation Failed"
            )
    
    # Schedule for Monday mornings at 7:00 AM
    schedule.every().monday.at("07:00").do(run_weekly_report)
    
    # Keep the script running
    if __name__ == "__main__":
        logging.info("Report scheduler started")
        while True:
            schedule.run_pending()
            time.sleep(60)  # Check every minute
    

    For production deployments, I'd recommend using cron (Linux/Mac) or Task Scheduler (Windows) rather than keeping a Python script running. Here's the cron entry for Monday 7am:

    0 7 * * 1 /usr/bin/python3 /path/to/report_generator.py
    

    Common Pitfalls and How to Avoid Them

    After implementing automated reporting for dozens of Australian businesses, here are the issues that come up repeatedly:

    1. Data Quality Assumptions

    Your report will only be as good as your data. Build in validation checks:

    def validate_data(df: pd.DataFrame) -> tuple[bool, list]:
        """Validate data before report generation."""
        issues = []
    
        if df.empty:
            issues.append("Dataset is empty")
    
        if df.isnull().sum().sum() > len(df) * 0.1:
            issues.append("More than 10% missing values")
    
        # Check for duplicates
        if df.duplicated().sum() > 0:
            issues.append(f"{df.duplicated().sum()} duplicate rows found")
    
        return len(issues) == 0, issues
    

    2. Timezone Issues

    Australian businesses often have offices across multiple timezones. Always store datetimes in UTC and convert for display:

    from zoneinfo import ZoneInfo
    
    # Store in UTC
    utc_time = datetime.now(ZoneInfo('UTC'))
    
    # Display in local timezone
    local_time = utc_time.astimezone(ZoneInfo('Australia/Sydney'))
    

    3. File Path Handling

    Windows and Mac/Linux handle paths differently. Use pathlib for cross-platform compatibility:

    from pathlib import Path
    
    output_dir = Path('/path/to/reports')
    output_dir.mkdir(parents=True, exist_ok=True)
    output_file = output_dir / f"report_{datetime.now():%Y%m%d}.pdf"
    

    The ROI Case: Is This Worth Your Time?

    Let's do the maths honestly. According to Business Dasher's 2024 research, businesses using automation report 10-50% cost reduction through decreased manual processing.

    Here's a realistic breakdown for a typical weekly report:

    Manual vs Automated Reporting Costs

    Metric
    Manual Process
    Automated
    Improvement
    Time per report3 hours0 minutes100%
    Annual staff time156 hours0 hours156 hrs saved
    Error rate2-5%<0.1%~98%
    Delivery timeVariableAlways on timeConsistent

    Financial Breakdown (Single Weekly Report)

    Manual cost: 3hrs × 52 weeks × $60/hr$9,360/year
    Automation build (one-time)$3,000
    Annual maintenance (2hrs/month)$3,600/year
    First year savings$2,760
    Year 2+ annual savings$5,760/year

    The multiplier effect: Most businesses have 5-10 regular reports. Once you've built the infrastructure for one, adding more reports costs a fraction of the original build. The same code patterns work for:

    • Weekly sales reports
    • Monthly financial summaries
    • Daily operations snapshots
    • Quarterly board packs
    • Customer account statements

    Common Problems and How to Fix Them

    New to coding? Here are the issues you'll likely hit:

    "Python is not recognized" or "command not found"

    Problem: Your computer doesn't know where Python is installed. Fix (Windows): Reinstall Python and make sure to check "Add Python to PATH" Fix (Mac): Try using python3 instead of python

    "ModuleNotFoundError: No module named 'pandas'"

    Problem: The library isn't installed. Fix: Run pip install pandas (replace pandas with whatever module is missing)

    "Permission denied" when saving files

    Problem: You're trying to save somewhere you don't have access. Fix: Save to a folder in your Documents or Desktop instead

    "SMTP Authentication Error"

    Problem: Your email credentials are wrong or you need an App Password. Fix: For Gmail, create an App Password (see email setup section above)

    The PDF looks wrong or charts are cut off

    Problem: Page layout issues. Fix: Adjust figsize for charts and use pdf.add_page() before sections that don't fit

    "FileNotFoundError: [Errno 2] No such file"

    Problem: Python can't find your data file. Fix: Use the full path: C:\Users\YourName\Documents\sales.csv not just sales.csv


    Next Steps

    Your Implementation Roadmap

    1
    Day 1
    Setup & Test
    Install Python, VS Code, and run a simple test script
    2
    Day 2-3
    Data Pipeline
    Get data loading working with your actual files
    3
    Day 4-5
    Charts & PDF
    Create your first automated report
    4
    Day 6-7
    Email & Schedule
    Set up automatic delivery
    5
    Ongoing
    Expand
    Add more reports using the same infrastructure

    If you're ready to automate your reporting:

    1. Start simple: Pick your most time-consuming report first
    2. Document the current process: What data? What calculations? Who receives it?
    3. Build incrementally: Get data extraction working, then charts, then PDF, then email
    4. Test with real data: Edge cases always surprise you
    5. Add error handling: Things will fail; plan for it

    Resources for learning more:


    Need Help?

    Want help implementing automated reporting for your business? We've deployed these systems across accounting firms, manufacturers, and logistics companies throughout Australia.

    What's the Right Next Step for You?

    How comfortable are you with technical implementation?
    I can follow this tutorial myself
    → Bookmark this page and get started!
    I'd like someone to build it for me
    → Book a free consultation below
    I need ongoing support and maintenance
    → Ask about our retainer packages
    I'm not sure what I need yet
    → Let's chat about your requirements

    Book a Free Assessment


    Solve8 helps Australian businesses automate repetitive processes. Based in Brisbane, working nationally. No buzzwords, just systems that work.

    Sources: