Hassan Agmir Hassan Agmir

Excel with Python: Mastering Automation (1)

Hassan Agmir
Excel with Python: Mastering Automation (1)

Introduction to Python and Excel

Microsoft Excel has long been the de facto standard for storing, manipulating, and visualizing tabular data in business, academia, and research environments. From simple personal expense trackers to complex corporate financial models, Excel spreadsheets play a central role in diverse workflows. However, as datasets have grown larger and more complex, manual point‐and‐click interaction within Excel can become both time‐consuming and error‐prone. Moreover, repeatedly performing identical tasks—such as cleaning data, generating reports, or applying standard formatting—can be tedious when done entirely by hand.

Enter Python. Over the past decade, Python has evolved into one of the most popular general‐purpose programming languages, especially within data science, automation, web development, and scientific research communities. Its simplicity, readability, and expansive ecosystem of libraries make it an ideal choice for automating Excel workflows, processing large datasets, and bridging Excel with other data sources (e.g., databases, web APIs). By integrating Python with Excel, one can leverage powerful data‐manipulation libraries (such as pandas), advanced Excel file writers (such as XlsxWriter or openpyxl), and even control a live Excel application via COM automation or xlwings to create highly customized, reproducible, and scalable processes.

This article provides an in‐depth, comprehensive guide to working with Excel files using Python. We will explore popular libraries, demonstrate how to read from and write to spreadsheets, showcase advanced features like formatting, formulas, and charts, and discuss best practices for performance and maintenance. Whether you are an Excel power user seeking to automate repetitive tasks, a data analyst aiming to generate recurring reports, or a developer looking to integrate Excel output into a larger pipeline, this guide will serve as a reference and inspiration for harnessing Python’s capabilities alongside Excel.

Why Use Python with Excel

Before diving into specific libraries and code examples, it is worth understanding why combining Python with Excel is so appealing:

  1. Automation of Repetitive Tasks
    Many professionals find themselves repeating the same series of steps in Excel: opening a workbook, cleaning data, applying formulas, formatting cells, generating charts, and saving a new file. Automating these tasks with Python not only saves time but also reduces the risk of human error.
  2. Scalability
    Excel’s performance declines when working with very large spreadsheets—especially when more than a few hundred thousand rows are involved. Python, coupled with libraries like pandas, can handle datasets in the millions of rows (depending on system memory), perform bulk transformations quickly, and then write results back into one or more optimized Excel files.
  3. Reproducibility and Version Control
    Code is inherently version‐controlled and reproducible. Writing a Python script or Jupyter notebook to process and output Excel files ensures that every transformation step is documented. If a colleague or future you needs to understand exactly how a spreadsheet was generated, the Python code provides an auditable trail.
  4. Integration with Other Data Sources
    Real‐world data rarely lives only in Excel. Python excels (no pun intended) at connecting to databases (SQL Server, PostgreSQL, MySQL, etc.), consuming web APIs, scraping websites, reading JSON/CSV/Parquet files, and then consolidating that data. By using Python to centralize data ingestion and processing, you can produce unified Excel reports that combine insights from disparate sources.
  5. Advanced Analytics
    Beyond simple data cleaning, Python offers libraries for statistical modeling (statsmodels, SciPy), machine learning (scikit‐learn, TensorFlow, PyTorch), geospatial analysis (GeoPandas), and NLP (NLTK, spaCy). You can perform a regression, predict customer churn, analyze text, or cluster data—all in Python—and then export results to Excel for distribution to non‐technical stakeholders.
  6. Customization and Extensibility
    Excel’s built‐in charting and formatting features are powerful but can be limiting. Python libraries like XlsxWriter and openpyxl allow for highly customized cell styling, conditional formatting rules, images, and even embedding of Python‐generated plots. If you need to automate creation of complex dashboards or visually compelling reports, Python provides a level of control that manual Excel simply cannot match.
  7. Cost‐Effectiveness and Accessibility
    While Excel is a paid product, the Python ecosystem is entirely open source and free. By learning Python, you unlock a range of free tools that can augment or even replace certain commercial Excel add‐ins or specialized software. Furthermore, Python code can be reused across platforms (Windows, macOS, Linux) without modification, ensuring accessibility for diverse teams.

Python Libraries for Excel

Over the years, numerous Python libraries have been developed to interact with Excel files. Some focus on reading and writing data, while others allow full manipulation of workbooks and even control of a running Excel application. Below is an overview of the most commonly used libraries:

  1. pandas
    • Primary Use: Data ingestion, transformation, and analysis.
    • Supported File Formats: .xls, .xlsx, .xlsm, .xlsb. Under the hood, pandas can utilize xlrd, openpyxl, pyxlsb, and odf to read various formats, and openpyxl, xlsxwriter, or xlwt to write.
    • Strengths: High‐level DataFrame API, powerful groupby/aggregation, excellent performance on large tabular data, built‐in read/write to Excel.
    • Limitations: Less control over low‐level Excel features (e.g., intricate styling, chart creation). For advanced Excel features, pandas usually defers to openpyxl or XlsxWriter.
  2. openpyxl
    • Primary Use: Read/write .xlsx and .xlsm files, manipulate workbooks (styles, merged cells, formulas, charts).
    • Supported File Formats: Modern Office Open XML (i.e., .xlsx, .xlsm).
    • Strengths: Officially maintained library for editing existing .xlsx files; good support for complex features like conditional formatting, named styles, data validation, charts, and VBA integration.
    • Limitations: Does not support older .xls format. For reading .xls, one typically falls back on xlrd.
  3. XlsxWriter
    • Primary Use: Write new .xlsx files with extensive formatting, rich features like charts, images, conditional formatting, and more.
    • Supported File Formats: .xlsx (write‐only).
    • Strengths: Extremely fast writing speed, rich feature set for formatting and charting.
    • Limitations: Write‐only: cannot read or modify existing workbooks. If you need to update an existing file, you must load it with openpyxl and then save or generate a new workbook.
  4. xlrd and xlwt
    • xlrd: Read old Excel files (both .xls and earlier .xlsx), although as of version 2.0.1, xlrd dropped support for .xlsx. If you need to read .xlsx, you must use an older xlrd <= 1.2.0 or rely on openpyxl.
    • xlwt: Write .xls files (BIFF8 format). Generally considered legacy since modern Excel uses .xlsx.
    • Limitations: Slow on large datasets, limited formatting capabilities compared to openpyxl and XlsxWriter. Both libraries see minimal maintenance as most work has moved to openpyxl and XlsxWriter.
  5. pyxlsb
    • Primary Use: Read binary Excel files (.xlsb) quickly in Python.
    • Supported File Formats: .xlsb.
    • Strengths: Can parse binary format fast, good for large files saved in .xlsb for performance.
    • Limitations: Read‐only, limited formatting information.
  6. xlwings
    • Primary Use: Automate and control a live Excel application via the COM API (Windows) or appscript (macOS). Read/write cells, call Python functions directly from VBA, embed Python in Excel.
    • Strengths: Interact with the actual Excel application—perfect for macros, UDFs (user‐defined functions), and real‐time read/write operations. You can build interactive customizations where Python does the heavy lifting and Excel serves as a front end.
    • Limitations: Requires Excel to be installed; not suited for headless environments (e.g., Linux without a GUI). Can be slower for bulk I/O compared to file‐based libraries.
  7. pywin32 (win32com)
    • Primary Use: Low‐level COM automation of Excel (Windows only). Similar to xlwings but requires more boilerplate code.
    • Strengths: Full control over the live Excel COM interface, can execute any action possible from VBA.
    • Limitations: Verbose code, tricky COM error handling, Windows‐only, slower for large reads/writes.
  8. openpyxl‐templater, morpholib, pyxll, ezodf
    • Secondary Tools: Depending on specialized needs (e.g., templating, connecting to ODF spreadsheets, embedding Python UDFs directly into Excel). These libraries serve niche requirements and may not be as widely used.

In most modern Python–Excel workflows, one or more of the above libraries are combined. A typical pattern would be:

  1. Use pandas.read_excel (which under the hood might call xlrd, openpyxl, or pyxlsb) to load Excel data into a pandas DataFrame.
  2. Perform data cleaning/analysis using pandas (apply filters, groupbys, calculations, merges).
  3. Write results back to Excel via DataFrame.to_excel with xlsxwriter or openpyxl as the engine, adding custom styles or charts as needed via the chosen engine (e.g., using XlsxWriter’s Workbook.add_format(), add_chart(), etc.).
  4. If you need to update an existing file’s styles, formulas, or maintain macros/VBA, load the file with openpyxl, make incremental changes, and save.

The sections that follow will delve into these steps, providing concrete code examples and best practices to help you choose the right tool for each task.

Reading Excel Files in Python

At its simplest, reading an Excel file means importing data into a structure you can manipulate—most commonly, a pandas DataFrame. Pandas abstracts away differences between .xls, .xlsx, and .xlsb: you simply call pd.read_excel, specify the filename (or URL), and pandas handles the details.

1. Using pandas to Read Excel

import pandas as pd

# Read the first sheet from a .xlsx file
df = pd.read_excel("sales_data.xlsx")
print(df.head())

# Read a specific sheet by name
df_january = pd.read_excel("sales_data.xlsx", sheet_name="January")

# Read multiple sheets into a dict of DataFrames
all_sheets = pd.read_excel("sales_data.xlsx", sheet_name=None)
# 'all_sheets' will be a dict like {'January': DataFrame, 'February': DataFrame, ...}

# Read only specific columns (by index or name)
df_subset = pd.read_excel(
    "sales_data.xlsx",
    sheet_name="January",
    usecols=["Date", "Customer", "Amount"]
)

# Read a subset of rows (skip rows or use nrows)
df_skip = pd.read_excel("sales_data.xlsx", skiprows=2, nrows=100)

Key Parameters for pd.read_excel()

  • sheet_name: Default is 0 (first sheet). Can be a string, integer, list of sheet names/indices, or None to read all sheets.
  • usecols: Specify which columns to parse (e.g., ["A:C", "E"] or column names).
  • skiprows: Number of rows (or list of row indices) to skip at the start. Useful if your file has descriptive headers.
  • nrows: Number of rows to read.
  • header: Row index (0‐based) to use for column labels; None if there is no header row.
  • dtype: Dictionary of column names to data types.
  • parse_dates: List of columns (or combinations) to parse as dates.
  • engine: Specify “openpyxl”, “xlrd”, or “pyxlsb” explicitly if needed (e.g., to read .xlsb, you must set engine="pyxlsb").

Because pandas sits atop lower‐level libraries, installing pandas often pulls in the recommended Excel engines (e.g., openpyxl). However, to read .xlsb you must install pyxlsb:

pip install pyxlsb

Then:

df_xlsb = pd.read_excel("large_data.xlsb", engine="pyxlsb")

2. Using openpyxl Directly for Granular Control

While pandas is excellent for bulk data import into DataFrames, there are times when you need to examine cell formatting, merge cells, or modify formulas in an existing workbook. In such cases, you use openpyxl’s load_workbook():

from openpyxl import load_workbook

# Load the workbook; data_only=True means read cell values instead of formulas
wb = load_workbook(filename="report.xlsx", data_only=True)

# List sheet names
print(wb.sheetnames)

# Access a worksheet by name
ws = wb["January"]

# Read a single cell’s value
cell_value = ws["B2"].value
print(cell_value)

# Iterate through rows and columns
for row in ws.iter_rows(min_row=2, max_row=10, min_col=1, max_col=5):
    for cell in row:
        print(cell.value, end="  ")
    print()

When to use openpyxl directly?

  • You need to access cell styles (fonts, fills, borders).
  • You need to read formulas (without evaluating them).
  • You need to retrieve comments or hyperlinks.
  • You want to manipulate named ranges, data validation rules, or create charts manually.
  • You need to modify an existing file without losing existing formatting or VBA.

However, openpyxl loads entire workbooks into memory as Python objects. If you have very large sheets (hundreds of thousands of rows), openpyxl can be quite slow. For sheer data‐loading speed into DataFrames, pandas is preferable; pandas effectively uses optimized C‐implemented parsers beneath the hood.

Writing Excel Files in Python

Writing data (and optionally styles/charts) to Excel from Python is one of the most common reasons for integrating the two. Whether you’re generating a weekly report, producing a template for business users, or exporting analysis results, Python offers multiple tools to write Excel files. Below are the most prevalent methods:

1. Using pandas.DataFrame.to_excel

If your primary goal is to output a DataFrame to an Excel file without too much custom styling, to_excel is extremely straightforward:

import pandas as pd

# Suppose 'df' is your DataFrame
df = pd.DataFrame({
    "Date": pd.date_range("2025-01-01", periods=5),
    "Region": ["East", "West", "North", "South", "East"],
    "Sales": [500, 700, 450, 620, 550]
})

# Write to a new Excel file
df.to_excel("sales_summary.xlsx", index=False)

# Write multiple DataFrames to different sheets
with pd.ExcelWriter("multi_sheet_report.xlsx", engine="xlsxwriter") as writer:
    df.to_excel(writer, sheet_name="Jan", index=False)
    df.describe().to_excel(writer, sheet_name="Stats", index=True)

By default, to_excel uses openpyxl for .xlsx output if it is installed; otherwise, it will fall back to XlsxWriter. You can explicitly choose the engine:

df.to_excel("styled_report.xlsx", engine="xlsxwriter", sheet_name="Data", index=False)

Advantages

  • Minimal code required.
  • Integrates seamlessly with pandas data manipulation pipeline.
  • Automatically writes column headers, handles index, etc.

Disadvantages

  • Limited styling: you can pass a format_dataframe() step later, but for advanced formatting, you often need to use the chosen engine’s API directly.
  • Not ideal if you need to append to an existing file’s sheets or retain macros.

2. Creating a Workbook with XlsxWriter

XlsxWriter is a powerful library dedicated solely to writing .xlsx files. Its feature set includes extensive formatting options, multiple chart types, conditional formatting, and the ability to insert images. Because it is write‐only, it cannot modify existing workbooks; instead, you generate a new workbook from scratch.

import pandas as pd
import xlsxwriter

# Create a workbook and add a worksheet
workbook = xlsxwriter.Workbook("financial_report.xlsx")
worksheet = workbook.add_worksheet("Summary")

# Define some cell formats
header_format = workbook.add_format({
    "bold": True,
    "text_wrap": True,
    "valign": "top",
    "fg_color": "#D7E4BC",
    "border": 1
})
currency_format = workbook.add_format({"num_format": "$#,##0"})

# Write headers
headers = ["Account", "Q1", "Q2", "Q3", "Q4", "Total"]
for col_num, header in enumerate(headers):
    worksheet.write(0, col_num, header, header_format)

# Sample financial data
financial_data = [
    ["Revenue", 50000, 60000, 55000, 62000],
    ["Cost of Goods Sold", 20000, 24000, 22000, 26000],
    ["Gross Profit", 30000, 36000, 33000, 36000],
]

# Write data rows and compute totals
for row_num, row_data in enumerate(financial_data, start=1):
    worksheet.write(row_num, 0, row_data[0])  # Account name
    for col_offset, val in enumerate(row_data[1:], start=1):
        worksheet.write(row_num, col_offset, val, currency_format)
    # Write a formula for Total column
    worksheet.write(
        row_num,
        5,
        f"=SUM(B{row_num + 1}:E{row_num + 1})",
        currency_format
    )

# Create a chart
chart = workbook.add_chart({"type": "column"})
chart.add_series({
    "name": "Revenue",
    "categories": ["Summary", 1, 0, 1, 0],
    "values": ["Summary", 1, 1, 1, 1],
})
worksheet.insert_chart("H2", chart)

# Close the workbook (saves the file)
workbook.close()

Highlights of XlsxWriter

  • Formats: Create named formats (add_format) for fonts, borders, colors, number formats, alignment, etc.
  • Formulas: Write Excel formulas directly as strings.
  • Charts: Support for dozens of chart types (line, column, bar, pie, scatter, radar, etc.) with fine‐grained customization.
  • Conditional Formatting: Define rules (e.g., color‐scale, data bars) to highlight cells based on criteria.
  • Images: Insert PNG, JPEG, BMP, and other image types.
  • Data Validation: Create drop‐down lists, restrict cell input by type/range.
  • Protection: Set worksheet/protection, allow specific cells to be edited.

Because XlsxWriter is write‐only, if you need to start from an existing template (e.g., with corporate branding, macros), you need either to load that template as a starting point in openpyxl (which can read the file), copy over necessary content, and then write out. Alternatively, with newer versions, you can use XlsxWriter’s “ext_endpoint” to modify a template, but that is more advanced.

3. Using openpyxl to Create or Modify Workbooks

Openpyxl serves two main roles:

  1. Modify Existing Workbooks
    If you have a template file (template.xlsx) that already contains corporate headers, a logo, or some pre‐formatted cells, you can load it and make incremental changes:
  2. from openpyxl import load_workbook
    
    # Load existing workbook
    wb = load_workbook("template.xlsx")
    
    # Select a sheet
    ws = wb.active  # or wb["DataSheet"]
    
    # Write data to specific cells
    ws["B2"] = "Quarterly Report"
    ws["B3"] = 2025
    
    # Suppose we have a list of data rows
    data = [
        ["Account", "Value"],
        ["Revenue", 150000],
        ["Expenses", 80000],
        ["Profit", 70000],
    ]
    for row_idx, row_data in enumerate(data, start=5):
        for col_idx, val in enumerate(row_data, start=1):
            ws.cell(row=row_idx, column=col_idx, value=val)
    
    # Modify an existing cell’s style
    bold_font = openpyxl.styles.Font(bold=True)
    ws["A5"].font = bold_font
    
    # Save changes
    wb.save("report_with_data.xlsx")
  3. Create a Workbook from Scratch
    Although XlsxWriter often outperforms openpyxl in raw writing speed, openpyxl can also create brand‐new workbooks:
  4. from openpyxl import Workbook
    from openpyxl.styles import PatternFill, Border, Side, Alignment, Font
    
    # Create a new workbook and select active worksheet
    wb = Workbook()
    ws = wb.active
    ws.title = "SalesData"
    
    # Write headers with style
    headers = ["Region", "Month", "Sales"]
    header_font = Font(bold=True, color="FFFFFF")
    header_fill = PatternFill("solid", fgColor="4F81BD")
    for col_num, header in enumerate(headers, start=1):
        cell = ws.cell(row=1, column=col_num, value=header)
        cell.font = header_font
        cell.fill = header_fill
        cell.alignment = Alignment(horizontal="center")
    
    # Write sample data
    rows = [
        ["East", "Jan", 50000],
        ["West", "Jan", 42000],
        ["North", "Jan", 38000],
    ]
    for row_idx, row_data in enumerate(rows, start=2):
        for col_idx, value in enumerate(row_data, start=1):
            ws.cell(row=row_idx, column=col_idx, value=value)
    
    # Auto‐adjust column widths
    for column_cells in ws.columns:
        length = max(len(str(cell.value)) for cell in column_cells) + 2
        column_letter = column_cells[0].column_letter
        ws.column_dimensions[column_letter].width = length
    
    # Save workbook
    wb.save("new_sales_report.xlsx")

Highlights of openpyxl

  • Read & Write: Can modify existing .xlsx files and create new ones.
  • Styles: Offers rich styling support (fonts, fills, borders, alignment, number formats).
  • Charts: Support for embedding charts (bar, line, pie, scatter, bubble, etc.) but with slightly different API than XlsxWriter.
  • Formulas: Read formulas (when data_only=False) and write new formulas.
  • Data Validation: Add/drop‐down lists, date restrictions, numeric constraints.
  • Comments and Hyperlinks: Add cell comments and hyperlinks.
  • Merge/Split Cells: Merge cell ranges, unmerge as needed.
  • Named Ranges, Filters, and Tables: Create and manipulate Excel tables with column headers.
  • Limitations: Slower than XlsxWriter for writing large datasets; memory usage can be high for huge workbooks.

Formatting Cells, Styles, and Conditional Formatting

Creating a polished, professional‐looking Excel report often requires cell formatting: setting fonts, background colors, borders, number formats, alignment, and conditional highlighting. Although pandas’ to_excel can accept a formatters argument for simple formatting, for advanced styling you rely on openpyxl or XlsxWriter directly.

1. Basic Cell Formatting with XlsxWriter

import xlsxwriter

workbook = xlsxwriter.Workbook("styled_data.xlsx")
worksheet = workbook.add_worksheet("Data")

# Define formats
header_format = workbook.add_format({
    "bold": True,
    "font_color": "#FFFFFF",
    "bg_color": "#4F81BD",
    "align": "center",
    "valign": "vcenter",
    "border": 1
})
money_format = workbook.add_format({
    "num_format": "$#,##0.00",
    "align": "right"
})
date_format = workbook.add_format({
    "num_format": "yyyy-mm-dd",
    "align": "center"
})

# Write headers
headers = ["Date", "Item", "Category", "Amount"]
for col, header in enumerate(headers):
    worksheet.write(0, col, header, header_format)

# Sample data
data = [
    ["2025-01-05", "Widget A", "Widgets", 150.75],
    ["2025-01-06", "Gadget B", "Gadgets", 230.00],
    ["2025-01-07", "Widget C", "Widgets", 125.50],
]

# Write data with formatting
for row, (date_str, item, category, amount) in enumerate(data, start=1):
    worksheet.write_datetime(row, 0, pd.to_datetime(date_str), date_format)
    worksheet.write(row, 1, item)
    worksheet.write(row, 2, category)
    worksheet.write_number(row, 3, amount, money_format)

# Conditional formatting: highlight amounts > 200
worksheet.conditional_format("D2:D4", {
    "type": "cell",
    "criteria": ">",
    "value": 200,
    "format": workbook.add_format({"bg_color": "#FFC7CE", "font_color": "#9C0006"})
})

workbook.close()

Key points:

  • Use add_format() to define named formats with options such as num_format, border, align, valign, font_color, bg_color, bold, italic, text_wrap, etc.
  • Use conditional_format() to define cell‐level rules, specifying a cell range (e.g., "D2:D4") and rule properties (type, criteria, value, and format).
  • Writing dates using write_datetime() ensures correct Excel serialization as a numeric date value; otherwise, Excel might treat your date as a string.

2. Advanced Styling with openpyxl

from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Border, Side, Alignment

wb = Workbook()
ws = wb.active
ws.title = "Inventory"

# Define styles
header_font = Font(name="Calibri", size=12, bold=True, color="FFFFFF")
header_fill = PatternFill("solid", fgColor="368BC1")
thin_border = Border(
    left=Side(style="thin"), 
    right=Side(style="thin"), 
    top=Side(style="thin"), 
    bottom=Side(style="thin")
)
center_align = Alignment(horizontal="center", vertical="center")

# Headers
headers = ["Product ID", "Name", "Qty", "Reorder Level"]
for col_idx, header in enumerate(headers, start=1):
    cell = ws.cell(row=1, column=col_idx, value=header)
    cell.font = header_font
    cell.fill = header_fill
    cell.border = thin_border
    cell.alignment = center_align

# Sample inventory data
inventory = [
    [101, "Widget A", 120, 50],
    [102, "Widget B", 45, 50],
    [103, "Gadget C", 200, 30],
    [104, "Gadget D", 25, 30],
]

for row_idx, row_data in enumerate(inventory, start=2):
    for col_idx, val in enumerate(row_data, start=1):
        cell = ws.cell(row=row_idx, column=col_idx, value=val)
        cell.border = thin_border
        if col_idx == 3 and val < row_data[3]:
            # If quantity < reorder level, highlight in red
            cell.fill = PatternFill("solid", fgColor="FF0000")
            cell.font = Font(color="FFFFFF")

# Adjust column widths
for col in ws.columns:
    max_length = 0
    column = col[0].column_letter
    for cell in col:
        try:
            if len(str(cell.value)) > max_length:
                max_length = len(str(cell.value))
        except:
            pass
    ws.column_dimensions[column].width = max_length + 2

wb.save("inventory_report.xlsx")

Key points:

  • Font, PatternFill, Border, Side, and Alignment are building blocks for openpyxl styles.
  • You can apply style attributes to individual Cell objects. For batch styling, consider creating a named style via NamedStyle.
  • Conditional formatting in openpyxl is achieved via worksheet.conditional_formatting, though the API is more verbose than XlsxWriter’s.

3. Conditional Formatting Rules

Beyond basic highlight‐if‐above/below conditions, Excel supports a variety of conditional formatting types:

  • Color scales: A gradient from low to high values.
  • Data bars: Bars within cells that represent magnitude.
  • Icon sets: Display icons (arrows, traffic lights, stars) based on value.
  • Top/Bottom rules: Highlight top 10%, bottom 10 items, etc.
  • Custom formulas: Use an Excel formula to determine whether to apply formatting.

XlsxWriter Example: Data Bars

import xlsxwriter

workbook = xlsxwriter.Workbook("data_bars.xlsx")
worksheet = workbook.add_worksheet("Scores")

# Sample data
scores = [88, 75, 92, 58, 68, 80, 95, 62, 73, 85]
worksheet.write_column("A1", scores)

# Apply data bar conditional formatting
worksheet.conditional_format("A1:A10", {
    "type": "data_bar",
    "bar_color": "#63C384"
})

workbook.close()

openpyxl Example: Color Scale

from openpyxl import Workbook
from openpyxl.formatting.rule import ColorScaleRule

wb = Workbook()
ws = wb.active
ws.title = "TestScores"

# Sample test scores
test_scores = [65, 78, 82, 90, 55, 70, 88, 95, 60, 72]
for idx, score in enumerate(test_scores, start=1):
    ws.cell(row=idx, column=1, value=score)

# Define a 2‐color scale (from red at low values to green at high values)
color_scale_rule = ColorScaleRule(
    start_type="min",
    start_color="FF0000",
    end_type="max",
    end_color="00FF00"
)
ws.conditional_formatting.add("A1:A10", color_scale_rule)

wb.save("color_scale_scores.xlsx")

Creating Charts in Excel via Python

Visualizing data is often crucial for effective reporting. Both XlsxWriter and openpyxl support chart creation, though XlsxWriter tends to have a more user‐friendly and flexible API.

1. Charts with XlsxWriter

import pandas as pd
import xlsxwriter

# Sample DataFrame
df = pd.DataFrame({
    "Month": ["Jan", "Feb", "Mar", "Apr", "May"],
    "Sales": [12000, 15000, 17000, 16000, 19000],
    "Expenses": [8000, 9000, 9500, 10000, 11000],
})

# Create a workbook
workbook = xlsxwriter.Workbook("sales_chart.xlsx")
worksheet = workbook.add_worksheet("Data")
chart_sheet = workbook.add_worksheet("Chart")

# Write data into the Data worksheet
worksheet.write_row("A1", df.columns)
for idx, row in df.iterrows():
    worksheet.write(idx + 1, 0, row["Month"])
    worksheet.write(idx + 1, 1, row["Sales"])
    worksheet.write(idx + 1, 2, row["Expenses"])

# Create a column chart
chart = workbook.add_chart({"type": "column"})

# Configure the "Sales" series
chart.add_series({
    "name": "Sales",
    "categories": ["Data", 1, 0, 5, 0],   # (sheet, first_row, first_col, last_row, last_col)
    "values": ["Data", 1, 1, 5, 1],
    "fill": {"color": "#5B9BD5"},
})

# Configure the "Expenses" series
chart.add_series({
    "name": "Expenses",
    "categories": ["Data", 1, 0, 5, 0],
    "values": ["Data", 1, 2, 5, 2],
    "fill": {"color": "#ED7D31"},
})

# Add chart title and axis labels
chart.set_title({"name": "Monthly Sales vs. Expenses"})
chart.set_x_axis({"name": "Month"})
chart.set_y_axis({"name": "Amount (USD)"})

# Move chart to designated sheet
chart_sheet.insert_chart("B2", chart, {"x_scale": 1.5, "y_scale": 1.5})

workbook.close()

Key points:

  • Use add_chart({"type": "<chart_type>"}) to create a chart object.
  • Each series requires name, categories, and values, specified as [sheetname, first_row, first_col, last_row, last_col] tuples.
  • You can customize colors, line styles, markers, data labels, trendlines, and more via series.set_* methods or additional add_series parameters.
  • Insert the chart into a sheet via insert_chart(cell_location, chart_object, options).

2. Charts with openpyxl

from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference, LineChart

wb = Workbook()
ws = wb.active
ws.title = "Performance"

# Write sample data
data = [
    ["Year", "Sales", "Profit"],
    [2021, 50000, 10000],
    [2022, 60000, 15000],
    [2023, 75000, 20000],
    [2024, 82000, 24000],
]
for row in data:
    ws.append(row)

# Create a bar chart
bar_chart = BarChart()
bar_chart.title = "Sales by Year"
bar_chart.x_axis.title = "Year"
bar_chart.y_axis.title = "Sales (USD)"

# Data for the chart (exclude header row)
data_ref = Reference(ws, min_col=2, min_row=1, max_row=5, max_col=2)
cats_ref = Reference(ws, min_col=1, min_row=2, max_row=5)
bar_chart.add_data(data_ref, titles_from_data=True)
bar_chart.set_categories(cats_ref)

# Place the chart on the sheet
ws.add_chart(bar_chart, "E2")

# Create a line chart for Profit
line_chart = LineChart()
line_chart.title = "Profit by Year"
line_chart.x_axis.title = "Year"
line_chart.y_axis.title = "Profit (USD)"

data_ref2 = Reference(ws, min_col=3, min_row=1, max_row=5, max_col=3)
line_chart.add_data(data_ref2, titles_from_data=True)
line_chart.set_categories(cats_ref)
ws.add_chart(line_chart, "E20")

wb.save("yearly_performance.xlsx")

Key points:

  • Instantiate a chart object (e.g., BarChart(), LineChart(), PieChart()).
  • Use Reference to specify the data range and categories.
  • Call chart.add_data(data_ref, titles_from_data=True) to include column headers as series names.
  • Position the chart via worksheet.add_chart(chart_object, position_string).

While openpyxl’s charting API is comprehensive, it can feel more verbose compared to XlsxWriter. Additionally, XlsxWriter tends to produce smaller file sizes and faster writes for chart‐heavy workbooks.

Read More ->

Subscribe to my Newsletters

Stay updated with the latest programming tips, tricks, and IT insights! Join my community to receive exclusive content on coding best practices.

© Copyright 2025 by Hassan Agmir . Built with ❤ by Me