Excel with Python: Mastering Automation (2)
Working with Formulas
Often, you want Python to insert formulas into the Excel so that users can adjust inputs later and let Excel recalculate dynamically. Both XlsxWriter and openpyxl enable writing formulas directly into cells.
1. Writing Formulas with XlsxWriter
import xlsxwriter
workbook = xlsxwriter.Workbook("formula_example.xlsx")
worksheet = workbook.add_worksheet("Calculations")
# Write sample numbers
worksheet.write("A1", 10)
worksheet.write("A2", 20)
worksheet.write("A3", 30)
# Write a formula that sums A1:A3
worksheet.write("A4", "=SUM(A1:A3)")
# Write a formula that calculates average
worksheet.write("A5", "=AVERAGE(A1:A3)")
# Write a formula with a reference to a different sheet
sheet2 = workbook.add_worksheet("MoreData")
sheet2.write("B1", 100)
worksheet.write("A6", "=MoreData!B1 * 2")
workbook.close()Key points:
- Always prefix the string with = in write(cell, formula_string).
- You cannot evaluate formulas in Python; Excel will evaluate upon opening.
2. Writing and Reading Formulas with openpyxl
from openpyxl import Workbook, load_workbook
# Create a new workbook and write formulas
wb = Workbook()
ws = wb.active
ws["A1"] = 5
ws["A2"] = 15
ws["A3"] = 25
ws["A4"] = "=SUM(A1:A3)"
ws["A5"] = "=A4 / 3"
wb.save("openpyxl_formulas.xlsx")
# Load workbook in data_only mode to read evaluated values
wb2 = load_workbook("openpyxl_formulas.xlsx", data_only=True)
ws2 = wb2.active
print(ws2["A4"].value) # This might be None if Excel hasn’t calculated yetNote: By default, openpyxl does not calculate formulas; it displays the last stored value in the file (if present). If you require up‐to‐date formula results, you must open the file in Excel so that it recalculates the formulas and then save, or rely on Excel’s auto‐calculation. There are external libraries (e.g., formulas, xlcalculator) that attempt to evaluate formulas within Python, but they do not cover all Excel functions perfectly.
Advanced Features: Merged Cells, Filters, Data Validation, and Tables
Python libraries give you access to various Excel features that go beyond simply writing data.
1. Merging and Unmerging Cells
XlsxWriter
import xlsxwriter
wb = xlsxwriter.Workbook("merge_cells.xlsx")
ws = wb.add_worksheet("Summary")
# Merge cells A1 to C1
ws.merge_range("A1:C1", "Quarterly Summary", wb.add_format({
"bold": True,
"align": "center",
"valign": "vcenter",
"font_size": 14
}))
wb.close()openpyxl
from openpyxl import Workbook
wb = Workbook()
ws = wb.active
ws.title = "Header"
# Merge cells A1:C1
ws.merge_cells("A1:C1")
ws["A1"] = "Merged Header"
# Unmerge cells
ws.unmerge_cells("A1:C1")
wb.save("openpyxl_merge.xlsx")2. Filters and Freezing Panes
XlsxWriter
import xlsxwriter
wb = xlsxwriter.Workbook("filters_freeze.xlsx")
ws = wb.add_worksheet("Data")
# Write sample data with headers
headers = ["Product", "Category", "Price"]
data = [
["Widget A", "Widgets", 25.50],
["Gadget B", "Gadgets", 40.00],
["Widget C", "Widgets", 30.75],
["Gizmo D", "Gizmos", 15.20],
]
# Write headers
ws.write_row("A1", headers)
# Write data
for i, row in enumerate(data, start=1):
ws.write_row(i, 0, row)
# Add filter to header row (row=0, columns 0-2)
ws.autofilter("A1:C5")
# Freeze header row (row 1)
ws.freeze_panes(1, 0)
wb.close()openpyxl
from openpyxl import Workbook
wb = Workbook()
ws = wb.active
ws.title = "Sales"
# Write sample data with headers
headers = ["Date", "Region", "Sales"]
rows = [
["2025-01-01", "East", 50000],
["2025-01-02", "West", 42000],
["2025-01-03", "North", 38000],
]
ws.append(headers)
for row in rows:
ws.append(row)
# Add auto filter
ws.auto_filter.ref = "A1:C4"
# Freeze top row
ws.freeze_panes = "A2"
wb.save("openpyxl_filters.xlsx")
3. Data Validation (Drop‐down Lists, Numeric Constraints)
XlsxWriter
import xlsxwriter
wb = xlsxwriter.Workbook("data_validation.xlsx")
ws = wb.add_worksheet("Form")
# Create a drop‐down list in cell A1
ws.data_validation("A1", {
"validate": "list",
"source": ["Option 1", "Option 2", "Option 3"]
})
# Numeric validation: value between 1 and 100 in B1
ws.data_validation("B1", {
"validate": "integer",
"criteria": "between",
"minimum": 1,
"maximum": 100,
"input_title": "Enter number",
"input_message": "Please enter a number between 1 and 100."
})
wb.close()openpyxl
from openpyxl import Workbook
from openpyxl.worksheet.datavalidation import DataValidation
wb = Workbook()
ws = wb.active
ws.title = "Sheet1"
# Drop‐down validation
dv_list = DataValidation(type="list", formula1='"Red,Green,Blue"', allow_blank=True)
ws.add_data_validation(dv_list)
dv_list.add("A1")
# Numeric validation between 10 and 20
dv_int = DataValidation(type="whole", operator="between", formula1=10, formula2=20)
dv_int.error = "Your entry must be between 10 and 20"
ws.add_data_validation(dv_int)
dv_int.add("B1")
wb.save("openpyxl_validation.xlsx")4. Creating Excel Tables
Excel tables (sometimes called ListObjects) provide automatic filtering, banded rows, and structured references.
openpyxl
from openpyxl import Workbook
from openpyxl.worksheet.table import Table, TableStyleInfo
wb = Workbook()
ws = wb.active
ws.title = "SalesData"
# Write headers and data
headers = ["Date", "Region", "Sales"]
ws.append(headers)
rows = [
["2025-01-01", "East", 50000],
["2025-01-02", "West", 42000],
["2025-01-03", "North", 38000],
["2025-01-04", "South", 45000],
]
for row in rows:
ws.append(row)
# Determine table dimensions
min_row = 1
max_row = ws.max_row
min_col = 1
max_col = ws.max_column
# Create table
table = Table(displayName="SalesTable", ref=f"A1:{chr(64+max_col)}{max_row}")
# Add a default style
style = TableStyleInfo(
name="TableStyleMedium9",
showFirstColumn=False,
showLastColumn=False,
showRowStripes=True,
showColumnStripes=False
)
table.tableStyleInfo = style
ws.add_table(table)
wb.save("sales_table.xlsx")XlsxWriter also supports creating tables via add_table():
import xlsxwriter
wb = xlsxwriter.Workbook("xlsxwriter_table.xlsx")
ws = wb.add_worksheet("Data")
# Sample data
data = [
["Date", "Region", "Sales"],
["2025-01-01", "East", 50000],
["2025-01-02", "West", 42000],
["2025-01-03", "North", 38000],
]
for row_idx, row in enumerate(data):
ws.write_row(row_idx, 0, row)
# Define table properties
ws.add_table(
"A1:C4",
{ "name": "SalesTable",
"header_row": True,
"columns": [
{"header": "Date"},
{"header": "Region"},
{"header": "Sales", "format": wb.add_format({"num_format": "$#,##0"})},
],
"style": "Table Style Medium 9"
}
)
wb.close()Automating the Live Excel Application (xlwings & pywin32)
Up until now, we’ve focused on file‐based workflows. But sometimes you need to drive a live Excel session—perhaps to refresh pivot tables, call VBA macros, or let non‐technical users input parameters directly in an Excel UX and then trigger Python analysis under the hood. This is where COM automation libraries (xlwings, pywin32) shine.
1. xlwings: Excel as a Front‐End for Python
Installation
pip install xlwings
Simple Automation Example
import xlwings as xw
# Launch Excel (or connect to an existing instance)
app = xw.App(visible=True)
wb = app.books.open("template_analysis.xlsx")
# Reference a sheet
sheet = wb.sheets["Input"]
# Read values from named range or cell
param_value = sheet.range("A1").value
print("Parameter from Excel:", param_value)
# Perform Python calculation
result = param_value * 2
# Write result back to Excel
sheet.range("B1").value = result
# Refresh a chart or pivot table if needed
# sheet.api.PivotTables("PivotTable1").PivotCache().Refresh()
# Save and close
wb.save("updated_analysis.xlsx")
wb.close()
app.quit()Pros of xlwings
- User‐Defined Functions (UDFs): You can write Python functions, register them as UDFs, and call them directly from Excel cells as if they were native Excel functions.
- Macros Replacement: Instead of VBA macros, you can write Python macros.
- Interactive GUI: Excel remains visible (“visible=True”), making it intuitive for users who prefer the traditional spreadsheet interface.
- Bidirectional Data Transfer: Use sheet.range("A1").options(transpose=True).value = [1, 2, 3] to write lists/arrays; similarly, read blocks of data into Python lists or pandas DataFrames.
- Platform Support: Works on both Windows and macOS (on macOS, uses AppleScript under the hood).
Cons of xlwings
- Requires Excel to be installed.
- Performance can be slower compared to bulk file read/write, since COM calls for each cell or range can incur overhead.
- Not suitable for headless or server environments without GUI/Excel.
2. pywin32 (win32com.client): Low‐Level COM Automation (Windows Only)
While xlwings abstracts a lot of complexity, sometimes you want direct access to the Excel COM object model. This requires installing the pywin32 package:
pip install pywin32
Example: Generating a Report via pywin32
import win32com.client as win32
# Start Excel
excel = win32.gencache.EnsureDispatch("Excel.Application")
excel.Visible = True # If you want to see Excel
# Open workbook (read/write)
wb = excel.Workbooks.Open(r"C:\path\to\template.xlsm")
ws = wb.Sheets("Data")
# Read a cell
param = ws.Range("A1").Value
print("Param in Excel:", param)
# Write to a cell
ws.Range("B1").Value = param * 3
# Refresh all pivot tables
for pt in ws.PivotTables():
pt.RefreshTable()
# Run an existing macro within the workbook
excel.Application.Run(r"template.xlsm!Module1.MyMacro")
# Save and close
wb.SaveAs(r"C:\path\to\report_output.xlsx")
wb.Close()
excel.Application.Quit()Pros of pywin32
- Full access to every aspect of the Excel Object Model, equivalent to VBA.
- Suitable for advanced automation tasks (e.g., interacting with COM add‐ins, controlling Ribbon, copying charts as pictures).
Cons of pywin32
- Only works on Windows with Excel installed.
- Boilerplate COM initialization and error handling can become cumbersome.
- Not as “Pythonic” as xlwings.
Performance Considerations
When working with Excel files in Python, especially large ones, performance can become a bottleneck. Below are some guidelines to ensure your scripts execute efficiently:
- Choose the Right Library for the Task
- For bulk data read/write: Use pandas with engine="pyxlsb" or engine="openpyxl" for moderate file sizes.
- For very large .csv or .txt: Read directly via pd.read_csv or chunk via chunksize. Then write smaller summaries to Excel.
- For writing complex formatted workbooks: Use XlsxWriter (fastest for .xlsx). If you need to preserve macros or modify a template, use openpyxl but limit operations to only changed cells.
- Minimize Cell‐by‐Cell Operations
- Excel engines like openpyxl and XlsxWriter are significantly slower if you call Cell() or worksheet.write() for each individual cell. Instead, where possible:
- Use pandas.DataFrame.to_excel() to dump entire tables at once.
- In XlsxWriter, use worksheet.write_row() or worksheet.write_column() to write contiguous ranges.
- In openpyxl, use ws.append() to append an entire row at once.
- Excel engines like openpyxl and XlsxWriter are significantly slower if you call Cell() or worksheet.write() for each individual cell. Instead, where possible:
- Avoid Excessive Style Creation Inside Loops
- Creating a new format/style object for every cell is expensive. Instead, define a format once (workbook.add_format()) and reuse it across multiple cells. In openpyxl, create a NamedStyle or a single Font/Border etc., and assign it repeatedly.
- Read Only Required Sheets and Columns
- With pandas: set sheet_name and usecols to limit the amount of data being parsed.
- If your Excel file is huge (e.g., hundreds of MB), consider splitting it into smaller chunks or converting to CSV/Parquet for intermediate processing.
- Use Chunking for Massive Files
- If you must process an extremely large file (e.g., >1 million rows), consider reading it in chunks:
for chunk in pd.read_excel("huge_file.xlsx", sheet_name="Data", chunksize=100000): # Process each chunk (e.g., filter, aggregate)- Note: read_excel(..., chunksize=...) isn’t supported directly for Excel. Instead, convert to CSV first or use libraries like pyxlsb which can be iterated.
- Cache Frequent Computations in Memory
- If you generate the same lookup tables repeatedly (e.g., mapping product IDs to names), load them once and reuse across the session rather than reading them from Excel every time.
- Limit Chart/Formatting for Very Large Ranges
- Charting thousands of points can bloat file size. Consider summarizing data (e.g., group by categories) for charts.
- Excessive conditional formatting rules (hundreds of individual rules) can drastically increase file size and slow down opening times in Excel. Use Excel tables or range‐level rules where possible.
Common Use Cases and Examples
To illustrate how Python–Excel integration becomes powerful in real‐world scenarios, let us examine a few common use cases.
1. Automated Sales Reporting
Scenario: Every week, a company’s ERP system exports a raw sales file (weekly_sales.xlsx) containing thousands of transactions. A business analyst needs to produce a report showing total sales by region and product category, highlight any returns, and email stakeholders an Excel file with a summary sheet (pivot‐style) and a chart.
Solution Steps:
- Use Python to read the raw data.
- Clean data (e.g., parse dates, remove canceled orders, convert currency).
- Aggregate data to compute total sales by region/category.
- Create a new workbook with:
- A “Summary” sheet showing a table of aggregated results.
- A bar chart comparing regions.
- Email the Excel report to stakeholders via SMTP or an Outlook COM interface.
import pandas as pd
import xlsxwriter
import smtplib
from email.message import EmailMessage
# Step 1: Read raw data
df = pd.read_excel("weekly_sales.xlsx", sheet_name="Transactions", parse_dates=["OrderDate"])
# Step 2: Clean data
df = df[df["Status"] != "Canceled"]
df["NetSales"] = df["Quantity"] * df["UnitPrice"]
# Step 3: Aggregate
agg = df.groupby(["Region", "Category"], as_index=False)["NetSales"].sum()
agg_pivot = agg.pivot(index="Region", columns="Category", values="NetSales").fillna(0)
# Step 4: Write summary to Excel
report_path = "weekly_sales_report.xlsx"
with pd.ExcelWriter(report_path, engine="xlsxwriter") as writer:
agg_pivot.to_excel(writer, sheet_name="Summary")
workbook = writer.book
worksheet = writer.sheets["Summary"]
# Add a chart
chart = workbook.add_chart({"type": "column"})
num_regions = len(agg_pivot)
num_categories = len(agg_pivot.columns)
# Configure each series
for i, category in enumerate(agg_pivot.columns, start=1):
chart.add_series({
"name": [ "Summary", 0, i ],
"categories": [ "Summary", 1, 0, num_regions, 0 ],
"values": [ "Summary", 1, i, num_regions, i ],
})
chart.set_title({"name": "Sales by Region & Category"})
chart.set_x_axis({"name": "Region"})
chart.set_y_axis({"name": "Net Sales (USD)"})
chart.set_style(11)
# Position chart
worksheet.insert_chart("H2", chart, {"x_scale": 1.3, "y_scale": 1.5})
# Step 5: Email the report
msg = EmailMessage()
msg["Subject"] = "Weekly Sales Report"
msg["From"] = "analyst@example.com"
msg["To"] = "team@example.com"
msg.set_content("Please find attached the weekly sales report.")
# Read the file content
with open(report_path, "rb") as f:
file_data = f.read()
file_name = f.name
msg.add_attachment(file_data, maintype="application", subtype="vnd.openxmlformats-officedocument.spreadsheetml.sheet", filename=file_name)
# Send via SMTP (example using Gmail SMTP server)
smtp_server = "smtp.gmail.com"
smtp_port = 587
smtp_user = "analyst@example.com"
smtp_password = "your_app_password"
with smtplib.SMTP(smtp_server, smtp_port) as server:
server.starttls()
server.login(smtp_user, smtp_password)
server.send_message(msg)This script automates the entire pipeline—from reading raw data to emailing a neatly formatted Excel report—reducing the manual steps to a single scheduled task or button click.
2. Creating Quarterly Finance Templates for Business Users
Scenario: Accounting teams need a standardized template each quarter to input budget projections. The template must include input cells with data validation (e.g., selecting department from a drop‐down), pre‐formatted tables, and an embedded dashboard sheet showing charts based on those inputs. Instead of manually sending an updated template each quarter, you want to generate it programmatically, updating the year in headers and adding placeholder formulas.
Solution Steps:
- Build a Python script that loads a base template (finance_template_base.xlsx).
- Update the “Year” cell and “Quarter” drop‐down list (e.g., “Q1”, “Q2”, …).
- Clear any leftover sample data from prior quarters.
- Create placeholder formulas on the “Dashboard” sheet referencing input cells.
- Save as finance_template_2025_Q2.xlsx.
from openpyxl import load_workbook
from openpyxl.worksheet.datavalidation import DataValidation
def create_finance_template(year: int, quarter: str):
# Load the base template
wb = load_workbook("finance_template_base.xlsx")
ws_input = wb["Inputs"]
ws_dashboard = wb["Dashboard"]
# Update year cell
ws_input["B1"] = year
# Update quarter drop-down: assume cell C1
valid_quarters = ["Q1", "Q2", "Q3", "Q4"]
dv = DataValidation(type="list", formula1=f'"{",".join(valid_quarters)}"', allow_blank=False)
ws_input.add_data_validation(dv)
dv.add("C1")
ws_input["C1"] = quarter
# Clear old input data range (e.g., A5:D20)
for row in ws_input["A5":"D20"]:
for cell in row:
cell.value = None
# Insert placeholder formulas on Dashboard
# E.g., cell B2: =SUM(Inputs!D5:D20)
ws_dashboard["B2"] = f"=SUM(Inputs!D5:D20)"
ws_dashboard["B3"] = f"=AVERAGE(Inputs!D5:D20)"
# Save new template
output_filename = f"finance_template_{year}_{quarter}.xlsx"
wb.save(output_filename)
return output_filename
if __name__ == "__main__":
new_file = create_finance_template(2025, "Q2")
print(f"Created template: {new_file}")By encapsulating template‐generation logic in a function, you guarantee that every time a new quarter arrives, the script can run unattended, generating a fresh file with all the required validations and formulas in place.
3. Merging Data from Multiple Sheets and Pivoting
Scenario: A logistics company maintains daily shipment logs in separate sheets within an Excel file (shipments_monthly.xlsx). Each sheet (named “Day01”, “Day02”, …) contains columns: Date, Origin, Destination, Weight, Cost. The manager wants a consolidated overview by origin–destination pairs, showing total shipments and average cost.
Solution Steps:
- Read all sheets into individual DataFrames.
- Concatenate them into a single DataFrame.
- Perform groupby aggregation.
- Write the pivoted result to a new “Dashboard” sheet.
import pandas as pd
# Step 1: Read all sheets
file_path = "shipments_monthly.xlsx"
all_sheets = pd.read_excel(file_path, sheet_name=None) # dict of DataFrames
# Step 2: Concatenate
df_list = []
for sheet_name, df in all_sheets.items():
df_list.append(df)
full_df = pd.concat(df_list, ignore_index=True)
# Step 3: Aggregate
agg = full_df.groupby(["Origin", "Destination"]).agg(
TotalWeight=pd.NamedAgg(column="Weight", aggfunc="sum"),
TotalCost=pd.NamedAgg(column="Cost", aggfunc="sum"),
ShipmentCount=pd.NamedAgg(column="Weight", aggfunc="count"),
)
agg["AverageCostPerShipment"] = agg["TotalCost"] / agg["ShipmentCount"]
agg = agg.reset_index()
# Step 4: Write to new Excel file with Dashboard sheet
with pd.ExcelWriter("shipments_dashboard.xlsx", engine="openpyxl") as writer:
# Save raw consolidated data (optional)
full_df.to_excel(writer, sheet_name="AllShipments", index=False)
# Save aggregated view
agg.to_excel(writer, sheet_name="Dashboard", index=False)
print("Dashboard generated: shipments_dashboard.xlsx")If you need to add a chart on the “Dashboard” sheet to visualize, for example, the top 10 origin‐destination pairs by TotalWeight:
import openpyxl
from openpyxl.chart import BarChart, Reference
wb = openpyxl.load_workbook("shipments_dashboard.xlsx")
ws = wb["Dashboard"]
# Sort the data by TotalWeight descending (in Excel, or prepare in pandas before writing)
# Assuming data starts at row 2, columns: A=Origin, B=Destination, C=TotalWeight, D=TotalCost, E=ShipmentCount, F=AverageCostPerShipment
# Let's assume we already sorted in pandas, so top 10 are rows 2-11
bar_chart = BarChart()
bar_chart.title = "Top 10 Routes by Total Weight"
bar_chart.x_axis.title = "Route"
bar_chart.y_axis.title = "Total Weight"
# Use a helper column to join Origin and Destination into a single category, say in column G
for idx, row in enumerate(agg.itertuples(index=False), start=2):
route = f"{row.Origin}-{row.Destination}"
ws[f"G{idx}"] = route
# Reference data for chart: categories in G2:G11 and values in C2:C11
cats = Reference(ws, min_col=7, min_row=2, max_row=11)
values = Reference(ws, min_col=3, min_row=2, max_row=11)
bar_chart.add_data(values, titles_from_data=False)
bar_chart.set_categories(cats)
bar_chart.height = 10 # optional: chart height
bar_chart.width = 20 # optional: chart width
ws.add_chart(bar_chart, "H2")
wb.save("shipments_dashboard_with_chart.xlsx")By combining pandas for heavy aggregation and openpyxl for final chart embedding, you achieve a robust pipeline that can be triggered monthly or even daily.
Integrating Excel with Databases and Other Data Sources
Often, you need to pull data from a SQL database (e.g., PostgreSQL, SQL Server, MySQL), perform some transformations, and then export to Excel for stakeholders. Python makes this easy with libraries such as SQLAlchemy, pyodbc, psycopg2, and pandas’ read_sql.
1. Reading from a SQL Database into pandas and Exporting to Excel
import pandas as pd
from sqlalchemy import create_engine
# Example: PostgreSQL connection
db_user = "db_user"
db_password = "password"
db_host = "localhost"
db_port = 5432
db_name = "sales_db"
engine = create_engine(
f"postgresql+psycopg2://{db_user}:{db_password}@{db_host}:{db_port}/{db_name}"
)
# Run a SQL query
query = """
SELECT
order_id,
customer_name,
order_date,
region,
total_amount
FROM orders
WHERE order_date BETWEEN '2025-01-01' AND '2025-01-31'
"""
df_orders = pd.read_sql(query, engine)
# Perform any transformations
df_orders["order_date"] = pd.to_datetime(df_orders["order_date"])
df_orders["month"] = df_orders["order_date"].dt.month
# Export to Excel with simple formatting
with pd.ExcelWriter("january_orders.xlsx", engine="openpyxl") as writer:
df_orders.to_excel(writer, sheet_name="Orders", index=False)
workbook = writer.book
worksheet = writer.sheets["Orders"]
# Apply column widths
for idx, column in enumerate(df_orders.columns, start=1):
max_length = max(
df_orders[column].astype(str).map(len).max(),
len(column)
) + 2
worksheet.column_dimensions[chr(64 + idx)].width = max_length2. Updating an Existing Excel File with New Database Data
Suppose you have a master Excel file (inventory_master.xlsx) that your team uses to track monthly inventory levels. Each month, new data arrives in a database and you need to update the file’s “CurrentStock” sheet without overwriting other sheets (e.g., the “History” sheet).
import pandas as pd
from openpyxl import load_workbook
# Load new data from database
# For demonstration, let's assume a DataFrame is already available
new_stock_df = pd.DataFrame({
"ItemID": [101, 102, 103],
"CurrentStock": [50, 75, 30],
"LastUpdated": pd.to_datetime(["2025-06-01", "2025-06-01", "2025-06-01"]),
})
# Load existing workbook
wb = load_workbook("inventory_master.xlsx")
writer = pd.ExcelWriter("inventory_master.xlsx", engine="openpyxl")
writer.book = wb
# Replace only the "CurrentStock" sheet
if "CurrentStock" in wb.sheetnames:
# Remove old sheet
std = wb["CurrentStock"]
wb.remove(std)
wb.save("inventory_master.xlsx") # Necessary step before writing with pandas
# Write updated data
with pd.ExcelWriter("inventory_master.xlsx", engine="openpyxl", mode="a") as writer:
new_stock_df.to_excel(writer, sheet_name="CurrentStock", index=False)
# Optionally, update formulas or pivot tables that depend on this sheet manuallyBy operating in “append” mode and removing the old sheet first, you ensure that existing sheets remain intact.
Troubleshooting Common Issues
When working with Python and Excel, certain pitfalls and edge cases frequently arise. Below are some common issues and how to resolve them:
- ValueError: No engine for filetype: '.xlsb'
- Cause: pandas cannot read .xlsb without the pyxlsb package.
- Solution:
pip install pyxlsb
- Then:
df = pd.read_excel("file.xlsb", engine="pyxlsb")
- UserWarning: Workbook contains no default style or “Normal was not found”
- Cause: You’re using openpyxl to modify a file created by a different library (e.g., XlsxWriter) that didn’t embed the default “Normal” style.
- Solution: Create a default style in the workbook before modifying:
from openpyxl.styles import NamedStyle normal_style = NamedStyle(name="Normal") wb.add_named_style(normal_style)
- Date Parsing Issues
- Symptoms: Dates read as floats (Excel’s internal serial date number) or strings like "20250101".
- Solution:
- With pandas: specify parse_dates=["DateColumn"] in read_excel.
- If using openpyxl: read cell values, then convert:
excel_serial = ws["A2"].value # If excel_serial is a float, convert with: from openpyxl.utils.datetime import from_excel py_date = from_excel(excel_serial)
- “Cell is already being used” or Permission Errors When Saving
- Cause: Another process (e.g., open Excel manually) has the file open.
- Solution: Close the file in Excel or ensure no process is locking it. On Windows, COM automation can keep a handle open; ensure you call app.quit() or wb.Close(). Also, run Python script with appropriate permissions (e.g., if writing to a protected directory).
- Missing or Outdated Library Versions
- Symptoms: Errors such as ImportError: No module named 'openpyxl' or TypeError: to_excel() got an unexpected keyword argument 'engine'.
- Solution: Update your environment:
pip install --upgrade pandas openpyxl xlsxwriter xlrd xlwt xlwings pyxlsb
- VBA or Macro Loss When Saving
- Cause: openpyxl does not preserve VBA macros by default. If you open and save a .xlsm file, macros may be stripped.
- Solution:
- Use keep_vba=True when loading and saving with openpyxl:
wb = load_workbook("file.xlsm", keep_vba=True) # Make changes... wb.save("file_with_macros.xlsm")- Alternatively, if you only need to add content and maintain macros, modify in place using keep_vba=True.
- Row or Column Limitations
- Issue: The .xlsx format supports up to 1,048,576 rows and 16,384 columns (column XFD). If your data exceeds these limits, you cannot write it into a single sheet.
- Solution: Split data across multiple sheets or export to a more suitable format (e.g., CSV for non‐Excel processing, Parquet for analytics).
- Slow Performance with COM Automation (xlwings/pywin32)
- Symptoms: Iterating cell by cell in a loop is painfully slow.
- Solution:
- Use range‐to‐Python transfer: read large blocks via sheet.range("A1:D1000").options(np.array).value (xlwings).
- Perform bulk operations in Python, then write back the entire block at once.
- Minimize calls that cross the COM boundary.
Best Practices for Python–Excel Workflows
To ensure maintainability, readability, and robustness in your Python–Excel projects, consider adopting these best practices:
- Modularize Your Code
- Divide your script into functions or classes: e.g., load_data(), transform_data(), write_excel_report().
- If the project grows, consider packaging it as a Python module with a setup.py or using poetry for dependency management.
- Parameterize File Paths and Sheet Names
- Avoid hardcoding: use configuration files (.yaml, .json, .ini) or command‐line arguments (argparse) so that you can change paths or sheet names without editing the script.
- Use Virtual Environments
- Create a venv or use conda environments to manage dependencies; this helps avoid conflicts between library versions (e.g., pandas vs. openpyxl).
- Error Handling and Logging
- Wrap critical I/O operations (file open, database access) in try/except blocks.
- Use Python’s logging module to record progress, errors, and warnings rather than relying on print statements.
- Version Control Everything
- Include your Python scripts, configuration files, and even sample Excel templates in a Git repository.
- If possible, avoid committing large binary Excel files; instead, include minimal templates or dummy data files.
- Document Your Workflow
- Write docstrings for functions explaining inputs, outputs, and side effects.
- Maintain a README that describes how to install dependencies and run the scripts.
- Test on Sample Data
- Before running on production‐scale files, test your code on smaller or representative datasets. This helps catch formatting or data‐type issues early.
- Back Up Original Files
- If your script modifies an existing file (e.g., template), create a timestamped backup or write to a new output file to prevent accidental data loss.
- Avoid Over‐Formatting
- Remember that heavy styling (dozens of formats, conditional rules, images) can bloat file size and slow down Excel.
- Where possible, use simpler formatting or rely on Excel’s built‐in cell styles.
- Use Pythonic Data Manipulation Libraries
- Instead of manually looping through rows to calculate totals, use pandas’ vectorized operations (groupby, agg, merge). This is not only more concise but also typically much faster.
- Keep an Eye on Compatibility
- Although .xlsx is the standard today, some legacy systems might require .xls or even .csv.
- To write .xls, you can use xlwt (limited to older Excel versions). For .csv, Python’s built‐in csv module or pandas’ to_csv are ideal.
Combining Python, Excel, and Web APIs
In modern data ecosystems, data often flows from online services into spreadsheets. Python makes it trivial to fetch data from web APIs (e.g., RESTful services, JSON endpoints) and then write it to Excel for distribution.
Example: Fetching Currency Exchange Rates and Generating an Excel Report
Suppose you want to generate an up‐to‐date Excel sheet showing exchange rates of USD against various currencies, using a free API such as exchangerate‐api.com.
import requests
import pandas as pd
# Step 1: Fetch data from API
api_key = "YOUR_API_KEY"
url = f"https://v6.exchangerate-api.com/v6/{api_key}/latest/USD"
response = requests.get(url)
data = response.json()
# Step 2: Parse JSON into DataFrame
rates = data.get("conversion_rates", {})
df = pd.DataFrame(list(rates.items()), columns=["Currency", "Rate"])
df["InverseRate"] = 1 / df["Rate"] # e.g., how many USD per 1 unit of other currency
# Step 3: Filter to currencies of interest
currencies_of_interest = ["EUR", "GBP", "JPY", "CAD", "AUD"]
df = df[df["Currency"].isin(currencies_of_interest)]
# Step 4: Sort by Rate descending
df = df.sort_values("Rate", ascending=False)
# Step 5: Write to Excel with formatting
output_file = "exchange_rates.xlsx"
with pd.ExcelWriter(output_file, engine="xlsxwriter") as writer:
df.to_excel(writer, sheet_name="Rates", index=False)
workbook = writer.book
worksheet = writer.sheets["Rates"]
# Format header
header_format = workbook.add_format({"bold": True, "bg_color": "#333333", "font_color": "#FFFFFF"})
for col_num, value in enumerate(df.columns.values):
worksheet.write(0, col_num, value, header_format)
# Format numeric columns
num_format = workbook.add_format({"num_format": "0.0000"})
worksheet.set_column("B:C", 12, num_format)
print(f"Exchange rates written to {output_file}")In just a handful of lines, you have consumed a live web API, converted JSON to a pandas DataFrame, filtered and sorted results, and written a self‐contained, styled Excel file.
Best Practices for Secure and Reproducible Scripts
When distributing Python scripts that generate or manipulate Excel files, consider these additional recommendations:
- Don’t Hardcode Credentials
- Use environment variables or a secure secrets manager (e.g., HashiCorp Vault, AWS Secrets Manager) to store API keys, database passwords, or email SMTP credentials.
- For example, in Python:
import os db_password = os.getenv("DB_PASSWORD") # Set via environment or .env file
- Lock Your Dependencies
- Use a requirements.txt with pinned versions or a Pipfile.lock/poetry.lock so that every environment uses the same library versions.
- For example:
pandas==2.1.0 openpyxl==3.1.2 xlsxwriter==3.0.6 requests==2.31.0
- Use Virtual Environments in Deployment
- If deploying to a server or sharing with colleagues, provide instructions for setting up a virtual environment (python -m venv venv → pip install -r requirements.txt).
- Validate Input Data
- Before writing to Excel, check that DataFrame columns exist and have expected types. Raise informative errors early.
- For example:
required_columns = {"OrderID", "Product", "Quantity", "Price"} if not required_columns.issubset(df.columns): missing = required_columns - set(df.columns) raise ValueError(f"Missing required columns: {missing}")
- Test on Multiple Platforms
- If your team uses both Windows and macOS, ensure that your script runs correctly on both, especially if using libraries that rely on system features (e.g., COM automation).
- Provide a Sample Output
- Include a small sample Excel file generated by the script in your repository (e.g., sample_output.xlsx) so that stakeholders and developers know what to expect.
- Schedule and Monitor Jobs
- If your script runs as a scheduled task (e.g., via cron, Windows Task Scheduler, or a workflow tool like Apache Airflow), add logging and alerting so that failures are quickly detected and addressed.
- Documentation and User Guides
- Create a one‐page guide or a Markdown document that explains how to run the script, what dependencies are required, what input files should look like, and where output files will be saved.
Conclusion
The synergy between Python and Excel empowers professionals to automate repetitive tasks, scale data‐processing workflows, and bridge spreadsheets with complex data‐science pipelines. By leveraging libraries such as pandas, openpyxl, XlsxWriter, and xlwings, you can seamlessly read, transform, and write Excel files—applying custom formatting, formulas, charts, and conditional rules with minimal manual effort. Moreover, Python’s vast ecosystem enables integration with databases, web APIs, and machine‐learning algorithms, transforming Excel from a standalone desktop tool into a dynamic component within a larger data ecosystem.
Whether your goal is to generate weekly sales reports automatically, create templated spreadsheets for business users, merge and analyze data from multiple sheets, or build an interactive Excel front end controlled by Python UDFs, the principles illustrated in this article provide a roadmap. Key takeaways include:
- Select the Appropriate Library: Use pandas for bulk tabular data, XlsxWriter for high‐performance writing with advanced formatting, openpyxl to modify existing workbooks and harness Excel features (tables, data validation, named styles), and xlwings or pywin32 when you need to control a live Excel application.
- Prioritize Performance: Avoid cell‐by‐cell loops; use bulk operations (e.g., to_excel, write_row, append). Cache reusable objects (formats, database connections), and limit unnecessarily large style or conditional formatting rules.
- Maintain Reproducibility: Keep your scripts organized, version‐controlled, and well‐documented. Parameterize inputs, externalize credentials, and lock dependencies.
- Handle Edge Cases Gracefully: Anticipate and address file locks, incorrect data types, date parsing quirks, missing libraries, and platform differences. Include error handling and logging so that when issues arise, they are obvious and traceable.
By adopting best practices—modular code, virtual environments, configuration files, and robust testing—you ensure that your Excel‐generating scripts remain maintainable, secure, and adaptable as needs evolve. Over time, you and your organization will save countless hours, reduce human errors, and unlock insights by efficiently combining Python’s computational power with Excel’s familiar interface.
Ultimately, mastering Python‐Excel integration is not merely a technical skill but a catalyst for transforming how data moves through your workflow: from raw collection to polished, distributable reports. Whether you are an analyst, accountant, scientist, or developer, the techniques covered in this guide will help you craft robust, scalable, and reproducible solutions that elevate your productivity and deliver greater value to stakeholders.
Happy coding—and may your spreadsheets always be error‐free, automated, and insightful!