Pivot Tables

Pivot tables are a powerful tool for summarizing and analyzing data. In Pandas, the pivot_table() method allows you to group data and calculate aggregated values in a tabular format. This tutorial demonstrates how to create and customize pivot tables for various use cases.

Creating a Pivot Table

To create a pivot table, specify the index, columns, and values parameters. The default aggregation is the mean, but you can specify other functions using the aggfunc parameter. Here’s an example:

import pandas as pd

# Create a sample DataFrame
data = {
    "City": ["Chennai", "Chennai", "Madurai", "Madurai", "Trichy"],
    "Month": ["Jan", "Feb", "Jan", "Feb", "Jan"],
    "Sales": [200, 250, 150, 180, 90]
}

df = pd.DataFrame(data)

# Create a pivot table
pivot = pd.pivot_table(df, index="City", columns="Month", values="Sales", aggfunc="sum")
print(pivot)

Output

Month Jan Feb
Chennai 200 250
Madurai 150 180
Trichy 90 NaN

Explanation: The pivot_table() method groups data by the City column and aggregates sales for each Month. Missing values are displayed as NaN.

Customizing Aggregation

You can use different aggregation functions like sum, mean, or count to customize the pivot table. Here’s an example:

# Use multiple aggregation functions
pivot = pd.pivot_table(df, index="City", columns="Month", values="Sales", aggfunc=["sum", "mean"])
print(pivot)

Output

aggfunc sum mean
Month Jan Feb Jan Feb
Chennai 200 250 200.0 250.0
Madurai 150 180 150.0 180.0
Trichy 90 NaN 90.0 NaN

Explanation: By using the aggfunc parameter, you can apply multiple aggregation functions to the pivot table. In this example, both sum and mean are calculated for sales grouped by city and month.

Filling Missing Values

Missing values in pivot tables can be replaced using the fill_value parameter. Here’s an example:

# Replace missing values with 0
pivot = pd.pivot_table(df, index="City", columns="Month", values="Sales", aggfunc="sum", fill_value=0)
print(pivot)

Output

Month Jan Feb
Chennai 200 250
Madurai 150 180
Trichy 90 0

Explanation: The fill_value parameter replaces missing values in the pivot table with the specified value, such as 0 in this example.

Key Takeaways

  • Pivot Tables: Use pivot_table() to group and summarize data.
  • Custom Aggregations: Apply custom or multiple aggregation functions using aggfunc.
  • Handle Missing Data: Replace missing values with fill_value.
  • Insights: Pivot tables simplify data analysis by organizing data into a readable format.