Pivot Tables
Pivot tables are a powerful tool for summarizing and analyzing data. In Pandas, the pivot_table()
method allows you to group data and calculate aggregated values in a tabular format. This tutorial demonstrates how to create and customize pivot tables for various use cases.
Creating a Pivot Table
To create a pivot table, specify the index
, columns
, and values
parameters. The default aggregation is the mean, but you can specify other functions using the aggfunc
parameter. Here’s an example:
import pandas as pd
# Create a sample DataFrame
data = {
"City": ["Chennai", "Chennai", "Madurai", "Madurai", "Trichy"],
"Month": ["Jan", "Feb", "Jan", "Feb", "Jan"],
"Sales": [200, 250, 150, 180, 90]
}
df = pd.DataFrame(data)
# Create a pivot table
pivot = pd.pivot_table(df, index="City", columns="Month", values="Sales", aggfunc="sum")
print(pivot)
Output
Month | Jan | Feb |
---|---|---|
Chennai | 200 | 250 |
Madurai | 150 | 180 |
Trichy | 90 | NaN |
Explanation: The pivot_table()
method groups data by the City
column and aggregates sales for each Month
. Missing values are displayed as NaN
.
Customizing Aggregation
You can use different aggregation functions like sum
, mean
, or count
to customize the pivot table. Here’s an example:
# Use multiple aggregation functions
pivot = pd.pivot_table(df, index="City", columns="Month", values="Sales", aggfunc=["sum", "mean"])
print(pivot)
Output
aggfunc | sum | mean | ||
---|---|---|---|---|
Month | Jan | Feb | Jan | Feb |
Chennai | 200 | 250 | 200.0 | 250.0 |
Madurai | 150 | 180 | 150.0 | 180.0 |
Trichy | 90 | NaN | 90.0 | NaN |
Explanation: By using the aggfunc
parameter, you can apply multiple aggregation functions to the pivot table. In this example, both sum
and mean
are calculated for sales grouped by city and month.
Filling Missing Values
Missing values in pivot tables can be replaced using the fill_value
parameter. Here’s an example:
# Replace missing values with 0
pivot = pd.pivot_table(df, index="City", columns="Month", values="Sales", aggfunc="sum", fill_value=0)
print(pivot)
Output
Month | Jan | Feb |
---|---|---|
Chennai | 200 | 250 |
Madurai | 150 | 180 |
Trichy | 90 | 0 |
Explanation: The fill_value
parameter replaces missing values in the pivot table with the specified value, such as 0 in this example.
Key Takeaways
- Pivot Tables: Use
pivot_table()
to group and summarize data. - Custom Aggregations: Apply custom or multiple aggregation functions using
aggfunc
. - Handle Missing Data: Replace missing values with
fill_value
. - Insights: Pivot tables simplify data analysis by organizing data into a readable format.