Grouping Data
Grouping data is a powerful way to analyze subsets of data by splitting it into groups based on one or more columns. Pandas provides the groupby()
method to perform grouping operations, such as aggregations and transformations. This tutorial explores how to use groupby()
effectively.
Basic Grouping
To group data by a specific column, use the groupby()
method. Once grouped, you can apply aggregation functions like sum()
, mean()
, or count()
. Here’s an example:
import pandas as pd
# Create a sample DataFrame
data = {
"City": ["Chennai", "Madurai", "Chennai", "Madurai", "Trichy"],
"Sales": [100, 200, 150, 180, 90],
"Profit": [20, 40, 30, 35, 15]
}
df = pd.DataFrame(data)
# Group by City and calculate total Sales
grouped = df.groupby("City")["Sales"].sum()
print(grouped)
Output
City | Sales |
---|---|
Chennai | 250 |
Madurai | 380 |
Trichy | 90 |
Explanation: The groupby("City")
method groups the data by the City
column. The sum()
function is applied to calculate the total sales for each city.
Grouping with Multiple Aggregations
You can perform multiple aggregations on grouped data using the agg()
method. Here’s an example:
# Group by City and calculate total Sales and average Profit
grouped = df.groupby("City").agg({
"Sales": "sum",
"Profit": "mean"
})
print(grouped)
Output
City | Sales | Profit |
---|---|---|
Chennai | 250 | 25.0 |
Madurai | 380 | 37.5 |
Trichy | 90 | 15.0 |
Explanation: The agg()
method allows you to perform multiple aggregation operations on grouped data. In this example, total sales and average profit are calculated for each city.
Grouping with Multiple Columns
You can group data by multiple columns by passing a list to the groupby()
method. Here’s an example:
# Group by City and Sales, and calculate total Profit
grouped = df.groupby(["City", "Sales"])["Profit"].sum()
print(grouped)
Output
City | Sales | Profit |
---|---|---|
Chennai | 100 | 20 |
Chennai | 150 | 30 |
Madurai | 180 | 35 |
Madurai | 200 | 40 |
Trichy | 90 | 15 |
Explanation: Grouping by multiple columns creates a hierarchical index, allowing you to analyze the data at multiple levels. In this example, profits are grouped by both city and sales values.
Key Takeaways
- Basic Grouping: Use
groupby()
to group data by a single column and apply aggregation functions. - Multiple Aggregations: Use
agg()
for performing multiple operations on grouped data. - Grouping by Multiple Columns: Analyze data at multiple levels by grouping with more than one column.
- Insights: Grouping helps summarize and analyze data effectively based on categorical variables.