Grouping Data

Grouping data is a powerful way to analyze subsets of data by splitting it into groups based on one or more columns. Pandas provides the groupby() method to perform grouping operations, such as aggregations and transformations. This tutorial explores how to use groupby() effectively.

Basic Grouping

To group data by a specific column, use the groupby() method. Once grouped, you can apply aggregation functions like sum(), mean(), or count(). Here’s an example:

import pandas as pd

# Create a sample DataFrame
data = {
    "City": ["Chennai", "Madurai", "Chennai", "Madurai", "Trichy"],
    "Sales": [100, 200, 150, 180, 90],
    "Profit": [20, 40, 30, 35, 15]
}

df = pd.DataFrame(data)

# Group by City and calculate total Sales
grouped = df.groupby("City")["Sales"].sum()
print(grouped)

Output

City Sales
Chennai 250
Madurai 380
Trichy 90

Explanation: The groupby("City") method groups the data by the City column. The sum() function is applied to calculate the total sales for each city.

Grouping with Multiple Aggregations

You can perform multiple aggregations on grouped data using the agg() method. Here’s an example:

# Group by City and calculate total Sales and average Profit
grouped = df.groupby("City").agg({
    "Sales": "sum",
    "Profit": "mean"
})
print(grouped)

Output

City Sales Profit
Chennai 250 25.0
Madurai 380 37.5
Trichy 90 15.0

Explanation: The agg() method allows you to perform multiple aggregation operations on grouped data. In this example, total sales and average profit are calculated for each city.

Grouping with Multiple Columns

You can group data by multiple columns by passing a list to the groupby() method. Here’s an example:

# Group by City and Sales, and calculate total Profit
grouped = df.groupby(["City", "Sales"])["Profit"].sum()
print(grouped)

Output

City Sales Profit
Chennai 100 20
Chennai 150 30
Madurai 180 35
Madurai 200 40
Trichy 90 15

Explanation: Grouping by multiple columns creates a hierarchical index, allowing you to analyze the data at multiple levels. In this example, profits are grouped by both city and sales values.

Key Takeaways

  • Basic Grouping: Use groupby() to group data by a single column and apply aggregation functions.
  • Multiple Aggregations: Use agg() for performing multiple operations on grouped data.
  • Grouping by Multiple Columns: Analyze data at multiple levels by grouping with more than one column.
  • Insights: Grouping helps summarize and analyze data effectively based on categorical variables.