Filtering Data

Filtering data is a crucial part of data analysis, enabling you to extract relevant rows that meet specific conditions. Pandas provides multiple ways to filter rows using conditions and expressions. This tutorial explores how to filter data effectively using boolean indexing and the query() method.

Basic Filtering

You can filter rows by applying a condition on a DataFrame or Series. Here’s an example:

import pandas as pd

# Create a sample DataFrame
data = {
    "Name": ["Karthick", "Durai", "Praveen", "Naveen"],
    "Age": [25, 30, 22, 28],
    "City": ["Chennai", "Coimbatore", "Madurai", "Trichy"]
}

df = pd.DataFrame(data)

# Filter rows where Age > 25
filtered_df = df[df["Age"] > 25]
print(filtered_df)

Output

Name Age City
Durai 30 Coimbatore
Naveen 28 Trichy

Explanation: The condition df["Age"] > 25 produces a boolean Series with one True or False value per row. Indexing the DataFrame with that Series keeps only the rows marked True, so the result contains only the rows where Age is greater than 25.
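To see what the condition evaluates to on its own, the following minimal sketch stores it in a variable before using it. The names mask and names_over_25 are illustrative and not part of the original example; .loc is used here only to show that the same mask can select a single column as well.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    "Name": ["Karthick", "Durai", "Praveen", "Naveen"],
    "Age": [25, 30, 22, 28],
    "City": ["Chennai", "Coimbatore", "Madurai", "Trichy"],
})

# The comparison returns a boolean Series aligned with df's index
mask = df["Age"] > 25
print(mask.tolist())  # [False, True, False, True]

# .loc accepts the mask for rows and a label for the column
names_over_25 = df.loc[mask, "Name"]
print(names_over_25.tolist())  # ['Durai', 'Naveen']
```

Keeping the mask in a variable can make longer filters easier to read and lets you reuse the same condition in several places.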

Filtering with Multiple Conditions

You can combine multiple conditions using the element-wise operators & (and), | (or), and ~ (not). Python's and, or, and not keywords do not work on a Series and raise a ValueError. Here’s an example:

# Filter rows where Age > 25 and City is "Trichy"
filtered_df = df[(df["Age"] > 25) & (df["City"] == "Trichy")]
print(filtered_df)

Output

Name Age City
Naveen 28 Trichy

Explanation: The condition (df["Age"] > 25) & (df["City"] == "Trichy") filters rows where the age is greater than 25 and the city is "Trichy". Both conditions must be true for a row to be included. The parentheses around each comparison are required because & binds more tightly than > and ==.
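The example above demonstrates only &. As a small sketch of the other two operators, assuming the same sample data, the snippet below uses | and ~. The variable names young_or_chennai and not_chennai are made up for illustration.

```python
import pandas as pd

# Same sample data as in the earlier examples
df = pd.DataFrame({
    "Name": ["Karthick", "Durai", "Praveen", "Naveen"],
    "Age": [25, 30, 22, 28],
    "City": ["Chennai", "Coimbatore", "Madurai", "Trichy"],
})

# | keeps rows where at least one condition is true
young_or_chennai = df[(df["Age"] < 25) | (df["City"] == "Chennai")]
print(young_or_chennai["Name"].tolist())  # ['Karthick', 'Praveen']

# ~ inverts a condition: every row whose City is not "Chennai"
not_chennai = df[~(df["City"] == "Chennai")]
print(not_chennai["Name"].tolist())  # ['Durai', 'Praveen', 'Naveen']
```

Rows come back in their original index order, which is why Karthick appears before Praveen in the first result.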

Filtering Using Query

The query() method allows you to filter rows using a string-based expression. This approach is often more readable for complex conditions. Here’s an example:

# Filter rows where City is "Madurai" or Age < 25
filtered_df = df.query("City == 'Madurai' or Age < 25")
print(filtered_df)

Output

Name Age City
Praveen 22 Madurai

Explanation: The query() method uses a string-based condition to filter rows. In this example, only Praveen matches: the city is "Madurai" and the age (22) is less than 25. Karthick is not included because an age of 25 is not less than 25.
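A query string can also refer to local Python variables by prefixing them with @. The sketch below assumes the same sample data; min_age and cities are hypothetical variables introduced for illustration. It also builds the equivalent boolean-indexing filter for comparison.

```python
import pandas as pd

# Same sample data as in the earlier examples
df = pd.DataFrame({
    "Name": ["Karthick", "Durai", "Praveen", "Naveen"],
    "Age": [25, 30, 22, 28],
    "City": ["Chennai", "Coimbatore", "Madurai", "Trichy"],
})

# Hypothetical thresholds kept outside the query string
min_age = 25
cities = ["Chennai", "Trichy"]

# @name substitutes the value of a local variable into the expression
result = df.query("Age >= @min_age and City in @cities")
print(result["Name"].tolist())  # ['Karthick', 'Naveen']

# The same filter written with boolean indexing
equivalent = df[(df["Age"] >= min_age) & df["City"].isin(cities)]
print(result.equals(equivalent))  # True
```

Inside a query string the plain words and, or, and not are accepted, which is part of why longer conditions can read more naturally there than with &, |, and ~.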

Key Takeaways

  • Basic Filtering: Use boolean conditions to filter rows based on column values.
  • Multiple Conditions: Combine conditions with logical operators like &, |, and ~.
  • Query Method: Use the query() method for readable and concise filtering expressions.
  • Efficiency: Filtering early lets later analysis steps work on only the relevant subset of the dataset.