Filtering Data
Filtering data is a crucial part of data analysis: it lets you extract the rows that meet specific conditions. Pandas provides several ways to filter rows using conditions and expressions. This tutorial shows how to filter data effectively with boolean indexing and the `query()` method.
Basic Filtering
You can filter rows by writing a condition on a column and using the result to index the DataFrame. Here's an example:
```python
import pandas as pd

# Create a sample DataFrame
data = {
    "Name": ["Karthick", "Durai", "Praveen", "Naveen"],
    "Age": [25, 30, 22, 28],
    "City": ["Chennai", "Coimbatore", "Madurai", "Trichy"]
}
df = pd.DataFrame(data)

# Filter rows where Age > 25
filtered_df = df[df["Age"] > 25]
print(filtered_df)
```
Output
Name | Age | City |
---|---|---|
Durai | 30 | Coimbatore |
Naveen | 28 | Trichy |
Explanation: The condition `df["Age"] > 25` produces a boolean Series that is `True` for each row whose `Age` value is greater than 25. Indexing `df` with that Series keeps only those rows, so Karthick (whose age is exactly 25) is excluded.
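To see what happens under the hood, you can inspect the condition on its own. The sketch below reuses the same sample data and shows that the comparison yields a boolean Series (often called a "mask") aligned with the DataFrame's index, and that passing it to `df.loc[...]` selects the same rows as `df[...]`. The variable names `mask`, `via_brackets`, and `via_loc` are illustrative choices only.

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame({
    "Name": ["Karthick", "Durai", "Praveen", "Naveen"],
    "Age": [25, 30, 22, 28],
    "City": ["Chennai", "Coimbatore", "Madurai", "Trichy"],
})

# The comparison returns one True/False value per row
mask = df["Age"] > 25
print(mask.tolist())  # [False, True, False, True]

# Indexing with the mask keeps only the rows where it is True
via_brackets = df[mask]
via_loc = df.loc[mask]

# Both forms select the same rows and keep the original index labels
print(via_brackets.equals(via_loc))  # True
print(via_brackets.index.tolist())   # [1, 3]
```

Because the filtered result keeps the original index labels, you may want to call `reset_index(drop=True)` on it if you need a fresh 0-based index.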
Filtering with Multiple Conditions
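You can combine multiple conditions using the element-wise logical operators `&` (and), `|` (or), and `~` (not). Wrap each condition in parentheses, because these operators bind more tightly than comparisons such as `>` and `==`. Here's an example: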
```python
# Filter rows where Age > 25 and City is "Trichy"
filtered_df = df[(df["Age"] > 25) & (df["City"] == "Trichy")]
print(filtered_df)
```
Output
Name | Age | City |
---|---|---|
Naveen | 28 | Trichy |
Explanation: The condition `(df["Age"] > 25) & (df["City"] == "Trichy")` keeps rows where the age is greater than 25 and the city is "Trichy". Both conditions must be true for a row to be included.
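The example above uses only `&`. The sketch below, on the same sample data, also shows `|` and `~`, and demonstrates that Python's `and` keyword cannot stand in for `&`: it tries to reduce each Series to a single True/False value and raises a `ValueError`. The names `either`, `not_trichy`, and `and_failed` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Karthick", "Durai", "Praveen", "Naveen"],
    "Age": [25, 30, 22, 28],
    "City": ["Chennai", "Coimbatore", "Madurai", "Trichy"],
})

# OR: rows where Age < 25 or City is "Chennai"
either = df[(df["Age"] < 25) | (df["City"] == "Chennai")]
print(either["Name"].tolist())  # ['Karthick', 'Praveen']

# NOT: every row except those where City is "Trichy"
not_trichy = df[~(df["City"] == "Trichy")]
print(not_trichy["Name"].tolist())  # ['Karthick', 'Durai', 'Praveen']

# The `and` keyword asks each Series for a single truth value, which pandas refuses
and_failed = False
try:
    df[(df["Age"] > 25) and (df["City"] == "Trichy")]
except ValueError:
    # pandas reports that the truth value of a Series is ambiguous
    and_failed = True
print(and_failed)  # True
```

For a single negated equality like this one, `df[df["City"] != "Trichy"]` gives the same rows; `~` is most useful for negating a compound condition.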
Filtering Using Query
The `query()` method lets you filter rows using a string expression. This approach is often more readable for complex conditions. Here's an example:
```python
# Filter rows where City is "Madurai" or Age < 25
filtered_df = df.query("City == 'Madurai' or Age < 25")
print(filtered_df)
```
Output
Name | Age | City |
---|---|---|
Praveen | 22 | Madurai |
Explanation: The `query()` method evaluates the string expression against the DataFrame's columns. In this example, it keeps rows where the city is "Madurai" or the age is less than 25. Only Praveen matches; Karthick is excluded because his age is exactly 25 and `<` is a strict comparison.
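`query()` expressions can also reference local Python variables with an `@` prefix and test membership with `in`. The sketch below assumes the same sample data; `min_age`, `cities`, `older`, and `in_cities` are illustrative names. It also checks that the query from the example above returns the same rows as the equivalent boolean-indexing expression.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Karthick", "Durai", "Praveen", "Naveen"],
    "Age": [25, 30, 22, 28],
    "City": ["Chennai", "Coimbatore", "Madurai", "Trichy"],
})

# Reference a local variable inside the expression with the @ prefix
min_age = 25
older = df.query("Age > @min_age and City != 'Trichy'")
print(older["Name"].tolist())  # ['Durai']

# Test membership against a list with `in`
cities = ["Chennai", "Madurai"]
in_cities = df.query("City in @cities")
print(in_cities["Name"].tolist())  # ['Karthick', 'Praveen']

# query() selects the same rows as the equivalent boolean mask
same = df.query("City == 'Madurai' or Age < 25").equals(
    df[(df["City"] == "Madurai") | (df["Age"] < 25)]
)
print(same)  # True
```

Inside a `query()` string you write `and`, `or`, and `not` instead of `&`, `|`, and `~`, which is a large part of why it reads more naturally for longer conditions.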
Key Takeaways
- Basic Filtering: Use boolean conditions to filter rows based on column values.
- Multiple Conditions: Combine conditions with the logical operators `&`, `|`, and `~`, wrapping each condition in parentheses.
- Query Method: Use the `query()` method for readable and concise filtering expressions.
- Focus: Filtering narrows a dataset to the subset that is most relevant to your analysis.