Replacing Values in Pandas
The replace() method in Pandas provides a versatile way to substitute values in a DataFrame or Series. This method can handle single values, lists, and even dictionaries for more complex replacements. This tutorial demonstrates the key use cases for replace().
Replacing Single Values
Pass the value to find and its replacement to replace(), either on a single column or on the entire DataFrame. The example below targets one column:
import pandas as pd
# Create a sample DataFrame
data = {
    "Name": ["Karthick", "Durai", "Praveen"],
    "City": ["Chennai", "Coimbatore", "Chennai"],
}
df = pd.DataFrame(data)
# Replace 'Chennai' with 'Chennai Metro'
df["City"] = df["City"].replace("Chennai", "Chennai Metro")
print(df)
Output:
Name | City |
---|---|
Karthick | Chennai Metro |
Durai | Coimbatore |
Praveen | Chennai Metro |
Explanation: The replace() method substitutes all occurrences of 'Chennai' with 'Chennai Metro' in the City column.
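The example above works on one column. As a minimal sketch of the DataFrame-wide case, the snippet below calls replace() on the whole frame. The Office column and the df_all and replaced names are hypothetical, added only so that 'Chennai' appears in more than one column.

```python
import pandas as pd

# Hypothetical sample data: "Chennai" appears in two different columns.
df_all = pd.DataFrame({
    "Name": ["Karthick", "Durai", "Praveen"],
    "City": ["Chennai", "Coimbatore", "Chennai"],
    "Office": ["Coimbatore", "Chennai", "Chennai"],
})

# Calling replace() on the DataFrame checks every column, not just one.
# It returns a new DataFrame; df_all itself is left unchanged.
replaced = df_all.replace("Chennai", "Chennai Metro")
print(replaced)
```

By default replace() compares whole cell values rather than substrings, so a cell must equal 'Chennai' exactly to be changed.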
Replacing Multiple Values
Replace multiple values using a list or dictionary. Here’s an example:
# Replace multiple values using a dictionary
replacement_dict = {
    "Chennai Metro": "Chennai",
    "Coimbatore": "Kovai",
}
df["City"] = df["City"].replace(replacement_dict)
print(df)
Output:
Name | City |
---|---|
Karthick | Chennai |
Durai | Kovai |
Praveen | Chennai |
Explanation: The dictionary in replace() maps values to their replacements, converting 'Chennai Metro' to 'Chennai' and 'Coimbatore' to 'Kovai'.
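The dictionary form is shown above; the list form mentioned at the start of this section can be sketched as follows. The cities Series, the 'Madurai' entry, and the 'Tamil Nadu' label are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# Hypothetical standalone Series so the snippet runs on its own.
cities = pd.Series(["Chennai Metro", "Coimbatore", "Madurai"])

# Two lists of equal length: values are paired by position,
# so "Chennai Metro" -> "Chennai" and "Coimbatore" -> "Kovai".
paired = cities.replace(["Chennai Metro", "Coimbatore"], ["Chennai", "Kovai"])

# A list plus a single value: every listed value gets the same replacement.
collapsed = cities.replace(["Chennai Metro", "Coimbatore"], "Tamil Nadu")

print(paired)
print(collapsed)
```

A dictionary keeps each value next to its replacement, which tends to be easier to read; the list-plus-single-value form is convenient when several values should collapse into one.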
Replacing Missing Values
Use replace() to handle missing values (e.g., NaN) in a DataFrame. Here’s an example:
import numpy as np
# Add missing values
df.loc[1, "City"] = np.nan
# Replace NaN with 'Unknown'
df["City"] = df["City"].replace(np.nan, "Unknown")
print(df)
Output:
Name | City |
---|---|
Karthick | Chennai |
Durai | Unknown |
Praveen | Chennai |
Explanation: The replace() method substitutes NaN values with 'Unknown' in the City column, so that column no longer contains missing values.
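One way to confirm that no NaN remains is to count missing values before and after the replacement. The sketch below does that on a small standalone Series (the cities name is hypothetical). It also compares the result with fillna(), a separate Pandas method that is not covered in this tutorial but is commonly used for the same job.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing entry.
cities = pd.Series(["Chennai", np.nan, "Chennai"])

# Count missing values before the replacement.
missing_before = int(cities.isna().sum())

# Same replacement as in the tutorial, applied to the Series.
via_replace = cities.replace(np.nan, "Unknown")

# fillna() is an alternative method aimed specifically at missing values.
via_fillna = cities.fillna("Unknown")

# Count missing values after the replacement.
missing_after = int(via_replace.isna().sum())

print(missing_before, missing_after)
print(via_replace.tolist() == via_fillna.tolist())
```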
Key Takeaways
- Versatility: The replace() method handles single values, multiple values, and missing data through the same call.
- Customization: Use dictionaries to map each value to its own replacement and lists to replace several values at once.
- Missing Data: Replace NaN values with an explicit placeholder so the affected columns contain no missing entries.
- Scalability: replace() is applied to a whole Series or DataFrame in one call, so the same code works regardless of how many rows the data has.