Pandas DataFrames
A Pandas DataFrame is a two-dimensional labeled data structure, similar to a table in a database or an Excel spreadsheet. It consists of rows and columns, and both axes carry labels: the row labels form the index, and the column labels are the column names. It is one of the most commonly used data structures in Pandas for handling structured data, and it makes it easy to manipulate, filter, and analyze that data.
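To make "two-dimensional" and "labeled" concrete, here is a minimal sketch that inspects both axes of a small DataFrame. The item/price data and the custom row labels "a" and "b" are hypothetical, chosen only for illustration; index, columns, shape, and dtypes are standard DataFrame attributes.

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate the structure
df = pd.DataFrame(
    {"Item": ["pen", "notebook"], "Price": [10.0, 45.0]},
    index=["a", "b"],  # custom row labels instead of the default 0, 1, ...
)

print(df.shape)          # (2, 2): (number of rows, number of columns)
print(list(df.index))    # ['a', 'b']: the row labels
print(list(df.columns))  # ['Item', 'Price']: the column labels
print(df.dtypes)         # one data type per column
```

If the index argument is omitted, Pandas assigns default integer row labels starting at 0, which is what happens in the example below.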
Creating DataFrames
A DataFrame can be created from various data sources such as dictionaries, lists, or external files like CSV and Excel files. Let’s create a DataFrame from a dictionary of Indian states, their capitals, and their populations:
import pandas as pd
# Create a dictionary of states and capitals
data = {
"State": ["Tamil Nadu", "Kerala", "Karnataka", "Andhra Pradesh"],
"Capital": ["Chennai", "Thiruvananthapuram", "Bengaluru", "Amaravati"],
"Population (Millions)": [72.14, 35.33, 68.42, 49.67]
}
# Create a DataFrame
df = pd.DataFrame(data)
# Display the DataFrame
print(df)
Output
Index | State | Capital | Population (Millions) |
---|---|---|---|
0 | Tamil Nadu | Chennai | 72.14 |
1 | Kerala | Thiruvananthapuram | 35.33 |
2 | Karnataka | Bengaluru | 68.42 |
3 | Andhra Pradesh | Amaravati | 49.67 |
Explanation: The dictionary data contains three keys, State, Capital, and Population (Millions), each representing a column in the dataset. The pd.DataFrame() function converts the dictionary into a DataFrame: each key becomes a column label, and the values in its list fill that column, so the i-th element of every list together forms row i. Because no index was supplied, Pandas assigns the default row labels 0 to 3. Finally, print() displays the DataFrame in a tabular format.
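The same kind of table can also be built from a list of lists or read from a CSV source. The sketch below is illustrative: io.StringIO stands in for a real file so the example runs on its own, and the file name states.csv mentioned in the comment is hypothetical.

```python
import io

import pandas as pd

# 1) From a list of lists: each inner list is one row,
#    so the column names are supplied separately.
rows = [
    ["Tamil Nadu", "Chennai"],
    ["Kerala", "Thiruvananthapuram"],
]
df_from_lists = pd.DataFrame(rows, columns=["State", "Capital"])

# 2) From CSV text. With a file on disk you would pass its path instead,
#    e.g. pd.read_csv("states.csv") (hypothetical file name).
csv_text = """State,Capital
Karnataka,Bengaluru
Andhra Pradesh,Amaravati
"""
df_from_csv = pd.read_csv(io.StringIO(csv_text))

print(df_from_lists)
print(df_from_csv)
```

Note the difference in orientation: a dictionary supplies data column by column, while a list of lists supplies it row by row. Excel files are read in a similar way with pd.read_excel(), which needs an Excel engine such as openpyxl installed for .xlsx files.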
Accessing Data in a DataFrame
You can access specific rows, columns, or individual elements in a DataFrame using labels (column names and index labels) or integer positions. Let us retrieve the "Capital" column from the above DataFrame:
# Access a single column
capitals = df["Capital"]
print(capitals)
Output
Index | Capital |
---|---|
0 | Chennai |
1 | Thiruvananthapuram |
2 | Bengaluru |
3 | Amaravati |
Explanation: The statement df["Capital"] retrieves the "Capital" column as a Pandas Series, a one-dimensional labeled array that keeps the DataFrame's index (0 to 3). Each entry in the Series corresponds to one row of the DataFrame. This is useful for quickly extracting or analyzing a specific column in the dataset.
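Rows and individual elements can be reached in a similar way. The following sketch recreates the states DataFrame so it runs on its own and uses the standard Pandas accessors .loc (by label), .iloc (by integer position), and .at (a single element by label); the variable names are illustrative.

```python
import pandas as pd

data = {
    "State": ["Tamil Nadu", "Kerala", "Karnataka", "Andhra Pradesh"],
    "Capital": ["Chennai", "Thiruvananthapuram", "Bengaluru", "Amaravati"],
    "Population (Millions)": [72.14, 35.33, 68.42, 49.67],
}
df = pd.DataFrame(data)

# A single column is returned as a Series
capitals = df["Capital"]
print(isinstance(capitals, pd.Series))  # True

# A whole row by index label (.loc) or by integer position (.iloc)
row_by_label = df.loc[1]      # the row whose index label is 1
row_by_position = df.iloc[1]  # the second row, counting from 0

# A single element: row label plus column label
kerala_capital = df.at[1, "Capital"]

# Several columns at once: a list of labels returns a DataFrame
subset = df[["State", "Capital"]]

print(row_by_label)
print(kerala_capital)  # Thiruvananthapuram
print(subset)
```

With the default index 0 to 3, df.loc[1] and df.iloc[1] return the same row; they differ once the index labels are no longer the positions 0, 1, 2, and so on (for example, after sorting or setting "State" as the index).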
Key Takeaways
- Flexible Data Structure: DataFrames are two-dimensional labeled data structures in Pandas, similar to Excel spreadsheets or SQL tables.
- Versatile Creation: They can be created from dictionaries, lists, or external files such as CSV and Excel files.
- Easy Access: Columns can be selected by their labels, and rows and individual elements by label or position, simplifying data handling.
- Tabular Format: DataFrames present data in a tabular format, making them ideal for data analysis tasks.