Pandas Read CSV

A CSV (Comma-Separated Values) file is one of the most common formats for storing tabular data. Pandas provides an easy-to-use function, read_csv(), to read CSV files into a DataFrame. This allows for efficient data manipulation and analysis. Let’s explore how to read and handle CSV files using Pandas.

Reading a CSV File

To read a CSV file, you can use the pd.read_csv() function by providing the file path as an argument. The following example demonstrates reading a CSV file containing information about Indian rivers:

import pandas as pd

# Read a CSV file into a DataFrame
df = pd.read_csv("indian_rivers.csv")

# Display the first 5 rows
print(df.head())

Output

         River  Length (km)            Origin  States Covered
0        Ganga         2525  Gangotri Glacier              11
1     Godavari         1465     Trimbakeshwar               6
2      Krishna         1400     Mahabaleshwar               5
3       Kaveri          805        Talakaveri               4
4  Brahmaputra         2900     Angsi Glacier               5

Explanation: The pd.read_csv() function reads the CSV file indian_rivers.csv into a DataFrame named df. The .head() method returns the first 5 rows by default (pass a number, such as df.head(10), to change this), making it easy to preview the dataset. The columns River, Length (km), Origin, and States Covered are taken from the header row of the CSV file.
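If you do not have indian_rivers.csv on disk, the same steps can be tried against an in-memory string. The following is a minimal sketch, assuming the file holds the five rows shown in the output above; the csv_text variable is a hypothetical stand-in for the real file, and io.StringIO is used only so the example runs without any files.

```python
import io

import pandas as pd

# Hypothetical stand-in for indian_rivers.csv; values copied from the output shown above
csv_text = """River,Length (km),Origin,States Covered
Ganga,2525,Gangotri Glacier,11
Godavari,1465,Trimbakeshwar,6
Krishna,1400,Mahabaleshwar,5
Kaveri,805,Talakaveri,4
Brahmaputra,2900,Angsi Glacier,5
"""

# read_csv() accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # column types inferred while parsing
print(df.head(3))  # head() accepts the number of rows to return
```

Checking df.shape and df.dtypes right after loading is a quick way to confirm that the header row and the numeric columns were parsed as expected.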

Specifying Parameters

The read_csv() function provides several parameters to customize the data import process. For example, you can specify a delimiter if the file uses a separator other than commas, skip rows, or select specific columns. Here’s an example:

import pandas as pd

# Read only two columns; delimiter="," is already the default and is shown here for illustration
df = pd.read_csv("indian_rivers.csv", delimiter=",", usecols=["River", "Length (km)"])

# Display the DataFrame
print(df)

Output

         River  Length (km)
0        Ganga         2525
1     Godavari         1465
2      Krishna         1400
3       Kaveri          805
4  Brahmaputra         2900

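Explanation: In this example, the usecols parameter selects only the River and Length (km) columns from the CSV file. The delimiter parameter (an alias for sep) tells the parser which character separates the fields. A comma is already the default, so delimiter only needs to be set when a file uses a different separator, such as a semicolon or a tab.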
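The other options mentioned above, a non-comma separator and skipped rows, can be sketched in the same way. This is an illustrative example only: the semicolon-separated text and its two leading note lines are hypothetical, invented to show how sep, skiprows, usecols, and nrows combine.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated export with two note lines before the header
raw = """exported by a hypothetical tool
units: kilometres
River;Length (km);Origin;States Covered
Ganga;2525;Gangotri Glacier;11
Godavari;1465;Trimbakeshwar;6
Krishna;1400;Mahabaleshwar;5
"""

subset = pd.read_csv(
    io.StringIO(raw),
    sep=";",                           # fields are separated by semicolons
    skiprows=2,                        # skip the two note lines above the header
    usecols=["River", "Length (km)"],  # keep only these columns
    nrows=2,                           # read only the first two data rows
)

print(subset)
```

Here skiprows is applied before the header is read, so the third line of the text becomes the header row, and nrows then limits how many data rows are loaded.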

Key Takeaways

  • Simple Import: The pd.read_csv() function is used to read CSV files into DataFrames.
  • Preview Data: Use .head() to display the first few rows of the DataFrame.
  • Customizable Parameters: usecols limits the import to specific columns, and delimiter (or sep) sets the field separator for files that do not use commas.
  • Common Format: CSV is a widely used format for tabular data, making it essential for real-world data analysis tasks.