Pandas Read JSON

JSON (JavaScript Object Notation) is a lightweight data format widely used for transmitting data over the web. Pandas provides the read_json() function to easily import JSON data into a DataFrame, enabling efficient analysis and manipulation. Let’s explore how to read and handle JSON files with Pandas.

Reading a JSON File

To read a JSON file, use the pd.read_json() function by providing the file path as an argument. The following example demonstrates reading a JSON file containing information about popular Indian cities:

import pandas as pd

# Read a JSON file into a DataFrame
df = pd.read_json("indian_cities.json")

# Display the first 5 rows
print(df.head())

Output

City       State        Population  Area (sq km)
Chennai    Tamil Nadu      7090000           426
Bengaluru  Karnataka       8443675           741
Hyderabad  Telangana       6809970           650
Mumbai     Maharashtra    12442373           603
Delhi      Delhi          16787941          1484

Explanation: The pd.read_json() function reads the JSON file indian_cities.json into a DataFrame named df. The .head() method returns the first 5 rows of the DataFrame (pass a number, such as .head(3), for a different count), and print() displays them with the columns City, State, Population, and Area (sq km). This gives a quick preview of the dataset.
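To try this without an existing file, the sketch below writes a small JSON file to a temporary directory and reads it back. The rows are copied from the output above; the layout of the file (a list of row objects) is an assumption, since the contents of indian_cities.json are not shown here.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Rows copied from the output table above. Storing them as a list of
# row objects is an assumption about how indian_cities.json is laid out.
cities = [
    {"City": "Chennai", "State": "Tamil Nadu", "Population": 7090000, "Area (sq km)": 426},
    {"City": "Bengaluru", "State": "Karnataka", "Population": 8443675, "Area (sq km)": 741},
    {"City": "Hyderabad", "State": "Telangana", "Population": 6809970, "Area (sq km)": 650},
    {"City": "Mumbai", "State": "Maharashtra", "Population": 12442373, "Area (sq km)": 603},
    {"City": "Delhi", "State": "Delhi", "Population": 16787941, "Area (sq km)": 1484},
]

with tempfile.TemporaryDirectory() as tmp:
    # Write the sample data to disk so read_json has a real file to open
    path = Path(tmp) / "indian_cities.json"
    path.write_text(json.dumps(cities), encoding="utf-8")

    # read_json accepts a path string or a path object
    df = pd.read_json(path)

# Preview the rows and check the inferred column types
print(df.head())
print(df.dtypes)
```

Printing df.dtypes is a quick way to confirm that numeric columns such as Population were read as numbers rather than text.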

Handling JSON Structures

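JSON files can be laid out in different ways: as a list of row objects, as an object keyed by column or by row label, or with objects nested inside one another. The orient parameter tells read_json() which top-level layout to expect. Note that orient does not flatten nested objects; for deeply nested data, pd.json_normalize() is the usual tool. Here’s an example that uses orient: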

# Read a JSON file stored as a list of row objects
df = pd.read_json("nested_cities.json", orient="records")

# Display the DataFrame
print(df)

Output

City       State        Population  Area (sq km)
Chennai    Tamil Nadu      7090000           426
Bengaluru  Karnataka       8443675           741
Hyderabad  Telangana       6809970           650
Mumbai     Maharashtra    12442373           603
Delhi      Delhi          16787941          1484

Explanation: The orient parameter specifies how the JSON data is structured. In this example, orient="records" indicates that the file holds a list of JSON objects, each of which becomes one row in the DataFrame. Other accepted values are "split", "index", "columns", "values", and "table", so the same function can read several common JSON layouts.
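The difference between layouts is easier to see side by side. The following is a minimal sketch, not a complete tour of orient: it reads the same two rows from a "records" layout and a "split" layout, then flattens a nested object with pd.json_normalize(). The JSON strings are made up for illustration, and the Location field is a hypothetical nested object that does not appear in the files above. The strings are wrapped in StringIO because recent pandas versions deprecate passing literal JSON text directly to read_json().

```python
import json
from io import StringIO

import pandas as pd

# orient="records": a list of row objects
records_json = (
    '[{"City": "Chennai", "Population": 7090000},'
    ' {"City": "Mumbai", "Population": 12442373}]'
)

# orient="split": column names, index labels, and row data stored separately
split_json = (
    '{"columns": ["City", "Population"],'
    ' "index": [0, 1],'
    ' "data": [["Chennai", 7090000], ["Mumbai", 12442373]]}'
)

# Both layouts describe the same two rows
df_records = pd.read_json(StringIO(records_json), orient="records")
df_split = pd.read_json(StringIO(split_json), orient="split")
print(df_records)
print(df_split)

# Nested objects are not flattened by orient. The "Location" field here
# is hypothetical, added only to show how json_normalize expands it.
nested_json = (
    '[{"City": "Chennai", "Location": {"State": "Tamil Nadu", "Country": "India"}},'
    ' {"City": "Mumbai", "Location": {"State": "Maharashtra", "Country": "India"}}]'
)

# json_normalize works on parsed Python objects, so parse the text first
df_flat = pd.json_normalize(json.loads(nested_json))
print(df_flat.columns.tolist())
```

json_normalize() joins nested keys with a dot by default, so the nested State value ends up in a column named Location.State.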

Key Takeaways

  • Simple Import: The pd.read_json() function reads a JSON file into a Pandas DataFrame in a single call.
  • Preview Data: Use .head() to display the first few rows of the DataFrame for a quick overview.
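  • Flexible Handling: The orient parameter tells read_json() how the JSON is laid out, such as a list of records or an object keyed by column. Deeply nested objects need flattening first, for example with pd.json_normalize().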
  • Real-World Application: JSON is widely used in APIs and web data, making this function essential for modern data analysis tasks.