
Pandas Analyzing Data

Exploring a dataset is usually the first step in a data science or analytics workflow. Pandas provides methods to inspect, summarize, and understand data quickly, so you can spot problems early and prepare the data for further analysis. This section covers two essentials: generating summary statistics and inspecting the structure of a DataFrame.

Generating Summary Statistics

You can use the describe() method to generate summary statistics for numerical columns in your DataFrame. This method provides useful metrics such as count, mean, standard deviation, and percentiles. Here’s an example:

import pandas as pd

# Create a sample DataFrame
data = {
    "City": ["Chennai", "Bengaluru", "Hyderabad", "Mumbai", "Delhi"],
    "Population (Millions)": [7.09, 8.44, 6.81, 12.44, 16.79],
    "Area (sq km)": [426, 741, 650, 603, 1484]
}

df = pd.DataFrame(data)

# Generate summary statistics
print(df.describe())

Output

	Population (Millions)	Area (sq km)
count	5.000000	5.000000
mean	10.314000	780.800000
std	4.261529	409.474908
min	6.810000	426.000000
25%	7.090000	603.000000
50%	8.440000	650.000000
75%	12.440000	741.000000
max	16.790000	1484.000000

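Explanation: By default, describe() summarizes only the numerical columns, so the City column is left out. The count, mean, and std (standard deviation) rows, together with the minimum, maximum, and percentiles, show how each column’s values are distributed. For example, the mean area (780.8) is well above the median (650) because Delhi’s area is much larger than the others.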
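The behaviour of describe() can be adjusted. The following is a minimal sketch on the same sample data, not a complete tour of the method: it passes explicit percentiles and summarizes the text column by selecting it directly. The variable names numeric_summary, custom_summary, and text_summary are illustrative only.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    "City": ["Chennai", "Bengaluru", "Hyderabad", "Mumbai", "Delhi"],
    "Population (Millions)": [7.09, 8.44, 6.81, 12.44, 16.79],
    "Area (sq km)": [426, 741, 650, 603, 1484],
})

# Default behaviour: only the numeric columns are summarized
numeric_summary = df.describe()
print(list(numeric_summary.columns))
# ['Population (Millions)', 'Area (sq km)']

# Ask for the 10th, 50th, and 90th percentiles instead of the defaults
custom_summary = df.describe(percentiles=[0.1, 0.5, 0.9])
print(custom_summary.index.tolist())

# Summarize a text column: the report shows count, unique, top, and freq
text_summary = df[["City"]].describe()
print(text_summary)
```

Individual statistics are also available as their own methods, for example df["Area (sq km)"].median(), when you need a single value rather than the full table.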

Inspecting Data

Pandas also lets you quickly inspect the structure and content of your dataset. Use head() to preview the first few rows, info() to check column types and non-null counts, and the shape attribute to see the dimensions of the DataFrame. Here’s an example:

# Inspect the first few rows (df is the DataFrame created above)
print(df.head())

# Print column types and non-null counts; info() prints directly and returns None
df.info()

# Get the shape of the DataFrame
print("Shape:", df.shape)

Output

Head Output:

	City	Population (Millions)	Area (sq km)
0	Chennai	7.09	426
1	Bengaluru	8.44	741
2	Hyderabad	6.81	650
3	Mumbai	12.44	603
4	Delhi	16.79	1484

Info Output:


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   City                   5 non-null      object 
 1   Population (Millions)  5 non-null      float64
 2   Area (sq km)           5 non-null      int64  
dtypes: float64(1), int64(1), object(1)
memory usage: 248.0+ bytes

Shape Output:

(5, 3)

Explanation: head() returns the first 5 rows by default, info() reports each column’s type and non-null count, and the shape attribute gives the dimensions of the DataFrame as a (rows, columns) tuple. The exact dtype names and memory usage reported by info() can vary between Pandas versions.
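A few variations on these inspection tools are often handy. The sketch below, again on the sample data, passes a row count to head() and tail(), unpacks shape, and captures the info() report as text through its buf parameter. The io.StringIO buffer and the variable names are illustrative choices, not the only way to do this.

```python
import io

import pandas as pd

# Same sample data as in the examples above
df = pd.DataFrame({
    "City": ["Chennai", "Bengaluru", "Hyderabad", "Mumbai", "Delhi"],
    "Population (Millions)": [7.09, 8.44, 6.81, 12.44, 16.79],
    "Area (sq km)": [426, 741, 650, 603, 1484],
})

# head(n) and tail(n) return the first and last n rows
first_two = df.head(2)
last_two = df.tail(2)
print(first_two)
print(last_two)

# shape is an attribute (no parentheses) holding (rows, columns)
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")

# info() writes its report to a buffer when one is supplied via buf
buffer = io.StringIO()
df.info(buf=buffer)
info_text = buffer.getvalue()
print(info_text)
```

Capturing the report this way is useful when you want to log it or write it to a file instead of printing it to the console.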

Key Takeaways

Summary Statistics: Use describe() to get an overview of numerical data.
Data Inspection: The head() and info() methods and the shape attribute help you quickly inspect the dataset’s structure and content.
Efficient Analysis: These tools allow you to understand the dataset and prepare it for further analysis.

