Normal Distribution
The normal distribution, also called the Gaussian distribution, is a continuous probability distribution whose density forms a symmetric, bell-shaped curve: most values cluster around the mean, and values become less likely the farther they lie from it. It is one of the most commonly used distributions in statistics and is useful for modeling scores, measurements, and other continuous data.
Key Topics
Generating a Normal Distribution
NumPy's random.normal() function generates random numbers that follow a normal distribution. You can specify the mean (loc), the standard deviation (scale), and the number of values to generate (size).
Example
# Generating normal distribution
import numpy as np
# Mean = 50, Standard Deviation = 10, Size = 1000
heights = np.random.normal(50, 10, 1000)
print("First 10 heights:", heights[:10])
Explanation: The random.normal() function draws 1000 random values from a normal distribution with a mean of 50 and a standard deviation of 10. Because the values are random, the printed numbers differ on every run unless a seed is set.
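To check that the generated values behave as described, the sketch below compares the sample mean and standard deviation with the requested parameters and counts how many values fall within one standard deviation of the mean. It uses np.random.default_rng() with a fixed seed so the run is repeatable; the seed value (42) and the variable names are assumptions made for illustration and are not part of the original example.

```python
import numpy as np

# Seeded generator so the results are reproducible (the seed value is arbitrary)
rng = np.random.default_rng(42)

# Mean = 50, Standard Deviation = 10, Size = 1000
values = rng.normal(loc=50, scale=10, size=1000)

sample_mean = values.mean()
sample_std = values.std(ddof=1)  # ddof=1 gives the sample standard deviation

# Fraction of values within one standard deviation of the requested mean
within_one_sd = np.mean(np.abs(values - 50) <= 10)

print("Sample mean:", round(sample_mean, 2))
print("Sample standard deviation:", round(sample_std, 2))
print("Fraction within one SD:", round(within_one_sd, 3))
```

The sample statistics should land near 50 and 10 without matching them exactly, and for normally distributed data roughly 68% of values are expected to fall within one standard deviation of the mean.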
Visualizing the Distribution
You can visualize the normal distribution using Seaborn's sns.histplot() or Matplotlib's plt.hist(). This helps identify the bell-shaped curve characteristic of a normal distribution.
Example
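# Visualizing normal distribution
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# Data: Mean = 170, Standard Deviation = 5, Size = 1000
heights = np.random.normal(170, 5, 1000)
# Plot a histogram with a KDE curve overlaid
sns.histplot(heights, kde=True, color="blue")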
plt.title("Normal Distribution of Heights")
plt.xlabel("Height (cm)")
plt.ylabel("Frequency")
plt.show()
Explanation: The histogram shows the frequency of heights, while the KDE (Kernel Density Estimate) curve highlights the bell shape of the normal distribution.
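The text also mentions Matplotlib's plt.hist() as an alternative to Seaborn. A minimal sketch of that route follows, assuming the same parameters as the example above (mean 170, standard deviation 5). It overlays the theoretical normal density, computed directly from the density formula, so the histogram can be compared with the ideal bell curve. The function name plot_heights_hist, the seed, and the bin count are illustrative choices, not part of the original example.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_heights_hist(mean=170.0, std=5.0, size=1000, bins=30, seed=0):
    """Draw a density histogram of simulated heights with the normal PDF overlaid."""
    rng = np.random.default_rng(seed)  # arbitrary seed for repeatable output
    heights = rng.normal(mean, std, size)

    fig, ax = plt.subplots()
    # density=True rescales the bars so that their total area equals 1
    counts, edges, _ = ax.hist(heights, bins=bins, density=True,
                               color="blue", alpha=0.6, label="Simulated heights")

    # Theoretical normal density: exp(-(x - mean)^2 / (2 std^2)) / (std * sqrt(2 pi))
    x = np.linspace(mean - 4 * std, mean + 4 * std, 200)
    pdf = np.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * np.sqrt(2 * np.pi))
    ax.plot(x, pdf, color="black", label="Normal density")

    ax.set_title("Normal Distribution of Heights")
    ax.set_xlabel("Height (cm)")
    ax.set_ylabel("Density")
    ax.legend()
    return fig, counts, edges


fig, counts, edges = plot_heights_hist()
plt.show()
```

With density=True the y-axis shows density rather than raw counts, which puts the bars on the same scale as the theoretical curve.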
Key Takeaways
- Normal Distribution: Used to model data that clusters around a mean, such as heights, weights, or exam scores.
- Generating Data: Use np.random.normal() to generate data with a specified mean and standard deviation.
- Visualization: Use Seaborn or Matplotlib to identify the bell shape of the distribution.
- Applications: Analyze measurements, test results, or financial trends.