Data Distribution

Data distribution refers to the way data points are spread across a dataset. NumPy's np.random module provides functions to draw random samples from various distributions, such as normal, uniform, and binomial. These distributions are essential for simulations, statistical analysis, and machine learning.

Key Topics

Normal Distribution

The normal distribution, also known as the Gaussian distribution, is a bell-shaped curve where data is symmetrically distributed around the mean. It is commonly used in real-world scenarios, such as modeling heights or exam scores.

Example

# Generating normal distribution
import numpy as np

# Mean = 50, Standard Deviation = 10, Size = 1000
scores = np.random.normal(50, 10, 1000)

# Round to one decimal place so the printed values are easy to read
print("Sample scores:", np.round(scores[:10], 1))

Output

Sample scores: [48.9 50.3 53.7 46.2 49.8 54.1 52.6 45.5 51.7 47.8]

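Explanation: The np.random.normal() function draws random samples from a normal distribution. Its arguments are the mean (loc), the standard deviation (scale), and the number of samples to generate (size). Because the samples are random, your values will differ from those shown above on each run.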
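One way to check that generated samples match the requested parameters is to compare the sample mean and standard deviation against them. The sketch below is an illustration rather than part of the original example: it assumes NumPy's newer Generator interface (np.random.default_rng), and the seed of 42 and the variable names are arbitrary choices made so the run is repeatable.

```python
import numpy as np

# Seeded generator so the run is repeatable (the seed 42 is an arbitrary choice)
rng = np.random.default_rng(42)

# Same parameters as above: mean 50, standard deviation 10, 1000 samples
scores = rng.normal(loc=50, scale=10, size=1000)

# The sample statistics should land close to the requested parameters
sample_mean = scores.mean()
sample_std = scores.std()
print("Sample mean:", round(sample_mean, 2))
print("Sample std:", round(sample_std, 2))

# For a normal distribution, roughly 68% of values fall within
# one standard deviation of the mean
within_one_std = np.mean(np.abs(scores - 50) <= 10)
print("Fraction within one std:", within_one_std)
```

With only 1000 samples the statistics will not match the parameters exactly; increasing size brings them closer.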

Uniform Distribution

In a uniform distribution, all values have an equal probability of occurring within a specified range. This is often used in simulations and random sampling.

Example

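# Generating uniform distribution
import numpy as np

# Low = 18, High = 60 (exclusive), Size = 10
ages = np.random.uniform(18, 60, 10)

# Round to one decimal place so the printed values are easy to read
print("Sample ages:", np.round(ages, 1))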

Output

Sample ages: [22.3 45.6 34.9 19.2 58.4 37.8 40.5 29.1 20.6 54.2]

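Explanation: The np.random.uniform() function draws floating-point samples from the half-open interval [low, high), here from 18 up to but not including 60. Every value in that range is equally likely, and the results are fractional rather than whole numbers.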
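Since ages are usually whole numbers, it can help to see the continuous and integer variants side by side. The following is a minimal sketch, assuming NumPy's Generator interface (np.random.default_rng) and its integers() method; the seed and variable names are illustrative and not from the original example.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for repeatability

# Continuous values in the half-open interval [18, 60)
ages = rng.uniform(low=18, high=60, size=10)
print("Float ages:", np.round(ages, 1))

# Whole-number ages from 18 through 60; endpoint=True makes 60 inclusive
int_ages = rng.integers(low=18, high=60, size=10, endpoint=True)
print("Integer ages:", int_ages)
```

Both calls give every value in the range the same probability; they differ only in whether the values are continuous or discrete.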

Binomial Distribution

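The binomial distribution models the number of successes in a fixed number of independent trials, where each trial has only two outcomes (success or failure) and the same probability of success. Examples include counting heads in a series of coin flips or counting sixes in a series of die rolls. It is commonly used in probability and statistics.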

Example

# Generating binomial distribution
import numpy as np

# 5 experiments, each counting heads in 10 fair coin flips
flips = np.random.binomial(n=10, p=0.5, size=5)

print("Coin flips (number of heads in 10 flips):", flips)

Output

Coin flips (number of heads in 10 flips): [5 6 4 7 5]

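Explanation: The np.random.binomial() function simulates repeated experiments. Here n=10 is the number of trials per experiment, p=0.5 is the probability of success on each trial (a fair coin), and size=5 is the number of experiments. Each returned value is the count of successes in one experiment, so it always lies between 0 and n.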
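A binomial distribution has a theoretical mean of n * p and a variance of n * p * (1 - p), and a large simulation should come close to both. The sketch below illustrates this under a few assumptions: it uses NumPy's Generator interface (np.random.default_rng), and the seed, the 10,000 experiments, and the variable names are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed for repeatability

n, p = 10, 0.5  # 10 flips per experiment, fair coin

# Run many experiments so the sample statistics settle near the theory
heads = rng.binomial(n=n, p=p, size=10_000)

print("Expected mean:", n * p)
print("Sample mean:", heads.mean())
print("Expected variance:", n * p * (1 - p))
print("Sample variance:", heads.var())
```

The sample values will not equal the theoretical ones exactly, but the gap shrinks as the number of experiments grows.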

Key Takeaways

  • Normal Distribution: Use np.random.normal() for bell-shaped data modeling.
  • Uniform Distribution: Use np.random.uniform() for equal-probability data generation.
  • Binomial Distribution: Use np.random.binomial() to count successes across a fixed number of trials.
  • Applications: Simulate data for experiments, machine learning, and statistical analysis.