SciPy Statistics
The scipy.stats module provides a wide range of statistical functions, including probability distributions, descriptive statistics, and hypothesis tests, for data analysis and scientific research.
Key Topics
Probability Distributions
SciPy supports a variety of continuous and discrete probability distributions. For each one you can generate random samples, compute probabilities, and fit the distribution's parameters to data.
Example: Normal Distribution
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt
# Generate random samples from a normal distribution
samples = norm.rvs(loc=0, scale=1, size=1000)
# Plot a density-normalized histogram of the samples
plt.hist(samples, bins=30, density=True, alpha=0.6, color='g')
# Overlay the theoretical PDF across the histogram's x-range
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 100)
p = norm.pdf(x, loc=0, scale=1)
plt.plot(x, p, 'k', linewidth=2)
plt.title('Normal Distribution')
plt.show()
Output
[Figure: histogram of the 1000 samples with the standard normal PDF overlaid as a black curve]
Explanation: We generate random samples from a normal distribution and plot the histogram along with the theoretical PDF. This helps visualize how the samples fit the distribution.
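The example above covers sampling and the PDF. The other two operations mentioned in this section, computing probabilities and fitting, can be sketched as follows. This is a minimal illustration, not part of the original example: the binomial parameters, the true loc=2.0 and scale=3.0 used to generate the data, and the seed are all assumptions chosen for demonstration.

```python
import numpy as np
from scipy.stats import norm, binom

# Continuous case: probabilities from the standard normal CDF
p_within_one_sd = norm.cdf(1) - norm.cdf(-1)  # P(-1 <= X <= 1)
q95 = norm.ppf(0.95)                          # 95th percentile (inverse CDF)

# Discrete case: binomial with 10 trials and success probability 0.5
p_exactly_5 = binom.pmf(5, 10, 0.5)           # P(K = 5)
p_at_most_5 = binom.cdf(5, 10, 0.5)           # P(K <= 5)

# Fitting: maximum-likelihood estimates of loc (mean) and scale (std dev)
rng = np.random.default_rng(0)                # seed is an arbitrary choice
samples = norm.rvs(loc=2.0, scale=3.0, size=5000, random_state=rng)
loc_hat, scale_hat = norm.fit(samples)

print(f"P(-1 <= X <= 1): {p_within_one_sd:.4f}")
print(f"95th percentile: {q95:.4f}")
print(f"P(K = 5): {p_exactly_5:.4f}")
print(f"P(K <= 5): {p_at_most_5:.4f}")
print(f"Fitted loc: {loc_hat:.3f}, fitted scale: {scale_hat:.3f}")
```

The fitted values should land close to the parameters used to generate the samples, though not exactly on them, since they are estimates from a finite sample.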
Descriptive Statistics
Descriptive statistics summarize and describe the main features of a dataset. SciPy provides functions to compute mean, median, variance, skewness, kurtosis, and more.
Example: Descriptive Statistics
import numpy as np
from scipy import stats
# Generate random samples
data = np.random.normal(loc=0, scale=1, size=1000)
# Compute descriptive statistics
mean = np.mean(data)
median = np.median(data)
variance = np.var(data)
skewness = stats.skew(data)
kurtosis = stats.kurtosis(data)
print(f"Mean: {mean}")
print(f"Median: {median}")
print(f"Variance: {variance}")
print(f"Skewness: {skewness}")
print(f"Kurtosis: {kurtosis}")
Output
Median: 0.02...
Variance: 0.99...
Skewness: 0.03...
Kurtosis: -0.04...
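Explanation: We compute descriptive statistics for 1000 samples drawn from a standard normal distribution. The mean and median describe central tendency, the variance describes dispersion, and skewness and kurtosis describe shape. Note that np.var returns the population variance (ddof=0) by default, and stats.kurtosis returns excess kurtosis (Fisher's definition), so values near 0 are expected for normal data.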
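Several of these statistics can also be obtained in one call with stats.describe. The sketch below is an illustrative alternative to the example above, not a replacement for it; the seed is an arbitrary assumption. One difference worth knowing: stats.describe reports the sample variance (ddof=1) by default, whereas np.var uses ddof=0.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # seed is an arbitrary choice
data = rng.normal(loc=0, scale=1, size=1000)

# One call returns count, min/max, mean, variance, skewness, and kurtosis
summary = stats.describe(data)

print(f"Count: {summary.nobs}")
print(f"Min, max: {summary.minmax}")
print(f"Mean: {summary.mean:.4f}")
print(f"Sample variance (ddof=1): {summary.variance:.4f}")
print(f"Population variance (ddof=0): {np.var(data):.4f}")
print(f"Skewness: {summary.skewness:.4f}")
print(f"Excess kurtosis: {summary.kurtosis:.4f}")
```

For 1000 samples the two variance figures differ only by a factor of 1000/999, but the distinction matters for small datasets.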
Statistical Tests
SciPy offers statistical tests for checking whether data follow a given distribution and whether groups differ significantly. Examples include t-tests and chi-square tests.
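The first use case, checking whether data follow a given distribution, can be sketched with a one-sample Kolmogorov-Smirnov test. This is one possible choice of test, picked here for illustration; the two datasets and the seed are assumptions. The reference distribution is the standard normal with its parameters fixed in advance, not estimated from the data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)  # seed is an arbitrary choice

# Hypothetical datasets: one drawn from N(0, 1), one from Uniform(0, 1)
normal_data = rng.normal(loc=0, scale=1, size=1000)
uniform_data = rng.uniform(low=0, high=1, size=1000)

# Compare each dataset against the standard normal distribution
d_normal, p_normal = stats.kstest(normal_data, "norm")
d_uniform, p_uniform = stats.kstest(uniform_data, "norm")

print(f"Normal data:  D = {d_normal:.4f}, p = {p_normal:.4g}")
print(f"Uniform data: D = {d_uniform:.4f}, p = {p_uniform:.4g}")
```

The statistic D is the largest gap between the empirical CDF and the reference CDF. The uniform data should produce a large D and a p-value near zero, while the normal data should produce a small D.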
Example: One-Sample T-Test
import numpy as np
from scipy import stats
data = np.random.normal(loc=0, scale=1, size=1000)
# Perform a one-sample t-test
stat, p_value = stats.ttest_1samp(data, popmean=0)
print(f"T-statistic: {stat}")
print(f"P-value: {p_value}")
Output
P-value: 0.74...
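Explanation: We perform a one-sample t-test of the null hypothesis that the population mean is 0. The p-value of about 0.74 is far above the usual 0.05 threshold, so we fail to reject the null hypothesis: the data give no evidence that the mean differs from 0. This is not proof that the mean is exactly 0.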
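The other tests mentioned in this section follow the same pattern of returning a statistic and a p-value. The sketch below shows a two-sample t-test and a chi-square goodness-of-fit test. The group means, sample sizes, die-roll counts, and seed are all made up for illustration, and Welch's variant (equal_var=False) is one reasonable choice, not the only one.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # seed is an arbitrary choice

# Two-sample t-test: hypothetical groups whose true means differ by 0.5
group_a = rng.normal(loc=0.0, scale=1.0, size=500)
group_b = rng.normal(loc=0.5, scale=1.0, size=500)
# equal_var=False selects Welch's t-test, which does not assume equal variances
t_stat, t_p = stats.ttest_ind(group_a, group_b, equal_var=False)

# Chi-square goodness-of-fit: made-up counts from 120 rolls of a die
observed = np.array([18, 22, 20, 25, 15, 20])
# With no expected frequencies given, all categories are assumed equally likely
chi2_stat, chi2_p = stats.chisquare(observed)

print(f"Two-sample t-test: t = {t_stat:.3f}, p = {t_p:.3g}")
print(f"Chi-square test:   chi2 = {chi2_stat:.3f}, p = {chi2_p:.3f}")
```

Here the t-test should report a very small p-value because the groups were generated with different means, while the chi-square test should report a large one because the counts sit close to the 20 per face expected from a fair die.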
Key Takeaways
- Probability Distributions: Generate samples, compute probabilities, and fit distribution parameters to data.
- Descriptive Statistics: Summarize data with mean, median, variance, skewness, and kurtosis.
- Statistical Tests: Perform hypothesis tests to determine significant differences or distribution fits.
- Comprehensive Tools: SciPy provides a wide range of statistical functions for data analysis and research.