SciPy Significance Tests
Statistical significance tests determine whether observed data relationships are meaningful or simply the result of random chance. SciPy’s stats module provides a variety of hypothesis tests, including t-tests, ANOVA, and non-parametric tests.
Key Topics
T-Tests
Use scipy.stats.ttest_1samp, ttest_ind, or ttest_rel to compare means. These tests help verify whether sample means significantly differ from each other or from a known population mean.
Example: One-Sample T-Test
import numpy as np
from scipy import stats
data = np.array([5.1, 4.9, 5.0, 5.2, 5.3])
stat, p_value = stats.ttest_1samp(data, popmean=5.0)
print("T-statistic:", stat)
print("P-value:", p_value)
Output
P-value: 0.239...
Explanation: A p-value of around 0.24 means there is insufficient evidence to reject the null hypothesis that the population mean is 5.0. The threshold for significance is typically 0.05.
Example: Two-Sample T-Test
import numpy as np
from scipy import stats
data1 = np.array([5.1, 4.9, 5.0, 5.2, 5.3])
data2 = np.array([4.8, 4.9, 5.0, 5.1, 5.2])
stat, p_value = stats.ttest_ind(data1, data2)
print("T-statistic:", stat)
print("P-value:", p_value)
Output
P-value: 0.537...
Explanation: The p-value of around 0.54 indicates that there is no significant difference between the means of the two samples.
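The third variant mentioned above, ttest_rel, compares two related (paired) samples, such as measurements taken on the same subjects before and after a treatment. A minimal sketch using hypothetical before/after data:

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements (e.g. before and after a treatment)
before = np.array([5.1, 4.9, 5.0, 5.2, 5.3])
after = np.array([5.3, 5.0, 5.1, 5.4, 5.5])

# ttest_rel tests whether the mean of the paired differences is zero
stat, p_value = stats.ttest_rel(before, after)
print("T-statistic:", stat)
print("P-value:", p_value)
```

Because every "after" value here is higher than its paired "before" value, the differences are consistent and the test detects a significant shift.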
ANOVA
Analysis of Variance (ANOVA) checks whether there are significant differences between the means of multiple groups. f_oneway from scipy.stats is commonly used for one-way ANOVA.
Example
import numpy as np
from scipy import stats
group1 = np.array([5.1, 4.9, 5.0, 5.2, 5.3])
group2 = np.array([4.8, 4.9, 5.0, 5.1, 5.2])
group3 = np.array([5.2, 5.3, 5.4, 5.5, 5.6])
stat, p_value = stats.f_oneway(group1, group2, group3)
print("F-statistic:", stat)
print("P-value:", p_value)
Output
P-value: 0.0046...
Explanation: The p-value of around 0.005 is well below the 0.05 significance level, indicating a significant difference between the means of the three groups.
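To see what f_oneway computes, the F-statistic can be reproduced by hand as the ratio of between-group variance to within-group variance. A sketch using the same three groups:

```python
import numpy as np
from scipy import stats

group1 = np.array([5.1, 4.9, 5.0, 5.2, 5.3])
group2 = np.array([4.8, 4.9, 5.0, 5.1, 5.2])
group3 = np.array([5.2, 5.3, 5.4, 5.5, 5.6])
groups = [group1, group2, group3]

grand_mean = np.mean(np.concatenate(groups))

# Between-group sum of squares (df = number of groups - 1)
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
df_between = len(groups) - 1

# Within-group sum of squares (df = total observations - number of groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
df_within = sum(len(g) for g in groups) - len(groups)

f_manual = (ss_between / df_between) / (ss_within / df_within)
f_scipy, _ = stats.f_oneway(*groups)
print("Manual F:", f_manual)
print("SciPy F:", f_scipy)
```

The two F values agree: a large F means the group means spread out more than the within-group noise would explain.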
Non-Parametric Tests
When data doesn’t meet parametric test assumptions (like normal distribution), you can use non-parametric tests such as the Wilcoxon signed-rank test or the Mann-Whitney U test.
Example: Mann-Whitney U Test
import numpy as np
from scipy import stats
group1 = np.array([5.1, 4.9, 5.0, 5.2, 5.3])
group2 = np.array([4.8, 4.9, 5.0, 5.1, 5.2])
stat, p_value = stats.mannwhitneyu(group1, group2)
print("U-statistic:", stat)
print("P-value:", p_value)
Output
P-value: 0.345...
Explanation: The p-value of around 0.345 indicates that there is no significant difference between the distributions of the two groups.
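The Wilcoxon signed-rank test mentioned above is the non-parametric counterpart of the paired t-test: it works on the ranks of the differences between paired samples. A minimal sketch with hypothetical paired data:

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements; the differences must not all be zero
before = np.array([5.1, 4.9, 5.0, 5.2, 5.3])
after = np.array([5.3, 5.0, 5.1, 5.4, 5.5])

# wilcoxon tests whether the paired differences are symmetric about zero
stat, p_value = stats.wilcoxon(before, after)
print("Statistic:", stat)
print("P-value:", p_value)
```

With only five pairs the test has limited power, so even a consistent shift may not reach the 0.05 threshold; larger samples give it more resolution.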
Correlation Tests
Correlation tests measure the strength and direction of the relationship between two variables. Use pearsonr for Pearson correlation or spearmanr for Spearman rank correlation.
Example: Pearson Correlation
import numpy as np
from scipy import stats
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])
correlation, p_value = stats.pearsonr(x, y)
print("Correlation coefficient:", correlation)
print("P-value:", p_value)
Output
P-value: 0.0
Explanation: The correlation coefficient of 1.0 indicates a perfect positive linear relationship between the two variables, with a p-value of 0.0 indicating that this result is highly significant.
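Unlike pearsonr, the spearmanr function mentioned above measures monotonic rather than strictly linear association, so it still reports a perfect relationship when the data curve upward consistently. A sketch with a nonlinear but monotonic relationship:

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5])
y = x ** 2  # nonlinear, but strictly increasing

# Spearman correlates the ranks of x and y, which match perfectly here
correlation, p_value = stats.spearmanr(x, y)
print("Spearman correlation:", correlation)
print("P-value:", p_value)
```

Pearson correlation on the same data would be high but below 1.0, because the relationship is not a straight line.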
Key Takeaways
- Wide Range: T-tests, ANOVA, correlation, non-parametric tests, and more.
- Easy Syntax: Single function calls for standard hypothesis tests.
- Interpretation: P-values and test statistics guide decision-making.
- Reliability: Builds on established statistical formulas and references.
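The interpretation pattern used throughout — comparing the p-value against a chosen significance level — can be wrapped in a small helper. Note that is_significant is a hypothetical convenience function for illustration, not part of SciPy:

```python
def is_significant(p_value, alpha=0.05):
    """Return True when p_value falls below the chosen significance level alpha."""
    return p_value < alpha

# Hypothetical helper applied to p-values like those above
print(is_significant(0.239))  # False: fail to reject the null hypothesis
print(is_significant(0.01))   # True: reject the null hypothesis
```

Choosing alpha before running the test (rather than after seeing the p-value) keeps the decision rule honest.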