SciPy Significance Tests

Statistical significance tests determine whether observed data relationships are meaningful or simply the result of random chance. SciPy’s stats module provides a variety of hypothesis tests, including t-tests, ANOVA, and non-parametric tests.

Key Topics

T-Tests

Use scipy.stats.ttest_1samp (one sample against a known population mean), ttest_ind (two independent samples), or ttest_rel (two paired samples) to compare means. These tests help verify whether sample means differ significantly from each other or from a known population mean.

Example: One-Sample T-Test

import numpy as np
from scipy import stats

data = np.array([5.1, 4.9, 5.0, 5.2, 5.3])
stat, p_value = stats.ttest_1samp(data, popmean=5.0)
print("T-statistic:", stat)
print("P-value:", p_value)

Output

T-statistic: 1.414...
P-value: 0.230...

Explanation: The p-value of about 0.23 is above the conventional 0.05 significance threshold, so there is insufficient evidence to reject the null hypothesis that the population mean is 5.0.
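The decision rule against the threshold can be made explicit in code. A minimal sketch (alpha = 0.05 is the usual convention, not anything SciPy enforces):

```python
import numpy as np
from scipy import stats

data = np.array([5.1, 4.9, 5.0, 5.2, 5.3])
stat, p_value = stats.ttest_1samp(data, popmean=5.0)

alpha = 0.05  # conventional significance threshold
if p_value < alpha:
    print("Reject the null hypothesis: the mean differs from 5.0")
else:
    print("Fail to reject the null hypothesis: no evidence the mean differs from 5.0")
```

With this data the second branch runs, matching the interpretation above.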

Example: Two-Sample T-Test

import numpy as np
from scipy import stats

data1 = np.array([5.1, 4.9, 5.0, 5.2, 5.3])
data2 = np.array([4.8, 4.9, 5.0, 5.1, 5.2])
stat, p_value = stats.ttest_ind(data1, data2)
print("T-statistic:", stat)
print("P-value:", p_value)

Output

T-statistic: 1.0
P-value: 0.346...

Explanation: The p-value of about 0.35 is well above 0.05, so the test finds no significant difference between the means of the two samples.
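Example: Paired T-Test

ttest_rel, the third variant mentioned above, compares paired samples, for example the same subjects measured before and after a treatment. A sketch with hypothetical before/after values:

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements: the same five subjects, before and after
before = np.array([5.1, 4.9, 5.0, 5.2, 5.3])
after = np.array([5.3, 5.0, 5.2, 5.4, 5.4])

stat, p_value = stats.ttest_rel(before, after)
print("T-statistic:", stat)
print("P-value:", p_value)
```

Because the test works on the per-subject differences, it can detect a small but consistent shift that an independent-samples test on the same numbers might miss.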

ANOVA

Analysis of Variance (ANOVA) checks whether there are significant differences between the means of multiple groups. f_oneway from scipy.stats is commonly used for one-way ANOVA.

Example

import numpy as np
from scipy import stats

group1 = np.array([5.1, 4.9, 5.0, 5.2, 5.3])
group2 = np.array([4.8, 4.9, 5.0, 5.1, 5.2])
group3 = np.array([5.2, 5.3, 5.4, 5.5, 5.6])
stat, p_value = stats.f_oneway(group1, group2, group3)
print("F-statistic:", stat)
print("P-value:", p_value)

Output

F-statistic: 8.666...
P-value: 0.0046...

Explanation: The p-value of about 0.005 is well below the 0.05 significance level, indicating a significant difference among the means of the three groups.
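A significant ANOVA result says that at least one group mean differs, but not which one. In SciPy 1.8 and later, stats.tukey_hsd can follow up with pairwise comparisons; a sketch on the same three groups (check your SciPy version before relying on it):

```python
import numpy as np
from scipy import stats

group1 = np.array([5.1, 4.9, 5.0, 5.2, 5.3])
group2 = np.array([4.8, 4.9, 5.0, 5.1, 5.2])
group3 = np.array([5.2, 5.3, 5.4, 5.5, 5.6])

# Tukey's HSD: pairwise mean comparisons with familywise error control
result = stats.tukey_hsd(group1, group2, group3)
print(result)  # table of pairwise differences and adjusted p-values
```

Here the group2 vs. group3 comparison drives the significant ANOVA result, since those two means are furthest apart.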

Non-Parametric Tests

When data doesn’t meet parametric test assumptions (like normal distribution), you can use non-parametric tests such as the Wilcoxon signed-rank test or the Mann-Whitney U test.

Example: Mann-Whitney U Test

import numpy as np
from scipy import stats

group1 = np.array([5.1, 4.9, 5.0, 5.2, 5.3])
group2 = np.array([4.8, 4.9, 5.0, 5.1, 5.2])
stat, p_value = stats.mannwhitneyu(group1, group2)
print("U-statistic:", stat)
print("P-value:", p_value)

Output

U-statistic: 17.0
P-value: 0.397...

Explanation: The p-value of about 0.40 indicates no significant difference between the distributions of the two groups.
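Example: Wilcoxon Signed-Rank Test

The Wilcoxon signed-rank test mentioned above is the non-parametric counterpart of the paired t-test: it works on paired samples and makes no normality assumption. A sketch with hypothetical paired values:

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements (e.g. before/after on the same subjects)
before = np.array([5.1, 4.9, 5.0, 5.2, 5.3])
after = np.array([5.3, 5.0, 5.2, 5.4, 5.4])

stat, p_value = stats.wilcoxon(before, after)
print("Statistic:", stat)
print("P-value:", p_value)
```

Note the division of labour: mannwhitneyu compares two independent samples, while wilcoxon requires two paired, equal-length arrays (or a single array of differences).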

Correlation Tests

Correlation tests measure the strength and direction of the relationship between two variables. Use pearsonr for Pearson correlation or spearmanr for Spearman rank correlation.

Example: Pearson Correlation

import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])
correlation, p_value = stats.pearsonr(x, y)
print("Correlation coefficient:", correlation)
print("P-value:", p_value)

Output

Correlation coefficient: 1.0
P-value: 0.0

Explanation: The correlation coefficient of 1.0 indicates a perfect positive linear relationship between the two variables. The p-value of 0.0 means a correlation this strong would be essentially impossible under the null hypothesis of no association.

Key Takeaways

  • Wide Range: T-tests, ANOVA, correlation, non-parametric tests, and more.
  • Easy Syntax: Single function calls for standard hypothesis tests.
  • Interpretation: P-values and test statistics guide decision-making.
  • Reliability: Implementations follow well-established statistical methods.