ANOVA and Multiple Comparisons
One-Way ANOVA Test (go to the calculator)
The one-way ANOVA test checks the null assumption that the means (averages) of two or more groups are equal. The test tries to determine whether the difference between the sample averages reflects a real difference between the groups, or is due to the random noise inside each group.
When the ANOVA test rejects the null assumption it only tells us that not all the means are equal. For more information, the tool also runs the Tukey HSD test, which compares each pair separately.
The one-way ANOVA model is identical to a linear regression model with one categorical independent variable — the group. Running that regression produces the same ANOVA table and the same p-value.
Assumptions
- Independence — Independent groups and independent observations that represent the population.
- Normal distribution — The population is normally distributed. This assumption matters mainly for small sample sizes (n < 30). The ANOVA calculator runs the Shapiro-Wilk test as part of the test run.
- Equality of variances — The variances of all groups are equal. The ANOVA test is considered robust to violations of this assumption when group sizes are similar (maximum sample size / minimum sample size < 1.5). The ANOVA calculator runs Levene’s test as part of the test run.
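Both assumption checks can be reproduced with scipy. A minimal sketch, using made-up sample data for three groups (Shapiro-Wilk tests normality per group; Levene’s test checks equality of variances):

```python
from scipy import stats

# Hypothetical sample data for three independent groups
groups = [
    [1, 2, 2, 3, 4, 5, 6],
    [3, 4, 5, 6, 8, 11],
    [13, 15, 16, 16, 19, 21, 22],
]

# Shapiro-Wilk: null assumption is that each group comes from a normal population
for g in groups:
    stat, sw_p = stats.shapiro(g)
    print(f"Shapiro-Wilk W={stat:.4f}, p={sw_p:.4f}")

# Levene's test: null assumption is that all group variances are equal
w, p = stats.levene(*groups, center="median")
print(f"Levene W={w:.4f}, p={p:.4f}")
```

A low p-value in either check indicates the corresponding assumption is questionable for the data at hand.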
Calculation
The model analyzes the differences between all observations and the overall average, and tries to determine whether the differences are only random or are also partially explained by the group (similar to linear regression).
As in the standard deviation calculation, we use the sum of squared differences rather than absolute differences.
SST — the total sum of squares: the squared differences between each observation and the overall average.
SSG / SSB — the sum of squares explained by the group (between groups). The calculation is similar to SST but uses only the difference between each group’s average and the overall average.
SSE / SSW — the sum of squares within the groups. Similar to SST but uses only the differences between observations and their own group averages.
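The decomposition can be verified directly. A minimal sketch with made-up data; the resulting F statistic is checked against `scipy.stats.f_oneway`:

```python
from scipy import stats

# Hypothetical data: three groups of equal size
groups = [[1, 2, 3], [2, 4, 6], [5, 7, 9]]
all_obs = [x for g in groups for x in g]
n, k = len(all_obs), len(groups)
grand_mean = sum(all_obs) / n

# SSG: squared distances of group means from the grand mean, weighted by group size
ssg = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# SSE: squared distances of observations from their own group mean
sse = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
# SST: squared distances of observations from the grand mean
sst = sum((x - grand_mean) ** 2 for x in all_obs)

assert abs(sst - (ssg + sse)) < 1e-9      # SST = SSG + SSE
f = (ssg / (k - 1)) / (sse / (n - k))     # F = MSG / MSE
print(ssg, sse, f)                        # SSG = 38, SSE = 18, F = 19/3 ≈ 6.33

f_scipy, p = stats.f_oneway(*groups)
assert abs(f - f_scipy) < 1e-9
```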
ANOVA Table
💡 Hover over the cells for more details.
| Source | Degrees of Freedom | Sum of Squares | Mean Square | F statistic | P‑value |
|---|---|---|---|---|---|
| Groups (between groups) | k − 1 | $$SSG= \sum_{i=1}^k\sum_{j=1}^{n_i} (\bar{x}_{i}-\bar{x})^2 = \sum_{i=1}^k n_i(\bar{x}_i-\bar{x})^2$$ | $$MSG = \frac{SSG}{k - 1}$$ | $$F = \frac{MSG}{MSE}$$ | P(X > F) |
| Error (within groups) | n − k | $$SSE=\sum_{i=1}^k\sum_{j=1}^{n_i} (x_{ij}-\bar{x}_i)^2 = \sum_{i=1}^k (n_i-1)S_i^2$$ | $$MSE = \frac{SSE}{n - k}$$ | | |
| Total | n − 1 | $$SST = \sum_{i=1}^k\sum_{j=1}^{n_i} (x_{ij}-\bar{x})^2 = SSG + SSE$$ | $$\text{Sample Variance} = \frac{SST}{n - 1}$$ | | |
ni — Sample size of group i.
n — Overall sample size, includes all groups (Σni, i = 1 to k).
x̄i — Average of group i.
x̄ — Overall average (Σxi,j / n, i = 1 to k, j = 1 to ni).
Si — Standard deviation of group i.
Effect Size
Prior effect size
If you are not sure which effect size type and value to choose, select the “Medium” effect size; the tool will choose the ‘f’ type with the corresponding value.
There are several methods to calculate the effect size.
- Eta squared
  $$\eta^2=\frac{SSG}{SST} \qquad \eta^2=\frac{f^2}{1+f^2} \qquad f^2=\frac{\eta^2}{1-\eta^2}$$
  This is the ratio of the explained sum of squares to the total sum of squares, equivalent to R² in linear regression.
- Cohen’s f — Method 1 (used by this tool)
  $$f=\sqrt{\frac{SSG}{SSE}}$$
  This is the ratio of the explained sum of squares to the unexplained sum of squares (random noise).
- Cohen’s f — Method 2
$$f=\sqrt{\frac{\sum_{i=1}^k(\bar{x}_{i}-\bar{x})^2}{k \cdot \sigma^2}}$$
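A quick numeric check of the two Cohen’s f methods, assuming σ² in Method 2 is the pooled within-group variance (SSE/n) and using made-up data with equal group sizes:

```python
import math

groups = [[1, 2, 3], [2, 4, 6], [5, 7, 9]]  # hypothetical, equal group sizes
all_obs = [x for g in groups for x in g]
n, k = len(all_obs), len(groups)
grand = sum(all_obs) / n
means = [sum(g) / len(g) for g in groups]

ssg = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

f1 = math.sqrt(ssg / sse)                    # Method 1: f = sqrt(SSG / SSE)
sigma2 = sse / n                             # pooled within-group variance
f2 = math.sqrt(sum((m - grand) ** 2 for m in means) / (k * sigma2))  # Method 2

eta2 = ssg / (ssg + sse)                     # eta squared = SSG / SST
assert abs(f1 - f2) < 1e-9                   # both methods agree for equal sizes
assert abs(f1 ** 2 - eta2 / (1 - eta2)) < 1e-9  # f² = η² / (1 − η²)
print(round(f1, 4))  # 1.453
```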
When using equal group sizes both methods give the same result, since ni = n/k and σ² is the pooled within-group variance (σ² = SSE/n).
Method 1 $$f^2=\frac{\sum_{i=1}^k n_i(\bar{x}_i-\bar{x})^2}{\sum_{i=1}^k\sum_{j=1}^{n_i} (x_{ij}-\bar{x}_i)^2}$$ Method 2 $$\sigma^2=\frac{\sum_{i=1}^k\sum_{j=1}^{n_i} (x_{ij}-\bar{x}_i)^2}{n} \qquad f^2 = \frac{\sum_{i=1}^k(\bar{x}_{i}-\bar{x})^2 \cdot \frac{n}{k}}{\sum_{i=1}^k\sum_{j=1}^{n_i} (x_{ij}-\bar{x}_i)^2}$$

Multiple Comparisons
When running a single test, the significance level (α) is the maximum allowed Type I error. When running n multiple comparisons, each at significance level α, the probability that at least one test incorrectly rejects a true null hypothesis grows much larger than α: $$\alpha'=1-(1-\alpha)^n$$ For example, with 6 comparisons (n = 6) and α = 0.05:
α' = 1 − (1 − 0.05)^6 = 0.265
To keep α' = 0.05 we need a much smaller significance level in each single test.
α — allowed Type I error probability for a single test. This is the corrected α.
α' — allowed Type I error probability for all tests as a package. This is the required α.
Bonferroni Correction
Bonferroni suggested using α = α'/n for each single test. The correction is conservative: it keeps the overall Type I error at or below α' regardless of any dependence between the tests.
Bonferroni Correction Calculator
Any change in any field recalculates the other fields. Changing n or Overall α' calculates the Corrected α; changing Corrected α calculates the Overall α'.
Sidák Correction
The Bonferroni correction is an approximation. For independent tests, the exact calculation is:
$$\alpha'=1-(1-\alpha)^n \quad \Rightarrow \quad \alpha=1-\sqrt[n]{1-\alpha'}$$
Sidák Correction Calculator
Here α' is the probability of a Type I error in at least one test when all null assumptions are correct.
Any change in any field recalculates the other fields. Changing n or Overall α' calculates the Corrected α; changing Corrected α calculates the Overall α'.
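The two corrections side by side, as a sketch with n = 6 comparisons and an overall α' of 0.05:

```python
n, overall = 6, 0.05

bonferroni = overall / n                   # Bonferroni: alpha' / n
sidak = 1 - (1 - overall) ** (1 / n)       # Sidak: 1 - (1 - alpha')^(1/n)

print(round(bonferroni, 5))  # 0.00833
print(round(sidak, 5))       # 0.00851

# Plugging the Sidak alpha back into alpha' = 1 - (1 - alpha)^n
# recovers the overall level exactly
assert abs(1 - (1 - sidak) ** n - overall) < 1e-12
```

The Bonferroni threshold is always slightly smaller (more conservative) than the Sidák one.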
Holm Method
The Bonferroni/Sidák corrections are conservative, protecting against Type I error at the expense of increasing Type II error (and the Sidák correction is exact only for independent tests, which is usually not the case). The Holm method provides a better balance while still controlling the overall error rate. Steps:
- Rank tests by p-value: R = 1 for the smallest p-value, R = n for the largest.
- $$\alpha_{(i)}=\frac{\alpha'}{n+1-R_{(i)}}$$
- Stop at the first non-significant test; all subsequent tests are also non-significant (H0 accepted).
Holm Method Calculator
Enter p-values separated by a comma (,), a space, or Enter.
Any change in Overall α' or P-values recalculates Corrected α and H0.
Example (four comparisons, α' = 0.05; the two smallest p-values are 0.011 and 0.026):
n = 4.
R = 1: 0.05 / (4 + 1 − 1) = 0.0125. Since 0.011 < 0.0125, this comparison is significant.
R = 2: 0.05 / (4 + 1 − 2) = 0.0167. Since 0.026 > 0.0167, this comparison is not significant.
The algorithm stops here; all remaining comparisons are not significant, even if a later p-value would have fallen below its own threshold.
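The step-down procedure above can be sketched as a small function. The two smallest p-values (0.011 and 0.026) come from the example; the other two are hypothetical fillers:

```python
def holm(p_values, overall_alpha=0.05):
    """Return a significance flag for each p-value, in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # ranks by p-value
    significant = [False] * n
    for rank, idx in enumerate(order, start=1):
        threshold = overall_alpha / (n + 1 - rank)       # alpha' / (n + 1 - R)
        if p_values[idx] >= threshold:
            break                       # stop: this and all later tests fail
        significant[idx] = True
    return significant

print(holm([0.011, 0.026, 0.2, 0.03]))  # [True, False, False, False]
```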
Tukey HSD Test / Tukey-Kramer Test (go to the calculator)
The Tukey HSD (Honestly Significant Difference) test is a multiple comparison test that compares the means of each pair of groups. It uses the Studentized range distribution rather than the regular t-test. It is only a two-tailed test, since the null assumption is equal means.
The Tukey HSD test assumes equal group sizes; the Tukey-Kramer extension handles unequal group sizes, so Tukey HSD is a special case of Tukey-Kramer.
The ANOVA calculator runs both the ANOVA test and the Tukey-Kramer test.
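The pairwise computation described under Calculation below can be sketched with scipy’s Studentized range distribution (made-up data; MSW and DFE come from the ANOVA error row):

```python
from itertools import combinations
from scipy.stats import studentized_range

groups = [[1, 2, 3], [2, 4, 6], [5, 7, 9]]  # hypothetical data
k = len(groups)
n = sum(len(g) for g in groups)
means = [sum(g) / len(g) for g in groups]

# MSW (= MSE) and error degrees of freedom, as in the ANOVA table
sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
dfe = n - k
msw = sse / dfe

for i, j in combinations(range(k), 2):
    diff = abs(means[i] - means[j])
    se = (msw / 2 * (1 / len(groups[i]) + 1 / len(groups[j]))) ** 0.5
    q = diff / se
    p = studentized_range.sf(q, k, dfe)  # P(X >= Q) under the null assumption
    print(f"group {i + 1} vs group {j + 1}: Q={q:.3f}, p={p:.4f}")
```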
Assumptions
- Independence — Independent groups and independent observations.
- Normal distribution — The population distributes normally.
- Equality of variances — The variances of all groups are equal.
Calculation
For each pair of groups i and j:
$$Difference = |\bar{x}_i-\bar{x}_j| \qquad SE=\sqrt{\frac{MSW}{2}\left(\frac{1}{n_i}+\frac{1}{n_j}\right)}$$
Test statistic
$$Q=\frac{Difference}{SE}$$
The p-value and the Q1−α percentile come from the cumulative Studentized range distribution, with the number of groups and the error degrees of freedom (DFE) as parameters:
$$p\text{-}value=P(X \geq Q \mid Groups,\,DFE) \qquad P(X \leq Q_{1-\alpha})=1-\alpha$$
Confidence Interval
$$CI = Difference \pm SE \cdot Q_{1-\alpha}$$
Any difference larger than the critical mean is significant:
$$Critical\;Mean=SE \cdot Q_{1-\alpha}$$
Levene’s Test (go to the calculator)
Levene’s test checks the null assumption that the variances (equivalently, the standard deviations) of two or more groups are equal. It tries to determine whether the difference between the sample variances reflects a real difference between the groups or is due to random noise.
The test runs the ANOVA model on the absolute differences from each group’s center, using either the mean or the median.
Assumptions
- Independence — Independent groups and independent observations.
- Normal distribution — The population distributes normally. Important for small sample sizes (n < 30).
The ANOVA calculator runs the Shapiro-Wilk test as part of the run.
Calculation
The general recommendation is to use the mean for symmetrical distributions or n > 30, and the median for asymmetrical distributions. Since the median and mean are nearly identical in symmetrical distributions, the median is usually safe to use. Using the median is called the Brown-Forsythe test.
- Using the mean: $$X'_{ij}=X_{ij}-\bar{X}_i \quad (\bar{X}_i \text{ is the mean of group } i)$$
- Using the median: $$X'_{ij}=X_{ij}-\tilde{X}_i \quad (\tilde{X}_i \text{ is the median of group } i)$$
Example
Levene’s test example using medians.
Observations $$\begin{bmatrix}Group1&Group2&Group3\\1&3&13\\2&4&15\\2&\textcolor{#1a7a1a}{\textbf{5}}&16\\ \textcolor{#1a7a1a}{\textbf{3}}&\textcolor{#1a7a1a}{\textbf{6}}&\textcolor{#1a7a1a}{\textbf{16}}\\4&8&19\\5&11&21\\6&&22\end{bmatrix}$$ Medians $$\begin{bmatrix}Group1&Group2&Group3\\3&5.5&16\end{bmatrix}$$ Differences from medians $$\begin{bmatrix}Group1&Group2&Group3\\2.0&2.5&3.0\\1.0&1.5&1.0\\1.0&0.5&0\\0&0.5&0\\1.0&2.5&3.0\\2.0&5.5&5.0\\3.0&&6.0\end{bmatrix}$$
Now run a regular ANOVA test on the differences.
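The worked example can be checked two ways: run `scipy.stats.levene` with `center="median"` directly, and run an ordinary one-way ANOVA on the absolute differences from each group’s median; the test statistics should match.

```python
from scipy import stats

# The three groups from the example above
g1 = [1, 2, 2, 3, 4, 5, 6]
g2 = [3, 4, 5, 6, 8, 11]
g3 = [13, 15, 16, 16, 19, 21, 22]

# Brown-Forsythe: Levene's test centered at the median
w, p = stats.levene(g1, g2, g3, center="median")

def abs_diffs(g):
    """Absolute differences of each observation from its group's median."""
    s = sorted(g)
    m = len(s)
    median = s[m // 2] if m % 2 else (s[m // 2 - 1] + s[m // 2]) / 2
    return [abs(x - median) for x in g]

# Equivalent: regular ANOVA on the transformed values
f, p2 = stats.f_oneway(*(abs_diffs(g) for g in (g1, g2, g3)))
assert abs(w - f) < 1e-8 and abs(p - p2) < 1e-8
print(round(w, 4), round(p, 4))
```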
Kruskal-Wallis Test (go to the calculator)
The Kruskal-Wallis test is the non-parametric equivalent of the one-way ANOVA test.
It checks the null assumption that when selecting one value from each of the k groups, every group has an equal probability of containing the highest value.
When groups have a similar distribution shape, the null assumption extends to state that the medians are equal.
With two groups, the Kruskal-Wallis test is equivalent to the Mann-Whitney U test (same result as the Mann-Whitney U test calculator with Z approximation and no continuity correction). The test tries to determine whether the difference between the rank sums reflects a real difference between the groups, or is due to the random noise inside each group. When the Kruskal-Wallis test rejects the null assumption, it only tells us that not all groups have an equal probability of containing the highest value. For more information, the tool also runs multiple comparisons, which compare each pair separately.
You may choose the method to compare each pair: Dunn's or Mann Whitney U.
Assumptions
- Independence — Independent groups and independent observations.
- Variables — The group is a categorical variable; the dependent variable may be continuous or ordinal.
- Similar shape and scale — Relevant only when the null hypothesis assumes equal medians.
Calculation
Test statistic
$$H=\frac{12}{n(n+1)}\sum_{j=1}^{k}\frac{R_j^2}{n_j}-3(n+1)$$
Rj — rank sum of group j.
nj — sample size of group j.
n — total sample size across all groups (n = n1 + … + nk).
k — number of groups.
Example
Three groups of observations:
| Group 1 | Group 2 | Group 3 |
|---|---|---|
| 1 | 2 | 4 |
| 3 | 8 | 7 |
| 6 | 13 | 16 |
| 9 | 15 | 17 |
| 12 | 19 | |
| | 21 | |
Rank all values together across all groups:
| Group | Value | Rank |
|---|---|---|
| Group 1 | 1 | 1 |
| Group 2 | 2 | 2 |
| Group 1 | 3 | 3 |
| Group 3 | 4 | 4 |
| Group 1 | 6 | 5 |
| Group 3 | 7 | 6 |
| Group 2 | 8 | 7 |
| Group 1 | 9 | 8 |
| Group 1 | 12 | 9 |
| Group 2 | 13 | 10 |
| Group 2 | 15 | 11 |
| Group 3 | 16 | 12 |
| Group 3 | 17 | 13 |
| Group 2 | 19 | 14 |
| Group 2 | 21 | 15 |
R1 = 1+3+5+8+9 = 26, n1 = 5
R2 = 2+7+10+11+14+15 = 59, n2 = 6
R3 = 4+6+12+13 = 35, n3 = 4
n = 5 + 6 + 4 = 15
No tied ranks appear in this example, so no tie correction is needed. H = 12/(15 · 16) · (26²/5 + 59²/6 + 35²/4) − 3 · 16 = 3.0808.
Mean ranks: MeanRank1 = 26/5 = 5.2, MeanRank2 = 59/6 = 9.8333, MeanRank3 = 35/4 = 8.75.
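Plugging the rank sums from the example into the H formula, and checking against `scipy.stats.kruskal`:

```python
from scipy import stats

# The three groups from the example
g1 = [1, 3, 6, 9, 12]
g2 = [2, 8, 13, 15, 19, 21]
g3 = [4, 7, 16, 17]

# Rank sums and group sizes from the worked example
R = {1: 26, 2: 59, 3: 35}
sizes = {1: 5, 2: 6, 3: 4}
n = sum(sizes.values())

# H = 12 / (n(n+1)) * sum(Rj^2 / nj) - 3(n+1)
h = 12 / (n * (n + 1)) * sum(R[j] ** 2 / sizes[j] for j in R) - 3 * (n + 1)
print(round(h, 4))  # 3.0808

# scipy applies a tie correction; there are no ties here, so the values match
h_scipy, p = stats.kruskal(g1, g2, g3)
assert abs(h - h_scipy) < 1e-9
```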