ANOVA and Multiple Comparisons

One-Way ANOVA Test (go to the calculator)

The one-way ANOVA test checks the null assumption that the means (averages) of two or more groups are equal. The test tries to determine whether the difference between the sample averages reflects a real difference between the groups or is due to the random noise inside each group.

When the ANOVA test rejects the null assumption it only tells us that not all the means are equal. For more information, the tool also runs the Tukey HSD test, which compares each pair separately.

The one-way ANOVA model is identical to the linear regression model with one categorical variable — the group. Running the same data through linear regression produces the same ANOVA table and the same p-value.

Assumptions

  • Independence — Independent groups and independent observations that represent the population.
  • Normal distribution — The population distributes normally. This assumption is important for small sample sizes (n < 30).
    The ANOVA calculator runs the Shapiro-Wilk test as part of the test run.
  • Equality of variances — The variances of all groups are equal. The ANOVA test is considered robust to this assumption when group sizes are similar (maximum sample size / minimum sample size < 1.5).
    The ANOVA calculator runs Levene’s test as part of the test run.

Calculation

The model analyzes the differences between all observations and the overall average, and tries to determine whether the differences are only random or are also partially explained by the group (similar to linear regression).
As in standard deviation calculation, we use the sum of squares instead of the absolute difference.

SST — the sum of squares of the total differences.
SSG / SSB — the sum of squares of the differences caused by the group. The calculation is similar to SST but uses only the difference between the group’s average and the overall average.
SSE / SSW — the sum of squares of the differences within the groups. Similar to SST but uses only the differences between observations and their group averages.


ANOVA Table


Groups (between groups)
  Degrees of freedom: k − 1
  Sum of squares: $$SSG=\sum_{i=1}^k\sum_{j=1}^{n_i} (\bar{x}_{i}-\bar{x})^2 = \sum_{i=1}^k n_i(\bar{x}_i-\bar{x})^2$$
  Mean square: $$MSG = \frac{SSG}{k - 1}$$
  F statistic: $$F = \frac{MSG}{MSE}$$
  P-value: P(x > F)

Error (within groups)
  Degrees of freedom: n − k
  Sum of squares: $$SSE=\sum_{i=1}^k\sum_{j=1}^{n_i} (x_{ij}-\bar{x}_i)^2 = \sum_{i=1}^k (n_i-1)S_i^2$$
  Mean square: $$MSE = \frac{SSE}{n - k}$$

Total
  Degrees of freedom: n − 1
  Sum of squares: $$SST = \sum_{i=1}^k\sum_{j=1}^{n_i} (x_{ij}-\bar{x})^2 = SSG + SSE$$
  Mean square: $$\text{Sample Variance} = \frac{SST}{n - 1}$$
k — Number of groups.
ni — Sample size of group i.
n — Overall sample size, including all groups (n = Σni, i = 1 to k).
x̄i — Average of group i.
x̄ — Overall average (Σxij / n, i = 1 to k, j = 1 to ni).
Si — Standard deviation of group i.
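As a sketch, the quantities in the table can be computed directly from the raw data. The two small groups below are made up for illustration:

```python
from statistics import mean

def anova_table(groups):
    """One-way ANOVA: return SSG, SSE, SST, MSG, MSE and the F statistic."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    overall = mean(x for g in groups for x in g)
    # SSG: between-groups sum of squares
    ssg = sum(len(g) * (mean(g) - overall) ** 2 for g in groups)
    # SSE: within-groups sum of squares
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    sst = ssg + sse                      # total sum of squares (SSG + SSE)
    msg = ssg / (k - 1)                  # mean square between groups
    mse = sse / (n - k)                  # mean square within groups
    return ssg, sse, sst, msg, mse, msg / mse

ssg, sse, sst, msg, mse, f = anova_table([[1, 2, 3], [2, 3, 4]])
# ssg = 1.5, sse = 4.0, sst = 5.5, F = 1.5
```

The p-value would then come from the F distribution with (k − 1, n − k) degrees of freedom, which is omitted here to keep the sketch dependency-free.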

Effect Size

Prior effect size
If you are not sure which effect size value and type to choose, select “Medium” effect size and the tool will choose the ‘f’ type with the relevant value.

There are several methods to calculate the effect size.

  • Eta-squared
    $$\eta^2=\frac{SSG}{SST} \qquad \eta^2=\frac{f^2}{1+f^2} \qquad f^2=\frac{\eta^2}{1-\eta^2}$$ This is the ratio of the explained sum of squares to the total sum of squares, equivalent to R² in linear regression.
  • Cohen’s f — Method 1 (used by this tool)
    $$f=\sqrt{\frac{SSG}{SSE}}$$ This is the ratio of the explained sum of squares to the unexplained sum of squares (random noise).
  • Cohen’s f — Method 2
    $$f=\sqrt{\frac{\sum_{i=1}^k(\bar{x}_{i}-\bar{x})^2}{k \cdot \sigma^2}}$$ where σ² is the common within-groups variance.

When using equal group sizes both methods give the same result since ni = n/k.

Method 1 $$f^2=\frac{\sum_{i=1}^k n_i(\bar{x}_i-\bar{x})^2}{\sum_{i=1}^k\sum_{j=1}^{n_i} (x_{ij}-\bar{x}_i)^2}$$ Method 2 $$\sigma^2=\frac{\sum_{i=1}^k\sum_{j=1}^{n_i} (x_{ij}-\bar{x}_i)^2}{n} \qquad f^2 = \frac{\sum_{i=1}^k(\bar{x}_{i}-\bar{x})^2 \cdot \frac{n}{k}}{\sum_{i=1}^k\sum_{j=1}^{n_i} (x_{ij}-\bar{x}_i)^2}$$
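The relations between η² and Cohen's f can be checked numerically. A sketch using illustrative sums of squares (SSG = 1.5 and SSE = 4.0 are made-up values, not taken from any example on this page):

```python
from math import sqrt

# Illustrative sums of squares from a one-way ANOVA
ssg, sse = 1.5, 4.0
sst = ssg + sse

eta_squared = ssg / sst          # proportion of variance explained, like R²
f_effect = sqrt(ssg / sse)       # Cohen's f, Method 1
# Consistency check: f² = η² / (1 − η²)
assert abs(f_effect ** 2 - eta_squared / (1 - eta_squared)) < 1e-12
```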

Multiple Comparisons

When running a single test, the significance level (α) is the maximum allowed Type I error. When running n multiple comparisons, each at significance level α, the probability that at least one test incorrectly rejects a true null hypothesis grows much larger than α: $$\alpha'=1-(1-\alpha)^n$$ For example, with 6 comparisons (n = 6) and α = 0.05:
α' = 1 − (1 − 0.05)⁶ = 0.265

To keep α' = 0.05 we need a much smaller significance level in each single test.
α — the allowed Type I error probability for a single test; this is the corrected α.
α' — the allowed Type I error probability for all tests as a package; this is the required overall α.
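A quick check of the example's arithmetic:

```python
alpha, n = 0.05, 6
# Family-wise Type I error probability over n independent tests at level alpha
alpha_prime = 1 - (1 - alpha) ** n
# alpha_prime ≈ 0.265: about a 26.5% chance of at least one false rejection
```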

Bonferroni Correction

Bonferroni suggested α = α'/n. Unlike the Sidák correction, it does not assume independent tests: it keeps the family-wise error rate at or below α' under any dependence structure, at the price of being slightly conservative.

Bonferroni Correction Calculator


Any change in any field recalculates the other fields. Changing n or Overall α' calculates the Corrected α; changing Corrected α calculates the Overall α'.

Sidák Correction

The Bonferroni correction is a conservative approximation. Under independent tests, the exact calculation is:

$$\alpha'=1-(1-\alpha)^n \quad \Rightarrow \quad \alpha=1-\sqrt[n]{1-\alpha'}$$
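Both corrections are one-liners. A sketch comparing them for n = 6 tests and α' = 0.05 (illustrative values):

```python
n, alpha_family = 6, 0.05
bonferroni = alpha_family / n                   # α'/n, valid under any dependence
sidak = 1 - (1 - alpha_family) ** (1 / n)       # exact under independence
# bonferroni ≈ 0.008333, sidak ≈ 0.008512: Sidák is slightly less strict
```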

Sidák Correction Calculator

For example, with n = 2 tests, using a corrected significance level of α = 0.025321 in each single test gives an overall significance level of α' = 0.05.
This is the probability of a Type I error in at least one test when all null assumptions are correct.


Holm Method

The Bonferroni/Sidák corrections are conservative: they protect against Type I errors at the expense of increasing Type II errors, and the Sidák correction additionally assumes independent tests, which is usually not the case. The Holm correction provides a better balance. Steps:

  1. Rank tests by p-value: R = 1 for the smallest p-value, R = n for the largest.
  2. $$\alpha_{(i)}=\frac{\alpha'}{n+1-R_{(i)}}$$
  3. Stop at the first non-significant test; that test and all tests with larger p-values are also non-significant (H0 is not rejected).
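The steps above can be sketched as follows; the first two p-values (0.011, 0.026) are the ones used in the example below, and the last two are hypothetical:

```python
def holm(p_values, alpha_family=0.05):
    """Holm step-down correction: return a reject/accept decision per p-value."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])   # rank by p-value
    decisions = [False] * n
    for rank, i in enumerate(order, start=1):
        threshold = alpha_family / (n + 1 - rank)         # step 2
        if p_values[i] > threshold:
            break          # step 3: this and all larger p-values are non-significant
        decisions[i] = True
    return decisions

decisions = holm([0.011, 0.026, 0.040, 0.200])
# → [True, False, False, False]: only the smallest p-value is significant
```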

Holm Method Calculator


Enter p-values separated by a comma, a space, or Enter.
Any change in Overall α' or the p-values recalculates the corrected α values and the H0 decisions.

Example explanation (four comparisons):
n = 4.
0.05 / (4 + 1 − 1) = 0.0125. Since 0.011 < 0.0125 this comparison is significant.
0.05 / (4 + 1 − 2) = 0.0167. Since 0.026 > 0.0167 this comparison is not significant.
The algorithm stops here; all remaining comparisons are not significant. (Note: even if a later comparison would have been significant, the algorithm does not continue.)

Tukey HSD Test / Tukey-Kramer Test (go to the calculator)

The Tukey HSD (Honestly Significant Difference) test is a multiple comparison test that compares the means of each pair of groups. It uses the Studentized range distribution rather than the regular t-test. It is only a two-tailed test, since the null assumption is equal means.

The Tukey HSD test assumes equal group sizes; the Tukey-Kramer extension handles unequal group sizes, so Tukey HSD is a special case of Tukey-Kramer.
The ANOVA calculator runs both the ANOVA test and the Tukey-Kramer test.

Assumptions

  • Independence — Independent groups and independent observations.
  • Normal distribution — The population distributes normally.
  • Equality of variances — The variances of all groups are equal.

Calculation

For each pair of groups i and j:

$$Difference = |\bar{x}_i-\bar{x}_j| \qquad SE=\sqrt{\frac{MSW}{2}\left(\frac{1}{n_i}+\frac{1}{n_j}\right)}$$ Test statistic $$Q=\frac{Difference}{SE}$$ MSW is the within-groups mean square (the MSE from the ANOVA table).

The p-value and the Q1−α percentile use the cumulative Studentized range distribution, with parameters Groups and DFE:

$$p\text{-}value=1-P(X \leq Q) \qquad P(X \leq Q_{1-\alpha})=1-\alpha$$ Confidence Interval $$CI = Difference \pm SE \cdot Q_{1-\alpha}$$

Any difference larger than the critical mean is significant:

$$Critical\;Mean=SE \cdot Q_{1-\alpha}$$
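The difference, SE, and Q statistic are straightforward to compute; the p-value itself requires the Studentized range CDF and is omitted from this sketch. The group means, sizes, and MSW below are made-up values:

```python
from math import sqrt

def tukey_q(mean_i, mean_j, n_i, n_j, msw):
    """Tukey-Kramer statistic for one pair of groups.
    The p-value needs the Studentized range distribution and is not computed here."""
    difference = abs(mean_i - mean_j)
    se = sqrt((msw / 2) * (1 / n_i + 1 / n_j))
    return difference, se, difference / se

# Illustrative pair: means 2 and 3, both groups of size 3, MSW = 1.0
diff, se, q = tukey_q(2.0, 3.0, 3, 3, 1.0)
# diff = 1.0, se = sqrt(1/3) ≈ 0.5774, q ≈ 1.7321
```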

Levene’s Test (go to the calculator)

Levene’s test checks the null assumption that the standard deviations of two or more groups are equal. It tries to determine whether the difference between the variances reflects a real group difference or is due to random noise.

The test runs the ANOVA model on the absolute differences from each group’s center, using either the mean or the median.

Assumptions

  • Independence — Independent groups and independent observations.
  • Normal distribution — The population distributes normally. Important for small sample sizes (n < 30).
    The ANOVA calculator runs the Shapiro-Wilk test as part of the run.

Calculation

The general recommendation is to use the mean for symmetrical distributions or n > 30, and the median for asymmetrical distributions. Since the median and mean are nearly identical in symmetrical distributions, the median is usually safe to use. Using the median is called the Brown-Forsythe test.

  • Using the mean: $$X'_{ij}=|X_{ij}-\bar{X}_i| \quad (\bar{X}_i \text{ is the mean of group } i)$$
  • Using the median: $$X'_{ij}=|X_{ij}-\tilde{X}_i| \quad (\tilde{X}_i \text{ is the median of group } i)$$

Example

Levene’s test example using medians.

Observations $$\begin{bmatrix}Group1&Group2&Group3\\1&3&13\\2&4&15\\2&\textcolor{#1a7a1a}{\textbf{5}}&16\\ \textcolor{#1a7a1a}{\textbf{3}}&\textcolor{#1a7a1a}{\textbf{6}}&\textcolor{#1a7a1a}{\textbf{16}}\\4&8&19\\5&11&21\\6&&22\end{bmatrix}$$ Medians $$\begin{bmatrix}Group1&Group2&Group3\\3&5.5&16\end{bmatrix}$$ Differences from medians $$\begin{bmatrix}Group1&Group2&Group3\\2.0&2.5&3.0\\1.0&1.5&1.0\\1.0&0.5&0\\0&0.5&0\\1.0&2.5&3.0\\2.0&5.5&5.0\\3.0&&6.0\end{bmatrix}$$

Now run a regular ANOVA test on the differences.
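The transformation step for this example can be sketched as:

```python
from statistics import median

groups = [
    [1, 2, 2, 3, 4, 5, 6],          # Group 1, median 3
    [3, 4, 5, 6, 8, 11],            # Group 2, median 5.5
    [13, 15, 16, 16, 19, 21, 22],   # Group 3, median 16
]

# Brown-Forsythe: absolute deviations from each group's median
deviations = [[abs(x - median(g)) for x in g] for g in groups]
# Group 2's deviations are [2.5, 1.5, 0.5, 0.5, 2.5, 5.5], as in the matrix above.
# A regular one-way ANOVA is then run on `deviations`.
```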

Kruskal-Wallis Test (go to the calculator)

The Kruskal-Wallis test is the non-parametric equivalent of the one-way ANOVA test.
It checks the null assumption that, when selecting one value from each of the k groups, each group has an equal probability of containing the highest value.
When groups have a similar distribution shape, the null assumption extends to state that the medians are equal.

With two groups, the Kruskal-Wallis test is equivalent to the Mann-Whitney U test (same result as the Mann-Whitney U test calculator with the Z approximation and no continuity correction). The test tries to determine whether the difference between the rank sums reflects a real difference between the groups or is due to the random noise inside each group. When the Kruskal-Wallis test rejects the null assumption, it only tells us that not all groups have an equal probability of containing the highest value. For more information, the tool also runs multiple comparisons, comparing each pair separately.
You may choose the method used to compare each pair: Dunn’s test or the Mann-Whitney U test.

Assumptions

  • Independence — Independent groups and independent observations.
  • Variables — The group is a categorical variable; the dependent variable may be continuous or ordinal.
  • Similar shape and scale — Relevant only when the null hypothesis assumes equal medians.

Calculation

Test statistic

$$H=\frac{12}{n(n+1)}\sum_{j=1}^{k}\frac{R_j^2}{n_j}-3(n+1)$$

Rj — rank sum of group j.
nj — sample size of group j.
n — total sample size across all groups (n = n1 + … + nk).
k — number of groups.

Example

Three groups of observations:

Group 1   Group 2   Group 3
1         2         4
3         8         7
6         13        16
9         15        17
12        19
          21

Rank all values together across all groups:

Group     Value   Rank
Group 1   1       1
Group 2   2       2
Group 1   3       3
Group 3   4       4
Group 1   6       5
Group 3   7       6
Group 2   8       7
Group 1   9       8
Group 1   12      9
Group 2   13      10
Group 2   15      11
Group 3   16      12
Group 3   17      13
Group 2   19      14
Group 2   21      15

R1 = 1+3+5+8+9 = 26,   n1 = 5
R2 = 2+7+10+11+14+15 = 59,   n2 = 6
R3 = 4+6+12+13 = 35,   n3 = 4
n = 5 + 6 + 4 = 15

$$H = \frac{12}{15 \cdot (15 + 1)}\left(\frac{26^2}{5}+\frac{59^2}{6}+\frac{35^2}{4}\right) - 3 \cdot (15 + 1) = 3.0808$$

No tied ranks in this example, so no correction is needed. H = 3.0808.
Mean ranks:   MeanRank1 = 26/5 = 5.2,   MeanRank2 = 59/6 = 9.8333,   MeanRank3 = 35/4 = 8.75.
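The whole example can be reproduced in a few lines (no tie correction is applied, since this data has no ties):

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic, without tie correction."""
    # Rank all observations across all groups (rank 1 = smallest)
    pooled = sorted(x for g in groups for x in g)
    rank = {x: i + 1 for i, x in enumerate(pooled)}   # assumes no tied values
    n = len(pooled)
    rank_term = sum(sum(rank[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)

h = kruskal_wallis_h([[1, 3, 6, 9, 12], [2, 8, 13, 15, 19, 21], [4, 7, 16, 17]])
# → 3.0808 (rounded to four decimals), matching the calculation above
```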