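The p-value is the probability, calculated under the assumption that H_{0} is correct, of getting a result at least as extreme as the one observed. In other words, it is the type I error probability, the probability of rejecting a correct H_{0}, that you accept if you reject H_{0} for results at least this extreme.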

When the p-value is small enough, that is, less than the significance level α, the risk of a type I error is low, and we reject H_{0}.
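The link between α and the type I error can be illustrated with a small simulation. This is only a sketch: the seed, the number of simulated experiments, and the sample size are arbitrary assumptions, not values from the examples in this article. Every simulated experiment is generated with H_{0} true, so each rejection is a type I error.

```r
# Simulate many experiments in which H0 is true (the true mean difference is 0)
# and count how often the p-value falls below alpha.
set.seed(1)      # arbitrary seed, only for reproducibility
alpha <- 0.05    # significance level
n_sim <- 5000    # number of simulated experiments (assumed value)
n     <- 30      # sample size per experiment (assumed value)

p_values <- replicate(n_sim, {
  d <- rnorm(n, mean = 0, sd = 1)   # differences drawn under H0
  t.test(d, mu = 0)$p.value         # one-sample t-test on the differences
})

# Share of experiments that (wrongly) reject H0
type1_rate <- mean(p_values < alpha)
type1_rate   # expected to be close to alpha
```

The share of rejections should land near α, which is what "the risk of a type I error" means in practice.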

In the following two-tailed example, the p-value is greater than α, hence we cannot reject the null hypothesis.

Every student learns how important it is to get a significant result, but the p-value is only one component in a package.

The test power and the effect size are no less important.

All the examples below are fake; the made-up data was created only for demonstration. The significance level used is 0.05.

Several researchers looked for a treatment for high blood pressure.

Research in New South Walls discovered that garlic helps to reduce blood pressure, with a p-value of **0.007**.

The test power was very strong, **0.996**, for detecting a medium effect size of 0.5.

But the standardized effect size is small (0.26), and the unstandardized effect size is almost meaningless: the blood pressure is reduced by only 0.109 mmHg.

**Data**

Result | Value |
---|---|
P-value | 0.00710160 |
Sample size | 114 |
Test Power | 0.9964 |
Standardized effect size | Small, 0.26 |
Difference | -0.109 |

x1 <- c(145.2,154.9,159.8,170.4,158,160.2,161.5,152.8,140.6,133.2,121.8,133.5,125.7,127.2,143.5,124.6,122.3,136.1,144.1,154.8,161.2,132.3,122.3,133,146.6,121.4,133.6,139.4,132.7,121.3,119.6,141.2, 155.7,166.5,132.1,144,152.2,155.8,144.6,154.9,159.8,170.3,158.1,160.3,161.4,152.8,140.4,133.1,121.4,134.3,126.1,127,144,125,121.6,135.4,144.1,155,160.5,131.8,122.4,132.7,147.1,121.9,133.5,138.6,132.5,121,120,140.9,155.6,167.3,131.9,144,152.1,155.5,145,154.6,159.5,169.8,157.8,160.1,162.3,152.9,140.8,132.7,122.2,134.1,125.9,126.8,144.3,124.9,122,135.8,144.2,154.5,160.5,131.9,121.5,132.5,146.5,122.1,134,138.6,133.2,120.8,120,141.1,155.4,166.9,132.2,143.6,151.5,155.8)

x2 <-c(144.5,155.5,160.4,170.4,158.1,160.3,162.2,153.1,141,132.8,121.8,134.2,125.7,127.3,144.3,125,122.4,135.6,144.1,154.5,160.5,131.7,121.7,132.9,146.6,121.5,134.4,138.8,133.3,121.1,119.6,141.2,155.8, 167.1,132.2,143.9,152.3,155.7,145.5,154.6,159.9,170.1,157.8,160.3,162,153.3,141.1,133.2,121.7,133.8,126,127,143.7,124.9,121.6,136.3,143.8,155,160.7,132.3,121.8,132.9,146.5,122.1,133.7,138.7,132.6,121.5,119.7,140.9,156.3,167.4,131.6,144.2,151.9,156.5,145,155.1,160,170.4,157.8,159.9,161.7,153.2,140.5,133.2,122,134.2,126.3,126.6,143.6,124.8,122.5,135.7,144,155.3,161.2,132.5,122,133.3,146.9,122.1,133.9,138.8,132.6,120.5,120.1,140.8,156.4,166.9,131.8,144.1,152.2,156.4)

t.test(x1, x2, alternative = "two.sided", paired = TRUE, mu = 0, conf.level = 0.99)

Research in New Victoria found that half an hour of daily exercise doesn't significantly reduce blood pressure, with a p-value of **0.07**.

But the test power was very weak, **0.034**, for detecting a medium effect size of 0.5.

The standardized effect size is large (1.1), and the unstandardized effect size is meaningful: the blood pressure is reduced by 2.1 mmHg.

**Data**

Result | Value |
---|---|
P-value | 0.071 |
Sample size | 5 |
Test Power | 0.034 |
Standardized effect size | Large, 1.1 |
Difference | -2.1 |

x1<-c(142.7,154,157.7,171.2,158.1)

x2<-c(144.6,155.6,162.3,170.6,161.1)

t.test(x1, x2, alternative = "two.sided", paired = TRUE, mu = 0, conf.level = 0.99)

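The effect sizes reported for the exercise example can be recomputed from the same made-up data. The sketch below assumes that the standardized effect size is the mean of the paired differences divided by their standard deviation; the article does not state which formula was used, but this one gives about 1.1 for these five pairs.

```r
# Made-up exercise data from the example above
x1 <- c(142.7, 154, 157.7, 171.2, 158.1)
x2 <- c(144.6, 155.6, 162.3, 170.6, 161.1)

d <- x1 - x2                           # paired differences
mean_diff  <- mean(d)                  # unstandardized effect size, in mmHg
std_effect <- abs(mean_diff) / sd(d)   # standardized effect size (assumed formula)
p_value    <- t.test(x1, x2, paired = TRUE)$p.value

round(c(mean_diff = mean_diff, std_effect = std_effect, p_value = p_value), 3)
```

The mean difference and the standardized effect size do not depend on the chosen significance level, which is one reason to report them next to the p-value.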

The seven dwarfs don't live together anymore; each now lives in a different country.

When they reached the age of 140, they started suffering from high blood pressure. Since their mother used to tell them that onion cures any disease, each of them decided to conduct his own research, without telling his brothers.

Each dwarf used a sample size of 114. The results follow:

Dwarf | P-value |
---|---|
Doc | 0.09 |
Grumpy | 0.049 |
Happy | 0.25 |
Sleepy | 0.17 |
Bashful | 0.31 |
Sneezy | 0.22 |
Dopey | 0.39 |

Six of the dwarfs were disappointed. They left their research in the drawer; nobody would be interested to know that onion doesn't help to reduce blood pressure.

Grumpy published an article in the journal "Nature Medicine".

Does onion help to reduce blood pressure? Not necessarily: when seven studies test the same hypothesis and only the significant one is published, this is a multiple comparisons problem.
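The multiple comparisons issue in the dwarfs' story can be sketched in R. The sketch assumes the seven studies are independent and uses the Bonferroni correction as one common adjustment; the article itself does not name a correction method.

```r
alpha <- 0.05
p <- c(Doc = 0.09, Grumpy = 0.049, Happy = 0.25, Sleepy = 0.17,
       Bashful = 0.31, Sneezy = 0.22, Dopey = 0.39)

# Probability that at least one of the 7 independent tests is significant
# when H0 is correct in all of them
fwer <- 1 - (1 - alpha)^length(p)

# Bonferroni adjustment: each p-value is multiplied by the number of tests
# and capped at 1
p_bonf <- p.adjust(p, method = "bonferroni")

round(fwer, 3)        # about 0.3
p_bonf
any(p_bonf < alpha)   # FALSE: no study stays significant after the adjustment
```

With seven tests, a single p-value just below 0.05 is not surprising even if onion does nothing, and Grumpy's adjusted p-value (0.049 × 7 = 0.343) is far from significant.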

The garlic treatment is significant, but the effect size is very small, so it isn't really a useful treatment.

The daily exercise treatment is not significant, but there are several reasons to prefer this treatment.

1. The test power is weak, hence the test may not have enough power to reject an incorrect null hypothesis. A non-significant result doesn't show that the treatment has no effect.

2. The p-value of 0.07 is larger than the significance level of 0.05, but 0.05 is not a "holy" number. You may choose a different significance level, like 0.1 or 0.01, if you decide **before** the experiment that this is the appropriate risk. Usually, you don't have the freedom to choose α when all the researchers in your field use the same value.

3. The observed effect size is large. If H_{0} is correct, there is only a small probability, 0.07, of seeing a result at least this extreme, and if the treatment does help, the observed difference suggests a meaningful reduction.

The correct solution is to repeat the daily exercise research with a larger sample size.
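One way to choose the larger sample size is a power calculation before the repeated study. The sketch below uses `power.t.test` from base R and assumes a medium standardized effect of 0.5, a significance level of 0.05, and a target power of 0.8; the target power is a conventional choice and does not come from this article.

```r
# Sample size for a paired, two-sided t-test.
# With sd = 1, delta is the standardized effect size.
res <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.8,
                    type = "paired", alternative = "two.sided")

n_needed <- ceiling(res$n)   # number of pairs, rounded up
n_needed
```

The result depends on the assumed effect size, significance level, and target power, so it should be read as a planning figure and not as a property of the made-up data.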

Result | Garlic | Exercise |
---|---|---|
P-value | 0.00710160 | 0.071 |
Sample size | 114 | 5 |
Test Power | 0.9964 | 0.034 |
Standardized effect size | Small, 0.26 | Large, 1.1 |
Difference | -0.109 | -2.1 |