What is power?

Power is a measure of a test's ability to detect that the null hypothesis is false. Specifically, power is the probability that a test under the specified assumptions (sample size, difference, standard deviation, alpha level, and type of alternative hypothesis) will correctly reject the null hypothesis when the alternative hypothesis is true.

If a test has low power, you may fail to detect an effect and mistakenly conclude that none exists. If the power of your test is too high, very small and possibly uninteresting effects can become statistically significant.

No test is perfect; there is always the possibility that the results of a test will lead you to reject the null hypothesis (H0) when it is actually true (a type I error) or to fail to reject H0 when it is actually false (a type II error). This is because in order to estimate population means, you have to take random samples, and random samples are just that, random. Thus, it is always possible that your sample mean will end up being very different from the population mean.

For example, suppose that a certain normally distributed population has a mean (μ) of 10 and a standard deviation (σ) of 2. This means that about 95.45% of the values in this population are between 6 and 14. However, it is always possible that you could select 10 observations at random and end up with a sample mean of 4. From such a sample you would never guess that the true mean of the population is actually 10!
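To put a rough number on how unusual such a sample would be, here is a minimal sketch using only the Python standard library. It assumes the population is exactly normal with mean 10 and standard deviation 2, so that the mean of 10 independent observations is itself normal with a standard error of 2 divided by the square root of 10. The function and variable names are illustrative only.

```python
import math

def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution, via the complementary error function."""
    z = (x - mean) / sd
    # erfc keeps precision far out in the tail, where 1 + erf(...) would round to 0
    return 0.5 * math.erfc(-z / math.sqrt(2))

pop_mean, pop_sd, n = 10.0, 2.0, 10

# Fraction of individual values within two standard deviations (between 6 and 14)
within_2sd = normal_cdf(14, pop_mean, pop_sd) - normal_cdf(6, pop_mean, pop_sd)

# Standard error of the mean of n independent observations
std_err = pop_sd / math.sqrt(n)

# Probability that the sample mean comes out at 4 or less
p_mean_le_4 = normal_cdf(4, pop_mean, std_err)

print(f"within 2 sd: {within_2sd:.4f}")
print(f"P(sample mean <= 4): {p_mean_le_4:.2e}")
```

Under these assumptions the second probability is on the order of 10^-21: not impossible, but far too rare to expect in practice.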

Of course, the odds of getting such a sample are incredibly small, but it is nevertheless possible. It is an unfortunate fact of life that sampling error can sometimes lead you to the wrong conclusion. While you can't know when this will occur, you can estimate how often it will occur. That's where power comes in.

For example, suppose you are conducting a one-sample t-test to see if the mean volume of product dispensed into shampoo bottles in your factory is different from the target volume of 8 oz. You decide to sample 10 randomly selected bottles and test at an alpha level of 0.05. If μ is actually 7.5 oz (the bottles are being underfilled by 0.5 oz) and σ is actually 0.43 oz, then the test has a power of 0.9039.
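One way to see roughly where a figure like this comes from is sketched below. It is an approximation under stated assumptions, not the method any particular package uses: it assumes a two-sided test at an alpha level of 0.05, hard-codes the matching critical value for 9 degrees of freedom (about 2.262, taken from a standard t table), and replaces the exact noncentral t distribution with a common normal approximation. All names are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def noncentral_t_cdf_approx(t, df, ncp):
    """Normal approximation to P(T <= t) for a noncentral t variable."""
    z = (t * (1 - 1 / (4 * df)) - ncp) / math.sqrt(1 + t * t / (2 * df))
    return normal_cdf(z)

n = 10
target, true_mean, sd = 8.0, 7.5, 0.43
df = n - 1
t_crit = 2.262  # assumed: two-sided alpha = 0.05, 9 degrees of freedom (t table)

# Noncentrality parameter: the true shift measured in standard errors
ncp = (true_mean - target) / (sd / math.sqrt(n))

# Power = P(T < -t_crit) + P(T > t_crit) when the true mean is 7.5
approx_power = (noncentral_t_cdf_approx(-t_crit, df, ncp)
                + 1 - noncentral_t_cdf_approx(t_crit, df, ncp))

print(f"approximate power: {approx_power:.3f}")
```

With these inputs the approximation lands close to the 0.9039 quoted above. Statistical packages typically work from the exact noncentral t distribution instead, so small differences in the last digits are to be expected.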

A power value of 0.9039 means that if you go out and repeat the same experiment many times (taking a new random sample each time), about 90.39% of the time you will end up correctly rejecting the null hypothesis. The other 9.61% of the time, sampling error will cause you to fail to reject H0, even though it is really false. Of course, you are not likely to go out and repeat the test more than once, but it is good to know that the odds of getting a misleading sample are fairly small.
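The "repeat the experiment many times" reading of power can be tried out directly by simulation. The sketch below draws many samples of 10 bottles from a normal population with mean 7.5 oz and standard deviation 0.43 oz, runs a two-sided one-sample t-test against 8 oz on each, and counts how often H0 is rejected. The alpha level of 0.05 (critical value about 2.262 for 9 degrees of freedom), the number of simulated experiments, the seed, and the function name are all assumptions made for illustration.

```python
import math
import random
import statistics

def simulate_power(n, true_mean, sd, target, t_crit, n_sims=20000, seed=1):
    """Estimate power as the share of simulated experiments that reject H0."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(n_sims):
        # One simulated experiment: n bottles drawn from the true population
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        std_err = statistics.stdev(sample) / math.sqrt(n)
        t_stat = (statistics.mean(sample) - target) / std_err
        if abs(t_stat) > t_crit:  # two-sided rejection rule
            rejections += 1
    return rejections / n_sims

estimated_power = simulate_power(n=10, true_mean=7.5, sd=0.43,
                                 target=8.0, t_crit=2.262)
print(f"estimated power: {estimated_power:.3f}")
```

The estimate should come out near 0.90, with some simulation noise that shrinks as n_sims grows. The remaining share of runs are the type II errors described above.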