The Infamous p-value
Students and professionals everywhere recite p-values for their data sets, with varying accuracy.
What is it, and what is it not?
First things first, it IS arbitrary, but it IS useful. It’s most useful in the hands of a capable scientist, who can put it in the context of their data and the overall picture. It can be used and abused, but it can also be helpful in following a line of evidence, or an experimental thread.
Most importantly, YOU should understand it, so YOU can make an informed decision as to whether you’re being sold false goods, cherry picked results, or “meh” information.
Let’s skip the long-winded narrative and get right to it. Per Wiley’s 2013 textbook on Medical Statistics (4th Ed.), a p-value can be interpreted as: “the probability of obtaining the observed difference, or one more extreme, if the null hypothesis is true.”
Not too scary right? But, there are layers here.
Probability of the observed difference
Or one more extreme
If the null hypothesis is true
The observed difference refers to the statistical test conducted prior to obtaining the p-value. In hypothesis testing, the null hypothesis is one of two competing hypotheses we use to frame our results. Its counterpart is the alternative hypothesis. Basically, you can frame the pair as an if/else statement of models. If you’re testing a new drug that is supposed to increase an enzyme’s reaction rate, you could set your null hypothesis to be that the drug has no effect. Your alternative hypothesis is that it DOES have an effect.
So, then we ask: what is the probability of the observed difference (do enzyme rates increase?), IF the null hypothesis is true (there is no difference in enzyme rates between the group treated with the experimental drug and the untreated group)?
So,
Ho = No Difference in enzyme rates due to drug treatment
Ha = Significant difference in enzyme rates
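To make that concrete, here is a minimal sketch in Python of how those two hypotheses get turned into a p-value. The enzyme rates are made-up numbers purely for illustration, and I’m using an ordinary two-sample t-test from scipy as a stand-in for whatever test actually fits your design.

import numpy as np
from scipy import stats

# Hypothetical enzyme rates (numbers invented purely for illustration)
control = np.array([4.8, 5.1, 5.0, 4.9, 5.2, 4.7, 5.0, 5.1])
treated = np.array([5.4, 5.6, 5.2, 5.8, 5.5, 5.3, 5.7, 5.6])

# Ho: no difference in mean enzyme rate between the groups
# Ha: the drug changes the mean enzyme rate
t_stat, p_value = stats.ttest_ind(treated, control)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# The p-value is the probability of seeing a difference at least this
# large by chance alone, assuming Ho is true.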
An oversimplified model could be…
def pharmaceuticalCompany(p_value, alpha=0.05):
    # analyze the research: can we reject Ho at the chosen threshold?
    if p_value >= alpha:
        # fail to reject Ho: no detectable drug effect
        return "spendMoreMoney"
    else:
        # reject Ho in favor of Ha: the drug appears to work
        return "printMoney"
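(If you fed it the p-value from the t-test sketch above, the call is just pharmaceuticalCompany(p_value), and the 0.05 threshold is only a default you could argue about.)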
So, cheeky pseudocode aside: the probability of obtaining the observed difference, if Ho is true, should be a tiny number before we reject the null hypothesis with any sense of confidence.
This brings us to the common (and commonly criticized) threshold of 0.05. I should also note that 0.05 corresponds to a 5% probability of seeing a difference at least this large by chance alone, if the null hypothesis is true. That’s still a real chance, and we should strive to replicate our experiments as often as possible to lower the odds that we’re looking at a fluke.
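If you want to see that 5% in action, here is a quick simulation sketch, again in Python with numpy and scipy, and again with a setup that is entirely made up. Every “experiment” draws both groups from the same distribution, so the null hypothesis is true by construction, yet roughly one run in twenty still clears the 0.05 bar.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments = 10_000
false_positives = 0

for _ in range(n_experiments):
    # Both groups come from the SAME distribution, so Ho is true by construction
    a = rng.normal(loc=5.0, scale=0.5, size=10)
    b = rng.normal(loc=5.0, scale=0.5, size=10)
    if stats.ttest_ind(a, b).pvalue < 0.05:
        false_positives += 1

print(f"'Significant' results under a true null: {false_positives / n_experiments:.1%}")
# Expect something close to 5% -- chance alone gets you there that often.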
In a perfect world, anything less than 0.05 would let us reject the null hypothesis, and anything greater would lead us to fail to reject the null and, in turn, to set aside our experimental hypothesis. Unfortunately it is not a perfect world, and the cutoff is as arbitrary as it sounds; there is no magic number.
P-values matter, but they are one cog in a much larger machine. How you arrived at that p-value is the most important part of the equation.
Solid science produces solid results, but solid science is hard.
So, the next time someone shows you a graph and a p-value, put on your science hat, and ask more questions.