A reminder regarding p-values and their use

The use, and usefulness, of p-values is controversial. One reason is that p-values are often misinterpreted. One such misinterpretation is that the p-value is a property of a particular hypothetical process. It is not: if the hypothetical process happens to be the true one, and its cumulative distribution function is continuous and invertible, the p-values are uniformly distributed.

What follows is a formal presentation of the above statement. Let $X$ be a random variable with distribution function $F_X(x) = P(X \le x)$, and let $x$ be a realization of $X$. We can think of $x$ as an example value of $X$. Consider for a moment the situation where we don't know what $F_X$ is, but we think it is $F_0$. One thing we can do with $x$ is to compute $p = 1 - F_0(x)$, which is the probability of observing $x$ or something more extreme under our assumption $F_0$. The standard use is to compare $p$ with a pre-defined threshold $\tau$, and to reject our hypothesis $F_0$ if $p < \tau$.
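The computation of $p = 1 - F_0(x)$ and the comparison with $\tau$ can be sketched in a few lines of Python. This is only an illustration under assumptions not made in the text: $F_0$ is taken to be a standard normal distribution, and the observation `x = 2.1` and threshold `tau = 0.05` are arbitrary example values. The helper name `p_value` is hypothetical.

```python
from statistics import NormalDist


def p_value(x, hypothesized):
    """Upper-tail p-value p = 1 - F_0(x) for a hypothesized distribution F_0."""
    return 1.0 - hypothesized.cdf(x)


# Assumption for illustration: F_0 is the standard normal distribution.
F0 = NormalDist(mu=0.0, sigma=1.0)

tau = 0.05  # pre-defined threshold (example value)
x = 2.1     # one observed realization of X (example value)

p = p_value(x, F0)
reject = p < tau  # reject the hypothesis F_0 when p falls below tau

print(f"p = {p:.4f}, reject F_0: {reject}")
```

Note that `p` depends on both the observation `x` and the assumed `F0`; changing either one changes the result.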

The p-value can be seen as a realization of the random variable $Y = 1 - F_0(X)$. In the case where $F_0 = F_X$, and $F_X$ is continuous and invertible, we get for $0 \le y \le 1$ that:

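$$F_Y(y) = P(Y \le y) = P(1 - F_X(X) \le y) = P(X > F_X^{-1}(1 - y)) = 1 - P(X \le F_X^{-1}(1 - y)) = 1 - F_X(F_X^{-1}(1 - y)) = 1 - (1 - y) = y$$

The strict inequality in the third step is harmless, because a continuous $F_X$ assigns probability zero to any single point. The function $F_Y(y) = y$ is the distribution function of the uniform distribution on $[0, 1]$, hence the p-values in our case are uniformly distributed.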
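The result can also be checked numerically. The following is a minimal simulation sketch, not a proof: it assumes $F_X$ is a standard normal distribution and sets $F_0 = F_X$, draws many realizations of $X$, and compares the empirical distribution function of the resulting p-values with $F_Y(y) = y$ at a few points. The function names, sample size, seed, and grid are illustrative choices.

```python
from statistics import NormalDist


def simulate_p_values(true_dist, hypothesized, n, seed=0):
    """Draw n realizations of X from true_dist and return p = 1 - F_0(x) for each."""
    xs = true_dist.samples(n, seed=seed)
    return [1.0 - hypothesized.cdf(x) for x in xs]


def empirical_cdf(values, y):
    """Fraction of values that are <= y."""
    return sum(v <= y for v in values) / len(values)


# Assumption for illustration: the true F_X is standard normal and F_0 = F_X.
FX = NormalDist(mu=0.0, sigma=1.0)
p_values = simulate_p_values(FX, FX, n=20000, seed=1)

# Under uniformity, the empirical CDF at y should be close to y itself.
grid = [0.1, 0.25, 0.5, 0.75, 0.9]
max_gap = max(abs(empirical_cdf(p_values, y) - y) for y in grid)

for y in grid:
    print(f"y = {y:.2f}  empirical F_Y(y) = {empirical_cdf(p_values, y):.3f}")
```

With a sample of this size, the printed values should lie close to the corresponding `y`, which is what the derivation predicts.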