A reminder regarding p-values and their use

The use, and usefulness, of \(p\)-values is controversial. One reason is that \(p\)-values are often misinterpreted. One such misinterpretation is that the \(p\)-value is a property of a particular hypothetical process. It is not: if the hypothetical process happens to be the true one, and its cumulative distribution function is continuous and invertible, then the \(p\)-values are uniformly distributed on \([0,1]\).

What follows is a formal presentation of the above statement. Let \(X\) be a random variable with distribution function \(F_X(x) = P(X \leq x)\), and let \(x\) be a realization of \(X\). We can think of \(x\) as an example value of \(X\). Consider for a moment the situation where we don’t know what \(F_X\) is, but we think it is \(F_0\). One thing we can do with \(x\) is to compute \(p = 1 - F_0(x)\), which is the probability of observing \(x\) or something more extreme (here: larger) under our assumption \(F_0\). The standard use is to compare \(p\) with a pre-defined threshold \(\tau\), and to reject our hypothesis \(F_0\) if \(p < \tau\).
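As a small illustration of the computation above, the following sketch evaluates \(p = 1 - F_0(x)\) and applies the threshold rule. The choice of a standard normal for \(F_0\), the observed value `x = 1.8`, the threshold `tau = 0.05`, and the helper name `upper_tail_p_value` are all assumptions made for the example; none of them come from the text.

```python
from statistics import NormalDist


def upper_tail_p_value(x, f0_cdf):
    """Return p = 1 - F_0(x) for an observed value x and a hypothesized CDF F_0."""
    return 1.0 - f0_cdf(x)


# Assumption for illustration: the hypothesized distribution F_0 is standard normal.
F0 = NormalDist(mu=0.0, sigma=1.0)

x = 1.8      # hypothetical observed realization of X
tau = 0.05   # hypothetical pre-defined threshold

p = upper_tail_p_value(x, F0.cdf)
reject = p < tau

print(f"p = {p:.4f}, reject F_0 at tau = {tau}: {reject}")
```

Note that `p` depends on both the hypothesized \(F_0\) and the particular realization `x`: a different draw of \(X\) would give a different \(p\).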

The \(p\)-value can be seen as a realization of the random variable \(Y = 1 - F_0(X)\). In the case where \(F_0 = F_X\), and \(F_X\) is continuous and invertible, we get that:

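\[ \begin{split} F_Y(y) &= P(Y \leq y) = P(1 - F_X(X) \leq y) = P(F_X(X) \geq 1 - y) \\ &= P(X > F^{-1}_X(1 - y)) = 1 - P(X \leq F^{-1}_X(1 - y))\\ &= 1 - F_X(F^{-1}_X(1 - y)) = 1 - (1 - y)\\ &= y \end{split} \]

for \(0 \leq y \leq 1\). Replacing \(\geq\) with \(>\) in the second line is harmless because \(F_X\) is continuous, so \(P(X = c) = 0\) for every \(c\). Since \(F_Y(y) = y\) on \([0,1]\) is the distribution function of the uniform distribution on \([0,1]\), the \(p\)-values in our case are uniformly distributed.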
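The result can also be checked empirically. The sketch below assumes, purely for illustration, that \(F_0 = F_X\) is the standard normal distribution. It draws many realizations of \(X\), computes \(p = 1 - F_0(x)\) for each, and looks at how the resulting \(p\)-values are spread over \([0,1]\). The function names, sample size, seed, and threshold are hypothetical choices, not part of the derivation.

```python
import random
from statistics import NormalDist


def simulate_p_values(true_dist, f0, n, seed=0):
    """Draw n realizations of X from true_dist and return p = 1 - F_0(x) for each."""
    rng = random.Random(seed)
    return [1.0 - f0.cdf(rng.gauss(true_dist.mean, true_dist.stdev)) for _ in range(n)]


def decile_fractions(p_values):
    """Fraction of p-values falling in each of the ten bins [0, 0.1), ..., [0.9, 1]."""
    counts = [0] * 10
    for p in p_values:
        counts[min(int(p * 10), 9)] += 1
    return [c / len(p_values) for c in counts]


def rejection_rate(p_values, tau):
    """Fraction of p-values strictly below the threshold tau."""
    return sum(p < tau for p in p_values) / len(p_values)


# Assumption for illustration: the hypothesized F_0 equals the true F_X (standard normal).
F0 = NormalDist(0.0, 1.0)
p_values = simulate_p_values(true_dist=F0, f0=F0, n=20_000, seed=1)

fractions = decile_fractions(p_values)
rate = rejection_rate(p_values, tau=0.05)

print("fraction per decile:", [round(f, 3) for f in fractions])
print("rejection rate at tau = 0.05:", round(rate, 3))
```

If the \(p\)-values are uniform, each decile should hold roughly a tenth of them, and the fraction below \(\tau\) should be close to \(\tau\) itself. That is, when \(F_0\) is true, it is still rejected with probability about \(\tau\).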