An illustration of Power and Type I Error calculations
Overview
This notebook demonstrates how to evaluate the type I error and power for a statistical test using simulation studies.
Example problem
Suppose we have measurements of a trait from two groups of samples. A simple two-sample t-test can be used to evaluate whether or not the two groups differ on this trait. The question is, how well does this t-test perform for our problem? In particular we are concerned with:
- What’s the chance that the t-test detects a difference, if there indeed is a difference between these two groups? This is the so-called “power”. Its complement, the chance of failing to detect a real difference, is the “type II error”.
- What’s the chance that the t-test claims to detect a difference between the groups when in fact there is no difference? This is the so-called “type I error”.
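In standard notation, the two quantities can be written as follows. The symbols $\alpha$ (the significance threshold applied to the p-value $p$) and $\beta$ (the type II error rate) are introduced here for illustration and do not appear elsewhere in this notebook.

$$
\text{type I error} = P(\text{reject } H_0 \mid H_0 \text{ true}) = P(p < \alpha \mid H_0),
\qquad
\text{power} = P(\text{reject } H_0 \mid H_1 \text{ true}) = 1 - \beta
$$

For a well-calibrated test the type I error should be close to the chosen threshold $\alpha$, while power depends on the size of the true difference and on the sample size.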
Simulation study design
A typical simulation study usually has at least 4 steps:
- Simulate: randomly generate two data-sets $X$ and $Y$, either from the same distribution ($H_0$) or from different distributions where $E[X] \neq E[Y]$ ($H_1$)
- Analyze: perform a t-test for difference in mean
- Evaluate: compare p-value with given threshold and make a decision of statistical significance – use 1 to indicate rejection of $H_0$ and 0 to indicate failure to reject $H_0$.
- Aggregate / summarize: repeat the three steps above many times. Compute type I error as the proportion of 1’s among all replicates under the $H_0$ simulations, and compute power the same way under the $H_1$ simulations.
An illustration in R
First, we define two functions: one is a statistical method to perform the analysis (a t-test in this case), and the other is a general framework to evaluate the performance of an input statistical method .test under given scenarios generated by an input .simulate function.
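# This function performs a regular (unpaired) two-sample t-test, two-sided
t_test <- function(x, y) {
  # The two samples are generated independently, so the test must not be paired;
  # t.test() defaults to an unpaired Welch test
  res <- t.test(x, y)
  # statistic -> N(0,1) in distribution, so statistic^2 is 1-df chi-squared
  return(list(stat = res$statistic^2, p = res$p.value))
}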
# Evaluate type I error or statistical power
run_simulation <- function(.simulate, param_x, param_y, .test, nrep, cutoff, statistic = 'p_value') {
  sig <- 0
  for (i in 1:nrep) {
    x <- do.call(.simulate, param_x)
    y <- do.call(.simulate, param_y)
    res <- .test(x, y)
    condition <- ifelse(statistic == 'p_value', res$p < cutoff, res$stat > cutoff)
    if (condition) sig <- sig + 1
  }
  res <- sig/nrep
  return(res)
}
Then we call these functions under different scenarios:
set.seed(999)
nsample <- 100
nrep <- 1000
print('t-test type I error via evaluating p-values:')
run_simulation(runif, list(n=nsample, min=0, max=1), list(n=nsample, min=0, max=1), t_test, nrep, 0.05, statistic = 'p_value')
print('t-test type I error via evaluating chi-square statistics:')
run_simulation(runif, list(n=nsample, min=0, max=1), list(n=nsample, min=0, max=1), t_test, nrep, qchisq(1-0.05, df = 1), statistic = 'stat')
print('t-test power via evaluating p-values:')
run_simulation(runif, list(n=nsample, min=0, max=1), list(n=nsample, min=0.1, max=1.1), t_test, nrep, 0.05, statistic = 'p_value')
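Each rate reported above is a proportion of `nrep` independent 0/1 decisions, so it carries Monte Carlo error of its own. The following is a minimal sketch of how that error could be quantified, assuming a normal approximation to the binomial. `mc_summary` is a hypothetical helper, not part of the original notebook, and the rate 0.05 passed to it is an illustrative input rather than a simulation result.

```r
# Hypothetical helper: Monte Carlo standard error and approximate 95% interval
# for a rejection rate estimated from `nrep` independent replicates.
mc_summary <- function(rate, nrep) {
  se <- sqrt(rate * (1 - rate) / nrep)  # binomial standard error of a proportion
  half <- qnorm(0.975) * se             # normal-approximation half-width
  list(rate = rate,
       se = se,
       lower = max(0, rate - half),
       upper = min(1, rate + half))
}

# Illustrative input: a rate of 0.05 estimated from 1000 replicates
mc_summary(0.05, 1000)
```

With 1000 replicates, a true rate near 0.05 has a standard error of roughly 0.007, so estimates between about 0.036 and 0.064 are consistent with a nominal 5% level. Increasing `nrep` narrows this range.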
Analytical power calculation
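In some cases, it is possible to calculate power analytically for simple statistical methods. For example, for a two-sided one-sample z-test of a mean with known standard deviation s, sample size n, significance level lev, and a true mean that differs from the null mean a by delta: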
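set.seed(999)
# True shift of the mean away from the null value
delta = 0.1
# Setting delta = 0 instead gives the type I error (power equals lev)
## delta = 0
a = 0        # mean under the null hypothesis
s = 1        # known standard deviation
n = 100      # sample size
lev = 0.05   # significance level
# Half-width of the acceptance region for the sample mean under the null
error = qnorm(1-lev/2)*s/sqrt(n)
left = a - error
right = a + error
# True mean under the alternative
assumed = a + delta
# Standardize the acceptance-region bounds under the alternative
Zleft = (left-assumed)/(s/sqrt(n))
Zright = (right-assumed)/(s/sqrt(n))
# Power = probability that the sample mean falls outside the acceptance region
power = 1 - (pnorm(Zright)-pnorm(Zleft))
print(power)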
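The analytical value can be cross-checked by simulation. The following is a minimal sketch, assuming normally distributed data with known standard deviation and a two-sided one-sample z-test of the mean. `z_power` and `z_power_sim` are hypothetical names introduced here, not part of the original notebook.

```r
# Analytical power of a two-sided one-sample z-test (same formula as above)
z_power <- function(delta, s, n, lev = 0.05, a = 0) {
  se <- s / sqrt(n)
  error <- qnorm(1 - lev / 2) * se
  z_left <- ((a - error) - (a + delta)) / se
  z_right <- ((a + error) - (a + delta)) / se
  1 - (pnorm(z_right) - pnorm(z_left))
}

# Simulated power: proportion of replicates in which |z| exceeds the critical value
z_power_sim <- function(delta, s, n, lev = 0.05, a = 0, nrep = 5000) {
  crit <- qnorm(1 - lev / 2)
  reject <- replicate(nrep, {
    x <- rnorm(n, mean = a + delta, sd = s)      # data under the alternative
    abs((mean(x) - a) / (s / sqrt(n))) > crit    # TRUE when H0 is rejected
  })
  mean(reject)
}

set.seed(999)
analytical <- z_power(delta = 0.1, s = 1, n = 100)
simulated <- z_power_sim(delta = 0.1, s = 1, n = 100)
print(c(analytical = analytical, simulated = simulated))
```

The two values should agree up to Monte Carlo error. Calling `z_power` with `delta = 0` returns `lev`, which is the type I error of the test.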
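The pwr::pwr.t.test function in R implements a related calculation for t-tests, using the noncentral t distribution in place of the normal distribution. Printing the function shows its source: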
pwr::pwr.t.test
function (n = NULL, d = NULL, sig.level = 0.05, power = NULL,
type = c("two.sample", "one.sample", "paired"), alternative = c("two.sided",
"less", "greater"))
{
if (sum(sapply(list(n, d, power, sig.level), is.null)) !=
1)
stop("exactly one of n, d, power, and sig.level must be NULL")
if (!is.null(d) && is.character(d))
d <- cohen.ES(test = "t", size = d)$effect.size
if (!is.null(sig.level) && !is.numeric(sig.level) || any(0 >
sig.level | sig.level > 1))
stop(sQuote("sig.level"), " must be numeric in [0, 1]")
if (!is.null(power) && !is.numeric(power) || any(0 > power |
power > 1))
stop(sQuote("power"), " must be numeric in [0, 1]")
type <- match.arg(type)
alternative <- match.arg(alternative)
tsample <- switch(type, one.sample = 1, two.sample = 2, paired = 1)
ttside <- switch(alternative, less = 1, two.sided = 2, greater = 3)
tside <- switch(alternative, less = 1, two.sided = 2, greater = 1)
if (tside == 2 && !is.null(d))
d <- abs(d)
if (ttside == 1) {
p.body <- quote({
nu <- (n - 1) * tsample
pt(qt(sig.level/tside, nu, lower = TRUE), nu, ncp = sqrt(n/tsample) *
d, lower = TRUE)
})
}
if (ttside == 2) {
p.body <- quote({
nu <- (n - 1) * tsample
qu <- qt(sig.level/tside, nu, lower = FALSE)
pt(qu, nu, ncp = sqrt(n/tsample) * d, lower = FALSE) +
pt(-qu, nu, ncp = sqrt(n/tsample) * d, lower = TRUE)
})
}
if (ttside == 3) {
p.body <- quote({
nu <- (n - 1) * tsample
pt(qt(sig.level/tside, nu, lower = FALSE), nu, ncp = sqrt(n/tsample) *
d, lower = FALSE)
})
}
if (is.null(power))
power <- eval(p.body)
else if (is.null(n))
n <- uniroot(function(n) eval(p.body) - power, c(2 +
1e-10, 1e+09))$root
else if (is.null(d)) {
if (ttside == 2)
d <- uniroot(function(d) eval(p.body) - power, c(1e-07,
10))$root
if (ttside == 1)
d <- uniroot(function(d) eval(p.body) - power, c(-10,
5))$root
if (ttside == 3)
d <- uniroot(function(d) eval(p.body) - power, c(-5,
10))$root
}
else if (is.null(sig.level))
sig.level <- uniroot(function(sig.level) eval(p.body) -
power, c(1e-10, 1 - 1e-10))$root
else stop("internal error")
NOTE <- switch(type, paired = "n is number of *pairs*", two.sample = "n is number in *each* group",
NULL)
METHOD <- paste(switch(type, one.sample = "One-sample", two.sample = "Two-sample",
paired = "Paired"), "t test power calculation")
structure(list(n = n, d = d, sig.level = sig.level, power = power,
alternative = alternative, note = NOTE, method = METHOD),
class = "power.htest")
}
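To connect this source to the simulation earlier in the notebook, the two-sided, two-sample branch can be reproduced with base R alone. This is a sketch that copies only that branch (`ttside == 2` with `tsample = 2`); `two_sample_power` is a hypothetical name, and the effect size assumes the uniform scenario used in the simulation, where the mean shift is 0.1 and each group has standard deviation $\sqrt{1/12}$.

```r
# Two-sided, two-sample t-test power via the noncentral t distribution,
# mirroring the ttside == 2 branch of pwr::pwr.t.test (n = size of each group).
two_sample_power <- function(n, d, sig.level = 0.05) {
  nu <- (n - 1) * 2                                # degrees of freedom
  qu <- qt(sig.level / 2, nu, lower.tail = FALSE)  # two-sided critical value
  ncp <- sqrt(n / 2) * abs(d)                      # noncentrality parameter
  pt(qu, nu, ncp = ncp, lower.tail = FALSE) +
    pt(-qu, nu, ncp = ncp, lower.tail = TRUE)
}

# Standardized effect size for Uniform(0, 1) versus Uniform(0.1, 1.1)
d <- 0.1 / sqrt(1 / 12)
power_uniform <- two_sample_power(n = 100, d = d)
print(power_uniform)
```

If the `pwr` package is installed, `pwr::pwr.t.test(n = 100, d = d)` should report the same power. Because the simulated data are uniform rather than normal, this value is an approximation, and it can be compared with the power estimated by `run_simulation`.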