An illustration of Power and Type I Error caculations

Overview

This notebook demonstrates how to evaluate the type I error and power for a statistical test using simulation studies.

Example problem

Suppose we have measurements on some traits from two groups of samples. A simple two-sample t-test can be used to evaluate whether or not there is a difference between these two groups on this trait. The question is, how well does this t-test perform for our problem? In particular we are concerned with:

  1. What’s the chance that the t-test fails to detect a difference, if there indeed is a difference between these two groups? This is the so-called “power”.
  2. What’s the chance that the t-test claims to detect a dfference between the groups when in fact there should not be a difference? This is the so-called “type I error”.

Simulation study design

A typical simulation study usually has at least 4 steps:

  1. Simulate: randomly generate two data-sets $X$ and $Y$ from the same distribution ($H_0$) and from different distributions where $E[X] \neq E[Y]$ ($H_1$)
  2. Analyze: perform a t-test for difference in mean
  3. Evaluate: compare p-value with given threshold and make a decision of statistical significance – use 1 to indicate rejection of $H_0$ and 0 to indicate failure to reject $H_0$.
  4. Aggregate / summarize: repeat 1 to 3 many times. Compute type I error by counting the proportion of 1’s among all replicates under the $H_0$ simulations, and similarly for power under the $H_1$ simulations.

An illustration in R

First, we define two functions: one is a statistical method to perform the analysis (a t-test in this case), and the other is a general framework to evaluate performance of an input statistical method .test under given scenarios generated by an input .simulate function.

# This function performs a regular two-sample t-test, two-sided
t_test <- function(x, y) {
    res <- t.test(x, y, paired=T)
    # statistic d-> N(0,1) so tatistic^2 is 1-df chi-squared
    return (list(stat=res$statistic^2, p=res$p.value))
}

# Evaluate type I error or statistical power
run_simulation <- function(.simulate, param_x, param_y, .test, nrep, cutoff, statistic = 'p_value') {
    sig <- 0
    for (i in 1:nrep) {
      x <- do.call(.simulate, param_x)
      y <- do.call(.simulate, param_y)
      res <- .test(x, y)
      condition <- ifelse(statistic == 'p_value', res$p < cutoff, res$stat > cutoff)
      if (condition) sig <- sig + 1
    }
    res <- sig/nrep
    return(res)
}

Then we call these functions under different scenarios,

set.seed(999)
nsample <- 100
nrep <- 1000

print('t-test type I error via evaluating p-values:')
run_simulation(runif, list(n=nsample, min=0, max=1), list(n=nsample, min=0, max=1), t_test, nrep, 0.05, statistic = 'p_value')
print('t-test type I error via evaluating chi-square statistics:')
run_simulation(runif, list(n=nsample, min=0, max=1), list(n=nsample, min=0, max=1), t_test, nrep, qchisq(1-0.05, df = 1), statistic = 'stat')
print('t-test power via evaluating p-values:')
run_simulation(runif, list(n=nsample, min=0, max=1), list(n=nsample, min=0.1, max=1.1), t_test, nrep, 0.05, statistic = 'p_value')

[1] "t-test type I error via evaluating p-values:"
0.05
[1] "t-test type I error via evaluating chi-square statistics:"
0.051
[1] "t-test power via evaluating p-values:"
0.686

Analytical power calculation

In some cases, it is possible to calculate power analytically for simple statistical methods. For example,

set.seed(999)

delta = 0.1
## delta = 0
a = 0
s = 1
n = 100
lev = 0.05
error = qnorm(1-lev/2)*s/sqrt(n)
left = a - error
right = a + error
assumed = a + delta
Zleft = (left-assumed)/(s/sqrt(n))
Zright = (right-assumed)/(s/sqrt(n))
power = 1 - (pnorm(Zright)-pnorm(Zleft))
print(power)

[1] 0.170075

you can see how pwr::pwr.t.test in R implements this:

pwr::pwr.t.test

function (n = NULL, d = NULL, sig.level = 0.05, power = NULL, 
    type = c("two.sample", "one.sample", "paired"), alternative = c("two.sided", 
        "less", "greater")) 
{
    if (sum(sapply(list(n, d, power, sig.level), is.null)) != 
        1) 
        stop("exactly one of n, d, power, and sig.level must be NULL")
    if (!is.null(d) && is.character(d)) 
        d <- cohen.ES(test = "t", size = d)$effect.size
    if (!is.null(sig.level) && !is.numeric(sig.level) || any(0 > 
        sig.level | sig.level > 1)) 
        stop(sQuote("sig.level"), " must be numeric in [0, 1]")
    if (!is.null(power) && !is.numeric(power) || any(0 > power | 
        power > 1)) 
        stop(sQuote("power"), " must be numeric in [0, 1]")
    type <- match.arg(type)
    alternative <- match.arg(alternative)
    tsample <- switch(type, one.sample = 1, two.sample = 2, paired = 1)
    ttside <- switch(alternative, less = 1, two.sided = 2, greater = 3)
    tside <- switch(alternative, less = 1, two.sided = 2, greater = 1)
    if (tside == 2 && !is.null(d)) 
        d <- abs(d)
    if (ttside == 1) {
        p.body <- quote({
            nu <- (n - 1) * tsample
            pt(qt(sig.level/tside, nu, lower = TRUE), nu, ncp = sqrt(n/tsample) * 
                d, lower = TRUE)
        })
    }
    if (ttside == 2) {
        p.body <- quote({
            nu <- (n - 1) * tsample
            qu <- qt(sig.level/tside, nu, lower = FALSE)
            pt(qu, nu, ncp = sqrt(n/tsample) * d, lower = FALSE) + 
                pt(-qu, nu, ncp = sqrt(n/tsample) * d, lower = TRUE)
        })
    }
    if (ttside == 3) {
        p.body <- quote({
            nu <- (n - 1) * tsample
            pt(qt(sig.level/tside, nu, lower = FALSE), nu, ncp = sqrt(n/tsample) * 
                d, lower = FALSE)
        })
    }
    if (is.null(power)) 
        power <- eval(p.body)
    else if (is.null(n)) 
        n <- uniroot(function(n) eval(p.body) - power, c(2 + 
            1e-10, 1e+09))$root
    else if (is.null(d)) {
        if (ttside == 2) 
            d <- uniroot(function(d) eval(p.body) - power, c(1e-07, 
                10))$root
        if (ttside == 1) 
            d <- uniroot(function(d) eval(p.body) - power, c(-10, 
                5))$root
        if (ttside == 3) 
            d <- uniroot(function(d) eval(p.body) - power, c(-5, 
                10))$root
    }
    else if (is.null(sig.level)) 
        sig.level <- uniroot(function(sig.level) eval(p.body) - 
            power, c(1e-10, 1 - 1e-10))$root
    else stop("internal error")
    NOTE <- switch(type, paired = "n is number of *pairs*", two.sample = "n is number in *each* group", 
        NULL)
    METHOD <- paste(switch(type, one.sample = "One-sample", two.sample = "Two-sample", 
        paired = "Paired"), "t test power calculation")
    structure(list(n = n, d = d, sig.level = sig.level, power = power, 
        alternative = alternative, note = NOTE, method = METHOD), 
        class = "power.htest")
}