# An illustration of Power and Type I Error caculations

## Overview

This notebook demonstrates how to evaluate the type I error and power for a statistical test using simulation studies.

## Example problem

Suppose we have measurements on some traits from two groups of samples. A simple two-sample t-test can be used to evaluate whether or not there is a difference between these two groups on this trait. The question is, how well does this t-test perform for our problem? In particular we are concerned with:

1. What’s the chance that the t-test fails to detect a difference, if there indeed is a difference between these two groups? This is the so-called “power”.
2. What’s the chance that the t-test claims to detect a dfference between the groups when in fact there should not be a difference? This is the so-called “type I error”.

## Simulation study design

A typical simulation study usually has at least 4 steps:

1. Simulate: randomly generate two data-sets $X$ and $Y$ from the same distribution ($H_0$) and from different distributions where $E[X] \neq E[Y]$ ($H_1$)
2. Analyze: perform a t-test for difference in mean
3. Evaluate: compare p-value with given threshold and make a decision of statistical significance – use 1 to indicate rejection of $H_0$ and 0 to indicate failure to reject $H_0$.
4. Aggregate / summarize: repeat 1 to 3 many times. Compute type I error by counting the proportion of 1’s among all replicates under the $H_0$ simulations, and similarly for power under the $H_1$ simulations.

## An illustration in R

First, we define two functions: one is a statistical method to perform the analysis (a t-test in this case), and the other is a general framework to evaluate performance of an input statistical method .test under given scenarios generated by an input .simulate function.

# This function performs a regular two-sample t-test, two-sided
t_test <- function(x, y) {
res <- t.test(x, y, paired=T)
# statistic d-> N(0,1) so tatistic^2 is 1-df chi-squared
return (list(stat=res$statistic^2, p=res$p.value))
}

# Evaluate type I error or statistical power
run_simulation <- function(.simulate, param_x, param_y, .test, nrep, cutoff, statistic = 'p_value') {
sig <- 0
for (i in 1:nrep) {
x <- do.call(.simulate, param_x)
y <- do.call(.simulate, param_y)
res <- .test(x, y)
condition <- ifelse(statistic == 'p_value', res$p < cutoff, res$stat > cutoff)
if (condition) sig <- sig + 1
}
res <- sig/nrep
return(res)
}



Then we call these functions under different scenarios,

set.seed(999)
nsample <- 100
nrep <- 1000

print('t-test type I error via evaluating p-values:')
run_simulation(runif, list(n=nsample, min=0, max=1), list(n=nsample, min=0, max=1), t_test, nrep, 0.05, statistic = 'p_value')
print('t-test type I error via evaluating chi-square statistics:')
run_simulation(runif, list(n=nsample, min=0, max=1), list(n=nsample, min=0, max=1), t_test, nrep, qchisq(1-0.05, df = 1), statistic = 'stat')
print('t-test power via evaluating p-values:')
run_simulation(runif, list(n=nsample, min=0, max=1), list(n=nsample, min=0.1, max=1.1), t_test, nrep, 0.05, statistic = 'p_value')


[1] "t-test type I error via evaluating p-values:"

0.05
[1] "t-test type I error via evaluating chi-square statistics:"

0.051
[1] "t-test power via evaluating p-values:"

0.686

## Analytical power calculation

In some cases, it is possible to calculate power analytically for simple statistical methods. For example,

set.seed(999)

delta = 0.1
## delta = 0
a = 0
s = 1
n = 100
lev = 0.05
error = qnorm(1-lev/2)*s/sqrt(n)
left = a - error
right = a + error
assumed = a + delta
Zleft = (left-assumed)/(s/sqrt(n))
Zright = (right-assumed)/(s/sqrt(n))
power = 1 - (pnorm(Zright)-pnorm(Zleft))
print(power)


[1] 0.170075


you can see how pwr::pwr.t.test in R implements this:

pwr::pwr.t.test


function (n = NULL, d = NULL, sig.level = 0.05, power = NULL,
type = c("two.sample", "one.sample", "paired"), alternative = c("two.sided",
"less", "greater"))
{
if (sum(sapply(list(n, d, power, sig.level), is.null)) !=
1)
stop("exactly one of n, d, power, and sig.level must be NULL")
if (!is.null(d) && is.character(d))
d <- cohen.ES(test = "t", size = d)$effect.size if (!is.null(sig.level) && !is.numeric(sig.level) || any(0 > sig.level | sig.level > 1)) stop(sQuote("sig.level"), " must be numeric in [0, 1]") if (!is.null(power) && !is.numeric(power) || any(0 > power | power > 1)) stop(sQuote("power"), " must be numeric in [0, 1]") type <- match.arg(type) alternative <- match.arg(alternative) tsample <- switch(type, one.sample = 1, two.sample = 2, paired = 1) ttside <- switch(alternative, less = 1, two.sided = 2, greater = 3) tside <- switch(alternative, less = 1, two.sided = 2, greater = 1) if (tside == 2 && !is.null(d)) d <- abs(d) if (ttside == 1) { p.body <- quote({ nu <- (n - 1) * tsample pt(qt(sig.level/tside, nu, lower = TRUE), nu, ncp = sqrt(n/tsample) * d, lower = TRUE) }) } if (ttside == 2) { p.body <- quote({ nu <- (n - 1) * tsample qu <- qt(sig.level/tside, nu, lower = FALSE) pt(qu, nu, ncp = sqrt(n/tsample) * d, lower = FALSE) + pt(-qu, nu, ncp = sqrt(n/tsample) * d, lower = TRUE) }) } if (ttside == 3) { p.body <- quote({ nu <- (n - 1) * tsample pt(qt(sig.level/tside, nu, lower = FALSE), nu, ncp = sqrt(n/tsample) * d, lower = FALSE) }) } if (is.null(power)) power <- eval(p.body) else if (is.null(n)) n <- uniroot(function(n) eval(p.body) - power, c(2 + 1e-10, 1e+09))$root
else if (is.null(d)) {
if (ttside == 2)
d <- uniroot(function(d) eval(p.body) - power, c(1e-07,
10))$root if (ttside == 1) d <- uniroot(function(d) eval(p.body) - power, c(-10, 5))$root
if (ttside == 3)
d <- uniroot(function(d) eval(p.body) - power, c(-5,
10))$root } else if (is.null(sig.level)) sig.level <- uniroot(function(sig.level) eval(p.body) - power, c(1e-10, 1 - 1e-10))$root
else stop("internal error")
NOTE <- switch(type, paired = "n is number of *pairs*", two.sample = "n is number in *each* group",
NULL)
METHOD <- paste(switch(type, one.sample = "One-sample", two.sample = "Two-sample",
paired = "Paired"), "t test power calculation")
structure(list(n = n, d = d, sig.level = sig.level, power = power,
alternative = alternative, note = NOTE, method = METHOD),
class = "power.htest")
}