An illustration of permutation testing

Overview

This notebook demonstrates the basic concepts of permutation testing:

  1. How is a permutation test performed?
  2. How does a permutation test compare to the equivalent analytical solution?

Please refer to the Example problem and Simulation study design sections of this notebook for the setup of the example used below.

An illustration in R

The code below is copied from this notebook.

# This function performs a two-sided paired t-test
t_test <- function(x, y) {
    res <- t.test(x, y, paired = TRUE)
    # The t statistic is asymptotically N(0,1), so statistic^2 is approximately a 1-df chi-squared
    return(list(stat = res$statistic^2, p = res$p.value))
}
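
Since the squared t statistic is approximately a 1-df chi-squared for large samples, its chi-squared tail probability should be close to the reported p-value. Here is a minimal sanity check (illustrative, not part of the original notebook; the simulated inputs are arbitrary):

# Illustrative check: the chi-squared tail probability of stat should roughly match the p-value
res <- t_test(rnorm(1000), rnorm(1000))
c(p_ttest = res$p, p_chisq = 1 - pchisq(unname(res$stat), df = 1))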

# Evaluate type I error or statistical power over `nrep` simulated data sets.
# `.simulate` draws a sample with arguments `param_x` / `param_y`; `.test` returns
# a list with elements `stat` and `p`; `cutoff` is compared to the p-value or to
# the test statistic depending on `statistic`.
run_simulation <- function(.simulate, param_x, param_y, .test, nrep, cutoff, statistic = 'p_value') {
    sig <- 0
    for (i in 1:nrep) {
      x <- do.call(.simulate, param_x)
      y <- do.call(.simulate, param_y)
      res <- .test(x, y)
      # Count a rejection, judged either by p-value or by the test statistic
      condition <- ifelse(statistic == 'p_value', res$p < cutoff, res$stat > cutoff)
      if (condition) sig <- sig + 1
    }
    # Proportion of rejections across replicates
    return(sig / nrep)
}
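
For example, the type I error of the t-test under a standard normal null can be estimated as follows (a small illustrative run, not from the original notebook; the normal distribution, sample size, and 0.05 cutoff are arbitrary choices):

# Illustrative usage: estimate type I error of t_test when both samples are N(0, 1)
run_simulation(rnorm, list(n = 50, mean = 0, sd = 1), list(n = 50, mean = 0, sd = 1),
               t_test, nrep = 500, cutoff = 0.05, statistic = 'p_value')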

Next, we add a function implementing a permutation test based on the squared difference in sample means:

# Run a permutation test based on the squared difference in sample means
perm_test <- function(x, y, nperm = 1000) {
    # Observed test statistic
    stat <- (mean(x) - mean(y))^2
    stats <- numeric(nperm)
    for (i in 1:nperm) {
      # Shuffle the pooled sample and split it back into two groups of the original sizes
      pxy <- sample(c(x, y))
      stats[i] <- (mean(pxy[1:length(x)]) - mean(pxy[(length(x) + 1):length(pxy)]))^2
    }
    # Empirical p-value: fraction of permuted statistics at least as extreme as the observed one
    return(list(stat = stat, p = sum(stats >= stat) / nperm))
}
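
A common refinement, not used above, is to add one to both the numerator and the denominator of the empirical p-value so that it can never be exactly zero. A sketch of this variant (the function name is mine; it also uses replicate() instead of an explicit loop):

# Variant with the "+1" convention, which guarantees p > 0
perm_test_plus1 <- function(x, y, nperm = 1000) {
    stat <- (mean(x) - mean(y))^2
    stats <- replicate(nperm, {
      pxy <- sample(c(x, y))
      (mean(pxy[1:length(x)]) - mean(pxy[(length(x) + 1):length(pxy)]))^2
    })
    list(stat = stat, p = (1 + sum(stats >= stat)) / (1 + nperm))
}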

First, let's see how the two tests compare on the same data set:

set.seed(999)
nsample <- 1000
x <- runif(nsample, 0, 1)
y <- runif(nsample, 0.06, 1)
print('t-test:')
t_test(x, y)
print('permutation test')
perm_test(x, y)

[1] "t-test:"
$stat
t: 5.99421417154389
$p
0.014523963776656
[1] "permutation test"
$stat
0.000956946889773521
$p
0.015

The two statistics are on different scales (the squared t statistic versus the squared difference in means), but the p-values are very similar. To formally evaluate the type I error and statistical power of the two tests, as in this notebook:

set.seed(999)
nsample <- 100
nrep <- 1000

print('t-test type I error:')
run_simulation(runif, list(n=nsample, min=0, max=1), list(n=nsample, min=0, max=1), t_test, nrep, 0.05, statistic = 'p_value')
print('t-test power:')
run_simulation(runif, list(n=nsample, min=0, max=1), list(n=nsample, min=0.1, max=1.1), t_test, nrep, 0.05, statistic = 'p_value')
print('permutation test type I error:')
run_simulation(runif, list(n=nsample, min=0, max=1), list(n=nsample, min=0, max=1), perm_test, nrep, 0.05, statistic = 'p_value')
print('permutation test power:')
run_simulation(runif, list(n=nsample, min=0, max=1), list(n=nsample, min=0.1, max=1.1), perm_test, nrep, 0.05, statistic = 'p_value')

[1] "t-test type I error:"
0.05
[1] "t-test power:"
0.681
[1] "permutation test type I error:"
0.04
[1] "permutation test power:"
0.684

The type I error and power of the two tests are also comparable.
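
As a rough check on how comparable these estimates are, the Monte Carlo standard error of a rejection rate estimated from 1000 replicates can be computed directly (an illustrative calculation, not part of the original notebook):

# Binomial standard error of a rejection rate near the nominal 0.05 level
nrep <- 1000
alpha <- 0.05
sqrt(alpha * (1 - alpha) / nrep)  # about 0.007, so the 0.05 vs 0.04 gap is within roughly 1.5 standard errors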