An illustration of permutation testing in GWAS

Without loss of generality, consider a genetic association study with 10 SNPs weakly associated with a quantitative trait. We would like to assess the significance of these SNPs after multiple testing correction.

10 SNPs are independent

Let’s simulate the genotype data first, assuming the SNPs are independent, and a sample size of $N=1,000$ unrelated individuals,

# set random seed
set.seed(999)
# Sample from a multivariate normal distribution for SNP matrix X, as if the SNP data has been normalized,

n <- 1000
p <- 10
R <- diag(p)
mu <- rep(0,p)
X <- MASS::mvrnorm(n, mu = mu, Sigma = R)

Now we simulate the 10 SNP effects, assuming 0.5% heritability in each SNP,

sigma2 <- 0.005
b <- rnorm(p, 0, sqrt(sigma2))

then simulate phenotype vector $y$ as $y = Xb + e$ where $e$ is a vector of $N(0, \sqrt{1-\sigma^2})$,

y <- X %*% b + rnorm(p, 0, sqrt(1-sigma2))

And perform association analysis,

test_assoc = function(X,y) {
    res = susieR::univariate_regression(X,y)
    p = pnorm(res$betahat/res$sebetahat)
    return(pmin(p,1-p)*2)
}

p_orig = test_assoc(X,y)

p_orig

<ol class=list-inline>
  • 0.00244222209530038
  • 0.155430852452004
  • 0.455153232198785
  • 0.111110083396467
  • 0.111854162419335
  • 0.0103712166234653
  • 0.514298813938308
  • 0.644904338120907
  • 2.94489178997992e-06
  • 0.741834184053375
  • </ol>

    Permutation testing scheme 1 – equivalence to analytical p-value

    Here we perform permutation testing on the data as follows:

    1. Permute the phenotype vector
    2. Perform association tests and get the permuted p-value for each SNP
    3. Compare the permuted p-value with the original p-value for each SNP. If the permutated p-value is smaller than the original p-value we call it a “success” and otherwise “failure”
    4. Repeat 1 - 3 numerous times. Empirical p-value is defined by the total number of success divided by the total number of permutations

    This scheme should result in a similar p-value as the original analytical p-value. Let’s use 10,000 permutations for this problem:

    test_perm_1 = function(X,y,K=10000) {
        p_orig = test_assoc(X,y)
        res = matrix(0, K, ncol(X))
        for (i in 1:K) {
            p_perm = test_assoc(X,sample(y))
            res[i,] = p_orig >= p_perm
        }
        return(apply(res, 2, mean))
    }
    
    
    p_perm_1 = test_perm_1(X,y)
    
    
    plot(p_orig, p_perm_1)
    
    

    png

    This case permutation and analytical p-values are the same.

    Permutation testing scheme 2 – minimum p-value

    Here we perform permutation testing on the data as follows:

    1. Permute the phenotype vector
    2. Perform association tests and get the minimum p-value among the 10 permuted tests
    3. Compare the minimum p-value with each of the 10 SNPs’ original p-value. If the minimum p-value is smaller than the original p-value we call it a “success” and otherwise “failure”
    4. Repeat 1 - 3 numerous times. Empirical p-value is defined by the total number of success divided by the total number of permutations

    Let’s use 10,000 permutations for this problem:

    test_perm_2 = function(X,y,K=10000) {
        p_orig = test_assoc(X,y)
        res = matrix(0, K, ncol(X))
        for (i in 1:K) {
            p_perm = test_assoc(X,sample(y))
            min_p = min(p_perm)
            res[i,] = sapply(p_orig, function(x) x>=min_p)
        }
        return(apply(res, 2, mean))
    }
    
    
    p_perm_2 = test_perm_2(X,y)
    
    

    If this permutation scheme is correct, we would expect that it is comparable to a Bonferroni correction of testing 10 independent hypothesis

    bonferroni_correct = function(p) 1 - (1 - p) ^ length(p)
    
    
    bonferroni_correct(p_orig)
    
    
    <ol class=list-inline>
  • 0.0241555612908793
  • 0.815348804296027
  • 0.997694613271454
  • 0.692050291915425
  • 0.694618413064428
  • 0.0990033365759982
  • 0.999269379859342
  • 0.999968124745087
  • 2.94485276454148e-05
  • 0.999998684813438
  • </ol>
    plot(bonferroni_correct(p_orig), p_perm_2)
    
    

    png

    p_perm_2
    
    
    <ol class=list-inline>
  • 0.0249
  • 0.8174
  • 0.9974
  • 0.6927
  • 0.6945
  • 0.0998
  • 0.9989
  • 0.9999
  • 0
  • 1
  • </ol>

    The Bonferroni and this permutation theme give the same result because by definition, both Bonferroni and this permutation assess if the most significant association is real. By using the minimum p-value, permutation test gives the empirical distribution of the most extreme statistic.