P8149: Human Population Genetics Syllabus

Department of Biostatistics, Columbia University

Instructor

Gao Wang, PhD

Course overview and rationale

This introductory course covers key concepts in human and population genetics, introducing statistical modeling and mathematical derivations when essential. It follows a “methods first, theory later” structure. While mathematics helps abstract problems, formalize intuitions and deepen understanding, the tools required for this course are basic: algebra, entry-level calculus, and some familiarity with probability distributions.

Parts I and II are mostly descriptive, covering how to measure inheritance, map genes, quantify linkage disequilibrium, estimate heritability, detect relatedness, and characterize population structure. These methods are immediately useful without diving deep into evolutionary theory.

Part III provides theoretical foundations that require somewhat deeper generative thinking. Hardy-Weinberg Equilibrium assumes infinite population size, random mating, no selection, no mutation, and no migration, and each evolutionary force violates one of these assumptions: drift violates infinite size, gene flow violates population closure, selection violates equal fitness. The coalescent reframes drift as a retrospective process and establishes the neutral expectation; selection then becomes the study of deviations from neutrality.

Part IV applies these methods and theory to core problems in population and statistical genetics: reconstructing human population history and understanding complex trait architecture.

Connection to P8139 and P8119

This course is the first in a sequence of three courses, with a focus on concepts and principles in genetics and the key intuitions in modeling them. P8139 Statistical Genetic Modeling emphasizes foundational statistical methodology, with genetics and multi-omics data analysis as application cases. P8119 Advanced Statistical and Computational Methods in Genetics covers specific computational approaches for current statistical genetics research using multi-omics data. Some key topics intentionally overlap across the three courses so that they are reinforced from complementary perspectives.

Textbooks

Required

  • Pritchard, J.K. An Owner’s Guide to the Human Genome: An Introduction to Human Population Genetics, Variation and Disease. Freely available electronically from Pritchard Lab at Stanford.
  • Notes on Population Genetics, developed by Dr. Prakash Gorroochurn at Columbia University and freely available electronically to course participants.

Supplementary references

For deeper coverage of statistical methods in genetics:

  • Balding, D., Moltke, I., and Marioni, J. (editors). Handbook of Statistical Genomics (4th edition). Wiley. Freely available electronically to Columbia students through Columbia Library eResources.

For more mathematical treatment of population genetics theory:

  • Hedrick, P.W. Genetics of Populations (4th edition). Jones & Bartlett Publishers.
  • Hartl, D.L. and Clark, A.G. Principles of Population Genetics (4th edition). Sinauer Associates.

(These two books, particularly the latter, are standard desktop references for biologists. Used copies are typically available online for under $50 each, although neither is required for this course.)


PART I: Inheritance and relatedness (Weeks 1–4)

Week 1: What is in the human genome and how are genetic variations inherited?

  • Concepts: Human genome structure, types of genetic variation, DNA sequencing, Mendel’s Laws, allele and genotype frequencies
  • Mathematical framework: Basic probability for inheritance patterns
  • Application examples: Genome-wide patterns of variation, estimating allele frequencies from population samples
  • Reading: Pritchard HG Book Ch 1.1, 1.2, 1.3, 1.4; Gorroochurn PopGen Notes Ch 1
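
As a preview of the Week 1 application, estimating allele frequencies from a population sample amounts to counting alleles: each homozygote carries two copies and each heterozygote one. The sketch below is illustrative only and is not part of the course materials; the function name and the genotype counts are hypothetical.

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (p, q), the sample frequencies of alleles A and a.

    Inputs are counts of the three genotypes at a biallelic locus
    (hypothetical example data, diploid individuals assumed).
    """
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("need at least one genotyped individual")
    # Each AA individual contributes two A alleles, each Aa contributes one.
    p = (2 * n_AA + n_Aa) / (2 * n)
    return p, 1.0 - p


p, q = allele_frequencies(n_AA=360, n_Aa=480, n_aa=160)
print(f"p = {p:.3f}, q = {q:.3f}")
```

The same counting rule gives the genotype frequencies as `n_AA / n`, `n_Aa / n`, and `n_aa / n`.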

Week 2: How do genetic variants combine and travel together?

  • Concepts: Hardy-Weinberg Equilibrium; linkage disequilibrium, haplotype structure
  • Mathematical framework: Binomial expansion for HWE, $\chi^2$ tests for HWE departure; $D$, $D'$, and $r^2$ as measures of LD
  • Application examples: Estimating recessive disease carrier frequencies (Tay-Sachs, cystic fibrosis), testing HWE in case-control studies, HLA region associations, how LD enables genotype imputation and GWAS (Li & Stephens copying model overview)
  • Reading: Pritchard HG Book Ch 2.3 (LD sections); Gorroochurn PopGen Notes Ch 2, 3.1–3.3
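
The Week 2 quantities can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not course code: a biallelic locus, a Pearson $\chi^2$ test with 1 degree of freedom, known haplotype frequencies for the LD measures, and hypothetical function names. The LD function reports $|D'|$, the magnitude of the normalized coefficient.

```python
import math


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Pearson chi-square test (1 df) for departure from HWE."""
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("need at least one genotyped individual")
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1.0 - p
    if p in (0.0, 1.0):
        raise ValueError("locus is monomorphic")
    observed = (n_AA, n_Aa, n_aa)
    # Expected counts from the binomial expansion (p + q)^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def ld_measures(p_AB, p_A, p_B):
    """Return (D, |D'|, r^2) from haplotype and allele frequencies."""
    if not (0.0 < p_A < 1.0 and 0.0 < p_B < 1.0):
        raise ValueError("both loci must be polymorphic")
    D = p_AB - p_A * p_B
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(D) / d_max if d_max > 0 else 0.0
    r2 = D * D / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return D, d_prime, r2


print(hwe_chi_square(500, 0, 500))   # no heterozygotes: strong departure
print(ld_measures(0.5, 0.5, 0.5))    # only AB and ab haplotypes present
```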

Week 3: How do we map disease genes in families?

  • Concepts: Recombination, genetic linkage, recombination fraction $\theta$, co-segregation of markers with disease phenotypes localizes causal genes
  • Mathematical framework: LD decay with physical distance and time, maximum likelihood estimation of recombination fraction, LOD score method overview
  • Application examples: Mapping cystic fibrosis genes, why large pedigrees with Mendelian inheritance enabled early successes
  • Reading: Pritchard HG Book Ch 2.3 (recombination sections), Ch 4.2; Gorroochurn PopGen Notes Ch 3.1, 3.4
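
For the LOD score method, the simplest case is a set of phase-known, fully informative meioses in which recombinants can be counted directly. The sketch below assumes exactly that setting (real pedigrees require likelihoods over unknown phase and missing data); the function names and the counts of 2 recombinants in 20 meioses are hypothetical.

```python
import math


def mle_recombination_fraction(n_meioses, n_recombinants):
    """Maximum likelihood estimate of theta, capped at 0.5 (free recombination)."""
    return min(n_recombinants / n_meioses, 0.5)


def lod_score(theta, n_meioses, n_recombinants):
    """log10 of L(theta) / L(0.5) for a binomial count of recombinants."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    k = n_recombinants
    nonrec = n_meioses - n_recombinants

    def log10_likelihood(t):
        total = 0.0
        if k > 0:
            if t == 0.0:
                return -math.inf  # recombinants are impossible when theta = 0
            total += k * math.log10(t)
        if nonrec > 0:
            total += nonrec * math.log10(1.0 - t)
        return total

    return log10_likelihood(theta) - log10_likelihood(0.5)


theta_hat = mle_recombination_fraction(20, 2)
print(theta_hat, lod_score(theta_hat, 20, 2))
```

A LOD score of 3 or more at the maximizing $\theta$ is the conventional threshold for declaring linkage.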

Week 4: How do we quantify relatedness from families to populations?

  • Concepts: Identity by descent, inbreeding, Wright’s F-statistics
  • Mathematical framework: Computing $F$ from pedigrees, $F_{ST}$ as standardized variance in allele frequencies
  • Application examples: Consanguinity and recessive disease risk, detecting cryptic relatedness via IBD segment sharing
  • Reading: Pritchard HG Book Ch 2.4 ($F_{ST}$ definitions and interpretation); Gorroochurn PopGen Notes Ch 7
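
Both Week 4 calculations reduce to short formulas. The sketch below assumes Wright's path-counting rule, $F = \sum (1/2)^{n}(1 + F_A)$ with $n$ the number of individuals on each path joining the two parents through a common ancestor, and $F_{ST} = \mathrm{Var}(p) / [\bar{p}(1 - \bar{p})]$ with equally weighted subpopulations. Function names and inputs are hypothetical.

```python
def inbreeding_from_paths(paths):
    """Inbreeding coefficient F by path counting.

    `paths` is a list of (n_individuals_on_path, F_of_common_ancestor) pairs,
    one pair per distinct path linking the parents through a common ancestor.
    """
    return sum(0.5 ** n * (1.0 + f_ancestor) for n, f_ancestor in paths)


def fst_from_frequencies(freqs):
    """F_ST as the standardized variance of allele frequencies.

    `freqs` holds the frequency of one allele in each subpopulation;
    subpopulations are weighted equally (a simplifying assumption).
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    if p_bar in (0.0, 1.0):
        raise ValueError("allele is fixed or absent in every subpopulation")
    var_p = sum((p - p_bar) ** 2 for p in freqs) / k
    return var_p / (p_bar * (1.0 - p_bar))


# Offspring of first cousins: two shared grandparents, five individuals per path.
print(inbreeding_from_paths([(5, 0.0), (5, 0.0)]))
print(fst_from_frequencies([0.2, 0.8]))
```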

PART II: Traits and populations (Weeks 5–8)

Week 5: How heritable are complex human traits and diseases?

  • Concepts: Phenotypic variance partitioning and heritability
  • Mathematical framework: Variance component analysis
  • Application examples: Heritability estimation (GCTA-GREML)
  • Reading: Pritchard HG Book Ch 4.1, 4.4; Gorroochurn PopGen Notes Ch 4
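
Variance partitioning can be illustrated with a single biallelic locus before moving to genome-wide estimators. The sketch below is not GCTA-GREML; it assumes the classical single-locus parameterization (genotypic values $+a$, $d$, $-a$, allele frequency $p$, random mating) and an environmental variance supplied by the user. Function names and numbers are hypothetical.

```python
def single_locus_variance_components(p, a, d):
    """Additive and dominance variance for one locus under random mating.

    Genotypic values are +a (A1A1), d (A1A2), -a (A2A2); p is the
    frequency of allele A1.
    """
    q = 1.0 - p
    alpha = a + d * (q - p)           # average effect of allele substitution
    v_additive = 2.0 * p * q * alpha ** 2
    v_dominance = (2.0 * p * q * d) ** 2
    return v_additive, v_dominance


def heritability(v_additive, v_dominance, v_environment):
    """Return (narrow-sense h^2, broad-sense H^2)."""
    v_phenotype = v_additive + v_dominance + v_environment
    return v_additive / v_phenotype, (v_additive + v_dominance) / v_phenotype


v_a, v_d = single_locus_variance_components(p=0.5, a=1.0, d=0.0)
print(heritability(v_a, v_d, v_environment=0.5))
```

With dominance ($d \neq 0$), the two heritabilities diverge because only the additive component contributes to $h^2$.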

Week 6: How do we detect structure and admixture between populations?

  • Concepts: Genetic structure arises when populations have distinct allele frequencies; admixture occurs when individuals have ancestry from multiple source populations
  • Statistical model: STRUCTURE as Bayesian clustering (ancestry proportions $Q$, population frequencies $P$, MCMC inference), ADMIXTURE as its maximum-likelihood counterpart, PCA as a complementary and practical alternative
  • Application examples: Two-way and three-way admixture, interpreting admixture and PCA plots and their limitations
  • Reading: Pritchard HG Book Ch 3.1
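
The shared core of the clustering models is that an individual's expected allele frequency at a locus is the $Q$-weighted mixture of the population frequencies in $P$. The sketch below evaluates the corresponding binomial log-likelihood for given $Q$ and $P$; it is a simplified illustration (unlinked biallelic SNPs, constant terms dropped, no inference step), not the STRUCTURE or ADMIXTURE software, and the function name and toy data are hypothetical.

```python
import math


def admixture_log_likelihood(G, Q, P):
    """Log-likelihood of genotypes under the admixture model.

    G[i][j]: genotype (0, 1, or 2 copies of the counted allele), individual i, SNP j
    Q[i][k]: ancestry proportion of individual i from population k (rows sum to 1)
    P[k][j]: frequency of the counted allele in population k at SNP j, in (0, 1)
    """
    n_pops = len(P)
    log_lik = 0.0
    for i, genotypes in enumerate(G):
        for j, g in enumerate(genotypes):
            # Individual-specific allele frequency: mixture over source populations.
            f = sum(Q[i][k] * P[k][j] for k in range(n_pops))
            log_lik += g * math.log(f) + (2 - g) * math.log(1.0 - f)
    return log_lik


P = [[0.9, 0.9, 0.1], [0.1, 0.1, 0.9]]   # two source populations, three SNPs
G = [[2, 2, 0]]                           # one individual
print(admixture_log_likelihood(G, [[1.0, 0.0]], P))
print(admixture_log_likelihood(G, [[0.5, 0.5]], P))
```

Comparing the two printed values shows how the likelihood favors the ancestry proportions that best match the genotypes, which is the signal the inference algorithms exploit.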

Week 7: How do we track ancestry along chromosomes?

  • Concepts: Local ancestry inference identifies which source population contributed each chromosomal segment in admixed individuals; recombination creates switches between ancestry tracts
  • Statistical model: Li & Stephens copying model, hidden Markov models for ancestry transitions, RFMix and ELAI for local ancestry inference
  • Application examples: Admixture mapping for disease genes
  • Reading: Pritchard HG Book Ch 3.2
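
The hidden Markov view of local ancestry can be sketched with a toy Viterbi decoder: hidden states are source populations, emissions are alleles drawn from population-specific frequencies, and transitions model ancestry switches caused by recombination. The code assumes a single haploid chromosome, a constant switch probability between adjacent sites, and allele frequencies strictly between 0 and 1. It is not how RFMix or ELAI are implemented, and all names and numbers are hypothetical.

```python
import math


def viterbi_local_ancestry(haplotype, freqs, switch_prob):
    """Most likely ancestry path along one haplotype.

    haplotype: list of 0/1 alleles
    freqs[k][j]: frequency of allele 1 in source population k at site j
    switch_prob: probability that ancestry changes between adjacent sites
    """
    n_pops = len(freqs)
    log_stay = math.log(1.0 - switch_prob)
    log_switch = math.log(switch_prob / (n_pops - 1))

    def log_emit(k, j):
        f = freqs[k][j]
        return math.log(f if haplotype[j] == 1 else 1.0 - f)

    def log_trans(h, k):
        return log_stay if h == k else log_switch

    # Uniform prior over ancestries at the first site.
    score = [math.log(1.0 / n_pops) + log_emit(k, 0) for k in range(n_pops)]
    backpointers = []
    for j in range(1, len(haplotype)):
        new_score, pointers = [], []
        for k in range(n_pops):
            best = max(range(n_pops), key=lambda h: score[h] + log_trans(h, k))
            new_score.append(score[best] + log_trans(best, k) + log_emit(k, j))
            pointers.append(best)
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_pops), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


freqs = [[0.9] * 6, [0.1] * 6]
print(viterbi_local_ancestry([1, 1, 1, 0, 0, 0], freqs, switch_prob=0.1))
```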

Week 8: MIDTERM EXAM (in class) & Guest lecture I


PART III: Evolutionary forces that shape genetic variation (Weeks 9–12)

Week 9: What happens when populations are finite?

  • Theory: Mutation as source of variation, genetic drift, Wright-Fisher model, effective population size, mutation-drift balance, Kimura’s neutral theory
  • Mathematical framework: Fixation probability (neutral), expected heterozygosity, substitution rate under neutrality
  • Reading: Pritchard HG Book Ch 1.5, 2.1; Gorroochurn PopGen Notes Ch 6, 8.1–8.9
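
Two of the Week 9 results are easy to check numerically: under neutrality the fixation probability of an allele equals its current frequency, and at mutation-drift balance under the infinite-alleles model the expected heterozygosity is $\theta / (1 + \theta)$ with $\theta = 4 N_e \mu$. The sketch below simulates a neutral Wright-Fisher population using only the standard library; population size, replicate count, and function names are hypothetical choices.

```python
import random


def wright_fisher_fixes(n_diploid, initial_count, rng):
    """Run one neutral Wright-Fisher trajectory; return True if the allele fixes."""
    n_copies = 2 * n_diploid
    count = initial_count
    while 0 < count < n_copies:
        p = count / n_copies
        # Binomial sampling of 2N gene copies for the next generation.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
    return count == n_copies


def estimate_fixation_probability(n_diploid, initial_count, replicates, seed=0):
    """Fraction of replicate populations in which the allele fixes."""
    rng = random.Random(seed)
    fixed = sum(
        wright_fisher_fixes(n_diploid, initial_count, rng)
        for _ in range(replicates)
    )
    return fixed / replicates


def expected_heterozygosity(n_e, mu):
    """Equilibrium heterozygosity under the infinite-alleles model."""
    theta = 4.0 * n_e * mu
    return theta / (1.0 + theta)


# Initial frequency 4/20 = 0.2, so the estimate should be close to 0.2.
print(estimate_fixation_probability(n_diploid=10, initial_count=4, replicates=2000))
print(expected_heterozygosity(n_e=2500, mu=1e-4))
```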

Week 10: How do we look backward through genealogies to make inferences?

  • Theory: Coalescence, MRCA, gene genealogies
  • Mathematical framework: Expected coalescence times, Watterson’s $\theta$, nucleotide diversity $\pi$, site frequency spectrum, Tajima’s $D$ as a comparison of two estimators of $\theta$
  • Applications: Coalescent simulation (msprime), demographic inference from SFS shape
  • Reading: Pritchard HG Book Ch 2.2; Gorroochurn PopGen Notes Ch 8.10–8.13
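
The Week 10 summary statistics can all be computed from the unfolded site frequency spectrum. The sketch below assumes a sample of $n \geq 4$ chromosomes, an SFS given as counts of sites with $1, \dots, n-1$ derived copies, and Tajima's (1989) variance constants; the function name and the example spectra are hypothetical.

```python
import math


def diversity_summaries(sfs):
    """Return (Watterson's theta, pi, Tajima's D) from an unfolded SFS.

    sfs[i - 1] is the number of segregating sites at which the derived
    allele appears in exactly i of the n sampled chromosomes.
    """
    n = len(sfs) + 1
    S = sum(sfs)
    if n < 4 or S == 0:
        raise ValueError("need n >= 4 chromosomes and at least one segregating site")

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    theta_w = S / a1

    # Mean pairwise differences: a site with i derived copies differs in i(n - i) pairs.
    pairs = n * (n - 1) / 2.0
    pi = sum(count * i * (n - i) for i, count in enumerate(sfs, start=1)) / pairs

    # Tajima's variance constants.
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    tajima_d = (pi - theta_w) / math.sqrt(e1 * S + e2 * S * (S - 1))
    return theta_w, pi, tajima_d


print(diversity_summaries([12, 6, 4, 3]))   # spectrum proportional to 1/i
print(diversity_summaries([10, 0, 0, 0]))   # excess of singletons
```

A spectrum proportional to $1/i$ matches the neutral expectation, so the two estimators agree; an excess of rare variants pulls $\pi$ below Watterson's estimate and makes $D$ negative.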

Week 11: What happens when populations exchange migrants?

  • Theory: Gene flow, migration-drift equilibrium, Wahlund effect, $F_{ST}$ as measure of differentiation
  • Mathematical framework: Island model, structured coalescent (concept), D-statistics and $f$-statistics for admixture detection
  • Applications: Evidence for archaic admixture in humans
  • Reading: Pritchard HG Book Ch 2.4; Gorroochurn PopGen Notes Ch 9
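
Two Week 11 quantities have compact forms. Under the infinite-island model at migration-drift equilibrium, $F_{ST} \approx 1 / (1 + 4Nm)$, and the $D$-statistic (ABBA-BABA test) contrasts two discordant site patterns in a four-taxon tree. The sketch below assumes one sampled allele per population at each biallelic site, ordered as (P1, P2, P3, outgroup), with the outgroup allele taken as ancestral; function names and data are hypothetical, and no significance test (such as a block jackknife) is included.

```python
def fst_island_model(n_diploid, migration_rate):
    """Approximate equilibrium F_ST under the infinite-island model."""
    return 1.0 / (1.0 + 4.0 * n_diploid * migration_rate)


def abba_baba_d(sites):
    """D = (ABBA - BABA) / (ABBA + BABA) from (P1, P2, P3, outgroup) alleles."""
    abba = baba = 0
    for p1, p2, p3, outgroup in sites:
        # An allele is derived if it differs from the outgroup allele.
        d1, d2, d3 = p1 != outgroup, p2 != outgroup, p3 != outgroup
        if d3 and d2 and not d1:
            abba += 1      # P2 shares the derived allele with P3
        elif d3 and d1 and not d2:
            baba += 1      # P1 shares the derived allele with P3
    if abba + baba == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / (abba + baba)


sites = [(0, 1, 1, 0)] * 3 + [(1, 0, 1, 0)] + [(1, 1, 0, 0)]
print(fst_island_model(n_diploid=100, migration_rate=0.0025))
print(abba_baba_d(sites))
```

Without gene flow, incomplete lineage sorting produces ABBA and BABA sites at equal rates, so a $D$ that departs consistently from zero suggests excess allele sharing between P3 and one of the sister populations.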

Week 12: What happens when fitness varies among genotypes?

  • Theory: Fitness and selection coefficients, purifying vs positive vs balancing selection, selective sweeps and hitchhiking
  • Mathematical framework: Fixation probability for beneficial alleles (Haldane’s $2s$), efficacy boundary ($N_e s \approx 1$), Fay & Wu’s $H$ for hitchhiking
  • Application examples: LCT lactase persistence sweep, SLC24A5 and skin pigmentation, high-altitude adaptation (EPAS1, EGLN1), MHC balancing selection
  • Reading: Pritchard HG Book Ch 2.5, 2.6, 2.7; Gorroochurn PopGen Notes Ch 5, 10
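
Haldane's $2s$ and the efficacy boundary both follow from Kimura's diffusion approximation for the fixation probability, $u(p) = (1 - e^{-4 N_e s p}) / (1 - e^{-4 N_e s})$, where $p$ is the initial allele frequency. The sketch below assumes genic (additive) selection with fitnesses $1$, $1 + s$, $1 + 2s$ and $N_e = N$; the function name and parameter values are hypothetical.

```python
import math


def fixation_probability(s, n_e, p0):
    """Kimura's diffusion approximation for the probability of fixation.

    s: selection coefficient of the heterozygote (genic selection assumed)
    n_e: effective population size
    p0: initial allele frequency, e.g. 1 / (2N) for a new mutation
    """
    if s == 0:
        return p0  # neutral limit: fixation probability equals frequency
    # expm1 keeps precision when 4 * n_e * s * p0 is small.
    return math.expm1(-4.0 * n_e * s * p0) / math.expm1(-4.0 * n_e * s)


n = 10_000
new_mutation = 1.0 / (2 * n)
for s in (0.0, 1e-5, 1e-4, 1e-3, 1e-2):
    u = fixation_probability(s, n, new_mutation)
    print(f"s = {s:g}  N_e*s = {n * s:g}  u = {u:.3e}  2s = {2 * s:.3e}")
```

When $N_e s$ is well below 1 the result stays close to the neutral value $1/(2N)$, and once $N_e s$ is well above 1 it approaches $2s$.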

PART IV: Population and statistical genetics applications (Weeks 13–14)

Week 13: How do we reconstruct human population history from genetic data?

  • Key insights: Genetic data as a record of demographic history; population splits and admixture events leave signatures in allele frequencies, LD patterns, and haplotype structure; ancient DNA provides direct snapshots of past populations and calibrates inferences from modern data
  • Application examples: Ancient DNA analysis of the Neanderthal genome (Green et al. 2010); recent migrations (Haak et al. 2015)
  • Reading: Pritchard HG Book Ch 3.3, 3.4

Week 14: How does evolution shape complex trait architecture?

  • Key insights: Drift, selection, and demography jointly shape genetic architecture; stabilizing selection constrains effect sizes; background selection influences where causal variants reside; rare variant accumulation reflects recent demography
  • Statistical modeling: Evolutionary models for effect size distributions, detecting polygenic adaptation from frequency differentiation, conservation scores as priors for fine-mapping, genetic correlation across traits
  • Application examples: Evolution and GWAS (Simons et al. 2018); functional genomics connecting variants to mechanisms (Ota et al. 2025); common and rare variant contributions to complex traits (Spence et al. 2025)
  • Reading: Pritchard HG Book Ch 4.5, 4.6, 4.7, 4.8

Week 15: FINAL EXAM (in class) & Guest lecture II