Computational genomics reading list (version 2016)

A list compiled by Xin He at UChicago in 2016.

Sequence Analysis

Motif discovery

Background

  • What are DNA sequence motifs? NBT, 2006
  • How does DNA sequence motif discovery work? NBT, 2006

Paper

  • Fitting a mixture model by expectation maximization to discover motifs in biopolymers, ISMB, 1994 (MEME)

Profile HMM

Background

  • Eddy, Hidden Markov Models, Current Opinion in Structural Biology, 1996
  • DEKM, Chapter 5 introduction.

Paper

  • DEKM Chapter 5, 5.1-5.4.

Dynamic Programming

  • Introduction to Algorithms, CLRS, 3rd edition. Sections 15.1-15.3.
    • Remark: focus on the main recurrence equations (15.2) and (15.7); pay less attention to technicalities such as running-time analysis.
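A minimal Python sketch of the two classic recurrences from Sections 15.1-15.2 of the 3rd edition (rod cutting and matrix-chain multiplication), filled in bottom-up. The function names and the example inputs are hypothetical illustrations, not code from the book.

```python
def cut_rod(prices, n):
    """Bottom-up rod cutting: r[j] = max over 1 <= i <= j of prices[i] + r[j - i].

    prices[i] is the (hypothetical) price of a piece of length i; prices[0] is unused.
    """
    r = [0] * (n + 1)  # r[0] = 0: a rod of length 0 earns nothing
    for j in range(1, n + 1):
        # Try every length i for the first piece, then reuse the optimum for the rest.
        r[j] = max(prices[i] + r[j - i] for i in range(1, j + 1))
    return r[n]


def matrix_chain_cost(p):
    """Minimum scalar multiplications for A_1 ... A_n, where A_i is p[i-1] x p[i].

    m[i][j] = 0 if i == j, otherwise
    m[i][j] = min over i <= k < j of m[i][k] + m[k+1][j] + p[i-1] * p[k] * p[j].
    """
    n = len(p) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]  # 1-indexed; row/column 0 unused
    for length in range(2, n + 1):             # solve shorter chains first
        for i in range(1, n - length + 2):
            j = i + length - 1
            # Split the chain A_i..A_j at every k and keep the cheapest split.
            m[i][j] = min(
                m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                for k in range(i, j)
            )
    return m[1][n]
```

For example, with `prices = [0, 1, 5, 8, 9]`, `cut_rod(prices, 4)` returns 10 (two pieces of length 2), and `matrix_chain_cost([10, 20, 30, 40])` returns 18000, from computing `(A1 A2) A3`. Both tables are filled so that every subproblem a cell depends on is already solved, which is the point of the recurrences.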

Sequence alignment and HMM

  • DEKM, Sections 2.1-2.3, up to “Local alignment”.
  • DEKM, Sections 3.2-3.3 (ignoring the last part, “Modeling of labeled sequences”), 11.6.
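Global alignment in DEKM Sections 2.1-2.3 is itself a dynamic program. Below is a minimal Python sketch of it, assuming a linear gap penalty. The match, mismatch, and gap scores are hypothetical illustration values rather than a real substitution matrix, and `global_align` is a made-up name.

```python
def global_align(x, y, match=1, mismatch=-1, gap=2):
    """Global alignment with a linear gap penalty (hypothetical scores).

    F[i][j] = max(F[i-1][j-1] + s(x_i, y_j), F[i-1][j] - gap, F[i][j-1] - gap),
    with F[i][0] = -i * gap and F[0][j] = -j * gap.
    Returns (score, aligned_x, aligned_y) for one optimal alignment.
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -i * gap  # x_1..x_i aligned entirely against gaps
    for j in range(1, m + 1):
        F[0][j] = -j * gap  # y_1..y_j aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,
                          F[i - 1][j] - gap,
                          F[i][j - 1] - gap)
    # Traceback from (n, m) to (0, 0), following any move that produced F[i][j].
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if x[i - 1] == y[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                a.append(x[i - 1]); b.append(y[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] - gap:
            a.append(x[i - 1]); b.append("-")  # gap in y
            i -= 1
        else:
            a.append("-"); b.append(y[j - 1])  # gap in x
            j -= 1
    return F[n][m], "".join(reversed(a)), "".join(reversed(b))
```

For example, `global_align("AC", "A")` returns `(-1, "AC", "A-")`: one match (+1) plus one gap (-2). The table is filled row by row, so each cell's three predecessors are always available. The traceback then recovers one alignment that attains the optimal score.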

NGS Data Analysis

Variant calling

Background

  • Genotype and SNP calling from next-generation sequencing data [Nielsen et al., NRG, 2011]

Paper

  • A framework for variation discovery and genotyping using next-generation DNA sequencing data, NG, 2011 (GATK)

ChIP-seq

Background

  • Computational methodology for ChIP-seq analysis, Quantitative Biology, 2013

Paper

  • Model-based Analysis of ChIP-Seq, Genome Biology, 2008 (MACS)

RNA-seq: transcriptome inference

Background

  • RNA-Seq: a revolutionary tool for transcriptomics, NRG, 2009

Paper

  • Statistical Inferences for Isoform Expression in RNA-Seq. Bioinformatics, 2009

RNA-seq: differential expression

Background

Paper

  • Small-sample estimation of negative binomial dispersion, with applications to SAGE data, Biostatistics, 2008 (edgeR)
  • Moderated statistical tests for assessing differences in tag abundance, Bioinformatics, 2007 (edgeR)

Regulatory Genomics

Gene regulatory networks

Background

  • Inferring Cellular Networks Using Probabilistic Graphical Models, Science, 2004

Paper

  • Using Bayesian Networks to Analyze Expression Data, Journal of Computational Biology, 2000

From sequence to gene expression

Background Paper

  • Integrating regulatory motif discovery and genome-wide expression analysis, PNAS, 2003
  • Genome-wide discovery of transcriptional modules from DNA sequence and gene expression. Bioinformatics, 2003

Epigenomics

Background

  • Mapping Human Epigenomes [Rivera & Ren, Cell, 2013]

Paper

  • Discovery and characterization of chromatin states for systematic annotation of the human genome, NBT, 2010 (ChromHMM)

Predicting transcription factor binding

Background Paper

  • Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data, Genome Research, 2011 (CENTIPEDE)

Molecular Evolution

Molecular evolution and comparative genomics

Background

  • DEKM Chapter 8, 8.1-8.3

Paper

  • Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res, 2005 (PhastCons)
  • Phylogenetic hidden Markov models. In R. Nielsen, ed., Statistical Methods in Molecular Evolution, 2005. (Technical version of the PhastCons paper)

Phylogeny reconstruction

Background

  • DEKM Section 8.4.
  • Bayesian Inference of Phylogeny and Its Impact on Evolutionary Biology, Science, 2001

Paper

  • Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees, MBE, 1999

Statistical Genetics

Methods for GWAS

Background

  • Bayesian statistical methods for genetic association studies. NRG, 2009

Paper

  • Imputation-based analysis of association studies: candidate regions and quantitative traits. PLoS Genetics, 2007

eQTL mapping

Background

Paper

  • A statistical framework for joint eQTL analysis in multiple tissues. PLoS Genetics, 2013

Integrative association mapping

Background

Paper

  • Sherlock: Detecting Gene-Disease Associations by Matching Patterns of Expression QTL and GWAS. AJHG, 2013
  • Joint analysis of functional genomic data and genome-wide association studies of 18 human traits. AJHG, 2014

Reference

  • Biological Sequence Analysis, Durbin et al. (DEKM)
  • Introduction to Algorithms, Cormen et al. (CLRS)