Computational genomics reading list (version 2016)
A list compiled by Xin He at UChicago in 2016.
- What are DNA sequence motifs? NBT, 2006
- How does DNA sequence motif discovery work? NBT, 2006
- Fitting a mixture model by expectation maximization to discover motifs in biopolymers, ISMB, 1994 (MEME)
- Eddy, Hidden Markov Models, Current Opinion in Structural Biology, 1996
- DEKM, Introduction in Chapter 5.
- DEKM Chapter 5, 5.1-5.4.
- Introduction to Algorithms, CLRS, 3rd edition. Sections: 15.1-15.3.
- Remark: focus on the main recurrence equations, (15.2) and (15.7). Pay less attention to technicalities such as running time analysis.
Sequence alignment and HMM
- DEKM, Sections 2.1-2.3, up to “Local alignment”.
- DEKM, Sections 3.2-3.3 (ignoring the last part, “Modeling of labeled sequences”), 11.6.
NGS Data Analysis
- Genotype and SNP calling from next-generation sequencing data [Nielsen & Song, NRG, 2011]
- A framework for variation discovery and genotyping using next-generation DNA sequencing data, NG, 2011 (GATK)
- Computational methodology for ChIP-seq analysis, Quant Biology, 2013
- Model-based Analysis of ChIP-Seq, Genome Bio, 2008 (MACS)
RNA-seq: transcriptome inference
- RNA-Seq: a revolutionary tool for transcriptomics, NRG, 2009
- Statistical Inferences for Isoform Expression in RNA-Seq. Bioinformatics, 2009
RNA-seq: differential expression
- Small-sample estimation of negative binomial dispersion, with applications to SAGE data, Biostatistics, 2008 (edgeR)
- Moderated statistical tests for assessing differences in tag abundance, Bioinformatics, 2007 (edgeR)
Gene regulatory networks
- Inferring Cellular Networks Using Probabilistic Graphical Models, Science, 2004
- Using Bayesian Networks to Analyze Expression Data, J Computational Biology, 2000
From sequence to gene expression
- Integrating regulatory motif discovery and genome-wide expression analysis, PNAS, 2003
- Genome-wide discovery of transcriptional modules from DNA sequence and gene expression. Bioinformatics, 2003
- Mapping Human Epigenomes [Rivera & Ren, Cell, 2013]
- Discovery and characterization of chromatin states for systematic annotation of the human genome, NBT, 2010 (ChromHMM)
Predicting transcription factor binding
- Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data, Genome Research, 2011 (CENTIPEDE)
Molecular evolution and comparative genomics
- DEKM Chapter 8, 8.1-8.3
- Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res, 2005 (PhastCons)
- Phylogenetic hidden Markov models. In R. Nielsen, ed., Statistical Methods in Molecular Evolution, 2005. (Technical version of the PhastCons paper)
- DEKM Chapter 8.4.
- Bayesian Inference of Phylogeny and Its Impact on Evolutionary Biology, Science, 2001
- Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees, MBE, 1999
Methods for GWAS
- Bayesian statistical methods for genetic association studies. NRG, 2009
- Imputation-based analysis of association studies: candidate regions and quantitative traits. PLoS Genetics, 2007
- A statistical framework for joint eQTL analysis in multiple tissues. PLoS Genetics, 2013
Integrative association mapping
- Sherlock: Detecting Gene-Disease Associations by Matching Patterns of Expression QTL and GWAS. AJHG, 2013
- Joint analysis of functional genomic data and genome-wide association studies of 18 human traits. AJHG, 2014
- Biological Sequence Analysis, Durbin et al (DEKM)
- Introduction to Algorithms, Cormen et al (CLRS)