
Manhattan and Hudson Plots from Gao Wang's office, 12/06/2023 and 09/08/2022


Welcome to the StatFunGen Lab wiki

This wiki site is developed to share useful information with our lab members and collaborators.

Our Laboratory of Statistical Functional Genomics (#StatFunGen) in the Gertrude H. Sergievsky Center at Columbia University aims to understand the molecular mechanisms of complex trait biology by studying the molecular and cellular functional impacts of genetic variation at population scale. We use statistical and computational approaches to integrate population-scale genomics, epigenomics, transcriptomics, proteomics, and metabolomics data. Our work spans a range of subjects at the intersection of genetics, statistics, bioinformatics, machine learning, and neurological sciences. We are particularly interested in developing and applying advanced computational methods to address significant problems in our field. Much of our research therefore involves the thoughtful analysis of large-scale ‘omics data to inspire new hypotheses on the molecular mechanisms of disease etiology, or to motivate the development of novel computational methodology.

One focus of our work is the statistical analysis of molecular quantitative trait loci (QTL). We use statistical approaches to quantify the genetic regulation of molecular traits in a large number of cells and tissues collected from aging brains, cerebrospinal fluid, and blood. This allows us to build a detailed landscape of functional genomic regulation in the brain and central nervous system (CNS). Working closely with multiple research consortia worldwide, we study the QTL of various molecular phenotypes, including chromatin accessibility, histone acetylation, transcription factor binding activity, methylation, small RNAs, gene expression, alternative polyadenylation, alternative splicing, proteomics, lipidomics, and metabolomics.

Another focus of our research is the computational discovery of novel regulatory genomics mechanisms. We use existing sequencing and other assay data, along with new computational methods, to uncover new molecular phenotypes. For example, we quantify novel spatial patterns of histone acetylation and methylation QTL. We also develop new methods for generating alternative polyadenylation data. We investigate mechanisms of genomic regulation through the development and application of improved computational methods in statistical modeling and machine learning for fine-mapping, colocalization, functional variant annotation and prediction, epigenome and transcriptome-wide association, and gene-set and pathway discovery.

We also investigate the context-specific regulation of functional genomic features. By integrating the growing collection of molecular QTL resources with molecular data from minority population samples across the country, we aim to uncover differential genetic regulation at the variant, gene, and pathway levels. We study the heterogeneity of cross-population QTL, and use the similarities between populations to improve the prediction of molecular phenotypes in minority populations.

Furthermore, we use functional genomics knowledge to study neurodegenerative disorders. We integrate whole-genome sequences, brain and CNS QTL data, and a broad range of epigenomic data to study Alzheimer’s disease (AD). We develop and apply multi-omics integration methods to understand how regulation across ‘omics layers acts in synergy to confer disease risk. We integrate functional genomic data to aid in the discovery of rare and/or non-coding genetic variants (including copy number variations). We also develop approaches for profiling AD families and patients using genetic, functional genomic, and pathology information at all levels to better characterize and uncover the heterogeneity within AD. Finally, we study the genetic comorbidity between AD and other complex traits in diverse cohorts.

An important aspect of our group’s mission is the development of functional genomics data, computational methods, and resources. We develop data resources and user-friendly computational tools, including software packages and bioinformatics pipelines, both for the novel methods we develop and for existing methods in the field that we have assessed and trust. Our aim is to enhance the analysis and interpretation of large-scale ‘omics data to understand neurological disorders. Additionally, we are committed to education and outreach to nurture a vibrant and collaborative research community.

We nurture a work environment that values the innovation, curiosity, and enjoyment of our lab members. If you are interested in working with us to advance both scientific research and individual growth, feel free to reach out to Gao Wang for information on possible openings.