Course computing environment setup
This document describes how to set up the computing environment needed to run our hands-on statistical genetics tutorials, which are available from GitHub.
The instructions below have been tested on Mac and Linux computers. We currently do not support running directly under Windows, although the setup does work on a Windows machine running the Windows Subsystem for Linux (WSL). Windows users should first follow the WSL installation instructions to configure their system, then return to the installation instructions below.
To open the JupyterLab server on Windows, we recommend a modern web browser, e.g. Edge rather than Internet Explorer.
Install relevant software environment
All software (with the exception of ANNOVAR) has been packaged and distributed as conda packages.
Work with existing conda setup
If you already use conda environments to handle package management, you can create a single course-specific computing environment containing the necessary R packages, Python packages, and some additional executables.
The instructions below assume you use conda commands to manage packages, although the same applies if you use micromamba instead: in most cases, just replace conda with micromamba in the commands below. Because conda is very slow at resolving and managing complex dependencies, we strongly suggest using micromamba, which is much faster, as a replacement for conda in your own daily work.
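As a minimal sketch of the "replace conda with micromamba" advice, the snippet below picks whichever of the two tools is found on your PATH, preferring micromamba. The helper name pick_manager and the variable PKG_MGR are hypothetical and not part of the course scripts.

```shell
# Print the first available package manager, preferring micromamba.
# `pick_manager` is a hypothetical helper, not part of the course scripts.
pick_manager() {
  for tool in micromamba conda; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      return 0
    fi
  done
  echo "none"
  return 1
}

PKG_MGR=$(pick_manager) || echo "No conda-compatible manager found on PATH" >&2
echo "Using: $PKG_MGR"
```

Note that a few subcommands still differ between the two tools (for example the channel configuration command shown below), so a plain substitution of the program name does not cover every case.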
To do so, create a dedicated environment for the course called cumc_statgen and install the R and Python packages as follows:
conda config --prepend channels bioconductor && conda config --prepend channels conda-forge && conda config --prepend channels dnachun # for micromamba it should be `prepend` instead of `--prepend`
curl -O https://raw.githubusercontent.com/cumc/handson-tutorials/main/setup/docker/r_libs.yml && conda env create -n cumc_statgen -f r_libs.yml && rm -f r_libs.yml
curl -O https://raw.githubusercontent.com/cumc/handson-tutorials/main/setup/docker/python_libs.yml && conda env update -n cumc_statgen -f python_libs.yml && rm -f python_libs.yml
Then activate the environment using
conda activate cumc_statgen
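To confirm that the activation worked, one option is to inspect the CONDA_DEFAULT_ENV variable, which conda exports on activation (micromamba is assumed here to do the same). The helper name env_is_active is hypothetical.

```shell
# Succeed only when the named environment is the active one.
# Assumes the package manager exports CONDA_DEFAULT_ENV on activation.
# `env_is_active` is a hypothetical helper name.
env_is_active() {
  [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if env_is_active cumc_statgen; then
  echo "cumc_statgen is active"
else
  echo "cumc_statgen is not active; run: conda activate cumc_statgen"
fi
```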
Finally, add some extra executable programs to it
curl -fsSL https://raw.githubusercontent.com/gaow/misc/master/bash/pixi/global_packages.txt | grep -vE "micromamba|python|r-base" > global_packages.txt && \
conda install --file global_packages.txt -y && rm -f global_packages.txt
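The filtering step in the command above can be tried offline. The sketch below applies the same exclusion pattern to a made-up package list; the package names are placeholders, not the contents of the real global_packages.txt. It uses grep -E rather than -P because the BSD grep shipped with macOS does not support -P, and for this simple alternation the two behave the same.

```shell
# Drop any line that contains micromamba, python, or r-base.
# `filter_packages` is a hypothetical helper name.
filter_packages() {
  grep -vE "micromamba|python|r-base"
}

# Placeholder package list standing in for the downloaded file.
printf '%s\n' micromamba python r-base samtools bcftools | filter_packages
```

Because the pattern is not anchored, any line that merely contains one of the three names is removed as well.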
To run the tutorials you always need to activate this environment first.
Start from scratch
If you have never worked with a Python package management tool such as conda (or any of miniconda, micromamba, mamba, pixi), or would like to start from scratch, this document provides a quick way to set up a production conda environment using pixi and micromamba, installing tools such as R, Python 3, and JupyterLab. The document is written for setting up the software environment on a high performance computing (HPC) cluster, although the exact same setup should also apply to macOS, a Linux PC, or a Windows PC with WSL installed. Please follow the instructions up to and including the section “Basic software environment and manager setup”, skipping the section “A note for Columbia Neurology HPC users” unless you are from Columbia Neurology.
Be aware that this will make changes to your local computing environment, including adding extra lines of shell environment configuration, such as export PATH commands, to your shell configuration file, and .libPath to ~/.Rprofile. For novices whose computer is not yet configured with these tools, using our setup is usually harmless. Experienced developers, however, may prefer to install all required software from conda themselves, in a way that best fits their computing environment.
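Since the setup edits your shell configuration file and ~/.Rprofile, one cautious option is to copy those files aside first. This is a minimal sketch only; the helper name backup_if_exists and the list of files are assumptions, and your shell may use a different configuration file.

```shell
# Copy each existing file to <file>.bak.<timestamp>; missing files are skipped.
# `backup_if_exists` is a hypothetical helper name.
backup_if_exists() {
  stamp=$(date +%Y%m%d%H%M%S)
  for f in "$@"; do
    if [ -f "$f" ]; then
      cp -p "$f" "$f.bak.$stamp" && echo "backed up $f"
    fi
  done
}

# Assumed candidates; adjust to the files your shell actually reads.
backup_if_exists "$HOME/.bashrc" "$HOME/.zshrc" "$HOME/.Rprofile"
```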
Then, with this working conda setup in place, you will need to install additional R packages, Python packages, and executables using the configuration files provided in the section “Work with existing conda setup” above. Specifically, add the additional executables via:
curl -fsSL https://raw.githubusercontent.com/cumc/handson-tutorials/main/setup/docker/global_packages | \
while read i; do pixi global install $i; done
and R/Python packages via:
curl -O https://raw.githubusercontent.com/cumc/handson-tutorials/main/setup/docker/r_libs.yml && micromamba install -n r_libs --file r_libs.yml -y && rm -f r_libs.yml
curl -O https://raw.githubusercontent.com/cumc/handson-tutorials/main/setup/docker/python_libs.yml && micromamba install -n python_libs --file python_libs.yml -y && rm -f python_libs.yml
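Before running the executable-install loop shown earlier in this section, it can help to preview what it would do. The sketch below prints one pixi global install command per package instead of running it. The helper name plan_installs is hypothetical, the sample package names are placeholders, and skipping blank or '#' lines is an assumption about the list format.

```shell
# Print, rather than run, one `pixi global install` command per input line.
# Blank lines and lines starting with '#' are skipped (assumed list format).
# `plan_installs` is a hypothetical helper name.
plan_installs() {
  while IFS= read -r pkg; do
    case "$pkg" in
      ''|'#'*) continue ;;
    esac
    echo "pixi global install $pkg"
  done
}

# Placeholder input standing in for the downloaded package list.
printf '%s\n' samtools '' '# a comment' bcftools | plan_installs
```

Piping the curl output from the real command into plan_installs instead of the install loop gives the same preview for the actual list.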
Install statgen-setup script
Please download this statgen-setup script and save it to your computer with the filename statgen-setup. Then open your command terminal, use the cd command to navigate to where the file was saved (on macOS this is ~/Downloads by default), and run
chmod +x statgen-setup
to make the script executable. You should now be able to run it in the command terminal as ./statgen-setup from the directory it was downloaded to. Please test it by typing ./statgen-setup -h to print the help information for the script.
You can also move this script to a folder on your shell PATH so that you can run it simply as statgen-setup, without typing a path prefix such as ./ first. One possibility is to install it where the sos program is installed. To do so, first type which sos to see the path where sos is installed. Then move the statgen-setup script to that same directory (either via the mv command in the terminal, or by cutting and pasting it through your operating system's file manager).
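A minimal sketch of the PATH check described above: it reports the directory that holds sos (if sos is installed) and tests whether a given directory is one of your PATH entries. The helper name dir_on_path is hypothetical, and nothing is moved by this snippet.

```shell
# Succeed when directory $1 is one of the colon-separated entries of PATH.
# `dir_on_path` is a hypothetical helper name.
dir_on_path() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Report where statgen-setup could be moved; this does not move anything.
if sos_path=$(command -v sos); then
  echo "Could move statgen-setup to: $(dirname "$sos_path")"
else
  echo "sos not found on PATH"
fi

if dir_on_path "$HOME/.pixi/bin"; then
  echo "$HOME/.pixi/bin is on PATH"
fi
```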
Launching tutorials
To launch the environment to run the tutorials, please run from the command terminal:
./statgen-setup serve --no-docker
You will see in the terminal that the script is downloading the latest tutorial files. When it completes, a line containing a URL is printed on the screen. Please copy that URL into your web browser; your JupyterLab server should then start.
At this point, please open a Terminal under Other in the launcher window, and run
get-data
This downloads all the data required to run the tutorials. The get-data command may take a while to finish. Please wait until it has completed before using the tutorials.
All data and tutorials will be downloaded to the same folder where you ran statgen-setup serve.
A note on ANNOVAR
We can neither pre-install the ANNOVAR program in our container nor redistribute it, due to license restrictions. Exercises involving ANNOVAR annotations will not work unless you manually install ANNOVAR into one of the folders in your shell $PATH. Please find ANNOVAR software here. You can download and decompress ANNOVAR to ~/statgen_course_$USER, which you should be able to access within the JupyterLab server you launched for the tutorials. You can then follow the ANNOVAR user guide to install it yourself.
For example, if you followed our “start from scratch” section, you should have $HOME/.pixi/bin available as part of your $PATH. Suppose you have registered at the ANNOVAR website and have a direct download link of the following form (but not exactly this link):
https://www.openbioinformatics.org/annovar/download/0wgxR2rIVP/annovar.latest.tar.gz
You can simply install it through a one-liner,
curl https://www.openbioinformatics.org/annovar/download/0wgxR2rIVP/annovar.latest.tar.gz -o - | tar zxvf - --strip-components=1 -C $HOME/.pixi/bin
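The --strip-components=1 option in the one-liner removes the leading directory from every path in the archive, so the ANNOVAR scripts land directly in the target folder rather than in a subfolder. The sketch below shows the effect offline with a small stand-in archive. That the real archive has a single top-level annovar/ directory is an assumption here, and the temporary paths are placeholders.

```shell
# Build a small stand-in archive containing annovar/annotate_variation.pl.
work=$(mktemp -d)
mkdir -p "$work/src/annovar" "$work/bin"
echo '#!/usr/bin/env perl' > "$work/src/annovar/annotate_variation.pl"
tar -czf "$work/annovar.latest.tar.gz" -C "$work/src" annovar

# Extract with the leading "annovar/" path component removed.
tar -xzf "$work/annovar.latest.tar.gz" --strip-components=1 -C "$work/bin"

# The script now sits directly in the target directory.
ls "$work/bin"
```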