Course computing environment setup

This document explains how to set up the computing environment needed to run our hands-on statistical genetics tutorials, which are available from GitHub.

The instructions below have been tested on Mac and Linux computers. We currently do not support running directly under Windows, but the setup works on Windows through the Windows Subsystem for Linux (WSL). Windows users should first follow the WSL installation instructions to configure their system, then return to the installation instructions below.
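If you are unsure whether a given terminal is running inside WSL, one common heuristic is to look for the string "microsoft" in the kernel version reported by /proc/version. The sketch below assumes that convention holds on your system; the helper name is_wsl is made up for this example and is not part of the course tooling.

```shell
# Heuristic sketch: WSL kernels usually report "microsoft" in /proc/version.
# is_wsl is a hypothetical helper name, not part of the course tooling.
is_wsl() {
  # An optional first argument points at a different file, which helps testing.
  version_file="${1:-/proc/version}"
  [ -r "$version_file" ] && grep -qi microsoft "$version_file"
}

if is_wsl; then
  echo "Running under WSL"
else
  echo "Not running under WSL (or /proc/version is unavailable)"
fi
```

On macOS, /proc/version does not exist, so the check simply reports that WSL was not detected.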

To open the JupyterLab server on Windows, we recommend a modern web browser, e.g. Edge rather than Internet Explorer.

Install relevant software environment

All software, with the exception of ANNOVAR, has been packaged and distributed as conda packages.

Work with existing conda setup

If you are familiar with conda environments and already use them for package management, you can create a single course-specific computing environment containing the necessary R packages, Python packages, and some additional executables.

The instructions below assume you use conda commands to manage packages. The same steps apply if you use micromamba instead: in most cases, simply replace conda with micromamba in the commands below. Because conda can be very slow at resolving complex dependencies, we strongly suggest micromamba, which is much faster, as a replacement for conda in your own daily work.
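As a minimal sketch of the "replace conda with micromamba" advice, the snippet below picks whichever manager is installed and stores its name in a variable. PKG_MGR is a hypothetical variable name used only for illustration.

```shell
# Sketch: prefer micromamba when it is installed, otherwise fall back to conda.
# PKG_MGR is a hypothetical variable name used only in this example.
if command -v micromamba >/dev/null 2>&1; then
  PKG_MGR=micromamba
else
  PKG_MGR=conda
fi
echo "Using package manager: $PKG_MGR"
```

Commands that share the same syntax in both tools can then be written once with "$PKG_MGR" in place of the tool name. The channel configuration step is an exception, since micromamba expects `prepend` where conda expects `--prepend`.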

You can create a dedicated environment for the course, called cumc_statgen, and install the R and Python packages as follows:

conda config --prepend channels bioconductor && conda config --prepend channels conda-forge && conda config --prepend channels dnachun # for micromamba it should be `prepend` instead of `--prepend`
curl -O https://raw.githubusercontent.com/cumc/handson-tutorials/main/setup/docker/r_libs.yml && conda env create -n cumc_statgen -f r_libs.yml && rm -f r_libs.yml
curl -O https://raw.githubusercontent.com/cumc/handson-tutorials/main/setup/docker/python_libs.yml && conda env update -n cumc_statgen -f python_libs.yml && rm -f python_libs.yml

Then activate the environment:

conda activate cumc_statgen

Finally, add some extra executable programs to it:

curl -fsSL https://raw.githubusercontent.com/gaow/misc/master/bash/pixi/global_packages.txt | grep -vE "micromamba|python|r-base" > global_packages.txt && \
  conda install --file global_packages.txt -y && rm -f global_packages.txt
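To see what the grep filter in this step does, the sketch below applies the same pattern to a small made-up list. The package names are illustrative only and are not the contents of the real global_packages.txt. It uses the extended-regex flag -E, which behaves the same as a Perl-style regex for this simple pattern and is also available in the grep shipped with macOS.

```shell
# Sketch of what the filter does, using a made-up package list.
# The package names below are illustrative, not the contents of the real file.
filter_packages() {
  # -v inverts the match; -E enables the | alternation.
  grep -vE "micromamba|python|r-base"
}

printf '%s\n' micromamba python r-base bcftools plink2 | filter_packages
# prints:
# bcftools
# plink2
```

The pattern is matched as a substring, so any line that merely contains "python" (for example a hypothetical "ipython" entry) would be dropped as well.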

To run the tutorials, you always need to activate this environment first.
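A quick way to confirm that the environment is active is to inspect the CONDA_DEFAULT_ENV variable, which conda exports on activation. The sketch below assumes that variable is set by your activation step; require_env is a hypothetical helper name.

```shell
# Sketch: warn when the course environment is not the active one.
# Assumes the activation step exports CONDA_DEFAULT_ENV, as conda does.
# require_env is a hypothetical helper name.
require_env() {
  wanted="$1"
  if [ "${CONDA_DEFAULT_ENV:-}" = "$wanted" ]; then
    echo "Environment '$wanted' is active"
  else
    echo "Please run: conda activate $wanted" >&2
    return 1
  fi
}

# "|| true" keeps the shell session alive when the check fails.
require_env cumc_statgen || true
```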

Start from scratch

If you have never used conda or similar package management tools (miniconda, micromamba, mamba, pixi), or would like to start from scratch, this document provides a quick way to set up a production conda environment using pixi and micromamba, installing tools such as R, Python 3, and JupyterLab. It is written for setting up the software environment on a high performance computing (HPC) cluster, although the exact same setup should also apply to macOS, a Linux PC, or a Windows PC with WSL installed. Please follow the instructions up until the section “Basic software environment and manager setup”, skipping the section “A note for Columbia Neurology HPC users” unless you are from Columbia Neurology.

Be aware that this will change your local computing environment, including adding shell configuration lines such as export PATH commands to your shell configuration file, and .libPaths settings to ~/.Rprofile. For novices whose computer is not yet configured with these tools, using our setup is usually harmless. Experienced developers may instead prefer to install all required software from conda themselves, in a way that best fits their computing environment.

Then, with this working conda setup in place, you will need to install additional R packages, Python packages, and executables using the configuration files referenced in the section “Work with existing conda setup” above. Specifically, add the additional executables via:

curl -fsSL https://raw.githubusercontent.com/cumc/handson-tutorials/main/setup/docker/global_packages | \
 while read -r i; do pixi global install "$i"; done
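Before running the loop for real, it can help to preview which pixi commands it would issue. The sketch below is a dry run that only prints the commands; the package names fed into it are made up, and dry_run_install is a hypothetical helper name.

```shell
# Sketch: print the pixi commands that the loop would run, without running them.
# The package names fed in below are made up for illustration.
dry_run_install() {
  while IFS= read -r pkg; do
    # Skip blank lines so no empty install command is produced.
    [ -n "$pkg" ] || continue
    echo "pixi global install $pkg"
  done
}

printf '%s\n' bcftools "" plink2 | dry_run_install
# prints:
# pixi global install bcftools
# pixi global install plink2
```

To preview the real list, pipe the output of the curl command above into dry_run_install instead of the printf line.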

and R/Python packages via:

curl -O https://raw.githubusercontent.com/cumc/handson-tutorials/main/setup/docker/r_libs.yml && micromamba install -n r_libs --file r_libs.yml -y && rm -f r_libs.yml
curl -O https://raw.githubusercontent.com/cumc/handson-tutorials/main/setup/docker/python_libs.yml && micromamba install -n python_libs --file python_libs.yml -y && rm -f python_libs.yml

Install statgen-setup script

Please download this statgen-setup script and save it to your computer with the filename statgen-setup. Then open your command terminal, use the cd command to navigate to the folder where the file was saved (on macOS this is ~/Downloads by default), and run

chmod +x statgen-setup

to make this script executable. You should now be able to run this script in the command terminal as ./statgen-setup from the directory it is downloaded to. Please test it by typing ./statgen-setup -h to output the help information for this script.

You can also move this script to a folder on your shell PATH so that you can run it simply as statgen-setup, without typing a path prefix such as ./. One option is to install it where the sos program is installed. To do so, first type which sos to see the path where sos is installed. Then move the statgen-setup script to that same directory, either with the mv command in the terminal or by cutting and pasting it in your operating system's file manager.
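The following sketch shows one way to locate the sos directory and to confirm that a candidate directory is actually listed in PATH. The helper name dir_on_path is hypothetical, and the snippet only reports what it finds; it does not move any files.

```shell
# Sketch: test whether a directory is listed in PATH.
# dir_on_path is a hypothetical helper name.
dir_on_path() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Report where sos lives, if it is installed at all.
if sos_path=$(command -v sos); then
  echo "sos lives in: $(dirname "$sos_path")"
else
  echo "sos was not found on PATH"
fi

# Check an arbitrary candidate directory before moving the script there.
if dir_on_path "$HOME/bin"; then
  echo "$HOME/bin is on PATH"
else
  echo "$HOME/bin is not on PATH"
fi
```

Here $HOME/bin is only an example candidate; substitute the directory reported for sos or any other directory you prefer.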

Launching tutorials

To launch the environment to run the tutorials, please run from the command terminal:

./statgen-setup serve --no-docker

You will see in the terminal that the script is downloading the latest tutorial files.

When it completes, you should see a line printed on the screen that contains a URL.

Please copy that URL into your web browser. Your JupyterLab server should then start and show the launcher window.

At this point, please open a Terminal under Other in the launcher window, and run

get-data

This downloads all the data required to run the tutorials. It may take a while for the get-data command to finish. Please wait until it has completed before using the tutorials.

All data and tutorials will be downloaded to the same folder where you ran statgen-setup serve.

A note on ANNOVAR

Due to license restrictions, we can neither pre-install ANNOVAR in our container nor redistribute it. Exercises involving ANNOVAR annotations will not work unless you manually install ANNOVAR to one of the folders in your shell $PATH. Please find the ANNOVAR software here. You can download and decompress ANNOVAR to ~/statgen_course_$USER, which you should be able to access within the JupyterLab server you launched for the tutorials. You can then follow the ANNOVAR user guide to install it yourself.

For example, if you followed our “start from scratch” section, you should have $HOME/.pixi/bin available as part of your $PATH. Suppose you have registered on the ANNOVAR website and have a direct download link of the following form (but not exactly this link):

https://www.openbioinformatics.org/annovar/download/0wgxR2rIVP/annovar.latest.tar.gz

You can then install it with a one-liner:

curl https://www.openbioinformatics.org/annovar/download/0wgxR2rIVP/annovar.latest.tar.gz -o - | tar zxvf - --strip-components=1 -C $HOME/.pixi/bin
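The --strip-components=1 option in this one-liner drops the top-level folder of the archive, so the files land directly in the target directory. The sketch below illustrates that behavior on a small throwaway archive. It assumes the real tarball likewise has a single top-level folder; the file name demo_tool.pl is made up.

```shell
# Sketch: show how --strip-components=1 drops the archive's top-level folder.
# Everything happens in a throwaway directory; the file names are made up.
workdir=$(mktemp -d)
mkdir -p "$workdir/src/annovar" "$workdir/dest"
echo 'print "demo\n";' > "$workdir/src/annovar/demo_tool.pl"

# Build a small archive whose entries all start with "annovar/".
tar -czf "$workdir/demo.tar.gz" -C "$workdir/src" annovar

# Extract from stdin and strip one leading path component, as in the one-liner.
tar -xzf - --strip-components=1 -C "$workdir/dest" < "$workdir/demo.tar.gz"

# The file sits directly in dest, with no annovar/ folder around it.
ls "$workdir/dest"

# Remove the throwaway directory when done:
# rm -rf "$workdir"
```

Applied to the real download, this means the ANNOVAR scripts end up directly inside $HOME/.pixi/bin, which is what makes them reachable through $PATH.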