Course computing environment setup

This document describes how to set up the computing environment needed to run our hands-on statistical genetics tutorials, which are available from GitHub.

The instructions below have been tested on Mac and Linux computers. We do not currently support running directly under Windows, although the setup should still work on a Windows machine running the Windows Subsystem for Linux (WSL). Windows users can follow the WSL installation instructions to configure their system before returning to the installation instructions below.
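If you are unsure which of these cases applies to your terminal, a small check like the one below can help. It is only a sketch: `detect_platform` is a hypothetical helper name, and the WSL test relies on the heuristic that WSL kernels usually mention "microsoft" in `/proc/version`.

```shell
# Report which kind of system this shell is running on.
# Prints one of: macOS, WSL, Linux, other
detect_platform() {
    case "$(uname -s)" in
        Darwin) echo "macOS" ;;
        Linux)
            # Heuristic: WSL kernels typically mention "microsoft" in /proc/version
            if grep -qi microsoft /proc/version 2>/dev/null; then
                echo "WSL"
            else
                echo "Linux"
            fi
            ;;
        *) echo "other" ;;
    esac
}

detect_platform
```

If this prints "other", you are probably in a native Windows shell and should set up WSL first.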

To open the JupyterLab server on Windows, we recommend using a modern web browser, e.g. Edge instead of Internet Explorer.

Install relevant software environment

All software (with the exception of ANNOVAR) has been packaged and distributed as conda packages. We recommend using pixi and micromamba to manage conda packages.

Option 1: Start from scratch

Caution: this will make changes to your local computing environment. It adds shell configuration lines, such as export PATH commands, to your shell configuration file, and a .libPaths setting to ~/.Rprofile. You will also be using Python and R from this installation rather than your system’s defaults, if they exist. For novice users whose computer is not already configured with these tools, our setup is usually harmless. Experienced developers should review the material below first and adopt it with caution, or see the next section, “Install to existing software environment”.
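Because the setup edits your shell configuration and ~/.Rprofile, you may want to keep copies of those files first. The snippet below is a minimal sketch under assumptions: `backup_if_exists` is a hypothetical helper, and the file names are common candidates rather than a confirmed list of what the setup script touches.

```shell
# Copy each existing file to <file>.bak.<timestamp>; silently skip missing ones.
backup_if_exists() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            cp -p "$f" "$f.bak.$(date +%Y%m%d%H%M%S)"
            echo "backed up: $f"
        fi
    done
}

# Assumed candidates; adjust to the shell you actually use.
backup_if_exists "$HOME/.bashrc" "$HOME/.zshrc" "$HOME/.Rprofile"
```

To undo the setup later, you could copy a backup over the modified file.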

If you have never worked with Python package management tools such as conda (or miniconda, micromamba, mamba, pixi), or would like to start from scratch, this document provides a quick way to set up a production conda environment using pixi and micromamba, installing tools such as R, Python 3, and JupyterLab. The document is written for setting up the software environment on a high performance computing (HPC) cluster, although the same setup should also apply to macOS, Linux PCs, and Windows PCs with WSL installed. Please follow the instructions up until the section “Basic software environment and manager setup”, skipping the section “A note for Columbia Neurology HPC users” unless you are from Columbia Neurology.

Next, install the R packages, Python packages, and additional executables needed for the course.

To do so, first add executables via:

pixi global install $(curl -fsSL https://raw.githubusercontent.com/cumc/handson-tutorials/main/setup/docker/global_packages.txt | tr '\n' ' ')

then add R/Python packages via:

curl -O https://raw.githubusercontent.com/cumc/handson-tutorials/main/setup/docker/r_libs.yml && micromamba install -n r_libs --file r_libs.yml -y && rm -f r_libs.yml
curl -O https://raw.githubusercontent.com/cumc/handson-tutorials/main/setup/docker/python_libs.yml && micromamba install -n python_libs --file python_libs.yml -y && rm -f python_libs.yml
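Once the commands above have finished, a quick check that the main tools are visible on your PATH can save confusion later. This is a minimal sketch and not part of the official setup: `check_tools` is a hypothetical helper, and the command names listed are an assumption based on the tools mentioned above (pixi, micromamba, R, Python 3, JupyterLab).

```shell
# Print whether each named command can be found on $PATH.
# Returns non-zero if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

check_tools pixi micromamba R python3 jupyter-lab \
    || echo "Some tools are not on PATH yet; opening a new terminal may help."
```

Since the setup adds export PATH lines to your shell configuration file, a tool reported as missing may simply need a fresh shell session.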

Option 2: Install to existing software environment

If you are familiar with conda environments and already use them for package management, you can create a single course-specific computing environment containing the necessary R packages, Python packages, and some additional executables.

The instructions below assume you use conda commands to manage packages, although the same applies if you use micromamba instead: mostly just replace conda with micromamba in the commands below. Since conda is extremely slow at resolving and managing complex dependencies, we strongly suggest using micromamba, which is a lot faster, as a replacement for conda in your own daily work.

To do so, you can create a dedicated environment for the course called cumc_statgen and install R and Python packages as follows:

conda config --prepend channels bioconda && conda config --prepend channels conda-forge && conda config --prepend channels dnachun # for micromamba it should be `prepend` instead of `--prepend`
curl -O https://raw.githubusercontent.com/cumc/handson-tutorials/main/setup/docker/r_libs.yml && conda env create -n cumc_statgen -f r_libs.yml && rm -f r_libs.yml
curl -O https://raw.githubusercontent.com/cumc/handson-tutorials/main/setup/docker/python_libs.yml && conda env update -n cumc_statgen -f python_libs.yml && rm -f python_libs.yml

Then activate the environment using

conda activate cumc_statgen

and add one package that is essential but does not fit well in the python_libs.yml file, which mainly serves to install libraries from scratch:

conda install "jupyter_client>=8.0.1" -c conda-forge
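A version specifier that contains `>`, such as `jupyter_client>=8.0.1`, is safest written in quotes on the command line. Without quotes the shell treats `>` as an output redirect, so the version constraint never reaches the installer. The short demonstration below uses `echo` in place of the real install command and runs in a temporary directory.

```shell
# Use a scratch directory so the stray file is easy to see and clean up.
scratch="$(mktemp -d)"
(
    cd "$scratch" || exit 1

    # Unquoted: parsed as `echo jupyter_client > "=8.0.1"`,
    # so the output lands in a file named "=8.0.1".
    echo jupyter_client>=8.0.1
    ls

    # Quoted: the whole specifier is passed as a single argument.
    echo "jupyter_client>=8.0.1"
)
```

The `ls` output shows the unintended file `=8.0.1`; only the quoted form prints the full specifier.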

Finally, add some extra executable programs. We suggest using pixi, which makes installation a lot smoother than conda:

curl -fsSL https://raw.githubusercontent.com/gaow/misc/master/bash/pixi/pixi-setup.sh | bash 
pixi global install $(curl -fsSL https://raw.githubusercontent.com/cumc/handson-tutorials/main/setup/docker/global_packages.txt | grep -vE "micromamba|python|r-base|jupyter|notebook" | tr '\n' ' ')

Note that we need to exclude Python, R, and Jupyter Notebook from the executables to be installed; otherwise your system would use the pixi-installed versions of those programs, causing a conflict with your existing environment.
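To see what the exclusion filter does, you can run it on a small list. The package names below are a hypothetical sample, not the actual contents of global_packages.txt. The sketch uses `grep -E`, which behaves the same as `-P` for this pattern and is also available in the default macOS grep.

```shell
# Hypothetical package list, one name per line.
sample_list='bcftools
python
plink
r-base
jupyterlab
micromamba
samtools'

# Drop every line matching an excluded name, then join the rest with spaces.
printf '%s\n' "$sample_list" \
    | grep -vE "micromamba|python|r-base|jupyter|notebook" \
    | tr '\n' ' '
# prints: bcftools plink samtools
```

The pattern matches substrings, so any package whose name merely contains "python" or "jupyter" is excluded as well.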

To run the tutorials you always need to activate this environment first.
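One way to confirm that the environment is active before launching anything is to inspect `CONDA_DEFAULT_ENV`, which conda exports on activation (micromamba is expected to set it too). This is a sketch; `require_env` is a hypothetical helper name.

```shell
# Succeed only if the named conda environment is the active one.
require_env() {
    if [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]; then
        echo "environment '$1' is active"
    else
        echo "environment '$1' is not active (current: ${CONDA_DEFAULT_ENV:-none})" >&2
        return 1
    fi
}

require_env cumc_statgen || echo "run: conda activate cumc_statgen"
```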

Launch tutorials

To launch the environment to run the tutorials, please run from the command terminal:

jupyter-lab

You should see a line printed on the screen that contains a URL:

[image: terminal output containing the JupyterLab URL]

Please copy that URL to your web browser. Your JupyterLab server should start like this:

[image: JupyterLab launcher window]

At this point, please open a Terminal under Other in the launcher window, and run

curl -fsSL https://raw.githubusercontent.com/cumc/handson-tutorials/main/setup/course_entrypoint.sh | bash

This downloads all the data required to run the tutorials, which may take a while. Please wait until this command has completed before using the tutorials.

All data and tutorials will be downloaded to the same folder where you ran jupyter-lab.

A note on ANNOVAR

Due to license restrictions, we can neither pre-install ANNOVAR in our container nor redistribute it. Exercises involving ANNOVAR annotations will not work unless you manually install ANNOVAR to one of the folders in your shell $PATH. The ANNOVAR software is available from the ANNOVAR website. You can download and decompress ANNOVAR in the folder where you launched the JupyterLab server so that you can access it from within the server. You can then follow the ANNOVAR user guide to install it yourself.

For example, if you followed our “start from scratch” section, you should have $HOME/.pixi/bin available as part of your $PATH. Suppose you have registered at the ANNOVAR website and have a direct download link of the following form (but not exactly this link):

https://www.openbioinformatics.org/annovar/download/0wgxR2rIVP/annovar.latest.tar.gz

You can then install it with a one-liner:

curl https://www.openbioinformatics.org/annovar/download/0wgxR2rIVP/annovar.latest.tar.gz -o - | tar zxvf - --strip-components=1 -C "$HOME/.pixi/bin"
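To confirm that $HOME/.pixi/bin really is on your $PATH, and that the ANNOVAR scripts can be found after extraction, a check along these lines may help. It is a sketch: `dir_on_path` is a hypothetical helper, and `annotate_variation.pl` is used as the probe because it is one of the scripts shipped with ANNOVAR.

```shell
# Return success if directory $1 is one of the entries in $PATH.
dir_on_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if dir_on_path "$HOME/.pixi/bin"; then
    echo "\$HOME/.pixi/bin is on PATH"
else
    echo "\$HOME/.pixi/bin is NOT on PATH"
fi

# Look up one of ANNOVAR's scripts; print a hint if it cannot be found.
command -v annotate_variation.pl || echo "annotate_variation.pl not found on PATH"
```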