Course computing environment setup

This document explains how to set up the computing environment needed to run our hands-on statistical genetics tutorials, which are available from GitHub.

The instructions below have been tested on Mac and Linux computers. We do not currently support running directly under Windows, although the setup does work on a Windows machine running the Windows Subsystem for Linux (WSL). Windows users should first follow the WSL installation instructions to configure their system, then return to the installation instructions below.
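If you are unsure which of these cases applies to your terminal, the following minimal sketch reports the platform the current shell is running on. It assumes the common convention that WSL kernels mention "microsoft" in /proc/version. The function name detect_platform is ours for illustration and is not part of the course setup.

```shell
# Print one of: macos, wsl, linux, unsupported.
detect_platform() {
    case "$(uname -s)" in
        Darwin)
            echo "macos"
            ;;
        Linux)
            # Assumption: WSL kernels include "microsoft" in /proc/version.
            if grep -qi microsoft /proc/version 2>/dev/null; then
                echo "wsl"
            else
                echo "linux"
            fi
            ;;
        *)
            echo "unsupported"
            ;;
    esac
}

detect_platform
```

If this prints "unsupported", you are most likely in a native Windows shell and should set up WSL first.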

To open the JupyterLab server on Windows, we recommend a modern web browser, e.g. Edge rather than Internet Explorer.

Install relevant software environment

All software (with the exception of ANNOVAR) has been packaged and distributed as conda packages. We recommend using pixi to manage conda packages.

Option 1: Start from scratch

Caution: this will make changes to your local computing environment, including adding shell configuration lines such as export PATH commands to your shell configuration file, and a .libPaths() setting to ~/.Rprofile. You will also be using Python and R from this installation rather than your system’s defaults, if they exist. For novice users whose computer is not already configured with these tools, using our setup is usually harmless. Experienced developers should review the material below first and adopt it with caution, or see “Option 2: Install to existing software environment” below.
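Before installing, it can help to record which Python, R, and JupyterLab executables your shell currently resolves, so that you can confirm afterwards which installation takes precedence. The following is a minimal sketch. The helper name report_tool is ours for illustration and is not part of pixi or the course setup.

```shell
# Print where a command resolves on the current PATH, or say that it is missing.
report_tool() {
    if location="$(command -v "$1" 2>/dev/null)"; then
        echo "$1 -> $location"
    else
        echo "$1 -> not found on PATH"
    fi
}

# Tools that the setup described in this document is expected to provide.
for tool in python3 R jupyter-lab; do
    report_tool "$tool"
done
```

Running the same loop again after the installation, in a new terminal, shows whether the paths have changed.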

If you have never worked with any Python package management tools, have never used conda or similar tools (miniconda, micromamba, mamba, pixi), or would simply like to start from scratch, this document provides a quick way to set up a production conda environment using pixi, installing tools such as R, Python 3, and JupyterLab. It was written with a high performance computing (HPC) cluster in mind, although the same setup should also apply to macOS, a Linux PC, or a Windows PC with WSL installed. Please follow the instructions up until the section “Basic software environment and manager setup”, skipping the section “A note for Columbia Neurology HPC users” unless you are from Columbia Neurology.

Next, install the R packages, Python packages, and additional executables required for the course.

To do so, first add executables via:

pixi global install $(curl -fsSL https://raw.githubusercontent.com/cumc/handson-tutorials/main/setup/global_packages.txt | tr '\n' ' ')

then add R/Python packages via:

pixi global install --environment r-base $(curl -fsSL https://raw.githubusercontent.com/cumc/handson-tutorials/main/setup/r_packages.txt | grep -v "#" | tr '\n' ' ')
pixi global install --environment python $(curl -fsSL https://raw.githubusercontent.com/cumc/handson-tutorials/main/setup/python_packages.txt | grep -v "#" | tr '\n' ' ')
pixi clean cache -y
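In the commands above, the $(curl ... | grep -v "#" | tr '\n' ' ') part turns a list with one package per line into a space-separated argument list for pixi. The sketch below reproduces that step on a small local file so you can see what would be passed along. The file contents (r-ggplot2, r-dplyr) and the helper name flatten_package_list are illustrative assumptions, not the course's actual lists.

```shell
# Create a small stand-in for a downloaded package list (contents are illustrative only).
list_file="$(mktemp)"
printf '%s\n' '# plotting' 'r-ggplot2' '# data wrangling' 'r-dplyr' > "$list_file"

# Drop every line that contains "#", then join the remaining lines with spaces.
flatten_package_list() {
    grep -v "#" "$1" | tr '\n' ' '
}

packages="$(flatten_package_list "$list_file")"
rm -f "$list_file"

# Print the command that would be run, without actually running it.
echo "pixi global install --environment r-base $packages"
```

Note that grep -v "#" removes any line containing a "#" anywhere, so a package name followed by an inline comment would be dropped as well.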

Option 2: Install to existing software environment

If you are already familiar with conda and use your own setup for package management, you can create a single course-specific computing environment containing these necessary R packages, Python packages, and some additional executables. We do not provide instructions for this, on the assumption that you are comfortable doing it on your own computing setup. If you are not confident about figuring it out (or fail to do so), that is an indication that you might be better off following Option 1 to set everything up from scratch, not only for the course but for general package management on your system, using pixi as our recommended package management tool.

Launch tutorials

To launch the environment to run the tutorials, please run from the command terminal:

jupyter-lab

You should see a line printed on the screen that contains a URL. Please copy that URL into your web browser; the JupyterLab launcher window should then open.

At this point, please open a Terminal (under Other in the launcher window) and run:

curl -fsSL https://raw.githubusercontent.com/cumc/handson-tutorials/main/setup/course_entrypoint.sh | bash

This downloads all the data required to run the tutorials, which may take a while. Please wait until the command has completed before using the tutorials.

All data and tutorials will be downloaded to the folder where you ran jupyter-lab.

A note on `ANNOVAR`

We can neither pre-install the ANNOVAR program in our container nor redistribute it, due to license restrictions. Exercises involving ANNOVAR annotations will not work unless you manually install ANNOVAR to one of the folders in your shell $PATH. Please find the ANNOVAR software here. You can download and decompress ANNOVAR in the folder where you launched the JupyterLab server so that you can access it from within the server. You can then follow the ANNOVAR user guide to install it yourself.

For example, if you followed our “start from scratch” section, you should have $HOME/.pixi/bin available as part of your $PATH. Suppose you have registered at the ANNOVAR website and have a direct download link of the following form (but not exactly this link):

https://www.openbioinformatics.org/annovar/download/0wgxR2rIVP/annovar.latest.tar.gz

You can then install it with a one-liner:

curl -fsSL https://www.openbioinformatics.org/annovar/download/0wgxR2rIVP/annovar.latest.tar.gz | tar zxvf - --strip-components=1 -C "$HOME/.pixi/bin"
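After extracting, you may want to confirm that the install location is on your $PATH and that the shell can find ANNOVAR. The following is a minimal sketch of such a check. The helper name path_contains is ours for illustration, and the check assumes that annotate_variation.pl, one of the scripts shipped with ANNOVAR, ends up directly in the target folder.

```shell
# Return success if the directory given as $1 is an entry in $PATH.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if path_contains "$HOME/.pixi/bin"; then
    echo "$HOME/.pixi/bin is on PATH"
else
    echo "$HOME/.pixi/bin is NOT on PATH"
fi

# Assumption: annotate_variation.pl was extracted into a folder on PATH.
if command -v annotate_variation.pl >/dev/null 2>&1; then
    echo "ANNOVAR found at: $(command -v annotate_variation.pl)"
else
    echo "ANNOVAR not found on PATH"
fi
```

If the second check fails while the first succeeds, the archive was probably extracted somewhere else or into a subfolder.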