JupyterLab + SoS Suite setup

This document provides tips for setting up your computing environment using the micromamba package manager.

Operating system requirements

The instructions on this page have been tested and are known to work on Linux and macOS. With some effort they may also work on Windows, but using Windows for your everyday computational biology research is discouraged.

Note for Neurology HPC users: to configure the network proxy, open ~/.bashrc in a text editor and append the following lines:

export http_proxy=http://menloproxy.cumc.columbia.edu:8080
export https_proxy=http://menloproxy.cumc.columbia.edu:8080

Then type source ~/.bashrc to load the changes.
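If you rerun these setup steps, appending the proxy lines blindly leaves duplicates in ~/.bashrc. Below is a minimal sketch of an idempotent append, assuming bash or another POSIX shell; the function name append_once is hypothetical and not part of any tool.

```shell
# Hypothetical helper: append LINE to FILE only if that exact line is not already there.
append_once() {
    local line="$1" file="$2"
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: fixed string (dots and slashes are literal)
    if grep -qxF -- "$line" "$file"; then
        return 0
    fi
    # If the file does not end with a newline, add one so the new line is not glued to the last one
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
}
```

For example, append_once 'export http_proxy=http://menloproxy.cumc.columbia.edu:8080' ~/.bashrc can be run any number of times and the line appears only once.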

Purge previous installations of micromamba or miniconda

This optional step is only necessary if you have previously installed various software and would now like to start from scratch.

First, back up the previous micromamba installation:

mv ~/micromamba ~/micromamba_backup
mv ~/.conda ~/conda_backup
rm -rf ~/.mamba ~/.conda ~/.anaconda
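Note that mv ~/micromamba ~/micromamba_backup behaves differently on a second run: if ~/micromamba_backup already exists, mv moves ~/micromamba inside it instead of renaming it, and if ~/micromamba does not exist, mv reports an error. A minimal sketch of a safer backup step, assuming bash; the function name backup_path is hypothetical.

```shell
# Hypothetical helper: move SRC to DEST only when SRC exists and DEST does not,
# so that repeating the purge never nests SRC inside an earlier backup.
backup_path() {
    local src="$1" dest="$2"
    if [ ! -e "$src" ]; then
        echo "skip: $src does not exist"
        return 0
    fi
    if [ -e "$dest" ]; then
        echo "refusing: $dest already exists; rename or remove it first" >&2
        return 1
    fi
    mv -- "$src" "$dest"
}
```

Usage mirrors the commands above, for example backup_path ~/micromamba ~/micromamba_backup and backup_path ~/.conda ~/conda_backup.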

Then back up ~/.bashrc:

mv ~/.bashrc ~/.bashrc_backup

and install a fresh default copy (on Linux, /etc/skel holds the system's default dotfiles):

cp /etc/skel/.bashrc ~/.bashrc

Finally, open ~/.bashrc_backup, review it, and copy the settings you still need into the new ~/.bashrc. For example, Neurology HPC users should keep the http_proxy and https_proxy lines from the previous section. However, if you want micromamba to provide R instead of the R installed on the HPC, do not include module load R in the new ~/.bashrc you are configuring now.
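The review can be partly automated. A minimal sketch, assuming bash and the file names used above; the function name review_old_rc is hypothetical. It lists proxy exports (usually worth keeping) and warns about module load R and about old "conda initialize" or "mamba initialize" blocks, which refer to the installation you just moved away and are usually better regenerated by the new install.

```shell
# Hypothetical helper: print lines of an old rc file that deserve attention.
review_old_rc() {
    local file="${1:-$HOME/.bashrc_backup}"
    # Proxy exports are typically worth carrying over (printed with line numbers)
    grep -nE '^[[:space:]]*export[[:space:]]+https?_proxy=' "$file" || true
    # 'module load R' would put the HPC R ahead of the micromamba R
    if grep -qE '^[[:space:]]*module[[:space:]]+load[[:space:]]+R([[:space:]/]|$)' "$file"; then
        echo "warning: $file contains 'module load R'; leave it out of the new ~/.bashrc" >&2
    fi
    # Init blocks written by an earlier conda/micromamba install point at the old location
    if grep -qE '>>> (conda|mamba) initialize >>>' "$file"; then
        echo "warning: $file contains an old conda/mamba initialize block; do not copy it" >&2
    fi
}
```

Running review_old_rc with no argument checks ~/.bashrc_backup.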

At this point, log out and then log back in to refresh your computing environment.

Note: the same steps also work for purging a miniconda3 installation (back up ~/miniconda3 instead of ~/micromamba).

Install micromamba

We highly recommend using micromamba over miniconda or anaconda. Unlike miniconda, micromamba does not need a base environment and does not come with a default version of Python. micromamba supports a subset of all mamba or conda commands and implements its command-line interface from scratch in C++.

To install, please follow the instructions on this page. Briefly:

"${SHELL}" <(curl -L micro.mamba.pm/install.sh)

Press Enter (or Return) at each prompt to accept the default settings.

If curl is not available on your computer, you can use wget instead:

cd ~
wget -qO- https://micromamba.snakepit.net/api/micromamba/linux-64/latest | tar -xvj bin/micromamba
~/bin/micromamba shell init -s bash -p ~/micromamba

where you specify the platform manually in the URL, in this case linux-64.
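To avoid picking the platform string by hand, it can be derived from uname. A minimal sketch, assuming bash; the function name conda_platform is hypothetical, and the mapping covers the common conda platform names (linux-64, linux-aarch64, linux-ppc64le, osx-64, osx-arm64).

```shell
# Hypothetical helper: map `uname -s` / `uname -m` output to a conda platform name.
# Arguments default to the current machine; pass them explicitly to check another platform.
conda_platform() {
    local os="${1:-$(uname -s)}" arch="${2:-$(uname -m)}"
    case "$os/$arch" in
        Linux/x86_64)               echo linux-64 ;;
        Linux/aarch64|Linux/arm64)  echo linux-aarch64 ;;
        Linux/ppc64le)              echo linux-ppc64le ;;
        Darwin/x86_64)              echo osx-64 ;;
        Darwin/arm64)               echo osx-arm64 ;;
        *) echo "unsupported platform: $os/$arch" >&2; return 1 ;;
    esac
}
```

The wget line above then becomes wget -qO- "https://micromamba.snakepit.net/api/micromamba/$(conda_platform)/latest" | tar -xvj bin/micromamba, run from your home directory.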

When the installation finishes, load micromamba by typing source ~/.bashrc (Linux) or source ~/.zshrc (macOS). To verify that it installed successfully, run:

micromamba -h

This should print the help message. You can then use micromamba to create environments, install packages, and so on. For convenience, we strongly recommend adding the following channels so that micromamba installs packages from them by default:

micromamba config prepend channels nodefaults
micromamba config prepend channels bioconda
micromamba config prepend channels conda-forge

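Because each prepend puts a channel at the top of the list, the resulting channel priority is conda-forge, then bioconda, then nodefaults.

Once the latest version of micromamba is installed, follow the steps below to set up a JupyterLab + SoS Suite environment for daily computing.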

Set up the Script of Scripts computing environment

The currently recommended version of the SoS suite, along with Python and R, can be installed using the configuration file pisces-rabbit.yml:

wget https://raw.githubusercontent.com/gaow/misc/master/docker/pisces-rabbit.yml 
micromamba env create -y -f pisces-rabbit.yml
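If you repeat this step, you may want to skip the download and creation when the environment already exists. A minimal sketch, assuming bash and that named environments live under $MAMBA_ROOT_PREFIX/envs/<name> (micromamba's default layout, with ~/micromamba as the fallback root); the function name ensure_env is hypothetical.

```shell
# Hypothetical helper: download an environment file and create the environment,
# unless a directory for that environment name already exists.
ensure_env() {
    local name="$1" yml_url="$2"
    local root="${MAMBA_ROOT_PREFIX:-$HOME/micromamba}"
    if [ -d "$root/envs/$name" ]; then
        echo "environment $name already exists in $root/envs"
        return 0
    fi
    local yml
    yml="$(basename "$yml_url")"
    # Same two commands as above: fetch the YAML, then create the environment from it
    wget -q "$yml_url" -O "$yml" && micromamba env create -y -f "$yml"
}
```

For example: ensure_env pisces-rabbit https://raw.githubusercontent.com/gaow/misc/master/docker/pisces-rabbit.yml. The directory check only tests that the environment exists, not that it is healthy.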

Note that we name each environment after the zodiac signs of the month and year. For example, pisces-rabbit pins the setup to a distribution that our lab members tested and found stable as of Feb 20 (Pisces), 2023 (Rabbit). This wiki will be updated periodically to the latest stable version we have tested.

If you want to load this environment by default, you can open your ~/.bashrc file (or ~/.zshrc) and add this line:

micromamba activate pisces-rabbit 

Then type source ~/.bashrc (or source ~/.zshrc) to load the changes. Otherwise, you need to run the command above in every new shell session before working in this environment.
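If the activation line is added unconditionally, a missing or renamed environment prints an error at every login. A guarded variant for ~/.bashrc is sketched below, assuming the default root prefix ~/micromamba. It has to come after the "# >>> mamba initialize >>>" block written by micromamba shell init, because that block defines the micromamba shell function that activate relies on.

```shell
# In ~/.bashrc (or ~/.zshrc), below the "mamba initialize" block:
# activate only if micromamba is available and the environment directory exists.
if command -v micromamba >/dev/null 2>&1 && \
   [ -d "${MAMBA_ROOT_PREFIX:-$HOME/micromamba}/envs/pisces-rabbit" ]; then
    micromamba activate pisces-rabbit
fi
```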

Note for Neurology HPC users:

  • When you submit a job to the cluster, the compute node ignores your ~/.bashrc settings, so you need to add or source these lines in your job submission template to activate and use this environment:
export PATH=$HOME/.local/bin:$PATH
export MAMBA_ROOT_PREFIX=$HOME/micromamba
eval "$(micromamba shell hook --shell bash)"
micromamba activate pisces-rabbit

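You can put these lines in a file called ~/mamba_activate.sh and add source ~/mamba_activate.sh as the first command in your job submission script or template. If you installed micromamba with the wget method above, the binary is in ~/bin rather than ~/.local/bin, so adjust the PATH line accordingly.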

  • The SoS Notebook R plugin, sos-r, is not included in the setup because, as of August 2023, r-feather does not support Apple Silicon CPUs. It is, however, available for Intel/AMD CPUs, so HPC users are encouraged to run
micromamba install sos-r -y

with the pisces-rabbit environment activated, to install the R plugin for SoS Notebook on the cluster.
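For the first HPC note above, the activation lines can be written to ~/mamba_activate.sh in one step. A minimal sketch, assuming bash and the default paths used in this guide; the function name write_activate_script is hypothetical. The quoted 'EOF' delimiter keeps $HOME, $PATH and $(...) literal in the file, so they are expanded on the compute node when the file is sourced, not when it is written.

```shell
# Hypothetical helper: write the four activation lines to a file (default ~/mamba_activate.sh).
# If micromamba was installed with the wget method, edit the PATH line to use $HOME/bin.
write_activate_script() {
    local target="${1:-$HOME/mamba_activate.sh}"
    cat > "$target" <<'EOF'
export PATH=$HOME/.local/bin:$PATH
export MAMBA_ROOT_PREFIX=$HOME/micromamba
eval "$(micromamba shell hook --shell bash)"
micromamba activate pisces-rabbit
EOF
}
```

After running write_activate_script once, each job script only needs source ~/mamba_activate.sh, placed after the shebang and any scheduler directive lines.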

At this point, you can test your installation by connecting to the HPC via JupyterLab. If everything works, you can optionally delete the old micromamba installation that you backed up earlier:

rm -rf ~/micromamba_backup
rm -rf ~/conda_backup

Install other software

Once this is set up, you can use micromamba to install other software as well, as long as it is released on one of the conda channels. Our configuration already includes conda-forge and bioconda by default. For example, you can install plink, plink2, bcftools and tabix with:

micromamba install plink plink2 bcftools tabix -y
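After installing, you can confirm that the tools are on your PATH in the active environment. A minimal sketch, assuming bash; the function name check_tools is hypothetical.

```shell
# Hypothetical helper: report where each command resolves, or that it is missing.
# Returns 0 only if every command was found.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

For example, check_tools plink plink2 bcftools tabix with pisces-rabbit activated; the found paths should point inside the environment (under ~/micromamba/envs/pisces-rabbit by default).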

About R libraries

Keep in mind that the R installed by micromamba is packaged and distributed by conda-forge. We therefore highly recommend installing R libraries from conda-forge as well whenever they are available there. For example, to install the R library pacman, first verify that it is available on conda-forge (as r-pacman), then install it with:

micromamba install r-pacman -y
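The conda package name usually follows a fixed pattern: CRAN packages on conda-forge are named "r-" plus the lowercased R name, and Bioconductor packages on bioconda are named "bioconductor-" plus the lowercased name. The sketch below, assuming bash, only guesses that name; the function conda_r_name is hypothetical, and the convention has exceptions, so still confirm the package exists on the channel before installing.

```shell
# Hypothetical helper: guess the conda package name for an R package.
# Second argument: "cran" (default, conda-forge) or "bioc" (bioconda).
conda_r_name() {
    local pkg="$1" repo="${2:-cran}" lower
    lower="$(printf '%s' "$pkg" | tr '[:upper:]' '[:lower:]')"
    case "$repo" in
        cran) echo "r-$lower" ;;
        bioc) echo "bioconductor-$lower" ;;
        *) echo "unknown repository: $repo" >&2; return 1 ;;
    esac
}
```

For example, micromamba install -y "$(conda_r_name pacman)" is equivalent to the command above.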

For libraries not available on conda-forge, use the regular approaches to install them, such as from CRAN, Bioconductor or GitHub.
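Those regular approaches can be driven from the shell with Rscript. A minimal sketch, assuming bash, that pisces-rabbit is activated (so R installs into that environment's library by default), and that BiocManager and remotes are already present; following the advice above, they can come from conda-forge via micromamba install r-biocmanager r-remotes -y. The function name install_r_pkg is hypothetical.

```shell
# Hypothetical helper: install an R package with R's own tools.
# Usage: install_r_pkg cran <pkg> | install_r_pkg bioc <pkg> | install_r_pkg github <user/repo>
install_r_pkg() {
    local repo="$1" pkg="$2"
    case "$repo" in
        cran)   Rscript -e "install.packages('$pkg', repos = 'https://cloud.r-project.org')" ;;
        bioc)   Rscript -e "BiocManager::install('$pkg')" ;;
        github) Rscript -e "remotes::install_github('$pkg')" ;;
        *) echo "usage: install_r_pkg cran|bioc|github <package>" >&2; return 1 ;;
    esac
}
```

For example, install_r_pkg github exampleuser/examplepkg, where exampleuser/examplepkg is a placeholder repository.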