Interactive Analysis on MMCloud + AWS

We have implemented a utility script mm_interactive.sh to help start interactive sessions on MMCloud.

The script currently supports four types of sessions: shell environment via tmate, JupyterLab, RStudio, and VS Code.

Initial Configuration

Make sure you have the latest version of the three scripts in the src/ folder of this repository. You can git clone the repo to get everything, but only three files are required here: mm_interactive.sh, host_init.sh, and bind_mount.sh. Keep these three files in the same folder when you run the commands in this document.
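As a quick sanity check before launching anything, the sketch below verifies that the three scripts are together in one folder. The function name check_required_scripts is made up for this illustration and is not part of the repository.

```bash
# Check that the three helper scripts named above sit together in one folder.
# Usage: check_required_scripts [DIR]   (DIR defaults to the current directory)
check_required_scripts() {
    local dir="${1:-.}" f missing=0
    for f in mm_interactive.sh host_init.sh bind_mount.sh; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

if check_required_scripts .; then
    echo "All three scripts found."
else
    echo "Fetch the missing scripts before continuing."
fi
```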

When starting an interactive session for the first time, run the command below:

bash mm_interactive.sh --mount-packages -i quay.io/danielnachun/tmate-minimal

or

bash mm_interactive.sh --mount-packages -i quay.io/danielnachun/tmate-minimal --float-executable float.darwin_arm64

Here --float-executable is the name of the float command-line program from MMCloud as installed on your system. It defaults to float in mm_interactive.sh, which works for Linux users. If your installation uses a different name, override it with this option; for example, Mac users who installed the program as float.darwin_arm64 (see this page for details) would pass --float-executable float.darwin_arm64. You can omit --float-executable from all commands on this page if the command on your system is float.
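If you are unsure which name applies on your machine, a small check like the following may help. It is only a sketch: the function name find_float_executable is hypothetical, and the two candidate names are simply the ones mentioned on this page, so extend the list if your installation uses another name.

```bash
# Print the first float executable name found on PATH.
# Candidates are the two names mentioned in this document (an assumption;
# your installation may use a different one).
find_float_executable() {
    local name
    for name in float float.darwin_arm64; do
        if command -v "$name" >/dev/null 2>&1; then
            echo "$name"
            return 0
        fi
    done
    echo "no float executable found on PATH" >&2
    return 1
}

if float_exe=$(find_float_executable); then
    echo "Pass this to the script: --float-executable $float_exe"
fi
```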

When prompted, input your account details: your OpCenter username and password. Then, a connection to the interactive session will be established from your shell terminal. A few minutes later, you should see the output:

To access the server, copy this URL into a browser: ...

or

SSH session: ...

Copy the URL into your web browser. For the SSH session, you may copy that into your terminal.

Once logged in, use this command to install recommended software.

curl -fsSL https://raw.githubusercontent.com/gaow/misc/master/bash/pixi/pixi-mamba.sh | bash

Once these initial packages are installed (this should take around 1 hour), you can cancel this instance and start new instances using one of the other IDEs, as discussed next.

Note: The setup script installs JupyterLab and VS Code by default but not RStudio, to save storage space (and the associated storage cost). If you are not an RStudio user, you can stick with this setup.

If you prefer RStudio, you can additionally install it via:

pixi global install rstudio

Once you are done with the initial setup, you are ready to log in to your IDE of choice. Please now cancel the VM instance that you used for the initial setup.

Daily Use

The initial configuration above should have installed JupyterLab and VS Code, plus RStudio if you chose to add it, as integrated development environments (IDEs) that you can use instead of working in the shell. To access JupyterLab, use:

bash mm_interactive.sh --mount-packages -i quay.io/danielnachun/tmate-minimal -ide jupyter --float-executable float.darwin_arm64

After a few minutes, you’ll see the message: To access the server, copy this URL into a browser ... Follow this instruction to access JupyterLab.

An example:

To access the server, copy this URL into a browser: http://3.89.222.63:10089/lab?token=efe44d238df52e7c35be2ffe8b87fa00263f82b48878d7b8
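The example URL has two parts worth telling apart: the address with its port, and the token. The sketch below splits them using plain shell parameter expansion; the function names jupyter_address and jupyter_token are illustrative only and are not provided by mm_interactive.sh.

```bash
url='http://3.89.222.63:10089/lab?token=efe44d238df52e7c35be2ffe8b87fa00263f82b48878d7b8'

# Strip the scheme, then cut at the first "/" to leave "host:port".
jupyter_address() {
    local rest="${1#*://}"
    echo "${rest%%/*}"
}

# Print the value of the token= query parameter; fail if there is none.
jupyter_token() {
    local token
    case "$1" in
        *token=*)
            token="${1##*token=}"
            echo "${token%%&*}"   # drop any further query parameters
            ;;
        *)
            return 1
            ;;
    esac
}

echo "address: $(jupyter_address "$url")"   # address: 3.89.222.63:10089
echo "token:   $(jupyter_token "$url")"
```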

3.89.222.63:10089 is your gateway IP address and port, which stay fixed for this job no matter how many times you suspend it. efe44d238df52e7c35be2ffe8b87fa00263f82b48878d7b8 is your token, without which nobody can access JupyterLab. Because access depends on the token, copying the URL from the address bar of your running JupyterLab session and sending it to someone else does not let them open your JupyterLab from a different device.

If you are using the default mount settings, you can create a soft link from the JupyterLab terminal so that the /data folder appears in the sidebar:

ln -s /data/ ~

Please perform all analyses in your interactive folder located at /data/interactive_analysis/<name>. Ensure that no files are saved to your home directory, as they may be lost and cannot be recovered if the kernel crashes.
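One way to catch a misplaced working directory early is a small guard run from the terminal before starting work. This is a sketch under the assumption that the default mount layout described on this page is in use; in_interactive_folder is a hypothetical helper name.

```bash
# Succeed only if the given path (default: the physical current directory,
# with symlinks such as ~/data resolved) is inside
# /data/interactive_analysis/<name>.
in_interactive_folder() {
    case "${1:-$(pwd -P)}" in
        /data/interactive_analysis/?*) return 0 ;;
        *) return 1 ;;
    esac
}

if ! in_interactive_folder; then
    echo "Warning: $(pwd -P) is outside /data/interactive_analysis; files saved here may be lost." >&2
fi
```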

To access RStudio Server, use:

bash mm_interactive.sh --mount-packages -i quay.io/danielnachun/tmate-minimal -ide rstudio --float-executable float.darwin_arm64

After a few minutes, you’ll see the message: To access RStudio Server, navigate to ... Follow the instructions to access RStudio.

Currently, our script automatically configures two default mounts:

  1. Maps the root S3 folder to /data/ on your instance.
  2. Maps your interactive folder to the corresponding folder in your home directory on our S3 bucket, with the capability to create the folder if it does not yet exist. This latter mount is required and cannot be changed.

If additional mounts are necessary, the -am option allows for further customization. Essentially, the end result will be the same: an S3 bucket path will be mounted to a directory on the instance. The custom command structure would look like this:

bash mm_interactive.sh -am 's3://s3_path1:tovm_path1' -am 's3://s3_path2:tovm_path2' ...
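When several mounts are needed, it can be easier to generate the command than to type it. The sketch below only prints the command for review rather than running it; mount_command is a made-up helper, and the s3://path:vm_path shape it checks for is taken from the example above, not from a formal specification of the -am option.

```bash
# Print an mm_interactive.sh command with one -am option per mount pair.
# Pairs that do not look like "s3://<s3_path>:<vm_path>" are skipped.
mount_command() {
    local pair opts=""
    for pair in "$@"; do
        case "$pair" in
            s3://?*:?*) opts="$opts -am '$pair'" ;;
            *) echo "skipping malformed mount: $pair" >&2 ;;
        esac
    done
    echo "bash mm_interactive.sh$opts"
}

mount_command 's3://s3_path1:tovm_path1' 's3://s3_path2:tovm_path2'
```

Add any other options you normally use (for example --mount-packages and -i) to the printed command before running it.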

Additional Software Installation

Please refer to this documentation page for our recommended package management using pixi and micromamba. If you want to add other conda packages, follow the recommendations for command executables, R, and Python. For example, to install pecotmr, which is an R package with many dependencies, follow the appropriate guidelines:

Note: This is run within your instance. To get a bash session in your instance, run the command given in the Initial Configuration section above.

micromamba install -n r_libs r-pecotmr -c dnachun

In addition, you can install other packages you need for your analysis using pixi since the image is pixi-based. Taking STAR as an example:

pixi global install STAR

You can check if the installed packages are executable using (pay attention to the package names, which might not be the same as those in your installing commands):

which star
star --version
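Because the executable name can differ from the package name, including in letter case, it may help to test several spellings at once. A minimal sketch; check_tools is a hypothetical helper, and the two spellings passed to it are just candidates to try.

```bash
# Report which of the given command names resolve on PATH.
# Returns non-zero if any of them is missing.
check_tools() {
    local tool status=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok:      $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}

# Try both spellings; whichever one reports "ok" is the name to use.
check_tools star STAR || echo "At least one spelling was not found."
```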

Interactive analysis job management

Status

Please refer to this section for a quick overview of how to track job status in the OpCenter GUI.

Suspension

To conserve resources, suspend your interactive session when it is not in use. The command to do so is displayed on your terminal after the message “Suspend your environment when you do not need it by running: ...”. Alternatively, in the OpCenter GUI you can find your job by its job ID and suspend it there. You can resume your job via the OpCenter GUI as well.

To reconnect to your instance, use the same link that was provided when the session was initialized. It should be saved in a log file in the current working directory on your local machine, named with the job ID as an identifier (<your_jobID>_<your_interactive session_type>.log), for example pu2zb2h51qpuqcmuy8ke0_jupyter.log.
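To recover the link from such a log, something along these lines may work. It is a sketch that assumes the log contains the same http://... line printed at start-up, which this page does not confirm in detail; it does not handle SSH-only (tmate) sessions, and both function names are hypothetical.

```bash
# Build the log file name described above: <jobID>_<session type>.log
session_log_name() {
    echo "${1}_${2}.log"
}

# Print the first URL recorded in a session log; fail if none is found.
# Assumption: the log holds the start-up message containing the URL.
session_url() {
    local url
    url=$(grep -Eo 'https?://[^[:space:]]+' "$1" | head -n 1)
    url="${url%.}"   # drop a sentence-ending period, if one was captured
    [ -n "$url" ] && echo "$url"
}

log=$(session_log_name pu2zb2h51qpuqcmuy8ke0 jupyter)
if [ -f "$log" ]; then
    session_url "$log"
else
    echo "No log named $log in $(pwd)"
fi
```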

Migration

If you need to move your session to different specifications, such as more CPUs and memory when you run out of memory, use the migration option. There are two ways to do this.

  1. If you are in your instance, for example in JupyterLab, there is a blue button in the upper right corner of the interface that lets you log in and view your instance information. From there, you can migrate to a new instance with your preferred CPU and memory settings, and you can also choose a different instance family. The same goes for RStudio instances. [Screenshot: migration button in the upper right-hand corner of the Jupyter interface]

  2. If you are not in your instance but have access to the OpCenter GUI, go to your job and click “Migrate” in the top right. [Screenshot: migration button in the upper right-hand corner of the job page]

Troubleshooting

This section documents frequently encountered issues and solutions.

IDE crashes

Out-of-Memory (OOM) error

If you encounter an Invalid response: 502 Bad Gateway error, check Wave Watcher in the MMCloud GUI. If total memory usage (Memory Used, the blue line, plus Swap Used, the purple line) reaches the available limit, an OOM error is likely to occur.

[Screenshot: memory usage in Wave Watcher]

To resolve, please request a new instance with increased memory allocation. For example, start a JupyterLab session using bash mm_interactive.sh --mount-packages -i quay.io/danielnachun/tmate-minimal -ide jupyter --float-executable float.darwin_arm64 -c 4 -m 32 to request 32GB of memory (default is 16GB). Modify --float-executable as needed.
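As an illustration, the relaunch command can be generated from the memory you had before. Doubling the memory is only a rule of thumb assumed for this sketch, the core count of 4 is the value from the example above rather than a documented default, and relaunch_with_more_memory is a hypothetical helper that prints the command without running it.

```bash
# Print a JupyterLab relaunch command with twice the previous memory.
# Arguments: previous memory in GB (default 16), number of cores (default 4,
# taken from the example above; not a documented default).
relaunch_with_more_memory() {
    local current_gb="${1:-16}" cores="${2:-4}"
    local new_gb=$((current_gb * 2))
    echo "bash mm_interactive.sh --mount-packages -i quay.io/danielnachun/tmate-minimal -ide jupyter -c $cores -m $new_gb"
}

relaunch_with_more_memory 16 4
```

Append --float-executable to the printed command if your float program is not named float.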

Fatal issues from the code

In JupyterLab, if you get a Kernel died/aborted/interrupted/killed/restarting error even though memory usage is well below the limit, your code may have an issue that crashes the IDE. Check stderr.autosave, the JupyterLab log file, under the Attachments tab in the MMCloud GUI. Searching the log for the keywords error, Error, and ERROR may reveal the commands that caused the error, which appear right above the error message. You can also search for AsyncIOLoopKernelRestarter: restarting kernel (1/5), keep random ports, which indicates that the kernel was being restarted.

To fix this error, please identify the problematic code you wrote and correct it.
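If you save the log locally, the keyword search can be done with grep. The sketch below assumes the file was downloaded as stderr.autosave into the current directory; scan_jupyter_log is a hypothetical helper name.

```bash
# Show lines containing error/Error/ERROR with two lines of leading context
# (the commands that triggered an error tend to sit just above it), then
# count how many times the kernel restarter message appears.
scan_jupyter_log() {
    local log="$1"
    grep -n -B 2 -e 'error' -e 'Error' -e 'ERROR' "$log"
    echo "kernel restarts: $(grep -c 'AsyncIOLoopKernelRestarter: restarting kernel' "$log")"
}

if [ -f stderr.autosave ]; then
    scan_jupyter_log stderr.autosave
else
    echo "Download stderr.autosave from the Attachments tab first."
fi
```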

Known issues in JupyterLab

Bad connection to R kernel

  1. If an unexpected error shows up during Jupyter cell execution and the cwd is NULL, try restarting the R kernel by switching to another kernel and back to the R kernel in the top right corner of the Notebook.
  2. If the R kernel is missing from the Notebook’s drop-down menu and Error in loadNamespace(x) : there is no package called 'IRkernel' appears in stderr.autosave, run IRkernel::installspec() in a terminal R session to resolve. Details are shown here.

Unknown issues

Freeze Behavior

If the terminal freezes without active jobs, close and reopen it. This typically won’t affect ongoing analyses in your Notebook. It would greatly help with debugging if you could record the time (hh:mm:ss) when it happened and report this timestamp as accurately as possible to the #mmcloud-debug Slack channel along with your job ID.
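To capture the time without having to remember it, you could run a one-liner like the following from a local terminal as soon as you notice the freeze. This is a convenience sketch only; freeze_report is a hypothetical name, and the job ID shown is the example ID used earlier on this page.

```bash
# Print a one-line report with the job ID and the current local date, time
# (hh:mm:ss), and time zone, ready to paste into the Slack channel.
freeze_report() {
    echo "job $1 froze at $(date '+%Y-%m-%d %H:%M:%S %Z')"
}

freeze_report pu2zb2h51qpuqcmuy8ke0
```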