Interactive Analysis on MMCloud + AWS

We have implemented a utility script mm_interactive.sh to help start interactive sessions on MMCloud.

The script currently supports four types of sessions: shell environment via tmate, JupyterLab, RStudio, and VS Code.

Initial Configuration

Make sure you have the latest version of the three scripts in the src/ folder of this repository: mm_interactive.sh, host_init_interactive.sh and bind_mount.sh. All three files must stay in the same folder when you run the commands in this document. The easiest way to ensure this is to clone the repository to your local machine.
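A minimal sketch for checking that the three scripts sit together before you launch anything. The function name check_mm_scripts is hypothetical and is not part of the repository; only the three file names come from the text above.

```shell
# Hypothetical helper (not part of the repo): list which of the three
# required scripts are missing from a directory (default: current directory).
# Prints one missing file name per line; returns 0 only if all are present.
check_mm_scripts() (
    dir="${1:-.}"
    missing=0
    for f in mm_interactive.sh host_init_interactive.sh bind_mount.sh; do
        if [ ! -f "$dir/$f" ]; then
            echo "$f"
            missing=1
        fi
    done
    return "$missing"
)

# Example (commented out so that sourcing this file has no side effects):
#   check_mm_scripts src || echo "some scripts are missing from src/"
```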

When starting an interactive session for the first time, run the command below. This will start an interactive tmate session for you:

bash mm_interactive.sh --mount-packages

or

bash mm_interactive.sh --mount-packages \
--float-executable float.darwin_arm64

where --float-executable is the shell command name of your float software. It defaults to float, so if you are running on a Linux system, you do not need to specify it. You can override it with the name of the executable actually installed on your system. See this page for details.
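If you are unsure which name your float installation uses, a sketch like the following picks the first candidate that resolves on your PATH. The helper pick_float_executable is hypothetical; the two candidate names in the usage comment are the ones mentioned in this document, and your system may use a different one.

```shell
# Hypothetical helper: print the first candidate name that resolves to a
# command on PATH; return 1 if none of the candidates is found.
pick_float_executable() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Example (commented out; it would start a real session):
#   exe=$(pick_float_executable float float.darwin_arm64) &&
#       bash mm_interactive.sh --mount-packages --float-executable "$exe"
```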

When prompted, enter your OpCenter username and password. A connection to the interactive session will then be established from your shell terminal. For tmate sessions, you will see the message below. Type y and press Enter to continue.

NOTICE: tmate sessions are primarily designed for initial package configuration.
For regular development work, we recommend utilizing a more advanced Integrated Development Environment (IDE)
via the -ide option, if you have previously set up an alternative IDE.
Do you wish to proceed with the tmate session? (y/N): y

A few minutes later, you should see the output:

To access the server, copy this URL into a browser: ...

or

SSH session: ...

Copy the URL into your web browser. For the SSH session, copy the SSH command into your terminal instead.

Purge previous installations if you reinstall

To remove previous installations, run:

cd /mnt/efs/$YOUR_USER_NAME
rm -rf ~/.mamba ~/.conda ~/.anaconda ~/.pixi ~/.jupyter

It is recommended you ask an admin user to help clear the directory for you.

Once logged in, use this command to install recommended software.

curl -fsSL https://raw.githubusercontent.com/gaow/misc/master/bash/pixi/pixi-setup.sh | bash

Note: You DO NOT need to install RStudio or VS Code yourself. These are already provided as “shared” packages for all users.

Once you are done with the initial setup (it should take around 40-50 minutes), you are ready to log in to your IDE of choice. Please now cancel, in the MMCloud OpCenter, the VM instance that you used for the initial setup.

Daily Use

The initial configuration from the steps above should have installed JupyterLab, RStudio and VS Code as integrated development environments (IDEs) that you can choose instead of working with the shell. To access JupyterLab, use:

bash mm_interactive.sh --mount-packages -ide jupyter --float-executable float.darwin_arm64

After a few minutes, you’ll see the message: To access the server, copy this URL into a browser ... Follow this instruction to access JupyterLab.

An example:

To access the server, copy this URL into a browser: http://3.89.222.63:10089/lab?token=efe44d238df52e7c35be2ffe8b87fa00263f82b48878d7b8. 

3.89.222.63:10089 is your gateway IP address and port, which stays fixed for this job (no matter how many times you suspend it). efe44d238df52e7c35be2ffe8b87fa00263f82b48878d7b8 is your token, without which you cannot access JupyterLab. Consequently, copying the URL from the browser tab of your running JupyterLab and sending it to someone else does not allow them to access this JupyterLab from a different device.
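To make the two parts of the access URL concrete, here is a small sketch that splits the example URL above into the gateway address and the token using plain shell parameter expansion. The variable names are illustrative only.

```shell
# The example access URL from this document.
url='http://3.89.222.63:10089/lab?token=efe44d238df52e7c35be2ffe8b87fa00263f82b48878d7b8'

# Strip the scheme, then keep everything before the first "/" (host:port).
hostport=${url#*://}
hostport=${hostport%%/*}

# Keep everything after "token=". If the URL carries no token parameter,
# this expansion leaves the URL unchanged.
token=${url#*token=}

echo "gateway: $hostport"
echo "token:   $token"
```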

If you are using the default mount settings, you can create a soft link from the JupyterLab terminal so that the folder appears in the sidebar:

ln -s /data/ ~

Please perform all analyses in your interactive folder located at /data/interactive_sessions/<name>. Ensure that no files are saved to your home directory, as they may be lost and cannot be recovered if the kernel crashes.

To access RStudio Server, use:

bash mm_interactive.sh --mount-packages -ide rstudio --float-executable float.darwin_arm64

After a few minutes, you’ll see the message: To access RStudio Server, navigate to ... Follow the instructions to access RStudio.

To access VS Code, use:

bash mm_interactive.sh --mount-packages -ide vscode --float-executable float.darwin_arm64

NOTE: If you ever lose track of the URL for your interactive job, look in the directory where you ran the command for a log file named <JOB_ID>_<IDE>.log, where <IDE> is the IDE you ran, such as jupyter.
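A sketch for recovering the URL from such a log file. It assumes the log contains the access URL somewhere in its text, which this document implies but does not spell out; the helper name session_url_from_log is hypothetical.

```shell
# Hypothetical helper: print the first http(s) URL found in a session log.
# Assumes the access URL appears in the log; a trailing period (as in the
# example message in this document) is stripped.
session_url_from_log() {
    grep -o -E 'https?://[^[:space:]]+' "$1" | head -n 1 | sed 's/\.$//'
}

# Example (commented out; substitute your own log file name):
#   session_url_from_log <JOB_ID>_<IDE>.log
```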

Using ghq to pull GitHub repositories

To pull GitHub repos consistently on this setup, we use ghq, which is a robust approach on AWS S3. It is installed as go-ghq and is already included in the initial setup done in the tmate session. If the package is missing, install it with pixi global install go-ghq.

With go-ghq installed, you will see a ghq directory under your home directory, /home/ubuntu. This is where your repos will be pulled. To pull a repo, cd into the ghq directory and run ghq get https://github.com/rfeng2023/mmcloud, or substitute your intended repo. Afterwards, you can keep using your usual git commands, including pull, commit and push.
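ghq places each clone under its root directory in a host/owner/repo layout, so it helps to know where a repo will land. The sketch below computes that path for a given root; ghq_clone_path is a hypothetical helper, and /home/ubuntu/ghq is assumed to be the ghq root on the instance (running ghq root reports the actual value).

```shell
# Hypothetical helper: print where `ghq get <url>` is expected to place a
# clone, following ghq's <root>/<host>/<owner>/<repo> layout.
ghq_clone_path() (
    root=$1
    url=$2
    path=${url#*://}     # drop the scheme, e.g. "https://"
    path=${path%.git}    # drop an optional ".git" suffix
    echo "$root/$path"
)

ghq_clone_path /home/ubuntu/ghq https://github.com/rfeng2023/mmcloud
```

Once a repo has been fetched, ghq list -p prints the full paths of all repos that ghq manages.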

Additional Mounts

Currently, the script has a single default mount: s3://statfungen/ftp_fgc_xqtl to /data/.

If additional mounts are necessary, the -am option allows for further customization. Essentially, the end result will be the same: an S3 bucket path will be mounted to a directory on the instance. The custom command structure would look like this:

bash mm_interactive.sh --mount-packages -am 's3://statfungen/dir1:/dir1' -am 's3://statfungen/dir2:/dir2' ...
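When several mounts are involved, it can help to check each mount spec and print the full command before running it. The sketch below does only that; both helper names are hypothetical, and the accepted pattern (s3://bucket/path:/directory) is inferred from the example above rather than taken from the script itself.

```shell
# Hypothetical helper: succeed if a mount spec looks like
# s3://<bucket>[/<prefix>]:/<directory>, as in the example above.
is_valid_mount_spec() {
    case "$1" in
        s3://?*:/?*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print (not run) the mm_interactive.sh command with one
# -am option per mount spec; fail on the first spec that looks malformed.
build_mount_command() (
    args=""
    for spec in "$@"; do
        if ! is_valid_mount_spec "$spec"; then
            echo "bad mount spec: $spec" >&2
            return 1
        fi
        args="$args -am '$spec'"
    done
    echo "bash mm_interactive.sh --mount-packages$args"
)

build_mount_command 's3://statfungen/dir1:/dir1' 's3://statfungen/dir2:/dir2'
```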

Then, on your instance, you will be able to access the contents of the bucket from the mounted directory.

Additional Software Installation

If you want to add other conda packages, follow the recommendations on this page.

Note: This is actually run within your instance in your interactive tmate session. To get a tmate session in your instance, run the command specified in Section Initial Configuration above.

Interactive analysis job management

Status

Please refer to this section for a quick overview of the OpCenter GUI for tracking job status.

Suspension

To conserve resources, suspend your interactive session when it is not in use by running the command displayed on your terminal after the message Suspend your environment when you do not need it by running: ... Additionally, in the OpCenter GUI, you can find your job by its Job ID and suspend it there. You can resume your job via the OpCenter GUI as well.

To reconnect to your instance, use the same link that was provided at initialization. It should be saved in a log file in the current working directory on your local machine, named with the job ID as an identifier (<your_jobID>_<your_interactive session_type>.log). An example: pu2zb2h51qpuqcmuy8ke0_jupyter.log.
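Because the log file name encodes both the job ID and the session type, you can recover them with shell parameter expansion, for example to look the job up in the OpCenter GUI. This is a sketch that assumes the session type itself contains no underscore.

```shell
# The example log file name from this document.
log_name='pu2zb2h51qpuqcmuy8ke0_jupyter.log'

base=${log_name%.log}       # drop the .log extension
job_id=${base%_*}           # text before the last underscore
session_type=${base##*_}    # text after the last underscore

echo "job ID:       $job_id"
echo "session type: $session_type"
```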

FOR JUPYTER JOBS ONLY: If you are using the latest version of mm_interactive from the repo, the instance will automatically suspend after 2 hours of inactivity, or after 30 minutes if no notebook was ever opened in the instance. A known limitation is that activity in terminals cannot be tracked and is therefore counted as “idle” time. To make sure your instance is not suspended prematurely, please open a notebook. Automatic Jupyter suspension is still a work in progress. Please feel free to refer to this GitHub issue for the latest updates.
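The suspension rule can be summarized as a small decision function. This sketch only illustrates the policy as described here; it is not the code used by mm_interactive, the function names are hypothetical, and whether the real check triggers exactly at the limit or just after it is an assumption.

```shell
# Illustration only: idle limit in minutes under the policy described above.
# $1 is "yes" if a notebook has been opened in the instance, anything else
# otherwise.
idle_limit_minutes() {
    if [ "$1" = "yes" ]; then
        echo 120    # 2 hours of inactivity
    else
        echo 30     # no notebook was ever opened
    fi
}

# Succeeds when the idle time ($2, in whole minutes) has reached the limit.
# Remember that time spent working only in a terminal counts as idle.
would_auto_suspend() {
    [ "$2" -ge "$(idle_limit_minutes "$1")" ]
}

# Example (commented out):
#   would_auto_suspend no 45 && echo "this session would be suspended"
```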

Migration

If you need to migrate your session to different specifications, such as more CPUs and memory when you run out of memory, use the migration option. There are two ways to do this.

  1. If you are in your instance, such as in Jupyter, there is a blue button in the upper right corner of the interface that lets you log in and view your instance information. From there, you can migrate to a new instance with your preferred CPU/memory settings; you can also choose another instance family. The same goes for RStudio instances. (Screenshot: migration button on the upper right-hand side of the interface.)

  2. If you are not in your instance but have access to the OpCenter GUI, go to your job and click “Migrate” in the top right. (Screenshot: migration button on the upper right-hand side of the job page.)

Troubleshooting

This section documents frequently encountered issues and solutions.

IDE crashes

Out-of-Memory (OOM) error

If you encounter an Invalid response: 502 Bad Gateway error, check Wave Watcher in the MMCloud GUI. If total memory usage (Memory Used, shown as the blue line, plus Swap Used, shown as the purple line) reaches the available limit, an OOM error is likely to occur.

In Wave Watcher

To resolve this, request a new instance with more memory. For example, start a Jupyter session with bash mm_interactive.sh --mount-packages -i quay.io/danielnachun/tmate-minimal -ide jupyter --float-executable float.darwin_arm64 -c 4 -m 32 to request 32GB of memory (the default is 16GB). Modify --float-executable as needed.
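The memory check described in this subsection is simple arithmetic: add Memory Used and Swap Used and compare the sum with the limit. A sketch, with values in GB that you read off Wave Watcher by hand; the helper name is hypothetical and awk is used so that fractional values work.

```shell
# Hypothetical helper: succeed when used memory plus used swap reaches the
# limit. Arguments (all in GB): memory used, swap used, memory limit.
memory_exhausted() {
    awk -v used="$1" -v swap="$2" -v limit="$3" \
        'BEGIN { exit !((used + swap) >= limit) }'
}

# Example (commented out): 14.5 GB used + 2 GB swap against a 16 GB limit.
#   memory_exhausted 14.5 2 16 && echo "likely OOM: request more memory with -m"
```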

Fatal issues from the code

In JupyterLab, if you get a Kernel died/aborted/interrupted/killed/restarting error even though your memory usage is well below the memory limit, your code may have an issue that crashes the IDE. Check stderr.autosave, the JupyterLab log file, under the Attachments tab in the MMCloud GUI. Searching the log for the keywords error, Error and ERROR may reveal the commands that caused the error, located right above the error message. You can also search for AsyncIOLoopKernelRestarter: restarting kernel (1/5), keep random ports, which indicates that the kernel was being restarted.
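If you download stderr.autosave to your machine, the keyword search can be done with grep. This is a sketch assuming a local copy of the log; the helper names are hypothetical, and three lines of leading context is an arbitrary choice.

```shell
# Hypothetical helper: print log lines that mention an error, with line
# numbers and the three preceding lines for context.
show_log_errors() {
    grep -n -B 3 -E 'error|Error|ERROR' "$1"
}

# Hypothetical helper: count the lines showing that the kernel was restarted.
count_kernel_restarts() {
    grep -c 'AsyncIOLoopKernelRestarter: restarting kernel' "$1"
}

# Example (commented out; assumes stderr.autosave is in the current directory):
#   show_log_errors stderr.autosave
#   count_kernel_restarts stderr.autosave
```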

To fix this error, identify the problematic code and correct it.

Known issues in JupyterLab

Bad connection to R kernel

  1. If an unexpected error shows up during Jupyter cell execution and the cwd is NULL, try restarting the R kernel by switching to another kernel and back to the R kernel in the top right corner of the Notebook.
  2. If the R kernel is missing from the Notebook’s drop-down menu and Error in loadNamespace(x) : there is no package called 'IRkernel' appears in stderr.autosave, run IRkernel::installspec() in a terminal R session to resolve. Details are shown here.

Unknown issues

Freeze Behavior

If the terminal freezes without active jobs, close and reopen it. This typically won’t affect ongoing analyses in your Notebook. For debugging, it helps a great deal if you record the time (hh:mm:ss) when the freeze happened and report it as accurately as possible to the #mmcloud-debug Slack channel along with your job ID.
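One way to capture the time stamp right when a freeze happens is to run a one-liner in a local terminal. A sketch with a hypothetical helper name; the time is reported in your local time zone, so mention the time zone when you post it.

```shell
# Hypothetical helper: print a one-line freeze report for a job ID ($1),
# with the current local time formatted as hh:mm:ss.
freeze_report() {
    echo "job $1: terminal froze at $(date +%H:%M:%S)"
}

# Example (commented out; substitute your own job ID):
#   freeze_report <JOB_ID>
```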