Setup AWS and MemVerge

MemVerge float tool for interactive and batch job submission

Download Float from the Operation Center

  • For Linux users
       wget https://<op_center_ip_address>/float --no-check-certificate
       # Example using an IP address:
       wget https://44.222.241.133/float --no-check-certificate
    
  • For Mac users with an Intel chip
       wget https://<op_center_ip_address>/float.darwin_amd64 --no-check-certificate
       # Example using an IP address:
       wget https://44.222.241.133/float.darwin_amd64 --no-check-certificate
    
  • For Mac users with an Apple Silicon (M-series) chip
       wget https://<op_center_ip_address>/float.darwin_arm64 --no-check-certificate
       # Example using an IP address:
       wget https://44.222.241.133/float.darwin_arm64 --no-check-certificate
    
  • For Windows users, the float files above are not compatible. Open https://<op_center_ip_address> (for example, https://44.222.241.133) in a browser and manually download the version of the tool built for Windows.

Alternatively, open the MMCloud OpCenter web interface and download the tool through the GUI.
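
The choice between the download names above can be scripted. The sketch below is only an illustration: float_asset_name is a hypothetical helper (not part of MMCloud), and it assumes that the plain float file is the right one for every Linux machine, since the list above makes no distinction between Linux architectures.

```shell
# Map an OS name and CPU architecture (as reported by uname) to one of the
# download names listed above. float_asset_name is a hypothetical helper.
float_asset_name() {
    os="$1"
    arch="$2"
    case "$os/$arch" in
        Linux/*)       echo "float" ;;               # assumption: one Linux build
        Darwin/x86_64) echo "float.darwin_amd64" ;;  # Mac, Intel chip
        Darwin/arm64)  echo "float.darwin_arm64" ;;  # Mac, Apple Silicon
        *) echo "unsupported platform: $os/$arch" >&2; return 1 ;;
    esac
}

# Usage (replace <op_center_ip_address> before running):
#   asset=$(float_asset_name "$(uname -s)" "$(uname -m)")
#   wget "https://<op_center_ip_address>/$asset" --no-check-certificate
```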

Move and Make It Executable

  • For Mac and Linux users with sudo access, replace <float_binary> below with the name of the file you just downloaded:
       sudo mv <float_binary> /usr/local/bin
       sudo chmod +x /usr/local/bin/<float_binary> 
    
  • For users without sudo access, add export PATH=$PATH:<PATH> to your ~/.bashrc, where <PATH> is the directory that contains the <float_binary> executable, and source the file afterwards. Then make the binary executable:
      chmod +x <PATH>/<float_binary> 
    
  • For Windows users: executables in any directory listed in the system’s PATH environment variable can be run from any location in the Command Prompt without specifying the full path, and C:\Windows\System32 is included in PATH automatically. So, if float.exe is in this directory, you can run it from anywhere in the Command Prompt by just typing float. Keep in mind that System32 is a crucial part of the Windows operating system and contains many of its core files and utilities.
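
For the no-sudo case on Mac or Linux, the steps can be collected into one function. This is a minimal sketch: install_float_user is a hypothetical helper, and $HOME/bin is an assumed default directory, not something MMCloud requires. It prints the export line instead of editing ~/.bashrc so you can review it first.

```shell
# Install a downloaded float binary into a user-owned directory (no sudo).
# install_float_user is a hypothetical helper; $HOME/bin is an assumed default.
install_float_user() {
    binary="$1"
    dest="${2:-$HOME/bin}"
    mkdir -p "$dest" || return 1
    mv "$binary" "$dest/" || return 1
    chmod +x "$dest/$(basename "$binary")" || return 1
    # Print the line to add to ~/.bashrc rather than modifying the file.
    echo "export PATH=\$PATH:$dest"
}

# Usage:
#   install_float_user ./float "$HOME/bin"
```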

Addressing Mac Security Settings

Optional: For Mac Users

If you are using a Mac, the float command might be blocked by your security settings. Follow these steps to allow it:

  • Open ‘System Preferences’ (named ‘System Settings’ on macOS Ventura and later).
  • Navigate to ‘Privacy & Security’.
  • Under the ‘Security’ tab, you’ll see a message about Float being blocked. Click on ‘Allow Anyway’.

macOS float command name conflict

Typically we name our <float_binary> float. This works on Linux. On macOS, however, the name float is already taken by the shell (it is a builtin of zsh, the default macOS shell). In that case, keep the original <float_binary> name, for example float.darwin_arm64 under /usr/local/bin. All float commands then become float.darwin_arm64 commands, for example float.darwin_arm64 submit ....
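
If typing the full binary name becomes tedious, one option is a small wrapper function under a name that does not collide with the shell. This is only a sketch: mmfloat and FLOAT_BIN are hypothetical names chosen for illustration and are not part of MMCloud.

```shell
# Name of the installed binary; adjust it to match the file you downloaded.
FLOAT_BIN="${FLOAT_BIN:-float.darwin_arm64}"

# Forward all arguments to the binary under a collision-free name.
mmfloat() {
    "$FLOAT_BIN" "$@"
}

# Usage:
#   mmfloat submit ...
```

The function can go in ~/.zshrc or ~/.bashrc so that it is available in every new terminal.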

AWS CLI tools for data management on AWS

The AWS CLI tools are available as a conda package on anaconda.org. For example, if you use pixi to manage your packages, you can install them with:

pixi global install awscli

To install the official package instead, see https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html

To summarize:

  • If you are a Mac user, use the commands below to install the AWS CLI tools.
       curl "https://awscli.amazonaws.com/AWSCLIV2.pkg" -o "AWSCLIV2.pkg"
       sudo installer -pkg AWSCLIV2.pkg -target /
    
  • If you are an HPC/Linux user, use the commands below to install the AWS CLI tools. Also add export PATH=$PATH:/home/<UNI>/.local/bin to your ~/.bashrc, and don’t forget to source it.
       curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
       unzip awscliv2.zip
       ./aws/install -i /home/<UNI>/.local/bin/aws-cli -b /home/<UNI>/.local/bin
    
  • If you are a Windows user, open cmd as administrator and use the command below to install the AWS CLI tools.
       msiexec.exe /i https://awscli.amazonaws.com/AWSCLIV2.msi
    

Check that the installation succeeded with

   which aws
   aws --version
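
The same check can be wrapped in a function that reports every tool it cannot find, which is handy at the top of a job script. This is a minimal sketch; check_tools is a hypothetical helper name.

```shell
# Report whether each named command can be found on PATH.
# Returns non-zero if at least one of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: missing"
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   check_tools aws
```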

Notes for System Admin

This section is only relevant to system admins. If you are a user, you can skip it.

Setting Up Your IAM User and Account

This is a one-time job for the system admin, done through the GUI.

FIXME: the approach below gives everyone in the group full access to the whole bucket, so everyone can read and edit others’ files. That is convenient but also dangerous. Access needs to be managed better as a next step.

  • Log into AWS Console:
    • Navigate to AWS Console.
    • Sign up for a root AWS account if you’re new, else log in.
  • Search for IAM:
    • After logging in, search for “IAM” using the top search bar.
  • Create Group
    • Click “User groups” on the left.
    • Attach the “AmazonS3FullAccess” policy to this group.
    • Add users to this group.
  • Add user and set up access key
    • GUI (or, for the root user, when setting up the access key for the first time)
      • Add User:
        • Click “Users” on the left and then click “Create user” on the right.
        • Click “Next” following instructions.
      • Manage Access Keys:
        • Find “Security recommendations” on the IAM dashboard.
        • Click “Manage access keys”.
      • Create an Access Key:
        • Go to the “Access keys” section.
        • Select “Create access key”.
      • Retrieve Your Access Key and Secret Access Key:
        • A dialogue will show your Access Key ID and Secret Access Key.
        • Check the box, then click “Next”.
        • Download a copy of these keys for safekeeping.
    • CLI (switch to root access first)
       aws iam create-user --user-name YourUserName
       aws iam add-user-to-group --user-name YourUserName --group-name Gao-lab
       aws iam create-access-key --user-name YourUserName
       # Create a console password
       aws iam create-login-profile --user-name YourUserName --password NEW_PASSWORD

      Copy the access key ID and secret access key returned by create-access-key for safekeeping.

  • Configure AWS CLI:
    • Run the following in your terminal:
      aws configure
      
    • Provide:
      • Your Access Key ID and Secret Access Key.
      • Region: us-east-1.
      • Output format (e.g., yaml).
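
The region and output format can also be set without the interactive prompts by using aws configure set. The sketch below assumes the access keys have already been entered through aws configure; configure_aws_defaults is a hypothetical helper name, and the defaults simply repeat the values listed above.

```shell
# Set the default region and output format non-interactively.
# configure_aws_defaults is a hypothetical helper name.
configure_aws_defaults() {
    region="${1:-us-east-1}"
    output="${2:-yaml}"
    aws configure set region "$region" || return 1
    aws configure set output "$output" || return 1
}

# Usage:
#   configure_aws_defaults us-east-1 yaml
#   aws configure get region      # prints the stored region
#   aws sts get-caller-identity   # confirms that the stored keys are accepted
```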

Create project S3 Bucket

To create an S3 bucket, choose a $S3_BUCKET name that is globally unique and lowercase, then run:

aws s3api create-bucket --bucket $S3_BUCKET --region $AWS_REGION

Example:

aws s3api create-bucket --bucket cumc-gao --region us-east-1
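
A bucket name can be checked locally before calling AWS. The sketch below covers only the basic S3 naming rules (3 to 63 characters; lowercase letters, digits, dots and hyphens; starting and ending with a letter or digit). It cannot tell whether the name is globally unique, and is_valid_bucket_name is a hypothetical helper name.

```shell
# Check a candidate bucket name against the basic S3 naming rules.
# LC_ALL=C keeps the a-z range from matching uppercase letters in some locales.
is_valid_bucket_name() {
    printf '%s\n' "$1" |
        LC_ALL=C grep -Eq '^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$'
}

# Usage:
#   is_valid_bucket_name "$S3_BUCKET" &&
#       aws s3api create-bucket --bucket "$S3_BUCKET" --region "$AWS_REGION"
```

Note that for regions other than us-east-1, create-bucket also needs --create-bucket-configuration LocationConstraint=$AWS_REGION.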

MMCloud account management

First, log in as admin:

float login -u <admin username> -p <admin passwd> -a /<op_center_ip_address>

Then create a new user, for example tom:

float user add tom

Setup MMCloud OpCenter

MMCloud OpCenter is analogous to the login node on an HPC cluster.

As of May 2024, OpCenters are created and managed by the MemVerge support team. We no longer need to worry about setting them up ourselves.

Upgrade MMCloud OpCenter

You may be asked to upgrade MMCloud OpCenter from time to time. To do so,

  • float release ls (check which versions are available)
  • float release upgrade (upgrade to the latest version)
  • Wait 1-2 minutes.
  • float login (log in again)
  • float release sync (upgrade the local float binary. You can skip this if you use the latest containers provided by MemVerge, see Appendix II. If you get a permission denied error, rerun the command with sudo)
  • float release migrate --dbPath /mnt/memverge/data/opcenter (a one-time upgrade of the backend DB)
  • Done!
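
The recurring part of the upgrade can be run as one function that stops at the first failing step. This is a sketch under assumptions: upgrade_opcenter is a hypothetical wrapper, the 120-second default is taken from the 1-2 minute guidance above, and the one-time migrate step is left out on purpose.

```shell
# Run the recurring upgrade steps in order, stopping at the first failure.
# upgrade_opcenter is a hypothetical wrapper around the commands listed above.
upgrade_opcenter() {
    wait_seconds="${1:-120}"
    float release ls || return 1        # show the available versions
    float release upgrade || return 1   # upgrade the OpCenter
    sleep "$wait_seconds"               # wait as advised above
    float login || return 1             # log in again
    float release sync || return 1      # update the local float binary
}

# Usage:
#   upgrade_opcenter        # waits 120 seconds between upgrade and login
#   upgrade_opcenter 60     # waits 60 seconds instead
```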

Configure OpCenter

We can configure the OpCenter in two ways:

  1. Using the GUI

Through the GUI, admins can change OpCenter settings, for example to expand the allowed instance types or to allow more retries on Spot instances before a job fails.

[Figure: Configure using GUI]

  2. Using CLI configuration commands (recommended). You must be logged in as the admin user. Here are some examples:
float config set cloud.createVMPolicy spotFirst
float config set cloud.createVMRetryInterval 5m0s
float config set cloud.createVMRetryLimit "3"
float config set migrate.cpuDisable true
float config set migrate.cpuLowerBoundDuration 10m0s
float config set migrate.cpuLowerBoundRatio 1
float config set migrate.cpuLowerLimit 8
float config set migrate.cpuMigrateStep 10
float config set migrate.cpuUpperBoundDuration 10m0s
float config set migrate.cpuUpperBoundRatio 99
float config set migrate.memDisable false
float config set migrate.memLimit 100
float config set migrate.memLowerBoundDuration 10m0s
float config set migrate.memLowerBoundRatio 1
float config set migrate.memLowerLimit 8
float config set migrate.memMigrateStep 10
float config set migrate.memUpperBoundDuration 10m0s
float config set migrate.memUpperBoundRatio 90
float config set provider.allowList "*"
float config set provider.denyList "t*"
float config set scheduler.jobExecutorLimit 900
float config set cloud.handleRebalanceMemThreshold 128G
float config set cloud.swapFileSize 12G

Some of these commands may fail on a Mac. For example, float config set cloud.createVMRetryLimit -1 fails with Error: unknown shorthand flag: ‘1’ in -1, and float config set provider.allowList r5*,r6* fails with zsh: no matches found: r5*,r6*. Enclosing the value in quotes fixes the second case: float config set provider.allowList "r5*,r6*". When you run the float commands above, make sure that each one completes without an error.
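
One way to make sure that every setting was applied is to feed the key/value pairs through a loop that stops at the first error. This is a minimal sketch; apply_float_config is a hypothetical helper, not a float subcommand. Because the value is passed as a quoted variable, the shell does not try to expand patterns such as r5*,r6*.

```shell
# Apply "key value" pairs read from standard input, one per line, and stop
# at the first setting that float rejects.
apply_float_config() {
    while read -r key value; do
        [ -z "$key" ] && continue   # skip blank lines
        # Quoting "$value" keeps the shell from expanding * in patterns.
        # </dev/null keeps float from consuming the remaining input lines.
        if ! float config set "$key" "$value" </dev/null; then
            echo "failed to set $key" >&2
            return 1
        fi
    done
    return 0
}

# Usage:
#   apply_float_config <<'EOF'
#   cloud.createVMPolicy spotFirst
#   provider.allowList r5*,r6*
#   EOF
```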

Additional settings

  • Gateway setup
  • Security group for port 8888 created in the AWS console

Clean up OpCenter space (recommended: contact the MemVerge support team first)

Sometimes we may need to free up space by deleting older builds on the root volume. The admin of the OpCenter instance should have the SSH credentials needed to ssh into the OpCenter and clean it up.

  1. Log in using ssh -i with the ec2-user ID created for the admin. This requires the .pem key.
  2. Check the size of the log files with du -lh -d 1 and how much space is used on the system with df -h.
  3. Run sudo podman image prune -a -f to free up space.
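
To decide whether a cleanup is needed at all, the df check in step 2 can be reduced to a single number. This is a sketch; root_usage_percent is a hypothetical helper, and the 80% threshold in the usage comment is an arbitrary example value.

```shell
# Print the percentage of the root volume in use, as an integer.
# df -P prints one POSIX-format line per filesystem; column 5 is the capacity.
root_usage_percent() {
    df -P / | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Usage:
#   if [ "$(root_usage_percent)" -gt 80 ]; then
#       echo "root volume is more than 80% full"
#   fi
```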

Compute resource quota increase request

Both AWS and the MMCloud OpCenter have default limits on the number of jobs we can submit at a time. The AWS limit can be raised by contacting AWS customer service and requesting a higher quota for Spot instances (which MMCloud uses). Our AWS quota is currently 4000 CPUs per AWS region, and it can be raised further if we request again. Requests are usually approved within 24 hours.

Admins should also change the default maximum number of jobs of an OpCenter using the CLI. See the section on configuring the OpCenter via the CLI.
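
As a rough guide to what the CPU quota means in practice, the number of jobs that can run at once is the quota divided by the CPUs each job requests. The sketch below assumes every job requests the same number of CPUs; max_concurrent_jobs is a hypothetical helper.

```shell
# Estimate how many jobs fit under a CPU quota when every job requests the
# same number of CPUs. Integer division rounds down.
max_concurrent_jobs() {
    quota_cpus="$1"
    cpus_per_job="$2"
    echo $(( quota_cpus / cpus_per_job ))
}

# Usage:
#   max_concurrent_jobs 4000 8    # prints 500
```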

AWS storage quota increase request

When we submit too many jobs that each load a large EBS volume, jobs submitted later may fail with:

2024-02-21T02:39:10.861: Failed to create float data volume, error: VolumeLimitExceeded: You have exceeded your maximum gp3 storage limit of 50 TiB in this region. Please contact AWS Support to request an Elastic Block Store service limit increase.

Although it is possible to ask AWS customer service to increase this limit, EBS volumes should hold only temporary files, so 50 TiB is a decent limit. If your jobs use more than 50 TiB of EBS volume on the fly to support the computing, examine your jobs and decide whether this is truly necessary (likely not!).
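
To see how quickly the limit is reached, divide it by the volume size each job requests. The sketch below assumes every job uses the same volume size in GiB and that nothing else in the region counts against the gp3 limit; max_jobs_by_storage is a hypothetical helper.

```shell
# Estimate how many jobs can hold a data volume at the same time under a
# regional storage limit. 1 TiB = 1024 GiB; integer division rounds down.
max_jobs_by_storage() {
    limit_tib="$1"
    volume_gib="$2"
    echo $(( limit_tib * 1024 / volume_gib ))
}

# Usage:
#   max_jobs_by_storage 50 100    # prints 512
```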