Setup AWS and MemVerge
MemVerge float tool for interactive and batch job submission
Download Float from the Operation Center
- For Linux users:
wget https://<op_center_ip_address>/float --no-check-certificate # Example using an IP address: wget https://44.222.241.133/float --no-check-certificate
- For Mac users with an Intel chip:
wget https://<op_center_ip_address>/float.darwin_amd64 --no-check-certificate # Example using an IP address: wget https://44.222.241.133/float.darwin_amd64 --no-check-certificate
- For Mac users with an Apple Silicon (M-series) chip:
wget https://<op_center_ip_address>/float.darwin_arm64 --no-check-certificate # Example using an IP address: wget https://44.222.241.133/float.darwin_arm64 --no-check-certificate
- For Windows users, the float files above are not compatible. Open https://<op_center_ip_address> (for example, https://44.222.241.133) in a browser and manually download the version of the tool built for Windows.
Alternatively, open the MMCloud OpCenter web interface and download the binary through the GUI.
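The per-platform file names above can be picked automatically from the output of `uname`. The following is a minimal sketch, not part of the MemVerge tooling: the function names `float_asset_name` and `float_download_url` are hypothetical, and it assumes that the single Linux binary listed above is the right one for any Linux machine.

```shell
#!/bin/sh
# Map an OS name and CPU architecture (as printed by `uname -s` and
# `uname -m`) to the float download file name listed above.
# Assumption: every Linux machine uses the one Linux binary, `float`.
float_asset_name() {
    os="$1"
    arch="$2"
    case "$os/$arch" in
        Linux/*)        echo "float" ;;
        Darwin/x86_64)  echo "float.darwin_amd64" ;;
        Darwin/arm64)   echo "float.darwin_arm64" ;;
        *)              echo "unsupported platform: $os/$arch" >&2; return 1 ;;
    esac
}

# Build the download URL for an OpCenter address (hypothetical helper).
float_download_url() {
    name=$(float_asset_name "$2" "$3") || return 1
    echo "https://$1/$name"
}

# Usage (not run here):
#   url=$(float_download_url "<op_center_ip_address>" "$(uname -s)" "$(uname -m)")
#   wget "$url" --no-check-certificate
```

Windows is deliberately reported as unsupported, since that binary has to be downloaded manually from the OpCenter.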
Move and Make It Executable
- For Mac and Linux users with sudo access, replace <float_binary> below with the file you just downloaded:
sudo mv <float_binary> /usr/local/bin
sudo chmod +x /usr/local/bin/<float_binary>
- For users without sudo access, add the line below to your ~/.bashrc, where <PATH> is the directory that contains the <float_binary> executable. Don’t forget to source ~/.bashrc afterwards.
export PATH=$PATH:<PATH>
Then make the binary executable:
chmod +x <PATH>/<float_binary>
- For Windows users:
Files located in C:\Windows\System32 are automatically included in the system’s PATH environment variable, so any executable in this directory can be run from any location in the Command Prompt without specifying its full path. System32 is also a core part of the Windows operating system and contains many of its system files and utilities. If float.exe is in this directory, you can run it from anywhere in the Command Prompt by typing float.
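For the no-sudo case, the PATH line can be added to the rc file idempotently, so that running the setup twice does not duplicate it. This is a minimal sketch; the helper name `add_dir_to_path` and the directory used in the usage comment are illustrative assumptions.

```shell
#!/bin/sh
# Append an `export PATH` line for directory $1 to rc file $2,
# unless that exact line is already present.
add_dir_to_path() {
    dir="$1"
    rc_file="$2"
    line="export PATH=\$PATH:$dir"
    touch "$rc_file"
    # -x matches whole lines only, -F treats the pattern as a fixed string.
    if ! grep -qxF "$line" "$rc_file"; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}

# Usage (not run here; $HOME/bin is a hypothetical install directory):
#   add_dir_to_path "$HOME/bin" "$HOME/.bashrc"
#   . "$HOME/.bashrc"
#   chmod +x "$HOME/bin/<float_binary>"
```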
Addressing Mac Security Settings
Optional: For Mac Users
If you are using a Mac, the float command might be blocked by your security settings. Follow these steps to address it:
- Open ‘System Preferences’ (called ‘System Settings’ on macOS 13 and later).
- Navigate to ‘Privacy & Security’.
- Under the ‘Security’ tab, you’ll see a message about Float being blocked. Click on ‘Allow Anyway’.
macOS float command name conflict
Typically we name our <float_binary> as float. This works on Linux. On macOS, however, float is a reserved word in zsh, the default shell. In that case, keep the <float_binary> name as is, for example float.darwin_arm64 under /usr/local/bin. All float commands then become float.darwin_arm64, for example float.darwin_arm64 submit ...
AWS CLI tools for data management on AWS
The AWS CLI tools are available as a conda package on anaconda.org. For example, if you use pixi to manage your packages, you can install them via:
pixi global install awscli
To install the official package: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
To summarize:
- If you are a Mac user, you can install the AWS CLI tools with the commands below.
curl "https://awscli.amazonaws.com/AWSCLIV2.pkg" -o "AWSCLIV2.pkg"
sudo installer -pkg AWSCLIV2.pkg -target /
- If you are an HPC/Linux user, you can install the AWS CLI tools with the commands below. Also add export PATH=$PATH:/home/<UNI>/.local/bin to your ~/.bashrc, and don’t forget to source it.
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
./aws/install -i /home/<UNI>/.local/bin/aws-cli -b /home/<UNI>/.local/bin
- If you are a Windows user, open cmd as administrator and run the command below to install the AWS CLI tools.
msiexec.exe /i https://awscli.amazonaws.com/AWSCLIV2.msi
Check if it was installed successfully with
which aws
aws --version
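The same check can be scripted for several tools at once. This is a minimal sketch using the POSIX `command -v` builtin; the function name `missing_tools` is hypothetical.

```shell
#!/bin/sh
# Print the name of every command in the argument list that is not on PATH.
# Returns non-zero if at least one is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Usage (not run here):
#   missing_tools aws wget || echo "install the tools listed above first"
```

Note that in zsh a check like this may report float as present even when the binary is not installed, because the shell itself defines that name.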
Notes for System Admin
This section is only relevant to system admins. If you are a user, you can skip it.
Setting Up Your IAM User and Account
This is a one-time job for the system admin, done through the GUI.
FIXME: the approach below gives everyone in the group full access to the whole bucket, so everyone can read and edit others’ files. That is convenient but also dangerous, and needs to be managed better as a next step.
- Log into AWS Console:
- Navigate to AWS Console.
- Sign up for a root AWS account if you’re new, else log in.
- Search for IAM:
- After logging in, search for “IAM” using the top search bar.
- Create Group
- Click “User groups” on the left.
- Attach “AmazonS3FullAccess” to this group.
- Add users to this group.
- Add user and set up access key
- GUI (or possibly for the root user, when setting up the access key for the first time)
- Add User:
- Click “Users” on the left and then click “Create user” on the right.
- Click “Next” following instructions.
- Manage Access Keys:
- Find “Security recommendations” on the IAM dashboard.
- Click “Manage access keys”.
- Create an Access Key:
- Go to the “Access keys” section.
- Select “Create access key”.
- Retrieve Your Access Key and Secret Access Key:
- A dialogue will show your Access Key ID and Secret Access Key.
- Check the box, then click “Next”.
- Download a copy of these keys for safekeeping.
- CLI (change to root access)
aws iam create-user --user-name YourUserName
aws iam add-user-to-group --user-name YourUserName --group-name Gao-lab
aws iam create-access-key --user-name YourUserName
# create password
aws iam create-login-profile --user-name YourUserName --password NEW_PASSWORD
Copy these keys for safekeeping.
- Configure AWS CLI:
- Run the following in your terminal:
aws configure
- Provide:
- Your Access Key ID and Secret Access Key.
- Region: us-east-1.
- Output format (e.g., yaml).
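The region and output format can also be set without the interactive prompts by using `aws configure set`. The sketch below wraps the two calls in a hypothetical helper, `set_aws_defaults`, and uses the values suggested above as defaults. The access keys are left to the interactive `aws configure` prompt so that they do not end up in shell history.

```shell
#!/bin/sh
# Set the default region and output format for the default profile.
# Falls back to the values suggested in this guide when no arguments are given.
set_aws_defaults() {
    region="${1:-us-east-1}"
    output="${2:-yaml}"
    aws configure set region "$region" &&
    aws configure set output "$output"
}

# Usage (not run here):
#   set_aws_defaults              # us-east-1, yaml
#   set_aws_defaults us-west-2 json
```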
Create project S3 Bucket
To create an S3 bucket, ensure your $S3_BUCKET name is globally unique and in lowercase. For example:
aws s3api create-bucket --bucket $S3_BUCKET --region $AWS_REGION
Example:
aws s3api create-bucket --bucket cumc-gao --region us-east-1
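Two details are easy to miss here: the bucket name has to satisfy the S3 naming rules, and regions other than us-east-1 need an explicit LocationConstraint. The following sketch checks the name and prints the command to run instead of executing it. Both function names are hypothetical, and the name check covers only the basic rules (length, allowed characters, first and last character).

```shell
#!/bin/sh
# Return 0 if $1 passes the basic S3 bucket naming rules: 3-63 characters,
# lowercase letters, digits, dots and hyphens only, and a letter or digit
# at both ends. Further rules (e.g. no IP-address-style names) are not checked.
is_valid_bucket_name() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$'
}

# Print the create-bucket command for bucket $1 in region $2.
# us-east-1 takes no LocationConstraint; every other region requires one.
create_bucket_cmd() {
    bucket="$1"
    region="$2"
    is_valid_bucket_name "$bucket" || {
        echo "invalid bucket name: $bucket" >&2
        return 1
    }
    cmd="aws s3api create-bucket --bucket $bucket --region $region"
    if [ "$region" != "us-east-1" ]; then
        cmd="$cmd --create-bucket-configuration LocationConstraint=$region"
    fi
    echo "$cmd"
}

# Usage (not run here):
#   create_bucket_cmd cumc-gao us-east-1
```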
MMCloud account management
First, log in as admin:
float login -u <admin username> -p <admin passwd> -a /<op_center_ip_address>
Then create a new user, for example tom:
float user add tom
Setup MMCloud OpCenter
MMCloud OpCenter is analogous to the login node on an HPC cluster.
As of May 2024, OpCenters are created and managed by the MemVerge support team. We no longer need to worry about setting them up ourselves.
Upgrade MMCloud OpCenter
You may be asked to upgrade MMCloud OpCenter from time to time. To do so:
- float release ls (check the version that is available)
- float release upgrade (upgrade to latest)
- wait for 1-2 mins
- float login (to log in again)
- float release sync (upgrade the local float binary. You can skip this if you use the latest containers provided by MemVerge, see Appendix II. You may get a permission denied error; if so, please use sudo)
- float release migrate --dbPath /mnt/memverge/data/opcenter (this is a one-time upgrade of the backend DB)
- done!
Configure OpCenter
We can configure the OpCenter in two ways:
- Using the GUI
Through the GUI, admins can change OpCenter settings, for example to expand the allowed instance types or to allow more retries on spot instances before jobs fail.
- Using CLI configuration commands (recommended). You must be logged in as the admin user. Here are some examples:
float config set cloud.createVMPolicy spotFirst
float config set cloud.createVMRetryInterval 5m0s
float config set cloud.createVMRetryLimit "3"
float config set migrate.cpuDisable true
float config set migrate.cpuLowerBoundDuration 10m0s
float config set migrate.cpuLowerBoundRatio 1
float config set migrate.cpuLowerLimit 8
float config set migrate.cpuMigrateStep 10
float config set migrate.cpuUpperBoundDuration 10m0s
float config set migrate.cpuUpperBoundRatio 99
float config set migrate.memDisable false
float config set migrate.memLimit 100
float config set migrate.memLowerBoundDuration 10m0s
float config set migrate.memLowerBoundRatio 1
float config set migrate.memLowerLimit 8
float config set migrate.memMigrateStep 10
float config set migrate.memUpperBoundDuration 10m0s
float config set migrate.memUpperBoundRatio 90
float config set provider.allowList "*"
float config set provider.denyList "t*"
float config set scheduler.jobExecutorLimit 900
float config set cloud.handleRebalanceMemThreshold 128G
float config set cloud.swapFileSize 12G
Some of these lines may produce an error on a Mac. For example, float config set cloud.createVMRetryLimit -1 fails with Error: unknown shorthand flag: '1' in -1, and float config set provider.allowList r5*,r6* fails with zsh: no matches found: r5*,r6*.
Enclosing the value in quotes works, as in float config set provider.allowList "r5*,r6*". When you run the float commands above, make sure that each line is applied without any error.
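One way to make sure no setting fails silently is to keep the settings in a plain "key value" list and apply them in a loop that records every failure. This is a minimal sketch built only on the `float config set` command shown above. The helper name `apply_float_config` and the file name in the usage comment are hypothetical. Because each value is passed inside double quotes, the shell never tries to expand `*`.

```shell
#!/bin/sh
# Read "key value" pairs from stdin, one per line, and apply each with
# `float config set`. Blank lines and lines starting with # are skipped.
# Prints every key that failed and returns non-zero if any did.
apply_float_config() {
    failed=0
    while read -r key value; do
        case "$key" in
            ''|'#'*) continue ;;
        esac
        # </dev/null keeps float from consuming the rest of the list.
        if ! float config set "$key" "$value" </dev/null; then
            echo "FAILED: $key" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Usage (not run here; opcenter.conf is a hypothetical file such as):
#   provider.allowList r5*,r6*
#   cloud.swapFileSize 12G
#
#   apply_float_config < opcenter.conf
```

On macOS, replace float inside the function with the binary name you kept, for example float.darwin_arm64.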
Additional settings
- Gateway setup
- Security group for port 8888 created in the AWS console
Clean up OpCenter space (recommended to contact the MemVerge support team first)
Sometimes we may need to free up space by deleting older builds on the root volume. The admin of the OpCenter instance should have the SSH credentials needed to ssh into the OpCenter and clean it up.
- Log in with ssh -i using the ec2-user ID created as an admin. This needs the pem key.
- Check the log files with du -lh -d 1 and how much space is used on the system with df -h
- Run sudo podman image prune -a -f to clean up some space
Compute resource quota increase request
Both AWS and the MMCloud OpCenter have a default limit on the number of jobs we can submit at a time. To raise the AWS limit, contact AWS customer service and request a quota increase for Spot instances (which MMCloud uses). Currently, we have increased our AWS quota to 4000 CPUs per AWS region, and it can be raised further if we request again. Requests are usually approved within 24 hours.
Admins should also change the default maximum number of jobs of an OpCenter using the CLI. See the section on OpCenter configuration via CLI.
AWS storage quota increase request
When we submit too many jobs that each load a large EBS volume, jobs submitted later may fail with:
2024-02-21T02:39:10.861: Failed to create float data volume, error: VolumeLimitExceeded: You have exceeded your maximum gp3 storage limit of 50 TiB in this region. Please contact AWS Support to request an Elastic Block Store service limit increase.
Although it is possible to ask AWS customer service to increase this limit, EBS volumes should hold only temporary files, so 50 TiB is a decent limit. If your jobs use more than 50 TiB of EBS volume on the fly to support the computing, examine your jobs and decide whether this is truly necessary (likely not!).
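To see how close a region is to the gp3 limit before submitting a large batch, the provisioned sizes can be summed with the AWS CLI. This is a rough sketch under stated assumptions: both function names are hypothetical, the 50 TiB figure is taken from the error message above, and the sum is only an approximation of what AWS counts against the quota.

```shell
#!/bin/sh
# Sum the provisioned size (GiB) of all gp3 volumes in region $1.
# JSON output is used so the query runs on the full, unpaginated result.
gp3_total_gib() {
    aws ec2 describe-volumes \
        --region "$1" \
        --filters Name=volume-type,Values=gp3 \
        --query 'sum(Volumes[].Size)' \
        --output json
}

# Integer percentage of a limit in TiB ($2) used by a total in GiB ($1).
# 1 TiB = 1024 GiB.
percent_of_limit() {
    used_gib="$1"
    limit_tib="$2"
    echo $(( used_gib * 100 / (limit_tib * 1024) ))
}

# Usage (not run here):
#   used=$(gp3_total_gib us-east-1)
#   echo "gp3 usage: $(percent_of_limit "$used" 50)% of 50 TiB"
```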