Getting Started
This guide covers the essentials for using the AWS-based HPC system at CU Neurology.
Access
Please refer to our internal document, CU_Neurology_HPC_Info_2026.md (requires the CUIMC VPN or an on-campus connection), for information on HPC account access and an overview of storage.
For off-campus access, first connect to the CUIMC VPN. After you set up your account according to CU_Neurology_HPC_Info_2026.md, you can use SSH to access the login node:
```bash
ssh <username>@<hpc_ip>
```
On Windows, you can use Windows Subsystem for Linux (WSL), which includes ssh and other Linux utilities.
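To avoid retyping the address, you can add a host alias to your SSH configuration. The snippet below is a minimal sketch; the alias name is illustrative, and the placeholders must be replaced with the values from CU_Neurology_HPC_Info_2026.md:

```bash
# Append a host alias to ~/.ssh/config (replace the placeholders with real values)
cat >> ~/.ssh/config <<'EOF'
Host neuro-hpc            # illustrative alias; afterwards connect with: ssh neuro-hpc
    HostName <hpc_ip>     # head-node IP from the internal document
    User <username>       # your HPC account name
EOF
```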
Understanding the system
Resource overview
While AWS HPC theoretically offers near-infinite resources, each research group (lab) has its own cluster configured with default limits:
| Resource | Limit |
|---|---|
| Partitions | CPU on-demand instance, CPU SPOT instance, GPU |
| CPU nodes | ~4,200 (on-demand + SPOT) |
| GPU nodes | ~25 |
Current status (Jan 15, 2026):
- Only on-demand nodes are available; SPOT instance support via MemVerge is currently being tested.
- MemVerge's WaveRider feature has not yet been implemented on compute nodes (on-demand or SPOT).
- Recommended concurrent jobs: < 1,000. Submitting too many jobs may result in `Resource temporarily unavailable` errors.
Check partition and node availability:
```bash
sinfo
```
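To stay under the recommended concurrent-job limit, you can count your own pending and running jobs. This is a standard SLURM command, nothing site-specific:

```bash
# Number of your jobs currently pending or running (-h suppresses the header line)
squeue -u "$USER" -h | wc -l
```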
These limits serve as guardrails to control costs: without them, runaway jobs with access to unlimited computing resources can quickly become expensive. To adjust limits temporarily for a specific project, or permanently for your group, please contact the IT team.
Architecture
There are multiple HPC clusters in production. Each PI/group works within a dedicated HPC cluster on AWS. Each cluster has a “head node” for submitting jobs, which is operational 24/7. Compute nodes start as needed when jobs are submitted and power off when jobs complete.
Our HPC cluster uses SLURM (Simple Linux Utility for Resource Management), a widely used open-source workload manager. By default, SLURM allocates resources based on what you request in your job script (CPUs, memory, etc.), not necessarily an entire node. You can configure resource allocation using `--constraint` to specify instance types and `--mem` to request specific memory amounts. See SLURM Quick Start for details on configuring job resources.
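As an illustration, here is a minimal job script sketch. The instance type and memory value follow the table in The SLURM Job Manager section below; the job name, output path, and the command being run are placeholders:

```bash
#!/bin/bash
#SBATCH --job-name=example            # placeholder job name
#SBATCH --constraint=cpu4mem32        # request a cpu4mem32 instance (see table below)
#SBATCH --mem=30G                     # stay below the instance memory limit
#SBATCH --cpus-per-task=4
#SBATCH --time=01:00:00
#SBATCH --output=slurm-%j.out         # %j expands to the job ID

srun ./my_analysis.sh                 # placeholder for your actual command
```

Submit the script with `sbatch job.sh` and monitor it with `squeue -u $USER`.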
When you log in to the HPC cluster, you are placed on the head node, which is shared with other users. This node is intended only for lightweight tasks such as navigating directories, viewing files, inspecting scripts, and submitting jobs. Running computationally intensive tasks, including copying large files, will slow down the system for everyone.
Important: DO NOT run heavy computation on the head node. Always submit jobs to the cluster.
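If you need a shell with real compute resources, for example to move large files or test code, request an interactive session on a compute node rather than working on the head node. This is a standard SLURM pattern; the resource values below are placeholders to adjust for your task:

```bash
# Interactive shell on a compute node (adjust instance type, memory, CPUs, and time as needed)
srun --constraint=cpu2mem8 --mem=7G --cpus-per-task=2 --time=02:00:00 --pty bash
```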
The SLURM Job Manager
Most SLURM commands and workflows are the same as on our previous on-premises HPC, with the following key differences:
- `--constraint` must match an available instance type; use `|` to separate multiple options (see the example after the table below)
- `--mem` must be below the instance memory limit to leave headroom for system processes
Recommended memory settings by instance type:
| Instance | Max --mem |
|---|---|
| cpu2mem4 | 3 GB |
| cpu2mem8 | 7 GB |
| cpu4mem32 | 30 GB |
| cpu8mem64 | 60 GB |
| cpu16mem128 | 120 GB |
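For example, a sketch of job-script directives that accept either of two instance types while keeping `--mem` under the recommended maximum of the smaller one:

```bash
#SBATCH --constraint="cpu4mem32|cpu8mem64"   # accept either instance type
#SBATCH --mem=30G                            # below the recommended max for cpu4mem32
```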
Documentation
Getting Started
- SLURM Quick Start: Run your first job with step-by-step instructions and job templates
Software
As on our on-premises HPC, use `module avail` to list available software and `module load` to load it. Contact the IT team if you need additional software installed or have trouble installing software under your local account. Alternatively, follow the customized software setup to install R, Python, and other packages yourself in custom paths without using `module`.
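A quick example of the module workflow; the module name and version are illustrative, so check `module avail` for what is actually installed on your cluster:

```bash
module avail            # list available software
module load R/4.3.1     # load a specific version (illustrative name)
module list             # confirm what is currently loaded
```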
SLURM Guides
- SLURM Reference: Complete command reference
- For Lab Managers: Usage tracking and cost management
Pricing
Interactive Development Environment (IDE)
Advanced Topics
Current Status
For news and known issues, visit our blog.