SLURM Reference
Comprehensive reference for SLURM commands and options
Job Submission Commands
`salloc` - Obtain a job allocation for interactive use ([docs](https://slurm.schedmd.com/salloc.html))
`sbatch` - Submit a batch script for later execution ([docs](https://slurm.schedmd.com/sbatch.html))
`srun` - Obtain a job allocation and run an application ([docs](https://slurm.schedmd.com/srun.html))
Common Options
| Option | Description |
|---|---|
| `-A, --account=<account>` | Account to be charged for resources used |
| `-a, --array=<index>` | Job array specification (sbatch only) |
| `-b, --begin=<time>` | Initiate job after specified time |
| `-C, --constraint=<features>` | Required node features |
| `--cpu-bind=<type>` | Bind tasks to specific CPUs (srun only) |
| `-c, --cpus-per-task=<count>` | Number of CPUs required per task |
| `-d, --dependency=<state:jobid>` | Defer job until specified jobs reach specified state |
| `-m, --distribution=<method>` | Specify distribution methods for remote processes |
| `-e, --error=<filename>` | File for job error messages (sbatch and srun only) |
| `-x, --exclude=<name>` | Host names to exclude from job allocation |
| `--exclusive` | Reserve all CPUs and GPUs on allocated nodes |
| `--export=<name=value>` | Export specified environment variables |
| `--gpus-per-task=<list>` | Number of GPUs required per task |
| `-J, --job-name=<name>` | Job name |
| `-l, --label` | Prepend task ID to output (srun only) |
| `--mail-type=<type>` | E-mail notification type (begin, end, fail, requeue, all) |
| `--mail-user=<address>` | E-mail address |
| `--mem=<size>[units]` | Memory required per allocated node (e.g., 16GB) |
| `--mem-per-cpu=<size>[units]` | Memory required per allocated CPU (e.g., 2GB) |
| `-w, --nodelist=<hostnames>` | Host names to include in job allocation |
| `-N, --nodes=<count>` | Number of nodes required for the job |
| `-n, --ntasks=<count>` | Number of tasks to be launched |
| `--ntasks-per-node=<count>` | Number of tasks to be launched per node |
| `-o, --output=<filename>` | File for job output (sbatch and srun only) |
| `-p, --partition=<names>` | Partition in which to run the job |
| `--signal=[B:]<num>[@time]` | Signal job when approaching time limit |
| `-t, --time=<time>` | Limit for job run time |
Examples:
# Request interactive job in debug partition with 4 CPUs
salloc -p debug -c 4
# Request interactive job with V100 GPU
salloc -p gpu --ntasks=1 --gpus-per-task=v100:1
# Submit batch job
sbatch batch.job
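As a sketch, a batch script combining several of the options above might look like the following; the account name and program are placeholders, not values taken from this reference:
#!/bin/bash
# Placeholder account name; replace with an account you can charge
#SBATCH --account=my_account
#SBATCH --partition=epyc-64
#SBATCH --job-name=my_job
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16GB
#SBATCH --time=01:00:00
# %j in the output filename expands to the job ID
#SBATCH --output=my_job_%j.out

# Placeholder executable
./my_program
Options given on the sbatch command line override the #SBATCH directives in the script.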
Job Management Commands
squeue - View jobs in queue
| Option | Description |
|---|---|
| `-A, --account=<list>` | Filter by accounts |
| `-o, --format=<options>` | Output format to display |
| `-j, --jobs=<list>` | Filter by job IDs |
| `-l, --long` | Show more available information |
| `--me` | Filter by your own jobs |
| `-n, --name=<list>` | Filter by job names |
| `-p, --partition=<list>` | Filter by partitions |
| `-P, --priority` | Sort jobs by priority |
| `--start` | Show expected start time for pending jobs |
| `-t, --states=<list>` | Filter by states |
| `-u, --user=<list>` | Filter by users |
Examples:
# View your own jobs
squeue --me
# View with estimated start times for pending jobs
squeue --me --start
# View jobs on specified partition in long format
squeue -lp epyc-64
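The --format option takes standard squeue format specifiers; for example (the column widths here are arbitrary):
# View your jobs with ID, partition, name, state, elapsed time, node count, and nodelist/reason
squeue --me --format="%.10i %.9P %.20j %.2t %.10M %.5D %R"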
scancel - Cancel jobs
| Option | Description |
|---|---|
| `-A, --account=<account>` | Restrict to specified account |
| `-n, --name=<job_name>` | Restrict to jobs with specified name |
| `-w, --nodelist=<hostnames>` | Restrict to jobs using specified hosts |
| `-p, --partition=<partition>` | Restrict to specified partition |
| `-u, --user=<username>` | Restrict to specified user |
Examples:
# Cancel specific job
scancel 111111
# Cancel all your own jobs
scancel -u $USER
# Cancel your own jobs on specified partition
scancel -u $USER -p oneweek
# Cancel all pending jobs
scancel -u $USER -t pending
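Filters can also be combined; for example, assuming a job named my_job:
# Cancel your own jobs with a specific job name
scancel -u $USER -n my_job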
sprio - View job priorities
| Option | Description |
|---|---|
| `-o, --format=<options>` | Output format to display |
| `-j, --jobs=<list>` | Filter by job IDs |
| `-l, --long` | Show more information |
| `-n, --norm` | Show normalized priority factors |
| `-p, --partition=<list>` | Filter by partitions |
| `-u, --user=<list>` | Filter by users |
Examples:
# View normalized job priorities for your jobs
sprio -nu $USER
# View priorities for specified partition
sprio -nlp gpu
Partition and Node Information
sinfo - View node and partition info
| Option | Description |
|---|---|
| `-o, --format=<options>` | Output format to display |
| `-l, --long` | Show more information |
| `-N, --Node` | Show in node-oriented format |
| `-n, --nodes=<hostnames>` | Filter by host names |
| `-p, --partition=<list>` | Filter by partitions |
| `-t, --states=<list>` | Filter by node states |
| `-s, --summarize` | Show summary information |
Examples:
# View all partitions and nodes by state
sinfo
# Summarize node states by partition
sinfo -s
# View nodes in idle state
sinfo --states=idle
# View nodes for partition in long, node-oriented format
sinfo -lNp epyc-64
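As with squeue, the output can be customized with format specifiers; for example (the column widths here are arbitrary):
# View partitions with time limit, node count, CPUs and memory per node, and node state
sinfo --format="%.12P %.10l %.6D %.4c %.8m %.10t"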
scontrol - View or modify configuration
| Option | Description |
|---|---|
| `-d, --details` | Show more details |
| `-o, --oneliner` | Show information on one line |
Examples:
# View partition information
scontrol show partition epyc-64
# View node information
scontrol show node $NODE
# View detailed job information
scontrol -d show job 111111
# View hostnames for job
scontrol show hostnames
# Hold a pending job
scontrol hold <job_id>
# Release a held job
scontrol release <job_id>
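Within a job, scontrol show hostnames is typically used to expand the compressed node list into one hostname per line:
# Expand the job's allocated node list (run inside a job)
scontrol show hostnames $SLURM_JOB_NODELIST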
Environment Variables
SLURM sets these variables within your job:
| Variable | Description |
|---|---|
| `SLURM_ARRAY_TASK_COUNT` | Number of tasks in job array |
| `SLURM_ARRAY_TASK_ID` | Job array task ID |
| `SLURM_CPUS_PER_TASK` | Number of CPUs requested per task |
| `SLURM_JOB_ACCOUNT` | Account used for job |
| `SLURM_JOB_ID` | Job ID |
| `SLURM_JOB_NAME` | Job name |
| `SLURM_JOB_NODELIST` | List of nodes allocated to job |
| `SLURM_JOB_NUM_NODES` | Number of nodes allocated to job |
| `SLURM_JOB_PARTITION` | Partition used for job |
| `SLURM_NTASKS` | Number of job tasks |
| `SLURM_PROCID` | MPI rank of current process |
| `SLURM_SUBMIT_DIR` | Directory from which job was submitted |
| `SLURM_TASKS_PER_NODE` | Number of job tasks per node |
Examples:
# Set OpenMP threads to match the allocated CPUs per task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# Launch an MPI program with the allocated number of tasks
srun -n $SLURM_NTASKS ./mpi_program
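As a sketch, a job array script can combine these variables to select per-task input; the input file naming and program here are placeholders:
#!/bin/bash
#SBATCH --array=1-10
#SBATCH --cpus-per-task=4
#SBATCH --time=00:30:00

# Match OpenMP threads to the allocated CPUs per task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# Each array task processes its own input file (input_1.txt ... input_10.txt)
./my_program input_${SLURM_ARRAY_TASK_ID}.txt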
Job State Codes
The squeue command shows job states:
| Status | Code | Explanation |
|---|---|---|
| COMPLETED | CD | Job has completed successfully |
| COMPLETING | CG | Job is finishing but some processes are still active |
| FAILED | F | Job terminated with non-zero exit code |
| PENDING | PD | Job is waiting for resource allocation |
| PREEMPTED | PR | Job was terminated due to preemption |
| RUNNING | R | Job is currently running |
| SUSPENDED | S | Running job stopped with cores released |
| STOPPED | ST | Running job stopped with cores retained |
A full list of job state codes is available in the [SLURM squeue documentation](https://slurm.schedmd.com/squeue.html).
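These codes can be passed to the --states filter; for example:
# View your pending and running jobs
squeue --me --states=PD,R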
Job Reason Codes
Reason codes explain why a job is in its current state:
| Reason Code | Explanation |
|---|---|
| Priority | Higher priority jobs are ahead; job will run eventually |
| Dependency | Waiting for dependent job to complete |
| Resources | Waiting for resources; job will run eventually |
| InvalidAccount | Invalid account; cancel and resubmit |
| InvalidQoS | Invalid QoS; cancel and resubmit |
| QOSGrpCpuLimit | QoS CPU limit reached; job will run eventually |
| QOSGrpMaxJobsLimit | QoS max jobs reached; job will run eventually |
| QOSGrpNodeLimit | QoS node limit reached; job will run eventually |
| PartitionCpuLimit | Partition CPU limit reached; job will run eventually |
| PartitionMaxJobsLimit | Partition max jobs reached; job will run eventually |
| PartitionNodeLimit | Partition node limit reached; job will run eventually |
| AssociationCpuLimit | Association CPU limit reached; job will run eventually |
| AssociationMaxJobsLimit | Association max jobs reached; job will run eventually |
| AssociationNodeLimit | Association node limit reached; job will run eventually |
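For pending jobs, the reason code appears in the NODELIST(REASON) column of the default squeue output; it can also be shown explicitly with the %r format specifier:
# Show the reason code for your pending jobs
squeue --me --states=PD --format="%.10i %.20j %.10T %r"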