SLURM Reference

Comprehensive reference for SLURM commands and options

Job Submission Commands

salloc - Obtain a job allocation for interactive use

sbatch - Submit a batch script for later execution

srun - Obtain a job allocation and run an application

Common Options

Option Description
-A, --account=<account> Account to be charged for resources used
-a, --array=<index> Job array specification (sbatch only)
-b, --begin=<time> Initiate job after specified time
-C, --constraint=<features> Required node features
--cpu-bind=<type> Bind tasks to specific CPUs (srun only)
-c, --cpus-per-task=<count> Number of CPUs required per task
-d, --dependency=<state:jobid> Defer job until specified jobs reach specified state
-m, --distribution=<method> Specify distribution methods for remote processes
-e, --error=<filename> File for job error messages (sbatch and srun only)
-x, --exclude=<name> Host names to exclude from job allocation
--exclusive Reserve all CPUs and GPUs on allocated nodes
--export=<name=value> Export specified environment variables
--gpus-per-task=<list> Number of GPUs required per task
-J, --job-name=<name> Job name
-l, --label Prepend task ID to output (srun only)
--mail-type=<type> E-mail notification type (begin, end, fail, requeue, all)
--mail-user=<address> E-mail address
--mem=<size>[units] Memory required per allocated node (e.g., 16GB)
--mem-per-cpu=<size>[units] Memory required per allocated CPU (e.g., 2GB)
-w, --nodelist=<hostnames> Host names to include in job allocation
-N, --nodes=<count> Number of nodes required for the job
-n, --ntasks=<count> Number of tasks to be launched
--ntasks-per-node=<count> Number of tasks to be launched per node
-o, --output=<filename> File for job output (sbatch and srun only)
-p, --partition=<names> Partition in which to run the job
--signal=[B:]<num>[@time] Signal job when approaching time limit
-t, --time=<time> Limit for job run time

Examples:

# Request interactive job on debug partition with 4 CPUs
salloc -p debug -c 4

# Request interactive job with V100 GPU
salloc -p gpu --ntasks=1 --gpus-per-task=v100:1

# Submit batch job
sbatch batch.job
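
A typical batch script combines several of the options above as #SBATCH directives. The following is a minimal sketch of a batch.job file; the account, partition, e-mail address, and program (./my_program) are placeholders to adjust for your site:

#!/bin/bash
#SBATCH --account=myaccount
#SBATCH --partition=main
#SBATCH --job-name=example
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16GB
#SBATCH --time=1:00:00
#SBATCH --output=example_%j.out
#SBATCH --mail-type=end,fail
#SBATCH --mail-user=user@example.com

./my_program

To defer the job until another job completes successfully, add a dependency at submission time:

# Submit after job 111111 completes successfully
sbatch --dependency=afterok:111111 batch.job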

Job Management Commands

squeue - View jobs in queue

Option Description
-A, --account=<list> Filter by accounts
-o, --format=<options> Output format to display
-j, --jobs=<list> Filter by job IDs
-l, --long Show more of the available information
--me Filter by your own jobs
-n, --name=<list> Filter by job names
-p, --partition=<list> Filter by partitions
-P, --priority Show pending jobs in the order considered for scheduling (once per partition)
--start Show expected start time for pending jobs
-t, --states=<list> Filter by states
-u, --user=<list> Filter by users

Examples:

# View your own jobs
squeue --me

# View with estimated start times for pending jobs
squeue --me --start

# View jobs on specified partition in long format
squeue -lp epyc-64
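
The --format option selects which columns are displayed; the field widths in the example below are arbitrary:

# View your jobs with custom columns: ID, partition, name, state, elapsed time, node count, reason/nodelist
squeue --me -o "%.10i %.12P %.20j %.8T %.10M %.6D %R"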

scancel - Cancel jobs

Option Description
-A, --account=<account> Restrict to specified account
-n, --name=<job_name> Restrict to jobs with specified name
-w, --nodelist=<hostnames> Restrict to jobs using specified hosts
-p, --partition=<partition> Restrict to specified partition
-u, --user=<username> Restrict to specified user

Examples:

# Cancel specific job
scancel 111111

# Cancel all your own jobs
scancel -u $USER

# Cancel your own jobs on specified partition
scancel -u $USER -p oneweek

# Cancel all pending jobs
scancel -u $USER -t pending
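
The --name option narrows cancellation to jobs that share a name; test_job below is a placeholder:

# Cancel your own jobs with a specific name
scancel -u $USER -n test_job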

sprio - View job priorities

Option Description
-o, --format=<options> Output format to display
-j, --jobs=<list> Filter by job IDs
-l, --long Show more information
-n, --norm Show normalized priority factors
-p, --partition=<list> Filter by partitions
-u, --user=<list> Filter by users

Examples:

# View normalized job priorities for your jobs
sprio -nu $USER

# View priorities for specified partition
sprio -nlp gpu
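
Priorities can also be checked for a single job by ID (using the example job ID from earlier):

# View normalized priority factors for a specific job
sprio -nj 111111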

Partition and Node Information

sinfo - View node and partition info

Option Description
-o, --format=<options> Output format to display
-l, --long Show more information
-N, --Node Show in node-oriented format
-n, --nodes=<hostnames> Filter by host names
-p, --partition=<list> Filter by partitions
-t, --states=<list> Filter by node states
-s, --summarize Show summary information

Examples:

# View all partitions and nodes by state
sinfo

# Summarize node states by partition
sinfo -s

# View nodes in idle state
sinfo --states=idle

# View nodes for partition in long, node-oriented format
sinfo -lNp epyc-64
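
The --format option also customizes sinfo output; in the sketch below, the %C field reports allocated/idle/other/total CPU counts:

# Summarize partitions: name, availability, time limit, node count, state, and A/I/O/T CPUs
sinfo -o "%.15P %.10a %.12l %.8D %.10t %.15C"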

scontrol - View or modify configuration

Option Description
-d, --details Show more details
-o, --oneliner Show information on one line

Examples:

# View partition information
scontrol show partition epyc-64

# View node information
scontrol show node $NODE

# View detailed job information
scontrol -d show job 111111

# View list of hostnames for the current job (expands $SLURM_JOB_NODELIST; run within a job)
scontrol show hostnames

# Hold a pending job
scontrol hold <job_id>

# Release a held job
scontrol release <job_id>
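
scontrol update can modify attributes of your own jobs, subject to site policy (for example, users can typically shorten but not extend a time limit). A sketch using the example job ID and partition names from above:

# Reduce a job's time limit
scontrol update JobId=111111 TimeLimit=1:00:00

# Move a pending job to a different partition
scontrol update JobId=111111 Partition=oneweek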

Environment Variables

SLURM sets these variables within your job:

Variable Description
SLURM_ARRAY_TASK_COUNT Number of tasks in job array
SLURM_ARRAY_TASK_ID Job array task ID
SLURM_CPUS_PER_TASK Number of CPUs requested per task
SLURM_JOB_ACCOUNT Account used for job
SLURM_JOB_ID Job ID
SLURM_JOB_NAME Job name
SLURM_JOB_NODELIST List of nodes allocated to job
SLURM_JOB_NUM_NODES Number of nodes allocated to job
SLURM_JOB_PARTITION Partition used for job
SLURM_NTASKS Number of job tasks
SLURM_PROCID MPI rank of current process
SLURM_SUBMIT_DIR Directory from which job was submitted
SLURM_TASKS_PER_NODE Number of job tasks per node

Examples:

# Set the number of OpenMP threads to match the requested CPUs per task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Launch MPI program with the requested number of tasks
srun -n $SLURM_NTASKS ./mpi_program
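
In a job array (see the --array option above), SLURM_ARRAY_TASK_ID identifies each task. A minimal sketch, assuming hypothetical input files input_1.txt, input_2.txt, ... and program ./my_program:

#!/bin/bash
#SBATCH --array=1-10
#SBATCH --ntasks=1
#SBATCH --time=1:00:00

# Each array task processes its own input file
./my_program input_${SLURM_ARRAY_TASK_ID}.txt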

Job State Codes

The squeue command shows job states:

Status Code Explanation
COMPLETED CD Job has completed successfully
COMPLETING CG Job is finishing but some processes are still active
FAILED F Job terminated with non-zero exit code
PENDING PD Job is waiting for resource allocation
PREEMPTED PR Job was terminated due to preemption
RUNNING R Job is currently running
SUSPENDED S Running job stopped with cores released
STOPPED ST Running job stopped with cores retained
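
State codes can be passed to the squeue --states option described above:

# View your pending jobs
squeue --me -t PD

# View your running jobs
squeue --me -t R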

A full list of job state codes is available in the SLURM documentation.

Job Reason Codes

Why a job is in its current state:

Reason Code Explanation
Priority Higher priority jobs are ahead; job will run eventually
Dependency Waiting for dependent job to complete
Resources Waiting for resources; job will run eventually
InvalidAccount Invalid account; cancel and resubmit
InvalidQoS Invalid QoS; cancel and resubmit
QOSGrpCpuLimit QoS CPU limit reached; job will run eventually
QOSGrpMaxJobsLimit QoS max jobs reached; job will run eventually
QOSGrpNodeLimit QoS node limit reached; job will run eventually
PartitionCpuLimit Partition CPU limit reached; job will run eventually
PartitionMaxJobsLimit Partition max jobs reached; job will run eventually
PartitionNodeLimit Partition node limit reached; job will run eventually
AssociationCpuLimit Association CPU limit reached; job will run eventually
AssociationMaxJobsLimit Association max jobs reached; job will run eventually
AssociationNodeLimit Association node limit reached; job will run eventually
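
The reason appears in the NODELIST(REASON) column of the default squeue output; the %r format field shows it explicitly:

# View the state and reason for your pending jobs
squeue --me -t PD -o "%.10i %.9P %.20j %.8T %r"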

A full list of job reason codes is available in the SLURM documentation.