Job Arrays & Cost Optimization

Techniques to optimize SLURM workloads for cost efficiency

This page covers techniques to reduce costs when running SLURM workloads on AWS.

For basic job submission and job arrays, see SLURM Quick Start.

Cost-Saving Strategies

Right-Size Your Jobs

Over-requesting resources wastes money. Use seff after jobs complete to check actual usage:

seff <job_id>

If memory utilization is low (e.g., 2 GB used out of 7 GB requested), use a smaller instance next time. A cpu2mem4 instance costs half as much as cpu2mem8.
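For example, a job that peaked at 2 GB can be resubmitted with a smaller request so the scheduler can place it on a cheaper instance. A minimal sketch, assuming the workload command and exact memory value are placeholders you would adapt:

#!/bin/bash
# Illustrative right-sized request: ~3.5 GB fits on a cpu2mem4 instance
#SBATCH --cpus-per-task=2
#SBATCH --mem=3500M
srun ./my_analysis    # placeholder for your actual workload command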

Use Job Arrays for Batch Processing

Job arrays allow multiple jobs to share instances efficiently. For example, 4 jobs requesting 7GB each can share 2 instances of cpu2mem16 (16GB per instance), rather than each job spinning up its own instance.

See Job Arrays for usage examples.
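A minimal sketch of such an array (script name, array range, and per-sample command are placeholders): four tasks requesting 7 GB each, which the scheduler can pack two-per-instance onto cpu2mem16 nodes:

#!/bin/bash
# process_samples.sh -- hypothetical job-array batch script
#SBATCH --array=1-4            # four tasks, one per sample
#SBATCH --cpus-per-task=1
#SBATCH --mem=7G               # two 7 GB tasks can share one 16 GB cpu2mem16 instance
srun ./process_sample input_${SLURM_ARRAY_TASK_ID}   # placeholder per-sample command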

Avoid --exclusive Unless Necessary

The --exclusive flag reserves the entire node for your job, even if you only use a fraction of its resources. Only use this when your workload truly requires all CPUs and memory on a node.
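If the job does not actually need the whole node, request just the CPUs and memory it uses so the scheduler can place other jobs on the remaining capacity (the values below are illustrative):

# Request only the resources the job actually needs instead of --exclusive
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G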

Storage Performance Considerations

The shared Lustre storage system has 3,000 MB/sec throughput by default (can be increased at higher cost).

The I/O Bottleneck

When many compute nodes read/write simultaneously, storage can become a bottleneck:

Example: 1,000 nodes reading a 1,000 MB file simultaneously:

  • Total throughput: 3,000 MB/sec / 1,000 nodes = 3 MB/sec per node
  • Time to read: 1,000 MB / 3 MB/sec = ~333 seconds (5+ minutes)

Comparison: Single node reading the same file:

  • Time to read: ~1 second (with 12.5 Gbps network)

Option 1: Stagger Job Starts

Instead of starting all jobs at once, stagger them with delays:

# Start jobs with 30-second intervals
for i in {1..100}; do
    sbatch my_job.sh
    sleep 30
done

This gives each node time to read data before the next job starts, reducing contention and often improving total runtime despite the staggered start.

Option 2: SLURM Array Throttling

Another solution is SLURM's --array option with a throttle, which gives strict control over how many tasks (and therefore how many nodes hitting storage) run at once:

# Limit the number of simultaneously running tasks in this job array to 4.
--array=<start>-<end>%4
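For example, to submit 100 tasks while allowing only 4 to run at any one time (the script name is a placeholder):

# At most 4 of the 100 tasks run concurrently, limiting simultaneous reads from shared storage
sbatch --array=1-100%4 my_job.sh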

Multi-Step Pipelines with Varying Resource Requirements

Multi-step pipelines may have very different resource requirements at each step. In genotyping, for example, htslib supports multi-threading, while GATK4's multi-threaded (Spark) tools such as ApplyBQSRSpark are still in beta and do not look likely to reach an official release in the foreseeable future; different steps may also need very different amounts of RAM.

Users can split the pipeline into multiple scripts and chain them via SLURM's --dependency option, which is powerful and fully compatible with --array:

# In this example, we assume the array task IDs correspond one-to-one to individual samples
alignmentID=$(sbatch --parsable --array=<your-job-array> bwa_mem.sh)
#echo "Alignment Job-Array ID: ${alignmentID}"

# aftercorr: task N of this array starts only after task N of the alignment array finishes successfully
genotypingID=$(sbatch --parsable --array=<your-job-array> --dependency=aftercorr:${alignmentID} GATK_part.sh)
#echo "Genotyping Job-Array ID: ${genotypingID}"

cleaningID=$(sbatch --parsable --array=<your-job-array> --dependency=aftercorr:${genotypingID} cleaning.sh)
#echo "Cleaning Job-Array ID: ${cleaningID}"