Capacity test, November 2025

Before launching the system into production in November 2025, we ran a series of disk I/O stress tests to ensure the storage layer could handle real-world workloads. Here’s what we found.

For CPU-based I/O, we pushed up to 2,000 parallel jobs reading the same BAM file, and all of them completed successfully. We observed a 15-30% slowdown when scaling from roughly 10 to 83 effective simultaneous reads, but from the user’s perspective this was barely noticeable. More importantly, Lustre I/O bandwidth stayed at about 50% of capacity throughout, so the filesystem never saturated and there is plenty of headroom. Our current Lustre setup therefore appears to handle large-scale BAM-based workloads reliably.
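For reference, a read-scaling test of this shape can be driven by a small script. The sketch below is not the exact harness we used: it stresses parallel reads of one shared file with plain buffered reads from Python worker processes, and the job counts and file path are placeholders.

```python
import concurrent.futures
import sys
import time

CHUNK = 1 << 20  # read in 1 MiB chunks


def read_whole(path: str) -> float:
    """Sequentially read the whole file once; return elapsed seconds."""
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as fh:
        while fh.read(CHUNK):
            pass
    return time.perf_counter() - start


def stress(path: str, n_jobs: int) -> None:
    """Launch n_jobs readers against the same file and report timings."""
    with concurrent.futures.ProcessPoolExecutor(max_workers=n_jobs) as pool:
        timings = list(pool.map(read_whole, [path] * n_jobs))
    print(f"{n_jobs:>5} readers: mean {sum(timings) / len(timings):6.1f}s, "
          f"slowest {max(timings):6.1f}s")


if __name__ == "__main__":
    bam = sys.argv[1]  # path to the shared BAM file (placeholder)
    for n in (10, 100, 1000, 2000):
        stress(bam, n)
```

On the cluster itself the equivalent test is a scheduler array job (for example one Slurm task per reader) so that the readers are spread across nodes and the load actually reaches Lustre rather than being absorbed by a single node’s page cache.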

For GPU-based I/O, we tested Nanopore basecalling with Dorado on a g5.12xlarge instance (4 A10G GPUs, 48 vCPUs, 192 GiB RAM). Processing 40 pod5 files took about 19 minutes, with Dorado keeping all four GPUs at near 100% utilization, and the cost came out to $5.67 per run. GPU jobs are therefore compute-bound rather than I/O-bound, confirming that disk I/O is not a bottleneck even for intensive GPU workflows.
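For context, a minimal harness for a run like this is sketched below: it launches Dorado on a directory of pod5 files and samples aggregate GPU utilization with nvidia-smi while the basecaller runs. The model alias, paths, and polling interval are illustrative assumptions, not the exact parameters of the November test.

```python
import subprocess
import threading
import time

# Placeholder inputs; substitute the real model and paths for an actual run.
POD5_DIR = "pod5/"
MODEL = "hac"          # Dorado model alias (assumption, not necessarily the model we used)
OUT_BAM = "calls.bam"

gpu_samples: list[int] = []
stop = threading.Event()


def poll_gpu(interval: float = 10.0) -> None:
    """Record the mean utilization across all visible GPUs every `interval` seconds."""
    while not stop.is_set():
        utils = subprocess.run(
            ["nvidia-smi", "--query-gpu=utilization.gpu",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout.split()
        gpu_samples.append(sum(map(int, utils)) // len(utils))
        time.sleep(interval)


poller = threading.Thread(target=poll_gpu, daemon=True)
poller.start()

start = time.perf_counter()
# dorado basecaller reads every pod5 under POD5_DIR and writes BAM to stdout.
with open(OUT_BAM, "wb") as bam:
    subprocess.run(["dorado", "basecaller", MODEL, POD5_DIR],
                   stdout=bam, check=True)
elapsed = time.perf_counter() - start

stop.set()
poller.join()

print(f"wall time: {elapsed / 60:.1f} min")
if gpu_samples:
    print(f"mean GPU utilization: {sum(gpu_samples) / len(gpu_samples):.0f}%")
```

A compute-bound run shows up here as utilization pinned near 100% for the duration of the basecall, which is what we observed.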