HPC Facilities and Other Resources

Computing Environment

The Department of Neurology and the Gertrude H. Sergievsky Center support research computing through a high-performance computing (HPC) environment hosted on Amazon Web Services (AWS). A customized Columbia University Irving Medical Center (CUIMC) AWS infrastructure has been built on the AWS Well-Architected Framework to execute both HIPAA (https://www.hipaa.cuimc.columbia.edu/) and non-HIPAA workloads. The HIPAA partition of the infrastructure permits only HIPAA-eligible AWS services (https://aws.amazon.com/compliance/services-in-scope/) and is designed to adhere to HIPAA regulations and CUIMC policies.

Each laboratory has access to a dedicated cluster comprising a persistent login node and on-demand compute nodes managed by the Slurm scheduler. These nodes can be accessed from the CUIMC campus via redundant 10 Gbps connectivity. The environment scales efficiently from small analyses to large batch workflows requiring thousands of concurrent jobs across multiple compute instances with varying hardware configurations. CPU-optimized instances range from 2 cores with 4 GB of memory to 64 cores with 512 GB of memory. For machine learning applications, GPU-accelerated instances with NVIDIA A10G Tensor Core GPUs are available in configurations with 1, 4, or 8 GPUs and up to 192 GB of system memory. Dynamic memory scaling through MemVerge integration allows jobs to migrate automatically to higher-memory instances when resource demands spike, preventing failures and optimizing utilization without manual intervention.
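
As a concrete illustration, the sketch below (Python, with hypothetical job, file, and tool names) shows how a typical analysis might be submitted to the Slurm scheduler, declaring per-job CPU, memory, and wall-time requirements; GPU instances would be requested analogously (e.g., with a --gres=gpu directive).

    # Minimal sketch of programmatic Slurm submission. The job name,
    # resource values, and input files are illustrative placeholders,
    # not site-specific configuration.
    import subprocess

    job_script = """#!/bin/bash
    #SBATCH --job-name=align-sample
    #SBATCH --cpus-per-task=16   # CPU cores for this job
    #SBATCH --mem=64G            # system memory
    #SBATCH --time=12:00:00      # wall-clock limit

    # Hypothetical alignment step; reference and reads are placeholders.
    bwa mem -t "$SLURM_CPUS_PER_TASK" ref.fa reads.fq > aligned.sam
    """

    # sbatch reads the job script from standard input and prints the job ID.
    result = subprocess.run(["sbatch"], input=job_script, text=True,
                            capture_output=True, check=True)
    print(result.stdout.strip())   # e.g. "Submitted batch job 12345"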

Project data are stored across two storage tiers: the Amazon FSx for Lustre parallel file system and Amazon S3 object storage, linked through a Data Repository Association. The Lustre-based parallel file system provides 3,000 MB/s of throughput, expandable as needed, for concurrent access across multiple compute nodes. Long-term storage uses Amazon S3 with Intelligent-Tiering to optimize costs. This storage model ensures high data throughput and scalability as storage requirements evolve while minimizing storage-related expenses. Large genomic datasets stored within this environment are securely shared among investigators at CUIMC, reducing redundant storage and transfer costs.
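
For illustration, the sketch below archives a result file from the working file system to S3 under the Intelligent-Tiering storage class using the boto3 SDK; the bucket name and file paths are hypothetical.

    # Illustrative sketch: push a finished result to long-term S3 storage
    # so infrequently accessed data migrates automatically to cheaper tiers.
    import boto3

    s3 = boto3.client("s3")
    s3.upload_file(
        Filename="results/cohort_summary.parquet",  # local file on the Lustre scratch
        Bucket="example-lab-archive",               # hypothetical bucket name
        Key="project-x/cohort_summary.parquet",
        ExtraArgs={"StorageClass": "INTELLIGENT_TIERING"},
    )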

The departmental AWS framework further enhances cost efficiency through centralized management, volume-based pricing, and NIH STRIDES pricing. Daily cost reports provide granular visibility by user and project for budgeting and accountability across collaborative computing tasks. Integration with MemVerge Memory Machine Cloud enables reliable execution of long-running jobs on EC2 Spot instances through automatic checkpointing and migration. Specifically, when AWS reclaims a Spot instance, MemVerge captures the running job state and seamlessly restores it on a new instance without losing progress, reducing job failure rates to below 3% while achieving 50-70% cost savings compared to on-demand pricing.
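
The sketch below illustrates the kind of AWS Cost Explorer query that underlies such per-user, per-project reporting; the cost-allocation tag key ("project") and the date range are assumptions, and this is not the production reporting pipeline.

    # Illustrative sketch: daily spend grouped by a cost-allocation tag.
    import boto3

    ce = boto3.client("ce")
    response = ce.get_cost_and_usage(
        TimePeriod={"Start": "2025-01-01", "End": "2025-01-02"},
        Granularity="DAILY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "TAG", "Key": "project"}],  # assumed tag key
    )
    for group in response["ResultsByTime"][0]["Groups"]:
        tag = group["Keys"][0]
        cost = group["Metrics"]["UnblendedCost"]["Amount"]
        print(f"{tag}: ${float(cost):.2f}")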

Additional on-premises computing resources are available through the Columbia Center for Computational Biology and Bioinformatics (C2B2) for smaller-scale analyses.

Software Environment

Researchers can launch JupyterLab sessions on compute nodes for interactive data analysis, connect VS Code directly to the cluster for code development, or run RStudio Server within Singularity containers on compute nodes. The Singularity platform enables reproducible execution of complex software environments without administrative privileges. Software management uses a standard environment module system for pre-installed packages, including bioinformatics and statistical software such as SPSS, SAS, STATA, PSEUDOMARKER, FASTLINK, SAGE, GENEHUNTER, GENEHUNTER-PLUS, ALLEGRO, FBAT, MERLIN, QTDT, SOLAR, ACT, FISHER, MENDEL, PEDCHECK, HAPLOVIEW, PLINK, IMPUTE2, STRUCTURE, EIGENSTRAT, GENESPRING, PARTEK, CLC Genomics Workbench, POLYPHEN-2, PhastCons, GERP, SIFT, BWA, GATK, SnpEff, ANNOVAR, CADD, LUMPY, and GenomeSTRiP, among others. Additional programs can be installed by the IT team upon request or, for power users, self-managed through a centralized conda-based system using the pixi package manager.
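
As an illustration of unprivileged containerized execution, the sketch below runs a command inside a Singularity container on a compute node; the image path and bind mount are hypothetical examples.

    # Minimal sketch: execute a containerized tool via Singularity,
    # with no administrative privileges required on the node.
    import subprocess

    subprocess.run(
        ["singularity", "exec",
         "--bind", "/data/project-x:/data",  # expose project data inside the container
         "rstudio.sif",                      # hypothetical container image
         "Rscript", "-e", "sessionInfo()"],  # any command available in the image
        check=True,
    )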

Security and Compliance

This HPC environment supports the secure storage and analysis of human genomic and clinical data in compliance with the NIH Genomic Data Sharing (GDS) Policy and institutional requirements. Security features include controlled access via Single Sign-On (SSO) with multi-factor authentication, encryption at rest and in transit, and routine backups. The common security infrastructure encompasses AWS CloudTrail activity logging, Amazon GuardDuty threat detection, automated security alerting, and Service Control Policies.
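
For illustration, encryption at rest on a project bucket can be verified programmatically; the bucket name below is hypothetical.

    # Illustrative check: confirm a project bucket's default
    # server-side encryption configuration.
    import boto3

    s3 = boto3.client("s3")
    config = s3.get_bucket_encryption(Bucket="example-lab-archive")
    for rule in config["ServerSideEncryptionConfiguration"]["Rules"]:
        # e.g. "aws:kms" or "AES256"
        print(rule["ApplyServerSideEncryptionByDefault"]["SSEAlgorithm"])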