Alderaan Cluster

Purpose

The Alderaan cluster is a resource for both campuses of the University of Colorado Denver.

Alderaan is a shared cluster running multi-core jobs using MPI and the high-speed interconnect (High-Performance Computing, HPC), as well as large numbers of single-core jobs (High-Throughput Computing, HTC). The Alderaan head node, math-alderaan, is an alternative login node for the joint scheduler, alongside clas-compute.

All jobs need to be limited in duration. Long-running single-core jobs should be submitted to the Score or Colibri cluster partitions instead.

Work consisting of a very large number of single-core jobs can also be submitted to the Open Science Grid (OSG): log into https://www.osgconnect.net with your CU Denver credentials and follow the instructions to be contacted by OSG staff for an onboarding meeting.

The Alderaan cluster is funded by NSF grant 2019089, CC* Compute: Accelerating Science and Education by Campus and Grid Computing, under the NSF CC* program. Two additional GPUs were purchased by the Department of Mathematical and Statistical Sciences and by the Center for Computational Mathematics. The cluster is integrated with the Open Science Grid (OSG), and at least 20% of core time is contributed to the OSG as required by the NSF funding.

The cluster was built by Atipa in 2021.

Software

  • SLURM scheduler, cluster tools
  • GNU compiler stack with MPI
  • Intel OneAPI compilers, development tools, MPI, and MKL
  • CUDA
  • MATLAB, R, Python, Tensorflow
  • Modules
  • Singularity containers for custom software environments (see the usage sketch after this list)
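
A minimal sketch of loading software through modules and running a program in a Singularity container; the module name and container image below are placeholders, not a list of what is actually installed:

    module avail                                           # list the software modules available on Alderaan
    module load python                                     # load a module; use a name shown by module avail
    singularity exec my_container.sif python3 script.py    # run a command inside a user-supplied container image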

Progress

Fall semester 2020

  • Purchase order
  • Planning
  • Advance testing of software

Spring semester 2021

  • Delivery and installation
  • Testing with the vendor's software configuration

Fall semester 2021

  • Finalizing networking, access, and security
  • Open to early users
  • Set up governance and allocation processes

Spring semester 2022

  • Open to all users
  • Scaling up the user base, installing custom software on request
  • Installing 2 additional GPUs
  • Running OSG jobs to satisfy the 20% core time funding requirement
  • Allocation limits and storage quotas not in effect yet

Coming soon

  • OnDemand web access to the cluster for JupyterHub and remote desktop

User Training

  • ssh command line usage
  • software

How to Use

Authorized users can submit jobs on the Alderaan cluster by logging into the head node math-alderaan.ucdenver.pvt over ssh. The head node is for compilation and small tests only; please do not run any intensive or long jobs on the head node.
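
For example, logging in and checking on the scheduler might look like the following; username is a placeholder for your own account:

    ssh username@math-alderaan.ucdenver.pvt    # log into the Alderaan head node
    squeue -u username                         # list your queued and running jobs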

Another head node, clas-compute.ucdenver.pvt, is also available. SLURM jobs can be submitted and user files accessed from either head node, but the software on clas-compute is not the same as on the Alderaan cluster.
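
A minimal sketch of a SLURM batch script for an MPI job; the job name, program name, and resource numbers are placeholders and should be adjusted to your own work:

    #!/bin/bash
    #SBATCH --job-name=mpi_test
    #SBATCH --nodes=2                  # number of compute nodes
    #SBATCH --ntasks-per-node=64       # one MPI rank per core on a 64-core node
    #SBATCH --time=02:00:00            # keep jobs limited in duration
    # load your compiler/MPI module here if needed (see module avail)
    mpirun ./my_mpi_program            # my_mpi_program is a placeholder executable

Submit the script from either head node with sbatch, for example sbatch mpi_test.sh, and monitor it with squeue.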

An allocation system is set up and will be activated when warranted by the cluster load.

Web access including JupyterHub, remote desktop, and RStudio is pending.

Hardware

  • 2048 AMD cores and 16TB memory in 32 compute nodes. Each node has 2 AMD EPYC 7502 32-core processors for a total of 64 cores, 512GB DDR4 memory, and dual 960GB SSDs. The compute nodes are housed in eight 2U rack-mount drawers, with 4 nodes in each drawer.
  • 2 high-memory GPU nodes, each with 2 AMD EPYC 7502 32-core processors for a total of 64 cores, 2 NVIDIA Tesla A100 GPUs, and 2TB DDR4 memory.
  • Head node with 2 AMD EPYC 7502 32-core processors and 256GB memory.
  • 816TB storage (1PB raw) in 2 storage nodes, each with 2 Intel Xeon Gold 5215 CPUs at 2.50GHz, 196GB memory, and XFS filesystems shared by NFS over Infiniband.
  • HDR100 Infiniband interconnect
  • Full 10Gb/s Ethernet connectivity path from every node to Internet2
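
To see how these nodes appear to the scheduler, the standard SLURM commands can be used; nodename below is a placeholder:

    sinfo -N -l                    # list nodes with core counts, memory, and state
    scontrol show node nodename    # detailed information for a single node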

Contact

jan.mandel@ucdenver.edu