Guide to CCM Clusters
Logging in
How to log in
The main entry point to the system is math-alderaan.ucdenver.pvt
, which is similar to Alderaan nodes. You can also log in through clas-compute.ucdenver.pvt
, which is similar to nodes of the other clusters, Score and Colibri. You can submit and monitor your batch jobs to any ofthe cluster from either entry pint. The system uses CentOS 7 and 8 operating system. Alderaan cluster runs Centos 8 while the other clusters and clas-compute
run Centos 7.
At this time, the main way of using the system is to use an SSH client to login to a terminal session on math-alderaan or clas-compute. You will need to be on the CU Denver private network (wired or CU Denver wireless, not CU Denver Guest). To connect from the internet, you need to use the university's VPN or remote access (click "Complimentary" to start Windows, then the Windows icon and search for Powershell). It is highly recommended to download and use the VMware Horizon app instead of continuing the browser. Either way, log in and click on the "Complimentary" button, which will give you a Windows virtual machine on the campus network. Then open a Powershell window (Windows button, search box opens, type shell
, select Powershell
). Type ssh math-alderaan
in the Powershell window. See here for more on the remote client.
This system uses your normal portal/email username and password, but your account must be set up before using the system. Please go to accounts to request an account; if you are a student, the faculty project lead should request your account.
On Linux or a Mac, you can use simply the Terminal app, which is built into the operating system. It is hidden away in Applications -> Utilities folder on a Mac and in similar places on various Linux desktops. You may want to drag it to your dock (on a Mac) or the desktop (on Linux) so that it is available more conveniently next time.
Current Windows 10/11 has a native ssh client - just type ssh
in a terminal window (also called powershell window or command window). The ssh client also has scp
and sftp
for file transfer.
You can also use the Windows Subsystem for Linux (SSL), where you install a Linux distribution as an app and can use it to ssh out like from a terminal window on any Linux machine. Note that the SSL may not work with VPN.
Either way, from a terminal window, at the command line prompt type in:
ssh username@math-alderaan.ucdenver.pvt
or
ssh username@clas-compute.ucdenver.pvt
The username is your account name, a single short word which you can use to log into the university portal instead of email address, not the firstname.lastname in your university email. Contact us or OIT helpdesk at ucd-oit-helpdesk@cuanschutz.edu if you do not know what your account name is.
After connecting, ssh should ask for your CU Denver password and you enter it at this point. You should be then at the math-alderaan
or clas-compute
prompt and in your home directory, which is /home/username
.
Interactive use limitations
Using a server ‘interactively’ (aka not scheduling a job) is often needed for troubleshooting a job or just watching what it is doing in real time. After SSH’ing into a head node, you can type ssh math-colibri-i01
or whatever server you want to go to directly.
Please do not run anything directly on compute nodes, which are reserved for jobs under the control of the scheduler, even if you may be able to ssh there. These are nodes with names like math-alderaan-c01 with something else than "i" before the number. Using compute nodes, where other people run jobs through the scheduler, will interfere with their work and make you very unpopular. It is OK to ssh to a compute node to check on your job, but don't run anything there.
Screen virtual terminal in interactive usage
If you use screen
, if you get disconnected, whatever you were running is still going and you can connect to it later. This is called a virtual terminal session. It is generally a good idea to use screen
on math-compute, math-alderaan, or on the interactive nodes.
Typing screen
creates a new terminal session. You can give it a name you want to juggle more sessions, by screen -S 'name'
(make the name whatever you want).
If you want to disconnect from the session but leave it running, hit the combination of Control-A and press the D key to disconnect. Control-A is the combo to let screen know you want to do an action.
When you want to reconnect to your screen session later, log back onto wherever you started the screen and type screen –r
. If you have more than one screen, it’ll complain and tell you the screens you have available to reconnect to. Type screen –r 'name’
to reconnect to that screen.
You can't just scroll in screen
to see your terminal history as you normally would.
Press Control-A and then Esc and scrolling up and down will work temporarily the usual way. When you type anything, screen
will leave the scrolling model.
File Storage
You are responsible for keeping copies of your important files elsewhere. Files and entire filesystems can be lost.
The home directories are on a shared file server and linked as /home/username
. Everyone can have also
a project directory /storage/department/projects/username
(where department may be one of many departments who use this system). Users from some departments have
project directories created in /data001/projects/username
instead. These are currently accessible from alderaan nodes only,
i.e., not accessible from clas-compute and math-colibri or math-score nodes. The location of project directory is emailed to the user
when the directory is created, usualy as a part of onboarding.
The difference betwen project and home directories is that home directories are backed up occasionally while project directories are not. Please keep your home directory small to make the backups possible.
In addition, groups can request shared project directories also in /storage/department/projects' or
/data001/projects`.
Please monitor the usage of the partition you are on by
df -h .
and if it nearing full check you you do not use more space than you are aware of by df -h
. If you need a lot of data storage, please contact us before filling everything you can find.
On Alderaan only, you can make your own directory in /scratch
, which is on a large fast filesystem.
When /scratch
starts filling up, oldest files will be purged automatically.
Do not keep any confidential or sensitive files on this system. We are not equipped for the level of security this would take. In particular, no proprietary data, health records, grades, social security numbers, and like are allowed. If you use ssh keys to connect elsewhere from this system (such as github or another computer account), it is highly recommended to make an ssh key with a passcode for that. Otherwise, the security of the account you are connecting to is only as good as the read protection of your files here.
Files and directories including your home directory are created with permissions which allow anyone to read them but not
write. This is Linux default to encourage collaboration. If you want to keep a file or directory private, you need to change the permissions yourself.
Type chmod og-rwx file_or_directory_name
to make the file or directory not accessible by others (except system administrators, of course).
Where is the software? Modules and Singularity containers
We normally do not install application software directly on the system because of software and version conflicts. Instead, we install sofware in modules or singularity containers.
Modules
To see what software is available in modules,
module avail
provide a list of available software packages and their versions. A command module load modulename/version
will change your environment (such as the PATH
variable) temporarily so that the software and its various parts can be found, for example
module load R/4.2.1
or
module load matlab
Try that and use env
command to see what has changed.
You may need to load multiple modules at the same time. When you are done with a module you can module unload
it but it is strongly recommended to do
module purge
and start over loading exactly the modules you need, or simply log out and back in again.
Installed software and environment modules on our different clusters are generally different. See modules for more information.
Singularity containers
A singularity container is a bit like a separate computer in itself which just runs on the same hardware. Thus, software in different containers won't conflict, and a container can provide a complete environment including a different operating system, libraries, etc. A disadvantage, however, is that you can use only the software installed in the container; software on the system outside of the container is not visible from the inside. Containers are read only and cannot be changed. An exception is that some package managers, like conda, may allow installing software while you are inside the container. Additions made by conda this way actually reside in directory .local
in your home directory.
Using singularity is easy. Type, for example,
singularity shell /storage/singularity/tensorflow
python3
and you can use many Python packages for machine learning. We have containers with statistics software, optimization, molecular chemistry, optimization, and more. See Singularity for more details and list of software in our containers.
Old software versions
Sometimes, you may need a specific version of some software package from few years ago. We'll try. If the software version is not too much in the past, we may be able to install the software in a module or in a singularity container. However, installing an older or a more complicated package may require recreating an entire software ecosystem at a certain point in computer history years ago, which would be overwhelming or impossible.
File Transfer
Linux or Mac
On a Linux or Mac computer, you can use file transfer utilities rsync
, scp
, sftp
on your computer to transfer files and entire directores between your computer and clusters. These utilities are normally a part of the system, if not you can install them from your Linux distribution. Rsync is recommended. Typing man rsync
should give you the manual for the system you are on. Rsync can transfer file trees recursively and resume a transfer which was interrupted.
Windows
On current Windows PC, you can use scp
and sftp
from the command window (a.k.a. Powershell window). Current Windows 10 and 11 have OpenSSH client built in.
From a Website
You can download a file from a website using simply wget
followed by the URL of the file. You can get the URL of a file posted on the web by a right-click and selectingv something like "Copy link address".
Github
The easiest way to download files from Github is to clone the entire repository. On the repository main page, click green button "Code" and copy the link. Then
git clone <the link you just copied>
You can use https link if you want to clone the repository for reading only. If you want to push your changes to Github in future, you need to use ssh. It is strongly recommended to create a separate key secured by a strong passphrase for this. Otherwise, the security of your Github account is only as good as the protection of files here - anyone who gains administrator access here can log into your Github account.
Dropbox
coming soon
Globus
Globus is a free service which can transfer large files (many GB and TB) between servers on the internet using a simple web interface and without supervision. See the Globus section how to use Globus here.
Requesting Information about the Environment
Queues
Jobs are submitted to compute nodes through the scheduler. To see the queues (called "partitions") on the scheduler, type
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
math-alderaan up 7-00:00:00 1 drain* math-alderaan-c28
math-alderaan up 7-00:00:00 4 mix math-alderaan-c[01,08-10]
math-alderaan up 7-00:00:00 25 alloc math-alderaan-c[04-07,11-27,29-32]
math-alderaan up 7-00:00:00 2 idle math-alderaan-c[02-03]
math-alderaan-gpu up 7-00:00:00 1 mix math-alderaan-h01
math-alderaan-gpu up 7-00:00:00 1 alloc math-alderaan-h02
math-colibri-gpu up infinite 24 idle math-colibri-c[01-24]
math-score up infinite 5 idle math-score-c[01-05]
chem-xenon up infinite 6 unk* chem-xenon-c[01-06]
clas-interactive up infinite 1 idle math-colibri-i02
math-alderaan-osg up 1-00:00:00 1 drain* math-alderaan-c28
math-alderaan-osg up 1-00:00:00 4 mix math-alderaan-c[01,08-10]
math-alderaan-osg up 1-00:00:00 25 alloc math-alderaan-c[04-07,11-27,29-32]
math-alderaan-osg up 1-00:00:00 2 idle math-alderaan-c[02-03]
clas-dev up infinite 1 idle clas-devnode-c01
Nodes
To see a list of all nodes, use:
$ sinfo -N
NODELIST NODES PARTITION STATE
chem-xenon-c01 1 chem-xenon unk*
chem-xenon-c02 1 chem-xenon unk*
chem-xenon-c03 1 chem-xenon unk*
chem-xenon-c04 1 chem-xenon unk*
chem-xenon-c05 1 chem-xenon unk*
chem-xenon-c06 1 chem-xenon unk*
clas-rcdesktop-01 1 clas-rcdesktop down*
math-alderaan-c01 1 math-alderaan alloc
math-alderaan-c02 1 math-alderaan alloc
math-alderaan-c03 1 math-alderaan alloc
math-alderaan-c04 1 math-alderaan alloc
math-alderaan-c05 1 math-alderaan alloc
math-alderaan-c06 1 math-alderaan alloc
math-alderaan-c07 1 math-alderaan alloc
math-alderaan-c08 1 math-alderaan alloc
math-alderaan-c09 1 math-alderaan alloc
math-alderaan-c10 1 math-alderaan alloc
math-alderaan-c11 1 math-alderaan alloc
math-alderaan-c12 1 math-alderaan alloc
math-alderaan-c13 1 math-alderaan alloc
math-alderaan-c14 1 math-alderaan alloc
math-alderaan-c15 1 math-alderaan alloc
math-alderaan-c16 1 math-alderaan mix
math-alderaan-c17 1 math-alderaan idle
math-alderaan-c18 1 math-alderaan idle
math-alderaan-c19 1 math-alderaan idle
math-alderaan-c20 1 math-alderaan idle
math-alderaan-c21 1 math-alderaan idle
math-alderaan-c22 1 math-alderaan idle
math-alderaan-c23 1 math-alderaan idle
math-alderaan-c24 1 math-alderaan idle
math-alderaan-c25 1 math-alderaan idle
math-alderaan-c26 1 math-alderaan idle
math-alderaan-c27 1 math-alderaan idle
math-alderaan-c28 1 math-alderaan idle
math-alderaan-c29 1 math-alderaan idle
math-alderaan-c30 1 math-alderaan idle
math-alderaan-c31 1 math-alderaan idle
math-alderaan-c32 1 math-alderaan idle
math-alderaan-h01 1 math-alderaan-gpu idle
math-alderaan-h02 1 math-alderaan-gpu idle
math-colibri-c01 1 math-colibri-gpu idle
math-colibri-c02 1 math-colibri-gpu idle
math-colibri-c03 1 math-colibri-gpu idle
math-colibri-c04 1 math-colibri-gpu unk*
math-colibri-c05 1 math-colibri-gpu unk*
math-colibri-c06 1 math-colibri-gpu unk*
math-colibri-c07 1 math-colibri-gpu unk*
math-colibri-c08 1 math-colibri-gpu unk*
math-colibri-c09 1 math-colibri-gpu unk*
math-colibri-c10 1 math-colibri-gpu unk*
math-colibri-c11 1 math-colibri-gpu unk*
math-colibri-c12 1 math-colibri-gpu unk*
math-colibri-c13 1 math-colibri-gpu idle
math-colibri-c14 1 math-colibri-gpu idle
math-colibri-c15 1 math-colibri-gpu idle
math-colibri-c16 1 math-colibri-gpu idle
math-colibri-c17 1 math-colibri-gpu idle
math-colibri-c18 1 math-colibri-gpu idle
math-colibri-c19 1 math-colibri-gpu idle
math-colibri-c20 1 math-colibri-gpu idle
math-colibri-c21 1 math-colibri-gpu idle
math-colibri-c22 1 math-colibri-gpu idle
math-colibri-c23 1 math-colibri-gpu idle
math-colibri-c24 1 math-colibri-gpu idle
math-score-c01 1 math-score unk*
math-score-c02 1 math-score unk*
math-score-c03 1 math-score idle
math-score-c04 1 math-score idle
math-score-c05 1 math-score idle
It looks confusing but there is a method to the madness in the naming convention. Obviously, math-colibri and math-score are the identifiers for what cluster/building the servers are in, but the –c## and –i## stand for compute and interactive. The c## servers are usually part of the queuing system and the i## ones are for interactive use. Again, never ssh to compute nodes directly.
Submitting Jobs to the Scheduler
Submitting a job
The sbatch job_script
command is used to submit a job into a queue. Your job starts executing in the directory where it was submitted, so submit it from a directory accessible to all compute nodes, such as a subdirectory of your home directory. You can add switches to the sbatch
command, but it is recommended to make them a part of your batch script so that you do not have to do that every time. Please do not use more cores than the number of tasks specified in your script.
Template batch job scripts
The template batch scripts and simple examples to run are available. Get your copy by
git clone https://github.com/ccmucdenver/templates.git
To build the examples, type make
in the examples
directory.
Please do not request the number of nodes on Alderaan by --nodes
or -N
, unless you really need entire nodes for some reason. Request only the CPU cores you need by --ntasks
, then the node or nodes you use can be shared with others.
Single-core job
This script will be sufficient for many jobs, such as those you code yourself which do not use multiprocessing.
#!/bin/bash
# A simple single core job template
#SBATCH --job-name=mpi_hello_single
#SBATCH --partition=math-alderaan
#SBATCH --time=1:00:00 # Max wall-clock time
#SBATCH --ntasks=1 # number of cores, leave at 1
examples/hello_world_fortran.exe # replace by your own executable
If you run an application that can use more cores, you can requests the number of cores in --ntask
parameter instead of 1. Your allocation will be charged for the time of all cores you requested, regardless if you use them or not.
If you expect that your application will use more memory than 8GB (our nodes have 512GB memory and 64 cores each), you should request more tasks, about the expected memory usage in GB divided by 8. Otherwise the node memory may get overloaded when the machine gets busy with many jobs, and everyone's jobs may stall or crash. Note: this may change once we start allocating memory use, but at the moment we do not.
Multiple single-core jobs using arrays
#!/bin/bash
# Multiple single core jobs using array template
#SBATCH --job-name=mpi_hello_single
#SBATCH --partition=math-alderaan
#SBATCH --time=1:00:00 # Max wall-clock time
#SBATCH --ntasks=1 # number of cores, leave at 1
#SBATCH --array=1-5,10-11 # specifies to submit this script 7 times where array values are 1, 2, 3, 4, 5, 10, and 11.
examples/hello_world_fortran.exe # replace by your own executable
SLURM job arrays simplify running multiple instances of the same job script using a single batch script. The above example demonstrates submitting the 'hello_world_fortran.exe' script seven times where array values are 1, 2, 3, 4, 5, 10, and 11.
Helpful Directives/Variables:
-
%a: add the array number to naming convention.
#SBATCH --job-name=mpi_hello_single_%a
-
%[insert-number]: Limit the number of array jobs to submit at a time.
#SBATCH --array=1-1000%10
A SLURM array job automatically submits jobs within your allocated resources. If you wish to conserve resources for other tasks, it can be advantageous to control the number of array jobs submitted simultaneously. In the example provided above, a total of 1000 jobs are executed, with 10 jobs running concurrently at any given time.
-
SLURM_ARRAY_TASK_ID: An environment variable that holds the array value. You can use it to pass the array value to the script you intend to execute.
python example_script.py ${SLURM_ARRAY_TASK_ID}
A simple MPI job template
#!/bin/bash
# alderaan_mpi.sh
# A simple MPI job template
#SBATCH --job-name=mpi_hello
#SBATCH --partition=math-alderaan
#SBATCH --time=1:00:00 # Max wall-clock time
#SBATCH --ntasks=360 # Total number of MPI processes, no need for --nodes
mpirun examples/mpi_hello_world.exe # replace by your own executable, no need for -np
A more general MPI job template
You can request the number of nodes. The scheduler will then split the tasks over the nodes.
#!/bin/bash
# alderaan_mpi_general.sh
# A a more general MPI job template
#SBATCH --job-name=mpi_hello
#SBATCH --partition=math-alderaan
#SBATCH --nodes=2 # Number of requested nodes
#SBATCH --time=1:00:00 # Max wall-clock time
#SBATCH --ntasks=5 # Total number of tasks over all nodes, max 64*nodes
mpirun -np 10 examples/mpi_hello_world.exe # replace by your own executable and number of processors
# do not use more MPI processes than nodes*ntasks
Please do not request the number of nodes on Alderaan by --nodes
or -N
, unless you really need entire nodes for some reason. Request only the CPU cores you need by --ntasks
, then the node or nodes you use can be shared with others.
How to use GPU
How to to run with GPU on Alderaan
The partition math-alderaan-gpu has two high memory/GPU nodesmath-alderann-h[01,02]
with two NVIDIA A-100 40GB GPUs and 2TB memory each. Use --partition=math-alderaan-gpu
with --gres=gpu:a100:1
to request one GPU and --gres=gpu:a100:2
to request two GPUs. At the moment, Alderaan does not support explicit memory allocation by the --mem flag.
Please do not use Alderaan GPUs without allocating them by --gres
as above first. Please do not request an entire node on Alderaan by --nodes
or -N
, unless you really need all of it, request only the CPU cores you need by --ntasks
. Large memory jobs and GPUs jobs can share the same node.
An example job script: #!/bin/bash #SBATCH --job-name=gpu #SBATCH --gres=gpu:a100:1 #SBATCH --partition=math-alderaan-gpu #SBATCH --time=1:00:00 # Max wall-clock time 1 day 1 hour #SBATCH --ntasks=1 # number of cores singularity exec /storage/singularity/tensorflow.sif python3 yourgpucode.py
Of course, instead of singularity you can run another GPU code on one of the GPU nodes directly. The nodes have currently installed CUDA 11.2. You will have to install tensorflow in your account yourself. A compatible version is [tensorflow 2.4.0] (https://docs.nvidia.com/deeplearning/frameworks/tensorflow-release-notes/rel_21-03.html).
It is recommended to use the tensorflow singularity container because it has updated CUDA (11.4) and a version of tensorflow compatible with the CUDA version.
Interactive jobs with GPU on Alderaan
From the command line,
srun -p math-alderaan-gpu --time=2:00:0 -n 1 --gres=gpu:a100:1 --pty bash -i
will give you an interactive shell on one of the GPU nodes with one GPU allocated. You can then start singluarity shell
singularity shell /storage/singularity/tensorflow.sif
You can also start the Singularity shell directly:
srun -p math-alderaan-gpu --time=2:00:0 -n 1 --gres=gpu:a100:1 singularity shell /storage/singularity/tensorflow.sif
will allocate one GPU, one core, and run an internactive sinularity shell.
How to run with GPUs on Colibri
To use Colibri GPUs, do not use --gres
but reserve a whole node by --nodes=1
. Singularity containers work on Colibri, but current versions of tensorflow do not support the CPUs on Colibri. You can use an older version instead:
#!/bin/bash
#SBATCH --job-name=gpu
#SBATCH --gres=gpu:a100:1
#SBATCH --partition=math-colibri-gpu
#SBATCH --time=1:00:00 # Max wall-clock time 1 day 1 hour
#SBATCH --nodes=1 # number of nodes
singularity exec /storage/singularity/tensorflow-v1.3.sif python3 yourgpucode.py
Interactive jobs
Remember you should not directly ssh to a node because it would interfere with jobs scheduled to run on that node. For interactive access to a compute node, do instead:
srun -p math-alderaan --time=2:00:0 -n 1 --pty bash -i
This will request a session for you as a job in a single core slot on a compute node in the math-alderaan partition for up to 2 hours. After the job starts, your session is transfered to the node. The job will end when you exit or the time runs out. Of course you can do the same for other partitions and add other flags such as to request more cores or a GPU.
To start an interactive job on Alderaan with a GPU:
srun -p math-alderaan-gpu --time=2:00:0 -n 1 --gres=gpu:a100:1 --pty bash -i
Viewing Job Queues, Job Status, and System Status
The command squeue
will show one line for each
job running on the system.
The command sinfo
will show a summary of jobs and partitions status on the system:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
math-alderaan up 7-00:00:00 10 mix math-alderaan-c[01-10]
math-alderaan up 7-00:00:00 8 alloc math-alderaan-c[11-15,29,31-32]
math-alderaan up 7-00:00:00 14 idle math-alderaan-c[16-28,30]
math-alderaan-gpu up 7-00:00:00 1 drng math-alderaan-h01
math-alderaan-gpu up 7-00:00:00 1 mix math-alderaan-h02
math-colibri-gpu up infinite 24 idle math-colibri-c[01-24]
math-score up infinite 5 idle math-score-c[01-05]
chem-xenon up infinite 6 down* chem-xenon-c[01-06]
clas-interactive up infinite 1 down* math-score-i01
clas-interactive up infinite 1 idle math-colibri-i02
math-alderaan-osg up 1-00:00:00 10 mix math-alderaan-c[01-10]
math-alderaan-osg up 1-00:00:00 8 alloc math-alderaan-c[11-15,29,31-32]
math-alderaan-osg up 1-00:00:00 14 idle math-alderaan-c[16-28,30]
clas-dev up infinite 1 idle clas-devnode-c01
Real-time system status including temperature, load, and the partitions from sinfo
, is available in News and Status Updates.
We will be happy to install software and build containers for you, do not hesitate to ask!
Building Your Own Software
Here are the best practices when you compile and link your own software:
-
Use
math-alderaan
head node to build software for use on the Alderaan cluster. Usemodule avail
to see which tools are available in modules. We can add other tools and package them in modules on request. -
Use
clas-compute
ormath-colibri-i02
to build software for the Colibri cluster, andclas-compute
ormath-score-i01
for the Score cluster. You can download and build libraries and other package in your own account. -
Alderaan runs Centos 8, while
clas-compute
and Colibri and Score clusters Centos 7. Software built on one will normally not work on the other.