Guide to CCM Clusters
Logging in
How to log in
Your account must be set up before using the system. Please go to accounts for more about getting an account.
To access the Alderaan cluster, you need to be on the CU Denver or CU Anschutz private network. This can be done in several ways:
- On campus wired network.
- On campus secure wireless network (not the guest network)
- Through the CU Denver or CU Anschutz GlobalProtect VPN. Windows Subsystem for Linux (WSL) is not recommended with the VPN, as GlobalProtect sometimes does not forward connections properly from WSL, even though it works for other Windows applications.
- Through the Windows remote desktop at https://remote.ucdenver.edu. After logging in, click on "Complimentary". It is recommended to download and use the VMware Horizon app instead of continuing in the browser.
To log in, use your university username (not email) and password. If you do not know your username, you can retrieve it at https://myaccount.ucdenver.edu/forgot-username.
We currently offer three ways to access the Alderaan cluster:
- Perhaps the easiest way to log in is to use JupyterHub. Simply go to https://math-alderaan.ucdenver.pvt and log in, which will give you a web page with a file navigation tree and one or more terminal windows. You can also run Python notebooks, edit files, and more. Your session will run safely on a compute node as a batch job.
- The Remote Desktop will give you a Linux desktop with the ability to open terminal windows and work with graphical software such as Matlab or R directly. Just create a remote PC (also called a connection) named math-alderaan in Windows, or on macOS in Windows App (previously Microsoft Remote Desktop). See the Remote Desktop chapter for more details.
- Secure Shell (SSH) is a classical way to access Alderaan from the command line. Type ssh username@math-alderaan.ucdenver.pvt in a terminal window on Linux or Mac, or in a PowerShell window on Windows 10 or 11 (press the Windows button, the search box opens, type shell, select PowerShell). If you omit the username, your computer may send your local username instead, which may not work. You can use the .ssh/config file to fill in the correct username and to automate connections. For more convenience and security, you can set up passwordless ssh from Linux as well as from Windows.
SSH also allows you to transfer files via scp, sftp, or rsync commands.
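The .ssh/config approach mentioned above could look roughly like the following sketch. It defines a small helper that appends a host entry to a config file. The alias alderaan, the helper name add_alderaan_host, and the username jdoe are assumptions made for illustration; only the host name math-alderaan.ucdenver.pvt comes from this guide.

```shell
# Append a host entry for Alderaan to an ssh config file.
# "alderaan" is a hypothetical alias chosen for this sketch.
add_alderaan_host() {
    config_file="$1"
    user_name="$2"
    mkdir -p "$(dirname "$config_file")"
    cat >> "$config_file" <<EOF
Host alderaan
    HostName math-alderaan.ucdenver.pvt
    User $user_name
EOF
    # ssh expects the config file to be writable only by its owner
    chmod 600 "$config_file"
}

# Example (replace jdoe with your university username):
# add_alderaan_host "$HOME/.ssh/config" jdoe
# ssh alderaan
```

With such an entry in place, ssh alderaan, scp, and rsync can all use the short alias, and the correct username is filled in automatically.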
The Alderaan cluster runs CentOS 8.
Interactive use limitations
Using a server ‘interactively’ (that is, without scheduling a batch job) is often needed for troubleshooting a job or just watching what it is doing in real time. After SSH’ing into a head node, start an interactive Slurm job as described in the Interactive jobs section below, or use JupyterHub, where your session will run as a Slurm job for up to a week.
Please do not run anything computationally intensive on the head node math-alderaan. This can destabilize important services which keep the cluster running, and make the experience worse for everyone.
Please do not run anything directly on compute nodes without a reservation. They are reserved for jobs under the control of the Slurm scheduler, even if you may be able to ssh there. These are nodes with names like math-alderaan-c01, with a letter other than "i" before the number. Using compute nodes where other people run jobs through the scheduler will interfere with their work and make you very unpopular. It is OK to ssh to a compute node to check on your job, but don't run anything there.
Screen virtual terminal in interactive usage
If you use screen and you get disconnected, whatever you were running keeps going and you can connect to it later. This is called a virtual terminal session. It is generally a good idea to use screen on math-alderaan only.
Typing screen creates a new terminal session. If you want to juggle several sessions, you can give each one a name by screen -S 'name' (make the name whatever you want).
If you want to disconnect from the session but leave it running, press Control-A and then the D key. Control-A is the combo that lets screen know you want to do an action.
When you want to reconnect to your screen session later, log back onto wherever you started the screen and type screen -r. If you have more than one screen, it will complain and tell you the screens you have available to reconnect to. Type screen -r 'name' to reconnect to that screen.
You can't just scroll in screen to see your terminal history as you normally would.
Press Control-A and then Esc, and scrolling up and down will temporarily work the usual way. When you type anything, screen will leave the scrolling mode.
File Storage
You are responsible for keeping copies of your important files elsewhere. Files and entire filesystems can be lost.
The home directories are on a shared file server and linked as /home/username. Everyone can also have
a project directory. The legacy project directories are /storage/department/projects/username
(where department may be one of many departments that use this system). New project directories are currently
created as /data001/projects/username instead. The location of the project directory is emailed to the user
when the directory is created as a part of setting up a new account.
In addition, groups can request shared project directories also in /storage/department/projects or /data001/projects.
The difference between project and home directories is that home directories are backed up occasionally (if not too large) while project directories and too large home directories are not backed up. Please keep your home directory small to make the backups possible. Please be aware that even if disk space is large and currently not restricted, it is finite.
Please monitor the usage of the partition you are on by
df -h .
and if it is nearing full, check that you are not using more space than you are aware of by df -h. If you need a lot of data storage, please contact us before filling everything you can find.
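Since df reports usage for the whole partition, it may also help to see which of your own directories take the most space. The following is a minimal sketch using the standard du, sort, and head tools; the helper name largest_entries is an assumption made for this example and is not a command installed on the cluster.

```shell
# List the largest entries directly under a directory, biggest first.
# Sizes are in kilobytes; hidden files (names starting with a dot)
# are skipped by the * pattern.
largest_entries() {
    dir="$1"
    count="${2:-10}"
    du -sk -- "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

# Example: the five largest items in your home directory
# largest_entries "$HOME" 5
```

Running du over a large directory tree can take a while on a shared file server, so it is probably best used on one directory at a time.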
On Alderaan, you can make your own directory in /scratch.
When /scratch starts filling up or the space is needed for the system, the oldest files will be purged automatically.
Do not keep any confidential or sensitive files on this system. We are not equipped for the level of security this would take. In particular, proprietary data, health records, grades, social security numbers, or data which have to comply with any law or regulation are not allowed.
If you use ssh keys to connect elsewhere from this system (such as github or another computer account), it is highly recommended to make an ssh key with a passcode for that. Otherwise, the security of the account you are connecting to is only as good as the read protection of your files here, which is not much.
Files and directories, including your home directory, are created with permissions which allow anyone to read them but not
write. This is the Linux default, meant to encourage collaboration. If you want to keep a file or directory private, you need to change the permissions yourself.
Type chmod og-rwx file_or_directory_name to make the file or directory not accessible by others (except system administrators, who can access anything).
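As an illustration of the chmod command above, the following sketch creates a throwaway directory with world-readable permissions, makes it private, and shows the permissions before and after. The directory name private_results is hypothetical and only used for this demonstration.

```shell
# Hypothetical directory, created in a temporary location for illustration.
demo_dir="$(mktemp -d)/private_results"
mkdir -m 755 "$demo_dir"     # readable by everyone, like the default
ls -ld "$demo_dir"           # permissions start with drwxr-xr-x
chmod og-rwx "$demo_dir"     # remove all access for group and others
ls -ld "$demo_dir"           # permissions start with drwx------
```

The same command works on a single file. For a directory, removing access for others also blocks them from reaching anything inside it.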
Where is the software? Modules and Singularity containers
Use the dedicated guides for software environments:
- Environment Modules for system-provided software.
- Singularity Containers for containerized software stacks.
- Conda and Python for user-managed environments.
Quick module commands:
module avail
module load <module>
module purge
If you need older software versions, contact support. Availability depends on whether compatible dependencies can still be built or obtained.
Installing your own software packages
Install user packages in your own home or project directories, not system locations.
For Python environments and package management, use Conda and Python.
If you use R, configure a personal R library path in your home directory so package installs do not interfere with other tools.
File Transfer
Recommended options:
- Globus File Transfer for large unattended transfers.
- rsync, scp, or sftp for command-line transfers.
- git clone for source repositories.
- wget <URL> for direct downloads.
Examples:
scp ~/Desktop/file.txt username@math-alderaan.ucdenver.pvt:/home/username/
scp username@math-alderaan.ucdenver.pvt:/home/username/file.txt ~/Desktop/
Requesting Information about the Environment
Queues
Jobs are submitted to compute nodes through the scheduler. To see the queues (called "partitions") on the scheduler, type
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
math-alderaan up 7-00:00:00 2 mix math-alderaan-c[08,16]
math-alderaan up 7-00:00:00 25 alloc math-alderaan-c[01-06,09-15,17,21-29,31-32]
math-alderaan up 7-00:00:00 3 idle math-alderaan-c[18-20]
math-alderaan-short up 1-00:00:00 2 mix math-alderaan-c[08,16]
math-alderaan-short up 1-00:00:00 25 alloc math-alderaan-c[01-06,09-15,17,21-29,31-32]
math-alderaan-short up 1-00:00:00 3 idle math-alderaan-c[18-20]
jupyter up 7-00:00:00 2 mix math-alderaan-c[08,16]
jupyter up 7-00:00:00 25 alloc math-alderaan-c[01-06,09-15,17,21-29,31-32]
jupyter up 7-00:00:00 3 idle math-alderaan-c[18-20]
system_test up 7-00:00:00 1 idle math-alderaan-c30
math-alderaan-gpu down 7-00:00:00 1 drain math-alderaan-h01
math-alderaan-gpu-short up 1-00:00:00 2 mix math-alderaan-h[01-02]
math-alderaan-gpu-quick up 2:00:00 1 mix math-alderaan-h[01-02]
Partitions with shorter runtime have higher priority.
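To get a quick number out of the sinfo listing, the output can be filtered with awk. The sketch below sums the NODES column for idle nodes in one partition, assuming the default column layout shown above (PARTITION, AVAIL, TIMELIMIT, NODES, STATE, NODELIST). The helper name count_idle_nodes is an assumption made for this example.

```shell
# Count idle nodes in a partition from sinfo output read on standard input.
# Assumes the default sinfo columns: PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
count_idle_nodes() {
    partition="$1"
    awk -v p="$partition" '
        {
            name = $1
            sub(/\*$/, "", name)    # sinfo marks the default partition with *
        }
        name == p && $5 == "idle" { n += $4 }
        END { print n + 0 }
    '
}

# On the cluster:
# sinfo | count_idle_nodes math-alderaan
```

A count of zero does not mean a job cannot start, because nodes in the mix state still have free cores that the scheduler can assign.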
Nodes
To see a list of all nodes, use:
$ sinfo -N
NODELIST NODES PARTITION STATE
math-alderaan-c01 1 math-alderaan-short alloc
math-alderaan-c01 1 jupyter alloc
math-alderaan-c01 1 math-alderaan alloc
math-alderaan-c02 1 math-alderaan-short alloc
math-alderaan-c02 1 jupyter alloc
math-alderaan-c02 1 math-alderaan alloc
math-alderaan-c03 1 math-alderaan-short alloc
math-alderaan-c03 1 jupyter alloc
math-alderaan-c03 1 math-alderaan alloc
math-alderaan-c04 1 math-alderaan-short alloc
math-alderaan-c04 1 jupyter alloc
math-alderaan-c04 1 math-alderaan alloc
math-alderaan-c05 1 math-alderaan-short alloc
math-alderaan-c05 1 jupyter alloc
math-alderaan-c05 1 math-alderaan alloc
math-alderaan-c06 1 math-alderaan-short alloc
math-alderaan-c06 1 jupyter alloc
math-alderaan-c06 1 math-alderaan alloc
math-alderaan-c08 1 math-alderaan-short alloc
math-alderaan-c08 1 jupyter alloc
math-alderaan-c08 1 math-alderaan alloc
math-alderaan-c09 1 math-alderaan-short alloc
math-alderaan-c09 1 jupyter alloc
math-alderaan-c09 1 math-alderaan alloc
math-alderaan-c10 1 math-alderaan-short alloc
math-alderaan-c10 1 jupyter alloc
math-alderaan-c10 1 math-alderaan alloc
math-alderaan-c11 1 math-alderaan-short alloc
math-alderaan-c11 1 jupyter alloc
math-alderaan-c11 1 math-alderaan alloc
math-alderaan-c12 1 math-alderaan-short alloc
math-alderaan-c12 1 jupyter alloc
math-alderaan-c12 1 math-alderaan alloc
math-alderaan-c13 1 math-alderaan-short alloc
math-alderaan-c13 1 jupyter alloc
math-alderaan-c13 1 math-alderaan alloc
math-alderaan-c14 1 math-alderaan-short alloc
math-alderaan-c14 1 jupyter alloc
math-alderaan-c14 1 math-alderaan alloc
math-alderaan-c15 1 math-alderaan-short alloc
math-alderaan-c15 1 jupyter alloc
math-alderaan-c15 1 math-alderaan alloc
math-alderaan-c16 1 math-alderaan-short mix
math-alderaan-c16 1 jupyter mix
math-alderaan-c16 1 math-alderaan mix
math-alderaan-c17 1 math-alderaan-short alloc
math-alderaan-c17 1 jupyter alloc
math-alderaan-c17 1 math-alderaan alloc
math-alderaan-c18 1 math-alderaan-short idle
math-alderaan-c18 1 jupyter idle
math-alderaan-c18 1 math-alderaan idle
math-alderaan-c19 1 math-alderaan-short idle
math-alderaan-c19 1 jupyter idle
math-alderaan-c19 1 math-alderaan idle
math-alderaan-c20 1 math-alderaan-short idle
math-alderaan-c20 1 jupyter idle
math-alderaan-c20 1 math-alderaan idle
math-alderaan-c21 1 math-alderaan-short alloc
math-alderaan-c21 1 jupyter alloc
math-alderaan-c21 1 math-alderaan alloc
math-alderaan-c22 1 math-alderaan-short alloc
math-alderaan-c22 1 jupyter alloc
math-alderaan-c22 1 math-alderaan alloc
math-alderaan-c23 1 math-alderaan-short alloc
math-alderaan-c23 1 jupyter alloc
math-alderaan-c23 1 math-alderaan alloc
math-alderaan-c24 1 math-alderaan-short alloc
math-alderaan-c24 1 jupyter alloc
math-alderaan-c24 1 math-alderaan alloc
math-alderaan-c25 1 math-alderaan-short alloc
math-alderaan-c25 1 jupyter alloc
math-alderaan-c25 1 math-alderaan alloc
math-alderaan-c26 1 math-alderaan-short alloc
math-alderaan-c26 1 jupyter alloc
math-alderaan-c26 1 math-alderaan alloc
math-alderaan-c27 1 math-alderaan-short alloc
math-alderaan-c27 1 jupyter alloc
math-alderaan-c27 1 math-alderaan alloc
math-alderaan-c28 1 math-alderaan-short alloc
math-alderaan-c28 1 jupyter alloc
math-alderaan-c28 1 math-alderaan alloc
math-alderaan-c29 1 math-alderaan-short alloc
math-alderaan-c29 1 jupyter alloc
math-alderaan-c29 1 math-alderaan alloc
math-alderaan-c30 1 system_test idle
math-alderaan-c31 1 math-alderaan-short alloc
math-alderaan-c31 1 jupyter alloc
math-alderaan-c31 1 math-alderaan alloc
math-alderaan-c32 1 math-alderaan-short alloc
math-alderaan-c32 1 jupyter alloc
math-alderaan-c32 1 math-alderaan alloc
math-alderaan-h01 1 math-alderaan-gpu-short drain
math-alderaan-h01 1 math-alderaan-gpu drain
math-alderaan-h02 1 math-alderaan-gpu-short drain
Nodes math-alderaan-c01 to math-alderaan-c32 are compute nodes. Nodes math-alderaan-h01 and math-alderaan-h02 are high memory GPU nodes. Again, never ssh to nodes directly to work on them, only to monitor your allocated jobs.
Submitting Jobs to the Scheduler
Submitting a job
The sbatch job_script command is used to submit a job into a queue. Your job starts executing in the directory where it was submitted, so submit it from a directory accessible to all compute nodes, such as a subdirectory of your home directory. You can add switches to the sbatch command, but it is recommended to make them a part of your batch script so that you do not have to do that every time. Please do not use more cores than the number of tasks specified in your script.
Template batch job scripts
The template batch scripts and simple examples to run are available. Get your copy by
git clone https://github.com/ccmucdenver/templates.git
To build the examples, type make in the examples directory.
Please do not request the number of nodes on Alderaan by --nodes or -N, unless you really need entire nodes for some reason. Request only the CPU cores you need by --ntasks, then the node or nodes you use can be shared with others.
SLURM Directives with Explanations
| Directive | Explanation | Options |
|---|---|---|
| #SBATCH --job-name= | Specifies a name for your job. | Use whatever naming convention makes sense to you. If you would like a suggestion: #SBATCH --job-name=job, #SBATCH --output=job.out, #SBATCH --error=job.err |
| #SBATCH --output= | Specifies the file to which standard output (stdout) will be redirected. | |
| #SBATCH --error= | Specifies the file to which standard error (stderr) will be redirected. | |
| #SBATCH --nodes= | Specifies the number of nodes requested for the job. | Please do not request a node unless you know you need the full node’s memory or CPU. |
| #SBATCH --ntasks= | Specifies the total number of tasks (processes) for the job; each task gets one CPU core by default. | On a single node, ntasks can take a value between 1 and 64. Recommend: start small (i.e., 1-5) and increase the value if jobs are running out of CPU or memory. |
| #SBATCH --partition= | Specifies the partition or queue where the job will be submitted. | Recommend: use CPU or GPU Alderaan partitions. CPU nodes, specify: #SBATCH --partition=math-alderaan. GPU nodes, specify: #SBATCH --partition=math-alderaan-gpu |
| #SBATCH --array= | Specifies an array of job tasks with indices for array job submissions. Examples: #SBATCH --array=1-5 and #SBATCH --array=0-10,20-21 | You can specify how many array jobs to run at one time with %. Example: run only 3 jobs at one time out of 10 jobs: #SBATCH --array=1-10%3 |
How to make your job start faster
Use these practical rules to improve queue wait time.
- Request only what you need.
  - Keep --time close to expected runtime.
  - Use --ntasks for cores you actually use.
  - Avoid --nodes unless you really need full nodes.
- Choose partition and runtime strategically.
  - Use shorter-runtime partitions when possible.
  - Partitions with shorter runtime have higher priority.
- Control array submission pressure.
  - Limit array concurrency with % (for example --array=1-1000%10).
  - Avoid flooding the queue with too many simultaneous tasks.
- Check system pressure before submitting.
  - sinfo for partition availability.
  - squeue for queue status.
  - squeue.sh and jobs-on-nodes.sh for resource detail by job and node.
  - News and Status Updates for current operational constraints.
- Use these full-node rules of thumb.
  - Compute nodes: requesting 64 cores or memory near 500GB is effectively a full-node job.
  - High-memory GPU nodes: requesting memory near 2000GB or two GPUs is effectively a full-node job.
  - If your request is effectively full-node, reducing minor settings usually will not make the job start faster. Focus on accurate runtime, partition choice, and current queue conditions.
Single-core job
This script will be sufficient for many jobs, such as those you code yourself which do not use multiprocessing.
#!/bin/bash
# A simple single core job template
#SBATCH --job-name=mpi_hello_single
#SBATCH --partition=math-alderaan
#SBATCH --time=1:00:00 # Max wall-clock time
#SBATCH --ntasks=1 # number of cores, leave at 1
examples/hello_world_fortran.exe # replace by your own executable
If you run an application that can use more cores, you can request the number of cores with the --ntasks parameter instead of 1. Your allocation will be charged for the time of all cores you requested, regardless of whether you use them or not.
If you expect that your application will use more memory than 8GB (our nodes have 512GB memory and 64 cores each), you should request more tasks, about the expected memory usage in GB divided by 8. Otherwise the node memory may get overloaded when the machine gets busy with many jobs, and everyone's jobs may stall or crash. Note: this may change once we start allocating memory use, but at the moment we do not.
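The rule of thumb above (expected memory in GB divided by 8) can be sketched as a small shell helper. The function name ntasks_for_memory is an assumption made for this example. It rounds up, so that a job needing slightly more than a multiple of 8GB still reserves enough cores.

```shell
# Estimate --ntasks from expected memory use, assuming about 8 GB per core
# (512 GB and 64 cores per compute node, as stated above).
ntasks_for_memory() {
    mem_gb="$1"
    # Integer division rounded up, e.g. 20 GB -> 3 tasks
    n=$(( (mem_gb + 7) / 8 ))
    # Never request fewer than one task
    [ "$n" -lt 1 ] && n=1
    echo "$n"
}

ntasks_for_memory 100   # prints 13
```

For a job expected to use about 100GB, this suggests #SBATCH --ntasks=13 even if the program itself runs on a single core.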
Multiple single-core jobs using arrays
#!/bin/bash
# Multiple single core jobs using array template
#SBATCH --job-name=mpi_hello_single
#SBATCH --partition=math-alderaan
#SBATCH --time=1:00:00 # Max wall-clock time
#SBATCH --ntasks=1 # number of cores, leave at 1
#SBATCH --array=1-5,10-11 # specifies to submit this script 7 times where array values are 1, 2, 3, 4, 5, 10, and 11.
examples/hello_world_fortran.exe # replace by your own executable
SLURM job arrays simplify running multiple instances of the same job using a single batch script. The above example runs hello_world_fortran.exe seven times, with array values 1, 2, 3, 4, 5, 10, and 11.
Helpful Directives/Variables:
- %a: adds the array index to output file names.
  #SBATCH --output=mpi_hello_single_%a.out
- %[insert-number]: limits the number of array jobs running at a time.
  #SBATCH --array=1-1000%10
  A SLURM array job automatically submits jobs within your allocated resources. If you wish to conserve resources for other tasks, it can be advantageous to control the number of array jobs running simultaneously. In the example above, a total of 1000 jobs are executed, with at most 10 jobs running concurrently at any given time.
- SLURM_ARRAY_TASK_ID: an environment variable that holds the array value. You can use it to pass the array value to the script you intend to execute.
  python example_script.py ${SLURM_ARRAY_TASK_ID}
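A common pattern is to use the array value as a line number in a list of inputs, so that each array task processes a different file. The sketch below assumes a hypothetical list file with one input per line; the helper name input_for_task is also an assumption. Outside of Slurm the variable is not set, so the sketch falls back to line 1.

```shell
# Print the line of a list file that corresponds to this array task.
# SLURM_ARRAY_TASK_ID is set by Slurm inside an array job; the default of 1
# only exists so that the sketch can be tried outside of Slurm.
input_for_task() {
    list_file="$1"
    task_id="${SLURM_ARRAY_TASK_ID:-1}"
    sed -n "${task_id}p" "$list_file"
}

# In a job script with #SBATCH --array=1-N, where N is the number of lines
# in the hypothetical file inputs.txt:
# python example_script.py "$(input_for_task inputs.txt)"
```

With this approach the list file can be edited without touching the job script, as long as the array range matches the number of lines.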
A simple MPI job template
#!/bin/bash
# alderaan_mpi.sh
# A simple MPI job template
#SBATCH --job-name=mpi_hello
#SBATCH --partition=math-alderaan
#SBATCH --time=1:00:00 # Max wall-clock time
#SBATCH --ntasks=360 # Total number of MPI processes, no need for --nodes
mpirun examples/mpi_hello_world.exe # replace by your own executable, no need for -np
A more general MPI job template
You can request the number of nodes. The scheduler will then split the tasks over the nodes.
#!/bin/bash
# alderaan_mpi_general.sh
# A more general MPI job template
#SBATCH --job-name=mpi_hello
#SBATCH --partition=math-alderaan
#SBATCH --nodes=2 # Number of requested nodes
#SBATCH --time=1:00:00 # Max wall-clock time
#SBATCH --ntasks=10 # Total number of tasks over all nodes, max 64*nodes
mpirun -np 10 examples/mpi_hello_world.exe # replace by your own executable and number of processes
# do not use more MPI processes than --ntasks
Please do not request the number of nodes on Alderaan by --nodes or -N, unless you really need entire nodes for some reason. Request only the CPU cores you need by --ntasks, then the node or nodes you use can be shared with others.
Interactive jobs
Remember you should not directly ssh to a node because it would interfere with jobs scheduled to run on that node. For interactive access to a compute node, do instead:
srun -p math-alderaan --time=2:00:0 -n 1 --pty bash -i
This will request a session for you as a job in a single core slot on a compute node in the math-alderaan partition for up to 2 hours. After the job starts, your session is transferred to the node. The job will end when you exit or the time runs out. Of course, you can do the same for other partitions and add other flags, such as to request more cores or a GPU.
To start an interactive job on Alderaan with a GPU:
srun -p math-alderaan-gpu-quick --time=2:00:0 -n 1 --gres=gpu:a100:1 --pty bash -i
How to use GPU
How to run with GPU on Alderaan
The partitions math-alderaan-gpu-short and math-alderaan-gpu have two high memory/GPU nodes, math-alderaan-h[01,02], with two NVIDIA A100 40GB GPUs and 2TB memory each. Use --partition=math-alderaan-gpu-short (1 day job duration maximum) with --gres=gpu:a100:1 to request one GPU and --gres=gpu:a100:2 to request two GPUs. For longer jobs, up to 7 days, you can use --partition=math-alderaan-gpu, but node availability may be limited and your job may wait longer.
Please do not use Alderaan GPUs without first allocating them by --gres as above. Please do not request an entire node on Alderaan by --nodes or -N unless you really need all of it; request only the CPU cores you need by --ntasks. Large memory jobs and GPU jobs can share the same node.
Minimal example job script:
#!/bin/bash
#SBATCH --job-name=gpu
#SBATCH --gres=gpu:a100:1
#SBATCH --partition=math-alderaan-gpu-short
#SBATCH --time=1:00:00 # Max wall-clock time 1 hour
#SBATCH --ntasks=1 # number of cores
nvidia-smi -L
GPU software stack and TensorFlow compatibility-container guidance are maintained in Singularity Containers.
Interactive jobs with GPU on Alderaan
From the command line,
srun -p math-alderaan-gpu-short --time=2:00:0 -n 1 --gres=gpu:a100:1 --pty bash -i
will give you an interactive shell on one of the GPU nodes with one GPU allocated.
To confirm that your job is using the GPU:
nvidia-smi
Viewing Job Queues, Job Status, and System Status
The command squeue will show one line for each job running on the system.
The command squeue.sh will show one line for each job running on the system with a listing of all resources requested: CPUs, memory, GPUs. jobs-on-nodes.sh shows the jobs running node by node with the resources reserved. These custom commands should help you understand how the resources are used and why jobs may wait.
The command sinfo will show a summary of jobs and partitions status on the system:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
math-alderaan up 7-00:00:00 10 mix math-alderaan-c[01-10]
math-alderaan up 7-00:00:00 8 alloc math-alderaan-c[11-15,29,31-32]
math-alderaan up 7-00:00:00 14 idle math-alderaan-c[16-28,30]
math-alderaan-gpu up 7-00:00:00 1 drng math-alderaan-h01
math-alderaan-gpu up 7-00:00:00 1 mix math-alderaan-h02
Real-time system status including temperature, load, and the partitions from sinfo, is available in News and Status Updates.
We will be happy to install software and build containers for you. Do not hesitate to ask!
Building Your Own Software
Here are the best practices when you compile and link your own software:
- Use the math-alderaan head node to build software for use on the Alderaan cluster. Use module avail to see which tools are available in modules. We can add other tools and package them in modules on request.
- Alderaan nodes run CentOS 8.
Linux Introduction
If you are new to Linux command-line usage, start with:
- pwd, ls, cd for navigation
- cp, mv, rm for file operations
- cat, echo, and nano for basic file viewing and editing
Use man <command> for detailed help, for example man rsync or man srun.