SLURM Introduction
Like most HPC clusters, Picotte provides access to compute nodes via a job scheduler. The scheduler on Picotte is SLURM[1].
To use SLURM, you provide some work you'd like the cluster to do (typically in the form of a shell script) and specify the resources required (e.g. "10 CPUs and 24 GiB of memory"). This is called a job. SLURM adds your job to a queue of others waiting to be scheduled, and then waits for a node with the requested resources to become available. Then, the job starts running on that node. SLURM monitors the job's status and progress, which you can check on using the SLURM command-line tools as your job runs.
Working with SLURM is a fundamental skill for using Picotte.
Getting started
Note
This guide assumes you're familiar with the basics of command-line interfaces and the Unix shell. We highly recommend working through our Introduction to the Unix shell workshop before proceeding.
You use SLURM via its command-line tools, which all start with the letter s. First, make sure you're connected to the Picotte login node.
To start, let's use sinfo to check on the cluster status:
$ sinfo
The output is a table showing all compute nodes, grouped by their partition[2] and status:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
bm up 21-00:00:0 2 idle bigmem[001-002]
def* up 2-00:00:00 4 down* node[011,047,065,073]
def* up 2-00:00:00 59 mix node[001-010,012-038,040,051-064]
...
This output tells us that the bm partition has two nodes, both of which are idle. The def partition has a few nodes down, but most are in a "mixed" state, which means they have some resources allocated to running jobs, but also some resources available for new jobs.
Look for the def-sm partition and make sure there are some nodes in the idle or mix state. If there are, the cluster has capacity to run your new jobs. If not, your jobs may have to wait in the queue for a while.
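If you just want to check one partition, you can filter sinfo's output. For example, the following lists only def-sm nodes that are currently idle or in the mixed state (a quick sketch using standard sinfo options):
$ sinfo --partition=def-sm --states=idle,mix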
You can learn more about sinfo output and node states here.
Running an interactive job
Now let's run our first job using srun.
Execute the following command, replacing SLURM_FREE_PRJ with the name of your group's free-tier SLURM account[3]. Your research group will typically have two accounts, one for free-tier jobs and one for priority-tier jobs. For example, if your PI's name is Sara Zhang, these are typically called zhangprj and zhangfreeprj. You use --account=zhangfreeprj to submit free-tier jobs and --account=zhangprj for paid-tier jobs. In this case, you're submitting a free-tier job, so use the ...freeprj account for your group[4].
$ srun --partition=def-sm --account=SLURM_FREE_PRJ --pty /bin/bash
When you run it, you should see something like this:
[jjp366@picotte001 ~]$ srun --partition=def-sm --account=testprj --pty /bin/bash
srun: job 11702109 queued and waiting for resources
srun: job 11702109 has been allocated resources
[jjp366@node038 ~]$
The job is first added to the queue, then allocated resources, then run. You specified the command /bin/bash in the arguments to srun—this just starts a new shell. You also passed the --pty argument, which connects the input and output of your terminal to the job once it starts running. So you're now connected to a new shell on the compute node your job was assigned, which in this case is node038[5].
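When you're done with an interactive session, just exit the shell; the job ends and you're returned to your prompt on the login node:
[jjp366@node038 ~]$ exit
[jjp366@picotte001 ~]$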
Here's a breakdown of the above command and what each part means:
- srun: Command to launch a single interactive job on a compute node.
- --partition=def-sm: The partition argument specifies which "partition" the job should run in. All job requests in SLURM must specify a partition. You can think of each partition as its own job queue. Partitions are mostly used to group nodes logically (e.g. "GPU nodes" or "big memory" nodes), but are also used to configure access and billing. For example, the def and def-sm partitions are identical, except that def-sm (the free-tier partition) has time and resource limits but no billing, while def is configured with much higher limits, but bills for usage. In this case, you're submitting to def-sm, the default free-tier partition. You can read more about the partitions available on Picotte here.
- --account=...: The --account argument specifies which SLURM "bank account"[3] you want to use for this job—where it will be billed to, essentially. Your research group will typically have two accounts, one for free-tier jobs and one for priority-tier jobs. Some groups have more (e.g. to separate jobs that need to be billed to different grants). All job requests must specify an account[6].
- --pty: This stands for "pseudo-terminal", and allows you to interact with the job after it starts as if you were logged into the assigned node.
- /bin/bash: The last argument is the command you want to run. In this case, it's /bin/bash, which starts a new shell, but you can run any command you want. For example, try lscpu, which prints out information about the CPU on the assigned node. You can run anything you want as long as it will run on the Linux command line—a Python script you wrote, open-source bioinformatics tools, commercial structural analysis software—the sky's the limit.
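As an example of running a single command rather than a shell, here's how you might run lscpu as a quick one-off job (a sketch; substitute your group's free-tier account for SLURM_FREE_PRJ):
$ srun --partition=def-sm --account=SLURM_FREE_PRJ lscpu
srun waits for the job to be scheduled, runs lscpu on the assigned node, prints its output in your terminal, and then exits. You don't need --pty here because lscpu doesn't require an interactive terminal.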
Running batch jobs
Running an interactive job with srun blocks your terminal and directly prints the output of the job there. That's useful for testing and debugging, but it means you can only run one job at a time, and you have to stay at your terminal until your job finishes, which isn't very practical.
To run jobs that take hours or days to complete, or to run multiple jobs at once, you use the sbatch command rather than srun. This allows you to submit jobs that run in the background without requiring your terminal to stay connected. These are called "batch jobs", hence the name sbatch. Most of your job submissions will typically use this command.
To use sbatch, you write a bash script using special comments to pass arguments. Create a new file, test_job.sh, and write the following to it:
#!/bin/bash
#SBATCH --account=YOUR_ACCOUNT_NAME
#SBATCH --partition=def-sm
hostname
This is a script that uses the hostname command to print the name of the node that the job runs on. Replace YOUR_ACCOUNT_NAME with your research group's free-tier account.
The #SBATCH lines pass arguments to the scheduler. They have the same meaning here as they do when passed to srun on the command line.
Now, submit this job:
$ sbatch test_job.sh
You should see output similar to:
[jjp366@picotte001 ~]$ sbatch test_job.sh
Submitted batch job 11702136
[jjp366@picotte001 ~]$
Notice that you're immediately back at the terminal on the login node, rather than on a compute node. Your job is running on a compute node, but your terminal isn't connected to its output like it was when using srun. You can do other work, or disconnect, and your job will continue running.
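While a batch job is waiting in the queue or running, you can list it (along with any of your other jobs) using squeue:
$ squeue -u $USER
Jobs disappear from the squeue listing shortly after they finish.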
Also notice the output Submitted batch job 11702136. You can check the status of your job by passing that number (the job ID) to the scontrol command. For example:
[jjp366@picotte001 ~]$ scontrol show job 11702136
JobId=11702136 JobName=test_job.sh
UserId=jjp366(2451) GroupId=jjp366(2462) MCS_label=N/A
Priority=6369 Nice=0 Account=urcfadmprj QOS=normal WCKey=*
JobState=COMPLETED Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:00 TimeLimit=00:30:00 TimeMin=N/A
SubmitTime=2025-05-28T17:15:42 EligibleTime=2025-05-28T17:15:42
AccrueTime=2025-05-28T17:15:42
StartTime=2025-05-28T17:15:42 EndTime=2025-05-28T17:15:42 Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2025-05-28T17:15:42 Scheduler=Main
Partition=def-sm AllocNode:Sid=picotte001:14229
ReqNodeList=(null) ExcNodeList=(null)
NodeList=node043
BatchHost=node043
NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=1,mem=4000M,node=1,billing=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryCPU=4000M MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/home/jjp366/test_job.sh
WorkDir=/home/jjp366
StdErr=/home/jjp366/slurm-11702136.out
StdIn=/dev/null
StdOut=/home/jjp366/slurm-11702136.out
Power=
There are a lot of details here, but the ones that matter right now are JobState=COMPLETED and StdOut=/home/jjp366/slurm-11702136.out. This means the job finished running (which makes sense, since all it did was run the hostname command, which takes less than a second), and that the output was saved to the file slurm-11702136.out.
Examining this file:
[jjp366@picotte001 ~]$ cat slurm-11702136.out
node043
You can see that the hostname command output node043, as that's the node this job ran on.
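Note that scontrol show job only works while SLURM still has the job in memory, typically for a few minutes after it finishes. To look up older jobs, you can use sacct, which queries SLURM's accounting database (substitute your own job ID):
$ sacct -j 11702136
The output includes the job's name, state, and exit code.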
More realistic batch jobs
The examples above are much more minimal than a typical batch job. Features not covered above that you'll need in typical batch jobs include:
- Loading dependencies using Environment Modules, or language-specific systems like Python's virtual environments.
- Passing additional #SBATCH arguments to request more resources (CPUs, GPUs, memory, multiple nodes). You can read about these parameters on the Writing SLURM Job Scripts page, and there's a sketch of a fuller script below.
- Using job arrays to easily parallelize work.
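As a rough illustration, here's what a fuller job script might look like. The module name, script name, and resource numbers below are placeholders rather than recommendations; check what's available with module avail and size the requests to your workload:
#!/bin/bash
#SBATCH --account=SLURM_FREE_PRJ
#SBATCH --partition=def-sm
#SBATCH --job-name=my_analysis
#SBATCH --time=01:00:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --output=my_analysis-%j.out

# Load dependencies (module name is hypothetical; see "module avail")
module load python/3.10

# Run the actual work (script name is a placeholder)
python my_analysis.py
You'd submit this with sbatch, just like the minimal example above. The %j in the --output pattern expands to the job ID.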
See also
1. Originally an acronym for "Simple Linux Utility for Resource Management".
2. A partition is a group of nodes with similar characteristics. For example, def is the default partition, and consists of all ordinary CPU nodes. gpu is all the nodes with GPUs; bm nodes have 1.5 TB of memory. Nodes can be part of more than one partition. For example, the long partition contains the exact same set of nodes as def, it just has a longer job time limit.
3. These are SLURM "bank accounts", not the user account you use to log in to Picotte. The naming is very confusing. An "account" in this sense is an abstraction that SLURM uses to keep track of where a job should be billed. They're sometimes also called "projects". Even if your group has many users, they'll typically be submitting jobs to one or two accounts.
4. Typically this will start with your PI's last name, but your PI can choose whatever name they want when setting this up, so it might be something different. You can see all the accounts you have access to by running the command sacctmgr show user $USER withassoc.
5. When you run this command, you'll likely see a different node. SLURM schedules your job onto whatever node has the available resources to host it, so it won't always be the same.
6. While all jobs must specify an account, you don't necessarily always have to pass the --account argument, because you can set up a default account, which SLURM will use if you don't specify a different one. You can see your default account by running the command sacctmgr show user $USER (look for Def Acct in the output). You can set your default account using a command like sacctmgr modify user where name=$USER set defaultaccount=NEW_DEFAULT_ACCOUNT, replacing NEW_DEFAULT_ACCOUNT with the account you want to use.