SLURM Introduction
Like most HPC clusters, Picotte provides access to compute nodes via a job scheduler. The scheduler on Picotte is SLURM[1].
To use SLURM, you provide some work you'd like the cluster to do (typically in the form of a shell script) and specify the resources required (e.g. "10 CPUs and 24 GiB of memory"). This is called a job. SLURM adds your job to a queue of others waiting to be scheduled, and then waits for a node with the requested resources to become available. Then, the job starts running on that node. SLURM monitors the job's status and progress, which you can check on using the SLURM command-line tools as your job runs.
Working with SLURM is a fundamental skill for using Picotte.
Getting started
Note
This guide assumes you're familiar with the basics of command-line interfaces and the Unix shell. We highly recommend working through our Introduction to the Unix shell workshop before proceeding.
You use SLURM via its command-line tools, which all start with the letter s. First, make sure you're connected to the Picotte login node.
To start, let's use sinfo to check on the cluster status:
$ sinfo
The output is a table showing all compute nodes, grouped by their partition[2] and status:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
bm up 21-00:00:0 2 idle bigmem[001-002]
def* up 2-00:00:00 4 down* node[011,047,065,073]
def* up 2-00:00:00 59 mix node[001-010,012-038,040,051-064]
...
This output tells us that the bm partition has two nodes, both of which are idle. The def partition has a few nodes down, but most are in a "mixed" state, which means they have some resources allocated to running jobs, but also some resources available for new jobs.
Look for the def-sm partition and make sure there are some nodes in the idle or mix state. If there are, the cluster has capacity to run your new jobs. If not, your jobs may have to wait in the queue for a while.
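If you just want to check one partition, you can filter sinfo's output. For example, the following lists only def-sm nodes that are currently idle or in the mixed state (a quick sketch using standard sinfo options):
$ sinfo --partition=def-sm --states=idle,mix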
You can learn more about sinfo output and node states here.
Running an interactive job
Now let's run our first job using srun.
Execute the following command, replacing SLURM_FREE_PRJ with the name of your group's free-tier SLURM account[3]. Your research group will typically have two accounts, one for free-tier jobs and one for priority-tier jobs. For example, if your PI's name is Sara Zhang, these are typically called zhangprj and zhangfreeprj. You use --account=zhangfreeprj to submit free-tier jobs and --account=zhangprj for paid-tier jobs. In this case, you're submitting a free-tier job, so use the ...freeprj account for your group[4].
$ srun --partition=def-sm --account=SLURM_FREE_PRJ --pty /bin/bash
When you run it, you should see something like this:
[jjp366@picotte001 ~]$ srun --partition=def-sm --account=testprj --pty /bin/bash
srun: job 11702109 queued and waiting for resources
srun: job 11702109 has been allocated resources
[jjp366@node038 ~]$
The job is first added to the queue, then allocated resources, then run. You specified the command /bin/bash in the arguments to srun—this just starts a new shell. You also passed the --pty argument, which connects the input and output of your terminal to the job once it starts running. So you're now connected to a new shell on the compute node your job was assigned, which in this case is node038[5].
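When you're done with an interactive session, just exit the shell; the job ends and you're returned to your prompt on the login node:
[jjp366@node038 ~]$ exit
[jjp366@picotte001 ~]$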
Here's a breakdown of the above command and what each part means:
- srun: Command to launch a single interactive job on a compute node.
- --partition=def-sm: The partition argument specifies which "partition" the job should run in. All job requests in SLURM must specify a partition. You can think of each partition as its own job queue. Partitions are mostly used to group nodes logically (e.g. "GPU nodes" or "big memory" nodes), but are also used to configure access and billing. For example, the def and def-sm partitions are identical, except that def-sm (the free-tier partition) has time and resource limits but no billing, while def is configured with much higher limits, but bills for usage. In this case, you're submitting to def-sm, the default free-tier partition. You can read more about the partitions available on Picotte here.
- --account=...: The --account argument specifies which SLURM "bank account"[3] you want to use for this job—where it will be billed to, essentially. Your research group will typically have two accounts, one for free-tier jobs and one for priority-tier jobs. Some groups have more (e.g. to separate jobs that need to be billed to different grants). All job requests must specify an account[6].
- --pty: This stands for "pseudo-terminal", and allows you to interact with the job after it starts as if you were logged into the assigned node.
- /bin/bash: The last argument is the command you want to run. In this case, it's /bin/bash, which starts a new shell, but you can run any command you want. For example, try lscpu, which prints out information about the CPU on the assigned node. You can run anything you want as long as it will run on the Linux command line—a Python script you wrote, open-source bioinformatics tools, commercial structural analysis software—the sky's the limit.
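As an example of running a single command rather than a shell, here's how you might run lscpu as a quick one-off job (a sketch; substitute your group's free-tier account for SLURM_FREE_PRJ):
$ srun --partition=def-sm --account=SLURM_FREE_PRJ lscpu
srun waits for the job to be scheduled, runs lscpu on the assigned node, prints its output in your terminal, and then exits. You don't need --pty here because lscpu doesn't require an interactive terminal.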
Running batch jobs
Running an interactive job with srun blocks your terminal and directly prints the output of the job there. That's useful for testing and debugging, but it means you can only run one job at a time, and you have to stay at your terminal until your job finishes, which isn't very practical.
To run jobs that take hours or days to complete, or to run multiple jobs at once, you use the sbatch command rather than srun. This allows you to submit jobs that run in the background without requiring your terminal to stay connected. These are called "batch jobs", hence the name sbatch. Most of your job submissions will typically use this command.
To use sbatch, you write a bash script using special comments to pass arguments. Create a new file, test_job.sh, and write the following to it:
#!/bin/bash
#SBATCH --account=YOUR_ACCOUNT_NAME
#SBATCH --partition=def-sm
hostname
This is a script that uses the hostname command to print the name of the node that the job runs on. Replace YOUR_ACCOUNT_NAME with your research group's free-tier account.
The #SBATCH lines pass arguments to the scheduler. They have the same meaning here as they do when passed to srun on the command line.
Now, submit this job:
$ sbatch test_job.sh
You should see output similar to:
[jjp366@picotte001 ~]$ sbatch test_job.sh
Submitted batch job 11702136
[jjp366@picotte001 ~]$
Notice that you're immediately back at the terminal on the login node, rather than on a compute node. Your job is running on a compute node, but your terminal isn't connected to its output like it was when using srun. You can do other work, or disconnect, and your job will continue running.
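While a batch job is waiting in the queue or running, you can list it (along with any of your other jobs) using squeue:
$ squeue -u $USER
Jobs disappear from the squeue listing shortly after they finish.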
Also notice the output Submitted batch job 11702136. You can check the status of your job by passing that number (the job ID) to the scontrol command. For example:
[jjp366@picotte001 ~]$ scontrol show job 11702136
JobId=11702136 JobName=test_job.sh
UserId=jjp366(2451) GroupId=jjp366(2462) MCS_label=N/A
Priority=6369 Nice=0 Account=urcfadmprj QOS=normal WCKey=*
JobState=COMPLETED Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:00 TimeLimit=00:30:00 TimeMin=N/A
SubmitTime=2025-05-28T17:15:42 EligibleTime=2025-05-28T17:15:42
AccrueTime=2025-05-28T17:15:42
StartTime=2025-05-28T17:15:42 EndTime=2025-05-28T17:15:42 Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2025-05-28T17:15:42 Scheduler=Main
Partition=def-sm AllocNode:Sid=picotte001:14229
ReqNodeList=(null) ExcNodeList=(null)
NodeList=node043
BatchHost=node043
NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=1,mem=4000M,node=1,billing=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryCPU=4000M MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/home/jjp366/test_job.sh
WorkDir=/home/jjp366
StdErr=/home/jjp366/slurm-11702136.out
StdIn=/dev/null
StdOut=/home/jjp366/slurm-11702136.out
Power=
There are a lot of details here, but the ones that matter right now are JobState=COMPLETED and StdOut=/home/jjp366/slurm-11702136.out. This means the job finished running (which makes sense, since all it did was run the hostname command, which takes less than a second), and that the output was saved to the file slurm-11702136.out.
Examining this file:
[jjp366@picotte001 ~]$ cat slurm-11702136.out
node043
You can see that the hostname command output node043, as that's the node this job ran on.
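Note that scontrol show job only works while SLURM still has the job in memory, typically for a few minutes after it finishes. To look up older jobs, you can use sacct, which queries SLURM's accounting database (substitute your own job ID):
$ sacct -j 11702136
The output includes the job's name, state, and exit code.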
More realistic batch jobs
The examples above are much more minimal than a typical batch job. Features not covered above that you'll need in typical batch jobs include:
- Loading dependencies using Environment Modules, or language-specific systems like Python's virtual environments.
- Passing additional #SBATCH arguments to request more resources (CPUs, GPUs, memory, multiple nodes). You can read about these parameters on the Writing SLURM Job Scripts page, and there's a sketch of a fuller script below.
- Using job arrays to easily parallelize work.
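As a rough illustration, here's what a fuller job script might look like. The module name, script name, and resource numbers below are placeholders rather than recommendations; check what's available with module avail and size the requests to your workload:
#!/bin/bash
#SBATCH --account=SLURM_FREE_PRJ
#SBATCH --partition=def-sm
#SBATCH --job-name=my_analysis
#SBATCH --time=01:00:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --output=my_analysis-%j.out

# Load dependencies (module name is hypothetical; see "module avail")
module load python/3.10

# Run the actual work (script name is a placeholder)
python my_analysis.py
You'd submit this with sbatch, just like the minimal example above. The %j in the --output pattern expands to the job ID.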
See also
1. Originally an acronym for "Simple Linux Utility for Resource Management".
2. A partition is a group of nodes with similar characteristics. For example, def is the default partition, and consists of all ordinary CPU nodes. gpu is all the nodes with GPUs; bm nodes have 1.5 TB of memory. Nodes can be part of more than one partition. For example, the long partition contains the exact same set of nodes as def, it just has a longer job time limit.
3. These are SLURM "bank accounts", not the user account you use to log in to Picotte. The naming is very confusing. An "account" in this sense is an abstraction that SLURM uses to keep track of where a job should be billed. They're sometimes also called "projects". Even if your group has many users, they'll typically be submitting jobs to one or two accounts.
4. Typically this will start with your PI's last name, but your PI can choose whatever name they want when setting this up, so it might be something different. You can see all the accounts you have access to by running the command sacctmgr show user $USER withassoc.
5. When you run this command, you'll likely see a different node. SLURM schedules your job onto whatever node has the available resources to host it, so it won't always be the same.
6. While all jobs must specify an account, you don't necessarily always have to pass the --account argument, because you can set up a default account, which SLURM will use if you don't specify a different one. You can see your default account by running the command sacctmgr show user $USER (look for Def Acct in the output). You can set your default account using a command like sacctmgr modify user where name=$USER set defaultaccount=NEW_DEFAULT_ACCOUNT, replacing NEW_DEFAULT_ACCOUNT with the account you want to use.