Slurm Glossary

Slurm[1] is the resource manager and job scheduler that runs Picotte. Picotte uses Slurm version 20.02.7.

It is a resource manager because it is aware of all resources in the cluster (CPUs, memory, GPUs, disk space, network connection), and can allocate resources to jobs which request them.

It is a job scheduler because it decides when jobs start executing, and when to terminate jobs (if necessary).
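Both roles can be observed from the command line. The following is a minimal sketch, assuming it is run on a login node where the standard Slurm client commands `sinfo` and `squeue` are installed; the function names `list_node_resources` and `show_pending_start_times` are hypothetical helpers, not part of Slurm.

```shell
# Resource manager view: one line per node with its name, CPU count,
# memory in MB, and generic resources (e.g. GPUs).
list_node_resources() {
    if command -v sinfo >/dev/null 2>&1; then
        sinfo -N -o "%N %c %m %G"
    else
        echo "sinfo not found: run this on a Slurm login node" >&2
        return 1
    fi
}

# Scheduler view: expected start times of your own pending jobs.
show_pending_start_times() {
    if command -v squeue >/dev/null 2>&1; then
        squeue --start -u "${USER:-$(id -un)}"
    else
        echo "squeue not found: run this on a Slurm login node" >&2
        return 1
    fi
}
```

Call `list_node_resources` or `show_pending_start_times` after defining them. The expected start times are estimates and can change as other jobs finish or are submitted.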

Nodes vs. Tasks vs. CPUs vs. Cores

This section is adapted from documentation from Stanford University's Research Computing Group.[2]

tl;dr

For single-node multithreaded jobs:

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=N

where N is replaced by any integer from 1 to 48. Each of "nodes", "ntasks", and "cpus-per-task" defaults to 1 when it is not specified, so both the "nodes" and "ntasks" lines can be omitted from the example above.
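As an illustration, a complete batch script built on these directives might look like the sketch below. It assumes the program is threaded with OpenMP, which reads its thread count from `OMP_NUM_THREADS`; other threading libraries use other mechanisms. The value 4 and the executable name `my_threaded_program` are placeholders.

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4

# Slurm sets SLURM_CPUS_PER_TASK only when --cpus-per-task is given.
# Fall back to the default of 1 otherwise (e.g. when run outside Slurm).
OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
export OMP_NUM_THREADS

echo "Running with ${OMP_NUM_THREADS} thread(s)"
# ./my_threaded_program   # hypothetical executable; replace with your own
```

Tying the thread count to `SLURM_CPUS_PER_TASK` keeps the program from starting more threads than the cores Slurm allocated to the job.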

Parts of a Node

A “node” is a computer. The definitions below are simplified for the sake of brevity.

Board: A physical motherboard which contains one or more of each of the following: socket, memory bus, PCI bus. (See photo of motherboard. NB this is not the motherboard used in Picotte nodes.)

Socket: A physical socket on a motherboard which accepts a CPU package. (In the photo of the motherboard, the two areas surrounded by bare metal frames are the sockets.)

CPU (Central Processing Unit): The component which performs all computations and controls other hardware on the motherboard. Also colloquially called “socket”.

Core: Most current CPUs contain multiple cores. A core typically executes instructions one at a time (i.e. serially). Having multiple cores allows for parallel computation on a single node.

Hyperthread: A virtual CPU core. Hyperthreading allows a single CPU core to run more than one series (“thread”) of computations pseudo-simultaneously. It is turned off for high performance computing (HPC) and generally turned on for desktop computing.

Memory bus: The connection (circuit board traces) between the sockets and the connectors holding the memory chips.

Memory: Otherwise known as Random Access Memory (RAM). Stores information while programs run. Requires power to retain contents.

PCI (Peripheral Component Interconnect) bus: The connection between the sockets and the connectors holding accessory cards, e.g. GPUs (graphics processing units), network interfaces, data acquisition boards, etc.

Storage (aka disk): Mass storage (hard drives, solid state drives, or some mix of the two). Retains contents even while power is not applied. Can be many orders of magnitude larger than memory (RAM). Analogy: RAM is your brain, storage is your filing cabinet/bookshelf.

Graphics Processing Unit (GPU): A specialized computing device consisting of a GPU chip and its own independent memory; a bit like a separate computer on an expansion card. GPUs are specialized for performing linear algebra operations.
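The socket, core, and hyperthread counts of a Linux node can be read from the output of `lscpu`. The sketch below is one way to summarize them; `summarize_topology` is a hypothetical helper name, and the sample input values are purely illustrative rather than a description of any particular Picotte node.

```shell
# Summarize socket/core/thread counts from `lscpu` output read on stdin.
# Leading whitespace is allowed because newer lscpu versions indent fields.
summarize_topology() {
    awk -F: '
        /^[ \t]*Socket\(s\):/          { sockets = $2 + 0 }
        /^[ \t]*Core\(s\) per socket:/ { cores   = $2 + 0 }
        /^[ \t]*Thread\(s\) per core:/ { threads = $2 + 0 }
        END {
            printf "sockets=%d cores=%d logical_cpus=%d\n",
                   sockets, sockets * cores, sockets * cores * threads
        }'
}

# Illustrative input: 2 sockets, 24 cores per socket, hyperthreading off.
printf 'Socket(s): 2\nCore(s) per socket: 24\nThread(s) per core: 1\n' \
    | summarize_topology
# prints: sockets=2 cores=48 logical_cpus=48
```

On a real node, run `lscpu | summarize_topology`. With hyperthreading off, the number of logical CPUs equals the number of physical cores.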

Slurm uses the terms core and CPU interchangeably, depending on the context and the Slurm command. For example, --cpus-per-task actually specifies the number of cores per task. Linux generally does the same: the load reports of uptime(1) and top(1) use “cpu-seconds”, which really means “core-seconds”. This is a historical artifact: early CPUs were all single-core.
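Because --cpus-per-task counts cores, the total number of cores a job requests is the number of tasks multiplied by the cores per task. A small sketch of that arithmetic, using a hypothetical helper `total_cores` whose two arguments both default to 1 (matching the Slurm defaults):

```shell
# total_cores NTASKS CPUS_PER_TASK
# Prints the total number of cores requested; each argument defaults to 1.
total_cores() {
    echo $(( ${1:-1} * ${2:-1} ))
}

total_cores 1 48   # one task with 48 cores: prints 48
total_cores 4 12   # four tasks with 12 cores each: prints 48
total_cores        # all defaults: prints 1
```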

[Figure: Photo of a server motherboard]

[Figure: Diagram of a motherboard]

[Figure: Photo of the inside of a Dell R740 node, similar to the ones in Picotte]

References

[1] Slurm 20.02.7 Documentation

[2] SCG Docs - FAQs - What are some Slurm terms? - nodes vs tasks vs cpus vs cores