
Memory

Computer RAM (Random Access Memory) is short-term working memory that stores the data and instructions programs are actively using, allowing the CPU to access them quickly. RAM works in the same way on the HPC cluster as it does on laptops and desktops.

This page describes the different methods for requesting memory in Slurm jobs. See this blog post for more information about Virtual and Physical memory.

Memory requests in Slurm

All jobs must request an amount of RAM. See the sections below for information about the various methods of requesting memory in different job types.

Jobs that exceed their requested memory are automatically killed by the scheduler with an exit state of OUT_OF_MEMORY.
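To confirm that a finished job was killed for exceeding its memory request, you can inspect its accounting record with `sacct` (the job ID `12345` below is a placeholder; substitute your own):

```shell
# Show the final state and peak memory use of a job.
# 12345 is a placeholder job ID.
sacct -j 12345 --format=JobID,State,MaxRSS,ReqMem
```

A job killed for exceeding its request will show `OUT_OF_MEMORY` in the State column; comparing MaxRSS against ReqMem helps you size the next submission.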

Serial jobs

For non-parallel jobs, there are two ways to request memory: per-task (per-core) or per-job (total).

Each partition sets a different default memory request, so we advise explicitly requesting memory in every job using one of the methods below.

Compute jobs requiring large amounts of RAM

If you have compute jobs with very large RAM requirements, you may want to make use of our public highmem nodes by submitting to the highmem partition, rather than compute. See the high memory jobs page for more information.
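As a minimal sketch, a job can be directed to the highmem partition as shown below (the 500G figure is purely illustrative; size it to your workload):

```shell
#SBATCH --partition=highmem   # public high-memory partition
#SBATCH --mem=500G            # illustrative amount; adjust to your job's needs
```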

Requesting exclusive use of a node

If your job requires a whole node, request all the available memory with --mem=0, and all the CPUs (and GPUs if applicable) with --exclusive.

Note that the queueing time for a whole node is likely to be considerably longer than for smaller jobs, as the scheduler must allocate every resource on a single node, which cannot happen while smaller jobs are consuming resources on it.

As exclusive jobs block other jobs from running concurrently on the same node, please ensure your job actually utilises all of the node's resources.
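Putting the directives above together, a whole-node job script might begin as follows (the job name and command are hypothetical placeholders):

```shell
#!/bin/bash
#SBATCH --job-name=whole_node   # hypothetical job name
#SBATCH --exclusive             # no other jobs may share the node
#SBATCH --mem=0                 # request all available memory on the node

./my_program                    # placeholder for your workload
```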

See the single node jobs page for example serial job scripts.

Per-task (core)

The per-task (per-core) approach multiplies the --mem-per-cpu value by the number of tasks (cores) requested. For example, requesting 2 tasks (cores) and 5G per task (core) would result in 10GB being requested for the job:

## Request 2 tasks/cores
#SBATCH -n 2                # (or --ntasks=2)

## Request 5GB per task/core
#SBATCH --mem-per-cpu=5G

Per job (total)

The per-job approach simply requests the total amount of memory required by the job, regardless of the number of tasks (cores) requested. For example, to request 10G for the job:

#SBATCH --mem=10G           # Request 10GB for the job

Parallel jobs

The primary resource on the parallel nodes is the high-throughput, low-latency interconnect known as InfiniBand. All nodes in the parallel partition are connected via an InfiniBand switch, which can significantly improve the performance of large, multi-node parallel jobs.

To achieve this performance benefit, only a single job is allowed to run on each parallel node. As a result, users should request all available CPUs and memory when submitting jobs to the parallel partition.

All jobs submitted to the parallel partition must include the following:

#SBATCH --exclusive
#SBATCH --mem=0
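A minimal multi-node job script built on these required directives might look like the sketch below (the node count and MPI executable are placeholders to adapt to your workload):

```shell
#!/bin/bash
#SBATCH --partition=parallel    # parallel partition described above
#SBATCH --nodes=2               # illustrative node count
#SBATCH --exclusive             # required: one job per parallel node
#SBATCH --mem=0                 # required: all memory on each node

srun ./my_mpi_program           # placeholder MPI executable
```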

See the multi-node jobs page for example parallel job scripts.