Moving from Grid Engine to Slurm¶
This page is for users moving from Grid Engine to Slurm
The information below is only useful for users who have already used Grid Engine on Apocrita and want to know how to migrate their existing knowledge to Slurm. If you have never used Grid Engine before, stop reading and return to the main documentation.
The use of Slurm on Apocrita represents a significant change for users accustomed to the Univa Grid Engine (UGE) job scheduler.
Whilst UGE served us well, Slurm has been widely adopted at many other HPC sites, is under active development, and offers the features and flexibility we need as we introduce new platforms for the research community at the University.
This page shows Slurm commands and job script options next to their UGE counterparts to help you move from UGE to Slurm.
Job script header¶
UGE (#$) vs Slurm (#SBATCH)¶
The move from UGE to Slurm means your previous Apocrita job scripts will no
longer work, because job script header lines beginning with #$ are ignored by
Slurm. Instead, you should use lines beginning with #SBATCH, and you will need
to convert the options you use on those lines from UGE to Slurm.
Watch out for $!
Note that it is #SBATCH (with the letter "S" in capitals, short for
"Slurm Batch") and not #$BATCH (with a dollar ("$") symbol). This is an easy
mistake to make when you begin converting your UGE job scripts. Do not use a
$ (dollar) symbol in your Slurm job script header.
Examples of UGE job scripts and their Slurm equivalents are given below.
The commands used to submit jobs and check on the queue have also changed; see below for the equivalent commands.
Job submission and management¶
UGE (qsub, …) vs Slurm (sbatch, …)¶
| UGE Commands | Slurm Commands |
|---|---|
| `# Batch job submission` | `# Batch job submission` |
| `qsub job_script` | `sbatch job_script` |
| `qsub job_script arg1 arg2 ...` | `sbatch job_script arg1 arg2 ...` |
| `# Job queue status` | `# Job queue status` |
| `qstat # Show your jobs (if any)` | `squeue --me` |
| `qstat -u "*" # Show all jobs` | `squeue` |
| `qstat -u username` | `squeue -u username` |
| `# Cancel (delete) a job` | `# Cancel (delete) a job` |
| `qdel jobid` | `scancel jobid` |
| `qdel jobname` | `scancel -n jobname` |
| `qdel jobid -t taskid` | `scancel jobid_taskid` |
| `qdel "*" # Delete all my jobs` | `scancel --me # Delete all my jobs` |
| `# Interactive job` | `# Interactive job` |
| `qlogin` | `salloc` |
| `# Completed job accounting` | `# Completed job accounting` |
| `qacct -j jobid` | `sacct -j jobid` |
Job scripts¶
Use tasks (-n) and not cores per task (-c)
Slurm refers to CPUs as "tasks", and in most job scripts you should request
the number of CPUs you require using -n or --ntasks, as in the examples
below. The -c option sets the number of CPUs required per task, and should
normally be used only for advanced jobs, such as those combining Open MPI
ranks with OpenMP threads.
You will need to rewrite your UGE job scripts. We advise taking a copy of any
existing UGE script and naming it something like job.slurm or
job.sbatch, to make it obvious it is a Slurm job script.
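As a sketch of this workflow (the filename `job.uge` and its contents are hypothetical, for illustration only), the following commands make a clearly-named Slurm copy of an old UGE script and then list the old `#$` header lines that still need rewriting as `#SBATCH` by hand:

```shell
# Create an example UGE script (hypothetical contents, for illustration)
cat > job.uge <<'EOF'
#!/bin/bash
#$ -pe smp 1
#$ -l h_rt=1:0:0
echo hello
EOF

# Copy it to a clearly-named Slurm script...
cp job.uge job.slurm

# ...then list the UGE header lines that still need converting to #SBATCH
grep -n '^#\$' job.slurm
```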
Put #SBATCH lines in one block¶
Please note: all Slurm job script header lines beginning with #SBATCH must
come before ordinary lines that run job commands or your application. Any
#SBATCH lines appearing after the first non-#SBATCH line will be ignored.
For example:
```bash
#!/bin/bash
#SBATCH -p compute        # (or --partition=compute)
#SBATCH -n 1              # (or --ntasks=1) Request 1 core
#SBATCH -t 1:0:0          # Request 1 hour runtime
#SBATCH --mem-per-cpu=1G  # Request 1GB RAM per core

# Now the first "ordinary" line. No more #SBATCH lines would be processed if
# they were added after this
export MY_DATA=/gpfs/scratch/${USER}/data

module load app

app <arguments>
```
Please note that under Slurm, --mem-per-cpu must be set to an integer;
for example, for 7.5GB of RAM per core you must use --mem-per-cpu=7500M and
not --mem-per-cpu=7.5G.
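Since this conversion is easy to get wrong, here is a small sketch of turning a fractional GB value into the whole-MB figure Slurm accepts, using the 1 GB = 1000 MB convention that matches the 7500M example above:

```shell
gb_per_core="7.5"   # desired RAM per core in GB

# Multiply by 1000 and truncate to a whole number of MB
mb_per_core=$(awk -v g="$gb_per_core" 'BEGIN { printf "%d", g * 1000 }')

echo "--mem-per-cpu=${mb_per_core}M"   # prints --mem-per-cpu=7500M
```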
You can also use Slurm's srun
command to run your application or job commands as a separate Slurm job step:

```bash
srun app <arguments>
```

Please refer to the
official Slurm documentation for
more information about the srun command. We recommend that most users
initially stick to simple job scripts that do not use srun.
Serial job script (single-core)¶
Note that in Slurm you should explicitly request one core to be safe: some job
scripts need the $SLURM_NTASKS environment variable (the equivalent of UGE's
$NSLOTS variable), and Slurm only sets it if you explicitly request a number
of cores.
UGE:

```bash
#!/bin/bash
#$ -cwd           # Set the working directory for the job to the current directory
#$ -pe smp 1      # Request 1 core
#$ -l h_rt=1:0:0  # Request 1 hour runtime
#$ -l h_vmem=1G   # Request 1GB RAM per core

# Module load
module load app

# Run application
app \
    --input in.dat \
    --output out.dat
```

Slurm:

```bash
#!/bin/bash
# The working directory for the job is the current directory by default in Slurm
#SBATCH -n 1              # (or --ntasks=1) Request 1 core
# OPTIONAL LINE: default partition is compute
#SBATCH -p compute        # (or --partition=compute)
#SBATCH -t 1:0:0          # Request 1 hour runtime
#SBATCH --mem-per-cpu=1G  # Request 1GB RAM per core

# Module load
module load app

# Run application
app \
    --input in.dat \
    --output out.dat
```
For more detailed examples of single-core serial job scripts, please see the main documentation.
Serial job script (multi-core)¶
UGE:

```bash
#!/bin/bash
#$ -cwd           # Set the working directory for the job to the current directory
#$ -pe smp 4      # Request 4 CPU cores
#$ -l h_rt=1:0:0  # Request 1 hour runtime
#$ -l h_vmem=1G   # Request 1GB RAM / core, i.e. 4GB total

# Module load
module load app

# Using $NSLOTS for threading
app \
    --threads ${NSLOTS} \
    --input in.dat \
    --output out.dat
```

Slurm:

```bash
#!/bin/bash
# The working directory for the job is the current directory by default in Slurm
#SBATCH -n 4              # (or --ntasks=4) Request 4 CPU cores
# OPTIONAL LINE: default partition is compute
#SBATCH -p compute        # (or --partition=compute)
#SBATCH -t 1:0:0          # Request 1 hour runtime
#SBATCH --mem-per-cpu=1G  # Request 1GB RAM / core, i.e. 4GB total

# Module load
module load app

# Using $SLURM_NTASKS for threading
app \
    --threads ${SLURM_NTASKS} \
    --input in.dat \
    --output out.dat
```
For more detailed examples of multi-core serial job scripts, please see the main documentation.
Parallel job script¶
Request the right resources and partition
Parallel jobs must request the parallel partition and at least two nodes.
Jobs that fail to fulfil these requirements will be rejected by Slurm.
Slurm exclusive requests must separately request exclusive RAM
On UGE, jobs submitted to the Apocrita parallel nodes would automatically
make an exclusive request for all CPUs and RAM on the nodes requested. On
Slurm, users need to request both --exclusive and --mem=0 as in the
example below. For more information, please see the
official Slurm documentation.
UGE:

```bash
#!/bin/bash
#$ -cwd                 # Set the working directory for the job to the current directory
#$ -pe parallel 96      # Request 96 cores / 2 ddy nodes
#$ -l infiniband=ddy-i  # Choose infiniband island (ddy-i)
#$ -l h_rt=240:0:0      # Request 240 hours runtime
# This is automatically added to all UGE parallel jobs on Apocrita
#$ -l exclusive         # Request all resources on node

# Module load
module load openmpi

# UGE needs to be explicitly given the number of ranks to use
# (usually via $NSLOTS)
mpirun \
    -np ${NSLOTS} \
    ./code \
    -i input.file
```

Slurm:

```bash
#!/bin/bash
# The working directory for the job is the current directory by default in Slurm
#SBATCH -N 2         # (or --nodes=2) Request 2 ddy nodes
#SBATCH -n 96        # (or --ntasks=96) Request 96 cores
#SBATCH -p parallel  # (or --partition=parallel)
#SBATCH -t 240:0:0   # Request 240 hours runtime
# Both arguments required for exclusive use of CPUs and memory on all nodes
#SBATCH --exclusive
#SBATCH --mem=0

# Module load
module load openmpi

# Slurm knows how many cores to use for mpirun, detected automatically from
# ${SLURM_NTASKS}. Use -- to ensure arguments are passed to the application
# and not mpirun
mpirun \
    -- \
    ./code \
    -i input.file
```
For more detailed examples of parallel job scripts, please see the main documentation.
Use mpirun instead of srun --mpi
The
official Open MPI documentation
recommends using mpirun for all MPI processes under Slurm and not
srun.
Array job script¶
UGE:

```bash
#!/bin/bash
#$ -cwd
#$ -pe smp 1
#$ -l h_vmem=1G
#$ -j y
#$ -l h_rt=1:0:0
#$ -t 1-3

echo ${SGE_TASK_ID}
```

Slurm:

```bash
#!/bin/bash
#SBATCH -n 1
#SBATCH --mem-per-cpu=1G
#SBATCH -t 1:0:0
#SBATCH -a 1-3

echo ${SLURM_ARRAY_TASK_ID}
```
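A common pattern is to use $SLURM_ARRAY_TASK_ID to select a numbered input file for each task. The sketch below uses hypothetical file names (in_1.dat, in_2.dat, in_3.dat); the :-1 fallback is only so the lines can be tried outside a real array job, where the variable is unset:

```shell
#!/bin/bash
#SBATCH -n 1
#SBATCH -t 1:0:0
#SBATCH -a 1-3

# Each array task selects its own input file, e.g. in_2.dat for task 2
# (the :-1 default is only so this sketch also runs outside Slurm)
INPUT="in_${SLURM_ARRAY_TASK_ID:-1}.dat"
echo "Task ${SLURM_ARRAY_TASK_ID:-1} would process ${INPUT}"
```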
For more detailed examples of array job scripts, please see the main documentation.
GPU job script¶
UGE:

```bash
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 8      # 8 cores (8 cores per GPU)
#$ -l h_rt=1:0:0  # 1 hour runtime
#$ -l h_vmem=11G  # 11 * 8 = 88G total RAM
#$ -l gpu=1       # request 1 GPU

./run_code.sh
```

Slurm:

```bash
#!/bin/bash
#SBATCH -p gpushort        # (or --partition=gpushort)
#SBATCH -n 8               # 8 cores (8 cores per GPU)
#SBATCH -t 1:0:0           # 1 hour runtime
#SBATCH --mem-per-cpu=11G  # 11 * 8 = 88G total RAM
#SBATCH --gres=gpu:1       # request 1 GPU

./run_code.sh
```
For more detailed examples of different GPU job script types, please see the main GPU documentation.
More job script header options – UGE vs Slurm¶
Job Output Files¶
| UGE job output files | Slurm job output files |
|---|---|
| **Individual (non-array) jobs** | **Individual (non-array) jobs** |
| `jobscriptname.oJOBID` | `slurm-JOBID.out` |
| `jobscriptname.eJOBID` | |
| **Array jobs** | **Array jobs** |
| `jobscriptname.oJOBID.TASKID` | `slurm-ARRAYJOBID_TASKID.out` |
| `jobscriptname.eJOBID.TASKID` | |
By default, Slurm generates a single job output file containing both the
STDOUT and STDERR output that UGE splits into two files by default. If you
were already using "-j y" in your UGE jobs, this will be familiar.
The naming and merging of the files can be changed using job script options.
We highly recommend giving your Slurm jobs a name and then defining a matching
output file naming scheme, to avoid all your output files starting with
"slurm-".
Single combined job output file¶
For a single output file (which would be named "jobname.o<job number>"):
```bash
#SBATCH -J jobname
#SBATCH -o %x.o%j
```
Example for a job called "jobname" with the job number 1234567:
```
jobname.o1234567
```
Separate job error and output files¶
For separate output files for STDOUT and STDERR (which would be named
"jobname.o<job number>" and "jobname.e<job number>" respectively):
```bash
#SBATCH -J jobname
#SBATCH -o %x.o%j
#SBATCH -e %x.e%j
```
Example for a job called "jobname" with the job number 1234567:
```
jobname.e1234567
jobname.o1234567
```
UGE vs Slurm output naming options¶
UGE:

```bash
#!/bin/bash
...
# Naming the job is optional. Default is the name of the job script.
# DOES rename the .o and .e output files.
#$ -N jobname

# Naming the output files is optional.
# Default is separate .o and .e files:
# jobname.oJOBID and jobname.eJOBID
# Use of '-N jobname' DOES affect those defaults
#$ -o myjob.out
#$ -e myjob.err

# To join .o and .e into a single file,
# similar to Slurm's default behaviour:
#$ -j y
```

Slurm:

```bash
#!/bin/bash
...
# Naming the job is optional. Default is the name of the job script.
# Does NOT rename the .out file.
#SBATCH -J jobname

# Naming the output files is optional.
# Default is a single file for .o and .e:
# slurm-JOBID.out
# Use of '-J jobname' does NOT affect the default
#SBATCH -o myjob.out
#SBATCH -e myjob.err

# Use wildcards to recreate the UGE names
# %x = $SLURM_JOB_NAME
# %j = $SLURM_JOB_ID
#SBATCH -o %x.o%j
#SBATCH -e %x.e%j
```
For further naming options, please refer to the official Slurm documentation.
The $SLURM_JOB_NAME variable gives the name of your job script, unless the
-J jobname option is used to rename your job, in which case the variable is
set to the value of jobname.
If you want $SLURM_JOB_NAME to always give the name of the job script from
within your job, you must omit the -J flag. Alternatively, the following
command run inside your job script gives the name of the job script regardless
of whether you use the -J flag:

```bash
scontrol show jobid ${SLURM_JOB_ID} | grep Command= | awk -F/ '{print $NF}'
```
Renaming array job output files¶
An array job uses `slurm-ARRAYJOBID_TASKID.out` as the default output file for
each task in the array job. This can be renamed, but you need to use the %A and
%a wildcards (not %j).
UGE:

```bash
#!/bin/bash
...
# An array job (cannot start at 0)
#$ -t 1-1000

# Naming the job is optional.
# Default is the name of the job script
#$ -N jobname

# Naming the output files is optional.
# Default is separate .o and .e files:
# jobname.oJOBID and jobname.eJOBID
# Use of '-N jobname' DOES affect those defaults

# To join .o and .e into a single file,
# similar to Slurm's default behaviour:
#$ -j y
```

Slurm:

```bash
#!/bin/bash
...
# An array job (CAN start at 0)
#SBATCH -a 0-999 # (or --array=0-999)

# Naming the job is optional.
# Default is the name of the job script
#SBATCH -J jobname

# Naming the output files is optional.
# Default is a single file for .o and .e:
# slurm-ARRAYJOBID_TASKID.out
# Use of '-J jobname' does NOT affect the default

# Use wildcards to recreate the UGE names
# %x = $SLURM_JOB_NAME
# %A = $SLURM_ARRAY_JOB_ID
# %a = $SLURM_ARRAY_TASK_ID
#SBATCH -o %x.o%A.%a
#SBATCH -e %x.e%A.%a
```
Emailing from a job¶
Slurm can email you when your job begins, ends or fails.
UGE:

```bash
#!/bin/bash
...
# Mail events: begin, end, abort
#$ -m bea
#$ -M emailaddr@qmul.ac.uk
```

Slurm:

```bash
#!/bin/bash
...
# Mail events: NONE, BEGIN, END, FAIL, ALL
#SBATCH --mail-type=ALL
#SBATCH --mail-user=emailaddr@qmul.ac.uk
```
Note that in Slurm, array jobs send only one email, not an email per job-array
task as happens in UGE. If you want an email from every job-array task, add
ARRAY_TASKS to the --mail-type option:

```bash
#SBATCH --mail-type=ALL,ARRAY_TASKS
#
# DO NOT USE IF YOUR ARRAY JOB CONTAINS MORE THAN
# 20 TASKS!!
```

But please be aware that you will receive a large number of emails if you run a large job array with this flag enabled.
Job Environment Variables¶
A number of environment variables are available for use in your job scripts. These are useful, for example, when creating your own log files, when informing applications how many cores they are allowed to use (we have already seen $SLURM_NTASKS in the examples above), and when reading sequentially numbered data files in job arrays.
| UGE Environment Variables | Slurm Environment Variables |
|---|---|
| `$NSLOTS # Num cores reserved` | `$SLURM_NTASKS # Num cores from -n flag` |
| | `$SLURM_CPUS_PER_TASK # Num cores from -c flag` |
| `$JOB_ID # Unique job id number` | `$SLURM_JOB_ID # Unique job id number` |
| `$JOB_NAME # Name of job` | `$SLURM_JOB_NAME # Name of job` |
| `# For array jobs` | `# For array jobs` |
| `$JOB_ID # SAME for all tasks` | `$SLURM_JOB_ID # DIFFERENT for each task` |
| `# (e.g., 20173)` | `# (e.g., 20173, 20174, 20175, ...)` |
| | `$SLURM_ARRAY_JOB_ID # SAME for all tasks` |
| | `# (e.g., 20173)` |
| `$SGE_TASK_ID # Job array task number` | `$SLURM_ARRAY_TASK_ID # Job array task number` |
| `# (e.g., 1, 2, 3, ...)` | `# (e.g., 1, 2, 3, ...)` |
| `$SGE_TASK_FIRST # First task id` | `$SLURM_ARRAY_TASK_MIN # First task id` |
| `$SGE_TASK_LAST # Last task id` | `$SLURM_ARRAY_TASK_MAX # Last task id` |
| `$SGE_TASK_STEPSIZE # Task id increment: default 1` | `$SLURM_ARRAY_TASK_STEP # Increment: default 1` |
| | `$SLURM_ARRAY_TASK_COUNT # Number of tasks` |
| `# Others` | `# Others` |
| `$PE_HOSTFILE # Multi-node job host list` | `$SLURM_JOB_NODELIST # Multi-node job host list` |
| `$NHOSTS # Number of nodes in use` | `$SLURM_JOB_NUM_NODES # Number of nodes in use` |
| `$SGE_O_WORKDIR # Submit directory` | `$SLURM_SUBMIT_DIR # Submit directory` |
Many more environment variables are available for use in your job script. The
official Slurm documentation
(also available by running man sbatch) documents input and output
environment variables. The input variables can be set by you before
submitting a job to set job options (although we recommend not doing this: it
is better to put all options in your job script so that you have a permanent
record of how you ran the job). The output variables can be used inside your
job script to get information about the job (e.g., the number of cores, the
job name, and so on); we have documented several of these above.
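As an illustration of using the output variables, the lines below build a simple log line from them. This is a sketch: the :- fallback values are only so the snippet also runs outside a Slurm job, where these variables are unset.

```shell
# Build a log line from Slurm's output environment variables
# (fallbacks after :- apply only when run outside a Slurm job)
msg="Job ${SLURM_JOB_ID:-<none>} (${SLURM_JOB_NAME:-interactive}) on ${SLURM_JOB_NUM_NODES:-1} node(s) with ${SLURM_NTASKS:-1} core(s)"
echo "$msg"
```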