Multi-node jobs
Jobs which require multi-node parallel processing, such as those which use MPI,
are run on nodes with a low-latency InfiniBand connection. To run jobs on
these nodes, request the parallel partition.
If your job requests more than 2 nodes, it may queue for a long time and may benefit from running on a Tier 2 cluster instead of Apocrita. If you are unsure about your eligibility, or have any questions about Tier 2 facilities, please contact us.
Use mpirun instead of srun --mpi
The official Open MPI documentation recommends using mpirun for all MPI processes under Slurm, not srun.
#!/bin/bash
## Request 2 nodes
#SBATCH -N 2 # (or --nodes=2)
## Request 96 tasks/cores
#SBATCH -n 96 # (or --ntasks=96)
## Request the "parallel" partition
#SBATCH -p parallel # (or --partition=parallel)
## Request 240 hours runtime
#SBATCH -t 240:0:0 # (or --time=240:0:0)
## Request all CPUs per node
#SBATCH --exclusive
## Request all available memory per node
#SBATCH --mem=0
# ---
# Module load
module load openmpi
# mpirun detects the number of tasks automatically from Slurm's
# ${SLURM_NTASKS}. Use -- to ensure the remaining arguments are passed
# to the application and not to mpirun
mpirun \
    -- \
    ./code \
    -i input.file
This example loads the openmpi module and launches the program ./code as 96
MPI processes (48 per node).
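As a rough sketch of how the requested task count maps onto nodes, the snippet below hard-codes the values that Slurm would export inside this job (SLURM_NTASKS and SLURM_JOB_NUM_NODES are set automatically in a real job; the values here are assumptions matching the script above):

```shell
#!/bin/bash
# Hypothetical values mirroring the script above; inside a real job,
# Slurm exports these automatically and mpirun reads SLURM_NTASKS itself.
SLURM_NTASKS=96
SLURM_JOB_NUM_NODES=2

# Tasks are spread evenly across the allocated nodes.
TASKS_PER_NODE=$((SLURM_NTASKS / SLURM_JOB_NUM_NODES))
echo "mpirun launches ${SLURM_NTASKS} ranks: ${TASKS_PER_NODE} per node"
```

With the 2-node, 96-task request shown above, this prints 48 tasks per node, matching the layout described in the example.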