Multi-node jobs
Jobs which require multi-node parallel processing, such as those which use MPI,
are run on nodes with a low-latency InfiniBand connection. To run jobs on
these nodes, request the parallel partition.
If your job requests more than 2 nodes, it may queue for a long time and may benefit from running on a Tier 2 cluster instead of Apocrita. If you are unsure about your eligibility, or have any questions about Tier 2 facilities, please contact us.
Use mpirun instead of srun --mpi
The official Open MPI documentation recommends using mpirun for all MPI processes under Slurm, not srun.
#!/bin/bash
## Request 2 nodes
#SBATCH -N 2 # (or --nodes=2)
## Request 96 tasks/cores
#SBATCH -n 96 # (or --ntasks=96)
## Request the "parallel" partition
#SBATCH -p parallel # (or --partition=parallel)
## Request 240 hours runtime
#SBATCH -t 240:0:0 # (or --time=240:0:0)
## Request all CPUs per node
#SBATCH --exclusive
## Request all available memory per node
#SBATCH --mem=0
# ---
# Module load
module load openmpi
# mpirun detects the number of tasks automatically from Slurm's
# ${SLURM_NTASKS}. Use -- to ensure the remaining arguments are passed
# to the application and not to mpirun
mpirun \
    -- \
    ./code \
    -i input.file
This example loads the openmpi module and launches the program ./code as 96
MPI processes (48 per node).
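As a rough sketch of how the requested task count maps onto nodes, the snippet below hard-codes the values that Slurm would export inside this job (SLURM_NTASKS and SLURM_JOB_NUM_NODES are set automatically in a real job; the values here are assumptions matching the script above):

```shell
#!/bin/bash
# Hypothetical values mirroring the script above; inside a real job,
# Slurm exports these automatically and mpirun reads SLURM_NTASKS itself.
SLURM_NTASKS=96
SLURM_JOB_NUM_NODES=2

# Tasks are spread evenly across the allocated nodes.
TASKS_PER_NODE=$((SLURM_NTASKS / SLURM_JOB_NUM_NODES))
echo "mpirun launches ${SLURM_NTASKS} ranks: ${TASKS_PER_NODE} per node"
```

With the 2-node, 96-task request shown above, this prints 48 tasks per node, matching the layout described in the example.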