GNU Parallel
GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU parallel can then split the input and pipe it into commands in parallel.
GNU Parallel is available as a module on Apocrita.
Usage
To run the default installed version of GNU Parallel, simply load the parallel
module:
module load parallel
Then run a command across several inputs, for example gzip on multiple files in parallel:
parallel gzip ::: file1 file2 file3
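The `:::` separator turns each argument that follows it into a separate job. As a minimal sketch of how to check this before compressing anything, the replacement string `{}` marks where each argument is substituted, `--dry-run` prints the commands instead of running them, and `-k` keeps the output in input order. The file names are placeholders and do not need to exist for a dry run.

```shell
# Preview the commands GNU Parallel would run, without executing them.
# file1..file3 are placeholder names used only for illustration.
preview=$(parallel --dry-run -k gzip {} ::: file1 file2 file3)

# Print one generated command per line.
echo "$preview"
```

Dropping `--dry-run` runs the same commands for real.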
Alternatively, the commands to run can be read from an input file containing one command per line:
parallel < list.sh
Within a script, this can be done using a here document, e.g.:
parallel <<ENDOFLIST
command1 -option1 file1
command2 -option2 file2
ENDOFLIST
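When no command is given on the `parallel` command line, each line read from standard input is run as its own command. The following is a small sketch of that behaviour, assuming a throwaway command file created with `mktemp` and harmless `echo` commands standing in for real work; the variable names are illustrative only.

```shell
# Write a hypothetical command list, one command per line.
cmdfile=$(mktemp)
cat > "$cmdfile" <<'ENDOFLIST'
echo first
echo second
echo third
ENDOFLIST

# Each line of the file is executed as a separate job.
# -k keeps the output in the same order as the input lines.
result=$(parallel -k < "$cmdfile")
echo "$result"

# Remove the temporary command file.
rm -f "$cmdfile"
```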
Example job
Serial job
Here is an example job script that requests 4 cores and compresses four files in parallel.
#!/bin/bash
#SBATCH -n 4 # (or --ntasks=4) Request 4 cores
#SBATCH --mem-per-cpu=1G # Request 1GB RAM per core
#SBATCH -t 1:0:0 # Request 1 hour runtime
module load parallel
parallel gzip ::: file1 file2 file3 file4
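By default GNU Parallel starts one job per CPU core it detects, which may not match the number of cores allocated to the job. One possible approach, sketched below, is to set the job limit explicitly with `-j`, taking the value from `SLURM_NTASKS`, which Slurm sets when `-n`/`--ntasks` is requested. The fallback to 1 and the scratch directory with sample files are assumptions made so the sketch can also be tried outside a job.

```shell
# Inside a Slurm job, SLURM_NTASKS holds the value requested with -n/--ntasks.
# Fall back to 1 when the variable is unset (for example outside a job).
njobs="${SLURM_NTASKS:-1}"

# Create four small hypothetical input files in a scratch directory.
workdir=$(mktemp -d)
for i in 1 2 3 4; do
    echo "sample data $i" > "$workdir/file$i"
done

# Run at most $njobs gzip processes at any one time.
parallel -j "$njobs" gzip ::: "$workdir"/file1 "$workdir"/file2 "$workdir"/file3 "$workdir"/file4

# List the compressed results; delete the scratch directory when finished
# with: rm -rf "$workdir"
ls "$workdir"
```

In a real job script, the `parallel -j "$SLURM_NTASKS" ...` line would simply replace the plain `parallel` call, with the actual input files in place of the sample ones.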