Using $TMPDIR
Files stored in $TMPDIR cannot be accessed from SSH sessions
The $TMPDIR variable points to a temporary directory that exists only on
the compute node while your job is running. Its contents cannot be accessed
directly from a login node, or by connecting to the compute node over SSH.
For interactive work, use salloc to access the compute node and inspect
$TMPDIR. For batch jobs (sbatch), include any commands that use
$TMPDIR directly in your submission script, and copy any files you want
to keep to a persistent location (for example, $HOME) before the job
completes.
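For example, an interactive session lets you look inside $TMPDIR directly (a sketch; the partition name and time limit here are illustrative, so adjust them for your site):

```shell
# Request an interactive session on a compute node
salloc -n 1 -p compute -t 0:30:0
# The shell now runs on the compute node, where $TMPDIR is visible
echo "$TMPDIR"
ls -l "$TMPDIR"
```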
Temporary space is available on each node for use while a job runs on it.
Because this storage is physically located on the node, it is not shared between nodes, but it provides better performance for read/write (I/O) intensive tasks on a single node than networked storage. To use the temporary scratch space, however, you first need to copy files to it from networked storage, and if a job fails, any intermediate files created there may be lost.
If your job does a lot of I/O operations to large files, it may therefore improve performance to:
- copy files from your home directory into the temporary folder
- run your job in the temporary folder
- copy files back from the temporary folder to your home directory if needed
- delete them from the temporary folder as soon as they're no longer needed
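The steps above can be sketched as plain shell. In this runnable sketch a throwaway directory created with mktemp stands in for $TMPDIR, a second one stands in for your home directory, and the file names and processing command are illustrative:

```shell
#!/bin/bash
set -e
SCRATCH=$(mktemp -d)                      # stands in for $TMPDIR
WORKDIR=$(mktemp -d)                      # stands in for $HOME/project
echo "input" > "$WORKDIR/data.file"

cp "$WORKDIR/data.file" "$SCRATCH/"       # 1. copy input to the temporary folder
cd "$SCRATCH"                             # 2. run the job in the temporary folder
tr a-z A-Z < data.file > results.data     #    (stand-in for real processing)
cp results.data "$WORKDIR/"               # 3. copy results back
cd / && rm -rf "$SCRATCH"                 # 4. delete the temporary copy
```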
Basic example
The following job runs a shell-script ./runcode.sh in a data folder beneath a
user's home directory. The data is held on networked storage at this point.
#!/bin/bash
#SBATCH -n 1 # Request 1 core
#SBATCH -p compute # Request the compute partition
#SBATCH -t 1:0:0 # Request 1 hour runtime
#SBATCH --mem-per-cpu=2G # Request 2GB RAM per core
cd $HOME/project
./runcode.sh
On any node the temporary scratch directory is accessed using the variable
$TMPDIR. If specific, known files are needed in your processing, you can copy
your data to that space before working on it.
The following job:
- copies `data.file` from the `project` directory to the temporary area
- sets the current working directory to the temporary area
- runs the appropriate code
- copies the output file `results.data` back to the `project` directory
This is the equivalent of the previous example, but using the temporary storage.
#!/bin/bash
#SBATCH -n 1 # Request 1 core
#SBATCH -p compute # Request the compute partition
#SBATCH -t 1:0:0 # Request 1 hour runtime
#SBATCH --mem-per-cpu=2G # Request 2GB RAM per core
# Copy data.file from the project directory to the temporary scratch space
cp $HOME/project/data.file $TMPDIR
# Move into the temporary scratch space where your data now is
cd $TMPDIR
# Do processing - as this is a small shell script, it is run from the network storage
$HOME/project/runcode.sh
# Copy results.data back to the project directory from the temporary scratch space
cp $TMPDIR/results.data $HOME/project/
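Because intermediate files in $TMPDIR can be lost if a job fails, it is worth making failures loud. Adding set -e near the top of a submission script aborts it as soon as any command (such as the initial cp) fails, rather than running the job on missing data. A runnable sketch of that behaviour, using a deliberately wrong input path:

```shell
#!/bin/bash
OUT=$(mktemp)
(
    set -e                                     # abort this subshell on any error
    cp /no/such/data.file /tmp/ 2>/dev/null    # fails: the input path is wrong...
    echo "ran anyway" > "$OUT"                 # ...so this line is never reached
) || echo "copy failed; with set -e the job script stops here"
```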
If you do not know, or cannot list, all the possible output files that you
would like to move back to your home directory, you can use rsync to copy only
changed and new files back at the end of the job. This saves time and avoids
unnecessary copying.
The following job:
- copies files to the temporary scratch area
- runs the shell-script `./runcode.sh` on the local copy
- copies the results back to networked storage
#!/bin/bash
#SBATCH -n 1 # Request 1 core
#SBATCH -p compute # Request the compute partition
#SBATCH -t 1:0:0 # Request 1 hour runtime
#SBATCH --mem-per-cpu=2G # Request 2GB RAM per core
# Source folder for data
DATADIR=$HOME/project
# Copy data (inc. subfolders) to temporary storage
rsync -rltv $DATADIR/ $TMPDIR/
# Run job from temporary folder
cd $TMPDIR
./runcode.sh
# Copy changed files back
rsync -rltv $TMPDIR/ $DATADIR/
Advanced example
This advanced example demonstrates how to trigger an action (for example
saving a checkpoint file) whilst the job is running, to avoid losing the
contents of $TMPDIR if the job reaches the runtime requested.
Using rsync, the following script copies the contents of $TMPDIR back to
$HOME after 55 minutes in a job that requests a 1 hour runtime:
#!/bin/bash
#SBATCH -n 1 # Request 1 core
#SBATCH -p compute # Request the compute partition
#SBATCH -t 1:0:0 # Request 1 hour runtime
#SBATCH --mem-per-cpu=2G # Request 2GB RAM per core
# Rsync data back to $HOME after 55 minutes
(
sleep 55m
rsync -rltv "$TMPDIR/" "$HOME/"
) &
# Job code
./runcode.sh
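One caveat with this pattern: if runcode.sh finishes before the timer fires, the background subshell lingers until the scheduler cleans it up. Capturing its PID and cancelling it when the job ends early keeps things tidy. A runnable sketch with short stand-in durations and a marker file in place of the real commands:

```shell
#!/bin/bash
MARK=$(mktemp)
(
    sleep 5                         # stands in for "sleep 55m"
    echo fired > "$MARK"            # stands in for the rsync copy-back
) &
SAVER_PID=$!
sleep 1                             # stands in for ./runcode.sh
# The job finished early: cancel the pending checkpoint copy
kill "$SAVER_PID" 2>/dev/null
wait "$SAVER_PID" 2>/dev/null || true
```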