DIAMOND
DIAMOND is a sequence aligner for protein and translated DNA searches, designed for high performance analysis of big sequence data.
Key features include:
- Pairwise alignment of proteins and translated DNA at 500x-20,000x the speed of BLAST.
- Frameshift alignments for long read analysis.
- Low HPC resource requirements.
- Various output formats, including BLAST pairwise, tabular, XML and taxonomic classification.
DIAMOND is available as a module on Apocrita.
Usage
To run the default installed version of DIAMOND, simply load the diamond module:
$ module load diamond
$ diamond help
Syntax: diamond COMMAND [OPTIONS]
Commands:
makedb Build DIAMOND database from a FASTA file
blastp Align amino acid query sequences against a protein reference database
blastx Align DNA query sequences against a protein reference database
view View DIAMOND alignment archive (DAA) formatted file
help Produce help message
version Display version information
getseq Retrieve sequences from a DIAMOND database file
dbinfo Print information about a DIAMOND database file
For usage documentation, run diamond help.
Example job
Selecting the number of threads
By default, DIAMOND runs multi-threaded on all available cores. To avoid
overloading a compute node, override this by passing the --threads option
with the value ${SLURM_NTASKS}, so that DIAMOND uses only the cores
requested for the job.
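If the same commands are sometimes run where SLURM_NTASKS is not set (for example, a script reused outside a Slurm job), --threads would receive an empty value. The sketch below is one way to guard against that, assuming a fallback of a single thread is acceptable; the helper name diamond_threads is hypothetical and not part of DIAMOND or Slurm.

```shell
#!/bin/bash
# Hypothetical helper: choose a value for DIAMOND's --threads option.
# Inside a Slurm job, SLURM_NTASKS holds the number of tasks (cores) requested.
# If it is unset or empty, fall back to 1 instead of letting DIAMOND use all cores.
diamond_threads() {
    echo "${SLURM_NTASKS:-1}"
}

# Example use (assumes the diamond module is loaded):
# diamond blastx --db example --query query.fna --out matches --threads "$(diamond_threads)"
```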
Serial job
Here is an example job running on 1 core and 1GB of memory:
#!/bin/bash
#SBATCH -n 1 # (or --ntasks=1) Request 1 core
#SBATCH --mem-per-cpu=1G # Request 1GB RAM per core
#SBATCH -t 1:0:0 # Request 1 hour runtime
module load diamond
# Create a binary DIAMOND database
diamond makedb --db example \
--in example.fa \
--threads ${SLURM_NTASKS}
# Run the alignment task
diamond blastx --db example \
--out matches \
--query query.fna \
--threads ${SLURM_NTASKS}
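Unless another format is chosen with --outfmt, DIAMOND writes BLAST tabular output (format 6) with the default columns qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. Assuming the matches file above uses that default layout, the sketch below keeps the highest-scoring hit for each query. The helper name best_hits is hypothetical; it is a plain awk post-processing step, not a DIAMOND feature.

```shell
#!/bin/bash
# Hypothetical helper: print the best hit per query from a DIAMOND tabular
# (--outfmt 6) file with the default 12 columns:
#   $1 = qseqid, $2 = sseqid, $11 = evalue, $12 = bitscore
# Output columns: qseqid, sseqid, evalue, bitscore (tab-separated, sorted by query).
best_hits() {
    awk -F '\t' '
        # Compare bitscores numerically.
        { score = $12 + 0 }
        # Keep the first line seen with the highest bitscore for each query.
        !($1 in best) || score > best[$1] {
            best[$1] = score
            line[$1] = $2 "\t" $11 "\t" $12
        }
        END { for (q in best) print q "\t" line[q] }
    ' "$1" | sort -k1,1
}

# Example use, after the job above has written its results:
# best_hits matches > best_hits.tsv
```

Recent DIAMOND versions usually report hits for each query in descending score order, but the awk step does not rely on that ordering.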