Jellyfish

Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA. Jellyfish can count k-mers quickly by using an efficient encoding of a hash table and by exploiting the "compare-and-swap" CPU instruction to increase parallelism.

Jellyfish is available as a module on Apocrita.

Usage

Number of threads

By default, Jellyfish runs in single-threaded mode. If you are requesting multiple cores, pass the -t parameter with the value of ${SLURM_NTASKS}, as shown in the example job below.
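If the same script may also be run outside a Slurm job, where ${SLURM_NTASKS} is unset, one option is to fall back to a single thread. The following is a minimal sketch only: the helper name jf_threads is hypothetical and is not part of Jellyfish or Slurm.

```shell
# Hypothetical helper: choose the value to pass to jellyfish's -t option.
# Inside a Slurm job this is ${SLURM_NTASKS}; otherwise fall back to 1,
# which matches Jellyfish's single-threaded default.
jf_threads() {
    echo "${SLURM_NTASKS:-1}"
}

# Example use (not executed here):
#   jellyfish count -t "$(jf_threads)" -m 16 -s 10M -C -o 16mer.jf ERR458495.fastq
```

The fallback keeps the -t value valid when the script is tested interactively, while batch jobs still use the number of cores requested from the scheduler.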

To run the default installed version of Jellyfish, simply load the jellyfish module:

$ module load jellyfish
$ jellyfish --help
Usage: jellyfish <cmd> [options] arg...
Where <cmd> is one of: count, bc, info, stats, histo, dump, merge, query, cite, mem, jf.
Options:
  --version  Display version
  --help     Display this message

For usage documentation, pass the --help switch after any jellyfish subcommand:

$ jellyfish count --help
Usage: jellyfish count [options] file:path+

Count k-mers in fasta or fastq files

Options (default value in (), *required):
 -m, --mer-len=uint32        *Length of mer
 -s, --size=uint64           *Initial hash size
 -t, --threads=uint32        Number of threads (1)

(output omitted)

Example job

Serial job

Here is an example job that runs on 2 cores with 2GB of memory in total (1GB per core):

#!/bin/bash
#SBATCH -n 2               # (or --ntasks=2) Request 2 cores
#SBATCH --mem-per-cpu=1G   # Request 1GB RAM per core
#SBATCH -t 1:0:0           # Request 1 hour runtime

module load jellyfish

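# -m 16: count 16-mers; -s 10M: initial hash size (10 million entries);
# -C: count canonical k-mers (a k-mer and its reverse complement together)
jellyfish count -t ${SLURM_NTASKS} \
                -m 16 \
                -s 10M \
                -C \
                -o 16mer.jf \
                ERR458495.fastq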

References