GATK
GATK is a collection of command-line tools for analysing high-throughput sequencing data, with a primary focus on variant discovery.
GATK is available as a module on Apocrita.
Usage
To run the default installed version of GATK, simply load the
gatk module:
$ module load gatk
$ gatk -h
Usage: gatk <subcommand> [arguments]
For full usage documentation, run gatk -h.
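Beyond the top-level help, the `gatk` wrapper can list its available tools (`gatk --list`) and print the help for a single tool (`gatk <ToolName> --help`). The following is a minimal sketch that wraps both calls; `gatk_help` is a hypothetical convenience function, not part of GATK, and it assumes the gatk module has already been loaded.

```shell
# Hypothetical helper (not part of GATK): with no argument, list the
# available tools; with a tool name, print that tool's help text.
gatk_help() {
    if [ "$#" -eq 0 ]; then
        gatk --list
    else
        gatk "$1" --help
    fi
}

# Example calls, once the gatk module is loaded:
#   gatk_help                   # runs: gatk --list
#   gatk_help HaplotypeCaller   # runs: gatk HaplotypeCaller --help
```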
Example job
Serial job
Here is an example job running on 1 core and 2GB of memory:
#!/bin/bash
#SBATCH -n 1 # (or --ntasks=1) Request 1 core
#SBATCH --mem-per-cpu=2G # Request 2GB RAM per core
#SBATCH -t 1:0:0 # Request 1 hour runtime
module load gatk
# Run HaplotypeCaller in default mode on a single input BAM file containing
# sequence data, and output a VCF file containing the variant calls.
gatk HaplotypeCaller -R reference.fasta -I sample1.bam -O variants.vcf
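HaplotypeCaller expects a FASTA index (`.fai`) and a sequence dictionary (`.dict`) next to the reference, plus an index for the input BAM file. It can also help to cap the Java heap below the memory requested for the job. The sketch below shows one way the body of the job script might do both. The function names `check_gatk_inputs` and `java_heap_mb` are hypothetical, the 80% heap fraction is an assumption rather than a site recommendation, and `SLURM_MEM_PER_CPU` is assumed to hold the per-core request in MB, as Slurm sets it when `--mem-per-cpu` is used.

```shell
#!/bin/bash
# Hypothetical helper: report inputs and companion files that are missing.
# Returns 0 if everything is present, 1 otherwise.
check_gatk_inputs() {
    local ref="$1" bam="$2" missing=0
    [ -f "$ref" ] || { echo "missing: $ref" >&2; missing=1; }
    [ -f "$bam" ] || { echo "missing: $bam" >&2; missing=1; }
    # FASTA index and sequence dictionary sit next to the reference,
    # e.g. reference.fasta.fai and reference.dict
    [ -f "${ref}.fai" ] || { echo "missing: ${ref}.fai" >&2; missing=1; }
    [ -f "${ref%.*}.dict" ] || { echo "missing: ${ref%.*}.dict" >&2; missing=1; }
    # The BAM index may be named sample1.bam.bai or sample1.bai
    [ -f "${bam}.bai" ] || [ -f "${bam%.bam}.bai" ] || {
        echo "missing: index for $bam" >&2; missing=1; }
    return "$missing"
}

# Hypothetical helper: print 80% of the per-core memory (in MB) for use as
# the Java heap limit. Falls back to 2048 MB when no value is available.
java_heap_mb() {
    local mem_mb="${1:-${SLURM_MEM_PER_CPU:-2048}}"
    echo $(( mem_mb * 80 / 100 ))
}

# Only call GATK when all inputs are in place.
if check_gatk_inputs reference.fasta sample1.bam; then
    gatk --java-options "-Xmx$(java_heap_mb)m" HaplotypeCaller \
        -R reference.fasta -I sample1.bam -O variants.vcf
fi
```

If the companion files are missing, they can usually be created with `samtools faidx reference.fasta`, `gatk CreateSequenceDictionary -R reference.fasta` and `samtools index sample1.bam`; whether a samtools module is available on the system is an assumption here.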