GATK: the Genome Analysis Toolkit
The Genome Analysis Toolkit, or GATK, is a software package widely used to analyze high-throughput sequencing data. It was developed by the Data Science and Data Engineering group at the Broad Institute. The toolkit provides a broad selection of tools focused primarily on variant discovery, genotyping, and data quality assurance. Its architecture allows it to handle projects of any size.
An industry standard
The Genome Analysis Toolkit has become an industry standard for identifying SNPs and indels in germline DNA and RNA sequencing data. GATK is also being adapted to process data from somatic cells.
GATK tools were designed to process whole-exome and whole-genome data generated with Illumina sequencing technology. However, they can also be adapted to a variety of other technologies and experimental designs. Although the toolkit was originally designed for human genetics, it can now handle genomic data from any organism, at any ploidy level.
The software features a variety of tools that can be used either out of the box or together with scripts. These tools can be easily parallelized by multithreading. GATK also comes with a complete set of reads-to-results variant discovery workflow recommendations, intended to reach the best possible accuracy and the highest computational efficiency.
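As a rough illustration of using the tools together with scripts, the Python sketch below builds one variant-calling command per genomic interval. It assumes a GATK4-style command line (a `gatk` launcher, the `HaplotypeCaller` tool, and the `-R`, `-I`, `-L`, and `-O` arguments); the function name, file names, and interval names are hypothetical. Note that it splits the work by interval, which is a different approach from the built-in multithreading mentioned above, and that it only constructs the commands without running them.

```python
from typing import List


def build_haplotypecaller_commands(
    reference: str,
    bam: str,
    intervals: List[str],
    out_prefix: str,
) -> List[List[str]]:
    """Build one HaplotypeCaller command per interval (hypothetical helper).

    Assumes a GATK4-style CLI; adjust the arguments to the installed version.
    """
    commands = []
    for interval in intervals:
        commands.append([
            "gatk", "HaplotypeCaller",
            "-R", reference,                          # reference genome (FASTA)
            "-I", bam,                                # aligned reads (BAM)
            "-L", interval,                           # restrict calling to this interval
            "-O", f"{out_prefix}.{interval}.vcf.gz",  # per-interval output file
        ])
    return commands


# Example with hypothetical file names: one command per chromosome.
for cmd in build_haplotypecaller_commands(
    "ref.fasta", "sample.bam", ["chr1", "chr2"], "sample"
):
    print(" ".join(cmd))
```

Each command list could then be passed to `subprocess.run`, for example from a worker pool, so that the intervals are processed side by side.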