The ExAC database: one of the most useful resources for the scientific community

Last update: November 24, 2015

ExAC is the acronym for Exome Aggregation Consortium, a coalition of investigators seeking to aggregate and harmonize exome sequencing data from a wide variety of large-scale sequencing projects. ExAC is also the name of the database that the consortium populates.

The ExAC database, currently in its beta version, has so far collected genetic data from exome sequencing of more than 60,000 individuals. Data from individuals affected by severe paediatric disease have been intentionally removed, so that the database can serve as a valuable reference for studies of severe rare diseases. ExAC is already used in genetic research and has been cited in several peer-reviewed publications; the flagship article about ExAC is expected by the end of 2015.

How to use it

The search box on the homepage is user-friendly: it accepts different types of entries, from transcript number to variant coordinates to gene name (notably, it also accepts old gene names). The results page is comprehensive and surprisingly easy to understand. A search for CFTR, for instance, immediately returns a gene overview with its most useful links and key numbers (e.g. the number of all identified variants and the UCSC coordinates). The best part of the page comes immediately below, where the gene structure is shown as an alternation of exons and introns, with points of different colors indicating the position and type of all variants identified so far: BLACK = 5’UTR, 3’UTR and intronic; BROWN = missense (benign or pathogenic); RED = stop, frameshift and splice-affecting; GREEN = synonymous or not affecting splicing.

Alternation of exons and introns in the ExAC gene representation. The blue hills show the level of coverage of each gene fragment.

The blue hills represent the proportion of individuals whose coverage is above a certain threshold, which can be set manually on the right side. The visual representation of the gene can be zoomed in and out to focus on particular exons or introns, and all variants are listed in an interactive table below the graph, where they can be sorted and re-sorted by clicking on the column headers.
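To make the meaning of the coverage track concrete, here is a minimal Python sketch of how such a proportion could be computed at a single position. The function name, the example depths and the choice of "at least the threshold" as the cut-off are assumptions made for illustration; this is not the code behind the ExAC browser.

```python
from typing import Sequence


def fraction_above_coverage(depths: Sequence[int], threshold: int) -> float:
    """Share of individuals whose read depth at one position reaches `threshold`.

    Assumption: "above the threshold" is read here as depth >= threshold.
    """
    if not depths:
        raise ValueError("depths must contain at least one individual")
    covered = sum(1 for depth in depths if depth >= threshold)
    return covered / len(depths)


# Hypothetical read depths for five individuals at a single position.
example_depths = [4, 12, 25, 31, 9]

# Raising the threshold can only lower (or keep) the proportion.
for threshold in (10, 20, 30):
    print(threshold, fraction_above_coverage(example_depths, threshold))
```

Repeating this calculation for every position along the gene would give a curve of the kind drawn by the blue hills.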

Of note, the gene is graphically represented by its canonical transcript, which is the transcript most used by laboratories in establishing their protocols. For human, the canonical transcript for a gene is set according to the following hierarchy:
1. The longest CCDS translation with no stop codons.
2. If there is no (1), the longest Ensembl/Havana merged translation with no stop codons.
3. If there is no (2), the longest translation with no stop codons.
4. If there is no translation, the longest non-protein-coding transcript.
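The hierarchy above can be sketched in a few lines of Python. The `Transcript` class, its field names and the `pick_canonical` function are hypothetical and exist only for this illustration; they are not Ensembl's or ExAC's actual implementation. The sketch also assumes that rule 4 applies when no transcript has a usable translation, and it returns nothing when no rule matches.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Transcript:
    transcript_id: str
    transcript_length: int                    # length of the transcript itself
    translation_length: Optional[int] = None  # None means no translation
    is_ccds: bool = False                     # translation belongs to CCDS
    is_merged: bool = False                   # Ensembl/Havana merged model
    has_stop_codon: bool = False              # translation contains a stop codon


def pick_canonical(transcripts: Sequence[Transcript]) -> Optional[Transcript]:
    """Apply the four-step hierarchy and return the chosen transcript, if any."""
    # Translations without stop codons are the candidates for rules 1 to 3.
    clean = [
        t for t in transcripts
        if t.translation_length is not None and not t.has_stop_codon
    ]
    tiers = [
        [t for t in clean if t.is_ccds],    # rule 1: CCDS translations
        [t for t in clean if t.is_merged],  # rule 2: Ensembl/Havana merged
        clean,                              # rule 3: any clean translation
    ]
    for tier in tiers:
        if tier:
            return max(tier, key=lambda t: t.translation_length)

    # Rule 4: fall back to the longest transcript without a translation.
    noncoding = [t for t in transcripts if t.translation_length is None]
    if noncoding:
        return max(noncoding, key=lambda t: t.transcript_length)
    return None  # assumption: nothing qualifies under any rule


# Hypothetical transcripts for one gene.
candidates = [
    Transcript("T1", 3000, 900, is_ccds=True),
    Transcript("T2", 5000, 1500, is_merged=True),
    Transcript("T3", 6000),
]
print(pick_canonical(candidates).transcript_id)
```

In this example the CCDS transcript wins even though the merged transcript has a longer translation, because each rule is consulted only when the previous one yields no candidate.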

Breda Genetics and ExAC

Does Breda Genetics use ExAC to filter the results of its analyses? Of course. We consider ExAC one of the most innovative and useful tools currently available to the international scientific community, and we regularly query it to filter and interpret large datasets from our exome and genome sequencing runs.

References: http://exac.broadinstitute.org/, www.pubmed.com (query for “ExAC” as of 09/11/2015), http://www.ensembl.org/Help/Glossary?id=346
