Besides DGV, DGVa and DECIPHER, there is another database that lists structural DNA variations: dbVar, which we expect to be the most comprehensive one, given the illustrious name behind it: NCBI.
What is a structural variation?
Before going on, we just want to recall that a genomic structural variation (or variant) is usually defined as a copy number variation (CNV) or another type of rearrangement (inversion, translocation) involving a chromosomal fragment of at least 1 kb in size. This definition may vary (DGV, for example, considers all variations larger than just 50 base pairs), but the concept is simpler if we look at it from a practical point of view: a structural variation is any mutation that is too big to be detected by standard sequencing, too small to be detected by karyotyping or FISH, and that requires alternative methods to be (totally or partially) identified (e.g. qPCR, MLPA, CGH array, ROMA).
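The size-based definition above can be sketched in a few lines of Python. This is only an illustration: the function name and the way the boundary value is handled are our assumptions, and the two thresholds are simply the ones quoted in the text (1 kb for the usual definition, 50 bp for DGV).

```python
# Thresholds quoted in the text (in base pairs).
USUAL_MIN_SIZE_BP = 1000  # "at least 1 kb"
DGV_MIN_SIZE_BP = 50      # DGV's lower cut-off, as quoted above

def is_structural_variant(length_bp, min_size_bp=USUAL_MIN_SIZE_BP):
    """Return True if a variant of the given length meets the size threshold.

    Hypothetical helper: treating the threshold itself as included (>=)
    is an assumption, not something the databases specify here.
    """
    if length_bp < 0:
        raise ValueError("length_bp must be non-negative")
    return length_bp >= min_size_bp

# A 500 bp deletion: too small under the usual definition, but counted by DGV.
usual = is_structural_variant(500)
dgv = is_structural_variant(500, min_size_bp=DGV_MIN_SIZE_BP)
```

The same event can therefore be a structural variant in one database and not in another, which is worth keeping in mind when comparing counts across resources.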
The three dbVar main objects
dbVar is structured around three main objects:
– STUDIES [std]: this is the WHO. The std number identifies the study in which the variant was reported. Unlike single nucleotide variations (e.g. SNPs), which are widespread in the population, structural variations are very rare and are often seen in a single patient. As a consequence, any single structural variation is reported in one specific publication. Note: nstd identifies studies accessioned at NCBI, whereas estd identifies studies accessioned at EBI.
– VARIANT REGIONS [sv]: this is the WHERE. Variant regions are the regions that submitters define as the genomic area within which the structural variation falls. A variant region cannot itself be considered the “reference” sequence: it simply represents the submitter’s assumption about the area within which the variation falls. Indeed, dbVar states that “because the systematic detection of structural variation is still a nascent field, it is not possible to define reference variants based on current data. As detection technology and variant-calling algorithms improve, it may become possible to detect precise breakpoints both reliably and unambiguously, making it feasible to establish a reference structural variant set.” Note: nsv identifies variant regions accessioned at NCBI, whereas esv identifies variant regions accessioned at EBI.
– VARIANT CALLS [ssv]: this is the WHAT. Variant calls are the individual structural variations themselves, as they were observed in each study. Several classes of variants can be listed in dbVar, identified as “copy number gain”, “copy number loss”, “deletion”, “duplication”, etc. (see also sequence ontology IDs). Note: nssv identifies variant calls accessioned at NCBI, whereas essv identifies variant calls accessioned at EBI.
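The accession naming scheme described above lends itself to a small parser. The following is a minimal sketch that covers only the six prefixes mentioned in the text (nstd, estd, nsv, esv, nssv, essv); the function name is ours, and the accession numbers in the example are made up for illustration.

```python
import re

# First letter -> archive that accessioned the record (as described in the text).
ARCHIVES = {"n": "NCBI", "e": "EBI"}

# Object code -> dbVar object type.
OBJECT_TYPES = {"std": "study", "sv": "variant region", "ssv": "variant call"}

# "ssv" must be tried before "sv" so that "nssv..." is not misread.
ACCESSION_RE = re.compile(r"^([ne])(std|ssv|sv)(\d+)$")

def classify_accession(accession):
    """Split an accession into (archive, object type, number).

    Hypothetical helper; raises ValueError for anything that does not
    match the prefixes listed in the text.
    """
    match = ACCESSION_RE.match(accession.strip().lower())
    if match is None:
        raise ValueError(f"not a recognised accession: {accession!r}")
    archive, kind, number = match.groups()
    return ARCHIVES[archive], OBJECT_TYPES[kind], int(number)

# Made-up accession numbers, for illustration only.
examples = [classify_accession(a) for a in ("nstd1", "esv42", "nssv123")]
```

With this in hand, a mixed list of identifiers copied from a publication can be sorted into studies, regions and calls before querying the database.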
The NCBI browser
All structural variations can be graphically displayed in the NCBI browser, where different colours correspond to different variation classes: every type of variant, from gains and losses to insertions and translocations, is represented by a colour ranging from blue to light indigo (see the chapter “dbVar variant rendering”). In these graphics the breakpoints of a variation are depicted differently depending on whether they have been determined precisely (clear-cut, simple bars) or with uncertainty (bars with inward or outward arrows at the ends). Inward arrows indicate that the breakpoints lie inside the defined region; outward arrows indicate that they lie outside it.
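To make the arrow convention concrete, here is a toy text rendering of it. It is only a mnemonic and not how the NCBI browser draws anything: the function name, the labels "precise", "inner" and "outer", and the ASCII symbols are all our own assumptions.

```python
# Symbols for the left and right end of a bar, by breakpoint type.
# "inner": arrow points into the region (breakpoint lies inside it).
# "outer": arrow points away from the region (breakpoint lies outside it).
LEFT_END = {"precise": "|", "inner": ">", "outer": "<"}
RIGHT_END = {"precise": "|", "inner": "<", "outer": ">"}

def render_variant(width, start_kind="precise", end_kind="precise"):
    """Draw a variant as an ASCII bar with the given breakpoint types."""
    if width < 1:
        raise ValueError("width must be at least 1")
    return LEFT_END[start_kind] + "=" * width + RIGHT_END[end_kind]

precise_bar = render_variant(5)                  # clear-cut breakpoints
inner_bar = render_variant(5, "inner", "inner")  # breakpoints inside the region
outer_bar = render_variant(5, "outer", "outer")  # breakpoints outside the region
```

Each end is set independently, so a variant with one well-defined breakpoint and one uncertain breakpoint can be drawn as well.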
Methods for variant detection
Structural variations reported in dbVar have been detected by submitters using several different techniques, which vary from study to study. These techniques include BAC aCGH and oligo aCGH, Representational Oligonucleotide Microarray Analysis (ROMA), fosmid or NGS paired-end mapping (PEM), and analysis of SNP genotyping data.