What are segmental duplications?
Segmental duplications (also known as low-copy repeats) are DNA fragments longer than 1 kbp (i.e. 1,000 base pairs) that are distributed within and between chromosomes and share more than 90% sequence identity. They are thought to play a significant role in evolution and adaptability, although their functional significance remains largely unknown, partly because they are difficult to sequence, which is the main topic we’ll discuss in this article.
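The two criteria in this definition (length above 1 kbp, identity above 90%) can be illustrated with a minimal Python sketch. It assumes the two sequences are already aligned and of equal length, which real paralogous copies usually are not; the function names and the toy sequences are hypothetical and serve only as an illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def looks_like_segmental_duplication(a: str, b: str,
                                     min_length: int = 1000,
                                     min_identity: float = 90.0) -> bool:
    """Apply the two thresholds from the definition: >1 kbp and >90% identity."""
    return len(a) > min_length and percent_identity(a, b) > min_identity


# Toy data (assumption): a 1,200 bp sequence and a copy with every 20th base changed.
SWAP = {"A": "C", "C": "G", "G": "T", "T": "A"}
original = "ACGT" * 300
copy = "".join(SWAP[base] if i % 20 == 0 else base
               for i, base in enumerate(original))

identity = percent_identity(original, copy)
is_segdup = looks_like_segmental_duplication(original, copy)
print(f"identity = {identity:.1f}%, segmental duplication-like: {is_segdup}")
```

In practice, identity is computed from a proper pairwise alignment that handles insertions and deletions; the position-by-position comparison here is a deliberate simplification.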
Segmental duplications in whole exome sequencing and whole genome sequencing
Luckily, the vast majority of genes are free from segmental duplications, so techniques such as whole exome sequencing and whole genome sequencing can read most parts of the human genome correctly. However, the number of genes with segmental duplications is not negligible, and segmental duplications still represent a major issue in delivering quality genetic diagnostics. Some genes contain segmental duplications in one or a few exons only, while others lie within a segmental duplication across their entire length (see, for instance, the alpha-thalassemia genes HBA1 and HBA2, or TNXB, the gene of classic Ehlers-Danlos syndrome-like type 1). Interpreting sequencing signals from genes that lie in segmental duplications may be very difficult. Typically, segmental duplications carry a risk of about 15% of incorrect variant calling. Notably, this risk exists not only for next generation sequencing but also for other molecular applications. For instance, segmental duplications may affect the sensitivity of mutation calling in MLPA, because its efficacy depends on the homology between the probe sequence and the targeted gene sequence. The same happens in Sanger sequencing if the PCR primers are designed to bind within the duplicated region.
Why is it so difficult to read DNA sequences lying in segmental duplications?
As of today, the answer to this question is easy: the NGS methods used for diagnostics are based on short reads, a read being a single sequenced DNA fragment (e.g. Illumina sequencing produces reads of 150 bp). As said above, segmental duplications are usually longer than 1,000 bp, so a 150 bp read may be mapped to the wrong copy among the sequences that share more than 90% identity. By contrast, longer reads, whose ends extend into the unique DNA sequences surrounding the segmental duplication, would allow the sequence to be mapped to the right location. Long read sequencing has existed for several years, but the problem lies in its adaptability to diagnostics: while it is becoming more and more precise in identifying large structural variations, it still has a high error rate in single nucleotide variant detection.
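The mapping ambiguity described above can be shown with a small Python sketch. It is a toy model under stated assumptions: the "genome" is random synthetic sequence, the two copies of the duplication are 100% identical (real copies share more than 90%, not necessarily 100%), and mapping is done by exact string matching rather than by a real aligner. All names are hypothetical.

```python
import random


def random_dna(n: int, rng: random.Random) -> str:
    """Generate a random DNA string of length n."""
    return "".join(rng.choice("ACGT") for _ in range(n))


def find_all(genome: str, read: str) -> list[int]:
    """Return every 0-based position where the read matches the genome exactly."""
    hits, start = [], genome.find(read)
    while start != -1:
        hits.append(start)
        start = genome.find(read, start + 1)
    return hits


rng = random.Random(0)
dup = random_dna(1200, rng)                       # duplicated segment (>1 kbp)
flank_a, flank_b, flank_c = (random_dna(500, rng) for _ in range(3))

# Toy genome: unique flank, copy 1, unique spacer, copy 2, unique flank.
genome = flank_a + dup + flank_b + dup + flank_c

# A 150 bp read taken entirely from inside the duplication.
short_read = dup[400:550]

# A 1,600 bp read covering copy 1 plus 200 bp of unique sequence on each side.
long_read = genome[300:1900]

short_hits = find_all(genome, short_read)
long_hits = find_all(genome, long_read)
print("short read maps to", len(short_hits), "locations:", short_hits)
print("long read maps to", len(long_hits), "location:", long_hits)
```

The short read fits both copies equally well, so its true origin cannot be decided from the sequence alone, while the long read is anchored by the unique flanks and has a single placement.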
How do I overcome the problem of segmental duplications?
Detecting a genetic variant in a portion of a gene that maps to a segmentally duplicated region poses several problems. We have to ask ourselves: does the variant really fall in this gene, or is it actually located in another part of the genome? Answering this question correctly makes the difference between a confirmed genetic diagnosis and a negative report (or, even worse, a report with a wrong diagnosis!).
So, we need to pay attention. As a rule, a supposedly clinically significant variant detected in a segmentally duplicated region should be confirmed by an orthogonal method (e.g. Sanger sequencing, with or without a preliminary long-range PCR to isolate the DNA fragment where we believe the variant falls). However, even before proceeding to molecular confirmation by another technique, there are some indicators that the experienced geneticist may take into account in their evaluation. For instance:
- How relevant is the patient’s clinical information to the gene in question?
If the detected variant falls in a gene associated with a condition that matches the patient’s clinical features well, it is more likely (although not certain) that the variant is located in that gene and not in another part of the genome.
- How good are the variant quality values?
Variant sequencing parameters, from genotype quality to coverage depth, are pivotal in variant filtering and also come in handy with segmental duplications. Evaluating these parameters correctly and weighing them against the complexity of the clinical case is very important. Some variant parameters may also be "visually" assessed (or re-assessed) by looking at the sequencing alignments. Alignments are crucial in evaluating the real location of variants that fall in segmental duplications.
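The triage described above (is the variant in a segmental duplication, and how good are its quality values?) could be sketched in Python as follows. This is only an illustration: the interval coordinates, the chromosome label, the field names and all thresholds are hypothetical placeholders, not recommended diagnostic cut-offs, and the sketch does not replace orthogonal confirmation.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    chrom: str
    pos: int                # 1-based position
    genotype_quality: int   # e.g. the GQ value reported by a variant caller
    depth: int              # coverage depth at the variant position
    mean_mapq: float        # mean mapping quality of supporting reads


# Hypothetical segmental duplication intervals (1-based, inclusive).
SEGDUP_INTERVALS = {"chrA": [(10_000, 15_000), (80_000, 82_500)]}


def in_segdup(v: Variant, intervals: dict[str, list[tuple[int, int]]]) -> bool:
    """True if the variant position overlaps any listed interval."""
    return any(start <= v.pos <= end for start, end in intervals.get(v.chrom, []))


def triage(v: Variant, intervals: dict[str, list[tuple[int, int]]],
           min_gq: int = 20, min_depth: int = 20,
           min_mapq: float = 40.0) -> list[str]:
    """Collect reasons why a variant deserves a closer look (thresholds are assumptions)."""
    reasons = []
    if in_segdup(v, intervals):
        reasons.append("overlaps segmental duplication: confirm orthogonally")
    if v.genotype_quality < min_gq:
        reasons.append("low genotype quality")
    if v.depth < min_depth:
        reasons.append("low coverage depth")
    if v.mean_mapq < min_mapq:
        reasons.append("low mapping quality: inspect alignments")
    return reasons


clean = Variant("chrA", 50_000, genotype_quality=99, depth=120, mean_mapq=60.0)
suspect = Variant("chrA", 12_345, genotype_quality=35, depth=90, mean_mapq=12.0)

clean_reasons = triage(clean, SEGDUP_INTERVALS)
suspect_reasons = triage(suspect, SEGDUP_INTERVALS)
for name, reasons in (("clean", clean_reasons), ("suspect", suspect_reasons)):
    print(name, "->", reasons or ["no flags"])
```

Low mapping quality is included because reads that fit several locations equally well are typically assigned a low mapping quality by aligners, which makes it a useful first hint before opening the alignments.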