NGS, coverage depth, and CNV detection.
Next-generation sequencing (NGS) has transformed genetic testing, enabling us to sequence a whole human exome or genome in a single experiment at unprecedented scale, capacity and convenience. High-throughput sequencing based on short reads has empowered laboratories worldwide. We can now sift through huge amounts of data and single out pathogenic mutations faster. This is especially useful for rare genetic diseases, particularly in cases of diagnostic odyssey.
Coverage depth is a key parameter in NGS. It is defined as the number of unique reads that unambiguously map to a given reference DNA region or nucleotide position in the reconstructed DNA sequence. A sufficient number of properly mapped reads is required to read a DNA sequence correctly. Higher coverage is more expensive, but it reduces the probability of false positives and false negatives.
In Mendelian disorders, copy number variation (CNV) analysis can be performed on NGS data. Especially in whole exome sequencing, the better the coverage, the more refined the results of CNV analysis! In exome-based CNV analysis, a key step is comparing coverage in the target genomic regions against a well-curated set of controls.
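To make the definition of coverage depth concrete, here is a minimal Python sketch that counts how many reads overlap each position of a small region. The function names and the toy read coordinates are illustrative assumptions, not part of any real pipeline; in practice reads would come from an alignment file and would first be filtered for uniqueness and mapping quality.

```python
def depth_per_position(reads, region_start, region_end):
    """Count the reads covering each position of [region_start, region_end).

    reads: iterable of (start, end) half-open intervals on one chromosome.
    Assumption: reads are already deduplicated and unambiguously mapped.
    """
    depth = [0] * (region_end - region_start)
    for start, end in reads:
        # Clip the read to the region before counting.
        lo = max(start, region_start)
        hi = min(end, region_end)
        for pos in range(lo, hi):
            depth[pos - region_start] += 1
    return depth


def mean_depth(depth):
    """Average coverage depth over the region."""
    return sum(depth) / len(depth)


# Toy example: three 5 bp reads over a 10 bp region (hypothetical data).
toy_reads = [(0, 5), (2, 7), (4, 9)]
toy_depth = depth_per_position(toy_reads, 0, 10)
print(toy_depth)              # [1, 1, 2, 2, 3, 2, 2, 1, 1, 0]
print(mean_depth(toy_depth))  # 1.5
```

The last position has depth 0: even when the mean depth looks acceptable, individual positions can be left uncovered, which is why minimum as well as mean coverage matters.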
Molecular Biology against Mathematics
OK, let’s step back. What are CNVs? Copy number variations (CNVs) are genomic structural variants consisting of the gain or loss of large DNA segments. The majority of CNVs contribute to human genomic diversity, but some of them can cause genetic diseases. While whole exome and whole genome sequencing are primarily used to detect single nucleotide variants (SNVs), they can now be used to detect CNVs as well.
Chromosomal microarray analysis (CMA, comprising array-CGH and high-density SNP-array) has long been considered the gold standard for CNV analysis. CMA has also been considered the first-tier test for developmental disabilities and congenital anomalies (Miller et al., 2010). CMA is a microchip-based testing technology, so it’s molecular (the patient’s DNA is physically used to perform a “wet” analysis on an array).
Algorithmic CNV analysis, by contrast, identifies large deletions and duplications by calculation (so it’s not molecular: it’s a mathematical method that depends heavily on the average values of a reference pool of controls and on the validity of the algorithm applied).
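The idea of comparing a sample against a reference pool can be sketched in a few lines of Python. This is only an illustration of the read-depth principle, not the method of any specific CNV caller: the bin counts are made up, the median normalization is one simple choice among many, and the cutoffs (-0.4 for loss, 0.3 for gain) are assumed values chosen for the toy data.

```python
import math
from statistics import mean, median


def normalize(bin_counts):
    """Divide each bin's read count by the sample median, so that samples
    sequenced to different depths become comparable."""
    mid = median(bin_counts)
    return [count / mid for count in bin_counts]


def log2_ratios(sample_counts, control_counts):
    """Per-bin log2 ratio of the sample against the mean of the controls."""
    sample = normalize(sample_counts)
    controls = [normalize(counts) for counts in control_counts]
    ratios = []
    for i, value in enumerate(sample):
        reference = mean(ctrl[i] for ctrl in controls)
        if value > 0 and reference > 0:
            ratios.append(math.log2(value / reference))
        else:
            # Zero reads cannot be expressed as a log ratio in this sketch.
            ratios.append(float("nan"))
    return ratios


def call_bins(ratios, loss_cutoff=-0.4, gain_cutoff=0.3):
    """Label each bin; the cutoffs are illustrative assumptions."""
    calls = []
    for ratio in ratios:
        if math.isnan(ratio):
            calls.append("no-call")
        elif ratio <= loss_cutoff:
            calls.append("loss")
        elif ratio >= gain_cutoff:
            calls.append("gain")
        else:
            calls.append("neutral")
    return calls


# Hypothetical read counts in six genomic bins.
controls = [
    [98, 102, 100, 101, 99, 100],
    [201, 199, 200, 202, 198, 200],
]
patient = [100, 100, 50, 100, 150, 100]

calls = call_bins(log2_ratios(patient, controls))
print(calls)  # ['neutral', 'neutral', 'loss', 'neutral', 'gain', 'neutral']
```

The sketch shows why the control pool matters so much: every call is relative to the controls' average, so a poorly matched or noisy reference shifts all the ratios.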
Low-pass genome sequencing: less is more!
Low-pass genome sequencing (LPGS, or low-pass GS, or low-coverage genome sequencing) consists of high-throughput sequencing of the entire human genome at low coverage.
A milestone was the article published in 2015 by The 1000 Genomes Project Consortium. The authors reported the completion of the project, with the reconstruction of the genomic sequences of 2,504 individuals from 26 different populations. In this project, control individuals were sequenced using low-coverage whole genome sequencing (mean depth of 7.4x) and targeted exome sequencing (mean depth of 65.7x). Surprisingly, low-coverage genome sequencing proved able to correctly identify CNVs in DNA samples.
Hold on tight, we are going into technical details now!
LPGS of a human DNA sample at a coverage of 0.5x is expected to have at least one read on 33 million of the 85 million variant sites catalogued by the 1000 Genomes Project, whereas array-based genotyping will detect at least half an order of magnitude fewer variants. Furthermore, LPGS is free from ascertainment bias, as it works independently of the DNA region where variants are located.
In addition, CMA results depend on array probe density, whereas whole genome sequencing is expected to map CNVs across the entire human genome.
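The 33 million figure can be checked with a back-of-the-envelope calculation. The Python sketch below assumes that reads land uniformly at random along the genome, so that the depth at any site follows a Poisson distribution with mean equal to the coverage; this is a simplifying assumption (real coverage is less even), and the function name is ours.

```python
import math


def fraction_with_at_least_one_read(mean_coverage):
    """Probability that a site is hit by one or more reads, assuming the
    per-site depth is Poisson-distributed with the given mean."""
    return 1.0 - math.exp(-mean_coverage)


SITES = 85_000_000  # variant sites, as quoted in the text
covered_fraction = fraction_with_at_least_one_read(0.5)
expected_sites = SITES * covered_fraction

print(f"{covered_fraction:.1%}")                  # 39.3%
print(f"{expected_sites / 1e6:.1f} million sites")  # 33.4 million sites
```

Under this model about 39% of sites receive at least one read at 0.5x, which matches the 33 million out of 85 million quoted above.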
Moving from theory to practice
In one study, LPGS was used for CNV analysis in a cohort of 1,077 couples with recurrent miscarriage, in comparison with standard chromosomal analysis. The authors found chromosomal abnormalities even in couples who had previously received a normal karyotype. LPGS thus led to an increased diagnostic yield (Dong et al., 2019).
LPGS has been compared with CMA in prenatal diagnostics as well. LPGS not only detected all chromosomal rearrangements identified by CMA, but also revealed additional chromosomal anomalies. Using CMA results as reference, LPGS showed a sensitivity of 99.9% and a specificity of 87.7%. These results suggest that LPGS may be an alternative prenatal diagnostic test.
LPGS has also proved effective in detecting mosaic CNVs. More precisely, if the structural variant is larger than 2.5 Mb, LPGS can detect CNVs down to a mosaic rate of 20%.
Recent studies have validated LPGS as a cost-effective and higher-resolution alternative to CMA for CNV detection. Tested on hundreds of specimens for the detection of microdeletions/microduplications, uniparental isodisomy, triploidy, and whole chromosome aneuploidies, LPGS showed accuracy, precision, specificity, and sensitivity levels that enabled the detection of all CNVs previously identified by CMA. This suggests that LPGS may become the new state-of-the-art technology for CNV detection, with lower costs and higher resolution, possibly replacing CMA as the gold standard for CNV analysis.
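For readers less familiar with these metrics, the short Python sketch below shows how sensitivity and specificity are computed when one test is scored against a reference test. The counts are invented purely for illustration and are not taken from any of the studies cited here.

```python
def sensitivity_specificity(tp, fp, tn, fn):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp: positive in both the new test and the reference
    fp: positive in the new test only
    tn: negative in both
    fn: positive in the reference only
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts, NOT study data.
sens, spec = sensitivity_specificity(tp=90, fp=20, tn=80, fn=10)
print(f"sensitivity = {sens:.1%}")  # sensitivity = 90.0%
print(f"specificity = {spec:.1%}")  # specificity = 80.0%
```

Note what happens when the reference test is itself imperfect: any finding made by the new test but missed by the reference is counted as a false positive. A specificity below 100% measured against CMA can therefore partly reflect additional true findings rather than errors.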
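Why is mosaicism harder to detect? A simple way to see it is to compute the read-depth signal one would expect when only a fraction of cells carry the variant. The Python sketch below assumes a diploid background and a signal that scales linearly with the average copy number across cells; the function name and parameters are illustrative.

```python
import math


def expected_log2_ratio(variant_copies, mosaic_fraction, normal_copies=2):
    """Expected log2 depth ratio when a fraction of cells carry the CNV.

    variant_copies: copy number in the affected cells (1 = heterozygous
    deletion, 3 = duplication). Assumes depth is proportional to the
    average copy number over all cells.
    """
    average_copies = (mosaic_fraction * variant_copies
                      + (1 - mosaic_fraction) * normal_copies)
    return math.log2(average_copies / normal_copies)


print(round(expected_log2_ratio(1, 1.0), 3))  # -1.0   (deletion in all cells)
print(round(expected_log2_ratio(1, 0.2), 3))  # -0.152 (deletion in 20% of cells)
print(round(expected_log2_ratio(3, 1.0), 3))  # 0.585  (duplication in all cells)
print(round(expected_log2_ratio(3, 0.2), 3))  # 0.138  (duplication in 20% of cells)
```

At 20% mosaicism the expected shift is only a small fraction of the non-mosaic signal, so it has to be averaged over many adjacent bins to stand out from noise. This is consistent with the size threshold mentioned above: the larger the variant, the more bins contribute.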
Shall we switch from CMA to LPGS?
Breda Genetics has started an internal review of the literature regarding the application of LPGS in CNV screening. The following evidence is emerging:
– affordable cost;
– reported accuracy, precision, specificity, and sensitivity of up to 100% (using CMA results as reference);
– higher capability in detecting chromosomal rearrangements compared to CMA, with an increased diagnostic yield;
– possible identification of mosaic CNVs;
– application in the context of prenatal and postnatal genetic testing.
While additional studies are needed to confirm these promising findings, low-pass genome sequencing is proving to be a powerful method that could become the new gold standard for CNV analysis.
References
Miller et al., 2010. Consensus statement: chromosomal microarray is a first-tier clinical diagnostic test for individuals with developmental disabilities or congenital anomalies. Am J Hum Genet. 2010 May 14;86(5):749-64. PMID: 20466091.
Petrackova et al., 2019. Standardization of Sequencing Coverage Depth in NGS: Recommendation for Detection of Clonal and Subclonal Mutations in Cancer Diagnostics. Front Oncol. 2019 Sep 4;9:851. PMID: 31552176.
The 1000 Genomes Project Consortium, 2015. A global reference for human genetic variation. Nature. 2015 Oct 1;526(7571):68-74. PMID: 26432245.
Li et al., 2021. Low-pass sequencing increases the power of GWAS and decreases measurement error of polygenic risk scores compared to genotyping arrays. Genome Res. 2021 Feb 3;gr.266486.120. PMID: 33536225.
Chaubey et al., 2020. Low-Pass Genome Sequencing: Validation and Diagnostic Utility from 409 Clinical Cases of Low-Pass Genome Sequencing for the Detection of Copy Number Variants to Replace Constitutional Microarray. J Mol Diagn. 2020 Jun;22(6):823-840. PMID: 32344035.
Chau et al., 2020. Low-pass genome sequencing: a validated method in clinical cytogenetics. Hum Genet. 2020 Nov;139(11):1403-1415. PMID: 32451733.
Dong et al., 2019. Genome Sequencing Explores Complexity of Chromosomal Abnormalities in Recurrent Miscarriage. Am J Hum Genet. 2019 Dec 5;105(6):1102-1111. PMID: 31679651.
Wang et al., 2019. Low-pass genome sequencing versus chromosomal microarray analysis: implementation in prenatal diagnosis. Genet Med. 2020 Mar;22(3):500-510. PMID: 31447483.