What does “coverage” mean?
Although the term “coverage” may seem simple, it is often misused.
In the context of Next-Generation Sequencing (NGS), coverage indicates the average number of reads that "cover" a specific target region. Coverage therefore always describes a relationship between the number of reads and a reference region. It can be expressed as a percentage (the fraction of the target region covered by reads) or as an average (e.g. 100X means that, on average, the target regions are covered by 100 reads). Care must be taken not to confuse coverage with sequencing depth, which instead describes the total number of reads produced by sequencing in absolute terms, or with read depth, which describes the number of reads that cover each individual base.
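The distinction between per-base read depth, average coverage and percentage coverage can be illustrated with a small sketch. It assumes a toy target of 10 bases and reads given as 0-based, half-open (start, end) intervals; the function names and the example reads are hypothetical and are not taken from any specific tool.

```python
def per_base_depth(reads, target_length):
    """Return the number of reads covering each base of the target.

    reads: list of (start, end) intervals, 0-based and half-open.
    """
    # Difference array: +1 where a read starts, -1 where it ends.
    diff = [0] * (target_length + 1)
    for start, end in reads:
        start = max(start, 0)
        end = min(end, target_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1

    depth = []
    running = 0
    for i in range(target_length):
        running += diff[i]
        depth.append(running)
    return depth


def mean_coverage(depth):
    """Average coverage: mean number of reads per base of the target."""
    return sum(depth) / len(depth)


def breadth_of_coverage(depth, min_depth=1):
    """Fraction of target bases covered by at least min_depth reads."""
    return sum(1 for d in depth if d >= min_depth) / len(depth)


# Toy example: three reads on a 10-base target.
reads = [(0, 5), (2, 8), (4, 10)]
depth = per_base_depth(reads, 10)

print(depth)                          # [1, 1, 2, 2, 3, 2, 2, 2, 1, 1]
print(mean_coverage(depth))           # 1.7, i.e. 1.7X average coverage
print(breadth_of_coverage(depth))     # 1.0, i.e. 100% of bases covered
print(breadth_of_coverage(depth, 2))  # 0.6, i.e. 60% of bases at 2X or more
```

Here the same data set gives three different numbers depending on the question asked, which is why the terms should not be used interchangeably.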
What is the best coverage for sequencing?
There is no single answer to this question. The coverage needed for a successful analysis is influenced by numerous factors, such as: (1) the length of the reads, (2) the size of the reference genome, (3) the specific application of interest, (4) the error rate of the technology used, (5) the gene expression levels and (6) the complexity of the target regions. For this reason, an initial experimental phase is very often necessary to establish the optimal coverage for one's analysis, using the data from clinical trials in the literature and the indications of the scientific community as a starting point.
What about NGS-based clinical diagnostics?
Again, there is no universal answer. When using NGS for clinical diagnostics, each base must be observed multiple times before it can be called reliably. One might say that the higher the coverage the better, but the cost-result ratio must also be considered: very high coverage usually means a high cost per analysis, which can become unsustainable.
So what is the best cost-result ratio?
Although to date there are no official guidelines that establish the average coverage for diagnostic analysis, the scientific community shares fairly consistent coverage parameters, which depend on the application of interest. In general, the following values are used:
- WGS (whole genome sequencing): recommended coverage 30X-50X;
- WES (whole exome sequencing): recommended coverage 80X-180X;
- ChIP-Seq (chromatin immunoprecipitation sequencing): recommended coverage 100X.
These parameters generally make it possible to overcome the technical errors of the sequencing method, reduce false negatives and obtain reliable biological data.
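To turn a recommended coverage into a sequencing plan, a common back-of-the-envelope estimate is coverage = (number of reads x read length) / target size. The sketch below rearranges this relation to estimate the number of reads required. The human genome size of about 3.1 Gb and the 150 bp read length are illustrative assumptions, and the function name is hypothetical. The estimate also assumes that every read maps uniquely on target, so real experiments usually need more reads to compensate for duplicates, off-target reads and uneven coverage.

```python
import math


def reads_required(target_coverage, target_size_bp, read_length_bp):
    """Estimate the reads needed to reach an average coverage.

    Uses coverage = reads * read_length / target_size, solved for reads.
    For paired-end data, each mate counts as one read.
    """
    return math.ceil(target_coverage * target_size_bp / read_length_bp)


# Assumed values for illustration only: ~3.1 Gb genome, 150 bp reads, 30X WGS.
print(reads_required(30, 3_100_000_000, 150))  # 620000000
```

Under these assumptions, 30X whole genome sequencing corresponds to roughly 620 million reads of 150 bp.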
Some applications, such as the search for low-frequency mutations, require higher average coverage. Low-frequency mutations are mutations that do not follow classical Mendelian genetics and can be found, for example, in cases of somatic or germline mosaicism and in the study of tumors (clonal and subclonal mutations). Sequencing of mitochondrial DNA also requires higher coverage, because of heteroplasmy.
For these applications too, there is still no consensus, and each laboratory adapts its protocols to obtain the best results. In general, coverage ranges from a minimum of 250X to more than 1300X, depending on the purpose and the technology used.
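The reason low-frequency mutations need higher coverage can be made concrete with a simple binomial model. The sketch below estimates the probability of observing at least a given number of variant-supporting reads at a position, given the read depth and the fraction of molecules carrying the variant. This is a simplified illustration, not a validated method: it ignores sequencing errors and mapping bias, and the function name and the threshold of 5 supporting reads are hypothetical choices.

```python
from math import comb


def detection_probability(depth, allele_fraction, min_alt_reads):
    """Probability of seeing at least min_alt_reads variant reads.

    Models the number of variant reads as Binomial(depth, allele_fraction).
    Intended for small min_alt_reads thresholds; sequencing errors are ignored.
    """
    # P(X < min_alt_reads), summed term by term.
    p_below = sum(
        comb(depth, k)
        * allele_fraction ** k
        * (1 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


# Illustrative comparison: a variant present in 1% of molecules,
# requiring at least 5 supporting reads (an assumed threshold).
for depth in (30, 250, 1300):
    p = detection_probability(depth, 0.01, 5)
    print(f"{depth}X: P(detect) = {p:.3f}")
```

With a fixed threshold, the detection probability rises with depth, which is consistent with laboratories choosing several hundred to more than a thousand X for mosaic, subclonal or heteroplasmic variants.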
Some applications go against the current...
With the advancement of technology and the advent of third-generation sequencing techniques, there is growing evidence that some applications, such as the search for Copy Number Variants (CNVs) or structural variants, can rely on very low coverage (even below 10X). Techniques such as low-depth nanopore sequencing and Low-Pass Genome Sequencing can identify CNVs with high sensitivity. Low-Pass Genome Sequencing has in fact recently been proposed as a validated method for clinical cytogenetics, with a diagnostic yield even higher than that of chromosomal microarray analysis.
Deng C, Daley T, Calabrese P, Ren J, Smith AD. Predicting the Number of Bases to Attain Sufficient Coverage in High-Throughput Sequencing Experiments. J Comput Biol. 2020 Jul;27(7):1130-1143. doi: 10.1089/cmb.2019.0264. Epub 2019 Nov 15. PMID: 31725321.
Tham CY, Tirado-Magallanes R, Goh Y, Fullwood MJ, Koh BTH, Wang W, Ng CH, Chng WJ, Thiery A, Tenen DG, Benoukraf T. NanoVar: accurate characterization of patients' genomic structural variants using low-depth nanopore sequencing. Genome Biol. 2020 Mar 3;21(1):56. doi: 10.1186/s13059-020-01968-7. PMID: 32127024.
Petrackova A, Vasinek M, Sedlarikova L, Dyskova T, Schneiderova P, Novosad T, Papajik T, Kriegova E. Standardization of Sequencing Coverage Depth in NGS: Recommendation for Detection of Clonal and Subclonal Mutations in Cancer Diagnostics. Front Oncol. 2019 Sep 4;9:851. doi: 10.3389/fonc.2019.00851. PMID: 31552176.
Chau MHK, Wang H, Lai Y, Zhang Y, Xu F, Tang Y, Wang Y, Chen Z, Leung TY, Chung JPW, Kwok YK, Chong SC, Choy KW, Zhu Y, Xiong L, Wei W, Dong Z. Low-pass genome sequencing: a validated method in clinical cytogenetics. Hum Genet. 2020 Nov;139(11):1403-1415. doi: 10.1007/s00439-020-02185-9. Epub 2020 May 25. PMID: 32451733.
Sims D, Sudbery I, Ilott NE, Heger A, Ponting CP. Sequencing depth and coverage: key considerations in genomic analyses. Nat Rev Genet. 2014 Feb;15(2):121-32. doi: 10.1038/nrg3642. PMID: 24434847.