What does “coverage” mean?
Although the meaning of the term “coverage” may seem very simple, this term is often misused.
In the context of Next-Generation Sequencing (NGS), coverage indicates the average number of reads that “cover” a specific target region. Coverage therefore always describes a relationship between the number of reads and a reference region. It can be expressed as a percentage (the fraction of the target that is covered) or as an average (e.g., 100X means that, on average, each base of the target regions is covered by 100 reads). Do not confuse coverage with sequencing depth, which describes the total number of reads produced by sequencing in absolute terms, or with read depth, which represents the number of reads covering each individual base.
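As a minimal sketch of the relationship described above, average coverage can be estimated as (number of reads × read length) / target size, and the percentage form as the share of target bases reaching a chosen minimum depth. The function names and the toy values below are illustrative assumptions and are not taken from any specific tool.

```python
from statistics import mean, median

def expected_coverage(num_reads, read_length, target_size):
    """Expected mean coverage: total sequenced bases divided by target size.

    Assumes every read maps to the target, which is an idealization.
    """
    return num_reads * read_length / target_size

def summarize_depths(per_base_depths, min_depth):
    """Summarize per-base read depths over a target region."""
    covered = sum(1 for d in per_base_depths if d >= min_depth)
    return {
        "mean": mean(per_base_depths),
        "median": median(per_base_depths),
        # Percentage of target bases covered by at least min_depth reads
        "pct_at_min_depth": 100.0 * covered / len(per_base_depths),
    }

# Hypothetical run: 1,000,000 reads of 150 bp on a 1.5 Mb target
print(expected_coverage(num_reads=1_000_000, read_length=150, target_size=1_500_000))

# Toy example: read depths at 10 target bases (made-up numbers)
depths = [120, 110, 95, 130, 100, 105, 0, 140, 100, 100]
print(summarize_depths(depths, min_depth=100))
```

In the toy data the mean is 100X, yet only 80% of the bases reach 100X and one base is not covered at all, which is why an average alone can hide poorly covered positions.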
What is the best coverage for sequencing?
There is no single answer to this question. The coverage necessary for the success of the analysis is influenced by numerous factors, such as (1) the length of the reads, (2) the size of the reference genome, (3) the specific application of interest, (4) the error rate of the technology used, (5) the gene expression levels and (6) the complexity of the target regions. For this reason, an initial experimental phase is often necessary to establish the optimal coverage for one’s analysis, using the data available in the literature for clinical trials and the recommendations of the scientific community as a starting point.
What about NGS-based clinical diagnostics?
Again, there is no universal answer. When using NGS for clinical diagnostics, multiple observations of each base are necessary for reliable variant calling. One might say that the higher the coverage, the better, but costs must be considered. Very high coverage usually comes at a high price, which may be unsustainable under certain circumstances.
So what is the best cost-result ratio?
Although there are no official guidelines that establish the average coverage for diagnostic analysis, the scientific community has converged on broadly accepted coverage values, which depend on the application of interest. In general, the following values are used:
- WGS (whole genome sequencing): recommended coverage 30X-50X;
- WES (whole exome sequencing): recommended coverage 80X-180X;
- ChIP-Seq (chromatin immunoprecipitation sequencing): recommended coverage 100X.
These values are generally sufficient to overcome technical errors introduced by the sequencing method, reduce false negatives, and obtain reliable biological data.
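To give a feel for what such targets imply in practice, the sketch below inverts the average-coverage relationship to estimate how many reads a run must produce. The genome size (about 3.1 Gb) and the 150 bp read length are assumed, illustrative values, and the calculation ignores duplicates, unmapped reads and uneven coverage, so real experiments need a safety margin on top.

```python
import math

def reads_needed(target_coverage, target_size_bp, read_length_bp):
    """Reads required for a desired mean coverage.

    Idealized: assumes every read maps to the target and coverage is uniform.
    """
    return math.ceil(target_coverage * target_size_bp / read_length_bp)

# Assumed values: ~3.1 Gb human genome, 150 bp reads
GENOME_SIZE_BP = 3_100_000_000
READ_LENGTH_BP = 150

for coverage in (30, 50):
    n = reads_needed(coverage, GENOME_SIZE_BP, READ_LENGTH_BP)
    print(f"WGS at {coverage}X: about {n:,} reads")
```

Under these assumptions, 30X WGS corresponds to about 620 million reads of 150 bp.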
Some applications, such as the search for low-frequency mutations, require higher average coverage. Low-frequency mutations do not follow classical Mendelian genetics. They can be found, for example, in cases of somatic or germline mosaicism and in tumors (clonal and subclonal mutations). Sequencing of mitochondrial DNA also requires higher coverage because of heteroplasmy.
Even in these cases, there is still no consensus, and each laboratory adapts its protocols to obtain the best results. As a result, coverage may range from 250X to 1500X depending on the purpose and the technology used.
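One way to see why low-frequency variants push coverage upward is a simple binomial model: at a given read depth, each read carries the variant with probability equal to the variant allele fraction (VAF). The sketch below computes the chance of seeing at least a minimum number of variant reads. The 3% VAF and the threshold of 5 supporting reads are hypothetical choices for illustration, and the model ignores sequencing errors and assumes independent reads, so it is a simplification and not a validated laboratory criterion.

```python
from math import comb

def prob_detect(depth, vaf, min_alt_reads):
    """P(at least min_alt_reads variant reads) under Binomial(depth, vaf)."""
    p_below = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below

# Hypothetical scenario: variant at 3% VAF, at least 5 supporting reads required
for depth in (100, 250, 500, 1500):
    p = prob_detect(depth, vaf=0.03, min_alt_reads=5)
    print(f"{depth}X: detection probability {p:.3f}")
```

The probability rises with depth, which matches the intuition that rarer variants need more reads over each base before they can be called with confidence.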
New applications and coverage
With the advancement of technology and the advent of third-generation sequencing techniques, coverage requirements may vary. For instance, for NGS-based detection of Copy Number Variants (CNVs), the higher the coverage, the better. However, lower coverage, such as 10X, is used in low-pass genome sequencing for genome-wide CNV detection (note that this technique can be used only for CNVs and not for single-nucleotide variants, SNVs).
Nicely explained. I have one thing I would like to understand. What does “targeting median coverage of >500X, aiming for 99% coverage at >100X” mean?
Please reply to firstname.lastname@example.org.
Thanks a lot, in advance.