Preparing the sequencing library is the first step in any Next Generation Sequencing (NGS) analysis. There are different ways to prepare a sequencing library, depending on the sequencing platform (Life Technologies, Illumina, Roche, Pacific Biosciences) and the planned analysis (whole genome sequencing, whole exome sequencing, targeted DNA sequencing, whole-transcriptome sequencing, targeted RNA sequencing, ChIP-seq, RIP-seq, epigenetic studies and more). If needed, an NGS library can also be prepared from single cells.
DNA and RNA sequencing libraries
The sequencing library can be prepared by starting from genomic DNA or RNA. The workflow for the preparation of a DNA sequencing library consists of three fundamental steps:
- Fragmentation and sizing of the nucleic acid (DNA or RNA) to obtain fragments of a predefined length
- Attachment of the adaptors (adapters) to the extremities of the fragments
- Library quantification
Preparing an RNA sequencing library requires an additional step: the conversion of RNA into cDNA. Fragmentation can be done before or after cDNA synthesis.
Nucleic acid fragmentation (DNA, RNA, or cDNA) can be performed with physical methods (acoustic shearing, better known as sonication), enzymatic methods (non-specific endonucleases such as DNase I or Fragmentase, or commercial kits such as the Illumina Nextera tagmentation kit, in which a transposase both breaks the DNA and attaches the adapters), or chemical methods. Physical and enzymatic methods are the most widely used; for instance, sonication with a Covaris instrument yields DNA fragments in the 100–5,000 bp range. For mate-pair libraries, however, longer fragments (6,000 to 20,000 bp) can be obtained.
The size of the fragments is crucial. The optimal size of the library fragments depends on the platform to be used and the scope of the analysis. For instance, fragments of up to 1,500 bp can be used on Illumina platforms. For exome sequencing, however, an insert size of 200-250 bp at most is recommended (the term “insert” refers to the DNA fragment located between the adapters). This is because the average human exon is about 200 bp long.
Attachment of the adapters
Once the DNA or RNA fragmentation is complete, the so-called adapters (or adaptors) must be attached to both ends of each fragment. A sequencing library is, by definition, a pool of DNA fragments with adapters attached. Adapters are designed to interact with a specific sequencing platform, either the flow-cell surface (Illumina) or beads (Ion Torrent). After the adapters have been attached, there is a sizing phase, during which all fragments of undesired size and all adapter dimers are removed. Adapter dimers result from self-ligation of the adapters without a library insert sequence and are particularly abundant when the initial DNA quantity is low. Since adapter dimers can significantly reduce the sequencing yield by consuming valuable space on the flow cell, removing them from the library is essential. A clean-up with magnetic beads can remove the dimers.
Library quantification is a pivotal step and should be performed with the most accurate and sensitive method available. Sequencing libraries are usually quantified with PCR-based methods: digital PCR (dPCR) or quantitative PCR (qPCR).
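The sizing phase can be illustrated in silico. The sketch below is only an illustration of the idea, not a laboratory protocol: the function name `size_select`, the size window, and the example fragment lengths (including adapter dimers of roughly 120-130 bp) are assumptions chosen for the example, not values from this article.

```python
from typing import Iterable, List, Tuple


def size_select(
    fragment_lengths_bp: Iterable[int], min_bp: int, max_bp: int
) -> Tuple[List[int], List[int]]:
    """Split fragment lengths into those inside and outside a size window.

    Hypothetical helper: mimics, on a list of lengths, what a size
    selection does to a physical library.
    """
    if min_bp > max_bp:
        raise ValueError("min_bp must not exceed max_bp")
    kept: List[int] = []
    removed: List[int] = []
    for length in fragment_lengths_bp:
        # Fragments inside the window stay in the library; short products
        # such as adapter dimers and overly long fragments are discarded.
        if min_bp <= length <= max_bp:
            kept.append(length)
        else:
            removed.append(length)
    return kept, removed


# Illustrative lengths only: two short products standing in for adapter
# dimers, three library fragments, and one oversized fragment.
lengths = [120, 128, 310, 350, 420, 900]
kept, removed = size_select(lengths, min_bp=250, max_bp=600)
print("kept:", kept)
print("removed:", removed)
```

In practice the window depends on the platform and on the adapter design, so the 250-600 bp bounds above should be replaced with values suited to the actual library.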
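Loading a sequencer usually requires the library concentration in molar units. As a hedged sketch, the function below converts a mass concentration into nanomolar using the common approximation of 660 g/mol per base pair of double-stranded DNA. The function name, the example concentration, and the mean fragment length are assumptions made for illustration; a qPCR or dPCR assay calibrated against standards would report molarity by its own procedure.

```python
def library_molarity_nm(
    conc_ng_per_ul: float,
    mean_fragment_bp: float,
    da_per_bp: float = 660.0,
) -> float:
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM).

    Uses nM = (ng/uL * 1e6) / (da_per_bp * mean fragment length in bp).
    The default of 660 g/mol per base pair is a widely used approximation.
    """
    if mean_fragment_bp <= 0 or da_per_bp <= 0:
        raise ValueError("fragment length and mass per bp must be positive")
    if conc_ng_per_ul < 0:
        raise ValueError("concentration must not be negative")
    return conc_ng_per_ul * 1e6 / (da_per_bp * mean_fragment_bp)


# Illustrative values only: a 10 ng/uL library with a 400 bp mean
# fragment length (adapters included).
molarity = library_molarity_nm(10.0, 400.0)
print(f"{molarity:.2f} nM")
```

The mean fragment length should include the adapters, since they contribute to the mass of every molecule in the library.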
The final quality of the sequencing library
When preparing a sequencing library, it is essential to achieve the highest possible complexity. In other words, the final library must reflect the diversity of the starting material as closely as possible. This can be achieved by limiting the number of duplicate fragments. In addition, the shorter the fragments, the less specific they are and the higher the chance that they align to more than one locus of the reference sequence. Library complexity can therefore essentially be measured by the percentage of duplicate reads in the sequencing data.
A high-quality sequencing library helps minimize the PCR amplification needed to enrich the target, which matters because amplification can be a source of bias. For example, GC content substantially affects PCR amplification efficiency. However, some enzymes, such as Kapa HiFi (Kapa Biosystems, Wilmington, MA) or AccuPrime Taq DNA Polymerase High Fidelity (Life Technologies), can reduce GC content-related biases.
Another critical factor to consider is the mitigation of batch effects. For example, when multiplexing, errors can affect whole batches of samples, e.g., because of poor reaction settings, reagent quality, pipetting accuracy, or different technicians.
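The percentage of duplicate reads can be sketched as follows. This is a simplified illustration, assuming that each aligned read is summarized by a (chromosome, start, strand) key and that reads sharing a key are duplicates. Dedicated duplicate-marking tools apply more refined criteria, and the function name and example reads here are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable


def duplicate_rate(read_keys: Iterable[Hashable]) -> float:
    """Return the fraction of reads that are duplicates of an earlier read.

    Each read is represented by a hashable key, for example
    (chromosome, start, strand). The first read with a given key counts
    as unique; every further read with the same key counts as a duplicate.
    """
    counts = Counter(read_keys)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    duplicates = total - len(counts)
    return duplicates / total


# Illustrative reads only: two of the four share the same position
# and strand, so one of them is counted as a duplicate.
reads = [
    ("chr1", 100, "+"),
    ("chr1", 100, "+"),
    ("chr1", 250, "-"),
    ("chr2", 40, "+"),
]
print(f"duplicate rate: {duplicate_rate(reads):.0%}")
```

A lower duplicate rate at a given sequencing depth points to a more complex library.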
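Because GC content influences amplification efficiency, it can be useful to check the GC fraction of fragments or target regions. The sketch below is a minimal example under stated assumptions: the function name and the example sequence are hypothetical, and ambiguous bases such as N are excluded from the calculation.

```python
def gc_fraction(sequence: str) -> float:
    """Return the fraction of G and C among the called bases (A, C, G, T).

    Ambiguous characters such as N are ignored. Returns 0.0 when the
    sequence contains no called bases.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    called = gc + at
    if called == 0:
        return 0.0
    return gc / called


# Illustrative sequence only.
fragment = "ATGCGCNNTA"
print(f"GC fraction: {gc_fraction(fragment):.2f}")
```

Fragments with GC fractions far from the genome average are the ones most likely to be under-represented or over-represented after amplification.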
Breda Genetics srl
Breda Genetics srl is supported by global biotechnological partners in processing its sequencing loads. Different fragmentation methods can be used during the preparation of the sequencing library, depending on the exome type (from sonication to enzymatic fragmentation).