The preparation of the sequencing library is the very first step in any Next Generation Sequencing (NGS) analysis. There are different ways to prepare a sequencing library, depending on the sequencing platform (Life Technologies, Illumina, Roche, Pacific Biosciences) and the planned analysis (whole genome sequencing, whole exome sequencing, targeted DNA sequencing, whole-transcriptome sequencing, targeted RNA sequencing, ChIP-seq, RIP-seq, epigenetic studies and more). When needed, NGS libraries can also be prepared successfully from single cells.
DNA and RNA sequencing libraries
A sequencing library can be made by starting from genomic DNA or from RNA. The workflow for the preparation of a DNA sequencing library consists of three fundamental steps:
- Fragmentation and sizing of the nucleic acid (DNA or RNA) to obtain fragments of a predefined length
- Attachment of the adaptors (adapters) to the ends of the fragments
- Library quantification
Any RNA sequencing library requires an additional step: the conversion of RNA into cDNA. The fragmentation step can be done before or after the cDNA synthesis.
Nucleic acid fragmentation (DNA, RNA or cDNA) can be done with physical methods (acoustic shearing, better known as sonication), enzymatic methods (nonspecific endonucleases such as DNase I or Fragmentase, or commercial kits such as the Illumina Nextera tagmentation kit, whose transposase not only breaks the DNA but also attaches the adapters) or chemical methods. The physical and enzymatic methods are the most widely used (for instance, sonication with a Covaris instrument yields DNA fragments in the 100–5,000 bp range). For mate-pair libraries, particularly long fragments can be obtained (6,000 to 20,000 bp).
The size of the fragments is crucial. The optimal size of the library fragments depends on the platform to be used and on the scope of the analysis. For instance, fragments of up to 1,500 bp can be used on Illumina platforms. For exome sequencing, however, an insert size of no more than 200-250 bp is recommended (the term “insert” refers to the DNA fragment that sits between the adapters once they have been attached to its ends). This is because the average human exon is about 200 bp long.
Attachment of the adapters
Once the DNA or RNA fragmentation is complete, the so-called adapters (or adaptors) must be attached to both ends of each fragment. A sequencing library is, by definition, a pool of DNA fragments with adapters attached. Adapters are designed to interact with a specific sequencing platform, either the surface of the flow cell (Illumina) or beads (Ion Torrent). After the adapters have been attached, there’s a sizing phase, during which all fragments of undesired size and all adapter dimers are removed. Adapter dimers result from the self-ligation of adapters without a library insert sequence and are particularly abundant when the initial DNA quantity is low. Since adapter dimers can significantly reduce the sequencing yield by consuming valuable space on the flow cell, it is very important to remove them from the library. The dimers can be removed successfully by a clean-up with magnetic beads.
Library quantification is a pivotal step and should be performed with the most accurate and sensitive method available. Sequencing libraries are usually quantified with PCR-based methods: digital PCR (dPCR) or quantitative PCR (qPCR).
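Whatever method is used, the library concentration usually has to be expressed as a molarity before loading. The following is only a minimal sketch of that conversion, assuming a mass concentration in ng/µL and a mean fragment length in bp are already available. The figure of 660 g/mol per base pair is a commonly used approximation for double-stranded DNA and does not come from a specific protocol; the function name is purely illustrative.

```python
# Approximate average molar mass of one double-stranded DNA base pair (g/mol).
# This is a widely used rule-of-thumb value, assumed here for illustration.
AVG_BP_MASS_G_PER_MOL = 660.0


def library_molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA mass concentration (ng/µL) into a molarity (nM).

    nM = (ng/µL * 1e6) / (660 g/mol/bp * mean fragment length in bp)
    """
    if conc_ng_per_ul < 0:
        raise ValueError("concentration must not be negative")
    if mean_fragment_bp <= 0:
        raise ValueError("mean fragment length must be positive")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp)


if __name__ == "__main__":
    # Hypothetical library: 2 ng/µL with a mean fragment length of 400 bp.
    print(f"{library_molarity_nm(2.0, 400):.2f} nM")
```

Note that the mean fragment length here is that of the full library fragment, adapters included, since that is the molecule being counted.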
Final quality of the sequencing library
When preparing a sequencing library, it is important to achieve the highest possible complexity. In other words, the final library should reflect the starting material as faithfully as possible. This is achieved first of all by limiting the number of duplicated fragments. In addition, the shorter the fragments, the higher the chance that they are less specific and can align to more than one locus of the reference sequence. Library complexity can therefore essentially be measured by the percentage of duplicate reads in the sequencing data.
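As a rough illustration of how a duplicate percentage can be computed, the sketch below treats two reads as duplicates when they share the same chromosome, 5' start position and strand. This is a deliberately simplified assumption: dedicated duplicate-marking tools apply more elaborate criteria (for example, they also consider the mate of a paired read). The function name and the tuple layout are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# Simplified alignment record: (chromosome, 5' start position, strand).
Alignment = Tuple[str, int, str]


def duplicate_fraction(alignments: Iterable[Alignment]) -> float:
    """Return the fraction of reads that are duplicates of an earlier read.

    Reads with identical (chromosome, start, strand) are grouped together;
    one read per group counts as unique, the rest count as duplicates.
    """
    counts = Counter(alignments)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    duplicates = total - len(counts)
    return duplicates / total


if __name__ == "__main__":
    # Made-up example data: two of the four reads share the same coordinates.
    reads = [
        ("chr1", 100, "+"),
        ("chr1", 100, "+"),
        ("chr1", 250, "-"),
        ("chr2", 40, "+"),
    ]
    print(f"duplicate reads: {duplicate_fraction(reads):.0%}")
```

A lower value suggests a more complex library, although the number also depends on sequencing depth.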
A good sequencing library helps minimize the PCR amplification needed to enrich the target, which matters because amplification can itself be a source of bias. For example, GC content has a substantial impact on PCR amplification efficiency. Some enzymes can be used to reduce GC content-related biases, such as Kapa HiFi (Kapa Biosystems, Wilmington, MA) or AccuPrime Taq DNA Polymerase High Fidelity (Life Technologies).
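To spot regions or fragments that may be prone to GC-related amplification bias, the GC content of a sequence can be computed directly. The snippet below is a minimal sketch with an illustrative function name; it assumes a plain nucleotide string and ignores ambiguous bases such as N when computing the fraction.

```python
def gc_fraction(sequence: str) -> float:
    """Return the fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    counted = gc + at
    if counted == 0:
        # No unambiguous bases: report 0.0 rather than dividing by zero.
        return 0.0
    return gc / counted


if __name__ == "__main__":
    # Made-up example fragment.
    fragment = "ATGCGCGCTTAANNGC"
    print(f"GC content: {gc_fraction(fragment):.1%}")
```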
Another important factor to consider is the mitigation of batch effects. When multiplexing, errors can affect whole batches of samples, e.g. because of incorrect reaction settings, reagent quality, pipetting accuracy, or different technicians.
Breda Genetics srl
Breda Genetics srl is supported by first-rate biotechnology partners in the processing of its sequencing loads. During the preparation of the sequencing library, different fragmentation methods can be used depending on the exome type (sonication for EXOME 15MB and EXOME 50MB, enzymatic fragmentation for EXOME 38MB). Since insert size is a pivotal parameter in exome sequencing, we insist on using inserts of about 240 bp to match with confidence the average size of human exons (about 200 bp).