What are bed files?
If you are interested, or directly involved, in Next Generation Sequencing applications for research or clinical diagnostics, especially in whole exome sequencing or targeted multigene panel testing, you’ve certainly heard of the so-called bed files. If you haven’t heard of them yet, you’ll soon need to learn what they are and how to use them!
The bed files are reference files, with extension .bed, which are provided by the manufacturer of the capture kit. So, if you want the bed files, first make sure you have the email address of your kit producer, because you will need to ask them! The bed files describe exactly which genomic regions your kit covers. If you are wondering whether a certain gene (or part of it) is included in the NGS assay you are about to perform, you need to look at the bed files.
How do I read a bed file?
Bed files are designed to show the extent of an exome kit or a multigene panel enrichment. If you simply click on the bed file itself, your system will likely open it in a notepad application as a plain text file, which is hard to interpret by eye. To visualize the bed file properly, load it into a genome browser such as IGV (Integrative Genomics Viewer) or the UCSC Genome Browser.
Is it important to use the bed files?
Checking the bed files before performing an NGS analysis such as whole exome sequencing or targeted NGS panel testing is essential. Indeed, as you may also see from our large menu of exome sequencing solutions, there are several types of exomes. They differ in the extent of the human exonic content covered (from less than 10 Mb to more than 90 Mb), the number of genes (from fewer than 3,000 to all of the roughly 20,000 human protein-coding genes), and the inclusion or exclusion of regulatory regions in the untranslated regions (5′-UTRs and 3′-UTRs). In addition, although exome sequencing should by definition include all coding regions, some capture kits may cover only proven pathogenic hotspots, or may lack coverage of some gene portions because of the local sequence composition or the expected expression level of different gene isoforms.
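If you want to check the size of a kit yourself, a few lines of Python are enough. The following is a minimal sketch, not a vendor tool: it assumes that the first three columns of each line are chromosome, start and end, that the length of a region is end minus start, and that overlapping regions should be counted only once. The function name target_size_bp is our own invention for this example.

```python
from collections import defaultdict


def target_size_bp(bed_lines):
    """Sum the bases covered by bed intervals, counting overlaps once."""
    by_chrom = defaultdict(list)
    for line in bed_lines:
        fields = line.split()
        # Skip blank lines and track/browser/comment header lines.
        if (len(fields) < 3
                or fields[0] in ("track", "browser")
                or fields[0].startswith("#")):
            continue
        by_chrom[fields[0]].append((int(fields[1]), int(fields[2])))

    total = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping or adjacent region: extend the current block.
                cur_end = max(cur_end, end)
            else:
                # Gap: close the current block and start a new one.
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total
```

Called on an open file, for example target_size_bp(open("my_kit.bed")) with a hypothetical file name, it returns the footprint in base pairs. Divide by 1,000,000 to get the size in Mb and compare it with the figure declared by the manufacturer.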
Are you the Bioinformatics guy?
Are you the guy dealing with heavy calculations to process alignments and variant calling all day long? Then, you’ll certainly want to know all the crazy details about bed files and, of course, we don’t want to disappoint you!
The .bed files are plain data files written in a tab-delimited text format. They are used to record the coordinates of genomic regions. They consist of many lines, each containing from 3 to 12 columns of data and each representing one genomic region of interest. In each line, the first 3 columns are required, while the others are optional. The three required fields are:
- Chromosome: name of the chromosome, supplied with or without the prefix “chr”;
- Chromosome Start: starting position of the region of interest, counted from 0 (the first base of a chromosome is position 0);
- Chromosome End: ending position of the region of interest, which is not itself included in the region, so the region length is simply End minus Start.
Optional fields (up to nine more, such as a region name, a score, or the strand) can be added after these to provide more details about the region of interest, delimited by tabs or spaces in the same way as the required fields.
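To make this concrete, here is a small Python sketch of how such a file could be read and queried. It is an illustration under stated assumptions, not a reference parser: the names BedRegion, parse_bed and is_covered are made up for this example, optional columns are kept as raw strings, and fields are split on any whitespace (so a region name containing spaces would break it).

```python
from typing import Iterable, List, NamedTuple


class BedRegion(NamedTuple):
    chrom: str
    start: int    # 0-based, included in the region
    end: int      # not included in the region
    extra: tuple  # optional columns 4 to 12, kept as raw strings


def parse_bed(lines: Iterable[str]) -> List[BedRegion]:
    """Turn the lines of a bed file into a list of BedRegion records."""
    regions = []
    for line in lines:
        fields = line.split()  # splits on tabs or spaces
        # Skip blank lines and track/browser/comment header lines.
        if (not fields
                or fields[0] in ("track", "browser")
                or fields[0].startswith("#")):
            continue
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 columns, got: {line!r}")
        regions.append(
            BedRegion(fields[0], int(fields[1]), int(fields[2]), tuple(fields[3:]))
        )
    return regions


def is_covered(regions: List[BedRegion], chrom: str, pos_1based: int) -> bool:
    """True if a 1-based genomic position falls inside any region."""
    pos0 = pos_1based - 1  # convert to the 0-based bed convention
    return any(r.chrom == chrom and r.start <= pos0 < r.end for r in regions)


# Made-up coordinates, for illustration only.
demo = parse_bed(["chr1\t100\t200\texon_A", "chr2\t50\t75"])
print(is_covered(demo, "chr1", 101))  # first base of the chr1 region
print(is_covered(demo, "chr1", 100))  # one base before it
```

The conversion in is_covered is there because genome browsers and variant files usually show 1-based positions, while the bed start is 0-based. Getting this wrong shifts every region boundary by one base.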