Coding and non-coding exons in gene structure
Genes contain the coding part of the genome, which represents only about 2% of the entire DNA sequence. Despite this, the vast majority of pathogenic mutations causing rare disorders (up to 85%) fall within genes. Genes have a well-defined structure: they are made up of exons, which represent the coding part, alternating with introns, which represent the non-coding part. They are preceded by a promoter that controls their transcription, and most also have regulatory regions upstream and downstream of the coding sequence, the so-called untranslated regions (UTRs), called the 5′-UTR and 3′-UTR respectively. Such regulatory regions contain sequence elements, such as cis-acting regulatory elements (CREs), which are pivotal for gene expression.
Although most exons of a gene (which are routinely sequenced in whole exome sequencing or whole genome sequencing) code for protein, some of them may be non-coding.
Where are the non-coding exons located?
Non-coding exons can be found at the beginning and/or at the end of the gene. When located at the beginning of the gene, they lie before the translation initiation codon (ATG). In some cases, the ATG itself is located in the middle of an exon, so that the first part of that exon is non-coding and the remainder is coding.
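The positional logic above can be illustrated with a small Python sketch. It is only an illustration under simplifying assumptions: the gene lies on the forward strand, coordinates are half-open intervals, and the function name `classify_exons` and all coordinates are hypothetical, not taken from a real gene or tool.

```python
def classify_exons(exons, cds_start, cds_end):
    """Label each exon by how much of it lies inside the coding sequence (CDS).

    exons: list of (start, end) half-open intervals on the forward strand.
    cds_start, cds_end: half-open interval running from the first base of
    the ATG to just past the last base of the stop codon.
    """
    labels = []
    for start, end in exons:
        # Number of exon bases that overlap the CDS (negative or zero if none).
        overlap = min(end, cds_end) - max(start, cds_start)
        if overlap <= 0:
            labels.append("non-coding")
        elif overlap == end - start:
            labels.append("coding")
        else:
            labels.append("partially coding")
    return labels


# Hypothetical four-exon gene: the ATG sits inside exon 2 (position 350)
# and the stop codon ends inside exon 4 (position 850).
exons = [(100, 200), (300, 450), (600, 700), (800, 1000)]
labels = classify_exons(exons, cds_start=350, cds_end=850)
print(labels)
# ['non-coding', 'partially coding', 'coding', 'partially coding']
```

In this toy gene, exon 1 lies entirely in the 5′-UTR, exon 2 contains the ATG and is therefore split into a non-coding and a coding part, and exon 4 carries both the end of the coding sequence and the 3′-UTR.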
Why do non-coding exons exist?
Non-coding exons can contain regulatory elements that modulate protein expression, such as enhancers, silencers, or small non-coding RNAs. Moreover, they can include sequences targeted by translation initiation factors that speed up or slow down protein translation, modulating protein expression in a specific cell type or tissue. Finally, they are involved in maintaining the stability of the mRNA and its half-life.
Are there pathogenic mutations in non-coding exons?
Most capture kits designed for whole exome sequencing do not include probes covering non-coding regions, such as deep intronic sequences and some non-coding exons. Small variants (single-nucleotide changes, or insertions/deletions of a few bases) that fall in these regions are generally presumed not to be pathogenic, since the change cannot be carried into the protein because these regions are not translated. However, large deletions or duplications that include the promoter or other important CREs, which are clearly pathogenic, may extend to include non-coding exons as well. Examples of mutations of this kind have been reported for some genes, e.g. CDKL5 (causing Rett-like phenotype with epilepsy) and EYS (causing autosomal recessive retinitis pigmentosa).
In summary, small point mutations in non-coding exons are not expected to be pathogenic, although the loss of an entire non-coding exon may signal a larger mutation that is pathogenic. So, while sequence analysis of non-coding regions may be of secondary importance, copy number variant (CNV) testing of those exons, at least in some genes, should be considered, either by targeted molecular assays (MLPA, qPCR) or by algorithmic CNV analysis based on NGS data.
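To give an idea of what algorithmic CNV analysis on NGS data can look like, here is a minimal read-depth sketch in Python. It is not the method of any specific tool: the function name `call_exon_cnv`, the depth values, and the ratio thresholds (0.7 and 1.3) are assumptions chosen for illustration, and real pipelines use many reference samples and statistical models.

```python
from statistics import median


def call_exon_cnv(sample_depth, reference_depth, del_ratio=0.7, dup_ratio=1.3):
    """Flag exons whose normalized read depth deviates from a reference.

    sample_depth, reference_depth: dicts mapping exon name -> mean read depth.
    del_ratio, dup_ratio: illustrative thresholds (assumptions, not standards).
    """
    # Normalize by the median depth so that differences in total
    # sequencing yield between sample and reference cancel out.
    sample_median = median(sample_depth.values())
    reference_median = median(reference_depth.values())

    calls = {}
    for exon, depth in sample_depth.items():
        ratio = (depth / sample_median) / (reference_depth[exon] / reference_median)
        if ratio < del_ratio:
            state = "deletion"
        elif ratio > dup_ratio:
            state = "duplication"
        else:
            state = "normal"
        calls[exon] = (round(ratio, 2), state)
    return calls


# Hypothetical depths: exon1 (a non-coding first exon) has half the expected
# coverage, as would be seen with a heterozygous deletion.
sample = {"exon1": 50, "exon2": 100, "exon3": 104, "exon4": 98, "exon5": 102}
reference = {"exon1": 100, "exon2": 100, "exon3": 100, "exon4": 100, "exon5": 100}
for exon, call in call_exon_cnv(sample, reference).items():
    print(exon, call)
```

A ratio near 0.5 is what a heterozygous deletion would be expected to produce, and a ratio near 1.5 a heterozygous duplication. Note that this approach only works for exons that the capture kit actually covers, so non-coding exons without probes would still require a targeted assay.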
References
- ClinVar database: “CDKL5”
- Eisenberger et al. (2013). Increasing the yield in targeted next-generation sequencing by implicating CNV analysis, non-coding exons and the overall variant load: the example of retinal dystrophies. PLoS One. 2013 Nov 12;8(11):e78496.