Definition
In most cases, a pseudogene can be regarded as an ancient extra copy of a preexisting protein-coding gene (the parental gene) that has undergone pseudogenization: its sequence has been disrupted by the accumulation of deleterious mutations. The result is usually a ‘nonfunctional’ gene that retains high sequence homology with its parental gene.
Classification
Pseudogenes can be classified according to their origin:
1) Duplicated pseudogenes. The extra copy is generated by duplication of genomic DNA (for example, during DNA replication). These pseudogenes retain their original structure, with promoters, exons and introns, and are often located close to the parental gene.
2) Processed pseudogenes. The extra copy is generated by retrotransposition: an mRNA is reverse-transcribed into DNA and inserted back into the genome. As a result, processed pseudogenes contain neither introns nor promoters, and are usually located on different chromosomes from their parental genes.
3) Unitary pseudogenes. This type of pseudogene is not an extra copy of a preexisting gene: it is the original gene itself, which has become nonfunctional.
Function
About 20,000 pseudogenes have been identified in the human genome. For decades they were regarded as merely nonfunctional genes, or ‘junk DNA’, but evidence is now emerging that supports an active role for them in gene regulation and in expanding the genetic information of the individual.
Recent studies not only confirm that some pseudogenes encode proteins, but also suggest that pseudogenes could serve as biomarkers for disease diagnosis (e.g., to distinguish normal tissue from lesional tissue) and as targets for new therapeutic strategies (e.g., as miRNA decoys).
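Because of their high degree of homology with their parental genes, pseudogenes must be taken into account during NGS (next-generation sequencing) data analysis: short reads originating from a pseudogene can be mapped to the parental gene, and vice versa, leading to false-positive or missed variant calls. A list of common pseudogenes that can interfere with accurate variant calling is given below.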
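As a rough illustration of why homology causes trouble, the Python sketch below uses made-up sequences and a naive ungapped, mismatch-counting search (a simplification, not how production aligners work internally). It shows that a read from a region shared by a gene and its pseudogene has two equally good placements, whereas a read spanning a distinguishing base has only one. The sequences and the names `mismatches` and `best_hits` are hypothetical.

```python
# Toy illustration with hypothetical sequences: a naive ungapped search that
# counts mismatches, used only to show why near-identical copies confuse read placement.

def mismatches(read: str, ref: str, pos: int) -> int:
    """Count mismatches of `read` placed without gaps at `pos` of `ref`."""
    return sum(a != b for a, b in zip(read, ref[pos:pos + len(read)]))

def best_hits(read: str, refs: dict[str, str]) -> list[tuple[str, int, int]]:
    """Return every (reference, position, mismatches) tied for the best placement."""
    hits = [
        (name, pos, mismatches(read, seq, pos))
        for name, seq in refs.items()
        for pos in range(len(seq) - len(read) + 1)
    ]
    best = min(h[2] for h in hits)
    return [h for h in hits if h[2] == best]

# Hypothetical parental gene and pseudogene differing at a single base (index 12).
GENE = "ATGCGTACCTGAGCTTAGCAAGT"
PSEUDO = "ATGCGTACCTGATCTTAGCAAGT"
refs = {"gene": GENE, "pseudogene": PSEUDO}

shared_read = GENE[0:10]    # lies entirely in the region identical in both copies
specific_read = GENE[8:18]  # covers the base that distinguishes the two copies

print(best_hits(shared_read, refs))    # two equally good hits: placement is ambiguous
print(best_hits(specific_read, refs))  # a single perfect hit: placement is unique
```

In a real pipeline, an aligner typically gives such an ambiguous read a low mapping quality and may place it arbitrarily on either copy, which is how coverage and variant calls in the parental gene become unreliable.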
Examples
The following genes have pseudogenes that commonly cause major problems in variant calling during NGS analysis.
[Gene – Associated diseases – Related pseudogenes]
CYP21A2 – Congenital adrenal hyperplasia – CYP21A1P
PKD1 – Polycystic kidney disease 1 – More than 7 pseudogenes
GBA – Gaucher disease – GBAP
HYDIN – Primary ciliary dyskinesia 5 – HYDIN2
IKBKG – Incontinentia pigmenti – IKBKGP1
SMN1 – Spinal muscular atrophy – SMN2, SMNP, LOC100132090
PMS2 – Lynch syndrome – More than 14 pseudogenes
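A minimal sketch, assuming variant calls are available as simple records holding a gene symbol and a free-text description: the hypothetical helper below flags calls that fall in the genes listed above, so they can be reviewed or confirmed with a pseudogene-aware method. The record format and the names `VariantCall`, `PSEUDOGENE_PRONE` and `flag_pseudogene_prone` are assumptions, not part of any particular pipeline.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

# Gene -> related pseudogenes, transcribed from the list above.
PSEUDOGENE_PRONE = {
    "CYP21A2": "CYP21A1P",
    "PKD1": "more than 7 pseudogenes",
    "GBA": "GBAP",
    "HYDIN": "HYDIN2",
    "IKBKG": "IKBKGP1",
    "SMN1": "SMN2, SMNP, LOC100132090",
    "PMS2": "more than 14 pseudogenes",
}

@dataclass
class VariantCall:
    """Hypothetical minimal variant record: gene symbol plus a free-text description."""
    gene: str
    description: str

def flag_pseudogene_prone(calls: Iterable[VariantCall]) -> Iterator[tuple[VariantCall, str]]:
    """Yield (call, related pseudogenes) for each call in a gene from the list."""
    for call in calls:
        # Gene symbols are compared case-insensitively.
        related = PSEUDOGENE_PRONE.get(call.gene.upper())
        if related is not None:
            yield call, related

calls = [
    VariantCall("PMS2", "example variant A"),   # hypothetical call in a listed gene
    VariantCall("BRCA1", "example variant B"),  # gene not in the list, so not flagged
]
for call, related in flag_pseudogene_prone(calls):
    print(f"{call.gene}: {call.description} -> check against {related}")
```

A flag only means the call deserves extra scrutiny (for example, confirmation with a method able to distinguish the gene from its pseudogenes); it says nothing about whether the variant itself is real.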
References
• Chen et al. Re-recognition of pseudogenes: From molecular to clinical applications. Theranostics. 2020 Jan; 10(4):1479-1499. PMID: 32042317
• Poliseno L. Pseudogenes. Methods Mol Biol. 2014; 1167. PMID: 24940584
• Sen and Ghosh. Brief Funct Genomics. 2013 Nov; 12(6):536-47. PMID: 23900003
• Tutar Y. Pseudogenes. Comp Funct Genomics. 2012:424526. PMID: 22611337
• Wallace and Bean. Resources for Genetics Professionals — Genes with Highly Homologous Gene Family Members or a Pseudogene(s). In: GeneReviews® [Internet]. Seattle (WA): University of Washington, Seattle; 1993–2021. 2018 Mar 08.