Pseudogenes are ubiquitous and abundant within genomes. They originate from the decay of duplicated genes over the course of evolution and resemble functional genes but contain defects within the coding sequence, such as premature stop codons, frameshifts, and deletions. These defects may have been acquired during duplication and may result in a loss of gene function. While it may seem disadvantageous to retain pseudogenes riddled with mutations and stop codons, research has shown that some pseudogenes have beneficial roles. Some pseudogenes are now recognized as having vital roles in the regulation of their parent genes, and many are still transcribed into RNA. These transcripts may form small interfering RNA (siRNA) or even decrease the concentration of microRNA (miRNA). Although the exact number of pseudogenes is unknown, extrapolations suggest approximately twenty thousand pseudogenes within the human genome. Historically, pseudogenes were regarded as nonfunctional artifacts because they occur within noncoding regions of the human genome.
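To make the defects named above concrete, the following is a minimal sketch of how a coding sequence might be screened for two of them: a premature stop codon and a length that is no longer a multiple of three (a crude hint of a frameshift). The function name `coding_defects` and the toy sequences are hypothetical and chosen for illustration only; real pseudogene annotation relies on alignment to the parent gene, not on a check this simple.

```python
# Illustrative sketch only: names and sequences are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def coding_defects(cds):
    """Report two simple disabling features in a DNA coding sequence."""
    seq = cds.upper()
    # Split into complete codons, ignoring any trailing 1-2 bases.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    # A stop codon anywhere before the final codon would truncate the protein.
    premature = [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]
    return {
        "length_not_multiple_of_3": len(seq) % 3 != 0,
        "premature_stop_codon_indices": premature,
    }

intact = "ATGGCCAAATGA"          # ATG GCC AAA TGA
disrupted = "ATGTAGGCCAAATGA"    # ATG TAG GCC AAA TGA (stop at codon index 1)
shifted = "ATGGCAAATGA"          # one base deleted from the intact sequence

for label, seq in [("intact", intact), ("disrupted", disrupted), ("shifted", shifted)]:
    print(label, coding_defects(seq))
```

A sequence that passes both checks is not necessarily functional; the sketch only flags the most obvious disruptions.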
Some pseudogenes studied in different species share conserved mutations. Pseudogenes from humans, chimpanzees, dogs, cows, mice, and rats were found to have conserved point mutations at various gene locations. Mutations shared among different organisms are thought to reflect descent from a common ancestor. Mutations that are not deleterious allow pseudogenes to persist and evolve through acquired random mutations and genetic drift. These established pseudogenes can, therefore, be powerful tools for phylogenetic studies investigating the evolution of specific genes.
Pseudogenes can be classified as nonprocessed or processed. Processed pseudogenes arise from their paralogous gene through retrotransposition, in which a reverse-transcribed mRNA transcript (cDNA) is reintegrated at a new location within the genome. Nonprocessed pseudogenes differ from processed forms by retaining their intron-exon structure.
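The distinction above can be sketched as a simple rule: a pseudogene copy that retains the intron-exon structure of its parent is treated as nonprocessed, and one that lacks introns is treated as processed. The `PseudogeneRecord` type, its `retains_introns` field, and the example names are hypothetical and stand in for whatever annotation a real pipeline would provide. The rule is deliberately simplified and would not handle, for example, a parent gene that has no introns to begin with.

```python
from dataclasses import dataclass

@dataclass
class PseudogeneRecord:
    name: str              # hypothetical identifier
    retains_introns: bool  # hypothetical annotation: intron-exon structure preserved?

def classify(record):
    """Apply the intron-based rule described in the text (simplified)."""
    return "nonprocessed" if record.retains_introns else "processed"

examples = [
    PseudogeneRecord("exampleP1", retains_introns=False),  # cDNA-derived copy
    PseudogeneRecord("exampleP2", retains_introns=True),   # duplicated copy
]

for rec in examples:
    print(rec.name, classify(rec))
```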
Issues of Concern
Because pseudogenes are numerous and diverse and share similar coding sequences with their functional gene counterparts, there is currently no standard method for identifying them without analyzing the entire genome. However, many independent groups are working to resolve this problem and create a standardized and efficient means of pseudogene identification.
Few whole-genome sequencing projects have counted the number of pseudogenes present. One study of human chromosomes 21 and 22 identified 393 pseudogenes on these two chromosomes alone. The study extrapolated from these data to estimate a total of 20,000 pseudogenes within the human genome. Other studies estimate as many as 23,000 to 33,000 pseudogenes. All studies have estimated that pseudogenes represent up to one-third of the human genome.
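The extrapolation described above can be illustrated with simple proportional scaling. The text does not state how the study scaled its count, so the sketch below is an assumption, not the study's method: it supposes the surveyed chromosomes represent a fixed fraction of the genome and that pseudogene density is uniform. The value `0.02` is a hypothetical fraction chosen only to show that a scaling of this kind lands near the reported figure.

```python
def extrapolate_total(observed_count, sampled_fraction):
    """Scale an observed count to the whole genome, assuming uniform density."""
    if not 0 < sampled_fraction <= 1:
        raise ValueError("sampled_fraction must be in (0, 1]")
    return observed_count / sampled_fraction

observed = 393             # pseudogenes reported on chromosomes 21 and 22
assumed_fraction = 0.02    # HYPOTHETICAL share of the genome surveyed

estimate = extrapolate_total(observed, assumed_fraction)
print(f"Estimated genome-wide pseudogenes: about {round(estimate):,}")
```

Because the result depends entirely on the assumed fraction and on uniform density, different assumptions would plausibly account for the wider range of estimates mentioned above.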
The term "nonfunctional pseudogenes" is also a concern when defining the relationship between pseudogenes and their parental copies. One study found that the pseudogene of nitric oxide synthase (NOS) in the snail Lymnaea stagnalis has an inhibitory effect on the functional NOS gene. The NOS pseudogene transcript was shown to act as an antisense RNA that hybridizes with the functional mRNA and decreases its expression. However, the NOS pseudogene itself has numerous defects and is unable to code for an actual protein like its functional copy.
On a molecular level, pseudogenes have been associated with several roles. Numerous studies have revealed that specific pseudogenes play regulatory roles in the cell alongside their parent genes. Some pseudogenes have exhibited antisense RNA properties, siRNA properties, and even an ability to affect mRNA stability. Through these molecular functions, pseudogene transcripts can modulate the number of parental copy transcripts expressed.
Scientists have studied specific pseudogenes for their observed roles in the regulation of biochemical processes. One example involves Oct4, a transcription factor, and its associated pseudogene. The RNA transcript of the Oct4 pseudogene has been observed to inhibit differentiation together with the RNA transcript of the original Oct4 gene. Additionally, researchers found that knockdown of an Oct4 pseudogene antisense RNA increased the concentrations of Oct4 and its associated pseudogene.
siRNAs also regulate gene expression. In one study, pseudogene transcripts were observed to fold into hairpin structures and become functional siRNAs that repressed gene expression. The study also removed Dicer, a protein responsible for producing siRNAs, and observed a decreased concentration of pseudogene-derived siRNAs with a subsequent increase in expression of the coding gene’s mRNA products. These findings further support the idea of pseudogene-derived siRNA regulation.
Another mechanism of pseudogene function lies in the ability of pseudogene transcripts to affect mRNA stability through interactions with miRNA. miRNA typically pairs with the 3’ untranslated region of an mRNA transcript and causes degradation or lower expression levels. One example of how pseudogene transcripts interact with miRNA is the relationship between PTEN, a tumor suppressor, and its pseudogene PTENP1. In one study, the PTENP1 pseudogene transcript bound to miRNA and decreased the concentration of functional miRNA within the cell. In this way, PTENP1 allows the PTEN mRNA to escape miRNA repression. Another study is exploring the regulatory relationship between the heat shock protein Hsp90 and its associated pseudogenes HSP90AA1 and HSP90AA2. Heat shock proteins are actively expressed and account for two percent of all expressed proteins. Microarray data indicate that the genes encoding these proteins have numerous retrotransposed pseudogenes.
Historically, pseudogenes were referred to as "junk DNA" due to their location in noncoding sequences of the genome. However, recent studies have begun to unravel various functions of pseudogenes and their RNA transcripts. These functions are largely regulatory and include antisense RNA, siRNA, miRNA-like, and miRNA-binding or miRNA-inhibiting properties.
Pseudogenes are located diffusely throughout the genome. Because of the sequence similarity between pseudogenes and their functional counterparts, they are often difficult to distinguish, and misidentification errors occur frequently. Additionally, not every gene within the genome has associated pseudogenes. Some genes even have paralogous pseudogenes that, over the course of evolution, have inserted into different chromosomes from their functional gene copies. Nevertheless, the identification of pseudogenes is necessary to understand their molecular role in disease as well as their relationship to their functional gene copy.
The identification of pseudogenes is especially difficult when they originate from mitochondrial DNA that retrotransposed into nuclear DNA. This complexity may be overcome using in silico analysis with a homology-based, whole-genome approach. The identification of pseudogenes is continually evolving and remains a work in progress. Many independent efforts, such as REGEXP, PseudoFinder, RetroFinder, PseudoPipe, and GIS-PET, are ongoing to standardize the identification of pseudogenes.
Pseudogenes play an essential role in comparative studies regarding genomics as they can provide a record of ancient genes. They are used to determine the rate of gene duplication and follow the evolution of sequence changes in organisms. Thus, pseudogenes are unique and helpful for phylogenetic studies.
Additionally, pseudogenes play essential parts in gene regulation. Research has shown that pseudogenes are transcribed into RNA that can regulate their respective parental copy genes. Through this level of regulation, pseudogene products can increase or decrease the level of expression of these parental genes and their protein products.