RNA splicing mutations play major role in genetic variation and disease
RNA splicing is a major underlying factor that links mutations to complex traits and diseases, according to an exhaustive analysis of gene expression in whole-genome and cell line data. Reporting in Science on April 29, researchers from the University of Chicago and Stanford University investigated how thousands of mutations affect gene regulation in traits such as height, and diseases such as multiple sclerosis. The findings highlight the need for a better understanding of the role of RNA splicing in variation in complex traits and disease, and enable more accurate functional interpretations of genome-wide association study results.
"We were able to comprehensively identify how mutations perturb gene expression all the way from transcription to translation, and how they affect different regulatory mechanisms," said Yoav Gilad, PhD, professor of human genetics at the University of Chicago and co-senior author of the study. "We found that a significant proportion of the associations between mutation and variation in disease is explained by effects on RNA splicing. We can now work to better understand this relationship and add another tool in our kit to figure out the biological mechanisms that cause disease."
Over the past decade, genome-wide association studies (GWAS) have been remarkably successful at revealing variations in the human genome that are associated with biological traits and complex diseases. These numerous single-letter mutations, known as quantitative trait loci (QTLs), are mostly found in regions outside genes and are assumed to play a role in gene regulation. However, the functional significance of the vast majority of QTLs is unclear.
To take a comprehensive look at the underlying roles of genetic variants, Gilad and his colleagues, co-led by Jonathan Pritchard, PhD, professor of genetics at Stanford University, applied a suite of powerful statistical tools to whole-genome and cell line data gathered from 70 individuals. In a series of experiments that spanned eight years, they analyzed QTLs associated with seven regulatory phenotypes, including gene expression levels, RNA transcription and protein translation. For each regulatory phenotype, the team identified QTLs and quantified their effect on almost every step of gene regulation. They found that many of the QTLs overlapped in their effect on transcription, translation and ultimately protein levels.
"We have never considered so many data sets from a population sample of the exact same individuals before, so this type of analysis was never done," Gilad said.
The researchers, led by study author Yang Li, also developed a new computational method, called LeafCutter, which for the first time enables the effective identification of QTLs that are specifically involved in RNA splicing. Most human genes undergo splicing, where a precursor form of mRNA is cut and stitched back together into numerous combinations. This significantly increases the number of proteins a single gene can code for and is thought to explain much of the complexity in higher-order organisms. At least 15 percent of all human diseases are thought to be due to splicing errors. However, until LeafCutter, no methodology existed to effectively identify and analyze splicing QTLs.
The team's analysis revealed almost three thousand splicing-specific QTLs, and many appear to play a major contributing role in the biology of genetic traits and disease. Splicing QTLs were most enriched in multiple sclerosis, and for other traits were roughly equal in influence with QTLs that affect global gene expression levels. Many of the splicing QTLs did not affect gene expression levels, suggesting that RNA splicing is a separate, but equally important, mechanism that underlies complex traits and disease.
"We now have a new appreciation for how important splicing is for disease," Gilad said. "Intuitively, we had assumed it is important, but we didn't really have a lot of genome-wide evidence until this study." The results provide the first comprehensive data for RNA splicing as an important link between genetic variation and disease. Importantly, genetic variants identified through GWAS can now be assayed for potential roles in RNA splicing. If only overall gene expression is measured, the function of many of these sites would remain opaque.
"When we incorporate more information about more mechanisms in more diseases, we can better understand how genetic variation drives disease and someday perturb or fix that process," Gilad said. "We clearly have to consider RNA splicing now in addition to gene expression, histone accessibility and other factors, as we try to learn these rules."