Kyoto University scientists are getting closer to finding ways to identify changes to RNA sequences that impact protein formation and can cause diseases. Their approach, published in the journal Genomics, utilizes probability algorithms together with an already-available, high-throughput sequencing technology.
“Modifications that are found in all types of biological RNA influence gene regulation, which ultimately decides how different cells function in our body,” explains Ganesh Pandian Namasivayam of Kyoto University’s Institute for Integrated Cell-Material Science (WPI-iCeMS). “Abnormalities in these modifications can lead to severe diseases, like diabetes, neurodevelopmental disorders, and cancer. Knowing how and where these RNA modifications are is of prime importance from a clinical viewpoint.”
There are already ways to identify RNA modifications, but they are imperfect. Biophysical approaches such as chromatography and mass spectrometry can only process small amounts of RNA at a time. High-throughput sequencing methods, which can process large amounts of RNA, involve laborious sample preparation, can’t simultaneously map multiple modifications, and are error-prone.
Namasivayam and colleagues at Kyoto University tested two approaches and found that both can distinguish, with reasonable success, a well-known and abundant RNA modification in which the nucleotide base uracil is replaced with another called pseudouridine.
Similar to DNA, RNA is formed of a strand of varying combinations of four different nucleotide bases: uracil, cytosine, adenine and guanine. How these bases are arranged determines the code that signals what protein is meant to be made. When pseudouridine replaces uracil in an RNA strand, it can increase protein production, or it can change the code from one that signals translation to stop into one that specifies an amino acid.
The team’s approach uses an already available direct RNA sequencing platform developed by Oxford Nanopore Technologies. In this platform, RNA strands pass through tiny pores in a membrane. The order of the RNA bases passing through a pore disrupts the current moving through the membrane in characteristic ways, which allows scientists to ‘read’ the sequence. But scientists using this approach often find it difficult to distinguish different types of modifications from one another.
Namasivayam and his colleagues found they could use algorithms to calculate how probable it is that a given site carries a pseudouridine substitution rather than a different kind of base change.
One of their strategies compares short RNA runs of five nucleotide bases in which a central uracil, pseudouridine or cytosine is flanked on either side by the same bases. The readings then go through algorithms that calculate the probability of the middle base being each of the three. The team applied this strategy, called Indo-Compare (Indo-C), to engineered RNA sequences and then to yeast and human RNA, and found it was good at distinguishing pseudouridine substitutions from the others.
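The idea of scoring the middle base of a five-base run can be illustrated with a minimal Python sketch. It is not the team's Indo-C implementation, whose details are not given here. It assumes, purely for illustration, that each candidate middle base produces current readings that follow a bell curve with a known mean and spread. The function names and all signal values below are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def middle_base_posteriors(currents, models, priors=None):
    """Return the probability of each candidate middle base.

    currents: current readings observed for one five-base run.
    models:   candidate base -> (mean, sd) of the expected current
              for that base in this flanking context (assumed values).
    priors:   optional prior probability per base; uniform if omitted.
    """
    labels = list(models)
    if priors is None:
        priors = {base: 1.0 / len(labels) for base in labels}

    # Sum log-likelihoods over readings, treating them as independent.
    log_scores = {}
    for base in labels:
        mean, sd = models[base]
        log_scores[base] = math.log(priors[base]) + sum(
            gaussian_logpdf(x, mean, sd) for x in currents
        )

    # Normalise in a numerically stable way so the values sum to 1.
    peak = max(log_scores.values())
    weights = {base: math.exp(score - peak) for base, score in log_scores.items()}
    total = sum(weights.values())
    return {base: weight / total for base, weight in weights.items()}


# Hypothetical signal models for one flanking context (made-up numbers).
MODELS = {"U": (100.0, 2.0), "psi": (104.0, 2.0), "C": (96.0, 2.0)}

reads = [103.5, 104.2, 104.8]  # made-up current readings
posteriors = middle_base_posteriors(reads, MODELS)
best = max(posteriors, key=posteriors.get)
print(best, posteriors)
```

In this toy setup the readings sit closest to the assumed pseudouridine signal, so that candidate receives the highest probability. Real nanopore signals are noisier and depend on more than a single mean and spread per base.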
They were also able to identify pseudouridine substitutions by mixing RNA samples with a chemical probe, called CMC, that attaches to pseudouridine. This changed the sequence readings in a way that identifies the modification.
“We believe our work will make nanopore sequencing-based methods less laborious for detecting RNA modifications and more capable of characterizing the impacts of these modifications on development and disease,” says Namasivayam.
The team next aims to optimize the use of both approaches together to more accurately identify RNA and DNA modifications. This will involve fabricating new chemical probes that correspond to specific changes. They also plan on further developing advanced machine learning algorithms that complement chemical probe-based direct RNA sequencing approaches.