Credit: CC0 Public Domain
Deep learning – a form of artificial intelligence capable of improving itself with limited user input – has radically reshaped the landscape of biomedical research since its emergence in the early 2010s. It has been particularly impactful in genomics, a field of biology that studies how our DNA is organized into genes and how those genes are turned on or off in individual cells.
Despite this synergy, genomics researchers who want to apply the technology often struggle with the coding required to analyze huge, dense datasets.
Now, researchers at the University of California San Diego have simplified this task by creating a new deep learning platform that can be quickly and easily adapted to a wide variety of genomics projects. The software, called EUGENe, is described in a study published November 16, 2023, in Nature Computational Science.
“Each of our cells has the same DNA, but the way that DNA is expressed changes what our cells look like and what they do,” explained Hannah Carter, Ph.D., associate professor in the Department of Medicine at UC San Diego School of Medicine.
“Deep learning can provide valuable insight into the biological machinery that drives this variation, but can be challenging to implement for researchers without extensive computer science expertise. We wanted to create a platform that can help genomics researchers streamline their deep learning data analysis to make predictions from raw data.”
Although genes that code for proteins make up only about 2% of our genome, the remaining 98% of our DNA, once dismissed as “junk” DNA with no known function, contains regions that play a critical role in determining when, where and how genes are activated. Unraveling the functions of these non-coding regions is a long-standing goal of genomics researchers, and deep learning has proven to be a powerful tool for achieving it, at least when researchers can figure out how to use it.
“Many existing platforms require many hours of coding and data wrangling to use,” said first author Adam Klie, a Ph.D. student in Carter’s lab. “Most projects require researchers to start from scratch, which requires expertise that not all labs interested in these things have access to.”
Klie designed the new software to solve the computing challenges he faced in his own work.
“With EUGENe, you give an algorithm a sequence of DNA and ask it to make predictions about anything you would expect DNA to predict, such as whether a particular DNA sequence is functional or whether it regulates a gene in a certain biological context,” Klie said.
“This lets you explore the properties of the DNA sequence and ask what would happen if I modified this piece here or moved this piece there. This is especially relevant for researchers studying complex genetic disorders where many different sequences are implicated.”
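The “what if I modified this piece here” question Klie describes is commonly answered with in silico mutagenesis: encode a DNA sequence numerically, score it with a trained model, then rescore every single-base substitution and record how much the prediction changes. The Python sketch below illustrates that general idea only. It is not EUGENe's API; the functions `one_hot`, `toy_model` and `in_silico_mutagenesis`, and the "TATA" motif used as a stand-in for a trained network, are hypothetical choices made for illustration.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as a (length, 4) one-hot matrix in A, C, G, T order."""
    idx = {b: i for i, b in enumerate(BASES)}
    mat = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in idx:  # unknown bases such as N are left as all-zero rows
            mat[pos, idx[base]] = 1.0
    return mat


def toy_model(x: np.ndarray) -> float:
    """Hypothetical stand-in for a trained network: best match count to the motif 'TATA'."""
    motif = one_hot("TATA")
    k = motif.shape[0]
    scores = [float((x[i:i + k] * motif).sum()) for i in range(x.shape[0] - k + 1)]
    return max(scores) if scores else 0.0


def in_silico_mutagenesis(seq: str, model) -> np.ndarray:
    """Return a (length, 4) array: change in model score for each single-base substitution."""
    ref = model(one_hot(seq))
    effects = np.zeros((len(seq), 4), dtype=np.float32)
    for pos in range(len(seq)):
        for j, base in enumerate(BASES):
            mutant = seq[:pos] + base + seq[pos + 1:]
            effects[pos, j] = model(one_hot(mutant)) - ref
    return effects


# Usage: mutations inside the TATA motif (positions 2-5) lower the score,
# while mutations in the flanking G's leave it unchanged.
effects = in_silico_mutagenesis("GGTATAGG", toy_model)
print(effects.min(axis=1))  # most damaging substitution at each position
```

In a real analysis the toy scorer would be replaced by a trained model, and the resulting per-position effects are typically plotted to highlight which bases the model relies on.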
The researchers tested EUGENe by trying to reproduce the results of three existing genomic studies that used several different types of sequencing data. Normally, analyzing these different types of data would require mixing and matching multiple technology platforms. However, EUGENe proved to be adaptable enough to reproduce the results of each of these studies.
“Being able to reproduce results is critically important in all scientific research, but can be very difficult in genomic studies that use deep learning,” Carter said.
“EUGENe is already showing great promise in how adaptable it is to different types of DNA sequencing data and supports a lot of different deep learning models. We hope it will evolve into a platform that can support collaborative tool development by the research community and accelerate genome research.”
While the current version of EUGENe works on many types of genomic data, the researchers are working to expand its scope to include an even wider range of data types, such as single-cell sequencing data, which looks at the genomics of individual cells rather than of an entire tissue. They also plan to make EUGENe available to research groups around the world.
“One of the exciting things about this project is that the more people use the platform, the better we can make it over time, which will be important as deep learning continues to evolve so rapidly,” Carter said. “We hope our platform will open many doors for researchers in this field and help them answer new questions about the complex molecular machinery inside all of us.”
Co-authors of the study include: David Laub, James V. Talwar, Joe J. Solvason and Emma K. Farley at UC San Diego, Hayden Stites at Daniel Land High School and Tobias Jores at the University of Washington.
Predictive analyses of regulatory sequences with EUGENe, Nature Computational Science (2023). DOI: 10.1038/s43588-023-00544-w
Nature Computational Science