An easy-to-use deep learning genomics software



Deep learning – a form of artificial intelligence that can improve itself with limited input from the user – has radically reshaped the landscape of biomedical research since its emergence in the early 2010s. It has had a particular impact on the field of genomics, an area of biology that investigates how our DNA is organized into genes and how these genes are activated or deactivated in individual cells.

Despite this synergy, genomics researchers looking to use this technology are often challenged by the actual coding required to analyze massive amounts of dense data.

Now, researchers at the University of California San Diego have simplified this task by creating a new deep learning platform that can be quickly and easily adapted to a wide variety of genomics projects. The newly developed software, called EUGENe, is detailed in a study published on November 16, 2023 in Nature Computational Science.

“Each of our cells has the same DNA, but the way that DNA is expressed changes what our cells look like and what they do,” explains Hannah Carter, Ph.D., associate professor in the Department of Medicine at UC San Diego School of Medicine.

“Deep learning can provide valuable insights into the biological machinery that powers this variety, but it can be challenging to implement for researchers without extensive computer science expertise. We wanted to create a platform that can help genomics researchers advance their deep learning data analysis to streamline making predictions from raw data.”

Genes that encode specific proteins make up only about 2% of our total genome. The remaining 98% of our DNA sequence, often referred to as ‘junk’ DNA for its lack of known function, plays a crucial role in determining when, where and how certain genes are activated. Unraveling the functions of these non-coding regions of the genome has been a long-standing goal of genomics researchers, and deep learning has proven to be a powerful tool for achieving it – at least if researchers can figure out how to use it.

“Many existing platforms require many hours of coding and data processing,” says first author Adam Klie, a Ph.D. student in Carter’s lab. “Most projects require researchers to start from scratch, which requires expertise that not all labs interested in this kind of thing have access to.”

Klie designed the new software to address the computing challenges he faced in his own work.

“With EUGENe, you give an algorithm a DNA sequence and ask it to make predictions about anything you would expect DNA to predict, such as whether a particular DNA sequence is functional or whether it regulates a gene in a particular biological context,” Klie said.

“This allows you to examine the properties of the DNA sequence and ask what would happen if I adjusted this piece here or moved this piece there. This is especially relevant for researchers studying complex genetic disorders involving many different sequences.”
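As a rough illustration of the "what would happen if I adjusted this piece" idea Klie describes, the Python sketch below scores every single-base change to a short DNA sequence. It does not use EUGENe's actual API. The function names are invented for this example, and `toy_score` is a hypothetical stand-in for a trained model: it simply counts occurrences of an assumed motif (TATA).

```python
# Toy illustration only: this is NOT the EUGENe API.
# All names below are hypothetical and chosen for this example.

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element 0/1 rows (order: A, C, G, T)."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def toy_score(seq, motif="TATA"):
    """Stand-in for a trained model: count occurrences of an assumed motif."""
    seq = seq.upper()
    last_start = len(seq) - len(motif) + 1
    return sum(seq[i:i + len(motif)] == motif for i in range(last_start))


def in_silico_mutagenesis(seq, score_fn=toy_score):
    """Return the score change for every single-base substitution in seq.

    Keys are (position, original_base, new_base); values are
    score(mutant) - score(original).
    """
    seq = seq.upper()
    baseline = score_fn(seq)
    effects = {}
    for pos, ref in enumerate(seq):
        for alt in BASES:
            if alt == ref:
                continue  # skip the unchanged base
            mutant = seq[:pos] + alt + seq[pos + 1:]
            effects[(pos, ref, alt)] = score_fn(mutant) - baseline
    return effects


if __name__ == "__main__":
    sequence = "GGTATAGG"
    print(one_hot(sequence[:2]))
    for (pos, ref, alt), delta in sorted(in_silico_mutagenesis(sequence).items()):
        if delta != 0:
            print(f"position {pos}: {ref}>{alt} changes score by {delta}")
```

In a real analysis the scoring function would be a deep learning model trained on experimental data, and the one-hot encoding would be the model's input. The loop over substitutions stays the same in spirit.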

The researchers tested EUGENe by attempting to reproduce the results of three existing genomics studies using different types of sequence data. Normally, analyzing these different types of data would require mixing and matching multiple technology platforms. However, EUGENe proved adaptable enough to reproduce the findings of each of these studies.

“Being able to reproduce results is critical in all scientific research, but can be very difficult in genomics studies that use deep learning,” says Carter.

“EUGENe is already showing great promise in how adaptable it is to different types of DNA sequence data and in its support for many different deep learning models. We hope it will evolve into a platform that can support the development of collaborative tools by the research community and accelerate genomics research.”

While the current version of EUGENe works on many types of genomic data, the researchers are working to expand its scope to an even wider variety of data types, such as single-cell sequencing data, which looks at the genomics of individual cells rather than of a whole tissue. They also plan to make EUGENe available to research groups around the world.

“One of the exciting things about this project is that the more people use the platform, the better we can make it over time, which will be essential as deep learning continues to evolve so quickly,” Carter said. “We hope that our platform will open many doors for researchers in this field and help them answer new questions about the complex molecular machinery that lies within all of us.”

Co-authors of the study include: David Laub, James V. Talwar, Joe J. Solvason and Emma K. Farley at UC San Diego, Hayden Stites at Daniel Land High School and Tobias Jores at the University of Washington.

More information:
Predictive analyses of regulatory sequences with EUGENe, Nature Computational Science (2023). DOI:

Provided by the University of California – San Diego

Citation: Introducing EUGENe: an easy-to-use deep learning genomics software (2023, November 16) retrieved November 17, 2023 from genomics-software.html

