Our news feeds are flooded with articles about artificial intelligence, with ChatGPT and related AI technologies drawing a lot of public attention. Beyond the widely used chatbots, biologists are finding ways to use AI to probe the fundamental workings of our genes.
Researchers at the University of California, San Diego, who study the DNA sequences that switch genes on had previously used artificial intelligence to solve a puzzling piece of the gene activation jigsaw. Gene activation is a fundamental process involved in growth, development, and disease. Using machine learning, a form of artificial intelligence, James T. Kadonaga, a professor in the School of Biological Sciences, and his team identified the downstream core promoter region (DPR), a “gateway” DNA activation code that plays a role in the activity of up to one-third of our genes.
Building on this discovery, Kadonaga and researchers Long Vo ngoc and Torrey E. Rhyne have now used machine learning to find “synthetic extreme” DNA sequences with purposefully designed functions in gene activation. The team compared the DPR gene activation element in humans and fruit flies (Drosophila), testing millions of candidate DNA sequences with machine learning. With AI, they found rare, custom-tailored DPR sequences that are active in humans but inactive in fruit flies, and vice versa. More generally, the same approach could be used to find synthetic DNA sequences with properties useful in biotechnology and medicine.
Such synthetic extreme DNA sequences could one day have practical uses. Instead of comparing humans (condition X) with fruit flies (condition Y), one could test whether drug A (condition X) but not drug B (condition Y) activates a gene, according to Kadonaga, a professor in the Department of Molecular Biology. The approach could also be used to find custom-tailored DNA sequences that activate a gene in tissue 1 (condition X) but not in tissue 2 (condition Y). The real-world applications of this AI-based strategy are numerous. Synthetic extreme DNA sequences may be very rare, perhaps one in a million, but if they exist, AI could find them.
Machine learning is a branch of AI in which computer systems continually improve and learn from data and experience. In the new study, Kadonaga, Vo ngoc (a former postdoctoral researcher at UC San Diego who is now at Velia Therapeutics), and Rhyne (a staff research associate) used a method called support vector regression to “train” machine learning models with 200,000 established DNA sequences, based on data from real-world laboratory experiments. These served as the target examples for the machine learning system. The researchers then “fed” 50 million test DNA sequences into the machine learning systems for humans and fruit flies, and asked them to compare the sequences and identify unique sequences within the two enormous data sets.
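To make the train-then-predict workflow concrete, here is a minimal sketch of support vector regression on DNA sequences. It is an illustration under stated assumptions, not the study's pipeline: the one-hot encoding, the linear model, the hyperparameters, and the toy rule that stands in for measured activity are all hypothetical, and the names `one_hot`, `train_linear_svr`, and `predict` are invented for this example.

```python
import random

BASES = "ACGT"

def one_hot(seq):
    """Flatten a DNA string into 0/1 features, four per position (assumed encoding)."""
    return [1.0 if base == b else 0.0 for base in seq for b in BASES]

def train_linear_svr(X, y, epsilon=0.1, lr=0.01, l2=1e-4, epochs=50, seed=0):
    """Fit a linear model by stochastic subgradient descent on the epsilon-insensitive loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            err = b + sum(wj * xj for wj, xj in zip(w, X[i])) - y[i]
            # Errors inside the epsilon tube cost nothing; outside it the subgradient is sign(err).
            g = 0.0 if abs(err) <= epsilon else (1.0 if err > 0 else -1.0)
            w = [wj - lr * (g * xj + l2 * wj) for wj, xj in zip(w, X[i])]
            b -= lr * g
    return w, b

def predict(model, seq):
    """Predicted activity for one sequence under the fitted linear model."""
    w, b = model
    return b + sum(wj * xj for wj, xj in zip(w, one_hot(seq)))

# Toy "measured" data: activity is the fraction of G bases (a made-up rule, not real biology).
rng = random.Random(1)
train_seqs = ["".join(rng.choice(BASES) for _ in range(6)) for _ in range(200)]
train_activity = [s.count("G") / len(s) for s in train_seqs]

model = train_linear_svr([one_hot(s) for s in train_seqs], train_activity)

# Score sequences the model may not have seen, as the study did at a far larger scale.
for candidate in ["GGGGGG", "GATTAC", "AAAAAA"]:
    print(candidate, round(predict(model, candidate), 2))
```

The point of the sketch is the division of labor: a modest set of measured examples is used once for training, after which scoring each additional candidate is cheap. In practice, a maintained library implementation of support vector regression would be a more sensible choice than this hand-rolled loop.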
Although the machine learning systems showed that human and fruit fly sequences largely overlapped, the researchers focused on the core question of whether the AI models could identify rare cases in which gene activation is highly active in humans but not in fruit flies. The answer was a resounding “yes.” The models succeeded in identifying human-specific and fruit fly-specific DNA sequences. Importantly, the functions of the extreme sequences predicted by the AI were verified in Kadonaga’s laboratory using conventional (wet lab) testing methods.
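The screening step described here, keeping only candidates that score high under one model and low under the other, can be sketched as follows. The two scoring functions are hypothetical stand-ins for trained human and fruit fly models, and the thresholds (`high=0.8`, `low=0.2`), sequence length, and candidate count are assumptions chosen for illustration only.

```python
import random

BASES = "ACGT"

def score_human(seq):
    """Hypothetical stand-in for a trained human model: fraction of G or C bases."""
    return sum(base in "GC" for base in seq) / len(seq)

def score_fly(seq):
    """Hypothetical stand-in for a trained fruit fly model: fraction of G or A bases."""
    return sum(base in "GA" for base in seq) / len(seq)

def find_extremes(candidates, score_x, score_y, high=0.8, low=0.2):
    """Split out sequences that are active under one condition and inactive under the other."""
    x_specific, y_specific = [], []
    for seq in candidates:
        sx, sy = score_x(seq), score_y(seq)
        if sx >= high and sy <= low:
            x_specific.append(seq)      # active in X, inactive in Y
        elif sy >= high and sx <= low:
            y_specific.append(seq)      # active in Y, inactive in X
    return x_specific, y_specific

# Screen a batch of random candidates; most will score similarly under both stand-ins.
rng = random.Random(0)
candidates = ["".join(rng.choice(BASES) for _ in range(8)) for _ in range(20000)]
human_only, fly_only = find_extremes(candidates, score_human, score_fly)
print(len(candidates), "screened;", len(human_only), "human-specific,", len(fly_only), "fly-specific")
```

Because `find_extremes` takes the two scorers as arguments, the same filter applies to any pair of conditions, such as two drugs or two tissues, once a model exists for each. Sequences flagged this way are only predictions and would still need laboratory confirmation.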
Before starting the work, the researchers were not sure the AI models were “intelligent” enough to predict the activities of 50 million sequences, especially outlier “extreme” sequences with unusual activities. It is therefore very impressive and quite remarkable, Kadonaga said, that the AI models could predict the activities of the rare one-in-a-million extreme sequences. He added that it would be practically impossible to run the comparable 100 million wet lab experiments that the machine learning technology analyzed, since each wet lab experiment would take almost three weeks to complete.
The machine learning system’s discovery of the unusual sequences serves as a successful demonstration and paves the way for additional applications of machine learning and other AI technologies in biology.
People are constantly coming up with new everyday uses for AI tools like ChatGPT. This study shows how AI can be used to design customized DNA elements for gene activation, an approach that, according to Kadonaga, should be useful in biotechnology and biomedical research. More broadly, biologists are likely just beginning to harness the power of AI technology.