Diagnosing Genetic Diseases with AI Tools

Researchers at Goodfire, a San Francisco AI research firm, and the Mayo Clinic say they have used an AI model to predict which genetic mutations cause disease, and, critically, to explain why, offering a new approach to diagnosing and studying genetic disorders at scale.

The study draws on methods from AI interpretability, a young field devoted to understanding the opaque inner workings of AI systems, to predict which gene mutations may be “pathogenic” and to explain why.

According to Matthew Callstrom, a radiology professor and the director of the Mayo Clinic’s generative AI program, early detection and treatment of some tumors can mean the difference between life and death. But with more than 3 billion base pairs in the human genome, a single disease-causing mutation is a needle in a vast haystack.

The researchers used Evo 2, an open-source “genomic foundation model” trained by the Arc Institute, to identify biological features and predict which DNA mutations cause disease. Just as large language models (LLMs) like ChatGPT are trained to predict the next word in a passage of text, Evo 2 is trained to predict the next “letter” in a DNA sequence. Where ChatGPT learns the structure of language and facts about the world from much of the text on the internet, Evo 2 is trained on 128,000 genomes spanning all domains of life, each written in an alphabet of just four letters (G, T, C, and A, representing the molecules that make up DNA), and learns which genetic sequences are “conducive to life,” explains Nicholas Wang, one of the paper’s authors.
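To make the training objective concrete, here is a minimal sketch of next-letter prediction over the four-letter DNA alphabet. Evo 2 itself is a large neural network; this toy count-based model is only a stand-in that illustrates the same objective, and the example sequence is invented for illustration.

```python
from collections import Counter, defaultdict

def train_next_letter_model(sequences, k=3):
    """Count how often each DNA letter follows each k-letter context."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - k):
            context, nxt = seq[i:i + k], seq[i + k]
            counts[context][nxt] += 1
    return counts

def predict_next(counts, context):
    """Return the most likely next letter for a context, or None if unseen."""
    if context not in counts:
        return None
    return counts[context].most_common(1)[0][0]

# A toy "genome": the model learns which letter tends to follow each context.
model = train_next_letter_model(["ATGCGATGCGATGCG"], k=3)
print(predict_next(model, "ATG"))  # prints "C"
```

A real genomic foundation model replaces the counting table with billions of learned parameters, but the prediction task, guess the next letter from the context, is the same.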

However, this knowledge is encoded in the seven billion numbers that make up the model’s artificial brain: researchers can read the numbers, but not what they mean. Just as an EEG can detect electrical activity in a human brain without revealing what the patient is thinking, AI researchers can observe what is happening inside an AI’s brain without understanding it.

The Goodfire researchers showed Evo 2 examples of harmful and benign gene mutations and tracked which parts of its brain lit up in response, allowing them to isolate the model’s sensitivity to pathogenic mutations. They found they could use this to identify which mutations cause disease better than any other computational tool they tested, even though Evo 2 was never explicitly trained for the task. As with LLMs, the scale of Evo 2’s training data, roughly ten times more than the previous largest genomic foundation model, allowed it to pick up on patterns shared by healthy DNA.
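The probing idea described above can be sketched as follows: record a model’s internal activations on known-benign and known-pathogenic examples, then fit a simple classifier on those activations. The activation vectors below are synthetic stand-ins; Goodfire’s actual pipeline and Evo 2’s internals are far larger, and the nearest-centroid classifier is just one simple choice of probe.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "activations": pathogenic examples cluster in a different region
# of activation space than benign ones, mimicking the lit-up pattern.
benign = rng.normal(loc=0.0, scale=1.0, size=(50, 8))
pathogenic = rng.normal(loc=2.0, scale=1.0, size=(50, 8))

# Nearest-centroid probe: label by which class mean the activation is closer to.
mu_b, mu_p = benign.mean(axis=0), pathogenic.mean(axis=0)

def probe(activation):
    """Label an activation vector as 'pathogenic' or 'benign'."""
    d_b = np.linalg.norm(activation - mu_b)
    d_p = np.linalg.norm(activation - mu_p)
    return "pathogenic" if d_p < d_b else "benign"

acc = np.mean([probe(x) == "benign" for x in benign] +
              [probe(x) == "pathogenic" for x in pathogenic])
print(f"probe accuracy on training examples: {acc:.2f}")
```

The point of the sketch is that the probe never retrains the underlying model: it only reads out a signal the model’s activations already carry, which is why Evo 2 could flag pathogenic mutations despite never being trained for that task.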

In the clinic, however, prediction alone is not enough. “It’s extremely important that we understand why a model is making a decision,” says Matt Redlon, chair of the Mayo Clinic’s AI department and co-author of the paper.

Further investigation showed that Evo 2 had inferred relevant biological properties from DNA sequences alone. For example, it learned to recognize the boundaries between different regions of DNA, even though the genomes it was trained on carried no explicit labels for those boundaries.

These biological features help explain why some mutations cause disease while others do not. A mutation near the boundary between two regions of DNA is more likely to produce a damaged protein, causing a hereditary disease. A mutation inside a region that is spliced out before the protein is made is usually harmless.

The method’s ability to identify the biological features behind a mutation, rather than simply producing an opaque pathogenicity score, is a “significant advance,” says Bo Wang, chief AI scientist at Canada’s University Health Network.

Interpretability methods like this one could let scientists “go back to the biology” and develop “personalized therapies” for individuals as the cost of genome sequencing falls, with recent systems claiming to read a full genome for $100, says Redlon.

Before Goodfire’s approach is ready for clinical use, however, it must undergo FDA approval and further trials to establish how well it works across broader populations. And even though the researchers found biological concepts stored inside Evo 2, there is “no guarantee” the model was actually using those concepts to decide which mutations were harmful, cautions James Zou, a Stanford professor of biomedical data science.

Interpretability is gaining traction as AI spreads through the life sciences and beyond. Goodfire, founded in 2023 to make AI models more interpretable, a task company co-founder and CTO Dan Balsam calls “the most important problem in the world,” was valued at $1.25 billion by February. In January, Goodfire reported research identifying new markers for Alzheimer’s stored in the brain of an AI model, raising the prospect of uncovering concepts inside AI models that have so far eluded human scientists.

“In my view, the most interesting part of [interpretability] is to be able to open the black box and see, ‘Did the model actually learn something about science beyond what we have known?’” Zou adds. Goodfire’s newly released research does not yet achieve this, Zou notes, since it only searches Evo 2 for concepts that are already known.

Large language models such as ChatGPT and Claude have also been the subject of interpretability research. Anthropic researchers recently found that Claude Mythos, the next generation of the company’s flagship AI model, showed internal signs of knowing it was being evaluated before cheating on tests, despite never explicitly stating that it was aware of being tested. The possibility that AI models might cheat on safety tests underscores the value of techniques that let researchers inspect AI brains for signs of misbehavior.

“If there’s some barrier like, ‘Is interpretability useful?’ I think we’ve been cracking it, and I think we’ve smashed through it,” Balsam said.
