François-Gilbert Viault, a French physician, descended a mountain in the Andes in 1889, drew blood from his arm and examined it under a microscope. His red blood cells, which carry oxygen, had increased by 42 percent. He had found the human body's mysterious ability to make more of these vital cells when it needs them.
Early in the 20th century, scientists hypothesized that a hormone was responsible. They named the hypothetical hormone erythropoietin, which means "red maker" in Greek. Seventy years later, after filtering about 670 gallons of urine, researchers isolated real erythropoietin.
About half a century after that, biologists in Israel reported finding a rare kidney cell that produces the hormone when oxygen levels drop too low. They named it the Norn cell, after the Norse deities who were believed to control human fate.
It took humans 134 years to discover Norn cells. Last summer, computers in California found them on their own in just six weeks.
The breakthrough came when Stanford University researchers programmed the computers to teach themselves biology. The computers ran an artificial intelligence program similar to ChatGPT, the popular chatbot that became fluent in language after training on billions of pieces of text from the internet. But instead of text, the Stanford researchers trained their computers on raw data about the chemical and genetic makeup of millions of real cells.
The researchers did not tell the computers what these measurements meant. They did not explain that different kinds of cells have distinct biochemical profiles. They did not specify, for example, which cells capture light in our eyes or which ones produce antibodies.
The computers crunched the data on their own, building a model that placed every cell in a vast, multidimensional space according to how similar the cells were to one another. By the time they were done, the machines had learned an astonishing amount. They could classify a cell they had never seen before as one of more than a thousand different types. One of those types was the Norn cell.
That is remarkable, said Jure Leskovec, a Stanford computer scientist who trained the computers, because no one ever told the model that Norn cells exist in the kidney.
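A minimal sketch of the classification step, under loud assumptions: each cell has already been reduced to a short list of numbers (an "embedding"), and a new cell is labeled with whichever cell type's average embedding points in the most similar direction (cosine similarity). The three-number vectors and type names are made up for illustration, and this nearest-centroid rule is a stand-in, not UCE's published method. The labels play no part in producing the vectors; they are attached afterward, which is how a model trained without labels can still be asked to name a cell.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def centroid(vectors):
    """Average a list of equal-length vectors, component by component."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def classify(cell, reference):
    """Label `cell` with the type whose centroid is most similar to it.

    `reference` maps a cell-type name to a list of example embeddings.
    """
    centroids = {name: centroid(vs) for name, vs in reference.items()}
    return max(centroids, key=lambda name: cosine(cell, centroids[name]))

# Hypothetical 3-dimensional embeddings; real models use hundreds of dimensions.
reference = {
    "photoreceptor": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "plasma cell":   [[0.1, 0.9, 0.1], [0.0, 0.8, 0.2]],
    "Norn cell":     [[0.1, 0.1, 0.9], [0.2, 0.0, 0.8]],
}
new_cell = [0.15, 0.05, 0.85]  # a cell absent from the reference examples
print(classify(new_cell, reference))  # prints "Norn cell"
```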
The program is one of a growing number of artificial intelligence systems, known as foundation models, that are being trained on the fundamentals of biology. The models are doing more than organizing the information that biologists are gathering. They are helping scientists learn new things about how cells develop and how genes work.
Dr. Eric Topol, the director of the Scripps Research Translational Institute, predicted that at some point the models would lead to a major biological discovery that biologists would not otherwise have made.
Just how far the models will go is a matter of debate. Some skeptics believe they will never advance beyond a certain point, while more optimistic scientists think foundation models may even address the most important biological question of all: what distinguishes life from nonlife?
Heart Cells and Mole Rats
Biologists have long sought to understand how the many cells in our bodies use their genes to carry out the various tasks that keep us alive.
About ten years ago, researchers launched industrial-scale efforts to extract genetic material from individual cells. They recorded their findings in catalogs called "cell atlases," which grew to contain billions of data points.
Dr. Christina Theodoris, a resident physician at Boston Children's Hospital, read about a new AI model that Google researchers had created in 2017 to translate languages. The researchers had given the model millions of English sentences along with their translations into French and German. The model learned to translate sentences it had never encountered before. Theodoris wondered whether a similar model could learn on its own to make sense of the information in cell atlases.
In 2021, she struggled to find a lab where she could try to build one. "There was a great deal of doubt that this strategy would succeed at all," she said.
Shirley Liu, a computational biologist at the Dana-Farber Cancer Institute in Boston, gave her a chance. Theodoris built a model called GeneFormer and trained it on data from 30 million cells drawn from 106 published human studies.
The model learned in great detail how our genes work in different cells. For example, it predicted that shutting off a gene called TEAD4 would severely disrupt a certain type of heart cell. When her team tested the prediction in real heart cells called cardiomyocytes, they found that the cells' beating weakened.
In another experiment, she and her colleagues showed GeneFormer heart cells from healthy people and from people with abnormal heart rhythms. "Then we said, now tell us what changes we need to happen to the unhealthy cells to make them healthy," said Theodoris, who now works at the University of California, San Francisco.
GeneFormer pointed to four genes that had not previously been linked to heart disease and suggested reducing their activity. Following the model's instructions, Theodoris' team reduced the activity of each of the four genes. In two of the four cases, the treatment improved the way the cells contracted.
After helping to build CellXGene, one of the world's largest cell databases, the Stanford team entered the foundation-model field. Starting in August, the researchers trained their computers on messenger RNA, one type of genetic material, from the 33 million cells in the database. They also fed the model the three-dimensional structures of the proteins that those genes encode.
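The experiment resembles what is sometimes called in silico perturbation: silence a gene in the model's input, recompute the cell's representation and check whether it moves closer to the healthy cells. The sketch below is a heavily simplified version of that idea, not GeneFormer's actual procedure. The "model" is a hypothetical fixed weight table, the genes A to D are invented, and closeness is measured as plain Euclidean distance.

```python
import math

# Hypothetical genes and a toy "model" that maps expression levels to a
# 2-D point. Each row holds one gene's contribution to the two dimensions.
GENES = ["A", "B", "C", "D"]
WEIGHTS = [
    [1.0, 0.0],  # gene A
    [0.0, 1.0],  # gene B
    [0.8, 0.6],  # gene C
    [0.0, 0.2],  # gene D
]

def embed(expression):
    """Weighted sum of gene contributions in each of the two dimensions."""
    return [sum(level * row[d] for level, row in zip(expression, WEIGHTS))
            for d in range(2)]

def rank_knockdowns(diseased, healthy):
    """Silence each gene in turn (set its level to 0), re-embed the cell and
    measure how much closer it gets to the healthy cell. Returns
    (gene, improvement) pairs, most promising first."""
    target = embed(healthy)
    baseline = math.dist(embed(diseased), target)
    gains = []
    for i, gene in enumerate(GENES):
        perturbed = list(diseased)
        perturbed[i] = 0.0
        gains.append((gene, baseline - math.dist(embed(perturbed), target)))
    return sorted(gains, key=lambda pair: pair[1], reverse=True)

healthy_cell = [1.0, 1.0, 0.0, 0.0]
diseased_cell = [1.0, 1.0, 2.0, 1.0]  # hypothetical: genes C and D overactive
for gene, gain in rank_knockdowns(diseased_cell, healthy_cell):
    print(gene, round(gain, 3))
```

Silencing gene C helps most here. Even in this toy, silencing gene A also scores as helpful although A is at its healthy level, a reminder that a model's suggestions still have to be checked in real cells, as Theodoris' team did.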
The model, called Universal Cell Embedding, or UCE, measured how similar cells were to one another and sorted them into more than 1,000 groups based on how they used their genes. The groups corresponded to cell types that generations of biologists had identified.
UCE also learned how cells develop from a single fertilized egg. For example, it found that every cell in the body can be grouped according to which of the three layers of the early embryo it came from.
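Grouping cells by how they use their genes can be illustrated with k-means clustering, a common general-purpose technique swapped in here purely for illustration; the article does not say how UCE forms its groups, and the six "cells" and three genes below are hypothetical.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest center, then move each
    center to the mean of its assigned points. Centers start at the first k
    points so the result is deterministic."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the closest center for each point.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: recompute each center from its members (keep it if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical expression levels of three genes in six cells.
cells = [
    [9.0, 0.5, 0.2],  # relies mostly on gene 1
    [0.3, 8.5, 0.1],  # relies mostly on gene 2
    [0.1, 0.3, 7.9],  # relies mostly on gene 3
    [8.7, 0.4, 0.3],
    [0.2, 9.1, 0.4],
    [0.4, 0.2, 8.2],
]
print(kmeans(cells, k=3))  # prints [0, 1, 2, 0, 1, 2]
```

Each number in the output is a group label: cells that lean on the same gene end up together. Seeding the centers with the first k cells keeps the toy deterministic; real pipelines use more careful initialization and far more dimensions.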
According to Stanford biophysicist and UCE co-developer Stephen Quake, it effectively rediscovered developmental biology.
The model could also carry its knowledge over to new species. When given the genetic data of an animal it had never encountered, such as the naked mole rat, UCE recognized many of the animal's cell types.
You can bring in a completely new organism (chicken, frog, fish, whatever) and get something useful out, Leskovec said.
After UCE identified the Norn cells, Leskovec and his colleagues searched the CellXGene database to find out where they came from. Most of the cells came from the kidneys, but some came from the lungs and other organs. The researchers wondered whether the body harbors previously unknown Norn cells in other places.
Dr. Katalin Susztak, a physician-scientist at the University of Pennsylvania who studies Norn cells, was intrigued by the discovery. "I wish to examine these cells," she said.
She is skeptical that the model found true Norn cells outside the kidneys, since other studies have not found erythropoietin being made in those organs. But it is possible, she said, that the new cells sense oxygen in a similar way to Norn cells.
In other words, UCE might have found a new kind of cell before biologists did.
An ‘Internet of Cells’
Like ChatGPT, biological models are not perfect. Kasia Kedzierska, a computational biologist at the University of Oxford, and her colleagues recently put GeneFormer and another foundation model, scGPT, through a battery of tests. They gave the models unfamiliar cell atlases and asked them to perform tasks such as sorting the cells into types. Compared with simpler computer programs, the models did well on some tasks but poorly on others.
Kedzierska said she had high hopes for the models but cautioned that, for now, "they should not be used out of the box without a proper understanding of their limitations."
Leskovec said the models are improving as scientists train them on more data. But today's cell atlases contain relatively little information compared with the entire internet that ChatGPT was trained on. "I'd like an entire internet of cells," he said.
More cells will be added as larger cell atlases come online, and scientists are gathering more kinds of information from each cell. They are cataloging the molecules that bind to genes and taking pictures of cells to pinpoint where proteins are located. Foundation models could use all of that data to infer equations describing how cells work.
Scientists are also developing ways for foundation models to combine what they learn on their own with what human biologists have discovered. The goal is to connect the findings in thousands of published scientific papers to databases of cell measurements.
With enough data and computing power, scientists believe they can eventually build a complete mathematical picture of a cell.
That would be hugely transformative for biology, said Bo Wang, the University of Toronto computational biologist who developed scGPT. He speculated that such a virtual cell would let scientists predict how a real cell would behave under any circumstances. Instead of using petri dishes, scientists might run entire experiments on their computers.
Quake believes the foundation models will learn not only about the kinds of cells that exist in our bodies today but also about the kinds that could exist. He suspects that only certain combinations of biochemistry can sustain a cell. Quake envisions using foundation models to map the realm of the possible, beyond which life cannot exist.
These models will give us a more fundamental understanding of the cell and, in turn, a better understanding of what life is, Quake said.
A blueprint of what can and cannot support life might also let scientists design new cells that do not exist in nature. A foundation model might produce chemical recipes for turning ordinary cells into remarkable new ones. Such cells might clear plaque from arteries or survey a damaged organ and report on its condition.
"It's very 'Fantastic Voyage'-esque," Quake acknowledged. But who knows what the future will hold?
New Risks
If foundation models fulfill Quake's vision, they will also bring new risks. On Friday, more than eighty biologists and artificial intelligence experts signed a call for the technology to be regulated so that it cannot be used to create new biological weapons. That concern could extend to new kinds of cells designed by the models.
Privacy breaches could come even sooner. Researchers plan to build personalized foundation models that examine each person's unique genome and how it operates in their cells. That extra layer of understanding may clarify how genetic variations affect the way cells work. But it could also give the owners of a foundation model some of the most intimate knowledge possible about the people who donated their DNA and cells to science.
Some scientists, however, doubt how far foundation models can take this "Fantastic Voyage." The models are only as good as the data they are trained on. A major new discovery about life may depend on data that scientists have not yet figured out how to collect. We may not even know what data the models need.
The models might make some interesting new discoveries, said Sara Walker, an Arizona State University physicist who studies the fundamental basis of life. But when it comes to truly new, fundamental advances, they are ultimately limited.
Still, the way foundation models work has already led their creators to rethink the role of human biologists at a time when computers can make major discoveries on their own. Biologists have long been celebrated for creative, painstaking studies that revealed some of the principles underlying life. But by sifting through billions of cells for patterns that humans cannot see, computers may be able to uncover those workings in weeks, days or even hours.
"It will force a complete rethink of what we consider creativity," Quake said. "Professors ought to be extremely anxious."