In February 2021, NASA officials and people around the world celebrated the wildly successful landing of the Mars Perseverance rover. We all applauded the scientists and engineers responsible for this incredible achievement, which built on 80 years of multidisciplinary advances in physics, chemistry, mechanical engineering and materials science — and previous achievements such as the moon landing.
In the field of AI-driven drug discovery, we are still seeking our own proverbial moon landing — namely, understanding and curing a major disease in a data-driven way.
Succeeding in AI-driven cures will vastly improve and save the lives of billions of people throughout the planet now and into the future. But to do so, we must move past our discipline’s usual boundaries and comfort zones and work collaboratively across fields — which means software engineers must delve into biology and genomics, and scientists must adopt new technologies, like machine learning, that could have previously appeared distant to them.
I believe machine learning is at the heart of this coming revolution in drug discovery, just as it has been pivotal in many other industries. From text translation and image classification to credit risk assessment, advertising optimization and shopping item recommendation, we take these advances in machine learning for granted in our daily lives. But if we apply similar data-driven thinking in drug discovery, we could unlock hidden patterns in biological data that will allow us to understand diseases and produce cures thought previously unreachable.
Furthermore, this might be AI’s most impactful area yet, and here’s why.
First, the amount of biological data is increasing exponentially due to the decreasing cost associated with generating it. Thanks to advances in genomic sequencing throughput, sequencing machinery has become much more cost-efficient to adopt and operate; as a result, pharmaceutical and biotech companies, large academic initiatives and startups alike have been able to use this technology to generate data at a large scale.
However, since data from sequencing tends to be extremely high-dimensional (think an entire hard drive’s worth of data per blood draw), elementary observational statistics or standard algorithms can’t accurately detect the patterns that are critical for understanding how diseases work. Where these methods have fallen short on accuracy, machine learning has helped fill in the blanks and reveal patterns that were previously invisible.
For example, last year machine learning and AI showed that we can understand protein binding more accurately than with any previous method.
In addition to the ease of generating data described above, with the advent of single-cell genomics, leading labs and companies have been able to analyze individual cells in depth and at scale, enabling a much more detailed assessment of the same biological samples. For example, one Single Cell Gene Expression and Immune Profiling device can target up to 10,000 cells per library. Each chip can process eight libraries in parallel, allowing users to target up to 80,000 cells per chip. This has produced datasets with hundreds of millions of cells, where one can measure, per cell, tens of thousands of molecules from tens of thousands of genes.
To put this into perspective, let’s compare assessing a gene in a blood sample with trying to assess the behavior of a set of people. What single-cell genomics brings is individual, in-depth interviews with each person, where before you only had a census (that is, an average across cells). These single-cell-based datasets will reach many billions of cells in a few years, and I predict that a few leading companies will have datasets on the scale of tens of exabytes (tens of millions of terabytes) by 2025, making some of today’s internet-based datasets pale in comparison.
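The figures quoted above can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope helper: the function names are hypothetical, the units are decimal (SI), and the one-billion-cell target is an assumed round number standing in for "many billions of cells", not a figure from any vendor or dataset.

```python
# Decimal (SI) storage units, in bytes.
TERABYTE = 10**12
EXABYTE = 10**18


def exabytes_to_terabytes(exabytes: int) -> int:
    """Convert exabytes to terabytes using decimal (SI) prefixes."""
    return exabytes * (EXABYTE // TERABYTE)


def cells_per_chip(cells_per_library: int, libraries_per_chip: int) -> int:
    """Maximum number of cells one chip can target."""
    return cells_per_library * libraries_per_chip


def chips_needed(target_cells: int, per_chip: int) -> int:
    """Chips required to reach target_cells, rounded up (ceiling division)."""
    return -(-target_cells // per_chip)


if __name__ == "__main__":
    per_chip = cells_per_chip(10_000, 8)  # figures quoted in the text
    print("cells per chip:", per_chip)
    print("10 exabytes in terabytes:", exabytes_to_terabytes(10))
    # Assumed target of one billion cells, purely for illustration.
    print("chips for 1e9 cells:", chips_needed(1_000_000_000, per_chip))
```

Ten exabytes works out to ten million terabytes, which is where the "tens of millions of terabytes" equivalence comes from.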
Finally, the boom. The Nobel prize-winning work of Jennifer Doudna and Emmanuelle Charpentier, who developed the CRISPR gene-editing technology, is now being leveraged to edit biology itself. That is, we can now not only observe biology (that is, naturally existing single cells) but also change the genetic code of the cells themselves. For example, earlier this year researchers at Tel Aviv University demonstrated that they could use CRISPR to edit the PLK1 gene in ovarian cancer tumor cells in mice, which strongly inhibited tumor growth and resulted in an 80% increase in overall survival.
Typically, with most machine learning applications, we must work with naturally occurring, real-world examples. But now with CRISPR, we can modify our input variables, which creates data that allows machine learning practitioners to better learn cause-and-effect relationships for biology, not just correlation. Several companies have already employed CRISPR to generate and understand this biological data, from public corporations like Intellia to smaller startups like Caribou Biosciences.
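To make the difference between correlation and cause-and-effect concrete, here is a toy simulation. It is a minimal sketch under stated assumptions, not real genomics data or any company's method: a hidden cell state `z` drives both a gene's expression `x` and a phenotype `y`, the true causal effect of `x` on `y` is fixed at an assumed 0.5, and the "edit" is modeled as randomly setting `x` to 0 or 1 regardless of `z`. All names and constants are hypothetical.

```python
import random
import statistics

TRUE_EFFECT = 0.5   # assumed causal effect of gene expression x on phenotype y
CONFOUNDING = 2.0   # assumed strength of the hidden cell state z on y


def ols_slope(xs, ys):
    """Least-squares slope of y on x (for binary x, the difference in means)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def phenotype(rng, x, z):
    """Phenotype depends on x (causally), on hidden state z, and on noise."""
    return TRUE_EFFECT * x + CONFOUNDING * z + rng.gauss(0.0, 0.5)


def observational_slope(n, seed=0):
    """Naturally occurring cells: expression x follows the hidden state z."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        x = z + rng.gauss(0.0, 0.5)
        xs.append(x)
        ys.append(phenotype(rng, x, z))
    return ols_slope(xs, ys)


def interventional_slope(n, seed=0):
    """Edited cells: x is assigned at random, independent of z."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        x = 1.0 if rng.random() < 0.5 else 0.0
        xs.append(x)
        ys.append(phenotype(rng, x, z))
    return ols_slope(xs, ys)


if __name__ == "__main__":
    print("observational estimate: ", round(observational_slope(20_000), 2))
    print("interventional estimate:", round(interventional_slope(20_000), 2))
```

In this setup the observational estimate lands far above 0.5 because it absorbs the influence of the hidden state, while the interventional estimate stays close to the assumed true effect. That gap is the reason edited data is valuable for learning mechanisms rather than associations.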
Reaching this mechanistic, cause-and-effect level of understanding on large amounts of data will allow us all to better understand the code of life and disease. Going forward, the most significant successes will come from those who have the largest and highest-quality datasets, particularly single-cell genomics and CRISPR-edited data.
Every month, advances in protocols, reagents and equipment allow us to generate more robust and granular data than most of us had imagined possible even a year ago. Just as a bold, multidisciplinary, decades-long effort led to the Moon and Mars landings, the intersection of biology, software engineering, and AI will lead to cures for many major diseases. We now have the opportunity to build the rocket ships that will help navigate humanity to longer, healthier, happier lives.