
Predicting protein attachment through AI

Antibodies, small proteins produced by the immune system, can attach to specific parts of a virus and neutralize it. As scientists continue to fight SARS-CoV-2, the virus that causes Covid-19, one potential weapon is a synthetic antibody that binds to the virus’s spike proteins and prevents the virus from entering human cells.

To develop a successful synthetic antibody, researchers need to understand exactly how this binding occurs. Proteins have lumpy 3D structures with many folds and can stick together in millions of combinations, so finding the right protein complex among the almost countless candidates is extremely time-consuming.

To streamline the process, MIT researchers have created a machine learning model that can directly predict the complex formed when two proteins bind to each other. Their technique is between 80 and 500 times faster than state-of-the-art software methods and often predicts protein structures that are close to those observed experimentally.

This technique may help scientists better understand some biological processes that involve protein interactions, such as DNA replication and repair. It also has the potential to speed up the process of developing new drugs.

“Deep learning is very good at capturing interactions between different proteins that are otherwise difficult for chemists and biologists to describe experimentally. Some of these interactions are very complex, and people haven’t found a good way to express them. This deep learning model can learn these types of interactions from data,” says Octavian-Eugen Ganea, a postdoc in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) and co-lead author of the paper.

Ganea’s co-lead author is Xinyuan Huang, a PhD student at ETH Zurich. MIT co-authors include Regina Barzilay, the School of Engineering Distinguished Professor for AI and Health in CSAIL, and Tommi Jaakkola, the Thomas Siebel Professor of Electrical Engineering in CSAIL and a member of the Institute for Data, Systems, and Society. The research will be presented at the International Conference on Learning Representations.

Protein attachment

The model the researchers developed, called EquiDock, focuses on rigid body docking. This occurs when two proteins attach by rotating or translating in 3D space, while their shapes do not squeeze or bend.
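
To make "rigid body" concrete, here is a minimal Python sketch that rotates and translates a small set of 3D points and then checks that every distance between points is unchanged. The coordinates, the choice of rotation axis, and the function names are illustrative assumptions, not part of the researchers' code.

```python
import math

def rotation_about_z(angle_rad):
    """Return a 3x3 rotation matrix for a rotation about the z-axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

def rigid_transform(points, rotation, translation):
    """Apply x -> R x + t to every 3D point; the shape itself is untouched."""
    moved = []
    for p in points:
        rotated = [sum(rotation[r][c] * p[c] for c in range(3)) for r in range(3)]
        moved.append([rotated[k] + translation[k] for k in range(3)])
    return moved

def pairwise_distances(points):
    """All inter-point distances, used here to confirm the shape is preserved."""
    return [math.dist(a, b) for i, a in enumerate(points) for b in points[i + 1:]]

# Toy "protein": four made-up points standing in for atom or residue positions.
ligand = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.2, 0.0], [0.3, 0.8, 2.0]]
moved = rigid_transform(ligand, rotation_about_z(math.pi / 2), [10.0, -4.0, 2.5])

before = pairwise_distances(ligand)
after = pairwise_distances(moved)
shape_preserved = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(before, after))
print(shape_preserved)  # True: the points moved, but the shape did not change
```

A rigid docking prediction therefore comes down to choosing one rotation and one translation for one of the two proteins.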

The model takes the 3D structures of the two proteins and converts them into 3D graphs that the neural network can process. Proteins are made up of chains of amino acids, and each amino acid is represented by a node in the graph.
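
As a rough illustration of turning a structure into a graph, the sketch below treats each amino acid as a node placed at a single 3D coordinate and connects it to its nearest neighbors. The article does not say how edges are chosen, so the k-nearest-neighbor rule, the value of `k`, and the coordinates are all assumptions made for this example.

```python
import math

def build_residue_graph(coords, k=2):
    """Return {node: [neighbor indices]} linking each residue to its k nearest residues.

    coords holds one 3D point per amino acid (for example, one representative atom).
    """
    graph = {}
    for i, ci in enumerate(coords):
        # Distance from residue i to every other residue, closest first
        others = sorted((math.dist(ci, cj), j) for j, cj in enumerate(coords) if j != i)
        graph[i] = [j for _, j in others[:k]]
    return graph

# Made-up coordinates for a five-residue chain
residues = [
    [0.0, 0.0, 0.0],
    [3.8, 0.0, 0.0],
    [7.0, 2.0, 0.0],
    [6.0, 5.5, 0.0],
    [2.0, 4.5, 0.0],
]
graph = build_residue_graph(residues, k=2)
for node, neighbors in graph.items():
    print(node, neighbors)
```

Each node could also carry features such as the amino acid type, and each edge the distance between its endpoints. Those are left out here to keep the sketch short.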

The researchers built geometric knowledge into the model so that it understands how objects change when they are rotated or translated in 3D space. The model also has mathematical knowledge built in that ensures the proteins always attach in the same way, no matter where they sit in 3D space. This is how proteins dock in the human body.
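
The property described here is often called equivariance: moving the input and then applying the model gives the same result as applying the model and then moving the output. The sketch below checks that property for a toy coordinate update built only from difference vectors and distances. It is a stand-in chosen for illustration and is not EquiDock's actual network layer.

```python
import math

def toy_coordinate_update(coords, step=0.1):
    """Move each point along a distance-weighted sum of its offsets from the others.

    Only difference vectors and distances are used, so the update commutes
    with any rotation or translation of the input.
    """
    updated = []
    for i, xi in enumerate(coords):
        shift = [0.0, 0.0, 0.0]
        for j, xj in enumerate(coords):
            if i == j:
                continue
            diff = [a - b for a, b in zip(xi, xj)]
            weight = 1.0 / (1.0 + sum(d * d for d in diff))  # depends on distance only
            shift = [s + weight * d for s, d in zip(shift, diff)]
        updated.append([x + step * s for x, s in zip(xi, shift)])
    return updated

def rotate_z_then_translate(coords, angle, t):
    """Rotate every point about the z-axis by `angle`, then add translation t."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c * x - s * y + t[0], s * x + c * y + t[1], z + t[2]] for x, y, z in coords]

points = [[0.0, 0.0, 0.0], [1.0, 0.5, 0.0], [0.2, 1.4, 0.9], [2.0, 1.0, -0.5]]
angle, t = 0.7, [3.0, -2.0, 5.0]

transform_then_update = toy_coordinate_update(rotate_z_then_translate(points, angle, t))
update_then_transform = rotate_z_then_translate(toy_coordinate_update(points), angle, t)

equivariant = all(
    math.isclose(a, b, abs_tol=1e-9)
    for p, q in zip(transform_then_update, update_then_transform)
    for a, b in zip(p, q)
)
print(equivariant)  # True: both orders give the same coordinates
```

A model built from operations like this cannot give a different answer just because a protein was stored in a different position or orientation.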

Using this information, the machine learning system identifies the atoms of the two proteins that are most likely to interact and form chemical reactions, known as binding pocket points. It then uses these points to place the two proteins together into a complex.

“Once we can understand from the proteins which parts are likely to be their binding pocket points, we have all the information we need to put the two proteins together. Assuming we can find these two sets of points, we can then work out how to rotate and translate the proteins so that one set matches the other,” Ganea explains.
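
Given two matched sets of points, one standard way to find the rotation and translation that best fits one set onto the other is the Kabsch algorithm. The sketch below uses NumPy and made-up points. The article does not describe how EquiDock performs this step, so treat the algorithm choice, the function name `kabsch`, and the data as assumptions.

```python
import numpy as np

def kabsch(source, target):
    """Return rotation R and translation t that best map source points onto target points."""
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    src_center = source.mean(axis=0)
    tgt_center = target.mean(axis=0)
    # Covariance between the two centered point sets
    H = (source - src_center).T @ (target - tgt_center)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a rotation, not a reflection
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_center - R @ src_center
    return R, t

# Hypothetical binding pocket points on one protein, in its starting pose
ligand_points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.5, 0.5, 3.0]])

# Matching points on the partner protein, generated with a known rotation and translation
theta = np.pi / 3
true_R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
true_t = np.array([4.0, -1.0, 2.0])
receptor_points = ligand_points @ true_R.T + true_t

R, t = kabsch(ligand_points, receptor_points)
docked = ligand_points @ R.T + t
print(np.allclose(docked, receptor_points))  # True: the known pose is recovered
```

With real predictions the two point sets will not match exactly, and the same procedure then returns the least-squares best fit.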

One of the biggest challenges in building this model was overcoming the lack of training data. Because so little experimental 3D data exists for proteins, it was especially important to incorporate geometric knowledge into EquiDock, Ganea says. Without these geometric constraints, the model might pick up false correlations in the dataset.

Seconds vs. hours

After training the model, the researchers compared it to four software methods. EquiDock can predict the final protein complex in one to five seconds. All of the baselines took much longer, from 10 minutes to over an hour.
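
Comparing docking methods also requires a score for how closely a predicted complex matches the experimentally observed one. The article does not name the metric used. A common choice in structural biology is the root-mean-square deviation (RMSD) between corresponding points, sketched below with made-up coordinates that are assumed to be already expressed in the same reference frame.

```python
import math

def rmsd(predicted, reference):
    """Root-mean-square deviation between two index-matched lists of 3D points."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("point lists must be non-empty and the same length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(predicted, reference))
    return math.sqrt(total / len(predicted))

# Made-up "experimental" positions and a prediction shifted by 1 unit along z
reference = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [3.8, 3.8, 0.0]]
predicted = [[0.0, 0.0, 1.0], [3.8, 0.0, 1.0], [3.8, 3.8, 1.0]]

score = rmsd(predicted, reference)
print(score)  # 1.0
```

Lower values mean the predicted structure sits closer to the reference, and a perfect prediction scores 0.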

In terms of quality, a measure of how closely the predicted protein complexes match the actual ones, EquiDock was often comparable to the baselines, but it sometimes underperformed them.

“We are still behind one of the baselines. Our method can still be improved, and it is still useful. It could be used in very large virtual screens to understand how thousands of proteins can interact and form complexes. Our method could be used to generate an initial set of candidates very quickly, which could then be fine-tuned using more accurate but slower existing methods,” Ganea says.

In addition to using this approach with conventional models, the team wants to incorporate specific atomic interactions into EquiDock so it can make more accurate predictions. For instance, atoms in proteins sometimes attach through hydrophobic interactions, which involve water molecules.

Their approach could also be applied to the development of small, drug-like molecules, Ganea says. These molecules bind to protein surfaces in specific ways, so rapidly determining how that attachment occurs could shorten the drug development timeline.

In the future, they plan to enhance EquiDock so it can make predictions for flexible protein docking. The biggest hurdle there is a lack of training data, so Ganea and his colleagues are working to generate synthetic data they could use to improve the model.

This work was funded, in part, by the Machine Learning for Pharmaceutical Discovery and Synthesis consortium, the Swiss National Science Foundation, the Abdul Latif Jameel Clinic for Machine Learning in Health, the DTRA Discovery of Medical Countermeasures Against New and Emerging (DOMANE) threats program, and the DARPA Accelerated Molecular Discovery program.

 
