Antibodies, small proteins produced by the immune system, can attach to specific parts of a virus and neutralize it. As scientists continue to battle SARS-CoV-2, the virus that causes Covid-19, one possible weapon is a synthetic antibody that binds to the virus’s spike protein and prevents the virus from entering human cells.
To develop a successful synthetic antibody, researchers need to understand exactly how this binding occurs. Proteins, with bumpy 3D structures that contain many folds, can stick together in millions of combinations, so finding the right protein complex among the almost countless candidates is extremely time-consuming.
To streamline the process, MIT researchers have created a machine learning model that can directly predict the complex formed when two proteins bind to each other. Their technique is between 80 and 500 times faster than state-of-the-art software methods and often predicts protein structures that are closer to those observed experimentally.
This technique may help scientists better understand some biological processes that involve protein interactions, such as DNA replication and repair. It also has the potential to speed up the process of developing new drugs.
“Deep learning is very good at capturing interactions between different proteins that are difficult for chemists and biologists to explain experimentally. Some of these interactions are very complex, and people haven’t found a good way to express them. This deep learning model can learn these types of interactions from data,” says Octavian-Eugen Ganea, a postdoc in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL).
Protein attachment
The model the researchers developed, called Equidock, focuses on rigid body docking, which occurs when two proteins attach by rotating or translating in 3D space while their shapes do not compress or bend.
The model takes the 3D structures of the two proteins and converts them into 3D graphs that can be processed by a neural network. Proteins are made up of chains of amino acids, and each amino acid is represented by a node in the graph.
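To make the graph idea concrete, here is a minimal sketch of turning residue coordinates into a graph with one node per amino acid. The article does not say how edges are chosen, so linking each residue to its k nearest neighbors is an assumption made for illustration, and the function name `build_residue_graph` and the coordinates are hypothetical.

```python
import math

def build_residue_graph(coords, k=2):
    """Return {node: [neighbor indices]}, linking each residue to its k nearest residues.

    The k-nearest-neighbor rule is an illustrative assumption, not a detail
    taken from the article.
    """
    graph = {}
    for i, ci in enumerate(coords):
        # Distance from residue i to every other residue, paired with its index.
        dists = [(math.dist(ci, cj), j) for j, cj in enumerate(coords) if j != i]
        dists.sort()
        graph[i] = [j for _, j in dists[:k]]
    return graph

# Hypothetical toy "protein": one made-up 3D coordinate per amino acid.
residues = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
graph = build_residue_graph(residues, k=2)
print(graph)
```

Each key in the resulting dictionary is a node (an amino acid) and each value lists the nodes it is connected to, which is the kind of structure a graph neural network consumes.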
The researchers built geometric knowledge into the model so that it understands how objects change when they are rotated or translated in 3D space. The model also has mathematical knowledge built in that ensures the proteins always attach in the same way, no matter where they sit in 3D space. This is how proteins dock in the human body.
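One simple way to see why geometric knowledge helps: quantities such as the distances between points do not change when a rigid object is rotated or translated, so a model built on them gives the same answer wherever the protein sits. The sketch below only illustrates that property. It is not Equidock's architecture, and the helper names and coordinates are hypothetical.

```python
import math

def rigid_transform(points, angle, shift):
    """Rotate points about the z-axis by `angle` radians, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in points
    ]

def pairwise_distances(points):
    """Distances between every unordered pair of points, in a fixed order."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]

# Made-up coordinates standing in for three atoms of a rigid protein.
points = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 2.0, 0.5)]
moved = rigid_transform(points, angle=0.7, shift=(4.0, -2.0, 3.0))

before = pairwise_distances(points)
after = pairwise_distances(moved)
# True when the internal geometry is unchanged by the rigid motion.
unchanged = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(before, after))
print(unchanged)
```

The coordinates themselves change after the move, but the internal distances do not, which is what it means for the shape to neither compress nor bend.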
Using this information, the machine learning system identifies the atoms of the two proteins that are most likely to interact and react chemically, known as binding pocket points. It then uses these points to place the two proteins together into a complex.
“If we can understand from the proteins which parts are likely to be their binding pocket points, we have all the information we need to put the two proteins together. Assuming we can find these two sets of points, we can then work out how to rotate and shift the proteins so that one set matches the other,” Ganea explains.
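The last step Ganea describes, finding the rotation and shift that map one set of points onto the other, is a classic problem. A standard solution is the Kabsch algorithm, sketched below with NumPy. The article does not name the method Equidock uses, so treat this as an illustration of the idea under that assumption. The point sets are made up, and the two sets are assumed to be already matched one-to-one.

```python
import numpy as np

def kabsch(P, Q):
    """Return rotation R and translation t minimizing ||(P @ R.T + t) - Q||.

    P and Q are (N, 3) arrays of corresponding points.
    """
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)        # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # -1 would mean a reflection
    D = np.diag([1.0, 1.0, d])               # flip one axis to keep a proper rotation
    R = Vt.T @ D @ U.T
    t = q_mean - R @ p_mean
    return R, t

# Hypothetical binding pocket points on protein A (made-up coordinates).
P = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])

# Build the matching points on protein B by applying a known rigid motion.
theta = 0.5
R_true = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0,            0.0,           1.0],
])
t_true = np.array([1.0, -2.0, 0.5])
Q = P @ R_true.T + t_true

R, t = kabsch(P, Q)
aligned = P @ R.T + t   # protein A's points moved onto protein B's points
print(np.allclose(aligned, Q))
```

Because the rotation and translation come from a closed-form computation rather than a search over candidate poses, a step like this is cheap once the two point sets are known.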
One of the biggest challenges in building this model was overcoming the lack of training data. Because so little experimental 3D data for proteins exists, Ganea says it was especially important to incorporate geometric knowledge into Equidock. Without these geometric constraints, the model might pick up spurious correlations in the dataset.
Seconds vs. hours
After training the model, the researchers compared it with four software methods. Equidock can predict the final protein complex in one to five seconds. All the baselines took much longer, from 10 minutes to an hour or more.