EPFL researchers have created a novel AI-driven model that predicts protein sequences from backbone scaffolds while accounting for complex molecular environments. The model promises substantial advances in protein engineering and its applications across fields including medicine and biotechnology.
Designing proteins to perform specific functions requires understanding and manipulating their sequences and structures. This task is critical for developing targeted therapies for diseases and producing enzymes for industrial use.
One of the main challenges in protein engineering is designing proteins “de novo,” or from scratch, so that their properties are tailored to specific functions. Success here has significant implications for biology, medicine, and materials science. For example, engineered proteins can target specific disorders with high precision, providing a competitive alternative to conventional small-molecule drugs.
Moreover, custom-designed enzymes, which act as catalysts, can facilitate reactions that are rare or nonexistent in nature. This ability is particularly useful in the pharmaceutical industry for synthesizing complex drug molecules, and in environmental technology for breaking down plastics or pollutants more efficiently.
A team of researchers at EPFL led by Matteo Dal Peraro has now developed CARBonAra (Context-aware Amino acid Recovery from Backbone Atoms and Heteroatoms), an artificial intelligence (AI)-driven model that predicts protein sequences while accounting for the constraints imposed by different molecular environments.
CARBonAra was trained on about 370,000 subunits from the Protein Data Bank (PDB), with an additional 100,000 used for validation and 70,000 for testing. The research is published in Nature Communications.
CARBonAra builds on the architecture of the Protein Structure Transformer (PeSTo) framework, which was also developed by Lucien Krapp in Dal Peraro’s team. It uses geometric transformers, deep learning models that process spatial relationships between points such as atomic coordinates, to learn and predict complex structures.
CARBonAra predicts amino acid sequences from backbone scaffolds, the structural frameworks of protein molecules. One of its most notable features, however, is its context awareness, which is particularly evident in how it improves sequence recovery rates: the percentage of amino acids correctly predicted at each position in a protein sequence compared with a known reference sequence.
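To make the recovery metric concrete, the short Python sketch below computes it for a pair of aligned sequences. It is only an illustration of the definition given above, not the team's evaluation code; the function name `sequence_recovery` and the two example sequences are hypothetical, and the sketch assumes both sequences have the same length and are already aligned position by position.

```python
def sequence_recovery(predicted: str, reference: str) -> float:
    """Return the fraction of positions where the predicted residue
    matches the reference residue (one-letter amino acid codes)."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    if not reference:
        raise ValueError("sequences must not be empty")
    # Count identical residues position by position.
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Hypothetical 10-residue example: positions 6 and 10 differ.
reference = "MKTAYIAKQR"
predicted = "MKTAYLAKQK"
print(f"Sequence recovery: {sequence_recovery(predicted, reference):.0%}")
```

In this made-up example, 8 of the 10 positions match, so the recovery rate is 80%.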
CARBonAra’s recovery rates increased markedly when it was given molecular “contexts” such as protein interfaces with nucleic acids, lipids, ions, or other proteins. Dal Peraro notes that this is because the model is trained on a wide variety of molecules and relies only on atomic coordinates, so it can handle more than just proteins. This versatility improves the model’s predictive power and its suitability for the complex biological systems found in real life.
The model does not only perform well on synthetic benchmarks; it has also been validated experimentally. Using CARBonAra, the researchers designed new variants of the TEM-1 β-lactamase enzyme, which is involved in the development of antimicrobial resistance.
Some of the predicted sequences, which differ from the wild-type sequence by around 50%, folded correctly and retained some catalytic activity at high temperatures at which the wild-type enzyme is already inactive.
CARBonAra’s precision and flexibility open new opportunities for protein engineering. Because it can account for complex molecular environments, it is a useful tool for designing proteins with specific functions and could improve future drug discovery campaigns. Furthermore, its success in enzyme engineering demonstrates its potential for both scientific research and industrial applications.