Machine learning discovers new sequences to boost drug delivery

Duchenne muscular dystrophy (DMD), a rare genetic condition usually diagnosed in young boys, gradually weakens muscles throughout the body until the heart or lungs fail. Symptoms usually appear by age 5; as the disease progresses, patients lose the ability to walk around the age of 12. Today, the average life expectancy of DMD patients is around 26.

It was big news in 2019 when Sarepta Therapeutics, based in Cambridge, Massachusetts, announced a revolutionary drug that directly targets the mutated gene responsible for DMD. The therapy uses antisense phosphorodiamidate morpholino oligomers (PMO), a large synthetic molecule that permeates the cell nucleus in order to modify the dystrophin gene, which enables the production of a key protein that is normally lacking in DMD patients. “But there’s a problem with PMO itself. It’s not very good at penetrating cells,” says Carly Schissel, a PhD student in the MIT Department of Chemistry.

To increase delivery to the nucleus, researchers can attach cell-penetrating peptides (CPPs) to the drug, helping it cross cell and nuclear membranes to reach its destination. Which peptide sequence is best for the job, however, has remained a looming question.

Researchers at MIT have developed a systematic approach to solving this problem by combining experimental chemistry with artificial intelligence to discover non-toxic, highly active peptides that can be attached to PMO to aid delivery. By developing these new sequences, they hope to rapidly accelerate the development of gene therapies for DMD and other diseases.

The results of their study have been published in the journal Nature Chemistry in a paper led by Schissel and Somesh Mohapatra, a PhD student in MIT’s Department of Materials Science and Engineering. Rafael Gomez-Bombarelli, assistant professor of materials science and engineering, and Bradley Pentelute, professor of chemistry, are the paper’s senior authors. Other authors are Justin Wolfe, Colin Fadzen, Kamela Bellovoda, Chia-Ling Wu, Jenna Wood, Annika Malmberg and Andrei Loas.

“Using a computer to propose new peptides is not very difficult. To judge whether they are good or not is the difficult thing,” says Gomez-Bombarelli. “The key innovation is to use machine learning to link the sequence of a peptide, particularly a peptide that contains unnatural amino acids, to experimentally measured biological activity.”

Dream data

CPPs are relatively short chains made up of five to 20 amino acids. While one CPP can have a positive impact on drug delivery, several linked together have a synergistic effect in carrying drugs over the finish line. These longer chains, which contain 30 to 80 amino acids, are called miniproteins.

Before a model could make meaningful predictions, experimental researchers had to create a robust data set. By mixing and matching 57 different peptides, Schissel and her colleagues built a library of 600 miniproteins, each of which was attached to PMO. Using an assay, the team quantified how well each miniprotein could move its cargo into the cell.

The decision to test the activity of each sequence with the PMO already attached was an important one. Because any given drug is likely to alter the activity of a CPP sequence, it is difficult to reuse existing data. Data generated in a single laboratory, on the same machines, by the same people, meet a gold standard for consistency in machine learning data sets.

One of the goals of the project was to develop a model that could work with any amino acid. While only 20 amino acids occur naturally in the human body, hundreds more exist elsewhere, like an amino acid expansion pack for drug development. To represent them in a machine learning model, researchers typically use one-hot encoding, a method that maps each component to a series of binary variables. For example, three amino acids would be represented as 100, 010 and 001. To add new amino acids, the number of variables would have to increase, which means that the researchers would be forced to rebuild their model with each addition.
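
The limitation of one-hot encoding described above can be illustrated with a minimal Python sketch. The helper `one_hot_table` and the placeholder residue `"X"` are hypothetical names chosen for illustration; they are not taken from the published model.

```python
def one_hot_table(alphabet):
    """Map each symbol to a binary vector containing a single 1."""
    size = len(alphabet)
    return {
        symbol: [1 if i == j else 0 for j in range(size)]
        for i, symbol in enumerate(alphabet)
    }


# Three amino acids (one-letter codes) give vectors 100, 010 and 001.
three = one_hot_table(["A", "R", "K"])
for symbol, vector in three.items():
    print(symbol, "".join(str(bit) for bit in vector))

# Adding one more residue ("X" is a hypothetical unnatural amino acid)
# lengthens every vector, so a model built for 3 inputs no longer fits.
four = one_hot_table(["A", "R", "K", "X"])
print(len(three["A"]), len(four["A"]))
```

Because the vector length is tied to the size of the alphabet, every new amino acid changes the shape of the model's input, which is the rebuild problem the article describes.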

Instead, the team decided to represent the amino acids with topological fingerprinting, which essentially creates a unique barcode for each sequence, with each line in the barcode indicating the presence or absence of a particular molecular substructure. “Even if the model has not seen [a sequence] before, we can represent it as a barcode that is consistent with the rules the model has seen,” says Mohapatra, who led the project’s development efforts. This system of representation allowed the researchers to expand their toolbox of possible sequences.
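
To give a rough sense of the barcode idea, here is a toy Python sketch. It is only an analogy and not the authors' method: real topological fingerprints are computed on the molecular graph, usually with a cheminformatics toolkit, whereas this sketch simply hashes short text fragments of a SMILES string into a fixed number of bits. The function name `toy_fingerprint`, the 64-bit length and the fragment size are all assumptions made for illustration.

```python
import hashlib

N_BITS = 64  # fixed barcode length; an arbitrary choice for this sketch


def toy_fingerprint(smiles, max_len=3, n_bits=N_BITS):
    """Hash every text fragment of up to max_len characters into a bit vector.

    Each bit records the presence (1) or absence (0) of at least one
    fragment that hashes to that position.
    """
    bits = [0] * n_bits
    for size in range(1, max_len + 1):
        for start in range(len(smiles) - size + 1):
            fragment = smiles[start:start + size]
            digest = hashlib.sha256(fragment.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % n_bits
            bits[index] = 1
    return bits


# Two natural amino acids and one non-proteinogenic one, written as SMILES.
glycine = toy_fingerprint("NCC(=O)O")
alanine = toy_fingerprint("CC(N)C(=O)O")
beta_alanine = toy_fingerprint("NCCC(=O)O")

# Every barcode has the same length, whatever molecule it came from.
print(len(glycine), len(alanine), len(beta_alanine))
```

The point of the sketch is the fixed length: a residue the model has never seen still produces a vector of the same shape, so the model's input layer does not need to change.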

The team trained a convolutional neural network on the miniprotein library, with each of the 600 miniproteins labeled with its activity, a measure of its ability to enter the cell. Initially, the model proposed miniproteins loaded with arginine, an amino acid that pokes holes in the cell membrane, which is not ideal for keeping cells alive. To solve this problem, the researchers used an optimizer to disincentivize arginine, preventing the model from cheating.
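
One simple way to discourage a residue during sequence selection is to subtract a penalty from the predicted activity. The Python sketch below shows that general idea only; the article does not describe the actual optimizer, so `toy_predictor`, `penalized_score`, the penalty weight and the candidate sequences are all hypothetical stand-ins.

```python
def arginine_fraction(sequence):
    """Fraction of residues in the sequence that are arginine (R)."""
    return sequence.count("R") / len(sequence) if sequence else 0.0


def toy_predictor(sequence):
    """Hypothetical stand-in for a trained model: rewards R and K residues."""
    return float(sum(1 for residue in sequence if residue in "RK"))


def penalized_score(sequence, predict_activity, penalty_weight=5.0):
    """Predicted activity minus a penalty proportional to arginine content."""
    return predict_activity(sequence) - penalty_weight * arginine_fraction(sequence)


# Hypothetical candidate sequences in one-letter amino acid code.
candidates = ["RRRRRRRR", "KRKLKAKR", "GAGAGAGA"]

# Without the penalty, the all-arginine sequence looks best.
best_raw = max(candidates, key=toy_predictor)

# With the penalty, a sequence with less arginine comes out ahead.
best_penalized = max(candidates, key=lambda s: penalized_score(s, toy_predictor))

print(best_raw, best_penalized)
```

The penalty weight controls the trade-off: a larger value pushes the search further away from arginine-rich sequences, at the cost of some predicted activity.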

In the end, being able to interpret the predictions suggested by the model was key. “Usually it is not enough to have a black box, because the models could be fixating on something that is incorrect or exploiting a phenomenon imperfectly,” says Gomez-Bombarelli.

In this case, the researchers could overlay the predictions generated by the model with the barcode that represents the structure of the sequence. “This highlights certain regions that the model believes play the most important role in the high activity,” says Schissel. “It’s not perfect, but it gives you focused regions to play with. This information would definitely help us to empirically design new sequences in the future.”

Delivery boost

Ultimately, the machine learning model suggested sequences that were more effective than any previously known variant. One in particular can increase PMO delivery 50-fold. By injecting these computer-suggested sequences into mice, the researchers confirmed their predictions and showed that the miniproteins are non-toxic.

It’s too early to say how this work will affect patients in the future, but better PMO delivery will be beneficial in several ways. If patients are exposed to lower levels of the drug, for example, they may experience fewer side effects or require less frequent doses (PMO is given intravenously, often weekly). Treatment may also become cheaper. As a proof of concept, recent clinical studies showed that a proprietary CPP from Sarepta Therapeutics could reduce PMO exposure 10-fold. PMO isn’t the only drug that stands to be improved by miniproteins. In further experiments, the miniproteins produced by the model transported other functional proteins into the cell.

After noticing a disconnect between the work of machine learning researchers and experimental chemists, Mohapatra posted the model on GitHub, along with a tutorial for experimentalists who have their own list of sequences and activities. He notes that to date, more than a dozen people around the world have adopted the model and reused it to make their own meaningful predictions for a variety of drugs.
