Proteomics is a field of study that deals with the analysis of the protein component of a cell or a tissue under a set of defined conditions. It is used to detect protein expression patterns under a particular stimulus and determine the functional protein networks at a cell or tissue level. Proteomics has major applications in medicine and drug development.
Over time, proteomics has grown into a leading method for identifying and characterising proteins, thanks to the copious amount of genomic sequence data available today. Developments in mass spectrometry, protein fractionation techniques and bioinformatics have taken proteomics to the next level.
Proteomics involves:
- A method to fractionate or separate complex protein or peptide mixtures
- Using mass spectrometry to acquire data that is necessary for identifying individual proteins
- Bioinformatics for analysing and assembling the mass spectrometry data
Mass Spectrometry-Based Proteomics
The field of ‘omics’, which includes genomics, proteomics, and metabolomics, has been a game-changer in personalised medicine and healthcare. In the case of proteomics (the large-scale study of the proteins that genes encode), protein profiling through novel biochemical mass spectrometric methods can help identify and classify thousands of proteins.
Mass spectrometry is an analytical method for characterising biological samples. Its targeted, nontargeted, and high-throughput capabilities make it a highly preferred method in proteomics. Mass spectrometry generates large datasets, which require informatics approaches such as machine learning to analyse and interpret. Machine learning techniques can be applied in two ways, as noted in the paper titled ‘Application of machine learning to proteomics data: classification and biomarker identification in postgenomics biology’:
- Directly on the mass spectral peaks
- On the proteins identified by sequence database searching
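The first route, learning directly on the mass spectral peaks, can be illustrated with a minimal sketch. It is not the method from the cited paper: it assumes a simple nearest-centroid classifier, a coarse fixed binning of the m/z axis, and made-up toy spectra with hypothetical "control" and "disease" labels. All function names and parameters are illustrative.

```python
from collections import defaultdict
from math import sqrt


def bin_spectrum(peaks, mz_min=0.0, mz_max=2000.0, bin_width=100.0):
    """Convert (m/z, intensity) peaks into a fixed-length vector of
    summed intensities per m/z bin, normalised to sum to 1."""
    n_bins = int((mz_max - mz_min) / bin_width)
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        if mz_min <= mz < mz_max:
            vec[int((mz - mz_min) / bin_width)] += intensity
    total = sum(vec)
    return [v / total for v in vec] if total > 0 else vec


def fit_centroids(vectors, labels):
    """Average the feature vectors of each class (nearest-centroid training)."""
    groups = defaultdict(list)
    for vec, label in zip(vectors, labels):
        groups[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }


def predict(centroids, vec):
    """Return the label whose centroid is closest in Euclidean distance."""
    def dist(a, b):
        return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(centroids, key=lambda label: dist(centroids[label], vec))


# Toy, made-up spectra: the hypothetical "disease" class has an extra peak near m/z 830.
training = [
    ([(450.2, 100.0), (1200.7, 80.0)], "control"),
    ([(451.0, 90.0), (1201.1, 85.0)], "control"),
    ([(450.5, 60.0), (830.4, 120.0), (1200.9, 40.0)], "disease"),
    ([(449.8, 55.0), (831.0, 130.0), (1201.3, 35.0)], "disease"),
]
vectors = [bin_spectrum(peaks) for peaks, _ in training]
labels = [label for _, label in training]
centroids = fit_centroids(vectors, labels)

unknown = [(450.1, 50.0), (830.9, 110.0), (1200.2, 45.0)]
predicted = predict(centroids, bin_spectrum(unknown))
```

Real studies typically work with thousands of peaks per spectrum, careful preprocessing, and more capable classifiers. The sketch only shows how raw peaks become feature vectors that a learning algorithm can consume.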
A huge range of proteins can be identified from the analysed samples using mass spectrometry and machine learning techniques. These techniques have been pivotal in biomarker discovery for different types of diseases. The approach has an advantage over two-dimensional gel electrophoresis, enzyme-linked immunosorbent assays (ELISAs), protein arrays, affinity separation and similar methods.
Recent Developments in the Use of AI/ML in Proteomics
Mass spectrometry, in its conventional form, poses some challenges in effectively and correctly recognising protein patterns. The technique doesn’t measure proteins directly. Instead, it analyses smaller fragments (peptides) consisting of amino acid sequences with up to 30 building blocks. The measured spectra of these sequences are then compared with a database and assigned to specific proteins. Since the evaluation software uses only a part of the spectra for this comparison, certain proteins are not recognised correctly or completely.
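The database comparison step can be sketched in a heavily simplified form. The example below assumes matching on peptide mass alone (real search software compares fragment spectra), simple trypsin-style cleavage rules, a 10 ppm tolerance, and a two-entry toy database with made-up sequences. The residue masses are standard monoisotopic values; every function and protein name is hypothetical.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the summed residues of a peptide


def digest(protein, max_len=30):
    """Cleave after K or R unless followed by P (trypsin-style rule),
    keeping peptides of at most max_len residues."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        at_end = i == len(protein) - 1
        if at_end or (aa in "KR" and protein[i + 1] != "P"):
            peptide = protein[start:i + 1]
            if len(peptide) <= max_len:
                peptides.append(peptide)
            start = i + 1
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of a neutral peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def build_index(database):
    """List (mass, peptide, protein name) for every digested peptide."""
    return [
        (peptide_mass(pep), pep, name)
        for name, seq in database.items()
        for pep in digest(seq)
    ]


def assign(observed_mass, index, tol_ppm=10.0):
    """Return (protein, peptide) pairs whose mass lies within tol_ppm."""
    return sorted(
        {(name, pep) for mass, pep, name in index
         if abs(mass - observed_mass) / mass * 1e6 <= tol_ppm}
    )


# Toy database with made-up sequences, for illustration only.
database = {"PROT_A": "MKTAYIAKQR", "PROT_B": "GASPVKLLER"}
index = build_index(database)
matches = assign(665.3748, index)
```

Because only a peptide-level measurement is matched, a mass that falls outside the tolerance or a peptide missing from the database leaves the parent protein unassigned, which mirrors the recognition gaps described above.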
Apart from optimising mass spectrometry for proteomics, AI is also useful in speeding up the analysis of massive datasets. Conventional methods such as microscopy and fluorescence resonance energy transfer (FRET) techniques call for a high level of expertise. Researchers from the Novo Nordisk Foundation Center for Protein Research and the Niels Bohr Institute have developed a machine learning algorithm that quickly recognises protein patterns, allowing data sets to be classified in mere seconds.
Wrapping Up
The study of proteomics is crucial for the early diagnosis, prognosis, and monitoring of diseases as serious as cancer. It also plays a vital role in drug development. One major challenge in this field is that the proteome, the set of proteins in a cell, tissue or organism, fluctuates over time. In such cases, artificial intelligence and machine learning techniques can prove helpful for quick and accurate protein pattern recognition and classification.