In a preprint article published to Preprints with The Lancet*, a group of researchers assessed prediction algorithms that combine clinical data with metatranscriptomic profiling of the lower respiratory tract microbiome and the host to diagnose lower respiratory tract infections.
Their findings suggest that machine-learning models could become a rapid diagnostic tool, helping to reduce the morbidity and mortality linked to slow traditional microbiological tests.
LRTI diagnosis
Lower respiratory tract infections (LRTIs) kill more than 3 million people each year, making them one of the most common infectious causes of death worldwide. Shortcomings in traditional diagnosis have long been blamed for much of the morbidity and mortality of LRTIs: conventional methods are insensitive, failing to detect 60–70% of underlying causes, and take 24–48 hours or longer to characterize an infection.
Asthma, chronic obstructive pulmonary disease (COPD), and cystic fibrosis are among the non-infectious diseases whose symptoms frequently overlap with those of LRTIs. Clinicians may therefore delay a diagnosis rather than risk a misdiagnosis, even though both delay and misdiagnosis can be fatal.
Recent research challenges the conventional understanding of LRTI aetiology, which holds that the lungs are sterile and that infection occurs only when enough pathogenic bacteria reach them to overwhelm the immune system and allow the infection to spread quickly.
A growing body of microbial genomics studies suggests that low microbial species diversity, high total microbial biomass, and the host inflammatory response are all associated with the development of LRTIs.
Changes in the respiratory tract microbiome have also been noted in non-infectious conditions such as asthma, underscoring the value of microbiome studies for detecting and characterizing LRTIs. Metagenomic next-generation sequencing (mNGS), an emerging technique, is being investigated as a rapid and sensitive alternative to conventional diagnostic methods.
mNGS needs only microlitre volumes of patient sample and can return results faster than standard diagnostic methods, which can take days to produce a diagnosis.
About the study
In the current preprint, the researchers compiled clinical data and correlated them with transcriptional profiles of the host and the respiratory microbiota. They then used the compiled data to train a machine-learning model and tested its diagnostic speed and accuracy.
Between May 2020 and January 2021, researchers at Peking University People’s Hospital in Beijing enrolled individuals with suspected LRTIs. After screening on radiographic findings, clinical presentation, and demographic characteristics according to US Centers for Disease Control and Prevention/National Healthcare Safety Network (CDC/NHSN) criteria, 136 patients were selected for the study.
All participants underwent standard microbiological and serological testing for LRTI. The researchers also collected bronchoalveolar lavage fluid (BALF) for model development and training, and sequenced DNA and RNA from these samples. RNA reads were checked against the human transcriptome and the SILVA rRNA database so that the remaining reads could be confidently attributed to the lung microbiome.
The relationship between the host transcriptome and the microbiome was established by comparing host gene expression, measured in transcripts per million (TPM), with the relative abundance of microbial taxa. These data were then used to train machine-learning models.
The researchers evaluated 11 distinguishing factors drawn from clinical signs, microbial abundance, and host gene upregulation (in TPM). The algorithm was trained on data from 91 participants and tested on data from the remaining 45.
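To make the two quantities being compared concrete, here is a minimal sketch of how TPM and relative abundance are usually computed. It follows the standard TPM definition (reads per kilobase, rescaled to sum to one million per sample); the function names and inputs are illustrative assumptions, not the authors' pipeline.

```python
# Minimal sketch: TPM normalization for host genes and relative abundance
# for microbial taxa. Function names and inputs are hypothetical.

def tpm(read_counts, lengths_bp):
    """Convert raw read counts per gene to transcripts per million (TPM)."""
    if len(read_counts) != len(lengths_bp):
        raise ValueError("read_counts and lengths_bp must have the same length")
    # Step 1: reads per kilobase, which corrects for gene length.
    rpk = [count / (length / 1000) for count, length in zip(read_counts, lengths_bp)]
    total_rpk = sum(rpk)
    if total_rpk == 0:
        return [0.0] * len(rpk)
    # Step 2: rescale so that all values in this sample sum to one million.
    return [value / total_rpk * 1_000_000 for value in rpk]


def relative_abundance(taxon_counts):
    """Fraction of microbial reads assigned to each taxon (sums to 1)."""
    total = sum(taxon_counts)
    if total == 0:
        return [0.0] * len(taxon_counts)
    return [count / total for count in taxon_counts]
```

For example, `tpm([10, 20, 30], [1000, 2000, 3000])` gives all three genes the same TPM, because each has 10 reads per kilobase; the longer genes' extra reads reflect their length, not higher expression.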
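The summary does not say how the 91/45 split was drawn. As an illustration only, the sketch below assumes a random hold-out split stratified by diagnosis, so that the LRTI and non-LRTI groups appear in similar proportions in the training and test sets. The function name and seed are hypothetical.

```python
import random


def stratified_split(labels, n_test, seed=0):
    """Hold out roughly n_test samples, preserving class proportions.

    Assumption: a simple stratified random split; per-class test counts are
    rounded, so the total can differ slightly from n_test for other inputs.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    total = len(labels)
    train, test = [], []
    for _, indices in sorted(by_class.items()):
        rng.shuffle(indices)
        k = round(len(indices) * n_test / total)  # this class's share of the test set
        test.extend(indices[:k])
        train.extend(indices[k:])
    return sorted(train), sorted(test)
```

With the cohort sizes reported in the study (81 LRTI, 55 non-LRTI), `stratified_split([1] * 81 + [0] * 55, 45)` yields 91 training and 45 test indices, 27 of which are LRTI cases in the test set.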
Study findings
Of the 136 enrolled patients, 81 were diagnosed with LRTIs and formed the LRTI cohort, while the remaining 55 made up the non-LRTI cohort. Prior antibiotic use was much more common in the LRTI cohort than in the non-LRTI cohort.
Notably, laboratory results, including white blood cell (WBC) count and inflammatory markers, did not differ between the two groups. This highlights the limited ability of traditional diagnostic tests to tell the groups apart.
BALF microbiome diversity was significantly lower in patients with LRTIs than in those without. Comparing relative microbial abundance between the groups showed that pathogens including Klebsiella pneumoniae, Stenotrophomonas maltophilia, Pseudomonas aeruginosa, and Streptococcus pneumoniae were present at high levels in BALF from the LRTI group.
Halomonas pacifica, a symbiont typically found in healthy lungs and respiratory tracts, was most abundant in BALF from the non-LRTI group. In that group, the pathogens seen in LRTI samples were either absent or present only at very low levels.
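The summary does not name the diversity metric used. A common choice for microbiome data is the Shannon index, H' = -Σ pᵢ ln pᵢ; the sketch below assumes that index to show why a sample dominated by one pathogen scores lower than an evenly mixed community. The counts are illustrative, not study data.

```python
import math


def shannon_diversity(abundances):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with nonzero abundance."""
    total = sum(abundances)
    if total <= 0:
        return 0.0
    h = 0.0
    for a in abundances:
        if a > 0:
            p = a / total
            h -= p * math.log(p)
    return h
```

An even four-taxon community such as `[25, 25, 25, 25]` scores ln 4 (about 1.39), while a sample dominated by a single organism, such as `[97, 1, 1, 1]`, scores roughly 0.17, mirroring the reduced diversity described in LRTI samples.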
Transcriptome analysis identified 674 differentially expressed genes (DEGs): compared with the non-LRTI group, 613 were up-regulated and 61 down-regulated in the LRTI cohort. A KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway analysis linked the up-regulated DEGs to pathogen-infection pathways.
Correlation analysis between host transcriptomes and the microbiota identified 31 host genes whose expression levels varied with the ratio of normal lower respiratory tract flora to pathogenic microorganisms.
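Splitting DEGs into up- and down-regulated sets typically comes down to thresholds on fold change and adjusted significance. The cut-offs below (|log2 fold change| ≥ 1, adjusted p < 0.05) are common conventions assumed for illustration; the summary does not report the thresholds the authors used, and the gene names are placeholders.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str
    log2_fold_change: float  # LRTI relative to non-LRTI
    adj_p: float             # multiple-testing-adjusted p-value


def classify_degs(results, lfc_cutoff=1.0, p_cutoff=0.05):
    """Return (up_regulated, down_regulated) gene names under assumed cut-offs."""
    up, down = [], []
    for r in results:
        # Skip genes that are not significant or change too little.
        if r.adj_p >= p_cutoff or abs(r.log2_fold_change) < lfc_cutoff:
            continue
        (up if r.log2_fold_change > 0 else down).append(r.gene)
    return up, down
```

Applied to a full results table, `len(up)` and `len(down)` would give counts analogous to the 613 and 61 reported here.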
The researchers then trained a random forest model to predict LRTIs using 70 features (11 clinical, 39 lung microbiome, and 20 host response). The model achieved a diagnostic accuracy of 88.2% and could return results within hours, both notable improvements over conventional diagnostic methods.
The main limitations of the approach are the current high cost and technical demands of mNGS. Moreover, although these machine-learning models could serve as LRTI diagnostic tools, they do not explain the biological mechanisms behind the observed interactions between the microbiota and the host transcriptome.
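Accuracy counts correct calls in both groups, so it is not the same as the share of LRTI patients the model catches (sensitivity). The sketch below computes the standard metrics from a confusion matrix; the counts in the usage line are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts.

    tp: LRTI patients called LRTI      fn: LRTI patients missed
    tn: non-LRTI patients called clear fp: non-LRTI patients called LRTI
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,    # correct calls in both groups
        "sensitivity": tp / (tp + fn),    # share of true LRTI cases detected
        "specificity": tn / (tn + fp),    # share of non-LRTI cases cleared
    }
```

With hypothetical counts `diagnostic_metrics(tp=9, fp=4, tn=6, fn=1)`, accuracy is 0.75 while sensitivity is 0.9, which shows why the two figures should not be used interchangeably.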
Conclusions
This preliminary study offers a new approach to diagnosing lower respiratory tract infections. Conventional diagnosis of LRTIs can take several days and fails to detect more than 60% of underlying pathogens, leading to misclassification and delayed treatment that substantially increase morbidity and mortality.
The researchers characterised microbial abundance in the LRTI and non-LRTI cohorts and combined these findings with host transcriptome and response data. They used these data to train machine-learning models that diagnosed LRTI with 88.2% accuracy, in a fraction of the time required by traditional methods.
If these findings are confirmed through peer review and the method’s high cost can be reduced, clinicians may in future be able to identify LRTIs quickly and reliably, lowering the high mortality associated with the condition.