Machine learning can reduce the burden of disease phenotyping and make EHR data easier to use for clinical research, according to a new study.
Machine learning systems can make EHR data easier to work with and reduce the burden of disease phenotyping in clinical research, according to a recent Mount Sinai study published in the journal Patterns. The machine learning-based algorithm identified patients with conditions such as dementia, sickle cell anemia, and multiple sclerosis as accurately as the standard set of disease phenotyping algorithms.
“There continues to be an explosion in the amount and types of data electronically stored in a patient’s medical record,” Benjamin S. Glicksberg, PhD, a senior author of the study, said in a press release. “Disentangling this complex web of data can be highly burdensome, thus slowing advancements in clinical research.”
“In this study, we created a new method for mining data from electronic health records with machine learning that is faster and less labor intensive than the industry standard,” continued Glicksberg, an assistant professor of genetics and genomic sciences and a member of the Hasso Plattner Institute for Digital Health at Mount Sinai (HPIMS).
Clinical researchers currently use a standard set of disease phenotyping algorithms maintained by a system called the Phenotype Knowledgebase (PheKB). The study's authors found that although implementing a PheKB algorithm in a new data set is effective, it is time-consuming and requires the data to include specific variables, such as particular clinical or laboratory information. PheKB algorithms also have limited scalability because each one is built from expert knowledge, one disease at a time. To develop a new algorithm for a disease, researchers must manually search the EHR for data related to that disease and then program an algorithm to identify patients with those disease-specific data.
Researchers at Mount Sinai used machine learning to automate disease phenotyping and save clinical researchers time and effort. The team's new method, Phe2vec, builds on their earlier studies.
“Previously, we showed that unsupervised machine learning could be a highly efficient and effective strategy for mining electronic health records,” explained Riccardo Miotto, PhD, a former assistant professor at the HPIMS and a senior author of the study.
“The potential advantage of our approach is that it learns representations of diseases from the data itself,” Miotto continued. “Therefore, the machine does much of the work experts would normally do to define the combination of data elements from health records that best describes a particular disease.”
Glicksberg noted that the study’s promising results suggest the algorithm could be used for large-scale phenotyping of diseases in EHR data.
“With further testing and refinement, we hope that it could be used to automate many of the initial steps of clinical informatics research, thus allowing scientists to focus their efforts on downstream analyses like predictive modeling,” he said. “We hope that this will be a valuable tool that will facilitate further, and less biased, research in clinical informatics.”
The study's authors said they plan to analyze how phenotypes change over time and to incorporate other types of data, such as genetics and clinical imaging, into the framework to refine disease phenotyping. They also intend to explore using the system to create reliable disease-specific control cohorts for observational studies.