Finding a temperature-genome correlation using machine learning

 

A previously unidentified aspect of an organism’s genome has been discovered by researchers from Western University and the University of Waterloo.

A genome is commonly understood as a store of the genetic information needed for growth, functioning, and reproduction. But the team of biologists and computer scientists investigated whether a genome might also carry information beyond taxonomy or ancestry.

Their focus shifted to extremophiles, which are organisms that can survive in extremely harsh environments such as strong radiation, bone-chilling temperatures below -12°C, or extremes in pressure and acidity. According to their hypothesis, the genome may hold information about the conditions that allow these extremophiles to thrive.

Cracking the code

The scientists took a novel approach, treating genomic DNA as a language made up of DNA words. Four nucleotides, adenine, cytosine, guanine, and thymine (A, C, G, and T), are strung together to form DNA sequences, which resemble lines of text.

As early as the 1990s, scientists could determine an organism's species by counting how often these DNA words occur.

Describing their findings in The Conversation, study co-authors Kathleen A. Hill and Lila Kari compared the approach to telling English and French literature apart by word frequencies.

They pointed out that the frequency profile of DNA words is independent of both the position and length of the DNA sequence chosen to represent that genome, much as the word-frequency profile of a book is independent of a specific set of pages.

A genome’s DNA word frequency profile functions as the organism’s “genomic signature.”
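To make the idea of a DNA word-frequency profile concrete, here is a minimal sketch in Python. It is an illustration only, not the researchers' code: the function name kmer_profile, the word length k=3, and the example sequence are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=3):
    """Return the relative frequency of every DNA word of length k.

    Hypothetical helper for illustration. The result is a list of 4**k
    values in a fixed alphabetical word order (AAA, AAC, ..., TTT for k=3).
    """
    seq = seq.upper()
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    # Slide a window of length k along the sequence and count each word.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only words made purely of A, C, G, T are counted, so windows that
    # contain ambiguous symbols such as N are ignored.
    total = sum(counts[w] for w in words)
    if total == 0:
        return [0.0] * len(words)
    return [counts[w] / total for w in words]


# Made-up example sequence, not real genomic data.
profile = kmer_profile("ACGTACGTAC", k=2)
print(len(profile))  # one entry per possible two-letter word
```

Because the profile stores relative frequencies rather than raw counts, fragments of different lengths yield vectors on the same scale, which is what allows profiles from different stretches of a genome to be compared.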

The researchers began by assembling a dataset of roughly 700 microbial extremophiles that live at extreme temperatures or extreme pH levels. They then used supervised and unsupervised machine learning to test whether the DNA word-frequency profile carries a signal of the environment.

The results were striking. Working without any prior knowledge of taxonomy or environment, unsupervised machine learning grouped the extremophiles based only on their DNA word-frequency patterns. Several bacteria and archaea consistently clustered together because of their shared tolerance of high temperatures.
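The sketch below shows what unsupervised grouping by word-frequency profile can look like. It is a toy under stated assumptions, not the study's pipeline: the article does not name the algorithms used, so a plain k-means stands in for them, and the "genomes" are random sequences from a hypothetical toy_genome helper that differ only in how often G and C appear. All function names and parameters are invented for this example.

```python
import random
from collections import Counter
from itertools import product


def kmer_profile(seq, k=2):
    """Relative frequencies of all 4**k DNA words (hypothetical helper)."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[w] for w in words) or 1
    return [counts[w] / total for w in words]


def toy_genome(gc, length, rng):
    """Stand-in for a real genome: independent bases with a given GC fraction."""
    weights = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]  # A, C, G, T
    return "".join(rng.choices("ACGT", weights=weights, k=length))


def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, n_clusters, n_iter=20):
    """Plain k-means with a deterministic farthest-point start."""
    centers = [points[0]]
    while len(centers) < n_clusters:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = []
    for _ in range(n_iter):
        # Assign each profile to its nearest center.
        labels = [min(range(n_clusters), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Move each center to the mean of its assigned profiles.
        for j in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


rng = random.Random(0)  # fixed seed so the toy data is reproducible
genomes = ([toy_genome(0.35, 5000, rng) for _ in range(5)]
           + [toy_genome(0.65, 5000, rng) for _ in range(5)])
profiles = [kmer_profile(g) for g in genomes]
labels = kmeans(profiles, n_clusters=2)
print(labels)
```

The algorithm never sees which group a sequence came from. It recovers the two groups only because their word-frequency profiles differ, which is the sense in which the clustering works without labels. Real genomes are far less cleanly separated than this toy data.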

Implications for evolution and life

The discovery challenges the conventional picture of the Tree of Life, which divides all living things into three domains: bacteria, archaea, and eukarya.

The authors pointed out that even though these domains are genetically distinct, the newly identified environmental component, seen most clearly in the grouping of bacteria and archaea, indicates that the extremely high temperatures in which they live have induced systemic changes in their genomic language.

Revealing the environmental component of genomes may change our understanding of Earth's evolutionary history. The scientists are also expanding their study to include radiation-resistant extremophiles such as Deinococcus radiodurans.

Such extremophiles are known to withstand extreme conditions, including radiation exposure and up to three years of exposure to space. Their survival may provide important insights into the genetic effects of different environments.

This discovery raises questions about the adaptability of life beyond Earth and creates opportunities for redefining the language of genomes. Thinking about the possibility of life in the great emptiness of space requires a grasp of the genetic language shaped by harsh settings, which is especially important as humanity aspires to the stars.

The journal Scientific Reports published the team’s findings.
