
Identifying rare microbes in microbiome data has become easier. A team of researchers from Portugal and Canada has developed a new tool that uses machine learning to automatically identify rare organisms in ecological datasets.
The goal is to identify rare microorganisms in microbiome datasets quickly, automatically, and without supervision. The tool, called ulrb, addresses a long-standing problem in microbial ecology: distinguishing rare microorganisms from the most abundant ones in natural environments.
The new methodology and the ulrb software were recently published in the study "Definition of the microbial rare biosphere through unsupervised machine learning" in the journal Communications Biology.
The paper is the result of an international collaboration between the Interdisciplinary Center for Marine and Environmental Research (CIIMAR), the Faculty of Sciences at the University of Porto, the Institute of Bioengineering and Biosciences (iBB) at the Instituto Superior Técnico of the University of Lisbon, and two Canadian partners: the School of Electrical Engineering and Computer Science (EECS) and the Faculty of Computer Science at Dalhousie University.
The work stems from the Ph.D. research of CIIMAR researcher Francisco Pascoal, supervised by CIIMAR researcher Catarina Magalhães and co-supervised by Rodrigo Costa of iBB and Paula Branco of EECS.
The new software is expected to improve both the accuracy and the depth of ecological analyses across different microbiomes and ecosystems, ultimately advancing our understanding of microbial diversity and its role in ecosystem resilience.
What does the term "rare biosphere" refer to?
Microbial communities typically follow a pattern in which a few species are highly abundant while the vast majority are rare, forming the so-called “rare biosphere.” For example, more than a thousand types of prokaryotic microbes can be found in a single liter of seawater. Only 2% to 5% of them, however, are abundant; most are rare and, because of current technological limitations, difficult to detect and classify.
Despite their low abundance, rare species hold the greatest genetic diversity on the planet and play a crucial role in ecosystem resilience. According to Pascoal, "Should the dominant species become vulnerable due to climate shifts, rarer types could step in and maintain the functionality of the microbial community, thereby sustaining ecological stability."
The rare biosphere therefore plays a very important role in how ecosystems respond to major environmental changes, such as the effects of climate change. Studying rare organisms helps us understand how resilient ecosystems are to these changes and how they react to them.
The innovation of ulrb
By employing unsupervised machine learning techniques, ulrb allows researchers to quickly and reliably identify rare microorganisms in a community. A major advantage of this method is its adaptability to different methodological contexts, i.e., the algorithm "learns" the patterns present in the data itself, regardless of its origin.
Rare microbes became detectable with advances in high-throughput DNA sequencing. Even with these data in hand, however, researchers did not agree on how to identify rare organisms, which tend to be overshadowed by more abundant species. As a result, many researchers resorted to arbitrary abundance thresholds, an inadequate approach with no solid biological justification.
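To see why a fixed cutoff is arbitrary, consider the minimal Python sketch below. It is only an illustration: the function name, the example counts, and the two cutoff values (0.1% and 1% relative abundance) are assumptions chosen for the example and do not come from the study.

```python
def classify_by_threshold(counts, threshold):
    """Label each taxon 'rare' or 'abundant' using a fixed relative-abundance cutoff."""
    total = sum(counts.values())
    labels = {}
    for taxon, count in counts.items():
        relative_abundance = count / total
        labels[taxon] = "rare" if relative_abundance < threshold else "abundant"
    return labels


# Hypothetical read counts for one sample (illustrative values only).
sample = {"taxon_A": 9000, "taxon_B": 950, "taxon_C": 45, "taxon_D": 5}

# Two cutoffs that a researcher might pick; both are assumptions for this example.
labels_strict = classify_by_threshold(sample, threshold=0.001)  # 0.1%
labels_loose = classify_by_threshold(sample, threshold=0.01)    # 1%

for taxon in sample:
    print(taxon, labels_strict[taxon], labels_loose[taxon])
```

In this toy sample, taxon_C counts as abundant under the 0.1% cutoff and as rare under the 1% cutoff, although the data are identical. The classification depends on a number chosen by the analyst rather than on the structure of the sample.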
"Using this novel approach, we could leverage sequencing data to automatically identify rare microorganisms within each sample, as indicated by the information provided," explains Pascoal, who is the lead author of the study.
To automate this process, the team developed an algorithm that groups together the microorganisms that are most similar in abundance within a given sample. Because the method relies on the relative distance between organisms, it can be automated and scaled to datasets of any size, producing results that remain ecologically and biologically consistent.
"In essence, the algorithm identifies the prevalent groups within a microbial community and correlates them with an abundance category, thereby enabling the differentiation between organisms that are scarce and those that are common," explains Pascoal.
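The general idea of grouping organisms by abundance and then naming the groups can be sketched in a few lines of Python. This is not the ulrb code, which is an R package. The simplified one-dimensional k-medoids routine, the choice of three groups, the category names, and the example counts are all assumptions made for illustration.

```python
def k_medoids_1d(values, k, max_iter=100):
    """Simplified 1-D k-medoids: returns k medoids chosen from the values."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: medoids at evenly spaced positions of the sorted values.
    medoids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest medoid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - medoids[j]))
            clusters[nearest].append(v)
        # Update step: the new medoid is the member with the smallest
        # total distance to the other members of its cluster.
        new_medoids = []
        for j, members in enumerate(clusters):
            if not members:
                new_medoids.append(medoids[j])  # keep the medoid of an empty cluster
                continue
            best = min(members, key=lambda m: sum(abs(m - x) for x in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids


def classify_sample(counts, labels=("rare", "intermediate", "abundant")):
    """Cluster one sample's taxa by abundance and name clusters from low to high."""
    medoids = sorted(k_medoids_1d(list(counts.values()), k=len(labels)))
    result = {}
    for taxon, count in counts.items():
        nearest = min(range(len(medoids)), key=lambda j: abs(count - medoids[j]))
        result[taxon] = labels[nearest]
    return result


# Hypothetical read counts for one sample (illustrative values only).
sample = {"t1": 1, "t2": 2, "t3": 3, "t4": 40, "t5": 50, "t6": 900, "t7": 1000}
result = classify_sample(sample)
for taxon, label in result.items():
    print(taxon, label)
```

No cutoff is supplied here: the boundaries between categories come from the gaps in the sample's own abundance values, so the same function can be applied unchanged to each sample in a dataset.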
Possible applications
ulrb can be applied to data obtained through standard microbial ecology protocols, making it useful for studying emerging diseases and biological invasions. Because the approach is not limited to microbes, it may also help identify animal and plant species that are vulnerable under specific conditions, supporting ecological monitoring efforts.
Researchers who want to use the tool with their own datasets can get ulrb as an open-source R package from CRAN and GitHub. The team has also developed a website with tutorials to encourage use of the tool.
More information: Francisco Pascoal et al., Definition of the microbial rare biosphere through unsupervised machine learning, Communications Biology (2025). DOI: 10.1038/s42003-025-07912-4
Provided by the Interdisciplinary Center for Marine and Environmental Research
This story was originally published on Massima.