Hunters of Microbial & Functional Dark Matter

Bacteria and Archaea are the oldest, most abundant, and most diverse forms of life. They dominate many functions of the biosphere and harbour huge potential for sustainable biotechnological applications. The task of fully characterizing microbial diversity is almost incomprehensibly vast. The “known unknowns” have become increasingly apparent in recent years, while the number of “unknown unknowns” is still heavily debated: some studies suggest millions, others billions or even trillions of prokaryotic species on Earth. For the majority of these species, only phylogenetic marker genes are known and no cultured representatives exist. It is therefore safe to say that over 99% of all microbial species from the environment remain uncharacterized. These species are referred to as Microbial Dark Matter (MDM).

Strikingly, even today's best-studied microorganisms have not yet been fully characterized. For instance, in the model organism Escherichia coli, the “favourite pet” of microbiologists for over 100 years, the function of more than 30% of proteins has not been determined experimentally, and more than 2% of protein-encoding genes have no characterization at all. These so-called hypothetical proteins (HPs) are found across all microbial species and represent an enormous gap in knowledge. They are referred to as Functional Dark Matter (FDM). In addition to their importance for a fundamental understanding of biology and evolution, these proteins might also provide novel solutions for medical treatments, bioremediation, or bioenergy production.

At IBG-5 we aim to elucidate MDM and FDM using cultivation-independent omics approaches, especially single-cell omics. We have established one of three high-throughput pipelines worldwide, which we use to investigate minimal genome requirements and syntrophic interactions and to probe horizontal gene transfer and evolutionary pressure. While we have found many novel prokaryotic species and directly linked phylogenetic markers to metabolic markers, it has become increasingly apparent that the function of the vast majority of the protein-encoding genes in those organisms remains elusive. We therefore strive to improve current gold-standard pipelines for functional prediction using big omics data, artificial intelligence methods such as machine learning and deep learning, and mathematical models. Our goal is to discover novel domains and metabolic features and thereby boost the future use of microorganisms in biotechnological applications.