Until the last decade, our knowledge of microorganisms was heavily biased because of the difficulties of studying microbes that have not yet been cultured. Recent advances in sequencing technologies and metagenome-enabled genome recovery are filling this gap and have already led to the discovery of numerous new groups of yet-uncultivated microorganisms (e.g. Margulisbacteria, Doudnabacteria). One prominent example is the Candidate Phyla Radiation (CPR), whose members appear to be extremely diverse both environmentally and genetically. Moreover, CPR bacteria display highly unusual biology, including small cell size, reduced genome content, and the absence of many genes considered essential for a free-living lifestyle.
More strikingly, approximately half of the genes in CPR genomes encode proteins of yet-unknown function. All current evidence suggests a symbiotic lifestyle for CPR members, so we expect many of these proteins to be involved in the interaction between CPR members and their partners. This set of proteins may represent an ample reservoir for novel antibiotics and other biotechnological applications. This project focuses on developing an in silico pipeline to predict the functions of these proteins by analyzing their distribution patterns across thousands of genomes retrieved from various environments. The results will lay the groundwork for more detailed investigation of the functions of these proteins using molecular biology techniques.
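As an illustration of what analyzing distribution patterns could look like, the sketch below compares the presence/absence profile of an uncharacterized protein family with the profiles of other families and ranks them by Jaccard similarity. The idea is that families found in the same set of genomes may be functionally linked. This is only one possible approach and is not necessarily the method the pipeline will use. The function names, family names, genome identifiers, and the 0.5 threshold are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of genome identifiers."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def rank_co_occurring(query, profiles, min_score=0.5):
    """Rank protein families by how closely their genome distribution
    matches that of the query family (highest similarity first)."""
    query_genomes = profiles[query]
    hits = []
    for family, genomes in profiles.items():
        if family == query:
            continue
        score = jaccard(query_genomes, genomes)
        if score >= min_score:
            hits.append((family, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy data (made up): protein family -> genomes in which it was detected.
profiles = {
    "unknown_fam_1": {"g1", "g2", "g3", "g4"},
    "pilus_like": {"g1", "g2", "g3"},
    "ribosomal_L2": {"g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"},
    "transporter_X": {"g7", "g8"},
}

for family, score in rank_co_occurring("unknown_fam_1", profiles):
    print(f"{family}\t{score:.2f}")
```

In this toy example the near-universal family also passes the threshold, which shows why a real analysis would likely need to account for how widespread each family is and for the phylogenetic relatedness of the genomes.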