Predicting KEGG Orthologs Associated with Microbial Metabolism in Autotrophic Freshwater Microbes Using a Statistical Model
Abstract
Microbes play a crucial role in Earth’s biogeochemical cycles, yet linking microbial KEGG (Kyoto Encyclopedia of Genes and Genomes) orthologs to carbon fixation remains challenging because datasets are fragmented and functional annotation is limited. This study analyzed microbial DNA fragments from Siders Pond in Falmouth, Massachusetts, a salt-stratified meromictic lake. Environmental samples were incubated with 12C- or 13C-labeled dissolved inorganic carbon to track microbial carbon incorporation, followed by metagenomic sequencing. DNA stable isotope probing (DNA-SIP) was used to link the recovered DNA fragments to microbial carbon-cycling activity, and LASSO regression was used to select the most important features. Contigs were annotated using both the Protein Families Database (PFAM) and the KEGG Orthology (KO) database, with a bit score threshold of >30, and were linked to excess atom fraction (EAF) values representing microbial carbon assimilation. Although both annotation sources were used, a greater number of KEGG orthologs were identified in this dataset, which guided the focus of the analysis. LASSO regression identified acyl-CoA synthetase (K00142), BamB, involved in outer membrane assembly (K17713), glucose-fructose oxidoreductase (K00118), and 23S rRNA pseudouridine2604 synthase (K06182) as key features associated with microbial metabolic processes potentially influencing carbon cycling. Additionally, the analysis highlighted a domain within hydrazine synthase (PF18582), an enzyme that plays a role in anaerobic ammonium oxidation and links the nitrogen and carbon cycles by converting ammonium and nitric oxide into hydrazine. This suggests a potential role for hydrazine synthesis in microbial carbon metabolism under anoxic conditions.
This work contributes to a better understanding of microbial roles in carbon cycling and explores new ways of using statistical models to study environmental systems. The findings could help expand knowledge of how microbes influence global carbon cycles, and they highlight the potential to uncover novel carbon-fixing pathways, which are crucial for climate and sustainability research.
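The feature-selection step summarized in the abstract (filter annotations by bit score, link contigs to EAF values, then let LASSO shrink uninformative features to zero) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the hits, the EAF values, the placeholder names `KO_A`, `KO_B`, `KO_C`, the presence/absence encoding, the penalty `LAMBDA`, and the plain coordinate-descent solver. The abstract does not state which software, encoding, or penalty the study used.

```python
# Toy illustration only: all hits, EAF values, and KO names are made up.
BIT_SCORE_MIN = 30.0  # bit score threshold stated in the abstract (> 30)
LAMBDA = 0.05         # hypothetical L1 penalty strength

# (contig, annotation, bit score) tuples; placeholders, not study results
hits = [
    ("c1", "KO_A", 85.0), ("c2", "KO_A", 120.5), ("c5", "KO_A", 64.2),
    ("c3", "KO_A", 12.0),  # below threshold, dropped
    ("c1", "KO_B", 45.0), ("c3", "KO_B", 51.3), ("c4", "KO_B", 33.0),
    ("c2", "KO_C", 28.9),  # below threshold, dropped
    ("c4", "KO_C", 72.0), ("c6", "KO_C", 40.1),
]
# Hypothetical excess atom fraction (EAF) per contig
eaf = {"c1": 0.40, "c2": 0.38, "c3": 0.05, "c4": 0.02, "c5": 0.42, "c6": 0.04}


def build_matrix(hits, contigs, min_score):
    """Return (feature names, presence/absence matrix) for hits above min_score."""
    kept = {(c, ko) for c, ko, score in hits if score > min_score}
    kos = sorted({ko for _, ko in kept})
    X = [[1.0 if (c, ko) in kept else 0.0 for ko in kos] for c in contigs]
    return kos, X


def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 on centered data."""
    n, p = len(X), len(X[0])
    # Center columns and response so no intercept term is needed
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho, z = 0.0, 0.0
            for i in range(n):
                # Prediction from every feature except j
                partial = sum(Xc[i][k] * w[k] for k in range(p) if k != j)
                rho += Xc[i][j] * (yc[i] - partial)
                z += Xc[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return w


contigs = sorted(eaf)
kos, X = build_matrix(hits, contigs, BIT_SCORE_MIN)
y = [eaf[c] for c in contigs]
weights = dict(zip(kos, lasso_coordinate_descent(X, y, LAMBDA)))
selected = [ko for ko, w in weights.items() if w != 0.0]
print(selected)
```

In this toy data only the placeholder feature that tracks the high-EAF contigs keeps a non-zero coefficient; the L1 penalty sets the other two to exactly zero, which is the property that makes LASSO usable for feature selection. A real analysis would typically choose the penalty by cross-validation and use an established library rather than this hand-written solver.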