Event Detail

Event Type: Mathematical Biology Seminar
Wednesday, June 5, 2019 - 12:00 to 12:50
Kidder 237

Speaker Info

Oregon State University, Microbiology

Large-scale microbiome studies that aim to understand microbial roles and disease-inducing dysbiosis base their findings on differences in microbial community structure between contrasting environments (for example, healthy vs. diseased tissue). Such efforts are impeded by the curse of dimensionality: the biological effect is obscured by a large number of confounding variables, and the features far outnumber the samples (the "small n, large p" problem). Here, we use embedding algorithms inspired by natural language processing (NLP), together with large public microbiome sequencing datasets, to reduce the dimensionality of query datasets in a biologically informed way. We show that a random forest model trained on the new feature space predicts inflammatory bowel disease (IBD) more accurately and robustly than models trained on state-of-the-art data transformations. Additionally, we find strong correlations between dimensions of the reduced feature space and annotated metabolic pathways.
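
To make the idea of embedding-based dimensionality reduction concrete, here is a minimal sketch of one way a sample could be projected into a lower-dimensional space. It is not the speaker's method, which the abstract does not spell out. It assumes each taxon already has an embedding vector learned elsewhere (for example, from co-occurrence patterns in public datasets) and represents a sample as the relative-abundance-weighted sum of those vectors. The function name `embed_sample`, the taxon labels, and the two-dimensional vectors are all hypothetical.

```python
from typing import Dict, List


def embed_sample(counts: Dict[str, int],
                 taxon_vectors: Dict[str, List[float]]) -> List[float]:
    """Project one sample's taxon counts into the embedding space.

    The result is the relative-abundance-weighted sum of taxon vectors.
    Taxa that have no embedding vector are skipped (an assumption of
    this sketch, not something stated in the abstract).
    """
    if not taxon_vectors:
        raise ValueError("taxon_vectors must not be empty")

    dim = len(next(iter(taxon_vectors.values())))
    sample_vec = [0.0] * dim

    # Keep only taxa that are present in the sample and have an embedding.
    known = {t: c for t, c in counts.items() if t in taxon_vectors and c > 0}
    total = sum(known.values())
    if total == 0:
        return sample_vec

    for taxon, count in known.items():
        weight = count / total  # relative abundance among embedded taxa
        for i, value in enumerate(taxon_vectors[taxon]):
            sample_vec[i] += weight * value
    return sample_vec


# Hypothetical embeddings: in practice there would be thousands of taxa
# and tens to hundreds of dimensions.
TAXON_VECTORS = {
    "taxon_a": [1.0, 0.0],
    "taxon_b": [0.0, 1.0],
    "taxon_c": [0.5, 0.5],
}

sample = {"taxon_a": 30, "taxon_b": 10, "taxon_c": 60}
embedded = embed_sample(sample, TAXON_VECTORS)
print(embedded)
```

Each sample is then described by as many features as there are embedding dimensions rather than one feature per taxon, and vectors of this kind could serve as input to a classifier such as a random forest.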