NLM DIR Seminar Schedule

Scheduled Seminars on April 7, 2026

Speaker: Henry Secaira Morocho
PI/Lab: Xiaofang Jiang
Time: 11 a.m.
Presentation Title: Toward a systematic method of database enrichment for reference-based metagenomics
Location: Hybrid. In-person: Building 38A/B2N14 NCBI Library or Meeting Link

Contact NLMDIRSeminarScheduling@mail.nih.gov with questions about this seminar.

Abstract:

Tracing millions of shotgun metagenomic sequencing reads to their source requires mapping them to a database of reference microbial genomes. While most efforts have improved the efficiency and accuracy of sequence aligners and taxonomic profilers, the engineering of reference databases has received inadequate attention. On the one hand, databases with few genomes yield a highly incomplete picture of the community because multiple taxonomic groups are missing. On the other hand, databases with the most comprehensive genome collections can introduce inaccuracies and misidentifications in the inferred community due to a highly skewed distribution, in which most sequenced genomes belong to a handful of microbial species of biomedical interest. Therefore, striking a balance between under- and over-represented taxa is essential for reference-based metagenomics, as database composition determines which taxa are inferred to be present in the community. Here, we explore whether database composition can be adaptively enriched for each sample. For this, we developed an iterative enrichment method that traverses a distance-based tree connecting all genomes to identify additional microbes of interest. Using simulated metagenomic datasets, we demonstrate that enriched databases improve metagenomic classification at the genome level and can recover microbial community composition with accuracy comparable to that of larger databases. Our results indicate that database engineering has a significant impact on metagenomic analysis, and incorporating enriched databases can yield a more accurate description of microbial communities.