NLM DIR Seminar Schedule


The NLM DIR holds a public weekly seminar series in which NLM trainees, staff scientists, and investigators share details of current research projects at NLM. Seminars take place on Tuesdays at 11:00 AM and on some Thursdays at 3:00 PM, Eastern Time. Seminars are held in the B2 Library of Building 38A on the main NIH campus in Bethesda, MD.

To schedule a seminar, click the “Schedule Seminar” button to the right, select an appropriate date on the calendar to sign up, and then complete the form. You will need an NIH PIV card to access the “Schedule Seminar” page.

Please include seminars by invited visiting scientists in the NLM DIR seminar series. These need not be on a Tuesday or Thursday.

If you would like to schedule a seminar by a visiting scientist, click the “Schedule Seminar” button and complete the form. Contact NLMDIRSeminarScheduling@mail.nih.gov with questions. Please follow this link to subscribe to or unsubscribe from the NLM DIR seminar mailing list.

Titles and Abstracts for Upcoming Seminars


(based on the current date)

Leann Lindsey
May 19, 2026 at 11 a.m.

Are Genomic Language Models Learning? Insights from Tokenization Analysis and Prophage Detection in Bacterial Genomes

Genomic language models (gLMs) promise to decode the regulatory and functional logic encoded in DNA, yet whether current architectures learn meaningful biological representations remains contested. Recent studies question the foundational abilities of gLMs, demonstrating that they fail to outperform randomly initialized or simple supervised models on standard benchmarks, while model authors point to zero-shot performance and unsupervised motif discovery as evidence of foundational biological understanding. We present two complementary efforts to investigate this question. First, we systematically evaluate how tokenization strategy (nucleotide, k-mer, and byte-pair encoding) affects model behavior across three genomic benchmarks, probing whether token granularity shapes what gLMs capture at the nucleotide level. Second, we introduce LAMBDA, a genomic language model benchmark that leverages bacteriophages as a test system to investigate the annotation abilities of genomic language models. Unlike well-annotated model organism genomes, the vast majority of phage genomes remain poorly characterized, making them an ideal domain for testing whether gLMs identify meaningful sequence patterns beyond homology. LAMBDA evaluates gLM embeddings through phage-bacteria discrimination tasks of increasing complexity, including genome-wide prophage detection, and provides a rigorous framework for evaluating model performance on a genome-wide annotation task with direct relevance to microbiology and medicine.

Harutyun Saakyan
May 26, 2026 at 11 a.m.

TBD

Brian Abraham
May 27, 2026 at 11 a.m.

Cis-Regulatory Organization and Transcription Factor Control of Cell Identity and Disease

Cell identity and disease states are shaped by transcription factor (TF) activity within cis-regulatory landscapes organized in three-dimensional chromatin space. Regulatory element composition and spatial organization are highly dynamic and frequently disrupted in cancer. The Abraham lab develops computational approaches to identify cis-regulatory architectures governing cell identity and their aberrant activation in pediatric malignancies. Using graph-based machine learning, we define higher-order communities of cis-regulatory elements, termed “3D super-enhancers,” that coordinate long-range gene regulation and association with transcriptional condensates. We also examine early enhancer and TF activation events during rhabdomyosarcoma transformation, revealing coordinate activation of muscle and neural lineage programs preceding overt oncogenic states and underscoring fundamental principles of gene dysregulation in cancer.