NLM DIR Seminar Schedule


UPCOMING SEMINARS

July 15, 2025 Noam Rotenberg
Cell phenotypes in the biomedical literature: a systematic analysis and the NLM CellLink text mining corpus

RECENT SEMINARS

July 3, 2025 Matthew Diller
Using Ontologies to Make Knowledge Computable
July 1, 2025 Yoshitaka Inoue
Graph-Aware Interpretable Drug Response Prediction and LLM-Driven Multi-Agent Drug-Target Interaction Prediction
June 10, 2025 Aleksandra Foerster
Interactions at pre-bonding distances and bond formation for open p-shell atoms: a step toward biomolecular interaction modeling using electrostatics
June 3, 2025 MG Hirsch
Interactions among subclones and immunity controls melanoma progression
May 29, 2025 Harutyun Sahakyan
In silico evolution of globular protein folds from random sequences

The NLM DIR holds a public weekly seminar series in which NLM trainees, staff scientists, and investigators share details of current and exciting research projects at NLM. Seminars take place on Tuesdays at 11:00 AM EST and on some Thursdays at 3:00 PM EST. Seminars are held in the B2 Library of Building 38A on the main NIH campus in Bethesda, MD.

To schedule a seminar, click the “Schedule Seminar” button to the right, select an appropriate date on the calendar to sign up, and then complete the form. You will need an NIH PIV card to access the “Schedule Seminar” page.

Please include seminars by invited visiting scientists in the NLM DIR seminar series. These need not be on a Tuesday or Thursday.

If you would like to schedule a seminar by a visiting scientist, click the “Schedule Seminar” button and complete the form. Contact NLMDIRSeminarScheduling@mail.nih.gov with questions. Please follow this link to subscribe to or unsubscribe from the NLM DIR seminar mailing list.

Titles and Abstracts for Upcoming Seminars

(based on the current date)

Noam Rotenberg

July 15, 2025 at 11 a.m.

Cell phenotypes in the biomedical literature: a systematic analysis and the NLM CellLink text mining corpus

Single-cell technologies enable the discovery of many novel cell phenotypes, but this growing body of knowledge remains fragmented across the scientific literature. Natural language processing (NLP) offers a promising approach to extract this information at scale.
We present NLM CellLink, a new corpus of excerpts from recent articles, manually annotated with mentions of human and mouse cell populations. The corpus distinguishes three types of mentions: (1) specific cell phenotypes (cell types and states), (2) heterogeneous cell populations, and (3) vague cell population descriptions. Mentions of the first two types were linked to Cell Ontology identifiers where possible, using their meaning in context, with matches labeled as exact or related. Annotation was performed by four cell biologists using a multi-round process.
NLM CellLink contains over 22,000 annotations across more than 3,000 passages selected from 2,700 articles, covering nearly half the concepts in the current Cell Ontology. We fine-tune BiomedBERT in a named entity recognition (NER) task and apply SapBERT and MedCPT in a simplified entity linking framework to demonstrate consistency and usability of the corpus. The fine-tuned NER models perform significantly better than LLAMA and GPT with zero-shot inference.
The NLM CellLink corpus will be a valuable resource for developing automated systems to identify cell phenotype mentions in the biomedical literature, a challenging benchmark for evaluating biomedical NLP systems, and a foundation for the future extraction of relationships between cell types and key biomedical entities, including genes, anatomical structures, and diseases.
