NLM DIR Seminar Schedule
UPCOMING SEMINARS
Jan. 20, 2026 Anastasia Gulyaeva
TBD
Jan. 22, 2026 Mario Flores
AI Pipeline for Characterization of the Tumor Microenvironment
Jan. 27, 2026 Zhaohui Liang
TBD
Jan. 29, 2026 Mehdi Bagheri Hamaneh
FastSpel: A simple peptide spectrum predictor that achieves deep learning-level performance at a fraction of the computational cost
Feb. 3, 2026 Matthew Diller
TBD
RECENT SEMINARS
Jan. 8, 2026 Won Gyu Kim
LitSense 2.0: AI-powered biomedical information retrieval with sentence and passage level knowledge discovery
Dec. 16, 2025 Sarvesh Soni
ArchEHR-QA: A Dataset and Shared Task for Grounded Question Answering from Electronic Health Records
Dec. 2, 2025 Qingqing Zhu
CT-Bench & CARE-CT: Building Reliable Multimodal AI for Lesion Analysis in Computed Tomography
Nov. 25, 2025 Jing Wang
MIMIC-EXT-TE: Millions Clinical Temporal Event Time-Series Dataset
Oct. 21, 2025 Yifan Yang
TBD
Scheduled Seminars on Nov. 1, 2022
Contact NLMDIRSeminarScheduling@mail.nih.gov with questions about this seminar.
Abstract:
Although a growing amount of health-related literature is available online to a broad audience, the language of scientific articles can be difficult for the general public to understand. Simplifying and adapting this expert-level language into plain-language versions is therefore needed for the public to reliably understand the vast health-related literature. Machine and deep learning algorithms for automatic adaptation are a possible solution; however, gold-standard datasets are needed to properly evaluate their performance. Current datasets consist either of pairs of comparable professional- and general public-facing documents or of pairs of semantically similar sentences mined from such documents. This creates a trade-off between imperfect alignments and small test sets. To address this issue, we created the Plain Language Adaptation of Biomedical Abstracts dataset. It is the first manually adapted dataset that is both document- and sentence-aligned, and it contains 750 adapted abstracts totaling 7,643 sentence pairs. In addition to describing the dataset, we benchmark state-of-the-art deep learning approaches on it, setting baselines for future research.
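To make "both document- and sentence-aligned" concrete, here is a minimal Python sketch of how such records could be represented: each adapted abstract keeps its sentence pairs in order, so the full source and plain-language documents can be rebuilt from the same pairs. The class and field names (`SentencePair`, `AlignedDocument`, `doc_id`, `count_sentence_pairs`) are hypothetical and do not describe the dataset's actual release format. The sketch also assumes a one-to-one mapping from each source sentence to one adapted sentence, which is a simplification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SentencePair:
    # Hypothetical record: one sentence from the expert-level abstract
    # and its manually written plain-language adaptation.
    source: str
    adapted: str


@dataclass
class AlignedDocument:
    # Hypothetical record for one adapted abstract. Keeping the pairs in
    # order gives sentence-level alignment, and joining them recovers
    # document-level alignment.
    doc_id: str
    pairs: List[SentencePair] = field(default_factory=list)

    def source_text(self) -> str:
        # Rebuild the original abstract from its aligned sentences.
        return " ".join(p.source for p in self.pairs)

    def adapted_text(self) -> str:
        # Rebuild the plain-language abstract from its aligned sentences.
        return " ".join(p.adapted for p in self.pairs)


def count_sentence_pairs(docs: List[AlignedDocument]) -> int:
    # Total number of sentence pairs across all documents.
    return sum(len(d.pairs) for d in docs)


if __name__ == "__main__":
    # Illustrative, made-up sentences; not taken from the dataset.
    doc = AlignedDocument(
        "example-1",
        [
            SentencePair("Hypertension is highly prevalent.",
                         "High blood pressure is very common."),
            SentencePair("Adherence to therapy was suboptimal.",
                         "Many people did not take their medicine as advised."),
        ],
    )
    print(count_sentence_pairs([doc]))
    print(doc.adapted_text())
```

Because document text is derived from the ordered pairs, the two levels of alignment can never drift apart in this representation; a real corpus that allows sentence splits, merges, or deletions would need a richer pairing scheme.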