NLM DIR Seminar Schedule
RECENT SEMINARS
April 7, 2026 Henry Secaira Morocho
Toward a systematic method of database enrichment for reference-based metagenomics

March 17, 2026 Roman Kogay
Diversification vs Streamlining: Selection Landscapes of Prokaryotic Genome Evolution

March 10, 2026 Zhizheng Wang
Large Language Models for Gene Set Analysis

March 5, 2026 Hasan Balci
From Sketch to SBGN: An AI-Assisted and Interactive Workflow for Generating Pathway Maps

March 3, 2026 Gianlucca Goncalves Nicastro
Systematic identification of Salmonella T6SS effectors uncovers a lipid-targeting family
Scheduled Seminars on Nov. 1, 2022
Contact NLMDIRSeminarScheduling@mail.nih.gov with questions about this seminar.
Abstract:
Although a growing body of health-related literature is available online to a broad audience, the language of scientific articles can be difficult for the general public to understand. Adapting this expert-level language into plain language is therefore needed for the public to reliably understand this vast literature. Machine learning and deep learning algorithms for automatic adaptation are a possible solution; however, gold-standard datasets are needed to evaluate their performance properly. Existing datasets consist either of pairs of comparable documents written for professionals and for the general public, or of pairs of semantically similar sentences mined from such documents. This forces a trade-off between imperfect alignments and small test sets. To address this issue, we created the Plain Language Adaptation of Biomedical Abstracts dataset, the first manually adapted dataset that is aligned at both the document and the sentence level. It contains 750 adapted abstracts, totaling 7,643 sentence pairs. In addition to describing the dataset, we benchmark state-of-the-art deep learning approaches on it, establishing baselines for future research.