NLM DIR Seminar Schedule
UPCOMING SEMINARS
RECENT SEMINARS
-
Dec. 17, 2024 Joey Thole
Training set associations drive AlphaFold initial predictions of fold-switching proteins -
Dec. 10, 2024 Amr Elsawy
AI for Age-Related Macular Degeneration on Optical Coherence Tomography -
Dec. 3, 2024 Sarvesh Soni
Toward Relieving Clinician Burden by Automatically Generating Progress Notes -
Nov. 19, 2024 Benjamin Lee
Reiterative Translation in Stop-Free Circular RNAs -
Nov. 12, 2024 Devlina Chakravarty
Fold-switching reveals blind spots in AlphaFold predictions
Scheduled Seminars on Nov. 1, 2022
Contact NLMDIRSeminarScheduling@mail.nih.gov with questions about this seminar.
Abstract:
Although a growing amount of health-related literature has been made available to a large audience online, the language of scientific articles can be difficult for the general public to comprehend. Thus, simplifying and adapting this expert-level language into plain language versions is needed for the public to reliably understand the vast health-related literature. Machine and Deep Learning algorithms for automatic adaptation are a possible solution; however, gold standard datasets are needed to properly evaluate their performances. Current datasets consist of either pairs of comparable professional- and general public-facing documents or pairs of semantically similar sentences mined from such documents. This creates a trade-off between imperfect alignments and small test sets. To address this issue, we created the Plain Language Adaptation of Biomedical Abstracts dataset. This dataset is the first manually adapted dataset that is both document- and sentence-aligned. It contains 750 adapted abstracts, totaling 7643 sentence pairs. Along with describing the dataset, we benchmark state-of-the-art Deep Learning approaches on the dataset, setting baselines for future research.