NLM DIR Seminar Schedule

Scheduled Seminars on Nov. 1, 2022

Speaker
Kush Attal
Time
11 a.m.
Presentation Title
Presenting a Dataset for Plain Language Adaptation of Scientific Text
Location
Virtual - see link below

Contact NLMDIRSeminarScheduling@mail.nih.gov with questions about this seminar.

Abstract:

Although a growing amount of health-related literature is available to a broad audience online, the language of scientific articles can be difficult for the general public to understand. Simplifying and adapting this expert-level language into plain-language versions is therefore needed for the public to reliably understand the vast health-related literature. Machine learning and deep learning algorithms for automatic adaptation are a possible solution; however, gold-standard datasets are needed to properly evaluate their performance. Current datasets consist of either pairs of comparable professional- and general-public-facing documents or pairs of semantically similar sentences mined from such documents. This creates a trade-off between imperfect alignments and small test sets. To address this issue, we created the Plain Language Adaptation of Biomedical Abstracts dataset, the first manually adapted dataset that is both document- and sentence-aligned. It contains 750 adapted abstracts, totaling 7,643 sentence pairs. In addition to describing the dataset, we benchmark state-of-the-art deep learning approaches on it, setting baselines for future research.