NLM DIR Seminar Schedule
UPCOMING SEMINARS
- April 8, 2025 Jaya Srivastava
  Leveraging a deep learning model to assess the impact of regulatory variants on traits and diseases
- April 15, 2025 Pascal Mutz
  TBD
- April 18, 2025 Valentina Boeva, Department of Computer Science, ETH Zurich
  Decoding tumor heterogeneity: computational methods for scRNA-seq and spatial omics
- April 22, 2025 Stanley Liang
  TBD
- April 29, 2025 MG Hirsch
  TBD
RECENT SEMINARS
- April 1, 2025 Roman Kogay
  Horizontal transfer of bacterial operons into eukaryote genomes
- March 25, 2025 Yifan Yang
  Adversarial Manipulation and Data Memorization in Large Language Models for Medicine
- March 11, 2025 Sofya Garushyants
  Tmn – bacterial anti-phage defense system
- March 4, 2025 Sanasar Babajanyan
  Evolution of antivirus defense in prokaryotes depending on the environmental virus load
- Feb. 25, 2025 Zhizheng Wang
  GeneAgent: Self-verification Language Agent for Gene Set Analysis using Domain Databases
Scheduled Seminars on Nov. 1, 2022
Contact NLMDIRSeminarScheduling@mail.nih.gov with questions about this seminar.
Abstract:
Although a growing amount of health-related literature has been made available to a large audience online, the language of scientific articles can be difficult for the general public to comprehend. Simplifying and adapting this expert-level language into plain-language versions is therefore needed for the public to reliably understand the vast health-related literature. Machine learning and deep learning algorithms for automatic adaptation are a possible solution; however, gold-standard datasets are needed to properly evaluate their performance. Current datasets consist of either pairs of comparable professional- and general public-facing documents or pairs of semantically similar sentences mined from such documents. This creates a trade-off between imperfect alignments and small test sets. To address this issue, we created the Plain Language Adaptation of Biomedical Abstracts dataset. This dataset is the first manually adapted dataset that is both document- and sentence-aligned. It contains 750 adapted abstracts, totaling 7,643 sentence pairs. Along with describing the dataset, we benchmark state-of-the-art deep learning approaches on it, setting baselines for future research.