NLM DIR Seminar Schedule

UPCOMING SEMINARS

Sept. 9, 2025 Chih-Hsuan Wei
No Data Left Behind: FAIR-SMart Enables FAIR Access to Supplementary Materials for Research Transparency
Sept. 16, 2025 James Leaman JR.
TBD
Sept. 23, 2025 Martha Nelson
TBD
Sept. 30, 2025 Erez Persi
TBD
Oct. 7, 2025 Liana Yeganova
TBD

RECENT SEMINARS

July 15, 2025 Noam Rotenberg
Cell phenotypes in the biomedical literature: a systematic analysis and the NLM CellLink text mining corpus
July 3, 2025 Matthew Diller
Using Ontologies to Make Knowledge Computable
July 1, 2025 Yoshitaka Inoue
Graph-Aware Interpretable Drug Response Prediction and LLM-Driven Multi-Agent Drug-Target Interaction Prediction
June 10, 2025 Aleksandra Foerster
Interactions at pre-bonding distances and bond formation for open p-shell atoms: a step toward biomolecular interaction modeling using electrostatics
June 3, 2025 MG Hirsch
Interactions among subclones and immunity controls melanoma progression

Scheduled Seminars on Feb. 28, 2023

Speaker

Po-Ting Lai

Time

11 a.m.

Presentation Title

Data-centric Artificial Intelligence: Improving Biomedical Relation Extraction by Leveraging Heterogeneous Datasets

Location

Virtual - see link below

Contact NLMDIRSeminarScheduling@mail.nih.gov with questions about this seminar.

Abstract:

Biomedical relation extraction (RE) is the task of automatically identifying and characterizing relations between biomedical concepts in free text. RE is a central task in biomedical natural language processing (NLP) research and plays a critical role in many downstream applications, such as literature-based discovery and knowledge graph construction. State-of-the-art methods have primarily trained machine learning models on individual RE datasets, such as those for protein-protein interactions and chemical-induced disease relations. Manual dataset annotation, however, is highly expensive and time-consuming because it requires domain knowledge. Existing RE datasets are therefore usually domain-specific or small, which limits the development of generalized, high-performing RE models. In this work, we present a novel framework for systematically addressing the heterogeneity of individual datasets and combining them into a single large dataset. Based on this framework and dataset, we report on BioREx, a data-centric approach to relation extraction. Our evaluation shows that BioREx achieves significantly higher performance than the benchmark system trained on the individual dataset, improving the F1-score from 74.4% to 79.6%. We further demonstrate that the combined dataset improves performance on five different RE tasks. In addition, we compare BioREx with transfer learning and multi-task learning approaches; BioREx outperforms them on BioRED and on most tasks. Finally, we applied BioREx's pre-trained model to two further RE tasks, N-ary drug-drug combination extraction and document-level gene-disease RE, and observed improvements in both, demonstrating its portability.
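The abstract does not describe how the heterogeneous datasets are actually merged. As a rough illustration only, here is a minimal sketch of one common pattern, assuming that each source dataset's relation labels are mapped into a shared label schema and that each instance's text is tagged with the name of its source dataset so a single model can be trained on the union. The dataset names, label maps, and the `RelationExample` and `harmonize` names are hypothetical and are not taken from BioREx.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationExample:
    text: str    # sentence or passage containing the entity pair
    head: str    # first entity mention
    tail: str    # second entity mention
    label: str   # relation label
    source: str  # name of the originating dataset


# Hypothetical per-dataset label maps into one shared relation schema.
# Real label sets and mappings would come from the datasets themselves.
LABEL_MAPS = {
    "PPI": {"interaction": "ASSOCIATION"},
    "CDR": {"CID": "INDUCES"},
}


def harmonize(examples):
    """Map each example's label into the shared schema and prefix its text
    with a source-dataset tag, so all examples can be pooled into one set."""
    combined = []
    for ex in examples:
        mapping = LABEL_MAPS[ex.source]
        # Labels with no counterpart in the shared schema fall back to "NONE".
        shared_label = mapping.get(ex.label, "NONE")
        tagged_text = f"[{ex.source}] {ex.text}"
        combined.append(
            RelationExample(tagged_text, ex.head, ex.tail, shared_label, ex.source)
        )
    return combined
```

Tagging the text with its source is one simple way to let a model learn dataset-specific annotation conventions while still sharing parameters across all the pooled data; other strategies (for example, separate output heads per dataset) are also possible.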
