NLM DIR Seminar Schedule
UPCOMING SEMINARS
Sept. 9, 2025 Chih-Hsuan Wei
No Data Left Behind: FAIR-SMart Enables FAIR Access to Supplementary Materials for Research Transparency
Sept. 16, 2025 James Leaman JR.
TBD
Sept. 23, 2025 Martha Nelson
TBD
Sept. 30, 2025 Erez Persi
TBD
Oct. 7, 2025 Liana Yeganova
TBD
RECENT SEMINARS
July 15, 2025 Noam Rotenberg
Cell phenotypes in the biomedical literature: a systematic analysis and the NLM CellLink text mining corpus
July 3, 2025 Matthew Diller
Using Ontologies to Make Knowledge Computable
July 1, 2025 Yoshitaka Inoue
Graph-Aware Interpretable Drug Response Prediction and LLM-Driven Multi-Agent Drug-Target Interaction Prediction
June 10, 2025 Aleksandra Foerster
Interactions at pre-bonding distances and bond formation for open p-shell atoms: a step toward biomolecular interaction modeling using electrostatics
June 3, 2025 MG Hirsch
Interactions among subclones and immunity controls melanoma progression
Scheduled Seminars on Feb. 28, 2023
Contact NLMDIRSeminarScheduling@mail.nih.gov with questions about this seminar.
Abstract:
Biomedical relation extraction (RE) is the task of automatically identifying and characterizing relations between biomedical concepts in free text. RE is a central task in biomedical natural language processing (NLP) research and plays a critical role in many downstream applications, such as literature-based discovery and knowledge graph construction. State-of-the-art methods have primarily trained machine learning models on individual RE datasets, such as those for protein-protein interactions and chemical-induced disease relations. Manual dataset annotation, however, is expensive and time-consuming because it requires domain knowledge. Existing RE datasets are therefore usually domain-specific or small, which limits the development of generalized, high-performing RE models. In this work, we present a novel framework for systematically addressing the heterogeneity of individual datasets and combining them into a single large dataset. Based on this framework and dataset, we report on BioREx, a data-centric approach to relation extraction. Our evaluation shows that BioREx achieves significantly higher performance than the benchmark system trained on the individual dataset, improving the F1-score from 74.4% to 79.6%. We further demonstrate that the combined dataset can improve performance on five different RE tasks. In addition, we compare BioREx with transfer learning and multi-task learning approaches; the results show that it outperforms them on BioRED and on most of the other tasks. Finally, we used BioREx’s pre-trained model to demonstrate its portability to two further RE tasks: drug-drug N-ary combination and document-level gene-disease RE. The results show improvements in both tasks.
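As a rough illustration of what combining heterogeneous RE datasets can involve, the Python sketch below maps each dataset's native relation labels into one shared label schema and tags every instance with its source dataset. This is not BioREx's actual implementation: the dataset names (`ppi`, `cdr`), the label mappings, the `[source]` text marker, and the `harmonize`/`combine` helpers are all hypothetical assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationInstance:
    text: str
    head: str
    tail: str
    label: str
    source: str  # name of the originating dataset


# Hypothetical, illustrative mappings from each dataset's native labels
# to a shared label schema; these are NOT the mappings BioREx uses.
LABEL_MAPS = {
    "ppi": {"interacts_with": "Association"},
    "cdr": {"CID": "Causes"},
}


def harmonize(dataset_name, records):
    """Map one dataset's native labels into the shared schema and tag the source."""
    mapping = LABEL_MAPS[dataset_name]
    out = []
    for text, head, tail, label in records:
        # Labels with no entry in the mapping fall back to "None" (no relation).
        unified = mapping.get(label, "None")
        # Hypothetical source marker prefixed to the text, so a single model
        # trained on the merged data can still tell the datasets apart.
        tagged = f"[{dataset_name}] {text}"
        out.append(RelationInstance(tagged, head, tail, unified, dataset_name))
    return out


def combine(datasets):
    """Harmonize every dataset and concatenate the results into one list."""
    combined = []
    for name, records in datasets.items():
        combined.extend(harmonize(name, records))
    return combined


merged = combine({
    "ppi": [("BRCA1 binds BARD1.", "BRCA1", "BARD1", "interacts_with")],
    "cdr": [("Cisplatin caused nephrotoxicity.", "cisplatin", "nephrotoxicity", "CID")],
})
for inst in merged:
    print(inst.source, inst.label, inst.text)
```

Keeping a source tag is one way to let a model trained on pooled data still condition on where each example came from. The abstract does not state whether or how BioREx does this.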