NLM DIR Seminar Schedule
UPCOMING SEMINARS
- March 25, 2025 Yifan Yang
  TBD
- April 1, 2025 Roman Kogay
  TBD
- April 8, 2025 Jaya Srivastava
  TBD
- April 15, 2025 Pascal Mutz
  TBD
- April 18, 2025 Valentina Boeva, Department of Computer Science, ETH Zurich
  Decoding tumor heterogeneity: computational methods for scRNA-seq and spatial omics
RECENT SEMINARS
- March 11, 2025 Sofya Garushyants
  Tmn – bacterial anti-phage defense system
- March 4, 2025 Sanasar Babajanyan
  Evolution of antivirus defense in prokaryotes depending on the environmental virus load
- Feb. 25, 2025 Zhizheng Wang
  GeneAgent: Self-verification Language Agent for Gene Set Analysis using Domain Databases
- Feb. 18, 2025 Samuel Lee
  Efficient predictions of alternative protein conformations by AlphaFold2-based sequence association
- Feb. 11, 2025 Po-Ting Lai
  Enhancing Biomedical Relation Extraction with Directionality
Scheduled Seminars on Feb. 28, 2023
Contact NLMDIRSeminarScheduling@mail.nih.gov with questions about this seminar.
Abstract:
Biomedical relation extraction (RE) is the task of automatically identifying and characterizing relations between biomedical concepts in free text. RE is a central task in biomedical natural language processing (NLP) research and plays a critical role in many downstream applications, such as literature-based discovery and knowledge graph construction. State-of-the-art methods have primarily trained machine learning models on individual RE datasets, such as those for protein-protein interactions and chemical-induced disease relations. Manual dataset annotation, however, is expensive and time-consuming because it requires domain knowledge. Existing RE datasets are usually domain-specific or small, which limits the development of generalized and high-performing RE models. In this work, we present a novel framework for systematically addressing the data heterogeneity of individual datasets and combining them into a large dataset. Based on the framework and dataset, we report on BioREx, a data-centric approach for extracting relations. Our evaluation shows that BioREx achieves significantly higher performance than the benchmark system trained on the individual dataset, improving the F1-score from 74.4% to 79.6%. We further demonstrate that the combined dataset can improve performance on five different RE tasks. In addition, we compare BioREx with transfer learning and multi-task learning approaches, and the results show that it outperforms them on BioRED and on most tasks. Further, we used BioREx’s pre-trained model and demonstrated its portability to two RE tasks: drug-drug N-ary combination and document-level gene-disease RE. The results show improvements in both tasks.
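For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how an F1-score can be computed for relation extraction and how the reported change from 74.4% to 79.6% translates into absolute and relative gains. It is illustrative only and is not the BioREx evaluation code: the `Relation` tuple layout, the function names `f1_score` and `improvement`, and the toy relations are assumptions made for this example, and exact-match scoring over sets is only one common convention.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation of one extracted relation:
# (entity_1, entity_2, relation_type). Not taken from BioREx.
Relation = Tuple[str, str, str]


def f1_score(gold: Iterable[Relation], predicted: Iterable[Relation]) -> float:
    """Exact-match F1 between gold and predicted relation sets (range 0..1)."""
    gold_set: Set[Relation] = set(gold)
    pred_set: Set[Relation] = set(predicted)
    true_positives = len(gold_set & pred_set)
    if true_positives == 0:
        # Covers empty inputs and the no-overlap case without dividing by zero.
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


def improvement(baseline: float, new: float) -> Tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent of baseline)."""
    absolute = new - baseline
    relative = absolute / baseline * 100
    return absolute, relative


# Toy data, invented for illustration only.
gold = [
    ("aspirin", "bleeding", "chemical-induced disease"),
    ("BRCA1", "breast cancer", "gene-disease"),
    ("TP53", "MDM2", "protein-protein interaction"),
]
predicted = [
    ("aspirin", "bleeding", "chemical-induced disease"),
    ("BRCA1", "breast cancer", "gene-disease"),
    ("TP53", "EGFR", "protein-protein interaction"),  # wrong partner entity
]

toy_f1 = f1_score(gold, predicted)
abs_gain, rel_gain = improvement(74.4, 79.6)
print(f"toy F1: {toy_f1:.3f}")
print(f"absolute gain: {abs_gain:.1f} points, relative gain: {rel_gain:.1f}%")
```

In the toy case, two of three predictions match, so precision and recall are both 2/3 and the F1 is also 2/3. For the figures in the abstract, the gain is 5.2 percentage points, or roughly 7% relative to the 74.4% baseline.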