NLM DIR Seminar Schedule
UPCOMING SEMINARS
Jan. 14, 2025 Ryan Bell
Comprehensive analysis of the YprA-like helicase family provides deep insight into the evolution and potential mechanisms of widespread and largely uncharacterized prokaryotic antiviral defense systems
Jan. 16, 2025 Qingqing Zhu
GPTRadScore and CT-Bench: Advancing Multimodal AI Evaluation and Benchmarking in CT Imaging
Jan. 17, 2025 Xuegong Zhang
Using Large Cellular Models to Understand Cell Transcriptomics Language
Jan. 21, 2025 Qiao Jin
Artificial Intelligence for Evidence-based Medicine
Jan. 28, 2025 Kaleb Abram
TBD
RECENT SEMINARS
Dec. 17, 2024 Joey Thole
Training set associations drive AlphaFold initial predictions of fold-switching proteins
Dec. 10, 2024 Amr Elsawy
AI for Age-Related Macular Degeneration on Optical Coherence Tomography
Dec. 3, 2024 Sarvesh Soni
Toward Relieving Clinician Burden by Automatically Generating Progress Notes
Nov. 19, 2024 Benjamin Lee
Reiterative Translation in Stop-Free Circular RNAs
Nov. 12, 2024 Devlina Chakravarty
Fold-switching reveals blind spots in AlphaFold predictions
Scheduled Seminars on Feb. 29, 2024
Contact NLMDIRSeminarScheduling@mail.nih.gov with questions about this seminar.
Abstract:
Expert curation is essential to capture knowledge of enzyme functions from the scientific literature in FAIR open knowledgebases, but it cannot keep pace with the rate of new discoveries and new publications. In this work we present EnzChemRED (Enzyme Chemistry Relation Extraction Dataset), a new training and benchmarking dataset to support the development of Natural Language Processing (NLP) methods, such as language models, that can assist enzyme curation. EnzChemRED consists of 1,210 expert-curated PubMed abstracts in which enzymes and the chemical reactions they catalyze are annotated using identifiers from the UniProt Knowledgebase (UniProtKB) and the ontology of Chemical Entities of Biological Interest (ChEBI). We show that fine-tuning pre-trained language models with EnzChemRED can significantly boost their ability to identify mentions of proteins and chemicals in text (Named Entity Recognition, or NER) and to extract the chemical conversions in which they participate (Relation Extraction, or RE), with average F1 scores of 86.30% for NER, 86.66% for RE on chemical conversion pairs, and 83.79% for RE on chemical conversion pairs with linked enzymes. We combine the best-performing methods after fine-tuning on EnzChemRED to create an end-to-end pipeline for knowledge extraction from text, and we apply it to abstracts at PubMed scale to create a draft map of enzyme functions in the literature to guide curation efforts in UniProtKB and the reaction knowledgebase Rhea.