NLM DIR Seminar Schedule

UPCOMING SEMINARS

July 3, 2025 Matthew Diller
Using Ontologies to Make Knowledge Computable
July 15, 2025 Noam Rotenberg
Cell phenotypes in the biomedical literature: a systematic analysis and the NLM CellLink text mining corpus

RECENT SEMINARS

July 3, 2025 Matthew Diller
Using Ontologies to Make Knowledge Computable
July 1, 2025 Yoshitaka Inoue
Graph-Aware Interpretable Drug Response Prediction and LLM-Driven Multi-Agent Drug-Target Interaction Prediction
June 10, 2025 Aleksandra Foerster
Interactions at pre-bonding distances and bond formation for open p-shell atoms: a step toward biomolecular interaction modeling using electrostatics
June 3, 2025 MG Hirsch
Interactions among subclones and immunity controls melanoma progression
May 29, 2025 Harutyun Sahakyan
In silico evolution of globular protein folds from random sequences

Scheduled Seminars on Feb. 29, 2024

Speaker: Po-Ting Lai

PI/Lab:

Time: 3 p.m.

Presentation Title: EnzChemRED, a rich enzyme reaction relation extraction dataset

Location: Virtual

Contact NLMDIRSeminarScheduling@mail.nih.gov with questions about this seminar.

Abstract:

Expert curation is essential to capture knowledge of enzyme functions from the scientific literature in FAIR open knowledgebases, but it cannot keep pace with the rate of new discoveries and new publications. In this work we present EnzChemRED (Enzyme Chemistry Relation Extraction Dataset), a new training and benchmarking dataset to support the development of Natural Language Processing (NLP) methods, such as language models, that can assist enzyme curation. EnzChemRED consists of 1,210 expert-curated PubMed abstracts in which enzymes and the chemical reactions they catalyze are annotated using identifiers from the UniProt Knowledgebase (UniProtKB) and the ontology of Chemical Entities of Biological Interest (ChEBI). We show that fine-tuning pre-trained language models with EnzChemRED can significantly boost their ability to identify mentions of proteins and chemicals in text (Named Entity Recognition, or NER) and to extract the chemical conversions in which they participate (Relation Extraction, or RE), with average F1 scores of 86.30% for NER, 86.66% for RE on chemical conversion pairs, and 83.79% for RE on chemical conversion pairs and linked enzymes. We combine the best-performing methods after fine-tuning with EnzChemRED into an end-to-end pipeline for knowledge extraction from text, and apply it to abstracts at PubMed scale to create a draft map of enzyme functions in the literature to guide curation efforts in UniProtKB and the reaction knowledgebase Rhea.
