NLM DIR Seminar Schedule

UPCOMING SEMINARS

July 15, 2025 Noam Rotenberg
Cell phenotypes in the biomedical literature: a systematic analysis and the NLM CellLink text mining corpus

RECENT SEMINARS

July 3, 2025 Matthew Diller
Using Ontologies to Make Knowledge Computable
July 1, 2025 Yoshitaka Inoue
Graph-Aware Interpretable Drug Response Prediction and LLM-Driven Multi-Agent Drug-Target Interaction Prediction
June 10, 2025 Aleksandra Foerster
Interactions at pre-bonding distances and bond formation for open p-shell atoms: a step toward biomolecular interaction modeling using electrostatics
June 3, 2025 MG Hirsch
Interactions among subclones and immunity controls melanoma progression
May 29, 2025 Harutyun Sahakyan
In silico evolution of globular protein folds from random sequences

Scheduled Seminars on Oct. 29, 2024

Speaker: Rezarta Islamaj
PI/Lab:
Time: 11 a.m.
Presentation Title: The importance of data in the age of AI (NLM-Gene/GNorm2 and BioCreative)
Location: Virtual

Contact NLMDIRSeminarScheduling@mail.nih.gov with questions about this seminar.

Abstract:

It is a concept universally acknowledged in the field of computer science that flawed, biased, or poor-quality information produces output of similar quality, i.e., garbage in, garbage out. The explosion of biomedical big data and information not only creates opportunities for new discoveries but also makes the quality of algorithms, and equally the quality of the data used to train them, essential. Before 1990, around 6% of PubMed abstracts mentioned a gene, while in 2022, more than 35% of newly published articles mentioned a gene name (~568K). Six percent of these (~35K) were linked to specific GENE database records by the human curators at the NLM indexing section. In this talk, I will describe our journey in developing and evaluating the current best-performing gene name recognition and normalization algorithm, GNorm2, and its application in assisting the gene linking curators at the NLM. In the second part of the talk, I will describe our efforts in organizing the eighth BioCreative workshop and challenge, which was co-located with the AMIA Annual Symposium 2023. BioCreative is a biannual event organized by our group to foster the development of text mining systems in the biological domain and to evaluate their ability to extract information from real-world data. I will also describe our BioRED evaluation track, which involved the development of a biomedical entity relationship corpus of 1000 PubMed abstracts, manually curated by our NLM curators, and a challenge that received 94 submissions from 14 teams worldwide. Both the NLM-Gene dataset and the BioRED dataset are established as benchmarks for evaluating new systems.
