NLM DIR Seminar Schedule

Scheduled Seminars on Oct. 29, 2024

Speaker
Rezarta Islamaj
Time
11 a.m.
Presentation Title
The importance of data in the age of AI (NLM-Gene/GNorm2 and BioCreative)
Location

Contact NLMDIRSeminarScheduling@mail.nih.gov with questions about this seminar.

Abstract:

It is a concept universally acknowledged in computer science that flawed, biased, or poor-quality input produces output of similar quality: garbage in, garbage out. The explosion of biomedical big data not only creates opportunities for new discoveries but also makes the quality of algorithms, and of the data used to train them, equally important. Before 1990, around 6% of PubMed abstracts mentioned a gene; in 2022, more than 35% of newly published articles (~568K) mentioned a gene name. Six percent of these (~35K) were linked to specific Gene database records by human curators in the NLM indexing section. In this talk, I will describe our journey in developing and evaluating GNorm2, currently the best-performing gene name recognition and normalization algorithm, and its application in assisting the gene-linking curators at the NLM.

In the second part of the talk, I will describe our efforts in organizing the eighth BioCreative workshop and challenge, which was co-located with the AMIA Annual Symposium 2023. BioCreative is a biannual event organized by our group to foster the development of text mining systems in the biological domain and to evaluate their ability to extract information from real-world data. I will also describe our BioRED evaluation track, which involved developing a biomedical entity relationship corpus of 1,000 PubMed abstracts manually curated by our NLM curators, and the resulting challenge, which received 94 submissions from 14 teams worldwide. Both the NLM-Gene and BioRED datasets have become established benchmarks for evaluating new systems.