NLM DIR Seminar Schedule
RECENT SEMINARS
Dec. 17, 2024 Joey Thole
Training set associations drive AlphaFold initial predictions of fold-switching proteins
Dec. 10, 2024 Amr Elsawy
AI for Age-Related Macular Degeneration on Optical Coherence Tomography
Dec. 3, 2024 Sarvesh Soni
Toward Relieving Clinician Burden by Automatically Generating Progress Notes
Nov. 19, 2024 Benjamin Lee
Reiterative Translation in Stop-Free Circular RNAs
Nov. 12, 2024 Devlina Chakravarty
Fold-switching reveals blind spots in AlphaFold predictions
Scheduled Seminars on Oct. 29, 2024
Contact NLMDIRSeminarScheduling@mail.nih.gov with questions about this seminar.
Abstract:
It is universally acknowledged in computer science that flawed, biased, or poor-quality input produces output of similar quality: garbage in, garbage out. The explosion of biomedical big data and information not only creates opportunities for new discoveries but, equally importantly, places demands on the quality of algorithms and of the data used to train them. Before 1990, around 6% of PubMed abstracts mentioned a gene, while in 2022 more than 35% of newly published articles (~568K) mentioned a gene name. Six percent of these (~35K) were linked to specific GENE database records by the human curators in the NLM indexing section. In this talk, I will describe our journey in developing and evaluating GNorm2, the current best-performing gene name recognition and normalization algorithm, and its application in assisting the gene-linking curators at the NLM. In the second part of the talk, I will describe our efforts in organizing the eighth BioCreative workshop and challenge, which was collocated with the AMIA Annual Symposium 2023. BioCreative is a biannual event organized by our group to foster the development of text mining systems in the biological domain and to evaluate their ability to extract information from real-world data. I will also describe our BioRED evaluation track, which involved the development of a biomedical entity relationship corpus of 1000 PubMed abstracts, manually curated by our NLM curators, and the resulting challenge, which received 94 submissions from 14 teams worldwide. Both the NLM-Gene dataset and the BioRED dataset are established as benchmarks for evaluating new systems.