NLM DIR Seminar Schedule
UPCOMING SEMINARS
RECENT SEMINARS
Dec. 2, 2025 Qingqing Zhu
CT-Bench & CARE-CT: Building Reliable Multimodal AI for Lesion Analysis in Computed Tomography
Nov. 25, 2025 Jing Wang
MIMIC-EXT-TE: Millions Clinical Temporal Event Time-Series Dataset
Oct. 21, 2025 Yifan Yang
TBD
Oct. 14, 2025 Devlina Chakravarty
TBD
Oct. 9, 2025 Ziynet Nesibe Kesimoglu
TBD
Scheduled Seminars on Jan. 20, 2022
Contact NLMDIRSeminarScheduling@mail.nih.gov with questions about this seminar.
Abstract:
Since a genome is essentially a document written in the alphabet of nucleotides, the field of Computational Biology has been informed by Natural Language Processing techniques since its inception. In this talk I will describe how "MinHash", a relatively obscure algorithm developed for searching the web, has been transformative for the task of genomic similarity estimation. I will go into how and why the algorithm works for sequences of nucleotides and amino acids rather than natural language documents, and I will discuss the creation and validation of tools employing the algorithm, variations for different kinds of searches, and the range of applications it can help with.
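For readers unfamiliar with the idea, the following is a minimal sketch of MinHash applied to nucleotide sequences, not the speaker's tool or any published implementation. It assumes a "bottom-k" variant: each sequence is broken into overlapping k-mers, every k-mer is hashed, and only the smallest hash values are kept as the sketch. The function names, the default k-mer length and sketch size, and the choice of BLAKE2b as the hash are all illustrative assumptions.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash64(kmer):
    """Deterministic 64-bit hash of a k-mer (BLAKE2b is an assumed choice)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq, k=5, size=64):
    """Bottom-k sketch: the `size` smallest hash values of the k-mer set."""
    return sorted(hash64(km) for km in kmers(seq, k))[:size]


def jaccard_estimate(a, b, size=64):
    """Estimate the Jaccard similarity of two k-mer sets from their sketches."""
    # Take the `size` smallest hashes of the combined sketches, then count
    # how many of them are present in both sketches.
    merged = sorted(set(a) | set(b))[:size]
    if not merged:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in merged if h in shared) / len(merged)
```

Calling `jaccard_estimate(sketch(x), sketch(y))` approximates the fraction of k-mers that two sequences share while comparing only fixed-size sketches rather than the full sequences. When a sequence has fewer distinct k-mers than the sketch size, the sketch holds every hash and the estimate matches the exact Jaccard index of the k-mer sets.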