NLM DIR Seminar Schedule
UPCOMING SEMINARS
-
April 22, 2025 Stanley Liang, PhD
Large Vision Model for medical knowledge adaptation -
April 29, 2025 Pascal Mutz
Characterization of covalently closed cirular RNAs detected in (meta)transcriptomic data -
May 2, 2025 Dr. Lang Wu
Integration of multi-omics data in epidemiologic research -
May 6, 2025 Leslie Ronish
TBD -
May 8, 2025 MG Hirsch
TBD
RECENT SEMINARS
-
April 18, 2025 Valentina Boeva, Department of Computer Science, ETH Zurich
Decoding tumor heterogeneity: computational methods for scRNA-seq and spatial omics -
April 8, 2025 Jaya Srivastava
Leveraging a deep learning model to assess the impact of regulatory variants on traits and diseases -
April 1, 2025 Roman Kogay
Horizontal transfer of bacterial operons into eukaryote genomes -
March 25, 2025 Yifan Yang
Adversarial Manipulation and Data Memorization in Large Language Models for Medicine -
March 11, 2025 Sofya Garushyants
Tmn – bacterial anti-phage defense system
Scheduled Seminars on March 25, 2025
Contact NLMDIRSeminarScheduling@mail.nih.gov with questions about this seminar.
Abstract:
Large language models (LLMs) have been integrated into numerous biomedical application frameworks. Despite their significant potential, they possess vulnerabilities that can lead to serious consequences. In this seminar, we will discuss two types of vulnerabilities in LLMs and explore potential solutions to address them: adversarial manipulations and data memorization.
Adversarial manipulations can cause LLMs to generate harmful medical suggestions or promote specific stakeholder interests. We will demonstrate two methods by which a malicious actor can achieve this: prompt injection and data poisoning. Thirteen models were tested, and all exhibited significant behavioral changes after manipulation in three tasks. Although newer models performed slightly better, they were still greatly affected.
Data memorization is another concern, particularly when LLMs are fine-tuned with medical corpora or patient records for specific tasks. This can lead to the unintended memorization of training data, resulting in the exposure of sensitive patient information and breaches of confidentiality. Controlled text generation can be employed to mitigate such memorization, effectively reducing the risk of exposing patient information during inference and enhancing privacy protection.