NLM DIR Seminar Schedule
UPCOMING SEMINARS
July 3, 2025 Matthew Diller
Using Ontologies to Make Knowledge Computable
July 15, 2025 Noam Rotenberg
Cell phenotypes in the biomedical literature: a systematic analysis and the NLM CellLink text mining corpus
RECENT SEMINARS
July 3, 2025 Matthew Diller
Using Ontologies to Make Knowledge Computable
July 1, 2025 Yoshitaka Inoue
Graph-Aware Interpretable Drug Response Prediction and LLM-Driven Multi-Agent Drug-Target Interaction Prediction
June 10, 2025 Aleksandra Foerster
Interactions at pre-bonding distances and bond formation for open p-shell atoms: a step toward biomolecular interaction modeling using electrostatics
June 3, 2025 MG Hirsch
Interactions among subclones and immunity controls melanoma progression
May 29, 2025 Harutyun Sahakyan
In silico evolution of globular protein folds from random sequences
Scheduled Seminars on Dec. 3, 2024
In-person: Building 38A/B2N14 NCBI Library or Meeting Link
Contact NLMDIRSeminarScheduling@mail.nih.gov with questions about this seminar.
Abstract:
Regular documentation of progress notes is one of the main contributors to clinician burden. The abundance of structured chart information in medical records further exacerbates this burden; however, it also presents an opportunity to automate the generation of progress notes. We therefore propose a task to automate progress note generation using the structured or tabular information present in electronic health records. To this end, we present a novel framework and a large dataset for the task, ChartPNG, which contains 7089 annotation instances (each consisting of a pair of progress notes and the interim structured chart data) across 1616 patients. We established baselines on the dataset using locally run large language models (LLMs) from the general and biomedical domains. Further, we performed automated and manual analyses to identify the challenges of the proposed task and opportunities for future research.
Due to the sensitive nature of patient electronic health records (EHRs), locally run models are preferred for various reasons, including privacy, bias, and cost. However, most open-source locally run models (including medical-specific ones) are much smaller and have a more limited input context size than the more powerful closed-source LLMs generally available through web APIs (Application Programming Interfaces). To address these challenges, we propose a framework that harnesses the superior reasoning capabilities and medical knowledge of closed-source online LLMs in a privacy-preserving manner and seamlessly incorporates them into locally run models. Specifically, we leverage a web-based model to distill the vast patient information available in EHRs into a clinically relevant subset, without sending sensitive patient health information online, and use this distilled knowledge to generate progress notes with a locally run model. Our ablation results indicated that the proposed framework improves the performance of the baseline models on progress note generation by up to 4.6 points on ROUGE (a text-matching-based metric) and 7.56 points on MEDCON F1 (a metric that measures clinical concept overlap).
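To make the two kinds of metric mentioned above more concrete, the following is a minimal sketch in Python. It is not the evaluation code used in this work. The function names `rouge1_f1` and `concept_f1` are hypothetical. The first computes a simplified unigram-overlap F1 in the spirit of ROUGE-1, using whitespace tokenization and no stemming. The second computes an F1 over two sets of clinical concept identifiers, assuming the concepts have already been extracted from the notes by some separate tool, a step this sketch does not attempt.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified unigram-overlap F1 (whitespace tokens, lowercased, no stemming)."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def concept_f1(reference_concepts: set, candidate_concepts: set) -> float:
    """F1 over two sets of concept identifiers (assumed to be extracted elsewhere)."""
    shared = len(reference_concepts & candidate_concepts)
    if shared == 0:
        return 0.0
    precision = shared / len(candidate_concepts)
    recall = shared / len(reference_concepts)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative strings only; not taken from any dataset.
    reference_note = "patient remains stable on current medication"
    generated_note = "patient stable on current medication overnight"
    print(rouge1_f1(reference_note, generated_note))
    print(concept_f1({"C001", "C002"}, {"C002", "C003"}))
```

Both functions return values between 0 and 1; a reported gain in "points" corresponds to a difference on the 0 to 100 scale. Published ROUGE implementations typically add stemming and longer n-gram or longest-common-subsequence variants, so scores from this sketch are not comparable to reported numbers.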