NLM DIR Seminar Schedule
UPCOMING SEMINARS
- March 25, 2025 Yifan Yang
  TBD
- April 1, 2025 Roman Kogay
  TBD
- April 8, 2025 Jaya Srivastava
  TBD
- April 15, 2025 Pascal Mutz
  TBD
- April 18, 2025 Valentina Boeva, Department of Computer Science, ETH Zurich
  Decoding tumor heterogeneity: computational methods for scRNA-seq and spatial omics
RECENT SEMINARS
- March 11, 2025 Sofya Garushyants
  Tmn – bacterial anti-phage defense system
- March 4, 2025 Sanasar Babajanyan
  Evolution of antivirus defense in prokaryotes depending on the environmental virus load
- Feb. 25, 2025 Zhizheng Wang
  GeneAgent: Self-verification Language Agent for Gene Set Analysis using Domain Databases
- Feb. 18, 2025 Samuel Lee
  Efficient predictions of alternative protein conformations by AlphaFold2-based sequence association
- Feb. 11, 2025 Po-Ting Lai
  Enhancing Biomedical Relation Extraction with Directionality
Scheduled Seminars on May 14, 2024
Contact NLMDIRSeminarScheduling@mail.nih.gov with questions about this seminar.
Abstract:
Objective: The latent diffusion model (LDM) is a state-of-the-art method for synthesizing medical images with designated knowledge. We propose a novel knowledge-driven strategy that uses a class prior preservation technique to establish, within an LDM, a cross-modal binding between medical knowledge and the target visual patterns of COVID-19 pneumonia.
Method: We used the Stable Diffusion 2-1-base LDM, pretrained on large image datasets, as the base model for optimization. The LDM was trained separately on a chest X-ray (CXR) dataset of 2,599 frontal CXR images and on a chest computed tomography (CT) dataset of 104 CT scans of confirmed COVID-19 cases and 56 normal CT scans. For CXR training, the images were paired with the pattern identifier “bilateral lung edema mRALE 24” and the class identifier “chest x-ray”. For CT training, the images were paired with the pattern identifier “COVID-19 pneumonia” and the class identifier “chest CT”. The model was optimized with an objective function that combines the class-specific prior preservation loss with the reconstruction loss, binding the medical concepts to the corresponding visual patterns via the CLIP text encoder and the VAE in the LDM architecture. For quality comparison, we also synthesized images with a Wasserstein GAN with gradient penalty (WGAN-GP) and with a pure denoising diffusion implicit model (DDIM).
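The combined objective described in the Method can be sketched as follows. The abstract does not give the formula, so this is only a minimal illustration that assumes a DreamBooth-style formulation: a mean-squared reconstruction term on the pattern-identifier images plus a weighted mean-squared prior preservation term on the class-identifier images. The function names, the `prior_weight` parameter, and the toy inputs are hypothetical; in an actual LDM the inputs would be predicted and target noise tensors in the VAE latent space.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences of floats."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def combined_loss(inst_pred, inst_target, class_pred, class_target, prior_weight=1.0):
    """Reconstruction loss plus weighted class-specific prior preservation loss.

    Assumption: both terms are MSE and are summed with a scalar weight;
    the abstract does not state the exact form or the weight used.
    """
    reconstruction = mse(inst_pred, inst_target)  # pattern-identifier images
    prior = mse(class_pred, class_target)         # class-identifier images
    return reconstruction + prior_weight * prior


# Toy values standing in for flattened noise predictions (hypothetical).
loss = combined_loss([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [2.0, 2.0], prior_weight=0.5)
print(loss)
```

The prior term is what keeps the model's general notion of the class (for example, "chest x-ray") from drifting while the pattern identifier is bound to the new visual pattern; a larger `prior_weight` would favor preserving the class prior over fitting the new images.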
Results: After training, the synthetic CXR images generated by the LDM from the combined text prompt “bilateral lung edema mRALE 24, chest x-ray” had a Fréchet inception distance (FID) of 9.2158 and a kernel inception distance (KID) of 0.0818, computed against the real positive CXR images, which indicates superior quality over the other methods. When the synthetic positive images and the real negative images were classified by a trained vision transformer (ViT), the classification accuracy was 0.9975, with a precision of 1.0 and a recall of 0.9950. The synthetic CT images generated by the LDM from the combined text prompt “COVID-19 pneumonia, chest CT” had an FID of 7.99 and a KID of 0.041, computed against the real positive CT slices, which also indicates superior quality over the other methods. When treated as COVID-19 positive and classified by a model trained on real CT images, the synthetic CT images had a classification accuracy of 0.965, an F1 score of 0.963, and a recall (sensitivity) of 0.930.
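The classification figures above follow the standard confusion-matrix definitions, which the short sketch below makes explicit. The abstract does not report the test-set size or the confusion matrix, so the counts used here are hypothetical: they were chosen only to show that a balanced set of 200 synthetic positives and 200 real negatives would be consistent with the reported CXR accuracy, precision, and recall.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # also called sensitivity
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, not taken from the abstract:
# 200 synthetic positives (199 detected) and 200 real negatives (all correct).
metrics = classification_metrics(tp=199, fp=0, tn=200, fn=1)
print(metrics)
```

With no false positives, precision is exactly 1.0, and on a balanced set the accuracy is the average of the recall and the true-negative rate, which is why an accuracy of 0.9975 and a recall of 0.9950 can occur together.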
Conclusion: We conclude that, with the proposed knowledge-driven method, the LDM can synthesize high-quality CXR and CT images with the designated COVID-19 pneumonia patterns. The method provides a new approach to cross-modality knowledge representation with large vision models.