NLM DIR Seminar Schedule
UPCOMING SEMINARS
- Dec. 10, 2024 Amr Elsawy
  AI for Age-Related Macular Degeneration on Optical Coherence Tomography
- Dec. 17, 2024 Joey Thole
  TBD
- Jan. 7, 2025 Qiao Jin
  TBD
- Jan. 14, 2025 Ryan Bell
  TBD
- Jan. 21, 2025 Qingqing Zhu
  TBD
RECENT SEMINARS
- Dec. 3, 2024 Sarvesh Soni
  Toward Relieving Clinician Burden by Automatically Generating Progress Notes
- Nov. 19, 2024 Benjamin Lee
  Reiterative Translation in Stop-Free Circular RNAs
- Nov. 12, 2024 Devlina Chakravarty
  Fold-switching reveals blind spots in AlphaFold predictions
- Nov. 5, 2024 Max Burroughs
  Revisiting the co-evolution of multicellularity and immunity across the tree of life
- Nov. 4, 2024 Finn Werner
  African Swine Fever Virus transcription – from transcriptome to molecular structure
Scheduled Seminars on May 14, 2024
Contact NLMDIRSeminarScheduling@mail.nih.gov with questions about this seminar.
Abstract:
Objective: The latent diffusion model (LDM) is a state-of-the-art method for synthesizing medical images that carry designated knowledge. We propose a novel knowledge-driven strategy that uses a class-prior preservation technique to establish a cross-modal binding in an LDM between medical knowledge and the target visual patterns of COVID-19 pneumonia.
Method: We used the Stable Diffusion 2-1-base LDM, pretrained on large image datasets, as the base model for optimization. The LDM was trained separately on a chest X-ray (CXR) dataset of 2,599 frontal CXR images and on a chest computed tomography (CT) dataset of 104 CT scans of confirmed COVID-19 cases and 56 normal CT scans. For CXR training, the images were paired with the pattern identifier “bilateral lung edema mRALE 24” and the class identifier “chest x-ray”. For CT training, the images were paired with the pattern identifier “COVID-19 pneumonia” and the class identifier “chest CT”. The model was optimized with an objective function that combines the class-specific prior preservation loss with the reconstruction loss, binding the medical concepts to the corresponding visual patterns via the CLIP text encoder and the VAE in the LDM architecture. For quality comparison, we also synthesized images with a Wasserstein GAN with gradient penalty (WGAN-GP) and with a pure denoising diffusion implicit model (DDIM).
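The combined objective described in the Method can be illustrated with a minimal sketch. It assumes that both terms are mean-squared errors between predicted and target noise, as is common when fine-tuning diffusion models, and that they are summed with a weighting factor. The function names, the `prior_weight` parameter, and its default value are hypothetical; the abstract does not state how the two terms are weighted.

```python
import numpy as np


def mse(pred, target):
    """Mean-squared error between two arrays of the same shape."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))


def combined_loss(noise_pred_instance, noise_instance,
                  noise_pred_prior, noise_prior,
                  prior_weight=1.0):
    """Reconstruction loss plus weighted class-specific prior preservation loss.

    The "instance" arrays come from the training images paired with the
    pattern identifier plus class identifier; the "prior" arrays come from
    images paired with the class identifier alone.
    prior_weight is an assumed hyperparameter, not a value from the abstract.
    """
    reconstruction = mse(noise_pred_instance, noise_instance)
    prior_preservation = mse(noise_pred_prior, noise_prior)
    return reconstruction + prior_weight * prior_preservation


# Toy arrays only: perfect instance prediction, unit error on the prior batch.
loss = combined_loss([1.0, 2.0], [1.0, 2.0], [0.0, 0.0], [1.0, 1.0])
```

In this sketch the prior term keeps the model's output for the bare class identifier (for example “chest CT”) close to what it was before fine-tuning, while the reconstruction term fits the new pattern identifier.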
Results: After training, the synthetic CXR images generated by the LDM from the combined text prompt “bilateral lung edema mRALE 24, chest x-ray” had a Fréchet inception distance (FID) of 9.2158 and a kernel inception distance (KID) of 0.0818, computed against the real positive CXR images, which indicates superior quality over the other methods. When the synthetic positive images and the real negative images were classified by a trained vision transformer (ViT), the accuracy was 0.9975, with a precision of 1.0 and a recall of 0.9950. The synthetic CT images generated by the LDM from the combined text prompt “COVID-19 pneumonia, chest CT” had an FID of 7.99 and a KID of 0.041, computed against the real positive CT slices, which also indicates superior quality over the other methods. When treated as COVID-19 positive and classified by a model trained on real CT images, the synthetic CT images had an accuracy of 0.965, an F1 score of 0.963, and a recall (sensitivity) of 0.930.
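For readers less familiar with the classification metrics reported above, the following minimal sketch shows how accuracy, precision, recall, and F1 follow from confusion-matrix counts. The counts in the example are hypothetical and chosen only for illustration; the abstract does not report the test-set sizes, and the function name is an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion counts.

    tp/fn: positive images classified correctly/incorrectly.
    tn/fp: negative images classified correctly/incorrectly.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # recall is also called sensitivity
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, not taken from the study: 200 synthetic positives
# with one missed, and 200 real negatives all classified correctly.
metrics = classification_metrics(tp=199, fp=0, fn=1, tn=200)
```

Recall and sensitivity are the same quantity, TP / (TP + FN), so a single value covers both.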
Conclusion: The LDM can synthesize high-quality CXR and CT images with the designated COVID-19 pneumonia patterns using the proposed knowledge-driven method. The method provides a new approach to cross-modality knowledge representation with large vision models.