NLM DIR Seminar Schedule


Scheduled Seminars on Feb. 17, 2026

Speaker: Zhaohui Liang
PI/Lab: Sameer Antani
Time: 11 a.m.
Presentation Title: Heterogeneous Graph Re-ranking for CLIP-based Medical Cross-modal Retrieval
Location: Hybrid
In-person: Building 38A/B2N14 NCBI Library or Meeting Link

Contact NLMDIRSeminarScheduling@mail.nih.gov with questions about this seminar.

Abstract:

Cross-modal retrieval of medical radiographs is a critical component of clinical decision support, cohort discovery, and large-scale data reuse. While CLIP-based vision–language models enable effective zero-shot retrieval, ranking based solely on embedding similarity does not explicitly capture higher-order relationships among images, reports, and clinical semantics. We propose a heterogeneous graph re-ranking framework that augments CLIP-based retrieval with structured relational reasoning while keeping the backbone representation model frozen. Starting from an initial CLIP ranking, the method constructs a heterogeneous k-nearest-neighbor graph over image and report embeddings and applies relation-aware message passing to refine candidate rankings.
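The re-ranking idea above can be illustrated with a minimal sketch: build a k-nearest-neighbor graph over candidate embeddings, smooth each embedding with a GraphSAGE-style mean aggregation over its neighbors, and re-score against the query. This is a simplified homogeneous version for illustration only; the actual framework uses heterogeneous node types (images and reports), relation-aware message passing, and trained GNN layers. All function names and parameters here (`knn_graph`, `sage_rerank`, `alpha`) are hypothetical.

```python
import numpy as np

def knn_graph(emb, k):
    """Return, for each row embedding, the indices of its k most
    cosine-similar neighbors (self-loops excluded). Assumes rows
    are L2-normalized so the dot product equals cosine similarity."""
    sim = emb @ emb.T
    np.fill_diagonal(sim, -np.inf)          # exclude self from neighbors
    return np.argsort(-sim, axis=1)[:, :k]  # top-k neighbor indices per node

def sage_rerank(query, cand_emb, k=2, alpha=0.5):
    """Refine candidate embeddings with one round of mean-neighbor
    aggregation (GraphSAGE-style), then re-rank against the query."""
    nbrs = knn_graph(cand_emb, k)
    agg = cand_emb[nbrs].mean(axis=1)               # mean of neighbor embeddings
    refined = alpha * cand_emb + (1 - alpha) * agg  # mix self and neighborhood
    refined /= np.linalg.norm(refined, axis=1, keepdims=True)
    scores = refined @ query
    return np.argsort(-scores)                      # new ranking, best first

# Toy usage: 6 candidates, 8-dim embeddings, one query vector.
rng = np.random.default_rng(0)
cands = rng.normal(size=(6, 8))
cands /= np.linalg.norm(cands, axis=1, keepdims=True)
query = rng.normal(size=8)
query /= np.linalg.norm(query)
order = sage_rerank(query, cands, k=2)
```

Because the backbone stays frozen, only the graph construction and aggregation run at retrieval time, which is what makes the approach a drop-in re-ranking step on top of an initial CLIP ranking.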
We instantiate the framework using three representative graph neural network layer variants (GraphSAGE, GCN, and GAT), and evaluate it on chest radiograph retrieval using the OpenI-CXR and MIMIC-CXR datasets under both within-dataset validation and cross-dataset transfer. On the smaller OpenI dataset, heterogeneous graph re-ranking yields substantial improvements, with GraphSAGE increasing Strong MRR by 47.7%, Precision@10 by 58.2%, and mAP@10 by 45.3%, alongside consistent gains in nDCG. Text-to-image retrieval benefits most, with MRR improving from 0.254 to 0.384 (50.8%). On the larger MIMIC-CXR dataset, gains are more moderate but consistent: GAT improves Strong Precision@10 by 8.5% and mAP@20 by 4.9%, while GraphSAGE enhances weak retrieval performance and normal CXR screening accuracy by up to 3.1%. Cross-dataset experiments further show that heterogeneous graph re-ranking improves robustness relative to embedding-only retrieval, with attention-based models providing the most stable transfer performance.
Overall, these results demonstrate that heterogeneous graph re-ranking is an effective and practical extension to CLIP-based medical cross-modal retrieval, improving ranking quality, clinically relevant screening performance, and generalization without modifying the underlying vision–language encoder.