Hybrid framework for lesion-aware, clinically coherent chest X-ray report generation using contrastive learning and large language models

  • Sci Rep. 2026 Jan 5;16(1):4645. doi: 10.1038/s41598-025-34799-2.
Won-Jun Noh #1, Sun-Woo Pi #1, Byoung-Dai Lee 2,3
Affiliations

  • 1 Department of Computer Science, Graduate School, Kyonggi University, 154-42, Gwanggyosan-ro, Yeongtong-gu, Suwon-si, 16227, Gyeonggi-do, Republic of Korea.
  • 2 Department of Computer Science, Graduate School, Kyonggi University, 154-42, Gwanggyosan-ro, Yeongtong-gu, Suwon-si, 16227, Gyeonggi-do, Republic of Korea. [email protected].
  • 3 Division of AI and Computer Engineering, Kyonggi University, 154-42, Gwanggyosan-ro, Yeongtong-gu, Suwon-si, 16227, Gyeonggi-do, Republic of Korea. [email protected].
  • # Contributed equally.
Abstract

Automated radiology report generation from chest X-rays (CXRs) has the potential to reduce the workload of radiologists and improve diagnostic consistency. However, conventional approaches have been constrained by trade-offs between understanding global images and characterizing fine-grained lesions, often leading to omissions or clinically inconsistent narratives. This study proposed a hybrid framework, CLALA-Net, to integrate global and regional representations through three key modules: Lesion Cross-Attention (LCA), Lesion-Level Contrastive Learning (LLCL), and Image-Text Contrastive Learning (ITCL). LCA injects lesion-level cues derived from full-image classification into each region of interest (ROI), LLCL enhances discriminability by aligning lesion representations across CXRs, and ITCL improves visual-textual semantic alignment. A large language model (LLM)-based aggregator was utilized to consolidate ROI-level descriptions into a clinically coherent report. An LLM-driven label extraction pipeline was introduced to generate fine-grained lesion annotations for training and evaluation. Extensive experiments on the Chest-Imagenome dataset demonstrated that CLALA-Net outperformed existing baselines in both lesion-level accuracy (mean F1-score: 0.40) and report-level consistency (total score: 14.32/20). Ablation studies confirmed the complementary roles of LCA and LLCL, whereas the sensitivity analysis indicated strong performance gains with improved label quality. By bridging full-image contextual reasoning with regional-level lesion analysis, CLALA-Net produced accurate, semantically consistent, and clinically reliable chest radiography reports. This framework provides a robust and interpretable foundation for the real-world deployment of automated radiological reporting.
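
To make the image-text contrastive idea concrete, the following is a minimal sketch of a symmetric contrastive (InfoNCE-style) loss over paired image and text embeddings. The abstract does not state the exact loss used in ITCL, so this generic formulation is an assumption, not the authors' implementation. The function names, the toy embeddings, and the temperature of 0.07 are all hypothetical and chosen only for illustration.

```python
import math

def l2_normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cross_entropy_rows(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def symmetric_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Generic symmetric InfoNCE loss (assumed form, not taken from the paper).

    image_embs[i] and text_embs[i] are treated as a matched pair; every other
    combination in the batch acts as a negative.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    # Similarity matrix: rows index images, columns index texts.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_rows(logits) + cross_entropy_rows(logits_t))

# Toy check: correctly paired embeddings should give a much lower loss
# than the same embeddings with the text order swapped.
aligned = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                     [[1.0, 0.0], [0.0, 1.0]])
shuffled = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                      [[0.0, 1.0], [1.0, 0.0]])
```

In this toy setting, `aligned` is close to zero while `shuffled` is large, which is the behavior a contrastive objective relies on: matched image-text pairs are pulled together and mismatched pairs are pushed apart.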

Supplementary Information: The online version contains supplementary material available at 10.1038/s41598-025-34799-2.

Keywords

Chest x-ray; Contrastive learning; Large language model; Multimodal learning; Radiology report generation.
