Multimodal CNN–Transformer Framework for Explainable Pathogen Identification and Infection Severity Scoring from Microscopy Images

Authors

  • V. Kavitha
  • M. Nagseshudu
  • P. V. Kusuma
  • G. Veera Sankara Reddy
  • S. Prateep kumar
  • D. Mahendra Reddy

DOI:

https://doi.org/10.63001/tbs.2024.v19.i03.pp264-278

Keywords:

Pathogen Identification, Microscopy Image Analysis, Multimodal Deep Learning, CNN–Transformer Hybrid, Infection Severity Scoring, Explainable AI (XAI), Grad-CAM Visualization, Calibration, Domain Generalization, Medical Image Classification

Abstract

Microscopic examination remains a cornerstone of infectious disease diagnosis, yet it is constrained by inter-observer variability, limited scalability, and subjective interpretation. To address these limitations, we propose a multimodal CNN–Transformer framework that combines local texture extraction (CNN), global contextual reasoning (Vision Transformer), and metadata-aware feature fusion for automated pathogen species classification and infection severity scoring from stained microscopy images. The framework uses FiLM-based (feature-wise linear modulation) metadata conditioning to improve cross-domain generalization, and multi-task learning to jointly optimize the categorical (species) and ordinal (severity) objectives. A calibration module applies temperature scaling to improve prediction reliability, while Grad-CAM visualizations localize infected regions in a clinically interpretable way. Evaluated on 23,700 images from bacterial, fungal, and parasitic datasets collected across four laboratories, the proposed model achieved 96.2% accuracy, a macro-F1 of 0.937, and a quadratic weighted kappa (QWK) of 0.84, surpassing both CNN-only and Transformer-only baselines. Cross-site experiments show an accuracy drop of less than 2.5%, indicating robust generalization, and explainability analysis shows more than 92% overlap with expert annotations. These results demonstrate the feasibility of explainable, calibration-aware AI for reliable point-of-care pathogen diagnostics in resource-constrained clinical environments.
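
For readers unfamiliar with two of the quantities named in the abstract, the following is a minimal, standard-library-only sketch of temperature scaling and of quadratic weighted kappa (QWK). It is not the authors' implementation. The function names, the grid search over the temperature, and the toy inputs are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a temperature (T > 1 softens the output)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mean_nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    loss = 0.0
    for logits, label in zip(logit_rows, labels):
        loss -= math.log(softmax(logits, temperature)[label])
    return loss / len(labels)


def fit_temperature(logit_rows, labels):
    """Pick the temperature that minimizes NLL on held-out data.

    Assumption: a simple grid over T in [0.05, 10.0]; a gradient-based
    optimizer is the more common choice in practice.
    """
    candidates = [0.05 * k for k in range(1, 201)]
    return min(candidates, key=lambda t: mean_nll(logit_rows, labels, t))


def quadratic_weighted_kappa(y_true, y_pred, num_classes):
    """Agreement between ordinal labels, penalizing errors by squared distance."""
    n = len(y_true)
    observed = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(num_classes))
                 for j in range(num_classes)]
    numerator = 0.0
    denominator = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            numerator += weight * observed[i][j]
            # Expected count under independence of the two label marginals
            denominator += weight * hist_true[i] * hist_pred[j] / n
    return 1.0 - numerator / denominator


# Toy validation set (hypothetical): confident logits, one of four predictions wrong.
val_logits = [[4.0, 0.0], [4.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
val_labels = [0, 0, 1, 1]
fitted_t = fit_temperature(val_logits, val_labels)

# Toy severity grades (hypothetical) on a 0..3 ordinal scale.
kappa_perfect = quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], 4)
```

In this toy case the fitted temperature comes out above 1, which lowers the overconfident probabilities toward the observed accuracy without changing which class is predicted. QWK equals 1 for perfect agreement and decreases as predicted severity grades fall further from the reference grades.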

Published

2024-04-07

How to Cite

V. Kavitha, M. Nagseshudu, P. V. Kusuma, G. Veera Sankara Reddy, S. Prateep kumar, & D. Mahendra Reddy. (2024). Multimodal CNN–Transformer Framework for Explainable Pathogen Identification and Infection Severity Scoring from Microscopy Images. The Bioscan, 19(3), 264–278. https://doi.org/10.63001/tbs.2024.v19.i03.pp264-278