CAFNet: Cross-Modal Attention Fusion Network with Uncertainty-Aware Risk Stratification for Automated Uterine Tumor Classification in Ultrasound Imaging
Keywords:
Cross-modal attention, Uterine fibroid, Multi-modal fusion, Uncertainty quantification, Deep learning, Ultrasound imaging

Abstract
By age 50, 70% of women have developed uterine fibroids, which account for an estimated $34 billion in annual U.S. healthcare costs and more than 200,000 hysterectomies each year. Current diagnosis relies on the subjective MUSA criteria, which suffer from high inter-observer variability (κ = 0.42-0.68). Although B-mode and Doppler ultrasound provide complementary morphological and hemodynamic information, no existing framework combines cross-modal fusion with uncertainty quantification for real-time classification. We present CAFNet (Cross-Modal Attention Fusion Network), which runs in real time at 23.8 FPS with 28.4M parameters and 3.5 GFLOPs. CAFNet pairs two EfficientNet-B0 encoders with bidirectional Query-Key-Value cross-attention and Monte Carlo Dropout. In 5-fold cross-validation on the Uterine Fibroid Ultrasound dataset (1,990 images, 369 patients), CAFNet achieved 96.4±0.3% accuracy, 94.6±0.5% sensitivity, 97.7±0.2% specificity, and 0.966±0.003 AUC, significantly outperforming both TransUNet (p=0.008) and the MUSA criteria (p<0.001). Multi-modal fusion improved accuracy by 8.0% over single-modality input, and cross-attention added 3.6% over self-attention. Uncertainty quantification enables automated handling of confident cases (98.2% accuracy), reducing radiologist workload by 72.5% while flagging 78% of misclassifications for expert review. To our knowledge, CAFNet is the first uncertainty-aware cross-modal attention system for uterine tumor classification, addressing diagnostic variability under resource constraints.
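The fusion stage described above can be illustrated with a minimal sketch. The module below is an assumption-laden illustration, not the authors' released code: the token pooling, layer dimensions, and two-class head are hypothetical, while the paired EfficientNet-B0 feature streams, bidirectional Query-Key-Value cross-attention, and Monte Carlo Dropout follow the abstract's description.

```python
import torch
import torch.nn as nn

class BidirectionalCrossAttentionFusion(nn.Module):
    """Sketch of the fusion stage: B-mode and Doppler feature tokens
    attend to each other via Query-Key-Value cross-attention."""

    def __init__(self, dim=1280, num_heads=8, p_drop=0.2):
        super().__init__()
        # EfficientNet-B0 emits 1280-channel features; flattening spatial
        # positions into tokens before this module is an assumption.
        self.bmode_attends_doppler = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.doppler_attends_bmode = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Dropout kept active at test time for Monte Carlo sampling.
        self.mc_dropout = nn.Dropout(p_drop)
        self.classifier = nn.Linear(2 * dim, 2)  # two-class head (assumed)

    def forward(self, bmode_tokens, doppler_tokens):
        # Queries from one modality attend over keys/values of the other,
        # in both directions.
        b2d, _ = self.bmode_attends_doppler(bmode_tokens, doppler_tokens, doppler_tokens)
        d2b, _ = self.doppler_attends_bmode(doppler_tokens, bmode_tokens, bmode_tokens)
        # Pool tokens and concatenate both directions (mean pooling assumed).
        fused = torch.cat([b2d.mean(dim=1), d2b.mean(dim=1)], dim=-1)
        return self.classifier(self.mc_dropout(fused))
```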

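The uncertainty-driven triage can likewise be sketched. The routine below assumes a model with dropout layers, such as the module above; the sample count and entropy threshold are hypothetical placeholders rather than values from the paper. Confident predictions are accepted automatically, while high-uncertainty cases are deferred to a radiologist.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def mc_dropout_triage(model, bmode, doppler, n_samples=20, entropy_threshold=0.15):
    """Monte Carlo Dropout triage sketch: repeated stochastic forward
    passes yield a predictive distribution; high-entropy cases are
    referred to an expert instead of being auto-reported."""
    model.eval()
    # Re-enable only the dropout layers so BatchNorm stays in eval mode.
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.train()
    probs = torch.stack([
        torch.softmax(model(bmode, doppler), dim=-1) for _ in range(n_samples)
    ])                                     # (n_samples, batch, classes)
    mean_probs = probs.mean(dim=0)
    # Predictive entropy of the averaged distribution as the uncertainty score.
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)
    prediction = mean_probs.argmax(dim=-1)
    refer_to_expert = entropy > entropy_threshold  # route uncertain cases
    return prediction, entropy, refer_to_expert
```

In practice the entropy threshold would be calibrated on validation data to trade off the automated-case accuracy against the fraction of cases referred for expert review.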


















