EXPLAINABLE TWO-STAGE VISION TRANSFORMER FRAMEWORK FOR CORAL REEF DISEASE DETECTION AND INTERPRETATION
Abstract
Coral reefs are increasingly threatened by climate change and by diseases such as white band disease, so accurate monitoring of coral health is essential for effective conservation. Several machine learning and deep learning models developed for coral reef monitoring achieve high accuracy, but most lack interpretability and give no insight into the reasoning behind their predictions. To address this limitation, this paper proposes a novel architecture for coral reef type classification and white band disease detection that combines a two-stage Vision Transformer (ViT) framework with an Explainable Artificial Intelligence (XAI) technique. Coral reef images are first preprocessed to enhance quality and then augmented to expand the dataset and improve model robustness. The processed images are fed into the two-stage ViT framework for feature extraction and classification. In the first stage, the model identifies the type of coral reef. In the second stage, the original image is analyzed together with the stage-one output to detect the presence of white band disease. The performance of the proposed model is evaluated using standard metrics: accuracy, precision, recall, and F1-score. Grad-CAM visualization highlights the image regions that influence the model’s decisions, providing interpretability and increasing trust in its predictions. Experimental results show that the proposed framework accurately classifies coral reef types and detects white band disease with higher performance than existing methods. Integrating XAI with the two-stage ViT architecture yields predictions that are both precise and interpretable, making the framework a valuable tool for coral reef monitoring and conservation.
Keywords: Coral reef classification, Deep learning, Explainable artificial intelligence, Grad-CAM, Vision transformer
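The two-stage flow described in the abstract, with stage two receiving both the original image and the stage-one output, can be illustrated with a minimal sketch. This is not the proposed implementation: the names `two_stage_predict`, `toy_stage_one`, `toy_stage_two`, and `binary_metrics`, the coral type labels, and the thresholds are all hypothetical, and the toy classifiers merely stand in for the ViT models. The metric helper shows how accuracy, precision, recall, and F1-score might be computed for the binary disease labels.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

# Stand-in for a preprocessed image; a real pipeline would use an image tensor.
Image = Sequence[float]


@dataclass
class Prediction:
    coral_type: str   # stage-one output
    diseased: bool    # stage-two output


def two_stage_predict(
    image: Image,
    stage_one: Callable[[Image], str],
    stage_two: Callable[[Image, str], bool],
) -> Prediction:
    """Classify the coral type, then pass the original image together
    with that result to the disease detector."""
    coral_type = stage_one(image)
    diseased = stage_two(image, coral_type)
    return Prediction(coral_type, diseased)


def toy_stage_one(image: Image) -> str:
    # Hypothetical rule and labels; a ViT classifier would go here.
    return "branching" if sum(image) / len(image) > 0.5 else "massive"


def toy_stage_two(image: Image, coral_type: str) -> bool:
    # Hypothetical rule: the decision threshold depends on the coral type,
    # which is how the stage-one output conditions stage two in this sketch.
    threshold = 0.8 if coral_type == "branching" else 0.3
    return max(image) > threshold


def binary_metrics(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Passing the classifiers as callables keeps the control flow independent of any particular model, so trained ViT models could replace the toy functions without changing `two_stage_predict`.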



















