
Research Article
An Explainable AI Based Deep Ensemble Transformer Framework for Gastrointestinal Disease Prediction from Endoscopic Images
@ARTICLE{10.4108/airo.9795,
  author    = {Prof. Dr. Abdul Kadar Muhammad Masum and Abu Kowshir Bitto and Shafiqul Islam Talukder and Md Fokrul Islam Khan and Mohammed Shamsul Alam and Khandaker Mohammad Mohi Uddin},
  title     = {An Explainable AI Based Deep Ensemble Transformer Framework for Gastrointestinal Disease Prediction from Endoscopic Images},
  journal   = {EAI Endorsed Transactions on AI and Robotics},
  volume    = {4},
  number    = {1},
  publisher = {EAI},
  journal_a = {AIRO},
  year      = {2025},
  month     = {8},
  keywords  = {Gastrointestinal Disease, Medical Image Processing, Transformer Models, Ensemble Model, Explainable AI},
  doi       = {10.4108/airo.9795}
}
Prof. Dr. Abdul Kadar Muhammad Masum
Abu Kowshir Bitto
Shafiqul Islam Talukder
Md Fokrul Islam Khan
Mohammed Shamsul Alam
Khandaker Mohammad Mohi Uddin
Year: 2025
Journal: EAI Endorsed Transactions on AI and Robotics (AIRO)
Publisher: EAI
DOI: 10.4108/airo.9795
Abstract
Gastrointestinal diseases such as gastroesophageal reflux disease (GERD) and polyps remain prevalent and challenging to diagnose accurately because of overlapping visual features and inconsistent endoscopic image quality. In this study, we investigate transformer-based deep learning models, namely the Vision Transformer (ViT), the Swin Transformer, and a novel Ensemble Transformer, for classifying endoscopic images into four categories: GERD, GERD Normal, Polyp, and Polyp Normal. The dataset was curated and collected in collaboration with Zainul Haque Sikder Women's Medical College & Hospital, ensuring high-quality clinical annotations. All models were evaluated using precision, recall, F1 score, and overall classification accuracy. Our proposed Ensemble Transformer, which fuses the outputs of ViT and Swin Transformer, achieved superior performance, delivering well-balanced F1 scores across all classes, reducing misclassification, and reaching an overall accuracy of 87%. Furthermore, we incorporated explainable AI (XAI) techniques such as Grad-CAM and Grad-CAM++ to generate visual explanations of the model's predictions, enhancing interpretability for clinical validation. This work demonstrates the potential of combining global and local attention mechanisms with XAI to build reliable, real-time, AI-assisted diagnostic support systems for gastrointestinal disorders, particularly in resource-limited healthcare settings.
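The abstract states that the Ensemble Transformer fuses the outputs of ViT and Swin Transformer but does not give the fusion rule. The sketch below is a minimal illustration assuming soft voting (averaging class probabilities) over two pretrained timm backbones; the backbone names, input size, and fusion rule are assumptions for illustration, not the authors' documented configuration.

```python
# Hedged sketch: soft-voting ensemble of a ViT and a Swin Transformer classifier.
# Backbone names and the averaging fusion are assumptions, not the paper's exact setup.
import torch
import torch.nn as nn
import timm


class EnsembleTransformer(nn.Module):
    """Averages the class probabilities of a ViT branch and a Swin branch."""

    NUM_CLASSES = 4  # GERD, GERD Normal, Polyp, Polyp Normal

    def __init__(self):
        super().__init__()
        self.vit = timm.create_model(
            "vit_base_patch16_224", pretrained=True, num_classes=self.NUM_CLASSES
        )
        self.swin = timm.create_model(
            "swin_base_patch4_window7_224", pretrained=True, num_classes=self.NUM_CLASSES
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Fuse the two branches by averaging their per-class probabilities.
        p_vit = torch.softmax(self.vit(x), dim=1)
        p_swin = torch.softmax(self.swin(x), dim=1)
        return (p_vit + p_swin) / 2


if __name__ == "__main__":
    model = EnsembleTransformer().eval()
    dummy = torch.randn(1, 3, 224, 224)  # one 224x224 RGB endoscopic frame
    with torch.no_grad():
        probs = model(dummy)
    print(probs)  # shape (1, 4): probabilities over the four classes
```

Other fusion strategies (for example, concatenating branch features and training a small classification head) would fit the same interface; the abstract alone does not determine which the authors used. Likewise, the Grad-CAM / Grad-CAM++ explanations mentioned in the abstract could be produced along the lines of the following sketch, which assumes the open-source pytorch-grad-cam package applied to a single ViT branch; the target layer, token-to-grid reshape, and class index are the package's standard recipe for timm ViTs, not the authors' documented setup.

```python
# Hedged sketch: Grad-CAM++ heatmap for a timm ViT using the pytorch-grad-cam package.
# Layer choice, reshape, and class index are illustrative assumptions.
import timm
import torch
from pytorch_grad_cam import GradCAMPlusPlus
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget


def vit_reshape_transform(tokens, height=14, width=14):
    # Drop the class token and restore the 14x14 patch grid as a (B, C, H, W) map.
    spatial = tokens[:, 1:, :].reshape(tokens.size(0), height, width, tokens.size(2))
    return spatial.permute(0, 3, 1, 2)


vit = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=4).eval()
cam = GradCAMPlusPlus(
    model=vit,
    target_layers=[vit.blocks[-1].norm1],
    reshape_transform=vit_reshape_transform,
)

image = torch.randn(1, 3, 224, 224)      # stand-in for a preprocessed endoscopic frame
targets = [ClassifierOutputTarget(2)]    # class index 2 = "Polyp" (assumed ordering)
heatmap = cam(input_tensor=image, targets=targets)
print(heatmap.shape)                     # (1, 224, 224) saliency map to overlay on the input
```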
Copyright © 2025 Abdul Kadar Muhammad Masum et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0 license, which permits copying, redistributing, remixing, transforming, and building upon the material in any medium, so long as the original work is properly cited.