Journal Article | Open Access
EFFResNet-ViT: A Fusion-Based Convolutional and Vision Transformer Model for Explainable Medical Image Classification
Authors
Author Affiliations
University of Electro-Communications, University of Science and Technology of China, Karakoram International University, National Kaohsiung University of Science and Technology, ...
Published In: IEEE Access
Year: 2025
Citations: 77
Abstract
The rapid advancement of medical imaging technologies requires the development of advanced, automated, and interpretable diagnostic tools for clinical decision-making. Although convolutional neural networks (CNNs) have shown significant promise in medical image analysis, they have limitations in capturing the global context and lack interpretability, thereby hindering their clinical adoption. This study presents EFFResNet-ViT, a novel hybrid deep learning (DL) model designed to address these challenges by combining EfficientNet-B0 and ResNet-50 CNN backbones with a vision transformer (ViT) module. The proposed architecture employs a feature fusion strategy to integrate the local feature extraction strengths of CNNs with the global dependency modeling capabilities of transformers. The extracted features are further refined through a post-transformer CNN and a global average pooling layer to…