Journal Article · Open Access

EFFResNet-ViT: A Fusion-Based Convolutional and Vision Transformer Model for Explainable Medical Image Classification

Authors

Tahir Hussain, Hayaru Shouno, Abid Hussain, Dostdar Hussain, …

Author Affiliations

University of Electro-Communications, University of Science and Technology of China, Karakoram International University, National Kaohsiung University of Science and Technology, ...

Published In

IEEE Access

Year

2025

Citations

77

DOI

10.1109/access.2025.3554184

Abstract

The rapid advancement of medical imaging technologies requires the development of advanced, automated, and interpretable diagnostic tools for clinical decision-making. Although convolutional neural networks (CNNs) have shown significant promise in medical image analysis, they have limitations in capturing the global context and lack interpretability, thereby hindering their clinical adoption. This study presents EFFResNet-ViT, a novel hybrid deep learning (DL) model designed to address these challenges by combining EfficientNet-B0 and ResNet-50 CNN backbones with a vision transformer (ViT) module. The proposed architecture employs a feature fusion strategy to integrate the local feature extraction strengths of CNNs with the global dependency modeling capabilities of transformers. The extracted features are further refined through a post-transformer CNN and a global average pooling layer to…
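The pipeline the abstract outlines (two CNN backbones, feature fusion, a transformer stage, a post-transformer CNN, then global average pooling) can be illustrated with a shape-level sketch. This is not the authors' implementation. Random arrays stand in for the backbone outputs, using the final feature-map sizes EfficientNet-B0 (1280 channels) and ResNet-50 (2048 channels) produce for a 224×224 input. Fusion by channel concatenation, a single attention head, a 1×1 convolution as the post-transformer CNN, the function name `fused_forward`, and the values `d_model=64` and `num_classes=4` are all assumptions chosen for illustration, and every weight is random.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def fused_forward(eff_feat, res_feat, rng, d_model=64, num_classes=4):
    """Hypothetical forward pass: fuse two feature maps, attend, pool, classify.

    eff_feat: (C1, H, W) stand-in for an EfficientNet-B0 feature map
    res_feat: (C2, H, W) stand-in for a ResNet-50 feature map
    Returns (logits, attn) with shapes (num_classes,) and (H*W, H*W).
    """
    if eff_feat.shape[1:] != res_feat.shape[1:]:
        raise ValueError("feature maps must share the same spatial size")

    # Assumed fusion strategy: concatenate along the channel axis.
    fused = np.concatenate([eff_feat, res_feat], axis=0)   # (C1+C2, H, W)
    channels, height, width = fused.shape

    # Each spatial position becomes one token for the transformer stage.
    tokens = fused.reshape(channels, height * width).T     # (H*W, C1+C2)

    def dense(n_in, n_out):
        # Random weights; a trained model would learn these.
        return rng.normal(scale=n_in ** -0.5, size=(n_in, n_out))

    x = tokens @ dense(channels, d_model)                   # token embedding

    # Single-head scaled dot-product self-attention with a residual link.
    q, k, v = (x @ dense(d_model, d_model) for _ in range(3))
    attn = softmax(q @ k.T / np.sqrt(d_model), axis=-1)     # (H*W, H*W)
    x = x + attn @ v

    # Post-transformer refinement: a 1x1 convolution is a per-token
    # linear map, followed here by ReLU.
    x = np.maximum(x @ dense(d_model, d_model), 0.0)

    # Global average pooling over all spatial positions, then a classifier.
    pooled = x.mean(axis=0)                                 # (d_model,)
    logits = pooled @ dense(d_model, num_classes)
    return logits, attn


rng = np.random.default_rng(0)
eff_feat = rng.normal(size=(1280, 7, 7))   # EfficientNet-B0 output size
res_feat = rng.normal(size=(2048, 7, 7))   # ResNet-50 output size
logits, attn = fused_forward(eff_feat, res_feat, rng)
print(logits.shape, attn.shape)            # (4,) (49, 49)
```

Each row of `attn` weights all 49 spatial positions, which is the global dependency modeling the abstract attributes to the transformer; the convolutional stand-ins only ever see local neighborhoods.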

Fields & Keywords

Health Sciences; Medicine; Radiology, Nuclear Medicine and Imaging; Radiomics and Machine Learning in Medical Imaging; AI in cancer detection; Medical Imaging and Analysis; Artificial intelligence; Computer vision; Linguistics; Electrical engineering