Journal Article
Unknown
Improved Bangla Speech Emotion Recognition using Multi-Feature Extraction and Hybrid Neural Architecture
Author Affiliations
Pabna University of Science and Technology
Year: 2025
Abstract
Speech emotion recognition (SER) is an important research area in human-computer interaction and can support the development of emotion-sensing technologies. Although SER has been studied extensively in other languages, including English, work on Bangla is still at an early stage. The diversity of accents and the lack of adequate open datasets make SER in Bangla particularly challenging. This study presents a deep learning-based hybrid model, trained on the BanglaSER corpus, in which a CNN, a Transformer encoder, and a BiGRU are combined to extract temporal and contextual information from spectrograms and low-level features. Multiple features are used as input, including mel-spectrograms and delta features, and the data are augmented with noise addition, pitch shifting, time shifting, and SpecAugment. Learning rate scheduling and…
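The augmentation and delta-feature steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the SNR value, the mask widths, and the use of a plain frame-to-frame difference as the delta feature are all assumptions made for illustration. The spectrogram is assumed to be already computed as a (mel bins × frames) array, and pitch shifting is left out because it needs a resampling or phase-vocoder routine.

```python
import numpy as np

def add_noise(wave, snr_db, rng):
    """Add white Gaussian noise at a target signal-to-noise ratio in dB (assumed scheme)."""
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise

def time_shift(wave, shift):
    """Circularly shift the waveform by `shift` samples."""
    return np.roll(wave, shift)

def spec_augment(spec, freq_width, time_width, rng):
    """Zero one random frequency band and one random time band (SpecAugment-style masking)."""
    out = spec.copy()
    n_freq, n_time = out.shape
    f0 = rng.integers(0, n_freq - freq_width + 1)
    t0 = rng.integers(0, n_time - time_width + 1)
    out[f0:f0 + freq_width, :] = 0.0
    out[:, t0:t0 + time_width] = 0.0
    return out

def delta(spec):
    """Frame-to-frame difference along time: a simple stand-in for delta features."""
    return np.diff(spec, axis=1, prepend=spec[:, :1])
```

In such a setup, the waveform-level functions would be applied before feature extraction and `spec_augment` afterwards; the spectrogram and its delta could then be stacked as input channels for the CNN front end.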