Journal Article

Improved Bangla Speech Emotion Recognition using Multi-Feature Extraction and Hybrid Neural Architecture

Author Affiliations
Pabna University of Science and Technology
Year: 2025

Abstract

Speech emotion recognition (SER) is an important research area in human-computer interaction and can support the development of emotion-sensing technologies. Although SER has been studied extensively in other languages, including English, work on Bangla is still at an early stage. The diversity of accents and the lack of adequate open datasets make SER in Bangla particularly challenging. This study presents a deep learning-based hybrid model using the BanglaSER corpus, in which a CNN, a Transformer encoder, and a BiGRU are combined to extract temporal and contextual information from spectrograms and low-level features. Multiple features are used as input, including mel-spectrograms and delta features, together with data augmentation such as noise addition, pitch shifting, time shifting, and SpecAugment. Learning rate scheduling and…
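To make the augmentation steps named in the abstract more concrete, the following is a minimal NumPy sketch of noise addition, time shifting, SpecAugment-style masking, and a simple delta feature. It is not the authors' implementation: all function names, parameter values, and the random placeholder spectrogram are assumptions chosen for illustration, the delta feature is approximated by a plain time difference, and pitch shifting and the hybrid model itself are omitted because they would require a resampling or deep learning library.

```python
import numpy as np

def add_noise(wave, snr_db, rng):
    """Add white Gaussian noise at a target signal-to-noise ratio in dB."""
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return wave + rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)

def time_shift(wave, max_shift, rng):
    """Circularly shift the waveform by a random number of samples."""
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(wave, shift)

def spec_augment(spec, max_freq_width, max_time_width, rng):
    """Zero out one random frequency band and one random time span."""
    out = spec.copy()
    n_freq, n_time = out.shape
    f = int(rng.integers(0, max_freq_width + 1))
    f0 = int(rng.integers(0, n_freq - f + 1))
    out[f0:f0 + f, :] = 0.0
    t = int(rng.integers(0, max_time_width + 1))
    t0 = int(rng.integers(0, n_time - t + 1))
    out[:, t0:t0 + t] = 0.0
    return out

def delta(spec):
    """Finite difference along the time axis (a simplified delta feature)."""
    return np.gradient(spec, axis=1)

# Hypothetical usage on synthetic data (values are illustrative only).
rng = np.random.default_rng(0)
wave = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
noisy = add_noise(wave, snr_db=20, rng=rng)
shifted = time_shift(noisy, max_shift=1600, rng=rng)

spec = np.abs(rng.normal(size=(64, 100)))  # placeholder for a mel-spectrogram
masked = spec_augment(spec, max_freq_width=8, max_time_width=20, rng=rng)
features = np.stack([masked, delta(masked)])  # channels: spectrogram, delta
```

Stacking the spectrogram and its delta as two channels is one common way to feed multiple features to a convolutional front end; whether the paper combines them this way is not stated in the abstract.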