Journal Article · Unknown

A Deep Learning Approach to Speech Emotion Recognition in Bengali Code-Mixed Speech

Authors

Meharun Islam, Tabassuma, Anupam Singha, Kingkar Prosad Ghosh, …

Author Affiliations

Shahjalal University of Science and Technology, Bangladesh Air Force Shaheen College

Year

2025

DOI

10.1109/sti69347.2025.11367575

Abstract

The expansion of multilingual and code-switched communication in South Asia calls for Speech Emotion Recognition (SER) systems that can handle complex acoustic and linguistic diversity. This study collected 2,500 real-world recordings as the foundation for an SER framework for Bangla-English code-mixed speech. After extensive preprocessing, including standardization, vocal separation, noise reduction, spectrogram synthesis, and consistent temporal segmentation, the audio yielded 14,562 samples suitable for computational modeling. Data augmentation was applied to improve class balance, and emotion labels across five categories were assigned through expert annotation using a majority-agreement strategy. Using MFCC representations, a hybrid CNN-BiLSTM architecture was created to simultaneously capture long-range temporal relationships and fine-grained…
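Two of the steps the abstract names, consistent temporal segmentation and majority-agreement labeling, can be illustrated with a short Python sketch. It is not the authors' implementation: the strict-majority rule (more than half of the annotators), zero-padding of the final partial segment, the 16 kHz sample rate, and the emotion names in the example are all assumptions, since the record does not specify them.

```python
from collections import Counter
from typing import List, Optional, Sequence


def majority_label(votes: Sequence[str]) -> Optional[str]:
    """Aggregate annotator votes for one clip.

    Assumption: "majority agreement" is read as a strict majority
    (more than half of the annotators). Clips without one return None
    and would be excluded or re-annotated.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def segment(samples: Sequence[float], segment_len: int) -> List[List[float]]:
    """Split a mono signal into non-overlapping fixed-length segments.

    Assumption: the final partial segment is zero-padded to full length
    rather than dropped; the source does not specify either choice.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    segments = []
    for start in range(0, len(samples), segment_len):
        chunk = list(samples[start:start + segment_len])
        # Pad only the last, shorter chunk; full chunks get no padding.
        chunk.extend([0.0] * (segment_len - len(chunk)))
        segments.append(chunk)
    return segments


if __name__ == "__main__":
    # Hypothetical votes from three annotators for two clips.
    print(majority_label(["happy", "happy", "neutral"]))  # happy
    print(majority_label(["happy", "angry", "sad"]))      # None
    # A hypothetical 2.5 s clip at 16 kHz split into 1 s segments.
    clip = [0.0] * 40000
    print(len(segment(clip, 16000)))  # 3
```

Segmenting each recording into several fixed-length windows is one plausible way a set of 2,500 recordings could yield a larger number of model-ready samples, though the record does not state the window length used.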

Fields & Keywords

Social Sciences; Psychology; Experimental and Cognitive Psychology; Emotion and Mood Recognition; Speech Recognition and Synthesis; Sentiment Analysis and Opinion Mining; Speech recognition; Artificial intelligence; Natural language processing; Machine learning