Prototypical Networks for Speech Emotion Recognition in Spanish

MCML Authors

Björn Schuller

Prof. Dr.

Principal Investigator

Abstract

We explore the use of prototypical networks for Speech Emotion Recognition (SER), creating prototypical representations of the targeted emotions in the embedding space. We hypothesise that this technique can improve the performance and robustness of the models compared with standard classification-based approaches. We investigate two approaches to train the prototypes: one optimising a triplet loss, and the other minimising a prototypical loss. To assess our hypothesis, we use the EmoMatchSpanishDB Corpus, a novel dataset for SER in Spanish that includes speech samples conveying the six basic emotions defined by Paul Ekman, in addition to the neutral state. We methodically split the available samples into three speaker-independent partitions: train, development, and test. The proposed split is not only balanced in terms of the speakers’ gender, but also homogenised in terms of their recognition difficulty. We analyse the performance of our models from a gender perspective. The models use the eGeMAPS and wav2vec 2.0 feature representations extracted from the speech samples. We choose the Unweighted Average Recall (UAR) as the evaluation metric. The chance-level UAR for a seven-class classification problem is 14.3%. Among the prototype-based models, those optimising the prototypical loss obtain the highest UAR scores on the test set: 52.0% and 52.7% with the eGeMAPS and wav2vec 2.0 representations, respectively. Nevertheless, the best performance is obtained with a Support Vector Classifier (SVC) using a radial basis function kernel, with UARs of 54.4% and 56.9% for the eGeMAPS and wav2vec 2.0 representations, respectively.
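To make the abstract's terminology concrete, the following is a minimal sketch of nearest-prototype classification, a prototypical loss, and the UAR metric. It is not the authors' implementation: it assumes the common few-shot formulation (class prototypes as mean embeddings, softmax over negative squared Euclidean distances), which the abstract does not confirm, and it uses toy two-dimensional vectors in place of eGeMAPS or wav2vec 2.0 features. All function names are hypothetical.

```python
import math
from collections import defaultdict


def class_prototypes(embeddings, labels):
    """Assumed prototype definition: the mean embedding of each class."""
    sums, counts = {}, defaultdict(int)
    for vec, lab in zip(embeddings, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def squared_distance(a, b):
    """Squared Euclidean distance (an assumed choice of metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototypical_loss(query, target, prototypes):
    """Negative log-softmax over negative squared distances to the prototypes."""
    logits = {lab: -squared_distance(query, p) for lab, p in prototypes.items()}
    top = max(logits.values())  # subtract the maximum for numerical stability
    log_norm = top + math.log(sum(math.exp(v - top) for v in logits.values()))
    return log_norm - logits[target]


def predict(query, prototypes):
    """Assign the label of the closest prototype."""
    return min(prototypes, key=lambda lab: squared_distance(query, prototypes[lab]))


def unweighted_average_recall(y_true, y_pred):
    """UAR: mean of per-class recalls, so every class counts equally."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy data: two classes in a two-dimensional embedding space.
toy_embeddings = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
toy_labels = ["anger", "anger", "joy", "joy"]
toy_prototypes = class_prototypes(toy_embeddings, toy_labels)

# A constant or uniformly random classifier over 7 classes has expected UAR 1/7.
chance_level_uar = 1 / 7
```

Because UAR averages recall over classes rather than over samples, a classifier that always predicts one of the seven classes scores 1/7, about 14.3%, which matches the chance level quoted in the abstract.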

inproceedings


IberSPEECH 2024

7th Conference IberSPEECH 2024. Aveiro, Portugal, Nov 11-13, 2024.

Authors

A. Mallol-Ragolta • A. Spiesberger • A. B. Salvador • B. W. Schuller

Research Area

 B3 | Multimodal Perception

BibTeXKey: MSS+24
