Home  | Publications | MMS24

Multi-Triplet Loss-Based Models for Categorical Depression Recognition From Speech

MCML Authors

Link to Profile Björn Schuller

Björn Schuller

Prof. Dr.

Principal Investigator

Abstract

We analyse four different acoustic feature sets towards the automatic recognition of depression from speech signals. Specifically, the feature sets investigated are based on Mel-Frequency Cepstral Coefficients (MFCC), the Low-Level Descriptors (LLD) of the eGeMAPS feature set, Mel-spectrogram coefficients, and pretrained self-supervised Wav2Vec 2.0 representations. The main hypothesis investigated lies in the use of a multi-triplet loss to improve the inter-class separability of the data representations learnt in the embedding space, boosting, ultimately, the overall system performance. To assess this aspect, we implement three different techniques to perform the classification of the embedded representations learnt. These include the combination of two fully connected layers with softmax, a linear support vector classifier, and a clustering-based classifier with k−Means. We conduct our experiments on the Extended Distress Analysis Interview Corpus, released in the Detecting Depression Subchallenge (DDS) of the 9th Audio/Visual Emotion Challenge (AVEC), in 2019. We select the Unweighted Average Recall (UAR) as the evaluation metric. Our best model exploits the eGeMAPS-based feature set, optimises a triplet loss, and utilises a LinearSVC as the classifier. Tackling the task as a 6-class classification problem, this model scores a UAR of 25.7% on the test partition, an increment in 9% of the chance level.

inproceedings


IberSPEECH 2024

7th Conference IberSPEECH 2024. Aveiro, Portugal, Nov 11-13, 2024.

Authors

A. Mallol-RagoltaM. MillingB. W. Schuller

Links

PDF

Research Area

 B3 | Multimodal Perception

BibTeXKey: MMS24

Back to Top