
Multi-Triplet Loss-Based Models for Categorical Depression Recognition From Speech

MCML Authors

Adria Mallol-Ragolta

→ Group Björn Schuller
Health Informatics

Manuel Milling

→ Group Björn Schuller
Health Informatics

Björn Schuller

Prof. Dr.

Principal Investigator

Health Informatics

Abstract

We analyse four different acoustic feature sets for the automatic recognition of depression from speech signals. Specifically, the feature sets investigated are based on Mel-Frequency Cepstral Coefficients (MFCC), the Low-Level Descriptors (LLD) of the eGeMAPS feature set, Mel-spectrogram coefficients, and pretrained self-supervised Wav2Vec 2.0 representations. Our main hypothesis is that a multi-triplet loss improves the inter-class separability of the data representations learnt in the embedding space and, ultimately, boosts the overall system performance. To assess this hypothesis, we implement three different techniques to classify the learnt embedded representations: a combination of two fully connected layers with softmax, a linear support vector classifier, and a clustering-based classifier with k-Means. We conduct our experiments on the Extended Distress Analysis Interview Corpus, released in the Detecting Depression Subchallenge (DDS) of the 9th Audio/Visual Emotion Challenge (AVEC) in 2019. We select the Unweighted Average Recall (UAR) as the evaluation metric. Our best model exploits the eGeMAPS-based feature set, optimises a triplet loss, and utilises a LinearSVC as the classifier. Tackling the task as a 6-class classification problem, this model scores a UAR of 25.7% on the test partition, 9 percentage points above the chance level of 16.7%.
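The two quantities the abstract relies on, a triplet loss and the UAR, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the multi-triplet formulation, so the code assumes the standard margin-based triplet loss with Euclidean distance for a single triplet. The function names, the margin value, and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss for a single triplet.

    Assumption: Euclidean distance and margin=1.0; the paper's
    multi-triplet variant is not specified in the abstract.
    """
    d_pos = math.dist(anchor, positive)  # anchor to same-class sample
    d_neg = math.dist(anchor, negative)  # anchor to other-class sample
    return max(0.0, d_pos - d_neg + margin)


def unweighted_average_recall(y_true, y_pred):
    """Mean of the per-class recalls over the classes present in y_true."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy 2-D embeddings, made up for illustration only.
anchor, positive = (0.0, 0.0), (0.0, 1.0)
loss_far = triplet_loss(anchor, positive, (3.0, 4.0))   # negative far away
loss_near = triplet_loss(anchor, positive, (0.0, 1.5))  # negative too close

# A classifier that always predicts class 0 on a 6-class problem:
# recall is 1 for class 0 and 0 for the other five classes.
y_true = [0, 0, 0, 1, 2, 3, 4, 5]
y_pred = [0] * len(y_true)
uar_constant = unweighted_average_recall(y_true, y_pred)

print(loss_far, loss_near, round(uar_constant, 3))
```

The constant predictor shows why the chance-level UAR for six classes is 1/6 regardless of class imbalance: each class contributes equally to the average, which is the usual reason for preferring UAR over accuracy on imbalanced data.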

inproceedings MMS24