Face Mask Type and Coverage Area Recognition From Speech With Prototypical Networks

MCML Authors

Adria Mallol-Ragolta

→ Group Björn Schuller
Health Informatics

Anika Spiesberger

→ Group Björn Schuller
Health Informatics

Björn Schuller

Prof. Dr.

Principal Investigator

Health Informatics

Abstract

We investigate the use of prototypical networks for three recognition problems from speech: face mask type (3 classes), face mask coverage area (3 classes), and joint face mask type and coverage area (5 classes). We explore the MASCFLICHT Corpus, a dataset containing 2 h 27 min 55 s of speech from 30 German speakers recorded with a smartphone. We extract formant-related features and spectrogram representations from the samples, and we enrich the spectrograms by overlaying the traces of the central frequencies of the first four formants. Our experiments also consider the fusion, via concatenation, of the embedded representations extracted from the formant-related features and the spectrogram representations. We implement classification- and prototypical encoder-based networks. The results obtained on the test sets support the suitability of the prototypical encoder models, which score an Unweighted Average Recall (UAR) of 49.9%, 45.0%, and 31.6% on the three problems, respectively.
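
To make the two central notions of the abstract concrete, the following is a minimal sketch of nearest-prototype classification and of the UAR metric. It is not the authors' implementation. It assumes that embeddings are already available (in the paper they come from learned encoders), that each class prototype is the mean of its support embeddings, and that queries are assigned by squared Euclidean distance, which is a common choice for prototypical networks but is not stated in the abstract. The function names, class labels, and toy vectors are hypothetical.

```python
from collections import defaultdict


def class_prototypes(embeddings, labels):
    """Average the support embeddings of each class into one prototype."""
    grouped = defaultdict(list)
    for vector, label in zip(embeddings, labels):
        grouped[label].append(vector)
    return {
        label: [sum(dim) / len(vectors) for dim in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def nearest_prototype(query, prototypes):
    """Return the label of the closest prototype (squared Euclidean, assumed)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(prototypes, key=lambda label: sq_dist(query, prototypes[label]))


def unweighted_average_recall(y_true, y_pred):
    """Mean of the per-class recalls, so every class counts equally."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        hits[truth] += int(truth == pred)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy 2-D embeddings with hypothetical labels (not data from the corpus).
support = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
support_labels = ["no_mask", "no_mask", "surgical", "surgical"]
prototypes = class_prototypes(support, support_labels)

queries = [[1.0, 1.0], [5.0, 5.0], [4.0, 3.0]]
query_labels = ["no_mask", "surgical", "no_mask"]
predictions = [nearest_prototype(q, prototypes) for q in queries]
uar = unweighted_average_recall(query_labels, predictions)
```

Because UAR averages recall per class rather than per sample, a majority-class predictor on a 3-class problem scores about 33.3%, which is the chance level against which the reported figures can be read.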