Face Mask Type and Coverage Area Recognition From Speech With Prototypical Networks

MCML Authors

Björn Schuller

Prof. Dr.

Principal Investigator

Abstract

We investigate the use of prototypical networks for three recognition problems from speech: face mask type (3 classes), face mask coverage area (3 classes), and joint face mask type and coverage area (5 classes). We explore the MASCFLICHT Corpus, a dataset containing 2 h 27 min 55 s of speech data from 30 German speakers recorded with a smartphone. We extract formant-related features and spectrogram representations from the samples, and we enrich the spectrograms by overlaying the traces of the central frequency of the first four formants. Our experiments also consider fusing, via concatenation, the embedded representations extracted from the formant-related features and the spectrogram representations. We implement classification- and prototypical encoder-based networks. The results obtained on the test sets support the suitability of the prototypical encoder models, which score an Unweighted Average Recall (UAR) of 49.9%, 45.0%, and 31.6% on the three considered problems, respectively.
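For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of the general idea behind prototype-based classification, fusion by concatenation, and the UAR metric. It is not the authors' implementation: the paper uses learned encoders, whereas the embeddings below are hand-written toy numbers, the class labels "A", "B", "C" are placeholders, the function names are hypothetical, and Euclidean distance to the class mean is an assumption taken from the common formulation of prototypical networks.

```python
import math
from collections import defaultdict

def class_prototypes(embeddings, labels):
    """Return one prototype per class: the mean of that class's embeddings."""
    sums = {}
    counts = defaultdict(int)
    for vec, lab in zip(embeddings, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}

def predict(prototypes, query):
    """Assign the query to the class whose prototype is nearest (Euclidean, assumed)."""
    return min(prototypes, key=lambda lab: math.dist(prototypes[lab], query))

def unweighted_average_recall(y_true, y_pred):
    """Average the per-class recalls so that every class counts equally."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

def fuse(formant_embedding, spectrogram_embedding):
    """Fusion via concatenation of two embedded representations."""
    return list(formant_embedding) + list(spectrogram_embedding)

# Toy support set (hypothetical 2-D embeddings, two samples per class).
support = [[0, 0], [0, 2], [4, 0], [4, 2], [0, 6], [2, 6]]
support_labels = ["A", "A", "B", "B", "C", "C"]
prototypes = class_prototypes(support, support_labels)

# Toy query set with known labels.
queries = [[0.5, 1], [3, 1], [2.5, 1], [1, 5], [1, 3]]
y_true = ["A", "B", "B", "C", "C"]
y_pred = [predict(prototypes, q) for q in queries]

uar = unweighted_average_recall(y_true, y_pred)
fused = fuse([1.0, 2.0], [3.0])
```

In this toy example the last query is closer to the prototype of class "A" than to that of its true class "C", so the per-class recalls are 1, 1, and 0.5. Because UAR averages recalls rather than counting samples, chance level for a balanced 3-class problem is about 33.3% and for a 5-class problem 20%, which is the baseline against which the reported scores are usually read.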

inproceedings


IberSPEECH 2024

7th Conference IberSPEECH 2024. Aveiro, Portugal, Nov 11-13, 2024.

Authors

A. Mallol-Ragolta, A. Spiesberger, B. W. Schuller

Research Area

 B3 | Multimodal Perception

BibTeXKey: MSS24