
Multi-Task Partially Spoofed Speech Detection Using a Dual-View Graph Neural Network Assisted Segment-Level Module

MCML Authors

Björn Schuller

Prof. Dr.

Principal Investigator

Abstract

Partially spoofed speech detection (PSSD) is a multi-task learning problem. It typically comprises segment-level and utterance-level detection tasks and benefits from diverse feature representations for effective classification. However, existing models for multi-task PSSD usually share one feature processing module between the two tasks, which may perform worse than task-specific strategies. Further, most existing works capture segment-level information from a single view, which may model local differences between fake and bonafide segments poorly. To address this, we propose a Dual-view Graph neural network Assisted segment-level Module (DGAM) for multi-task PSSD. The proposed approach contains three modules: shared representation extraction; task-specific feature processing for the utterance-level task; and, for the segment-level task, a Dual-View Graph Neural Network (D-GNN) trained with a dual-view consistency loss. The D-GNN uses two views: a graph attention mechanism based on cosine similarity and a heat kernel function based on Euclidean distance. These capture semantic and Euclidean spatial relationships, respectively. Experimental evaluations on multiple spoofed-speech datasets demonstrate that the proposed approach outperforms existing approaches in both segment-level and utterance-level detection in terms of equal error rate (EER). This shows its effectiveness in the multi-task partially spoofed scenario.
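The two segment-level views described above can be illustrated with a small, self-contained sketch. It is a simplified stand-in, not the DGAM implementation. It assumes segment embeddings are plain lists of floats and builds one row-normalised adjacency matrix per view. The semantic view applies a softmax over cosine similarities in place of the learned graph attention. The Euclidean view uses the heat kernel exp(-||x_i - x_j||^2 / t). The sketch runs one propagation step per view and compares the two results with a mean-squared consistency term. The function names, the bandwidth `t`, the single propagation step and the MSE form of the consistency loss are all illustrative assumptions.

```python
import math


def cosine_view(feats):
    """Semantic view (assumed form): row-wise softmax over cosine similarities.

    A parameter-free stand-in for graph attention; the paper's attention is learned.
    """
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb + 1e-12)  # epsilon guards against zero vectors

    adj = []
    for a in feats:
        scores = [cos(a, b) for b in feats]
        m = max(scores)  # subtract max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        adj.append([e / z for e in exps])
    return adj


def heat_kernel_view(feats, t=1.0):
    """Euclidean view: heat kernel exp(-||a - b||^2 / t), normalised per row."""
    adj = []
    for a in feats:
        w = [math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / t) for b in feats]
        z = sum(w)
        adj.append([v / z for v in w])
    return adj


def propagate(adj, feats):
    """One message-passing step: each node becomes a weighted mean of all nodes."""
    n, dim = len(feats), len(feats[0])
    return [[sum(adj[i][j] * feats[j][k] for j in range(n)) for k in range(dim)]
            for i in range(n)]


def consistency_loss(h1, h2):
    """Hypothetical consistency term: mean squared difference between view outputs."""
    count = len(h1) * len(h1[0])
    return sum((a - b) ** 2 for r1, r2 in zip(h1, h2) for a, b in zip(r1, r2)) / count


# Toy usage: three 2-D "segment embeddings" (made-up values for illustration).
segments = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
h_sem = propagate(cosine_view(segments), segments)
h_euc = propagate(heat_kernel_view(segments, t=0.5), segments)
loss = consistency_loss(h_sem, h_euc)
```

In a trained model, the two views would feed a segment-level classifier. A consistency term such as this one would penalise disagreement between them, so that both the semantic view and the spatial view shape the segment representations.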

article


IEEE Transactions on Audio, Speech and Language Processing

Vol. 33, Jul. 2025.
Top Journal

Authors

Z. Ge • X. Xu • H. Guo • B. W. Schuller

Research Area

B3 | Multimodal Perception

BibTeXKey: GXG+25
