Cross-Dialect Bird Species Recognition With Dialect-Calibrated Augmentation

MCML Authors

Björn Schuller

Prof. Dr.

Principal Investigator

Abstract

Dialect variation hampers automatic recognition of bird calls collected by passive acoustic monitoring. We address the problem on DB3V, a three-region, ten-species corpus of 8-s clips, and propose a deployable framework built on Time-Delay Neural Networks (TDNNs). Frequency-sensitive normalisation (Instance Frequency Normalisation and a gated Relaxed-IFN) is paired with gradient-reversal adversarial training to learn region-invariant embeddings. A multi-level augmentation scheme combines waveform perturbations, Mixup for rare classes, and CycleGAN transfer that synthesises Region 2 (Interior Plains)-style audio, with Dialect-Calibrated Augmentation (DCA) softly down-weighting synthetic samples to limit artifacts. The complete system lifts cross-dialect accuracy by up to twenty percentage points over baseline TDNNs while preserving in-region performance. Grad-CAM and LIME analyses show that robust models concentrate on stable harmonic bands, providing ecologically meaningful explanations. The study demonstrates that lightweight, transparent, and dialect-resilient bird-sound recognition is attainable.
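The abstract does not give the DCA weighting formula. As a rough illustration of what "softly down-weighting synthetic samples" could look like, the sketch below computes a weighted mean cross-entropy in which synthetic clips contribute with a reduced weight. The function names, the `is_synthetic` flag, and the fixed `synthetic_weight` value are assumptions made for this example and are not taken from the paper, which may calibrate the weights quite differently.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_entropy(batch_logits, labels, is_synthetic, synthetic_weight=0.5):
    """Weighted mean cross-entropy over a batch.

    Real samples get weight 1.0; synthetic samples get `synthetic_weight`
    (a hypothetical constant here, not the paper's calibrated value).
    The batch is assumed to contain at least one sample with non-zero weight.
    """
    weights = [synthetic_weight if synth else 1.0 for synth in is_synthetic]
    losses = [
        -math.log(softmax(logits)[label])
        for logits, label in zip(batch_logits, labels)
    ]
    weighted_sum = sum(w * loss for w, loss in zip(weights, losses))
    return weighted_sum / sum(weights)


# Toy batch: one real clip classified well, one synthetic clip classified badly.
batch_logits = [[2.0, 0.0], [0.0, 2.0]]
labels = [0, 0]
is_synthetic = [False, True]

full = weighted_cross_entropy(batch_logits, labels, is_synthetic, synthetic_weight=1.0)
soft = weighted_cross_entropy(batch_logits, labels, is_synthetic, synthetic_weight=0.5)
none = weighted_cross_entropy(batch_logits, labels, is_synthetic, synthetic_weight=0.0)
```

In this toy batch the poorly classified synthetic clip dominates the loss at full weight, and lowering `synthetic_weight` shrinks its influence, which is the general effect a down-weighting scheme aims for when synthetic audio carries artifacts.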

Preprint

Sep. 2025

Authors

J. Ding • Q. Sun • A. Akman • B. W. Schuller

Research Area

 B3 | Multimodal Perception

BibTeXKey: DSA+25
