Enhancing drone audition with rotor-conditioned deep models

Gulli, Andrea; Fontana, Federico; Drioli, Carlo; Salvati, Daniele; Ferrin, Giovanni


2025

journal article

Journal

EURASIP JOURNAL ON AUDIO, SPEECH AND MUSIC PROCESSING

Abstract

The problem of recovering speech from audio recordings captured by a microphone aboard an unmanned aerial vehicle during flight is investigated. Enhancing a recording in this condition is difficult due to non-stationary noise from the motors and the propellers, along with environmental disturbance and motion-induced air flows. Together, these sources dramatically decrease the signal-to-noise ratio (SNR). This paper investigates the integration of rotor speed time series as a structured conditioning signal into neural speech enhancement models. We implement and evaluate rotor-informed variants of three state-of-the-art architectures: Wave-U-Net (time domain), DCCRN, and DCUNet (both time-frequency domain). Experiments on a custom UAV acoustics dataset spanning SNR levels from −30 to 0 dB show that rotor conditioning yields consistent and statistically significant improvements across SNR, SI-SDR, STOI, and PESQ metrics. These benefits generalize across model families, and a lightweight rotor-informed variant achieves best or near-best results despite using only 25% of the parameters. The findings establish rotor-informed conditioning as a robust and generalizable strategy for speech enhancement in low-SNR UAV environments.
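To give a sense of the SNR range quoted in the abstract, the following minimal sketch mixes a toy "speech" tone with a toy "rotor" tone at a chosen SNR and then scores the mixture with the standard scale-invariant SDR (SI-SDR) definition. It is an illustration only, not the authors' evaluation code: the function names (`power`, `mix_at_snr`, `si_sdr`), the sinusoidal test signals, and the signal length are all assumptions introduced here.

```python
import math

def power(x):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that the speech-to-noise power ratio is `snr_db`, then add it."""
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]

def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB (signals assumed zero-mean and equal length)."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    alpha = dot / ref_energy                      # optimal scaling of the reference
    target = [alpha * r for r in reference]       # part of the estimate explained by the reference
    residual = [e - t for e, t in zip(estimate, target)]
    return 10 * math.log10(sum(t * t for t in target) / sum(v * v for v in residual))

# Hypothetical toy signals: two tones that are orthogonal over the window.
N = 1000
speech = [math.sin(2 * math.pi * 5 * i / N) for i in range(N)]
rotor = [math.sin(2 * math.pi * 50 * i / N) for i in range(N)]

# At -30 dB the noise carries 1000 times the power of the speech.
mixture = mix_at_snr(speech, rotor, snr_db=-30.0)
print(round(si_sdr(mixture, speech), 3))          # close to -30 dB for this toy pair
```

Because the two toy tones are orthogonal, the SI-SDR of the unprocessed mixture coincides with the mixing SNR; with real speech and rotor noise the two values generally differ.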

DOI

10.1186/s13636-025-00425-2

WOS

WOS:001599535700001

Archive

https://hdl.handle.net/11390/1316824

info:eu-repo/semantics/altIdentifier/scopus/2-s2.0-105019940505

https://ricerca.unityfvg.it/handle/11390/1316824

Rights

open access

license:creative commons

license uri:http://creativecommons.org/licenses/by/4.0/
