EURASIP JOURNAL ON AUDIO, SPEECH AND MUSIC PROCESSING
Abstract
This paper addresses the problem of recovering speech from audio recorded by a microphone aboard an unmanned aerial vehicle (UAV) during flight. Enhancing such recordings is difficult because of non-stationary noise from the motors and propellers, together with environmental disturbances and motion-induced air flows. These sources combine to reduce the signal-to-noise ratio (SNR) dramatically. We investigate the integration of rotor speed time series as a structured conditioning signal into neural speech enhancement models. We implement and evaluate rotor-informed variants of three state-of-the-art architectures: Wave-U-Net (time domain), DCCRN, and DCUNet (both time-frequency domain). Experiments on a custom UAV acoustics dataset spanning SNR levels from −30 to 0 dB show that rotor conditioning yields consistent and statistically significant improvements in SNR, SI-SDR, STOI, and PESQ. These benefits generalize across model families, and a lightweight rotor-informed variant achieves best or near-best results despite using only 25% of the parameters. The findings establish rotor-informed conditioning as a robust and generalizable strategy for speech enhancement in low-SNR UAV environments.
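To make the SNR range and the SI-SDR metric mentioned above concrete, the following is a minimal sketch in plain Python. It is not the evaluation code used in this work. The function names `scale_noise_to_snr` and `si_sdr` are hypothetical, and the two sine tones are toy stand-ins for speech and rotor noise. The sketch uses the standard definition of scale-invariant SDR and assumes zero-mean signals of equal length.

```python
import math

def scale_noise_to_snr(speech, noise, snr_db):
    """Rescale noise so that (speech power / noise power) equals snr_db in dB."""
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [gain * n for n in noise]

def si_sdr(estimate, target):
    """Scale-invariant SDR in dB (assumes zero-mean, equal-length signals)."""
    # Project the estimate onto the target to find the optimal scaling.
    dot = sum(e * t for e, t in zip(estimate, target))
    alpha = dot / sum(t * t for t in target)
    projection = [alpha * t for t in target]
    residual = [e - p for e, p in zip(estimate, projection)]
    signal_energy = sum(p * p for p in projection)
    error_energy = sum(r * r for r in residual)
    return 10 * math.log10(signal_energy / error_energy)

# Toy signals (assumptions, for illustration only): 0.1 s at 16 kHz.
fs, n_samples = 16000, 1600
speech = [math.sin(2 * math.pi * 220 * n / fs) for n in range(n_samples)]
noise = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(n_samples)]

# Build a mixture at the lowest SNR considered in the abstract (-30 dB).
scaled_noise = scale_noise_to_snr(speech, noise, snr_db=-30)
mixture = [s + v for s, v in zip(speech, scaled_noise)]

mixture_si_sdr = si_sdr(mixture, speech)
print(f"SI-SDR of unprocessed mixture: {mixture_si_sdr:.2f} dB")
```

Because the two toy tones are orthogonal over the chosen window, the SI-SDR of the unprocessed mixture should come out close to the target SNR of −30 dB. Multiplying the mixture by any nonzero constant leaves the value unchanged, which is the property that makes the metric scale-invariant.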