Electroencephalography (EEG) is widely used because it provides highly informative measurements of brain activity with high temporal resolution, portability, and relatively low cost. However, modeling EEG is a challenging task due to the scarcity of large datasets and its very low signal-to-noise ratio. Deep learning (DL) has the potential to cope with some of these weaknesses. Unfortunately, DL models are very sensitive not only to the size but also to the quality of the input data, and the availability of clean EEG is not always guaranteed. In this work, we show how hvEEGNet, a DL model that provides high-fidelity reconstruction of multi-channel EEG data, handles different amounts and types of artefacts. Specifically, we show how mislabeled artefacts in the benchmark dataset 2a from BCI competition IV lead to reconstruction failures, and we investigate the relationship between the quality of the input and the model's learning ability. This work demonstrates the effectiveness of hvEEGNet as an anomaly detector, but also opens new critical directions for future investigations towards the development of more reliable and fair DL models for noisy EEG data.
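
To make the anomaly-detection idea concrete, the following minimal sketch (not the authors' implementation) illustrates how per-trial reconstruction error from a trained reconstruction model could be turned into an artefact flag; the `reconstruct` callable, the median-plus-MAD threshold, and the array shapes are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: flag EEG trials as anomalous when their reconstruction
# error is far above the typical error. `reconstruct` is a hypothetical
# placeholder for any trained reconstruction model (e.g. an autoencoder).
import numpy as np

def anomaly_scores(trials: np.ndarray, reconstruct) -> np.ndarray:
    """Per-trial reconstruction error (MSE over channels x time)."""
    recon = reconstruct(trials)            # shape: (n_trials, n_channels, n_samples)
    return ((trials - recon) ** 2).mean(axis=(1, 2))

def flag_artefacts(scores: np.ndarray, k: float = 3.0) -> np.ndarray:
    """Mark trials whose error exceeds median + k * MAD as likely artefactual."""
    med = np.median(scores)
    mad = np.median(np.abs(scores - med)) + 1e-12
    return scores > med + k * mad

# Purely illustrative usage with random data and a trivial "model":
# trials = np.random.randn(288, 22, 1000)   # assumed dataset-2a-like shape
# scores = anomaly_scores(trials, lambda x: np.zeros_like(x))
# print(flag_artefacts(scores).sum(), "trials flagged")
```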