We propose two training techniques for improving the robustness of neural
networks to adversarial attacks, i.e., manipulations of the inputs that are
maliciously crafted to fool networks into incorrect predictions. Both methods
are independent of the chosen attack and leverage random projections of the
original inputs in order to exploit both dimensionality reduction and
characteristic geometric properties of adversarial perturbations.
The first technique, RP-Ensemble, consists of an ensemble of
networks trained on multiple projected versions of the original inputs. The
second, named RP-Regularizer, instead adds a regularization term to the
training objective.
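
As a rough illustration of the ensemble idea, the following minimal sketch trains one classifier per random projection of the inputs and averages the predicted class probabilities. Several choices here are assumptions made only for illustration and are not specified above: the Gaussian projection matrix, the averaging rule, the linear softmax classifier used as a stand-in for a neural network, and all function names and parameters (`fit_rp_ensemble`, `predict_rp_ensemble`, `proj_dim`, `n_members`).

```python
import numpy as np

def random_projection_matrix(rng, in_dim, out_dim):
    # Assumption: Gaussian entries with variance 1/out_dim, so squared norms
    # are preserved in expectation after projection.
    return rng.normal(0.0, 1.0 / np.sqrt(out_dim), size=(in_dim, out_dim))

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_softmax_classifier(x, y, n_classes, lr=0.1, steps=200):
    # Stand-in for a neural network: a linear softmax classifier trained
    # with full-batch gradient descent on the cross-entropy loss.
    n, d = x.shape
    w = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(steps):
        grad = (softmax(x @ w + b) - onehot) / n
        w -= lr * (x.T @ grad)
        b -= lr * grad.sum(axis=0)
    return w, b

def fit_rp_ensemble(x, y, n_classes, n_members=5, proj_dim=8, seed=0):
    # Each member gets its own random projection and is trained only on
    # the projected inputs.
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        proj = random_projection_matrix(rng, x.shape[1], proj_dim)
        w, b = train_softmax_classifier(x @ proj, y, n_classes)
        members.append((proj, w, b))
    return members

def predict_rp_ensemble(members, x):
    # Assumption: the ensemble prediction is the mean of member probabilities.
    probs = [softmax((x @ proj) @ w + b) for proj, w, b in members]
    return np.mean(probs, axis=0)
```

Calling `fit_rp_ensemble(x, y, n_classes)` on an array `x` of shape `(n_samples, n_features)` with integer labels `y` returns a list of `(projection, weights, bias)` triples, and `predict_rp_ensemble(members, x)` returns one row of class probabilities per input.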