Adversarial Robustness From Well-Separated Data

Classifiers are known to be vulnerable to adversarial examples, which are imperceptible modifications of true inputs that lead to misclassification. This raises many concerns, and recent research aims to better understand this phenomenon. We make progress on two fronts:
1) We take a holistic look at adversarial examples for non-parametric classifiers (k-NN, decision trees, random forests). We provide a general defense method, adversarial pruning, which preprocesses the dataset so that it becomes well-separated. To test our defense, we provide a novel attack that applies to many classifiers and is often optimal. Along the way, we derive a theoretically optimal robust classifier, analogous to the Bayes optimal classifier. Empirically, we show that our general attack and defense lead to better or competitive results for k-NN, decision trees, and random forests, even compared to classifier-specific methods.
2) Robustness often comes at the cost of lower test accuracy, which is undesirable. Fortunately, when the dataset is separated (e.g., cat images and dog images are not arbitrarily close), it is possible to achieve both accuracy and robustness with a classifier that is locally smooth around the data. We consider classifiers obtained by rounding locally Lipschitz functions. Theoretically, we prove that such classifiers exist for any dataset with a positive distance between the supports of different classes. Empirically, we verify this separation on MNIST, CIFAR-10, and ImageNet. We then investigate neural network training methods that encourage smooth functions, and we show that a small local Lipschitz constant indeed correlates with high test accuracy and robustness.
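To make the defense in part 1) concrete, here is a minimal greedy sketch of adversarial pruning: drop points until every pair of differently-labeled points is more than 2r apart, so no single perturbation of radius r can sit between two classes. This greedy version is only illustrative; the paper's method computes a minimum-size removal set (the function name and greedy strategy here are my own, not from the abstract).

```python
import numpy as np

def adversarial_prune(X, y, r):
    """Greedy sketch of adversarial pruning: keep a subset of (X, y) in
    which any two points with different labels are more than 2*r apart.
    Illustrative only; the actual defense finds a minimum removal set."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    keep = np.ones(len(X), dtype=bool)
    for i in range(len(X)):
        if not keep[i]:
            continue
        for j in range(i + 1, len(X)):
            # drop one endpoint of each conflicting cross-label pair
            if keep[j] and y[i] != y[j] and np.linalg.norm(X[i] - X[j]) <= 2 * r:
                keep[j] = False
    return X[keep], y[keep]
```

After pruning, a non-parametric classifier such as k-NN trained on the remaining points cannot be flipped by any perturbation of radius r that stays closer to the original class.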
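The local Lipschitz constant discussed in part 2) can be estimated numerically. The sampling-based estimator below (my own illustration, not the paper's measurement procedure) perturbs an input within an L2 ball and records the largest observed slope of f; a small value indicates local smoothness, which the abstract links to robustness of the rounded classifier.

```python
import numpy as np

def local_lipschitz(f, x, r, n_samples=1000, seed=0):
    """Estimate the local Lipschitz constant of f around x by sampling
    random perturbations within an L2 ball of radius r. A small value
    means f is locally smooth, so the rounded classifier sign(f) is
    hard to flip near x. Sampling gives a lower bound on the true constant."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    fx = f(x)
    best = 0.0
    for _ in range(n_samples):
        d = rng.normal(size=x.shape)
        d *= rng.uniform(0.0, r) / np.linalg.norm(d)  # random point in the ball
        best = max(best, abs(f(x + d) - fx) / np.linalg.norm(d))
    return best
```

For a linear function the estimate matches the true constant; for a trained network, evaluating this quantity around test points is one way to check whether smoothness-encouraging training actually produced a small local Lipschitz constant.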

Joint work with Yao-Yuan Yang, Yizhen Wang, Hongyang Zhang, Kamalika Chaudhuri, and Ruslan Salakhutdinov, based on the following two papers:

May 27 @ 12:30 pm (1h)


Cyrus Rashtchian