Mix'n'Squeeze: Thwarting Adaptive Adversarial Samples Using Randomized Squeezing


Deep Learning (DL) has been shown to be particularly vulnerable to adversarial samples. To combat adversarial strategies, numerous defenses have been proposed in the literature. Among these, feature squeezing emerges as an effective defense that reduces unnecessary features without changing the DL model. However, feature squeezing is a static defense and does not resist adaptive attacks: because it is a deterministic process, an adversarial sample, once found, will always succeed against the classifier. In this work, we address this problem and introduce Mix'n'Squeeze, the first randomized feature squeezing defense that leverages key-based randomness and is secure against adaptive whitebox adversaries. Our defense pre-processes the classifier inputs by embedding carefully selected randomness within each feature before applying feature squeezing, so that an adaptive whitebox attacker can no longer predict the effect of their own perturbations on the resulting sample. We implement Mix'n'Squeeze and thoroughly evaluate it in the context of image classification against the various reported strategies for generating adversarial samples. We also analyze the resilience of Mix'n'Squeeze with respect to state-of-the-art adaptive strategies and show that, in contrast to common belief, Mix'n'Squeeze does not hamper the classifier's accuracy while significantly decreasing the success probability of an adaptive whitebox adversary.
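
To make the pre-processing idea concrete, the following is a minimal sketch of keyed randomization followed by squeezing. It is an illustration under stated assumptions, not the construction evaluated in this work: the function names (`squeeze_bit_depth`, `randomized_squeeze`), the use of bit-depth reduction as the squeezer, the uniform noise bounded by a hypothetical parameter `delta`, and the HMAC-based seed derivation are all choices made here for illustration only.

```python
import hashlib
import hmac
import random
import struct


def squeeze_bit_depth(value, bits):
    """Quantize a value in [0, 1] to 2**bits levels (bit-depth reduction)."""
    levels = (1 << bits) - 1
    return round(value * levels) / levels


def randomized_squeeze(pixels, key, bits=4, delta=0.1):
    """Add key-dependent noise to each pixel, then squeeze.

    Assumptions (not taken from the paper): pixels are floats in [0, 1],
    noise is uniform in [-delta, delta], and the noise stream is seeded
    by HMAC-SHA256 over the input under a secret key.
    """
    # Derive a seed that depends on both the secret key and the input,
    # so the noise cannot be predicted without knowing the key.
    packed = struct.pack(f"{len(pixels)}d", *pixels)
    digest = hmac.new(key, packed, hashlib.sha256).digest()
    rng = random.Random(int.from_bytes(digest, "big"))

    squeezed = []
    for p in pixels:
        noisy = p + rng.uniform(-delta, delta)   # embed randomness per feature
        noisy = min(1.0, max(0.0, noisy))        # clip back to the valid range
        squeezed.append(squeeze_bit_depth(noisy, bits))
    return squeezed


if __name__ == "__main__":
    image = [0.5] * 8  # toy "image": eight mid-gray pixels
    print(randomized_squeeze(image, key=b"secret-key-A"))
    print(randomized_squeeze(image, key=b"secret-key-B"))
```

In this sketch, the same input and key always yield the same squeezed output, while an attacker who knows the model and the squeezer but not the key cannot tell in advance which quantization level each perturbed pixel will land on. Whether the randomness should be fixed per input or freshly sampled per query is a design choice that this sketch does not settle.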