Out-of-the-box adversarial examples do fail under image transformations. Below, we show the same cat picture, adversarially perturbed to be incorrectly classified as a desktop computer by Inception v3 trained on ImageNet. A zoom of as little as 1.002 causes the classification probability for the correct label, tabby cat, to overtake that of the adversarial label, desktop computer.
Scale-invariant adversarial examples
Adversarial examples can be created using an optimization method called projected gradient descent to find small perturbations to the image that arbitrarily fool the classifier.
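To make the procedure concrete, here is a minimal sketch of targeted projected gradient descent. It is not the code used for the images in this post: the two-class linear classifier `W`, the four-value input `x`, and the `eps`, `step`, and `iters` settings are all hypothetical stand-ins chosen so the example runs with only the Python standard library. The L-infinity bound is likewise an assumption about what "small" means.

```python
import math

# Hypothetical two-class linear classifier standing in for a real network.
W = [[1.0, -1.0, 0.5, 0.0],   # class 0 (plays the role of the true label)
     [-1.0, 1.0, 0.0, 0.5]]   # class 1 (plays the role of the adversarial label)

def predict_proba(x):
    """Softmax over the linear logits W @ x."""
    z = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(x):
    p = predict_proba(x)
    return max(range(len(p)), key=p.__getitem__)

def sign(g):
    return (g > 0) - (g < 0)

def targeted_pgd(x, target, eps=0.15, step=0.02, iters=40):
    """Maximize log p[target] while staying within eps of x (L-infinity)."""
    x_adv = list(x)
    for _ in range(iters):
        p = predict_proba(x_adv)
        # Gradient of log p[target] w.r.t. the input of a linear softmax model.
        grad = [W[target][j] - sum(p[k] * W[k][j] for k in range(len(W)))
                for j in range(len(x))]
        # Signed gradient ascent step.
        stepped = [xa + step * sign(g) for xa, g in zip(x_adv, grad)]
        # Projection: back into the eps-ball around x and the valid range [0, 1].
        x_adv = [min(max(s, xi - eps, 0.0), xi + eps, 1.0)
                 for s, xi in zip(stepped, x)]
    return x_adv

x = [0.6, 0.4, 0.5, 0.5]
x_adv = targeted_pgd(x, target=1)
print(predict(x), predict(x_adv))  # prints: 0 1
```

The projection step is what keeps the perturbation small: every coordinate of `x_adv` stays within `eps` of the original input, however many gradient steps are taken.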
Instead of searching for an input that is adversarial from a single viewpoint, we optimize over a large ensemble of stochastic classifiers that randomly rescale the input before classifying it. Optimizing against such an ensemble produces robust adversarial examples that are scale-invariant.
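One way to write down the ensemble objective is as an expectation over random scales. The notation here is ours and introduced only for illustration: $x$ is the original image, $x'$ the perturbed image, $y_{\text{adv}}$ the adversarial label, $r_s$ a rescaling by factor $s$ drawn from a distribution $S$, and $\epsilon$ an assumed bound on the perturbation size.

```latex
\hat{x} = \arg\max_{x'} \;
  \mathbb{E}_{s \sim S}\!\left[ \log P\!\left(y_{\text{adv}} \mid r_s(x')\right) \right]
\quad \text{subject to} \quad \lVert x' - x \rVert_\infty \le \epsilon

% In practice the expectation can be estimated by sampling n scales per step:
\nabla_{x'} \, \mathbb{E}_{s \sim S}\!\left[ \log P\!\left(y_{\text{adv}} \mid r_s(x')\right) \right]
\approx \frac{1}{n} \sum_{i=1}^{n} \nabla_{x'} \log P\!\left(y_{\text{adv}} \mid r_{s_i}(x')\right),
\qquad s_i \sim S
```

With a single fixed scale this reduces to the ordinary single-viewpoint attack; averaging over many sampled scales is what pushes the optimizer toward a perturbation that works at all of them.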
Even when we restrict ourselves to only modifying pixels corresponding to the cat, we can create a single perturbed image that is simultaneously adversarial at all desired scales.
Transformation-invariant adversarial examples
By also applying random rotations, translations, scales, noise, and mean shifts to the input during optimization, the same technique produces a single input that remains adversarial under any of these transformations.
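The sketch below illustrates optimizing over a distribution of transformations on a toy problem. Everything in it is an assumption made for illustration: the linear classifier `W`, the input `x`, and the transformation family `gain * x + shift + noise`. Mean shift and noise come from the list above, while the random intensity gain is only a crude stand-in for the geometric transformations (rotation, translation, rescaling), which would need a differentiable image-warping operation in a real pipeline.

```python
import math
import random

# Hypothetical two-class linear classifier standing in for a real network.
W = [[1.0, -1.0, 0.5, 0.0],
     [-1.0, 1.0, 0.0, 0.5]]

def predict_proba(x):
    z = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(x):
    p = predict_proba(x)
    return max(range(len(p)), key=p.__getitem__)

def sample_transform(rng, dim=4):
    """Draw one random transformation t(x) = gain * x + shift + noise."""
    gain = rng.uniform(0.9, 1.1)
    shift = rng.uniform(-0.05, 0.05)
    noise = [rng.uniform(-0.02, 0.02) for _ in range(dim)]
    def apply(x):
        return [gain * xi + shift + ni for xi, ni in zip(x, noise)]
    return apply, gain

def sign(g):
    return (g > 0) - (g < 0)

def robust_attack(x, target, rng, eps=0.15, step=0.02, iters=40, samples=16):
    """PGD on the gradient summed over freshly sampled transformations."""
    x_adv = list(x)
    for _ in range(iters):
        grad = [0.0] * len(x)
        for _ in range(samples):
            apply, gain = sample_transform(rng, len(x))
            p = predict_proba(apply(x_adv))
            for j in range(len(x)):
                # d log p[target] / d t(x)_j, chained through d t(x)_j / d x_j = gain.
                g_j = W[target][j] - sum(p[k] * W[k][j] for k in range(len(W)))
                grad[j] += gain * g_j
        stepped = [xa + step * sign(g) for xa, g in zip(x_adv, grad)]
        # Project back into the eps-ball around x and the valid range [0, 1].
        x_adv = [min(max(s, xi - eps, 0.0), xi + eps, 1.0)
                 for s, xi in zip(stepped, x)]
    return x_adv

def success_rate(x_in, label, rng, trials=200):
    """Fraction of freshly sampled transformations under which x_in gets `label`."""
    hits = sum(predict(sample_transform(rng, len(x_in))[0](x_in)) == label
               for _ in range(trials))
    return hits / trials

x = [0.6, 0.4, 0.5, 0.5]
x_adv = robust_attack(x, target=1, rng=random.Random(0))
print(success_rate(x_adv, 1, random.Random(1)))
```

Note that `success_rate` draws new transformations from a separately seeded generator, so the example is evaluated on samples the optimizer never saw. In this toy setting the margins are large enough that the adversarial label should win for every draw; a real network would typically give a success rate below 100%.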
Our transformations are sampled randomly at test time, demonstrating that our example is invariant to the whole distribution of transformations.