Source: https://drive.google.com/file/d/1Pou5InOgSZiEYtRofYpRd13sRL40C9cs/view

Adversarial Examples for Human Vision - Vijay Veerabadran

Adversarial Examples

  • Perturbations designed by an adversary to misalign predictions made by machine learning models
  • In computer vision: Perturbations added to images that cause an image classifier to make wrong predictions

Security Concerns Caused by Adversarial Images

  • Deep learning models deployed in critical applications such as autonomous driving, ATS, etc. face security risks from such attacks.

Adversarial examples have been assumed to uniquely plague machine perception. Do they also influence human perception?

Generating adversarial examples

  • Adversarial attacks are typically generated through an optimization process that requires access to the victim model’s parameters, gradients, and classification decisions (see the sketch below)
  • These details are inaccessible for the human visual system
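A minimal sketch of this white-box optimization loop, written as projected gradient descent under an L-infinity budget in PyTorch. The model handle, loss, and hyperparameters (epsilon, step size, iteration count) are illustrative assumptions, not the settings used in the studies discussed here.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, image, true_label, epsilon=8 / 255, step=2 / 255, iters=10):
    """White-box untargeted PGD; requires the victim model's gradients."""
    adv = image.clone().detach()
    for _ in range(iters):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), true_label)
        grad = torch.autograd.grad(loss, adv)[0]              # gradient access = white-box
        adv = adv.detach() + step * grad.sign()               # ascend the classification loss
        adv = image + (adv - image).clamp(-epsilon, epsilon)  # project into the L-inf ball
        adv = adv.clamp(0, 1)                                 # keep a valid image
    return adv
```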

Hints suggesting effect on human perception

  • “Black-box” adversarial images are capable of affecting the predictions of deep learning models (a transfer sketch follows this list):
    • Without access to the model’s learned parameters
    • Without access to the model’s architecture
    • Without needing to run inference on the model (or access to its classification decisions)
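A hedged sketch of that transfer setting: the perturbation is crafted entirely on a surrogate model whose gradients are available, then applied unchanged to a separate victim model that is only queried afterwards to check whether its prediction changed. The specific torchvision models, the one-step FGSM attack, and the placeholder inputs are all illustrative choices, not the pipeline used in the studies discussed here.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50, vit_b_16

surrogate = resnet50(weights="IMAGENET1K_V2").eval()   # white-box: gradients available
victim = vit_b_16(weights="IMAGENET1K_V1").eval()      # black-box: prediction only

def fgsm(model, x, y, eps=8 / 255):
    """One-step white-box attack crafted entirely on the surrogate."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

image = torch.rand(1, 3, 224, 224)                     # placeholder input
label = torch.tensor([207])                            # placeholder ImageNet label

adv = fgsm(surrogate, image, label)                    # victim never consulted here

# The victim's parameters, gradients, and losses are never accessed;
# we only check afterwards whether its prediction changed.
with torch.no_grad():
    print("victim on clean image:", victim(image).argmax(1).item())
    print("victim on transferred adversarial:", victim(adv).argmax(1).item())
```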

Hypothesis

H: Adversarial examples that transfer strongly across deep learning models target semantic features that may also influence human visual perception
Aim: Identify conditions wherein we can detect adversarial examples influencing human perception

Study 1: Adversarial examples that fool both computer vision models and time-limited humans

Experiment 1 setup

  • Subjects performed a time-limited classification task
  • A dynamic mask was used to reduce the influence of feedback connections
  • Subjects responded with one of two options using a response time box

Experiment 1 conditions

  • Conditions (a construction sketch follows this list)
    • image: Clean image, classified as class T
    • A: Adversarial image corresponding to “image”, classified as class A
    • control: Perturbed (non-adversarial) image with the same amount of noise as A
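A rough sketch of how the three stimulus conditions could be assembled from a clean image and its adversarial counterpart. The control here scrambles the perturbation pixels so its magnitude is matched but its adversarial structure is destroyed; this is an assumed construction for illustration, not necessarily the exact control used in the study.

```python
import torch

def make_conditions(clean, adv):
    """Assemble the three stimulus conditions for one trial.

    clean: unperturbed image tensor, classified as class T
    adv:   adversarial version of `clean`, classified as class A
    """
    delta = adv - clean                                   # the adversarial perturbation
    # Control: matched perturbation energy, but spatially scrambled so it is
    # no longer adversarial (an assumed control; the study's may differ).
    perm = torch.randperm(delta.numel())
    control_delta = delta.flatten()[perm].reshape(delta.shape)
    control = (clean + control_delta).clamp(0, 1)
    return {"image": clean, "A": adv, "control": control}
```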

Experiment 1 results

The adversarial condition (A) showed an increased error rate relative to the control condition

Limitations of Experiment 1

  • Adversarial perturbations were large in magnitude (measured in intensity levels)
  • Effect demonstrated only for time-limited stimulus viewing

Current study: Subtle adversarial image manipulations influence both human and machine perception

Image-pair assessment experiment setup

  • Task: Choose which image looks more <A>-like
  • Number of subjects: n=100
  • Data collected:
    • Participant choice
    • Response time

Experiment 4 conditions (A vs A')

  • image: Unperturbed image classified as class T
  • A: targeted adversarial image, target class = A (see the sketch after this list)
  • A’: targeted adversarial image, target class = A’
  • ε: L-norm of the perturbation magnitude
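A sketch of how the A and A’ stimuli could be generated: targeted PGD toward two different target classes, with the same perturbation budget for both. The function, step-size rule, and placeholder class indices are assumptions for illustration, not the study’s exact attack settings.

```python
import torch
import torch.nn.functional as F

def targeted_pgd(model, image, target, epsilon, iters=20):
    """Targeted PGD: pull the prediction toward `target` within an epsilon budget."""
    step = epsilon / 4                                        # illustrative step size
    adv = image.clone().detach()
    for _ in range(iters):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), target)
        grad = torch.autograd.grad(loss, adv)[0]
        adv = adv.detach() - step * grad.sign()               # descend toward the target class
        adv = image + (adv - image).clamp(-epsilon, epsilon)  # same budget for both stimuli
        adv = adv.clamp(0, 1)
    return adv

# Hypothetical stimulus pair for one trial (class indices and budget are placeholders):
# img_A      = targeted_pgd(model, image, torch.tensor([class_A]), epsilon=eps)
# img_Aprime = targeted_pgd(model, image, torch.tensor([class_A_prime]), epsilon=eps)
```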

Experiment 4 results

Images perturbed toward class A were judged more A-like than images perturbed toward A’ (and vice versa)

Summary so far

  • Subtle adversarial image manipulations influence human perception
  • We identify three conditions in which the effect can be robustly detected in human vision
  • What ANN properties are crucial for influencing human perception?

ANN properties crucial for transfer to humans

  • Do perturbations from certain ANN architectures appeal more strongly to human perception?
  • We designed an experiment to test whether model architecture affects how well adversarial perturbations transfer to humans

Self-attention models vs. convolution models

Alignment between humans and self-attention models was higher than the alignment between humans and convolutional models
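One simple way such alignment could be quantified, assuming per-trial records of which class the participant was asked about and which adversarial target generated the image they chose; this choice-agreement rate and its data schema are illustrative assumptions, not necessarily the statistic reported in the study.

```python
def alignment_rate(trials):
    """Fraction of trials on which the participant chose the image whose
    adversarial target matches the class they were asked about.

    trials: iterable of dicts such as
        {"asked_class": "cat", "chosen_image_target": "cat"}  # hypothetical schema
    """
    trials = list(trials)
    hits = sum(t["chosen_image_target"] == t["asked_class"] for t in trials)
    return hits / len(trials)

# Hypothetical comparison across architectures:
# print(alignment_rate(self_attention_trials), alignment_rate(convolution_trials))
```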

Self-attention vs. Convolution Perturbations

  • Visualizing adversarial perturbations (delta images) obtained from self-attention and convolutional models (a sketch follows below)
  • Perturbations from self-attention models contain relatively more evidence of edges
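A quick sketch of this kind of comparison: compute the delta image for each model’s attack and score its edge content with a Sobel filter. The Sobel-energy measure is an assumed proxy for “evidence for edges”, not the paper’s analysis.

```python
import torch
import torch.nn.functional as F

def delta_image(clean, adv):
    """Perturbation image, rescaled to [0, 1] for display."""
    d = adv - clean
    return (d - d.min()) / (d.max() - d.min() + 1e-8)

def edge_energy(delta):
    """Mean Sobel gradient magnitude: an illustrative proxy for edge content."""
    gray = delta.mean(dim=1, keepdim=True)                   # [N, 1, H, W]
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)                                  # Sobel y from Sobel x
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2).mean().item()

# Hypothetical comparison of the two architectures' perturbations:
# print(edge_energy(delta_image(clean, adv_self_attention)),
#       edge_energy(delta_image(clean, adv_convolution)))
```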