Universal and Physical Attacks

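A universal attack searches for a single perturbation that works across many inputs. A physical attack causes the model to err even after the input passes through real-world transformations such as printing, placement as a sticker or poster, lighting, and camera angle.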

Universal adversarial perturbation

An ordinary adversarial example builds a separate perturbation $\delta(x)$ for each input. A universal perturbation instead aims for one fixed perturbation $v$ that fools the model on many inputs.

$$\|v\|_p \le \xi, \quad \mathbb{P}_{x\sim\mathcal{D}}\left(f(x+v) \ne f(x)\right) \ge 1-\rho$$

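Here $\xi$ is the perturbation budget and $\rho$ is the tolerated failure rate, so the perturbation must change the prediction on at least a $1-\rho$ fraction of inputs drawn from $\mathcal{D}$.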
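As a minimal sketch of how the two conditions can be checked empirically, the snippet below estimates the fooling rate of one shared perturbation on a toy linear classifier. The classifier, the data points, and the function names (`lp_norm`, `fooling_rate`, `is_universal`) are assumptions made for illustration and do not come from the original paper.

```python
import math

def lp_norm(v, p):
    """L_p norm of a vector; p may be math.inf."""
    if p == math.inf:
        return max(abs(c) for c in v)
    return sum(abs(c) ** p for c in v) ** (1.0 / p)

def predict(x, w, b):
    """Toy linear classifier (assumption): class 1 if w.x + b > 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def fooling_rate(xs, v, w, b):
    """Empirical estimate of P(f(x + v) != f(x)) over the sample xs."""
    flipped = sum(
        predict([xi + vi for xi, vi in zip(x, v)], w, b) != predict(x, w, b)
        for x in xs
    )
    return flipped / len(xs)

def is_universal(xs, v, w, b, xi, rho, p=math.inf):
    """Check the norm budget and the fooling-rate condition together."""
    return lp_norm(v, p) <= xi and fooling_rate(xs, v, w, b) >= 1 - rho

# Hypothetical 2-D data and weights, chosen only for illustration.
w, b = [1.0, 1.0], 0.0
xs = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.2], [2.0, 2.0]]
v = [-0.5, -0.5]  # the same perturbation is added to every input

rate = fooling_rate(xs, v, w, b)
print(rate, is_universal(xs, v, w, b, xi=0.5, rho=0.25))
```

In this toy setup the perturbation flips the three inputs near the boundary but not the one far from it, so it satisfies the definition only when $\rho$ is at least 0.25.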

The existence of universal perturbations suggests that decision boundaries lie close to the data manifold across wide regions of it, not just near isolated inputs.

Adversarial patch

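An adversarial patch does not add small noise across the whole image. Instead, it places a localized patch in the scene. The patch may be large and clearly visible, but what matters is that it is easy to deploy in the physical world.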

Given a patch $p$, a mask $m$, and a transformation $T$, the patched image can be written as follows.

$$x' = (1-m)\odot x + m\odot T(p)$$
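The compositing formula can be illustrated with a small sketch on a 3x3 grayscale grid. It assumes that $T(p)$ has already been transformed and laid out on the full image canvas, and that the mask is binary. The helper name `apply_patch` and the pixel values are hypothetical.

```python
def apply_patch(x, m, tp):
    """Compose x' = (1 - m) * x + m * T(p), elementwise on 2-D grids."""
    return [
        [(1 - mij) * xij + mij * tij for xij, mij, tij in zip(xr, mr, tr)]
        for xr, mr, tr in zip(x, m, tp)
    ]

# Uniform gray image (illustrative values).
x = [[0.2, 0.2, 0.2],
     [0.2, 0.2, 0.2],
     [0.2, 0.2, 0.2]]
# Binary mask: 1 where the patch covers the image.
m = [[0, 0, 0],
     [0, 1, 1],
     [0, 1, 1]]
# Transformed patch T(p), already placed on the full canvas.
tp = [[0.0, 0.0, 0.0],
      [0.0, 0.9, 0.8],
      [0.0, 0.7, 0.6]]

patched = apply_patch(x, m, tp)
for row in patched:
    print(row)
```

Pixels outside the mask keep their original value, and pixels inside it are replaced by the transformed patch.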

A targeted patch attack raises the probability of the target class $t$ in expectation over images and transformations.

$$\max_p \; \mathbb{E}_{x\sim\mathcal{D},\; T\sim\mathcal{T}} \left[\log P_\theta(t \mid (1-m)\odot x + m\odot T(p))\right]$$
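The expectation in this objective is usually approximated by averaging over sampled images and transformations. The sketch below only evaluates that average for two candidate patches and does not optimize anything. It assumes a toy linear softmax model on flattened 4-pixel images and a brightness scaling $T(p) = s \cdot p$. All names and numbers are hypothetical.

```python
import math

def log_softmax_prob(logits, t):
    """log P(t) under a softmax over logits, computed stably."""
    mx = max(logits)
    lse = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return logits[t] - lse

def patch_objective(p, images, scales, mask, weights, t):
    """Average of log P(t | (1-m)*x + m*T(p)) over images and scales.

    T is assumed to be a brightness scaling T(p) = s * p, with p already
    laid out on the full image grid.
    """
    total = 0.0
    for x in images:
        for s in scales:
            xp = [(1 - m) * xi + m * s * pi for xi, m, pi in zip(x, mask, p)]
            logits = [sum(wi * v for wi, v in zip(w, xp)) for w in weights]
            total += log_softmax_prob(logits, t)
    return total / (len(images) * len(scales))

# Hypothetical data: two flattened images, a mask over the last two pixels.
images = [[0.5, 0.5, 0.5, 0.5], [0.8, 0.1, 0.3, 0.6]]
mask = [0, 0, 1, 1]
scales = [0.7, 1.0, 1.3]                   # sampled brightness factors
weights = [[1, 1, 1, 1], [-1, -1, 2, 2]]   # one weight row per class
target = 1

obj_blank = patch_objective([0, 0, 0, 0], images, scales, mask, weights, target)
obj_bright = patch_objective([0, 0, 1, 1], images, scales, mask, weights, target)
print(obj_blank, obj_bright)
```

In this toy model the bright patch scores higher than the blank one for every image and scale. A real attack would follow the gradient of this average with respect to $p$.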

Expectation over Transformation

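In the physical world, the input passes through transformations such as camera angle, distance, lighting, print quality, and motion blur. Expectation over Transformation (EOT) builds this distribution of transformations into the attack's optimization objective.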

$$\max_\delta \; \mathbb{E}_{T\sim\mathcal{T}}\left[ \ell(f_\theta(T(x+\delta)),y) \right]$$

This produces a perturbation that survives real-world transformations, rather than one that works only on a specific digital input.
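A minimal sketch of the EOT objective follows, under several assumptions that are not in the text above: a logistic model, a transformation family consisting only of brightness scaling $T(z) = s \cdot z$ with $s$ drawn uniformly from $[0.6, 1.4]$, and an $L_\infty$ budget `eps` enforced by signed gradient steps with clipping. The function names and constants are hypothetical.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def bce_loss(z, y, w, b):
    """Binary cross-entropy of a logistic model on input z with label y."""
    p = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def eot_grad(x, delta, y, w, b, scales):
    """Gradient of the average loss w.r.t. delta over T(z) = s * z."""
    grad = [0.0] * len(x)
    for s in scales:
        z = [s * (xi + di) for xi, di in zip(x, delta)]
        p = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)
        # Chain rule: dl/dz = (p - y) * w, and dz/d(delta) = s.
        for i, wi in enumerate(w):
            grad[i] += (p - y) * wi * s / len(scales)
    return grad

def eot_attack(x, y, w, b, eps=0.3, step=0.05, iters=40, n_samples=16, seed=0):
    """Maximize the expected loss with fresh transformation samples per step."""
    rng = random.Random(seed)
    delta = [0.0] * len(x)
    for _ in range(iters):
        scales = [rng.uniform(0.6, 1.4) for _ in range(n_samples)]
        grad = eot_grad(x, delta, y, w, b, scales)
        # Signed ascent step, then clip back into the L_inf ball.
        delta = [
            max(-eps, min(eps, d + step * ((g > 0) - (g < 0))))
            for d, g in zip(delta, grad)
        ]
    return delta

# Hypothetical model and input, chosen only for illustration.
w, b = [1.0, -2.0, 0.5], 0.1
x, y = [0.4, 0.1, 0.2], 1
delta = eot_attack(x, y, w, b)
x_adv = [xi + di for xi, di in zip(x, delta)]
print(delta)
```

Because the gradient is averaged over sampled transformations at each step, the perturbation is pushed toward directions that raise the loss across the whole brightness range, not at a single setting.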

Physical-world examples

Target	Attack example	Key transformations
Image classifier	printed adversarial image	camera, lighting, angle
Object detector	adversarial patch / sticker	object scale, occlusion, viewpoint
Traffic sign	sticker or poster	distance, motion blur, weather
Face recognition	adversarial eyeglasses	head pose, lighting
Robot / embodied AI	adversarial texture / object	sensor fusion, control loop

Differences from digital attacks

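The perturbation may be visible.
The attack must survive the sensor pipeline, compression, resizing, and color correction.
Attack success is measured as a success rate over the distribution of transformations, not on a single image.
Because physical attacks can have a large safety and security impact, responsible disclosure is important.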
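Measuring success over a distribution of transformations, rather than on one image, can be sketched as a Monte Carlo estimate. The threshold classifier, the brightness-only transformation sampler, and the name `physical_success_rate` are assumptions made for illustration.

```python
import random

def physical_success_rate(predict, x_adv, y_true, sample_transform, n=1000, seed=0):
    """Fraction of sampled transformations under which x_adv is misclassified."""
    rng = random.Random(seed)
    fooled = 0
    for _ in range(n):
        transform = sample_transform(rng)
        if predict(transform(x_adv)) != y_true:
            fooled += 1
    return fooled / n

def predict(z):
    """Toy classifier (assumption): class 1 if total brightness exceeds 1.0."""
    return 1 if sum(z) > 1.0 else 0

def sample_brightness(rng):
    """Draw one brightness scaling; stands in for a real sensor pipeline."""
    s = rng.uniform(0.8, 1.3)
    return lambda z: [s * v for v in z]

x_adv, y_true = [0.4, 0.5], 1
digital_success = predict(x_adv) != y_true   # success on the untransformed input
rate = physical_success_rate(predict, x_adv, y_true, sample_brightness)
print(digital_success, rate)
```

In this toy setup the attack succeeds on the untransformed input but fails under the brighter part of the sampled range, so the digital result overstates the physical success rate.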

Defense considerations

Adversarial training that includes data augmentation and physical transformations
Multi-view consistency checks for object detectors
Sensor fusion to reduce dependence on a single modality
Human-in-the-loop review and anomaly monitoring
Physical security and tamper detection
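One simple way to picture the multi-view consistency check listed above is a majority vote over the labels predicted for the same object from several viewpoints. Low agreement is flagged for review. The function name, the labels, and the 0.8 threshold are hypothetical choices, not a recommended setting.

```python
from collections import Counter

def multi_view_check(view_labels, min_agreement=0.8):
    """Return (majority label, agreement ratio, whether the views are consistent)."""
    counts = Counter(view_labels)
    label, votes = counts.most_common(1)[0]
    agreement = votes / len(view_labels)
    return label, agreement, agreement >= min_agreement

# Predictions for one object from five viewpoints (illustrative labels).
consistent = multi_view_check(["stop", "stop", "stop", "stop", "speed_limit"])
suspicious = multi_view_check(["stop", "speed_limit", "stop", "yield", "stop"])
print(consistent)
print(suspicious)
```

The idea is that a patch tuned for one viewpoint may not transfer to others, so disagreement across views can serve as an anomaly signal. An attack optimized with EOT over viewpoints may still pass such a check.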

Main sources

Universal Adversarial Perturbations: https://arxiv.org/abs/1610.08401
Adversarial Patch: https://arxiv.org/abs/1712.09665
Synthesizing Robust Adversarial Examples: https://arxiv.org/abs/1707.07397
Robust Physical-World Attacks on Deep Learning Visual Classification: https://arxiv.org/abs/1707.08945
ShapeShifter: Robust Physical Adversarial Attack on Faster R-CNN: https://arxiv.org/abs/1804.05810
