Certified Robustness

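Certified robustness is the area concerned with mathematically guaranteeing that a model's prediction does not change anywhere within a specified perturbation set. Empirical robustness is an experimental evaluation ("this attack failed to break the model"), whereas certified robustness proves that no perturbation up to a given radius can break it.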

Figure (author's own conceptual diagram): Randomized smoothing adds Gaussian noise around the input, takes a majority vote of the base classifier, and computes an $L_2$ certified radius from the gap between the class probabilities.

Definition of a certificate

Suppose that, for an input $x$, we can guarantee that the prediction does not change within radius $R$:

\forall \delta \; \text{s.t.}\; \|\delta\|_p < R, \quad g(x+\delta)=g(x)

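This $R$ is called the certified radius. On a dataset of $n$ samples, certified accuracy at radius $\epsilon$ is reported as the fraction of samples that are classified correctly and whose certified radius $R_i$ is at least $\epsilon$: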

\mathrm{CertifiedAcc}(\epsilon)=\frac{1}{n}\sum_i \mathbf{1}[g(x_i)=y_i \;\land\; R_i \ge \epsilon]
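The definition above can be sketched in a few lines of Python. The function name `certified_accuracy` and the toy predictions, labels, and radii are hypothetical values chosen for illustration only; they are not results from any model.

```python
def certified_accuracy(preds, labels, radii, eps):
    """Fraction of samples that are correct AND certified at radius >= eps."""
    n = len(labels)
    hits = sum(
        1
        for pred, label, radius in zip(preds, labels, radii)
        if pred == label and radius >= eps
    )
    return hits / n


# Hypothetical toy data: g(x_i), y_i, and the certified radius R_i per sample.
preds = [0, 1, 2, 1]
labels = [0, 1, 1, 1]
radii = [0.50, 0.20, 0.80, 0.00]

acc_at_025 = certified_accuracy(preds, labels, radii, eps=0.25)
acc_at_000 = certified_accuracy(preds, labels, radii, eps=0.0)
print(acc_at_025, acc_at_000)
```

Note that a misclassified sample never counts, however large its radius, so certified accuracy at $\epsilon = 0$ is at most the ordinary accuracy of $g$.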

Randomized smoothing

Randomized smoothing builds a smoothed classifier $g$ from an arbitrary base classifier $f$:

g(x)=\arg\max_c \; \mathbb{P}_{\eta\sim\mathcal{N}(0,\sigma^2 I)}(f(x+\eta)=c)

Let $A$ be the most probable class and $B$ the second most probable, with probabilities $p_A$ and $p_B$. The $L_2$ certified radius is then given by

R = \frac{\sigma}{2}\left(\Phi^{-1}(p_A)-\Phi^{-1}(p_B)\right)

Here $\Phi^{-1}$ is the inverse of the standard Gaussian CDF. The larger the gap between $p_A$ and $p_B$, the larger the radius that can be certified.
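As a minimal sketch, the radius formula can be evaluated with the standard library's `statistics.NormalDist`, whose `inv_cdf` method plays the role of $\Phi^{-1}$. The function name `certified_radius` and the example probabilities are assumptions for illustration, and the probabilities are treated here as if they were known exactly.

```python
from statistics import NormalDist


def certified_radius(p_a, p_b, sigma):
    """L2 radius R = sigma/2 * (Phi^{-1}(p_a) - Phi^{-1}(p_b)).

    Returns 0.0 when the top class does not strictly lead, since no
    positive radius can be certified in that case.
    """
    if not (0.0 < p_b and p_a < 1.0):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    if p_a <= p_b:
        return 0.0
    phi_inv = NormalDist().inv_cdf  # inverse CDF of the standard Gaussian
    return 0.5 * sigma * (phi_inv(p_a) - phi_inv(p_b))


# Hypothetical values: p_A = 0.9, p_B = 0.1, noise level sigma = 0.5.
radius = certified_radius(0.9, 0.1, sigma=0.5)
print(round(radius, 4))
```

Because the radius scales linearly with $\sigma$, a larger noise level can certify a larger radius for the same probabilities, although in practice more noise usually lowers $p_A$.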

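Monte Carlo estimation

In practice $p_A$ and $p_B$ are not available in closed form, so they are estimated from many noise samples. To avoid overestimating the radius, certification uses statistical confidence bounds: a lower bound on $p_A$ and an upper bound on $p_B$.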
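The following is a rough sketch of such a procedure, under several assumptions that are not in the text above. It uses a one-sided Hoeffding bound for the lower bound on $p_A$ (a simple, conservative choice; exact binomial intervals such as Clopper-Pearson are tighter), it bounds $p_B$ from above by $1 - \underline{p_A}$ so that the radius reduces to $\sigma\,\Phi^{-1}(\underline{p_A})$, and it picks the candidate class on a separate small batch so that the confidence bound stays valid. The names `certify`, `sample_counts`, and `toy_classifier` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def sample_counts(base_classifier, x, sigma, n, num_classes, rng):
    """Count base-classifier votes over n Gaussian-perturbed copies of x."""
    counts = [0] * num_classes
    for _ in range(n):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        counts[base_classifier(noisy)] += 1
    return counts


def certify(base_classifier, x, sigma, num_classes,
            n0=100, n=1000, alpha=0.001, seed=0):
    """Return (predicted class, certified L2 radius), or (None, 0.0) to abstain."""
    rng = random.Random(seed)
    # Stage 1: guess the top class from a small, independent batch.
    guess = sample_counts(base_classifier, x, sigma, n0, num_classes, rng)
    c_a = max(range(num_classes), key=lambda c: guess[c])
    # Stage 2: estimate p_A for that class from fresh samples.
    counts = sample_counts(base_classifier, x, sigma, n, num_classes, rng)
    p_hat = counts[c_a] / n
    # One-sided Hoeffding bound: p_A >= p_a_lower with probability >= 1 - alpha.
    p_a_lower = p_hat - math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    if p_a_lower <= 0.5:
        return None, 0.0  # cannot certify; abstain
    # With p_B <= 1 - p_a_lower, the radius simplifies to sigma * Phi^{-1}(p_a_lower).
    return c_a, sigma * NormalDist().inv_cdf(p_a_lower)


def toy_classifier(x):
    """Hypothetical base classifier: class 1 if the first coordinate is positive."""
    return 1 if x[0] > 0.0 else 0


pred, radius = certify(toy_classifier, [1.0, 0.0], sigma=0.5, num_classes=2)
print(pred, radius)
```

In this toy setup the true distance from the input to the decision boundary is 1.0, and the certified radius stays below it: the confidence bound trades some radius for a guarantee that holds with probability at least $1-\alpha$.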

Bound propagation

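Interval Bound Propagation (IBP) propagates the input's perturbation interval layer by layer and computes upper and lower bounds on the output logits.

When the input lies in $x \in [\underline{x},\overline{x}]$, the bounds for a linear layer $z=Wx+b$ can be computed by splitting $W$ into its positive and negative parts: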

\underline{z}=W^+\underline{x}+W^-\overline{x}+b

\overline{z}=W^+\overline{x}+W^-\underline{x}+b

Here $W^+=\max(W,0)$ and $W^-=\min(W,0)$, taken elementwise. Finally, the input can be certified if the lower bound on the correct class's logit exceeds the upper bound of every other class:

\underline{z}_y > \max_{j\ne y}\overline{z}_j
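A minimal pure-Python sketch of these steps follows. The two-layer network, its weights, and the helper names (`linear_bounds`, `relu_bounds`, `is_certified`, `ibp_certify`) are hypothetical, and the input interval is assumed to be an $L_\infty$ ball of radius `eps` around `x`. ReLU is monotone, so its bounds are obtained by applying it to each endpoint.

```python
def linear_bounds(W, b, lo, hi):
    """Bounds of z = W x + b for x in [lo, hi], splitting W by sign."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        z_lo = z_hi = bias
        for w, x_lo, x_hi in zip(row, lo, hi):
            if w >= 0:  # W^+ pairs lower with lower, upper with upper
                z_lo += w * x_lo
                z_hi += w * x_hi
            else:       # W^- swaps the endpoints
                z_lo += w * x_hi
                z_hi += w * x_lo
        out_lo.append(z_lo)
        out_hi.append(z_hi)
    return out_lo, out_hi


def relu_bounds(lo, hi):
    """ReLU is monotone, so it maps interval endpoints to endpoints."""
    return [max(v, 0.0) for v in lo], [max(v, 0.0) for v in hi]


def is_certified(lo, hi, y):
    """True if the lower bound of logit y beats every other upper bound."""
    return lo[y] > max(h for j, h in enumerate(hi) if j != y)


def ibp_certify(x, eps, y, W1, b1, W2, b2):
    lo = [v - eps for v in x]
    hi = [v + eps for v in x]
    lo, hi = relu_bounds(*linear_bounds(W1, b1, lo, hi))
    lo, hi = linear_bounds(W2, b2, lo, hi)
    return is_certified(lo, hi, y)


# Hypothetical two-layer network and input.
W1, b1 = [[1.0, -1.0], [1.0, 1.0]], [0.0, 0.0]
W2, b2 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
x, y = [1.0, -1.0], 0

small = ibp_certify(x, 0.1, y, W1, b1, W2, b2)
large = ibp_certify(x, 1.0, y, W1, b1, W2, b2)
print(small, large)
```

A failed check does not mean an adversarial example exists: interval bounds ignore correlations between neurons, so they can be loose, and the test is sufficient but not necessary.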

Convex relaxation

Exact verification of a ReLU network is hard in general, so one approach is to replace each ReLU with a convex relaxation and compute bounds on the relaxed problem. CROWN, Fast-Lin, and DeepPoly belong to this family.
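To make the idea concrete, here is a small sketch of a linear relaxation of a single ReLU whose pre-activation is known to lie in $[l, u]$. It illustrates the general idea only and is not the exact procedure of any of the methods named above. When $l < 0 < u$, the chord through $(l, 0)$ and $(u, u)$ is a valid upper bound, and any line $\lambda z$ with $\lambda \in [0, 1]$ is a valid lower bound; picking $\lambda = 1$ when $u \ge -l$ and $\lambda = 0$ otherwise is one common heuristic. The function name `relu_relaxation` is hypothetical.

```python
def relu_relaxation(l, u):
    """Linear bounds on relu(z) for z in [l, u].

    Returns (lower_slope, upper_slope, upper_intercept) such that
    lower_slope * z <= relu(z) <= upper_slope * z + upper_intercept.
    """
    if l >= 0.0:   # always active: relu(z) = z
        return 1.0, 1.0, 0.0
    if u <= 0.0:   # always inactive: relu(z) = 0
        return 0.0, 0.0, 0.0
    # Unstable neuron: upper bound is the chord from (l, 0) to (u, u).
    upper_slope = u / (u - l)
    upper_intercept = -upper_slope * l
    # Any slope in [0, 1] is sound; this choice is a heuristic.
    lower_slope = 1.0 if u >= -l else 0.0
    return lower_slope, upper_slope, upper_intercept


lower_slope, upper_slope, upper_intercept = relu_relaxation(-1.0, 3.0)
print(lower_slope, upper_slope, upper_intercept)
```

Because both bounds are linear in $z$, they can be composed through the linear layers of the network, which is what lets this family keep more information than plain interval bounds.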

Lipschitz bound

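If a Lipschitz constant $K$ of the model is known, a certificate can be obtained from the logit margin $m(x)$, the gap between the correct class's logit and the largest other logit.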

\|f(x)-f(x+\delta)\| \le K\|\delta\|

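If the margin between the correct class and the other classes is large enough, this yields a lower bound on the radius within which the prediction cannot change. Obtaining a tight Lipschitz bound, however, is difficult.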
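One way to make this bound explicit, assuming the Lipschitz inequality above holds with the $L_2$ norm on both sides: the difference of two logits $f_y - f_j$ can change by at most $\sqrt{2}\,K\|\delta\|_2$, so the prediction cannot change while $\|\delta\|_2 < m(x)/(\sqrt{2}K)$. The sketch below computes that radius; the function name `lipschitz_radius` and the example logits and constant are hypothetical.

```python
import math


def lipschitz_radius(logits, y, K):
    """Certified L2 radius m(x) / (sqrt(2) * K) from the logit margin.

    Assumes K is an L2 Lipschitz constant of the full logit vector.
    Returns 0.0 if the sample is already misclassified or tied.
    """
    margin = logits[y] - max(v for j, v in enumerate(logits) if j != y)
    if margin <= 0.0:
        return 0.0
    return margin / (math.sqrt(2.0) * K)


# Hypothetical logits for a 3-class model with Lipschitz constant K = 2.
radius = lipschitz_radius([3.0, 1.0, 0.5], y=0, K=2.0)
print(round(radius, 4))
```

The radius shrinks in proportion to $K$, which is why a loose Lipschitz bound translates directly into a weak certificate.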