Text-to-3D Score Distillation Variants

Text-to-3D optimization uses the prior of a text-to-image diffusion model to optimize a 3D representation. DreamFusion's Score Distillation Sampling (SDS) is the standard starting point. Many improvements have been proposed since, including VSD, CSD, and multi-view SDS.

Basic form of SDS

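Let $\theta$ be the parameters of the 3D representation, and let $x=g_\theta(v)$ be the image rendered from camera view $v$. At diffusion timestep $t$ the rendering is noised as $x_t=\alpha_t x+\sigma_t\epsilon$ with $\epsilon\sim\mathcal{N}(0,I)$. Write the diffusion model's noise prediction for text prompt $y$ as $\epsilon_\phi(x_t,t,y)$ and a timestep weighting as $w(t)$. The SDS gradient is then: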

\nabla_\theta\mathcal{L}_{SDS} =\mathbb{E}_{t,\epsilon,v}\left[ w(t)(\epsilon_\phi(x_t,t,y)-\epsilon)\frac{\partial x}{\partial\theta} \right]

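Intuitively, the formula finds the direction in which the diffusion model would denoise the rendering toward an image that matches the prompt. It then backpropagates that direction through the rendered image into the 3D parameters. The Jacobian of the diffusion network itself, $\partial\epsilon_\phi/\partial x_t$, is omitted, so only the renderer is differentiated.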
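The update can be made concrete with a minimal Python sketch of SDS on a toy problem. This is not DreamFusion's implementation, and everything in it is an illustrative assumption:

- The "3D representation" is a single scalar $\theta$ rendered by the identity map, so $\partial x/\partial\theta=1$.
- The diffusion model is replaced by the exact noise prediction for a hypothetical 1-D target $p(x\mid y)=\mathcal{N}(\mu,\tau^2)$.
- The cosine schedule, $w(t)=1$, and the timestep range are assumed.
- The function names `alpha_sigma`, `eps_pred`, and `sds_optimize` are hypothetical.

```python
import math
import random


def alpha_sigma(t):
    """Hypothetical cosine noise schedule: x_t = alpha_t * x + sigma_t * eps."""
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)


def eps_pred(x_t, t, mu, tau):
    """Exact noise prediction for a toy target p(x|y) = N(mu, tau^2).

    Stand-in for eps_phi(x_t, t, y): the noised marginal is
    N(alpha_t * mu, alpha_t^2 * tau^2 + sigma_t^2) and eps_hat = -sigma_t * score.
    """
    a, s = alpha_sigma(t)
    return s * (x_t - a * mu) / (a * a * tau * tau + s * s)


def sds_optimize(theta=0.0, mu=2.0, tau=0.5, lr=0.01, steps=3000, seed=0):
    """Toy SDS loop on a single scalar parameter; returns the final theta."""
    rng = random.Random(seed)
    for _ in range(steps):
        x = theta                       # toy renderer g_theta(v) = theta, so dx/dtheta = 1
        t = rng.uniform(0.02, 0.98)     # random diffusion timestep
        eps = rng.gauss(0.0, 1.0)       # injected noise
        a, s = alpha_sigma(t)
        x_t = a * x + s * eps
        w = 1.0                         # assumed weighting w(t)
        # SDS: w(t) * (eps_hat - eps) * dx/dtheta; eps_pred itself is not differentiated
        grad = w * (eps_pred(x_t, t, mu, tau) - eps) * 1.0
        theta -= lr * grad
    return theta
```

The noise-prediction residual is applied to $\theta$ directly rather than differentiated through `eps_pred`. That mirrors the Jacobian-free form of the SDS gradient. In this toy, `sds_optimize()` ends near the target's mode $\mu=2$.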

Problems with SDS

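SDS is powerful, but it is prone to the following problems:

Janus problem: the front of the object (for example, a face) appears from several viewing directions.
Over-saturation: colors and textures become excessively intense.
Over-smoothing: geometry tends to come out rounded and lacking fine detail.
View inconsistency: each view looks plausible for the prompt, but the views contradict one another as a single 3D object.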

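These problems stem largely from how the prior is used. A text-to-image diffusion prior scores each rendered view independently and has no direct notion of multi-view consistency.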
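The toy below caricatures how a view-independent prior can produce a Janus-like result. It is an illustration only, built on these assumptions:

- The cosine schedule, the analytic Gaussian noise predictor, and $w(t)=1$ are assumed.
- A hypothetical object carries a scalar "face strength" on its front and on its back, and each camera sees only one side.
- The 2D prior has only learned that images of the prompt usually show a face, modeled as $\mathcal{N}(1,\tau^2)$, and it scores each rendering on its own.
- All names are hypothetical.

```python
import math
import random


def alpha_sigma(t):
    """Hypothetical cosine noise schedule: x_t = alpha_t * x + sigma_t * eps."""
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)


def eps_pred(x_t, t, mu=1.0, tau=0.3):
    """Exact eps prediction for a toy per-view prior N(mu, tau^2) ("a face is visible")."""
    a, s = alpha_sigma(t)
    return s * (x_t - a * mu) / (a * a * tau * tau + s * s)


def janus_toy(steps=4000, lr=0.01, seed=0):
    """Per-view SDS on a two-sided toy object; returns [front, back] face strengths."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]              # hypothetical face strength on the front (0) and back (1)
    for _ in range(steps):
        v = rng.choice([0, 1])      # camera looks at the front or the back
        x = theta[v]                # the rendering only shows the side facing the camera
        t = rng.uniform(0.02, 0.98)
        eps = rng.gauss(0.0, 1.0)
        a, s = alpha_sigma(t)
        x_t = a * x + s * eps
        # each view is scored on its own: nothing links the front to the back
        theta[v] -= lr * (eps_pred(x_t, t) - eps)   # w(t) = 1 assumed
    return theta
```

In this toy, both sides converge to a face: `janus_toy()` returns values near 1 for the front and the back. No term in the per-view loss relates the front rendering to the back rendering.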

The idea behind VSD

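Variational Score Distillation (VSD), introduced in ProlificDreamer, frames the problem as minimizing a KL divergence. The two distributions are $q_\theta(x)$, the distribution of renderings of the 3D representation, and $p_\phi(x\mid y)$, the text-conditioned image distribution. In practice the divergence is taken at every noise level $t$, between the correspondingly noised distributions: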

\min_\theta D_{KL}(q_\theta(x)\|p_\phi(x\mid y))

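VSD works with the difference between two scores. The first is the score of the pretrained diffusion model, $s_\phi(x_t,t,y)=\nabla_{x_t}\log p_t(x_t\mid y)$. The second is the score $s_\psi$ of a variational distribution fitted to the current rendered images. In ProlificDreamer, $s_\psi$ is estimated by a LoRA fine-tuned copy of the pretrained model, trained on the renderings and additionally conditioned on the camera. A noise predictor relates to the score by $s=-\epsilon/\sigma_t$, so the same update can also be written with noise predictions $\epsilon_\phi$ and $\epsilon_\psi$.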

\nabla_\theta\mathcal{L}_{VSD} \propto \mathbb{E}_{t,\epsilon,v}\left[w(t)\left(s_\psi(x_t,t)-s_\phi(x_t,t,y)\right)\frac{\partial x}{\partial\theta}\right]

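Intuitively, VSD does not simply pull every rendering toward the pretrained prior. Instead, it updates with the difference from a score that tracks the current distribution of 3D renderings, which mitigates the excessive mode seeking of SDS. SDS is the special case in which the variational distribution is a single point (a Dirac delta) at the current rendering. The noise prediction of that point distribution is exactly the injected noise $\epsilon$.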
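A minimal sketch contrasting SDS and VSD on a 1-D toy follows. Several of its assumptions differ from ProlificDreamer:

- The target is a hypothetical Gaussian $\mathcal{N}(\mu,\tau^2)$ with an exact noise predictor.
- The "3D" parameters are $n$ scalar particles rendered by the identity map.
- Instead of a LoRA-trained $\epsilon_\psi$, the variational noise predictor is the exact one for a Gaussian fitted to the current particles.

Each particle takes a gradient step along $w(t)(\epsilon_\phi-\epsilon)$ for SDS or along $w(t)(\epsilon_\phi-\epsilon_\psi)$ for VSD, with $w(t)=1$. The schedule and all names are hypothetical.

```python
import math
import random
from statistics import fmean, pvariance


def alpha_sigma(t):
    """Hypothetical cosine noise schedule: x_t = alpha_t * x + sigma_t * eps."""
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)


def gaussian_eps(x_t, t, mean, var):
    """Exact noise prediction for data ~ N(mean, var), i.e. -sigma_t * score."""
    a, s = alpha_sigma(t)
    return s * (x_t - a * mean) / (a * a * var + s * s)


def distill(mode, mu=2.0, tau=0.5, n=64, lr=0.02, steps=2000, seed=0):
    """Run toy SDS (mode="sds") or VSD (mode="vsd"); return particle mean and variance."""
    if mode not in ("sds", "vsd"):
        raise ValueError("mode must be 'sds' or 'vsd'")
    rng = random.Random(seed)
    # n hypothetical scalar "3D" particles, rendered by the identity map (dx/dtheta = 1)
    particles = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(steps):
        t = rng.uniform(0.02, 0.98)
        a, s = alpha_sigma(t)
        # stand-in for eps_psi: a Gaussian fitted to the current renderings
        m, v = fmean(particles), pvariance(particles)
        updated = []
        for x in particles:
            eps = rng.gauss(0.0, 1.0)
            x_t = a * x + s * eps
            eps_prior = gaussian_eps(x_t, t, mu, tau * tau)   # pretrained eps_phi stand-in
            if mode == "sds":
                grad = eps_prior - eps                          # w(t) = 1 assumed
            else:
                grad = eps_prior - gaussian_eps(x_t, t, m, v)   # w(t) = 1 assumed
            updated.append(x - lr * grad)
        particles = updated
    return fmean(particles), pvariance(particles)
```

With these settings, SDS collapses all particles toward the mode $\mu$, leaving a variance near 0. VSD keeps their variance near $\tau^2=0.25$. This is one way to see the mode-seeking behavior that VSD is meant to relieve. Passing the particle itself as `mean` and `0` as `var` to `gaussian_eps` returns exactly the injected noise, which recovers the SDS update as the Dirac-delta special case.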

Multi-view SDS

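With a multi-view diffusion prior, renderings $x_{1:V}$ from several views are passed to the diffusion model together, typically with their camera parameters. Each view is noised with its own $\epsilon_v$: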

\nabla_\theta\mathcal{L}_{MV} =\mathbb{E}\left[ \sum_{v=1}^{V}w(t)(\epsilon_{\phi,v}(x_{1:V,t},t,y)-\epsilon_v)\frac{\partial x_v}{\partial\theta} \right]

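Intuitively, the prior no longer fits each view to the prompt independently. Instead, it supplies a score under which the views are consistent with one another as a single object. A multi-view prior reduces the Janus problem. However, the result tends to depend on the camera distribution and the number of views in the prior's training data.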
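The following is a toy sketch of the multi-view SDS gradient, under these assumptions:

- A hypothetical object has a scalar "face strength" on its front and its back, and the two views are rendered and noised together.
- The multi-view prior is a hypothetical 2-D Gaussian over the (front, back) pair. Its mean $(1,0)$ encodes a face in front and none behind, and the two views are negatively correlated.
- The prior's exact joint noise prediction stands in for $\epsilon_{\phi,v}(x_{1:V,t},t,y)$, so each view's output depends on both noised views.
- The cosine schedule and $w(t)=1$ are assumed, and all names are hypothetical.

```python
import math
import random


def alpha_sigma(t):
    """Hypothetical cosine noise schedule: x_t = alpha_t * x + sigma_t * eps."""
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)


def joint_eps_pred(xt, t, mu=(1.0, 0.0), tau=0.3, rho=-0.8):
    """Exact joint eps prediction for a hypothetical 2-view prior N(mu, Sigma).

    Sigma = tau^2 [[1, rho], [rho, 1]]; eps_hat = sigma_t * M^{-1} (x_t - alpha_t * mu)
    with M = alpha_t^2 * Sigma + sigma_t^2 * I, so each view's output uses both views.
    """
    a, s = alpha_sigma(t)
    m11 = a * a * tau * tau + s * s          # diagonal entry of M
    m12 = a * a * tau * tau * rho            # off-diagonal entry of M
    det = m11 * m11 - m12 * m12
    r0 = xt[0] - a * mu[0]
    r1 = xt[1] - a * mu[1]
    # inverse of the symmetric 2x2 matrix [[m11, m12], [m12, m11]] applied to (r0, r1)
    return (s * (m11 * r0 - m12 * r1) / det,
            s * (-m12 * r0 + m11 * r1) / det)


def multiview_sds(steps=4000, lr=0.01, seed=0):
    """Multi-view SDS on the two-sided toy object; returns [front, back]."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]   # hypothetical face strength on the front and the back
    for _ in range(steps):
        x = theta[:]     # render both views jointly; dx_v/dtheta_v = 1
        t = rng.uniform(0.02, 0.98)
        a, s = alpha_sigma(t)
        eps = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]   # one noise sample per view
        xt = [a * x[v] + s * eps[v] for v in range(2)]
        eps_hat = joint_eps_pred(xt, t)
        for v in range(2):
            theta[v] -= lr * (eps_hat[v] - eps[v])          # w(t) = 1 assumed
    return theta
```

In this toy, the object converges to a face on the front only: `multiview_sds()` returns values near `[1, 0]`. The joint prior scores the pair of views together, so it can express that the back should differ from the front.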

Main sources

DreamFusion: https://arxiv.org/abs/2209.14988
ProlificDreamer / VSD: https://arxiv.org/abs/2305.16213
MVDream: https://arxiv.org/abs/2308.16512
