MoGe

MoGe (Monocular Geometry) is a monocular geometry foundation model that directly predicts an affine-invariant 3D point map from a single image. Where Depth Anything is a "monocular depth foundation" model, MoGe aims to be a "monocular geometry foundation" model.

What is new

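A typical monocular depth model outputs a depth map. MoGe handles all of the following in a single model:

A 3D coordinate for every pixel (pointmap)
Absorbing the affine ambiguity that comes from unknown camera intrinsics
Recovering the focal length from that pointmap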
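To make the focal length recovery concrete, the following is a minimal sketch of how a focal length could be fitted to a pointmap by least squares. It is not MoGe's actual solver. It assumes a pinhole camera with square pixels and the principal point at the image center, and it ignores any depth offset, which a full solver for an affine-invariant pointmap would also need to estimate. The names `estimate_focal` and `make_pointmap` are hypothetical helpers written for this illustration.

```python
def estimate_focal(points, width, height):
    """Fit a focal length (in pixels) to a pointmap by least squares.

    points[v][u] is the (x, y, z) camera-space point predicted for pixel (u, v).
    Assumptions (not taken from MoGe): principal point at the image center,
    square pixels, and no residual depth offset.
    """
    cx = (width - 1) / 2.0
    cy = (height - 1) / 2.0
    num = 0.0
    den = 0.0
    for v in range(height):
        for u in range(width):
            x, y, z = points[v][u]
            if z <= 0.0:
                continue  # skip points behind the camera
            a, b = x / z, y / z
            # Minimize (f*a - (u - cx))^2 + (f*b - (v - cy))^2 over f.
            num += a * (u - cx) + b * (v - cy)
            den += a * a + b * b
    if den == 0.0:
        raise ValueError("pointmap has no usable points")
    return num / den


def make_pointmap(width, height, focal, scale=1.0):
    """Build a synthetic pointmap of a slanted surface for testing."""
    cx = (width - 1) / 2.0
    cy = (height - 1) / 2.0
    rows = []
    for v in range(height):
        row = []
        for u in range(width):
            z = 2.0 + 0.1 * u + 0.05 * v
            x = (u - cx) * z / focal
            y = (v - cy) * z / focal
            row.append((scale * x, scale * y, scale * z))
        rows.append(row)
    return rows


pointmap = make_pointmap(8, 6, focal=500.0)
rescaled = make_pointmap(8, 6, focal=500.0, scale=3.0)
focal_a = estimate_focal(pointmap, 8, 6)
focal_b = estimate_focal(rescaled, 8, 6)
print(focal_a, focal_b)  # both are approximately 500
```

The rescaled pointmap yields the same focal length because only the ratios x/z and y/z enter the fit, which is why an unknown global scale does not prevent focal length recovery.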

Differences from Depth Anything

Aspect	Depth Anything	MoGe
Primary output	Depth (relative / metric)	3D pointmap
Intrinsics	Required separately	Recoverable from the pointmap
Connection to 3D reconstruction	Used as a prior	Provides geometry directly
Lineage	DPT / monocular depth	DUSt3R / pointmap

MoGe sits between Depth Anything and DUSt3R and can be summarized as **"a foundation model that outputs a pointmap directly, even from a single image."**

Why output a pointmap

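Predicting only a per-pixel depth means a focal length has to be assumed before the depth can be lifted to 3D. Predicting a 3D pointmap instead has these advantages:

Dependence on camera intrinsics is relaxed
Outside reflective and transparent regions, it is easier to obtain geometry close to metric scale
The connection to downstream NeRF / 3DGS / SfM pipelines is natural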
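The focal length dependence of depth-only output can be illustrated with a small sketch. It unprojects the same depth map under two different assumed focal lengths and compares the resulting 3D extents. It assumes a pinhole camera with the principal point at the image center. `unproject_depth` and `x_extent` are hypothetical helpers, not part of any MoGe or Depth Anything API.

```python
def unproject_depth(depth, focal):
    """Lift a depth map to camera-space points under an assumed focal length.

    depth[v][u] is the depth at pixel (u, v); focal is in pixels.
    Assumes the principal point is at the image center.
    """
    height, width = len(depth), len(depth[0])
    cx = (width - 1) / 2.0
    cy = (height - 1) / 2.0
    return [
        [
            ((u - cx) * depth[v][u] / focal,
             (v - cy) * depth[v][u] / focal,
             depth[v][u])
            for u in range(width)
        ]
        for v in range(height)
    ]


def x_extent(points):
    """Horizontal size of the unprojected surface."""
    xs = [p[0] for row in points for p in row]
    return max(xs) - min(xs)


# A flat wall at depth 2, seen in a 5x3 image.
depth = [[2.0] * 5 for _ in range(3)]
narrow = unproject_depth(depth, focal=600.0)
wide = unproject_depth(depth, focal=300.0)
ratio = x_extent(wide) / x_extent(narrow)
print(ratio)  # halving the assumed focal length doubles the wall's width
```

The depth values are identical in both cases, yet the reconstructed wall differs in width by a factor of two, so the 3D shape is undetermined until a focal length is supplied. A pointmap carries the x and y coordinates itself and avoids this extra assumption.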

Main sources

MoGe paper: https://arxiv.org/abs/2410.19115
MoGe project page: https://wangrc.site/MoGePage/
