Camera Pose Estimation and Relocalization

Camera pose estimation is the task of estimating the position and orientation of the camera that captured an image. It plays a central role in 3D reconstruction, SLAM, AR, robotics, and visual localization.

Absolute pose and relative pose

Type	Input	Output
Relative pose	Two images	Rotation and translation direction between the two views
Absolute pose	2D-3D correspondences	Camera pose in the world coordinate frame
Relocalization	Query image + map / database	Camera pose within an existing map

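Relative pose is usually based on epipolar geometry, and absolute pose on PnP (Perspective-n-Point). Two views alone determine the translation only up to scale, which is why the table lists a translation direction for relative pose.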
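To make the epipolar relationship concrete, the following is a minimal NumPy sketch on synthetic data. It assumes the convention X2 = R X1 + t and normalized (intrinsics-free) image coordinates; the helper names `skew`, `essential_from_pose`, and `epipolar_residuals` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def skew(t):
    """Return the 3x3 cross-product matrix [t]_x."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def essential_from_pose(R, t):
    """Essential matrix E = [t]_x R for the convention X2 = R @ X1 + t."""
    return skew(t) @ R

def epipolar_residuals(E, x1, x2):
    """Residuals x2^T E x1 for normalized homogeneous points of shape (N, 3)."""
    return np.einsum("ni,ij,nj->n", x2, E, x1)

# Synthetic two-view setup (all values are made up for the example).
rng = np.random.default_rng(0)
angle = 0.1
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([1.0, 0.0, 0.2])

X1 = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(20, 3))  # points in camera 1
X2 = X1 @ R.T + t                                                    # same points in camera 2
x1 = X1 / X1[:, 2:3]  # normalized image coordinates, view 1
x2 = X2 / X2[:, 2:3]  # normalized image coordinates, view 2

residuals = epipolar_residuals(essential_from_pose(R, t), x1, x2)
# Scaling t leaves the constraint satisfied: only the direction is observable.
residuals_scaled = epipolar_residuals(essential_from_pose(R, 3.0 * t), x1, x2)
```

Both residual vectors vanish up to floating-point error, which illustrates why a two-view estimate cannot recover the length of the baseline.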

PnP-based localization

Given an existing 3D map and a query image, the typical pipeline matches 2D features in the query to 3D points in the map, solves PnP from these 2D-3D correspondences, and uses RANSAC to reject outliers.
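The PnP + RANSAC step can be sketched in plain NumPy as below. This is an illustration under stated assumptions, not the pipeline of any particular library: it swaps in a linear DLT solver (at least 6 non-coplanar points, no nonlinear refinement) where practical systems usually rely on minimal solvers such as P3P or on a library routine such as OpenCV's `solvePnPRansac`. The function names, the 2-pixel threshold, and the synthetic scene are all hypothetical.

```python
import numpy as np

def pnp_dlt(X, u, K):
    """Linear (DLT) estimate of R, t from >= 6 non-coplanar 2D-3D matches."""
    ones = np.ones((len(X), 1))
    # Normalized image coordinates x = K^-1 [u, v, 1]^T remove the intrinsics.
    x = np.linalg.solve(K, np.hstack([u, ones]).T).T
    Xh = np.hstack([X, ones])
    rows = []
    for (xn, yn, _), P in zip(x, Xh):
        rows.append(np.concatenate([P, np.zeros(4), -xn * P]))
        rows.append(np.concatenate([np.zeros(4), P, -yn * P]))
    # The flattened 3x4 matrix s[R|t] is the null vector of the stacked system.
    M = np.linalg.svd(np.asarray(rows))[2][-1].reshape(3, 4)
    U, S, Vt = np.linalg.svd(M[:, :3])
    R, scale = U @ Vt, S.mean()
    if np.linalg.det(R) < 0:  # the sign of the null vector is arbitrary
        R, scale = -R, -scale
    return R, M[:, 3] / scale

def camera_points(X, R, t):
    """Transform world points into the camera frame."""
    return X @ R.T + t

def reproject(X, R, t, K):
    """Project world points to pixel coordinates."""
    uvw = camera_points(X, R, t) @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def pnp_ransac(X, u, K, iters=200, thresh_px=2.0, seed=0):
    """Keep the 6-point hypothesis with the most inliers, then refit on them."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(X), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(X), size=6, replace=False)
        R, t = pnp_dlt(X[idx], u[idx], K)
        depth = camera_points(X, R, t)[:, 2]
        with np.errstate(divide="ignore", invalid="ignore"):
            err = np.linalg.norm(reproject(X, R, t, K) - u, axis=1)
        inliers = (err < thresh_px) & (depth > 0)  # in front of the camera
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 6:
        raise ValueError("not enough inliers for a DLT refit")
    R, t = pnp_dlt(X[best], u[best], K)
    return R, t, best

# Synthetic scene: 60 map points, of which the first 12 get wrong 2D matches.
rng = np.random.default_rng(1)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
a = 0.2
R_true = np.array([[np.cos(a), 0.0, np.sin(a)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(a), 0.0, np.cos(a)]])
t_true = np.array([0.1, -0.2, 0.5])
X = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(60, 3))
u = reproject(X, R_true, t_true, K)
u[:12] = rng.uniform([0.0, 0.0], [640.0, 480.0], size=(12, 2))  # gross outliers

R_est, t_est, inlier_mask = pnp_ransac(X, u, K)
```

On this noise-free example the refit on the consensus set recovers the simulated pose; with real, noisy matches a nonlinear refinement of the reprojection error would normally follow.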

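Image retrieval + pose refinement

In large-scale localization, image retrieval first finds database images that are close to the query, and local feature matching and PnP are then run against those candidates only.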

Global descriptors such as NetVLAD, DELG, and DINO-family features are commonly used for the retrieval step.
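The retrieval step reduces to a nearest-neighbor search over global descriptors. The sketch below ranks database images by cosine similarity; the descriptors are random stand-ins (not the output of NetVLAD, DELG, or any real model), and `retrieve_top_k` is a hypothetical helper name.

```python
import numpy as np

def retrieve_top_k(query_desc, db_descs, k=5):
    """Return indices and cosine similarities of the k most similar database images."""
    q = query_desc / np.linalg.norm(query_desc)
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    sims = db @ q                    # cosine similarity to every database image
    order = np.argsort(-sims)[:k]    # indices sorted by descending similarity
    return order, sims[order]

# Random stand-in descriptors: 100 database images, 256 dimensions each.
rng = np.random.default_rng(0)
db_descs = rng.normal(size=(100, 256))
# The query is a slightly perturbed copy of database image 42.
query_desc = db_descs[42] + 0.05 * rng.normal(size=256)

top_idx, top_sims = retrieve_top_k(query_desc, db_descs, k=5)
```

Only the images in `top_idx` would then be passed to local feature matching, which keeps the expensive matching step independent of the database size.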

Learning-based pose regression

Some methods, such as PoseNet, regress the camera pose directly from an image. For high-precision localization, however, a feature matching + geometry pipeline often remains the stronger choice.

Common failure cases

Appearance changes (day/night, season, weather)
Many repetitive structures
Large viewpoint difference between the query and the map
Many dynamic objects
Textureless environments

Camera relocalization in equations

Camera relocalization is the problem of estimating the camera pose $\mathbf{T}_{qw}\in SE(3)$ from a query image $I_q$. When 3D map points $\mathbf{X}_j$ are matched to observations $\mathbf{u}_j$ in the query image, the pose can be found by minimizing the same reprojection error as in PnP.

$$\hat{\mathbf{T}}=\arg\min_{\mathbf{T}\in SE(3)} \sum_j \rho\left(\left\|\mathbf{u}_j-\pi(\mathbf{K}\mathbf{T}\mathbf{X}_j)\right\|^2\right)$$

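Here, $\rho$ is a robust loss, $\pi$ is the perspective projection, and $\mathbf{K}$ is the camera intrinsic matrix. Intuitively, the formula searches for the camera pose under which the 3D points of the map project onto their observed positions in the query image.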
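The cost above can be evaluated directly. The sketch below assumes a 4x4 world-to-camera matrix for $\mathbf{T}$ and picks the Huber function as one possible choice of $\rho$; the threshold `delta`, the helper names, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def project(T, K, X):
    """pi(K T X): project world points X (N, 3) with a 4x4 world-to-camera T."""
    Xh = np.column_stack([X, np.ones(len(X))])
    Xc = (T @ Xh.T).T[:, :3]
    uvw = Xc @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def huber(s, delta=2.0):
    """Huber loss rho(s) on a squared residual s = r^2 (delta in pixels)."""
    r = np.sqrt(s)
    return np.where(r <= delta, s, 2.0 * delta * r - delta ** 2)

def robust_cost(T, K, X, u, delta=2.0):
    """Sum over j of rho(||u_j - pi(K T X_j)||^2)."""
    sq = np.sum((u - project(T, K, X)) ** 2, axis=1)
    return float(np.sum(huber(sq, delta)))

# Synthetic map points and a made-up ground-truth pose.
rng = np.random.default_rng(0)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
T_true = np.eye(4)
T_true[:3, 3] = [0.1, -0.2, 0.5]
X = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(30, 3))
u = project(T_true, K, X)

cost_true = robust_cost(T_true, K, X, u)

T_shifted = T_true.copy()
T_shifted[0, 3] += 0.05               # perturb the pose slightly
cost_shifted = robust_cost(T_shifted, K, X, u)

u_outlier = u.copy()
u_outlier[0, 0] += 100.0              # one wrong match, 100 px off
cost_outlier = robust_cost(T_true, K, X, u_outlier)   # 2*2*100 - 2^2 = 396
squared_outlier = float(np.sum((u_outlier - project(T_true, K, X)) ** 2))  # 10000
```

A single bad match contributes 396 to the Huber cost but 10000 to the plain squared error, which is why a robust $\rho$ keeps a few wrong correspondences from dominating the estimate.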

In learning-based relocalization, a network may instead regress the pose directly, trained with a loss of the following form.

$$\mathcal{L}=\|\mathbf{t}-\mathbf{t}^*\|_2+ \lambda d_R(\mathbf{R},\mathbf{R}^*)$$

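$d_R$ is a rotation error, and $\mathbf{t}^*$ and $\mathbf{R}^*$ are the ground-truth translation and rotation. Because translation and rotation have different units, the choice of the weight $\lambda$ matters.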
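As a small numeric illustration of this loss, the sketch below uses the geodesic angle between rotation matrices as $d_R$. That is one common choice and an assumption here; the text does not fix a specific $d_R$, and the helper names are hypothetical.

```python
import numpy as np

def rotation_geodesic(R, R_star):
    """Angle in radians of the relative rotation R^T R_star."""
    cos = (np.trace(R.T @ R_star) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def pose_loss(t, R, t_star, R_star, lam=1.0):
    """L = ||t - t*||_2 + lam * d_R(R, R*)."""
    return float(np.linalg.norm(t - t_star)) + lam * rotation_geodesic(R, R_star)

def rot_z(angle):
    """Rotation by `angle` radians about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Prediction that is 1 m and 0.1 rad away from the ground truth.
t_pred, R_pred = np.array([1.0, 0.0, 0.0]), rot_z(0.1)
t_star, R_star = np.zeros(3), np.eye(3)

loss_lam1 = pose_loss(t_pred, R_pred, t_star, R_star, lam=1.0)    # 1.0 + 0.1
loss_lam10 = pose_loss(t_pred, R_pred, t_star, R_star, lam=10.0)  # 1.0 + 1.0
```

With $\lambda = 1$ the 1 m translation error dominates the 0.1 rad rotation error, while $\lambda = 10$ weights them equally, so the balance between the two terms depends on the scene scale and on the chosen units.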

Main sources

Cambridge Landmarks / PoseNet: https://www.repository.cam.ac.uk/items/53788265-4166-46f8-b09c-267c1fb8b4fe
Aachen Day-Night benchmark: https://www.visuallocalization.net/
Hierarchical Localization (hloc): https://github.com/cvg/Hierarchical-Localization
