6D Object Pose Estimation

6D object pose estimation is the task of estimating the 3D rotation and 3D translation of a known or unknown object. It matters for robotic grasping, AR, industrial inspection, and object tracking.

What is a 6D pose

The pose of a rigid object is represented by a rotation $\mathbf{R} \in SO(3)$ and a translation $\mathbf{t} \in \mathbb{R}^3$.

$$
\mathbf{T} = \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix} \in SE(3)
$$

This is called the 6D pose. It has six degrees of freedom in total: three for rotation and three for translation.
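As a minimal sketch of the representation above, the following NumPy snippet packs a rotation and a translation into a 4x4 homogeneous matrix and applies it to points. The helper names `make_pose`, `rot_z`, and `transform_points` are hypothetical and chosen only for illustration.

```python
import numpy as np

def make_pose(R, t):
    """Pack a 3x3 rotation R and a 3-vector t into a 4x4 homogeneous matrix."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def rot_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def transform_points(T, points):
    """Apply T to an (N, 3) array of points, i.e. x -> R x + t for each row."""
    return points @ T[:3, :3].T + T[:3, 3]

# Rotate 90 degrees about z, then shift by one unit along x.
T = make_pose(rot_z(np.pi / 2), np.array([1.0, 0.0, 0.0]))
p = transform_points(T, np.array([[1.0, 0.0, 0.0]]))
print(np.round(p, 6))
```

The rotation sends the point (1, 0, 0) to (0, 1, 0), and the translation then moves it to (1, 1, 0), up to floating-point rounding.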

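Input settings

Setting	Description
RGB	Estimates the pose from a color image only.
RGB-D	Also uses depth, which makes 3D alignment easier.
Model-based	A CAD model of the object is known.
Model-free	No CAD model of the target object is available, or only few-shot reference images are.
Category-level	Estimates the pose of objects within a category rather than of a specific instance.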

Typical pipeline

In the RGB-D case, the pose is sometimes refined by aligning the point cloud inside the object mask to the CAD model with ICP.
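The ICP refinement mentioned above can be sketched as follows. This is a bare-bones point-to-point variant with brute-force nearest neighbors, written under simplifying assumptions (noise-free data, a good initial pose, a tiny point set). The function names `best_fit_transform` and `icp_refine` and the toy grid "CAD model" are hypothetical. Real pipelines typically use a KD-tree and outlier rejection.

```python
import numpy as np

def best_fit_transform(src, dst):
    """Least-squares R, t with dst ~= R @ src + t for paired (N, 3) points (Kabsch)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

def icp_refine(model, observed, R, t, iters=10):
    """Refine (R, t) so that the transformed model aligns with the observed points."""
    for _ in range(iters):
        moved = model @ R.T + t
        # For each observed point, find the closest transformed model point.
        dist = np.linalg.norm(observed[:, None, :] - moved[None, :, :], axis=2)
        nearest = dist.argmin(axis=1)
        R, t = best_fit_transform(model[nearest], observed)
    return R, t

# Toy model: a 3x3x3 grid of points with 0.5 spacing (stand-in for CAD model samples).
g = np.array([-0.5, 0.0, 0.5])
model = np.stack(np.meshgrid(g, g, g, indexing="ij"), axis=-1).reshape(-1, 3)

# Ground-truth pose: 5 degrees about z plus a small translation.
a = np.deg2rad(5.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.05, -0.03, 0.02])

# Partial observation: every other model point, as if seen through an object mask.
observed = (model @ R_true.T + t_true)[::2]

# Start from the identity pose (assumed to be a coarse initial estimate).
R_est, t_est = icp_refine(model, observed, np.eye(3), np.zeros(3))
print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))
```

Because the initial error here is much smaller than the grid spacing, the nearest-neighbor matches are already correct in the first iteration. With a poor initial pose, ICP can instead converge to a wrong local minimum.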

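The symmetry problem

Symmetry is a major problem in object pose estimation. For symmetric objects such as cylinders or bottles, different rotations produce the same appearance, so the rotation cannot be determined uniquely. For this reason, symmetry-aware metrics such as ADD-S are used.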

Toward FoundationPose

In recent years, the field has been shifting from category-specific pose estimators toward model-free, foundation-model-style pose estimators. FoundationPose is a representative example.

The ADD / ADD-S metrics in equations

In 6D object pose estimation, we estimate the rotation $\mathbf{R}$ and the translation $\mathbf{t}$. Letting $\mathcal{M}$ be the set of 3D points of the object model, ADD is defined as follows.

$$
\mathrm{ADD}=\frac{1}{|\mathcal{M}|}\sum_{\mathbf{x}\in\mathcal{M}} \left\|(\mathbf{R}\mathbf{x}+\mathbf{t})-(\mathbf{R}^*\mathbf{x}+\mathbf{t}^*)\right\|_2
$$

Here, $\mathbf{R}^*,\mathbf{t}^*$ denote the ground-truth pose. The intuition is to place each point of the object model under the predicted pose and under the ground-truth pose, and then average the resulting 3D displacements.
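The ADD definition above translates almost directly into NumPy. The following is a minimal sketch. The function name `add_metric` and the toy model points are hypothetical, and units are assumed to be metres.

```python
import numpy as np

def add_metric(model_points, R_pred, t_pred, R_gt, t_gt):
    """Mean distance between corresponding model points under two poses."""
    pred = model_points @ R_pred.T + t_pred   # R x + t for each row x
    gt = model_points @ R_gt.T + t_gt         # R* x + t* for each row x
    return float(np.linalg.norm(pred - gt, axis=1).mean())

# Toy model: four arbitrary points (hypothetical data).
model_points = np.array([[0.0, 0.0, 0.0],
                         [0.1, 0.0, 0.0],
                         [0.0, 0.1, 0.0],
                         [0.0, 0.0, 0.1]])

R_gt, t_gt = np.eye(3), np.array([0.0, 0.0, 0.5])
# Prediction with the correct rotation but a 1 cm translation error along x.
R_pred, t_pred = np.eye(3), np.array([0.01, 0.0, 0.5])

add_value = add_metric(model_points, R_pred, t_pred, R_gt, t_gt)
print(f"ADD = {add_value:.4f}")
```

A pure translation error shifts every model point by the same vector, so ADD equals the norm of that error, which is 0.01 here.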

For symmetric objects, the correspondence between points that look identical is not unique. ADD-S therefore uses the nearest-neighbor distance.

$$
\mathrm{ADD\text{-}S}=\frac{1}{|\mathcal{M}|}\sum_{\mathbf{x}_1\in\mathcal{M}} \min_{\mathbf{x}_2\in\mathcal{M}} \left\|(\mathbf{R}\mathbf{x}_1+\mathbf{t})-(\mathbf{R}^*\mathbf{x}_2+\mathbf{t}^*)\right\|_2
$$

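The idea behind this formula is that, when symmetry makes the appearance identical regardless of which points correspond, each point is evaluated against its closest counterpart.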
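To make the difference between the two metrics concrete, here is a small sketch that evaluates both on a symmetric toy object. It assumes a square prism (4-fold symmetry about the z axis) rather than a true cylinder, and it uses a brute-force pairwise distance matrix, which is only practical for small point sets. The names `add_metric`, `add_s_metric`, and `rot_z` are hypothetical.

```python
import numpy as np

def add_metric(points, R_pred, t_pred, R_gt, t_gt):
    """ADD: mean distance between the same model point under both poses."""
    pred = points @ R_pred.T + t_pred
    gt = points @ R_gt.T + t_gt
    return float(np.linalg.norm(pred - gt, axis=1).mean())

def add_s_metric(points, R_pred, t_pred, R_gt, t_gt):
    """ADD-S: mean distance from each predicted point to its nearest ground-truth point."""
    pred = points @ R_pred.T + t_pred
    gt = points @ R_gt.T + t_gt
    # (N, N) matrix of distances between every predicted and ground-truth point.
    dist = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=2)
    return float(dist.min(axis=1).mean())

def rot_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Corners of a square prism: unchanged as a set by 90-degree rotations about z.
prism = np.array([[x, y, z] for x in (-1.0, 1.0)
                            for y in (-1.0, 1.0)
                            for z in (0.0, 1.0)])

t = np.array([0.0, 0.0, 0.5])
R_gt = np.eye(3)
R_pred = rot_z(np.pi / 2)   # differs from ground truth by a symmetry rotation

add_value = add_metric(prism, R_pred, t, R_gt, t)
add_s_value = add_s_metric(prism, R_pred, t, R_gt, t)
print(f"ADD = {add_value:.3f}, ADD-S = {add_s_value:.3f}")
```

The predicted pose is visually indistinguishable from the ground truth, yet ADD reports a large error (each corner moves to the position of a neighboring corner, a distance of 2), while ADD-S is zero up to floating-point rounding.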

Main sources

BOP benchmark: https://bop.felk.cvut.cz/
PoseCNN: https://arxiv.org/abs/1711.00199
DenseFusion: https://arxiv.org/abs/1901.04780
FoundationPose: https://nvlabs.github.io/FoundationPose/
