Feature Matching

Feature Matching は、異なる image の間で同じ 3D point に対応する 2D point を見つける task です。SfM、SLAM、image retrieval、pose estimation、stitching などの基礎になります。

Classical pipeline

古典的な feature matching は、次の流れで行われます。

Keypoint と descriptor

Keypoint detector は、画像中の repeatable な点を検出します。Descriptor は、その点の近傍 patch を比較可能な vector に変換します。

代表的な classical method には次があります。

Method	特徴
SIFT	Scale と rotation に比較的頑健な hand-crafted descriptor です。
SURF	SIFT を高速化する方向の method です。
ORB	FAST keypoint と BRIEF descriptor を組み合わせた高速な binary feature です。

Matching

Descriptor 間の距離を使って対応点を探します。SIFT のような real-valued descriptor では Euclidean distance が使われることが多く、ORB のような binary descriptor では Hamming distance が使われます。

Outlier を減らすために、Lowe's ratio test や mutual nearest neighbor check がよく使われます。

Geometric verification

Descriptor matching だけでは outlier が多いため、Epipolar Geometry による verification が必要です。RANSAC を使って fundamental matrix や essential matrix を推定し、その model に合わない match を除去します。

Learned matching

近年は、SuperPoint、SuperGlue、LoFTR、LightGlue のように、feature detection、descriptor、matching を neural network で学習する method が広く使われます。

SuperPoint は keypoint と descriptor を jointly に学習します。
SuperGlue は graph neural network と attention を使って keypoint 間の matching を行います。
LoFTR は detector-free な dense matching に近い approach です。
LightGlue は practical な高速 matching を重視した method です。

数式で見る matching と verification

Descriptor matching は、各 keypoint $a$ の descriptor $\mathbf{d}_a$ と、別 image の keypoint $b$ の descriptor $\mathbf{d}'_b$ の距離を比べる問題として書けます。

b^*(a)=\arg\min_b \,\|\mathbf{d}_a-\mathbf{d}'_b\|_2

この式は、「keypoint $a$ にもっとも見た目が近い keypoint $b$ を探す」という意味です。ORB のような binary descriptor では、Euclidean distance の代わりに Hamming distance を使います。

Lowe の ratio test は、最も近い候補と二番目に近い候補の距離を比べます。

\frac{\|\mathbf{d}_a-\mathbf{d}'_{b_1}\|}{\|\mathbf{d}_a-\mathbf{d}'_{b_2}\|}<\tau

ここで、 $b_1$ は最近傍、 $b_2$ は二番目の最近傍、 $\tau$ は threshold です。この式の気持ちは、「一番近い候補が、二番目の候補よりも十分にはっきり近い場合だけ対応として採用する」というものです。似た模様が多い画像では、最近傍と二番目の距離が近くなりやすいため、この test は曖昧な match を減らします。

ただし、descriptor が近いだけでは幾何的に正しいとは限りません。対応点 $\tilde{\mathbf{x}}_a$ と $\tilde{\mathbf{x}}'_b$ は、fundamental matrix $\mathbf{F}$ に対して次の epipolar constraint を満たす必要があります。

{\tilde{\mathbf{x}}'_b}^{\top}\mathbf{F}\tilde{\mathbf{x}}_a = 0

RANSAC では、少数の対応から $\mathbf{F}$ や essential matrix $\mathbf{E}$ を仮定し、その仮定に合う対応の数を数えます。したがって、feature matching は「見た目の近さ」と「幾何的な整合性」の二段階で outlier を落とす処理として理解できます。

主なソース

Lowe, “Distinctive Image Features from Scale-Invariant Keypoints”, IJCV 2004: https://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf
ORB paper: https://ieeexplore.ieee.org/document/6126544
OpenCV Feature Matching tutorial: https://docs.opencv.org/4.x/dc/dc3/tutorial_py_matcher.html
SuperGlue paper: https://arxiv.org/abs/1911.11763
LoFTR paper: https://arxiv.org/abs/2104.00680

Classical pipeline​

Keypoint と descriptor​

Matching​

Geometric verification​

Learned matching​

数式で見る matching と verification​

関連ページ​

主なソース​