3D Reconstruction Overview

3D Reconstruction is the field of estimating camera pose, scene geometry, surfaces, and semantic structure from sensor data such as 2D images, video, RGB-D, LiDAR, and IMU. It spans many applications, including photogrammetry, robotics, AR / VR, autonomous driving, digital twins, graphics, and NeRF / 3D Gaussian Splatting.

3D Reconstruction task map

Conceptual diagram by the author. 3D Reconstruction can be organized as a composite pipeline in which input sensor data flows through perception, geometry, foundation models, reconstruction, and representation.

What is estimated

3D Reconstruction does not estimate a single quantity. Typical targets include the following.

Target	Examples
Camera pose	camera position and orientation for each frame
Sparse geometry	sparse point cloud obtained by triangulating feature points
Dense depth	per-pixel depth map or disparity map
Surface	mesh, TSDF, surfels, point cloud, normal map
Motion	optical flow, scene flow, visual odometry
Semantics	semantic segmentation, instance segmentation, panoptic segmentation
Neural scene representation	NeRF, 3D Gaussian Splatting, neural implicit surface
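The "Dense depth" row lists depth maps and disparity maps side by side. For a rectified stereo pair the two are related by the standard formula depth = focal length × baseline / disparity. The following is a minimal sketch of that conversion; the function name and the rig values (700 px focal length, 0.12 m baseline) are hypothetical and chosen only for illustration.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Convert a disparity in pixels to depth in metres (rectified stereo assumed)."""
    if disparity_px <= 0:
        # Non-positive disparity carries no usable depth; this sketch
        # reports it as infinitely far away.
        return float("inf")
    return focal_px * baseline_m / disparity_px


# Hypothetical stereo rig, for illustration only.
depth = disparity_to_depth(disparity_px=42.0, focal_px=700.0, baseline_m=0.12)
print(f"depth is about {depth:.2f} m")
```

Because depth is inversely proportional to disparity, a fixed disparity error costs more depth accuracy for far points than for near ones.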

Typical pipeline

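When building a 3D model from an image collection, a typical pipeline proceeds through feature extraction and matching, Structure from Motion, Multi-View Stereo, and surface reconstruction.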

In real-time robotics, this pipeline runs online as Visual Odometry or SLAM. Offline photogrammetry, by contrast, often combines Structure from Motion with Multi-View Stereo.

Knowledge map

Category	Pages
Geometry fundamentals	Camera Models and Coordinates, Epipolar Geometry, Triangulation, PnP, and ICP
Image correspondence	Feature Matching, Stereo Matching, Optical Flow, Scene Flow
Dense perception	Depth Estimation, Surface Normals and Normal Maps, Photometric Stereo, Segmentation for 3D Reconstruction
Reconstruction pipeline	Structure from Motion, Multi-View Stereo, Bundle Adjustment, Point Clouds, Meshes, and TSDF
Real-time mapping	Visual Odometry, Place Recognition and Loop Closure, SLAM
Neural methods	Neural 3D Reconstruction, Gaussian Splatting Overview, VGGT, DUSt3R Family, Depth Anything, Segment Anything
3D Generation	3D Generation Overview
Evaluation and practice	Datasets and Metrics for 3D Reconstruction, Practical 3D Reconstruction Pipeline

Key distinctions

Sparse and dense

Sparse reconstruction estimates geometry from a limited set of points, such as feature points. The initial map in SfM or SLAM is often sparse. Dense reconstruction estimates depth or surfaces at the pixel level and produces a mesh or a dense point cloud.
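To make the dense case concrete, here is a minimal sketch that back-projects a per-pixel depth map into a camera-frame point cloud using a pinhole model. The function name, the tiny 2×3 depth map, and the intrinsics (fx, fy, cx, cy) are assumptions made up for this example; a real pipeline would read them from calibration and from a depth estimator.

```python
def backproject_depth(depth, fx, fy, cx, cy):
    """Back-project a depth map (rows of metres) to 3D points in the camera frame."""
    points = []
    for v, row in enumerate(depth):        # v: pixel row
        for u, z in enumerate(row):        # u: pixel column
            if z <= 0:
                continue                   # skip pixels without a valid depth
            x = (u - cx) * z / fx          # invert u = fx * x / z + cx
            y = (v - cy) * z / fy          # invert v = fy * y / z + cy
            points.append((x, y, z))
    return points


# Hypothetical 2x3 depth map; 0.0 marks a missing measurement.
depth_map = [
    [2.0, 2.0, 0.0],
    [2.0, 4.0, 2.0],
]
cloud = backproject_depth(depth_map, fx=100.0, fy=100.0, cx=1.0, cy=0.5)
print(len(cloud), "points")
```

Every valid pixel yields one 3D point, which is why dense output grows with image resolution, whereas a sparse map only contains the points that were matched and triangulated.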

Offline and online

Offline reconstruction processes all images together and prioritizes a high-quality 3D model. Online reconstruction prioritizes updating the current position and the map simultaneously while a robot or AR device is moving.
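The online setting can be illustrated by how a pose is updated frame by frame. The sketch below chains relative motion estimates into a global trajectory. It is deliberately simplified to planar poses (x, y, heading) rather than full 3D poses, and the per-frame motions are hypothetical values, not the output of any real odometry system.

```python
import math


def compose(pose, delta):
    """Apply a relative motion, given in the current body frame, to a global 2D pose."""
    x, y, theta = pose
    dx, dy, dtheta = delta
    return (
        x + math.cos(theta) * dx - math.sin(theta) * dy,
        y + math.sin(theta) * dx + math.cos(theta) * dy,
        theta + dtheta,
    )


pose = (0.0, 0.0, 0.0)
trajectory = [pose]
# Hypothetical estimates: move 1 m forward, then turn left 90 degrees, four times.
for delta in [(1.0, 0.0, math.pi / 2)] * 4:
    pose = compose(pose, delta)   # the estimate is updated as each frame arrives
    trajectory.append(pose)

print(trajectory[-1])
```

With these noise-free inputs the path closes into a square and ends back at the start. With real measurements, each step adds a small error that accumulates along the chain, which is the drift that loop closure is meant to correct.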

Geometry-based and learning-based

Geometry-based methods explicitly use camera geometry, multi-view consistency, and photometric consistency. Learning-based methods learn depth, matching, segmentation, and scene representations from large amounts of data. In practice, the two are often combined.
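One common way geometry-based methods express multi-view consistency is the reprojection error: the pixel distance between where a 3D point projects under the current camera estimate and where it was actually observed. The following is a minimal sketch assuming a pinhole camera without lens distortion; the function names, pose, intrinsics, and observation are hypothetical values for illustration.

```python
import math


def project(point_cam, fx, fy, cx, cy):
    """Project a camera-frame 3D point to pixel coordinates (pinhole, no distortion)."""
    x, y, z = point_cam
    return (fx * x / z + cx, fy * y / z + cy)


def reprojection_error(point_world, R, t, intrinsics, observed_px):
    """Pixel distance between the projected point and its observed location."""
    # World to camera: X_cam = R @ X_world + t
    point_cam = [
        sum(R[i][j] * point_world[j] for j in range(3)) + t[i] for i in range(3)
    ]
    u, v = project(point_cam, *intrinsics)
    return math.hypot(u - observed_px[0], v - observed_px[1])


# Hypothetical camera at the world origin looking along +Z.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
intrinsics = (500.0, 500.0, 320.0, 240.0)  # fx, fy, cx, cy

err = reprojection_error((0.1, -0.05, 2.0), R, t, intrinsics, observed_px=(348.0, 231.5))
print(f"reprojection error is about {err:.1f} px")
```

Bundle adjustment minimizes the sum of such squared errors over all cameras and points. A learned matcher or depth network can supply the observations while a geometric residual like this one still drives the optimization, which is one way the two families are combined.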

Main sources

COLMAP documentation: https://colmap.readthedocs.io/en/stable/
Schönberger and Frahm, “Structure-from-Motion Revisited”, CVPR 2016: https://openaccess.thecvf.com/content_cvpr_2016/papers/Schonberger_Structure-From-Motion_Revisited_CVPR_2016_paper.pdf
Cadena et al., “Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age”, 2016: https://arxiv.org/abs/1606.05830
OpenCV documentation: https://docs.opencv.org/
Szeliski, “Computer Vision: Algorithms and Applications”: https://szeliski.org/Book/
