Point Clouds, Meshes, and TSDF

The output of 3D reconstruction can take many representations, including point clouds, meshes, voxels, TSDFs, surfels, and neural fields. Which representation to use depends on the application.

Point cloud

A point cloud is a set of 3D points. In addition to its position, each point may carry attributes such as color, normal, confidence, or a semantic label.

Point clouds are easy to work with and are the natural output of LiDAR and MVS (multi-view stereo). However, because they have no explicit surface connectivity, they are often unsuitable as-is for rendering or simulation.
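To make this concrete, here is a minimal NumPy sketch of how a depth image becomes a point cloud. It assumes an ideal pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy` and no lens distortion; the function name `depth_to_point_cloud` is hypothetical. Positions are stored as an N×3 array, with an optional per-point color array alongside.

```python
import numpy as np


def depth_to_point_cloud(depth, fx, fy, cx, cy, rgb=None):
    """Back-project an H x W depth image (in metres) into an N x 3 point cloud.

    Assumes an ideal pinhole camera without lens distortion. Pixels with
    depth <= 0 are treated as missing measurements and dropped.
    """
    h, w = depth.shape
    # u indexes columns (image x), v indexes rows (image y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    points = np.stack([x, y, z], axis=1)
    # Optional per-point attribute: copy the color of each valid pixel.
    colors = rgb[valid] if rgb is not None else None
    return points, colors


# Usage: a 2 x 2 depth image with one missing (zero) pixel.
depth = np.array([[1.0, 2.0],
                  [0.0, 1.0]])
points, colors = depth_to_point_cloud(depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
print(points.shape)  # (3, 3)
```

The result is just an unordered array of points: nothing records which points are neighbours on the surface, which is the missing connectivity mentioned above.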

Mesh

A mesh is a surface representation built from vertices, edges, and faces. Triangle meshes are the most common.

Meshes are convenient for graphics and simulation and represent the surface explicitly. On the other hand, they often need cleanup to deal with holes, noise, and non-manifold geometry.
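A triangle mesh is commonly stored as a vertex array plus a list of faces, each face being three indices into that array. The sketch below (plain Python; the function names are hypothetical) shows one simple cleanup check: count how many faces use each edge. An edge used by only one face lies on an open border, such as the rim of a hole, and an edge used by three or more faces is non-manifold.

```python
from collections import Counter


def edge_face_counts(faces):
    """Count how many triangles use each undirected edge (i, j) with i < j."""
    counts = Counter()
    for face in faces:
        a, b, c = (int(i) for i in face)
        for i, j in ((a, b), (b, c), (c, a)):
            counts[(min(i, j), max(i, j))] += 1
    return counts


def mesh_edge_report(faces):
    """Return (boundary_edges, non_manifold_edges) of a triangle mesh."""
    counts = edge_face_counts(faces)
    # Used by a single face: lies on an open border, e.g. the rim of a hole.
    boundary = [e for e, n in counts.items() if n == 1]
    # Used by three or more faces: the surface is non-manifold there.
    non_manifold = [e for e, n in counts.items() if n > 2]
    return boundary, non_manifold


# Faces are index triples into a vertex array (positions are not needed here).
tetrahedron = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
print(mesh_edge_report(tetrahedron))         # ([], [])
print(mesh_edge_report(tetrahedron[:3])[0])  # [(0, 2), (0, 3), (2, 3)]
fan = [(0, 1, 2), (0, 1, 3), (0, 1, 4)]
print(mesh_edge_report(fan)[1])              # [(0, 1)]
```

Removing one face of the closed tetrahedron opens a triangular hole, whose rim shows up as three boundary edges; three triangles hinged on the same edge make that edge non-manifold.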

TSDF

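TSDF stands for Truncated Signed Distance Function. Each voxel stores the signed distance to the nearest surface (in RGB-D fusion this is usually approximated by the distance measured along the camera's line of sight, as in the equations below). Distances are truncated to a fixed band around the surface.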
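As an illustration of what a TSDF volume contains, the following NumPy sketch samples the truncated signed distance of a sphere, whose exact nearest-surface distance is known in closed form. The grid size, voxel size, sphere, and truncation distance are arbitrary example values, and `sphere_tsdf` is a hypothetical helper. Values are positive outside the sphere, negative inside, and divided by the truncation distance so they lie in [-1, 1].

```python
import numpy as np


def sphere_tsdf(grid_size, voxel_size, center, radius, trunc):
    """Truncated signed distance of a sphere, sampled at voxel centres.

    Positive outside the sphere (free space), negative inside, normalised
    by the truncation distance `trunc` and clipped to [-1, 1].
    """
    # Voxel centre coordinates along each axis.
    coords = (np.arange(grid_size) + 0.5) * voxel_size
    x, y, z = np.meshgrid(coords, coords, coords, indexing="ij")
    dist = np.sqrt((x - center[0]) ** 2 + (y - center[1]) ** 2 + (z - center[2]) ** 2)
    sdf = dist - radius  # exact signed distance to the sphere surface
    return np.clip(sdf / trunc, -1.0, 1.0)


tsdf = sphere_tsdf(grid_size=32, voxel_size=0.1,
                   center=(1.6, 1.6, 1.6), radius=1.0, trunc=0.3)
print(tsdf.shape)  # (32, 32, 32)
```

Only voxels within `trunc` of the surface hold values strictly between -1 and 1; everything else saturates, which is why a TSDF carries useful information only in a thin shell around the surface.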

In RGB-D reconstruction systems such as KinectFusion, the depth map from each frame is fused into a TSDF volume to build a smooth surface. A mesh is then typically extracted with Marching Cubes or a similar algorithm.

Surfels

A surfel (short for surface element) represents the surface as a collection of small oriented disks. Each surfel has attributes such as position, normal, radius, and color. Surfels are used in dense SLAM and real-time mapping.
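For illustration, a surfel fits naturally into a small record type. The sketch below (Python with NumPy; the `Surfel` class, its `confidence` field, and `fuse_observation` are hypothetical) merges a new measurement into an existing surfel by confidence-weighted averaging. This is a simplified update rule under that assumption; real dense SLAM systems use more elaborate association and update rules.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Surfel:
    position: np.ndarray     # (3,) centre of the disk
    normal: np.ndarray       # (3,) unit normal giving the disk orientation
    radius: float            # disk radius
    color: np.ndarray        # (3,) RGB
    confidence: float = 1.0  # accumulated weight (assumed field)


def fuse_observation(s, position, normal, radius, color, weight=1.0):
    """Merge a new measurement into surfel `s` by confidence-weighted averaging."""
    total = s.confidence + weight
    a, b = s.confidence / total, weight / total
    s.position = a * s.position + b * np.asarray(position, dtype=float)
    n = a * s.normal + b * np.asarray(normal, dtype=float)
    s.normal = n / np.linalg.norm(n)  # re-normalise the averaged normal
    s.radius = a * s.radius + b * radius
    s.color = a * s.color + b * np.asarray(color, dtype=float)
    s.confidence = total
    return s


s = Surfel(position=np.zeros(3), normal=np.array([0.0, 0.0, 1.0]),
           radius=0.01, color=np.array([255.0, 0.0, 0.0]))
fuse_observation(s, [0.0, 0.0, 0.02], [0.0, 0.0, 1.0], 0.01, [255, 0, 0])
print(s.position, s.confidence)
```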

Choosing a representation

Representation	Suitable applications
Point cloud	Raw measurement data, LiDAR, MVS, quick visualization
Mesh	Rendering, simulation, asset creation
TSDF / voxel	RGB-D fusion、dense mapping、surface extraction
Surfels	Real-time dense SLAM
Neural field	View synthesis、implicit reconstruction

TSDF fusion in equations

A TSDF stores, for each voxel center $\mathbf{x}$, the signed distance to the observed surface. Given a depth image $D_i$ and a camera projection $\pi_i$, let $\mathbf{u}=\pi_i(\mathbf{x})$ be the pixel onto which voxel $\mathbf{x}$ projects in camera $i$. If $z_i(\mathbf{x})$ is the depth of the voxel in camera coordinates, the signed distance can be written as:

$$\phi_i(\mathbf{x})=D_i(\mathbf{u})-z_i(\mathbf{x})$$

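A positive value means the voxel lies between the camera and the observed surface; a negative value means it lies behind the surface. The TSDF truncates this distance with a truncation width $\mu$, normalizing it to the range $[-1, 1]$: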

$$\psi_i(\mathbf{x})=\mathrm{clip}\left(\frac{\phi_i(\mathbf{x})}{\mu},-1,1\right)$$
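A minimal NumPy sketch of $\phi_i$ and $\psi_i$ for one frame, under these assumptions: voxel centres are already expressed in camera $i$'s coordinates (the camera pose is not modelled), $\pi_i$ is an ideal pinhole projection with intrinsics `fx`, `fy`, `cx`, `cy` followed by a nearest-pixel lookup, and the function name `projective_tsdf` is hypothetical. It also follows a common practice not written in the equations: voxels more than $\mu$ behind the observed surface are marked invalid, because the space behind a surface is not actually observed.

```python
import numpy as np


def projective_tsdf(voxels_cam, depth, fx, fy, cx, cy, mu):
    """Truncated projective signed distance psi_i for one depth frame.

    voxels_cam: (N, 3) voxel centres already in camera-i coordinates.
    Returns (psi, valid); psi is meaningless where valid is False.
    """
    x, y, z = voxels_cam[:, 0], voxels_cam[:, 1], voxels_cam[:, 2]
    h, w = depth.shape
    in_front = z > 0
    zs = np.where(in_front, z, 1.0)  # avoid dividing by z <= 0
    # u = pi_i(x): pinhole projection, rounded to the nearest pixel.
    u = np.round(fx * x / zs + cx).astype(int)
    v = np.round(fy * y / zs + cy).astype(int)
    inside = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    d = np.zeros_like(z)
    d[inside] = depth[v[inside], u[inside]]   # D_i(u)
    phi = d - z                               # phi_i(x) = D_i(u) - z_i(x)
    # Need a valid depth; skip voxels far behind the surface (assumed rule).
    valid = inside & (d > 0) & (phi >= -mu)
    psi = np.clip(phi / mu, -1.0, 1.0)
    return psi, valid


depth = np.full((4, 4), 2.0)  # every pixel sees a wall 2 m away
voxels = np.array([[0.0, 0.0, 1.9],
                   [0.0, 0.0, 2.0],
                   [0.0, 0.0, 2.1],
                   [0.0, 0.0, 2.5],
                   [0.0, 0.0, -1.0]])  # last voxel is behind the camera
psi, valid = projective_tsdf(voxels, depth, fx=2.0, fy=2.0, cx=1.5, cy=1.5, mu=0.2)
print(valid.tolist())  # [True, True, True, False, False]
```

The first three voxels straddle the wall and get $\psi$ values of about 0.5, 0, and -0.5; the voxel 0.5 m behind the wall and the one behind the camera are excluded from fusion.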

When integrating multiple frames, each voxel is updated with a weighted average:

$$F_{new}(\mathbf{x})=\frac{W(\mathbf{x})F(\mathbf{x})+w_i(\mathbf{x})\psi_i(\mathbf{x})}{W(\mathbf{x})+w_i(\mathbf{x})}$$

$$W_{new}(\mathbf{x})=W(\mathbf{x})+w_i(\mathbf{x})$$
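The update rule translates directly into array code. The NumPy sketch below (hypothetical function name `integrate`) applies it to every voxel at once. Voxels with $w_i(\mathbf{x})=0$ keep their previous value, as the formula implies, and voxels that have never been observed ($W_{new}=0$) are left untouched to avoid dividing by zero.

```python
import numpy as np


def integrate(F, W, psi, w_obs):
    """One weighted running-average TSDF update.

    F, W  : current TSDF values and accumulated weights (same shape)
    psi   : truncated signed distance psi_i from the new frame
    w_obs : per-voxel observation weight w_i (0 means "not observed")
    """
    W_new = W + w_obs
    observed = W_new > 0  # W_new == 0: never observed, nothing to average
    F_new = F.copy()
    F_new[observed] = (W[observed] * F[observed]
                       + w_obs[observed] * psi[observed]) / W_new[observed]
    return F_new, W_new


F, W = np.zeros(3), np.zeros(3)
F, W = integrate(F, W, psi=np.array([0.3, -0.1, 1.0]), w_obs=np.array([1.0, 1.0, 0.0]))
F, W = integrate(F, W, psi=np.array([0.1, 0.1, 0.5]), w_obs=np.array([1.0, 1.0, 0.0]))
print(F, W)
```

With unit weights, $F$ is simply the running mean of the observed $\psi$ values (here 0.2 and 0.0 for the first two voxels), while the third voxel, never observed, stays at its initial value.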

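Here $F$ is the current TSDF value, $W$ the accumulated weight, and $w_i$ the confidence of the current observation. The intuition is that each depth frame is a slightly noisy observation of the surface, so averaging all observations of the same voxel brings the result closer to a smooth surface. The final surface is extracted as the zero crossing where $F(\mathbf{x})=0$.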
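The averaging intuition can be checked on a single camera ray. This 1D NumPy sketch uses example values throughout (a surface at 2 m, $\mu=0.1$, depth noise with standard deviation 0.02 m) and assumes unit weights, with $w_i=0$ for voxels more than $\mu$ behind the observed depth. It fuses 50 noisy depth readings and then locates the zero crossing by linear interpolation between neighbouring voxels, a 1D analogue of how Marching Cubes places vertices along grid edges.

```python
import numpy as np

rng = np.random.default_rng(0)
true_depth = 2.0                # surface position along the ray (m), example value
mu = 0.1                        # truncation width (m), example value
z = np.linspace(1.5, 2.5, 101)  # voxel centres along a single camera ray

F = np.zeros_like(z)            # TSDF value per voxel
W = np.zeros_like(z)            # accumulated weight per voxel
for _ in range(50):
    d = true_depth + rng.normal(0.0, 0.02)  # one noisy depth reading
    phi = d - z                             # phi_i = D_i(u) - z_i(x)
    psi = np.clip(phi / mu, -1.0, 1.0)      # truncated, normalised distance
    w = (phi >= -mu).astype(float)          # w_i = 0 far behind the surface
    W_new = W + w
    upd = W_new > 0
    F[upd] = (W[upd] * F[upd] + w[upd] * psi[upd]) / W_new[upd]
    W = W_new


def zero_crossing(z, F, W):
    """Position of the first +/- sign change of F, by linear interpolation."""
    for k in range(len(z) - 1):
        if W[k] > 0 and W[k + 1] > 0 and F[k] > 0 >= F[k + 1]:
            t = F[k] / (F[k] - F[k + 1])
            return z[k] + t * (z[k + 1] - z[k])
    return None


print(zero_crossing(z, F, W))  # close to 2.0; the noise largely averages out
```

Any single reading can be off by a few centimetres, but the fused zero crossing lands close to the mean of all readings, which is the smoothing effect described above.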

Main sources

Curless and Levoy, “A Volumetric Method for Building Complex Models from Range Images”, 1996: https://graphics.stanford.edu/papers/volrange/
KinectFusion paper: https://www.microsoft.com/en-us/research/publication/kinectfusion-real-time-dense-surface-mapping-tracking/
Open3D documentation: http://www.open3d.org/docs/latest/
