Point Cloud Foundation Models

Point cloud foundation models are a line of work on large-scale representation learning for point clouds. Just as ViT, MAE, and DINO proved effective for 2D images, transformers and self-supervised learning have become central for 3D point clouds.

Characteristics of point clouds

Unlike images, point clouds have the following properties.

The points have no inherent order.
The density is non-uniform.
Occlusion and sensor noise are common.
Both local geometry and global shape matter.

Masked point modeling

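Point-BERT and Point-MAE mask part of a point cloud and predict the missing part. Point-BERT predicts discrete tokens produced by a pretrained tokenizer, while Point-MAE reconstructs the coordinates of the masked patches; other variants predict features instead.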
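To make the masking step concrete, here is a minimal NumPy sketch of how a point cloud might be split into patches and partially masked. It assumes farthest point sampling for patch centers and k-nearest-neighbor grouping, which is the grouping scheme described in the Point-BERT and Point-MAE papers; the function names, patch counts, and `mask_ratio` value are illustrative choices, not settings taken from either codebase.

```python
import numpy as np

def farthest_point_sampling(points, num_centers, rng):
    """Greedily pick indices of points that are far apart from each other."""
    n = points.shape[0]
    chosen = np.empty(num_centers, dtype=np.int64)
    chosen[0] = rng.integers(n)
    # Squared distance from every point to its nearest already-chosen center.
    min_dist = np.full(n, np.inf)
    for k in range(1, num_centers):
        diff = points - points[chosen[k - 1]]
        min_dist = np.minimum(min_dist, np.einsum("ij,ij->i", diff, diff))
        chosen[k] = int(np.argmax(min_dist))
    return chosen

def group_and_mask(points, num_patches=16, patch_size=8, mask_ratio=0.6, seed=0):
    """Hypothetical helper: build kNN patches around FPS centers and mask some."""
    rng = np.random.default_rng(seed)
    centers = points[farthest_point_sampling(points, num_patches, rng)]  # (G, 3)
    # The k nearest neighbors of each center form one patch.
    d2 = ((centers[:, None, :] - points[None, :, :]) ** 2).sum(-1)       # (G, N)
    knn_idx = np.argsort(d2, axis=1)[:, :patch_size]                     # (G, K)
    patches = points[knn_idx] - centers[:, None, :]   # center-relative coordinates
    # Randomly choose which patches are hidden from the encoder.
    num_masked = int(round(mask_ratio * num_patches))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:num_masked]] = True
    return centers, patches, mask

points = np.random.default_rng(1).normal(size=(256, 3))
centers, patches, mask = group_and_mask(points)
visible_patches = patches[~mask]  # what an encoder would see
target_patches = patches[mask]    # what a decoder would have to reconstruct
```

Expressing each patch relative to its center separates "where the patch is" from "what shape it has", so the model can be given the center position of a masked patch while still having to predict its local geometry.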

Point cloud transformer

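The Point Transformer family combines attention with the local neighborhoods of a point cloud. Point Transformer V3 prioritizes scalability and efficiency, for example by replacing exact neighbor search with a serialized ordering of the points.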
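The idea of attention restricted to a local neighborhood can be sketched as follows. This is a simplified scalar dot-product variant over k nearest neighbors with a linear relative-position term, written only for illustration; it is not the vector attention of the original Point Transformer nor the serialized attention of V3, and the weight matrices `w_q`, `w_k`, `w_v`, `w_pos` are random placeholders rather than trained parameters.

```python
import numpy as np

def knn_indices(points, k):
    """Indices of the k nearest neighbors of every point (the point itself included)."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)  # (N, N)
    return np.argsort(d2, axis=1)[:, :k]                            # (N, k)

def local_attention(points, feats, k, w_q, w_k, w_v, w_pos):
    """Each point attends only to its k nearest neighbors."""
    idx = knn_indices(points, k)                    # (N, k)
    q = feats @ w_q                                 # (N, d)
    # Relative positions make the layer independent of global translation.
    rel_pos = points[idx] - points[:, None, :]      # (N, k, 3)
    pos_enc = rel_pos @ w_pos                       # (N, k, d)
    key = feats[idx] @ w_k + pos_enc                # (N, k, d)
    val = feats[idx] @ w_v + pos_enc                # (N, k, d)
    logits = np.einsum("nd,nkd->nk", q, key) / np.sqrt(q.shape[-1])
    logits -= logits.max(axis=1, keepdims=True)     # numerically stable softmax
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return np.einsum("nk,nkd->nd", weights, val)    # (N, d)

rng = np.random.default_rng(0)
n, d_in, d = 64, 4, 8
points = rng.normal(size=(n, 3))
feats = rng.normal(size=(n, d_in))
w_q, w_k, w_v = (rng.normal(scale=0.3, size=(d_in, d)) for _ in range(3))
w_pos = rng.normal(scale=0.3, size=(3, d))
out = local_attention(points, feats, 8, w_q, w_k, w_v, w_pos)
```

The dense `(N, N)` distance matrix is fine for a toy example but scales quadratically with the number of points, which is one reason large-scale models look for cheaper ways to define neighborhoods.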

Relation to 3D reconstruction

Point cloud foundation models serve as the base representation for tasks such as:

3D semantic segmentation
Object recognition
Registration
Completion
3D open-vocabulary understanding

Point cloud foundation models in equations

A point cloud foundation model can be written as a function that maps a point set $\mathcal{P}=\{\mathbf{p}_i\in\mathbb{R}^3\}_{i=1}^{N}$ to per-point features $\mathbf{F}\in\mathbb{R}^{N\times d}$.

$$\mathbf{F}=f_\theta(\mathcal{P})$$

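Because a point cloud is an unordered set, $f_\theta$ is designed to be permutation invariant (for global outputs) or permutation equivariant (for per-point outputs such as $\mathbf{F}$). A representative construction applies an MLP to each point and then takes a symmetric aggregation.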

$$\mathbf{g}=\bigoplus_{i=1}^{N}\mathrm{MLP}(\mathbf{p}_i), \qquad \mathbf{F}_i=\mathrm{MLP}([\mathbf{p}_i;\mathbf{g}])$$

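Here $\bigoplus$ is a symmetric aggregation such as max or sum, and the two MLPs are generally separate networks with their own weights. The intuition is to process each point independently, aggregate context over the whole scene, and feed that context back to each point so that both local and global information are handled.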
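The construction above can be checked numerically. The following is a minimal NumPy sketch, assuming max as the symmetric aggregation and two small randomly initialized two-layer MLPs; the helper names (`mlp`, `init_mlp`, `per_point_features`) and layer sizes are hypothetical and chosen only to show that the global feature is permutation invariant while the per-point features are permutation equivariant.

```python
import numpy as np

def mlp(x, w1, b1, w2, b2):
    """Two-layer MLP with ReLU, applied independently to each row of x."""
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2

def init_mlp(rng, d_in, d_hidden, d_out):
    """Random placeholder weights (untrained, for illustration only)."""
    return (rng.normal(scale=0.5, size=(d_in, d_hidden)), np.zeros(d_hidden),
            rng.normal(scale=0.5, size=(d_hidden, d_out)), np.zeros(d_out))

def per_point_features(points, params_local, params_fuse):
    local = mlp(points, *params_local)            # (N, d_g): per-point embedding
    g = local.max(axis=0)                         # (d_g,): symmetric aggregation
    g_tiled = np.broadcast_to(g, (points.shape[0], g.shape[0]))
    fused = np.concatenate([points, g_tiled], axis=1)  # [p_i ; g]
    return g, mlp(fused, *params_fuse)            # global feature, (N, d) features

rng = np.random.default_rng(0)
params_local = init_mlp(rng, 3, 16, 32)
params_fuse = init_mlp(rng, 3 + 32, 16, 8)
points = rng.normal(size=(100, 3))
perm = rng.permutation(100)

g, feats = per_point_features(points, params_local, params_fuse)
g_perm, feats_perm = per_point_features(points[perm], params_local, params_fuse)
```

Shuffling the input leaves `g` unchanged and shuffles the rows of `feats` in the same way, which is exactly the invariance and equivariance the text asks of $f_\theta$.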

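Masked point modeling is widely used for self-supervised pretraining. For the set of masked points $\mathcal{M}$, a loss is computed on the predicted representations or coordinates. For coordinate prediction it takes the following form.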

$$\mathcal{L}_{\mathrm{MPM}}=\sum_{i\in\mathcal{M}}\ell\left(\hat{\mathbf{p}}_i,\mathbf{p}_i\right)$$
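The per-element loss $\ell$ is left generic in the equation. As one possible instantiation, the sketch below uses a squared-distance Chamfer loss between predicted and ground-truth patches, which is the kind of reconstruction loss Point-MAE reports using. The function names and array layout (`(G, K, 3)` patches with a boolean patch mask) are assumptions made for this example, and the result is averaged over masked patches rather than summed.

```python
import numpy as np

def chamfer_distance(pred, target):
    """Symmetric squared-distance Chamfer loss between two small point sets."""
    d2 = ((pred[:, None, :] - target[None, :, :]) ** 2).sum(-1)  # (K_pred, K_target)
    # Nearest target for every prediction, and nearest prediction for every target.
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

def masked_reconstruction_loss(pred_patches, target_patches, mask):
    """Average Chamfer loss over masked patches only; visible patches are ignored."""
    if not mask.any():
        raise ValueError("mask must select at least one patch")
    losses = [chamfer_distance(p, t)
              for p, t in zip(pred_patches[mask], target_patches[mask])]
    return float(np.mean(losses))

rng = np.random.default_rng(0)
target = rng.normal(size=(4, 8, 3))            # 4 patches of 8 points each
mask = np.array([True, False, True, False])

pred = target.copy()
pred[~mask] += 10.0                             # errors on visible patches do not count
pred = pred[:, ::-1]                            # point order inside a patch does not matter
loss_perfect = masked_reconstruction_loss(pred, target, mask)
loss_noisy = masked_reconstruction_loss(pred + 0.1, target, mask)
```

A set-based loss such as Chamfer distance is a natural fit here because the predicted points of a patch have no fixed correspondence with the ground-truth points.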

Because scene scale and point density vary widely across 3D point cloud datasets, the choice of normalization, voxelization, and neighborhood radius directly affects quality.
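Two of these preprocessing steps can be sketched in a few lines of NumPy. The example assumes unit-sphere normalization (common for object-level data) and centroid-based voxel grid downsampling (common for scene-level data); the function names and the `voxel_size` value are illustrative, and real pipelines typically tune them per dataset.

```python
import numpy as np

def normalize_unit_sphere(points):
    """Center the cloud at its centroid and scale it to fit in the unit sphere."""
    centered = points - points.mean(axis=0)
    scale = np.linalg.norm(centered, axis=1).max()
    return centered if scale == 0 else centered / scale

def voxel_downsample(points, voxel_size):
    """Replace all points that fall into the same voxel by their centroid."""
    keys = np.floor(points / voxel_size).astype(np.int64)         # integer voxel coords
    _, inverse, counts = np.unique(keys, axis=0,
                                   return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)       # guard against NumPy-version shape differences
    sums = np.zeros((counts.shape[0], 3))
    np.add.at(sums, inverse, points)    # accumulate coordinates per voxel
    return sums / counts[:, None]

raw = np.array([[0.1, 0.1, 0.1],
                [0.2, 0.2, 0.2],
                [1.5, 0.1, 0.1]])
down = voxel_downsample(raw, voxel_size=1.0)   # first two points share a voxel
unit = normalize_unit_sphere(raw)
```

Voxel downsampling evens out density at the cost of fine detail, so `voxel_size` interacts with the neighborhood radius: a radius smaller than the voxel size leaves most points without neighbors.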

Main sources

Point-BERT: https://arxiv.org/abs/2111.14819
Point-MAE: https://arxiv.org/abs/2203.06604
Point Transformer V3: https://arxiv.org/abs/2312.10035
Self-supervised point cloud survey: https://arxiv.org/abs/2305.04691
