Autonomous Driving World Models

Autonomous driving world models predict or generate the future of a driving environment. From camera, LiDAR, and map data, ego actions, and the states of traffic agents, they predict future scenes, occupancy, trajectories, and sensor observations.

Why autonomous driving needs a world model

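In autonomous driving, detecting the objects currently in view is not enough. The system also needs to predict:

What happens if the ego vehicle accelerates, decelerates, or changes lanes
How pedestrians and other vehicles will move
What is likely to be in regions that cannot be seen
Whether dangerous rare events can be simulated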

World model inputs and outputs
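
The inputs and outputs named in the introduction can be written down as plain data containers. The following is a minimal sketch, not the interface of any particular model: every class name, field name, and unit here is hypothetical, and strings stand in for image tensors, point clouds, and map data. It assumes one planned ego action per future step, so all outputs share that horizon.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EgoAction:
    acceleration: float  # m/s^2 (hypothetical field)
    steering: float      # radians (hypothetical field)


@dataclass
class AgentState:
    x: float   # metres, ego frame (hypothetical convention)
    y: float
    vx: float  # m/s
    vy: float


@dataclass
class WorldModelInput:
    camera_frames: List[str]      # stand-ins for image tensors
    lidar_sweeps: List[str]       # stand-ins for point clouds
    map_tile: str                 # stand-in for a map reference
    agents: List[AgentState]      # current traffic agent states
    ego_actions: List[EgoAction]  # one planned action per future step


@dataclass
class WorldModelOutput:
    occupancy: List[List[List[int]]]               # [step][row][col], 1 = occupied
    trajectories: List[List[Tuple[float, float]]]  # [agent][step] -> (x, y)
    sensor_frames: List[str]                       # predicted observations, one per step


def is_consistent(inp: WorldModelInput, out: WorldModelOutput) -> bool:
    """Check that every output covers the same horizon as the planned actions."""
    horizon = len(inp.ego_actions)
    return (
        len(out.occupancy) == horizon
        and len(out.sensor_frames) == horizon
        and len(out.trajectories) == len(inp.agents)
        and all(len(t) == horizon for t in out.trajectories)
    )
```

Tying the output horizon to the length of `ego_actions` makes the action conditioning explicit: asking about a longer plan means asking for a longer predicted future.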

GAIA-1

GAIA-1 is a generative world model that generates driving scenes from camera video, text, and action inputs. It aims at realistic video generation and simulation for autonomous driving.

The key point is that GAIA-1 is not just a dashcam video generator: it aims to produce futures that depend on the control input.
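
To illustrate what "a future that depends on the control input" means, here is a toy sketch. It is not GAIA-1 and generates no video: a hand-written one-dimensional kinematic `step` function stands in for a learned model, and the `Scene` fields, the 0.5 s time step, and all numbers are assumptions chosen for illustration. The point is only that the same starting scene rolled out under different action sequences yields different futures.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Scene:
    ego_pos: float     # metres along the lane
    ego_speed: float   # m/s
    lead_pos: float    # position of a lead vehicle, metres
    lead_speed: float  # m/s, assumed constant


def step(scene: Scene, ego_accel: float, dt: float = 0.5) -> Scene:
    """Toy stand-in for a learned transition: next scene given an ego action."""
    return Scene(
        ego_pos=scene.ego_pos + scene.ego_speed * dt,
        ego_speed=max(0.0, scene.ego_speed + ego_accel * dt),  # no reversing
        lead_pos=scene.lead_pos + scene.lead_speed * dt,
        lead_speed=scene.lead_speed,
    )


def rollout(scene: Scene, accels: List[float]) -> List[Scene]:
    """Apply a sequence of ego accelerations and collect the predicted scenes."""
    futures = []
    for accel in accels:
        scene = step(scene, accel)
        futures.append(scene)
    return futures


def min_gap(futures: List[Scene]) -> float:
    """Smallest distance from ego to lead vehicle over the rollout."""
    return min(s.lead_pos - s.ego_pos for s in futures)


start = Scene(ego_pos=0.0, ego_speed=10.0, lead_pos=20.0, lead_speed=5.0)
brake = rollout(start, [-3.0] * 6)
accelerate = rollout(start, [2.0] * 6)
print(min_gap(brake), min_gap(accelerate))
```

With these numbers, the braking rollout keeps a positive gap to the lead vehicle throughout, while the accelerating rollout drives the gap negative, meaning the ego vehicle would have reached the lead vehicle. A planner can compare candidate actions by rolling each one out in this way.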

DriveDreamer / Drive-WM family

Models such as DriveDreamer and Drive-WM apply diffusion and video generation to driving scenes, generating future driving video conditioned on BEV (bird's-eye-view), map, trajectory, and agent inputs.

In this line of work, the boundaries between perception, prediction, planning, and simulation are blurring.

Occupancy world model

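In autonomous driving, predicting BEV occupancy or 3D occupancy can be more useful than predicting the images themselves. Because occupancy represents which spatial cells are occupied, it can be used directly for planning and collision avoidance.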
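
As a sketch of how predicted occupancy feeds planning, the snippet below checks a candidate ego path against a sequence of predicted BEV grids. The grid layout, the helper names `make_grid` and `first_collision_step`, and the choice to treat leaving the grid as a collision are all assumptions for illustration. A real planner would also account for the vehicle footprint and for prediction uncertainty.

```python
from typing import List, Tuple

Grid = List[List[int]]  # BEV grid indexed [row][col]; 1 = occupied, 0 = free


def make_grid(rows: int, cols: int, occupied: List[Tuple[int, int]]) -> Grid:
    """Build a grid with the given (row, col) cells marked occupied."""
    grid = [[0] * cols for _ in range(rows)]
    for row, col in occupied:
        grid[row][col] = 1
    return grid


def first_collision_step(predicted: List[Grid], plan: List[Tuple[int, int]]) -> int:
    """Return the first step at which the planned cell is occupied, or -1.

    predicted[t] is the occupancy predicted for step t and plan[t] is the
    ego cell at step t. Leaving the grid counts as a collision (a
    conservative assumption made for this sketch).
    """
    for t, (grid, (row, col)) in enumerate(zip(predicted, plan)):
        in_bounds = 0 <= row < len(grid) and 0 <= col < len(grid[0])
        if not in_bounds or grid[row][col] == 1:
            return t
    return -1


# Three predicted steps on a 2-lane, 4-cell grid: an obstacle appears in
# lane 1 at column 2 from step 1 onward.
predicted = [
    make_grid(2, 4, []),
    make_grid(2, 4, [(1, 2)]),
    make_grid(2, 4, [(1, 2)]),
]
straight = [(1, 0), (1, 1), (1, 2)]     # stay in lane 1
lane_change = [(1, 0), (0, 1), (0, 2)]  # move to lane 0

print(first_collision_step(predicted, straight))
print(first_collision_step(predicted, lane_change))
```

The straight plan reaches the occupied cell at its last step, while the lane change stays in free cells for the whole horizon, so a planner using this check would prefer the lane change.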

Main sources

GAIA-1 paper: https://arxiv.org/abs/2309.17080
DriveDreamer: https://arxiv.org/abs/2309.09777
Drive-WM: https://arxiv.org/abs/2405.01471
Waymo Open Dataset: https://waymo.com/open/
