Embodied AI and Robotics Trend

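Embodied AI is the research area that integrates vision, language, and action so that an agent can achieve goals in the physical world. At top computer vision conferences, more and more work connects visual recognition to robot action and planning instead of treating recognition in isolation.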

Why vision alone is not enough

Even when a recognition model can detect an object, a robot still needs the following before it can act:

3D pose of the object
Graspable geometry
Material and affordances of the object
Prediction of future states in response to actions
Failure recovery
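
To make the gap between detection and action concrete, here is a minimal sketch of the first step: lifting a 2D detection to a 3D point. It assumes a pinhole camera model, a metric depth value at the box center, and made-up intrinsics. The names `Intrinsics`, `backproject`, and `box_center` are hypothetical and do not come from any particular library. It gives a position only; orientation, graspable geometry, and affordances would still need separate estimation.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Pinhole camera intrinsics in pixels (the values used below are made up)."""
    fx: float
    fy: float
    cx: float
    cy: float


def box_center(box: tuple[float, float, float, float]) -> tuple[float, float]:
    """Center of an (x_min, y_min, x_max, y_max) detection box, in pixels."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def backproject(u: float, v: float, depth_m: float, k: Intrinsics) -> tuple[float, float, float]:
    """Lift pixel (u, v) with metric depth to a 3D point in the camera frame."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    x = (u - k.cx) * depth_m / k.fx
    y = (v - k.cy) * depth_m / k.fy
    return (x, y, depth_m)


# Hypothetical detection and depth reading, for illustration only.
k = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)
u, v = box_center((300.0, 200.0, 400.0, 300.0))
point = backproject(u, v, depth_m=1.2, k=k)
print(point)  # roughly 6 cm right of and 2 cm below the optical axis, 1.2 m away
```

The same bounding box maps to very different 3D positions at different depths, which is one reason a 2D detector by itself cannot drive a gripper.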

Representative components

Component	Examples
Perception	SAM、Depth Anything、VGGT
Geometry	3D Reconstruction、pose estimation、SLAM
World model	Dreamer、Genie、V-JEPA 2
Policy	VLA model、diffusion policy、imitation learning
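
As a rough illustration of how these components could fit together, the toy sketch below chains a stubbed perception and geometry step, a hand-written world model, and a one-step lookahead policy. Every name here (`ObjectState`, `Action`, `perceive_and_localize`, `world_model`, `policy`) is hypothetical. None of it reflects the actual interfaces of SAM, Dreamer, VLA models, or any other system listed in the table, and learned models would replace each stub in practice.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point3D = Tuple[float, float, float]


@dataclass(frozen=True)
class ObjectState:
    label: str
    position: Point3D  # metres, camera frame


@dataclass(frozen=True)
class Action:
    name: str
    delta: Point3D  # displacement this toy action is assumed to cause


def perceive_and_localize() -> ObjectState:
    """Stand-in for perception + geometry: returns a fixed, made-up object state."""
    return ObjectState(label="cup", position=(0.0, 0.0, 1.0))


def world_model(state: ObjectState, action: Action) -> ObjectState:
    """Toy world model: predicts the next state by adding the action's displacement."""
    x, y, z = state.position
    dx, dy, dz = action.delta
    return ObjectState(state.label, (x + dx, y + dy, z + dz))


def squared_distance(a: Point3D, b: Point3D) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def policy(state: ObjectState, goal: Point3D, candidates: List[Action]) -> Action:
    """Toy policy: choose the action whose predicted outcome is closest to the goal."""
    return min(
        candidates,
        key=lambda a: squared_distance(world_model(state, a).position, goal),
    )


candidates = [
    Action("push_left", (-0.1, 0.0, 0.0)),
    Action("push_right", (0.1, 0.0, 0.0)),
    Action("push_forward", (0.0, 0.0, 0.1)),
]
state = perceive_and_localize()
chosen = policy(state, goal=(0.1, 0.0, 1.0), candidates=candidates)
print(chosen.name)  # push_right
```

The point of the sketch is the data flow: perception and geometry produce a state, the world model predicts the consequence of each candidate action, and the policy selects among them.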

Related pages
