AI Agents Overview

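An AI agent is a system built around an LLM that pursues a goal by interacting with external tools, memory, and an environment. It goes beyond a single prompt-response exchange to perform multi-step reasoning, tool calls, planning, and self-correction.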

AI agent loop

Conceptual diagram by the author. The agent receives the user's goal, the LLM brain acts on the environment while drawing on memory and tools, and an output is returned.

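What makes something an "agent"

Definitions vary between researchers, but the following properties are common to most of them.

Goal-directed: executes multiple steps to reach a goal rather than producing a one-off response
Autonomy: decides its own next action
Tool use: can call external APIs, code execution, search, file operations, and so on
Memory: retains and refers back to past information
Environment observation: observes the state of the web, the OS, files, and APIs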

Basic loop

The agent loop is essentially the same as that of a classical RL agent; what distinguishes it is that the policy is an LLM and the actions are natural language plus tool calls.
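The loop described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `Step`, `scripted_policy`, `calculator`, and `run_agent` are hypothetical names introduced here, and the scripted policy merely stands in for a real LLM call.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One policy decision: either a tool call or a final answer."""
    kind: str            # "tool" or "final"
    name: str = ""       # tool name when kind == "tool"
    argument: str = ""   # tool argument, or the final answer text


def scripted_policy(goal: str, history: list[str]) -> Step:
    """Stand-in for the LLM (hypothetical): call the calculator once, then answer."""
    if not history:
        return Step(kind="tool", name="calculator", argument=goal)
    return Step(kind="final", argument=f"The result is {history[-1]}")


def calculator(expression: str) -> str:
    """Toy tool: adds integers written as 'a + b + ...'."""
    return str(sum(int(part) for part in expression.split("+")))


def run_agent(goal, policy, tools, max_steps=5) -> str:
    history: list[str] = []                # observations gathered so far
    for _ in range(max_steps):
        step = policy(goal, history)       # decide the next action
        if step.kind == "final":
            return step.argument           # return the output to the user
        observation = tools[step.name](step.argument)  # act on the environment
        history.append(observation)        # observe the result
    return "stopped: step budget exhausted"


answer = run_agent("2 + 3", scripted_policy, {"calculator": calculator})
```

The `max_steps` budget is a design choice of this sketch: because the policy decides for itself when to stop, an explicit cap keeps a confused policy from looping forever.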

Agent components

Component	Examples
Brain	LLM (reasoning / planning)
Tools	search, code execution, file IO, shell, browser, API
Memory	conversation, vector store, knowledge graph
Planner	ReAct, Plan-and-Execute, Tree-of-Thoughts, PDDL
Executor	tool dispatcher, sandbox
Critic	self-reflection, test, judge
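As one illustration of the Executor row, a tool dispatcher might look like the sketch below. The registry `TOOLS`, the `tool` decorator, the `word_count` tool, and the JSON call shape `{"name": ..., "arguments": {...}}` are all assumptions made for this example; they are not the API of any particular framework.

```python
import json
from typing import Callable

# Hypothetical registry mapping tool names to Python callables.
TOOLS: dict[str, Callable[..., str]] = {}


def tool(fn):
    """Register a function under its own name so the dispatcher can find it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def word_count(text: str) -> str:
    """Toy tool: count whitespace-separated words."""
    return str(len(text.split()))


def dispatch(call_json: str) -> str:
    """Execute one tool call encoded as JSON: {"name": ..., "arguments": {...}}.

    Errors are returned as text rather than raised, so the LLM can read
    them as an observation and correct itself on the next step.
    """
    try:
        call = json.loads(call_json)
        fn = TOOLS[call["name"]]
        return fn(**call["arguments"])
    except KeyError as exc:
        return f"error: unknown tool or missing field {exc}"
    except (json.JSONDecodeError, TypeError) as exc:
        return f"error: {exc}"


result = dispatch('{"name": "word_count", "arguments": {"text": "a b c"}}')
```

Returning errors as strings is what lets a Critic or self-reflection step react to a failed call; a production executor would typically also run tools inside a sandbox, which this sketch omits.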

Detail pages

Page	Contents
ReAct and Reasoning Agents	ReAct, Reflexion, Plan-and-Execute
Tool Use and Function Calling	Function calling, JSON schema, Toolformer
Agent Memory	Short / long-term memory, vector store
Multi-Agent Systems	AutoGen, CrewAI, debate, role split
Coding Agents	SWE-agent, Devin, Cursor, Codex family
Web and Computer-Use Agents	Browser-Use, Operator, Claude Computer Use
Agent Frameworks	LangGraph, AutoGen, OpenAI Agents SDK
Agent Evaluation	SWE-Bench, GAIA, AgentBench, WebArena

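Connection to LLMs and RL

From an RL perspective, an agent can be viewed as rolling out the LLM's prompt-response behavior as a policy. In fact,

ReAct is an action-conditioned extension of chain-of-thought (CoT)
the learnable parts can be improved through post-training (SFT / DPO / GRPO)
the reward for tool use can be measured from execution results (test pass, answer correctness, user feedback)

so agent training sits on a direct extension of LLM alignment and reasoning RL.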
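To make the "reward from execution results" point concrete, here is a small sketch of a test-pass reward. It assumes the agent's output is a Python callable and that the reward is simply the fraction of test cases passed; `unit_test_reward` and the sample cases are hypothetical, and real training setups usually execute untrusted code in a sandbox rather than in-process.

```python
from typing import Callable


def unit_test_reward(candidate: Callable[[int], int],
                     cases: list[tuple[int, int]]) -> float:
    """Fraction of (input, expected) cases the candidate passes."""
    passed = 0
    for arg, expected in cases:
        try:
            if candidate(arg) == expected:
                passed += 1
        except Exception:
            pass  # a crashing candidate earns no credit for this case
    return passed / len(cases)


# Hypothetical task: "return the square of x".
cases = [(0, 0), (2, 4), (-3, 9)]
good = lambda x: x * x      # correct solution
buggy = lambda x: x * 2     # agrees with the spec only on some inputs

reward_good = unit_test_reward(good, cases)
reward_buggy = unit_test_reward(buggy, cases)
```

A scalar like this can serve as the reward signal in a post-training method; the partial credit earned by `buggy` also shows why the choice of test cases matters.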

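Agent as a POMDP, in equations

An AI agent can be formalized as a partially observable Markov decision process (POMDP).

$$
\mathcal{M}=(\mathcal{S},\mathcal{A},\mathcal{O},P,O,R,\gamma)
$$

Here $\mathcal{S}$ is the set of environment states, $\mathcal{A}$ the set of actions, $\mathcal{O}$ the set of observations, $P$ the state transition function, $O$ the observation model, $R$ the reward, and $\gamma$ the discount factor. An LLM agent cannot see the full state $s_t$ directly; it chooses its next action from observations $o_t$ such as the prompt, tool results, and browser observations.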

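$$
a_t\sim\pi_\theta(a_t\mid h_t), \qquad h_t=(o_1,a_1,\ldots,o_t)
$$

The intuition behind this expression is that the agent does not look only at the current screen: it carries the history of observations and actions so far as context, and chooses the next tool call or utterance from that context. Memory and scratchpads can be viewed as mechanisms for compressing and structuring this history $h_t$.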
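The idea of memory as a compressed form of $h_t$ can be sketched as below. This is only an illustration under stated assumptions: the `History` class, its `window` parameter, and the truncation-based "summary" are hypothetical; a real system might instead call an LLM summarizer or write older events to a vector store.

```python
from dataclasses import dataclass, field


@dataclass
class History:
    """h_t as a list of observation/action events, with a bounded verbatim window."""
    window: int = 4                                      # recent events kept verbatim
    summary: list[str] = field(default_factory=list)     # compressed older events
    recent: list[tuple[str, str]] = field(default_factory=list)

    def add(self, kind: str, text: str) -> None:
        """Append an ("obs" | "act", text) event; compress the oldest if over the window."""
        self.recent.append((kind, text))
        while len(self.recent) > self.window:
            old_kind, old_text = self.recent.pop(0)
            # Placeholder compression: keep only the first 20 characters.
            self.summary.append(f"{old_kind}:{old_text[:20]}")

    def context(self) -> str:
        """Render the text the policy would be conditioned on."""
        lines = ["[summary] " + "; ".join(self.summary)] if self.summary else []
        lines += [f"[{kind}] {text}" for kind, text in self.recent]
        return "\n".join(lines)


h = History(window=2)
h.add("obs", "page loaded")
h.add("act", "click login")
h.add("obs", "login form shown")
context = h.context()
```

Here the policy would be conditioned on `context` instead of the full $h_t$, trading exactness for a bounded prompt length.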

Main sources

ReAct: https://arxiv.org/abs/2210.03629
LLM Powered Autonomous Agents (Lilian Weng): https://lilianweng.github.io/posts/2023-06-23-agent/
LangGraph documentation: https://langchain-ai.github.io/langgraph/
OpenAI Agents SDK: https://openai.github.io/openai-agents-python/
Anthropic "Building Effective Agents": https://www.anthropic.com/research/building-effective-agents
