DETR Family

DETR (DEtection TRansformer) is the model that marked a turning point by formulating object detection as set prediction. It removes anchors, NMS, and region proposals, and predicts boxes and classes directly from the Transformer's object queries.

Basic structure

Each object query ultimately yields one prediction: either an object or "no object"
Hungarian matching assigns predictions to ground truth (GT) one-to-one
Because the loss is set-based, NMS is not needed
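The points above can be illustrated with a minimal sketch of how per-query classification targets might be built once a one-to-one matching is known. The function name `build_query_targets`, the `NO_OBJECT` label, and the dictionary format of `match` are hypothetical choices for illustration, not taken from any DETR codebase.

```python
# Hypothetical label used for queries that are not matched to any object.
NO_OBJECT = "no-object"


def build_query_targets(num_queries, gt_labels, match):
    """Return one classification target per query.

    match maps a ground-truth index to the query index assigned to it.
    The mapping is assumed to be one-to-one (no query appears twice).
    """
    targets = [NO_OBJECT] * num_queries
    for gt_idx, query_idx in match.items():
        targets[query_idx] = gt_labels[gt_idx]
    return targets


# Five queries, two ground-truth objects: "cat" goes to query 3, "dog" to query 0.
targets = build_query_targets(5, ["cat", "dog"], {0: 3, 1: 0})
print(targets)
```

Since every unmatched query is trained toward "no object", duplicate predictions for the same object are penalized during training, which is why no NMS step is needed afterwards.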

What was new

End-to-end set prediction
No anchors, no NMS
Object queries as a new design concept
Influenced later work such as DINO, Mask2Former, and the SAM decoder

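Main developments

Model	Main improvement
DETR	First end-to-end Transformer-based detector; slow to converge
Deformable DETR	Deformable attention for faster convergence; multi-scale features
Conditional DETR / DN-DETR	Improved query design (conditional spatial queries; denoising training)
DINO	Contrastive denoising training; SoTA at the time of release
Mask2Former	Extends the query-based design to unified segmentation (semantic, instance, panoptic)
RT-DETR	Real-time DETR with YOLO-class speed
Grounding DINO	Fuses text input; open-vocabulary detection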

RT-DETR

RT-DETR adapts the DETR family to real-time use. With an efficient hybrid encoder and uncertainty-minimal query selection, it reports speed and accuracy on par with or better than YOLOv8. It is a leading candidate when using DETR in industrial applications.
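As a rough illustration of the query selection step, the sketch below picks the K encoder tokens with the highest class score to initialize the decoder queries. It covers only a simplified top-K selection. The training objective that makes those scores reflect localization quality (the "uncertainty-minimal" part) is not shown, and the function name and data layout are assumptions for illustration.

```python
def select_top_k_queries(class_scores, k):
    """Return indices of the k encoder tokens with the highest class score.

    class_scores[t][c] is the (assumed) score of class c at encoder token t.
    """
    # Confidence of each token = its best score over all classes.
    best_per_token = [max(scores) for scores in class_scores]
    ranked = sorted(
        range(len(best_per_token)),
        key=lambda t: best_per_token[t],
        reverse=True,
    )
    return ranked[:k]


class_scores = [
    [0.10, 0.20],  # token 0
    [0.90, 0.05],  # token 1
    [0.30, 0.60],  # token 2
    [0.05, 0.10],  # token 3
]
selected = select_top_k_queries(class_scores, k=2)
print(selected)
```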

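Choosing between YOLO and DETR

Aspect	YOLO family	DETR family
Design	Grid + anchors (anchor-free in recent versions)	Set prediction
NMS	Required (YOLOv10 is NMS-free)	Not required
Interpretability	Per anchor or grid cell	Per query
Open-vocabulary extension	YOLO-World	Grounding DINO
Mask / pose integration	YOLOv8	Mask2Former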
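To make the NMS row concrete, here is a minimal sketch of greedy NMS, the post-processing step that dense detectors typically rely on and that DETR-style models skip. Boxes are assumed to be in `(x1, y1, x2, y2)` format, and the threshold of 0.5 is an illustrative value.

```python
def iou(a, b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep a box only if it does not overlap a higher-scored kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # the second box overlaps the first heavily and is suppressed
```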

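Hungarian matching in equations

DETR treats object detection as set prediction. Let the set of query predictions be $\{\hat{y}_i,\hat{\mathbf{b}}_i\}_{i=1}^{N}$ and the set of ground-truth objects be $\{y_j,\mathbf{b}_j\}_{j=1}^{M}$, with $M \le N$. The first step is to find a one-to-one assignment $\sigma$, which maps each ground-truth index $j$ to a distinct prediction index $\sigma(j)$, using the Hungarian algorithm.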

$$
\hat{\sigma}=\arg\min_{\sigma}\sum_{j=1}^{M} \mathcal{C}\left((y_j,\mathbf{b}_j),(\hat{y}_{\sigma(j)},\hat{\mathbf{b}}_{\sigma(j)})\right)
$$
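The minimization over $\sigma$ can be illustrated on a tiny cost matrix. The sketch below uses exhaustive search over all one-to-one assignments rather than the Hungarian algorithm itself. Both return the same optimum, but exhaustive search is only feasible for very small inputs. In practice a solver such as `scipy.optimize.linear_sum_assignment` is commonly used. The function name `match_exhaustive` and the cost values are made up for illustration.

```python
from itertools import permutations


def match_exhaustive(cost):
    """Find the one-to-one assignment with minimal total cost.

    cost[j][i] is the cost of assigning ground truth j to prediction i.
    Assumes the number of ground truths M is at most the number of predictions N.
    Returns (sigma, total) where sigma[j] is the prediction index for ground truth j.
    """
    num_gt, num_pred = len(cost), len(cost[0])
    best_sigma, best_total = None, float("inf")
    # Every ordered choice of M distinct predictions is one candidate sigma.
    for sigma in permutations(range(num_pred), num_gt):
        total = sum(cost[j][sigma[j]] for j in range(num_gt))
        if total < best_total:
            best_sigma, best_total = sigma, total
    return best_sigma, best_total


# Two ground-truth objects (rows), three predictions (columns).
cost = [
    [0.1, 0.2, 0.8],
    [0.1, 0.9, 0.7],
]
sigma, total = match_exhaustive(cost)
print(sigma, total)
```

In this example both ground-truth objects prefer prediction 0. Picking the cheapest prediction for each object independently would create a conflict, whereas the global minimum gives prediction 1 to the first object and prediction 0 to the second.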

The cost is the sum of a class cost and a box cost, where the box cost combines an L1 term and a GIoU term.

$$
\mathcal{C}= -\hat{p}_{\sigma(j)}(y_j) +\lambda_1\|\mathbf{b}_j-\hat{\mathbf{b}}_{\sigma(j)}\|_1 +\lambda_2\mathcal{L}_{\mathrm{GIoU}}(\mathbf{b}_j,\hat{\mathbf{b}}_{\sigma(j)})
$$
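The cost for a single (ground truth, prediction) pair can be sketched as follows. This is a simplified illustration under stated assumptions: boxes are in normalized `(x1, y1, x2, y2)` format for both the L1 and GIoU terms, $\mathcal{L}_{\mathrm{GIoU}}$ is taken as $1-\mathrm{GIoU}$, and the default weights $\lambda_1=5$, $\lambda_2=2$ follow the values reported in the DETR paper. The function names are hypothetical.

```python
def giou(a, b):
    """Generalized IoU of two boxes in (x1, y1, x2, y2) format."""
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = area_a + area_b - inter
    # Area of the smallest box enclosing both inputs.
    enclose = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (enclose - union) / enclose


def match_cost(gt_label, gt_box, pred_probs, pred_box, lam_l1=5.0, lam_giou=2.0):
    """Matching cost: negative class probability + weighted L1 + weighted GIoU loss."""
    class_cost = -pred_probs[gt_label]
    l1_cost = sum(abs(g - p) for g, p in zip(gt_box, pred_box))
    giou_cost = 1.0 - giou(gt_box, pred_box)
    return class_cost + lam_l1 * l1_cost + lam_giou * giou_cost


gt_box = (0.1, 0.1, 0.5, 0.5)
probs = [0.8, 0.2]  # predicted probabilities for class 0 and class 1
cost_exact = match_cost(0, gt_box, probs, (0.1, 0.1, 0.5, 0.5))
cost_shifted = match_cost(0, gt_box, probs, (0.3, 0.3, 0.7, 0.7))
print(cost_exact, cost_shifted)
```

A prediction that coincides with the ground-truth box pays only the class term, while a shifted box with the same class probabilities costs more, so the matcher prefers the well-localized prediction.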

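The idea behind this formulation is to decide explicitly which prediction is responsible for which object, and only then train classification and box regression. Emitting objects as a set without relying on NMS is the major difference from conventional dense detectors.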

Main sources

DETR: https://arxiv.org/abs/2005.12872
Deformable DETR: https://arxiv.org/abs/2010.04159
DINO: https://arxiv.org/abs/2203.03605
RT-DETR: https://arxiv.org/abs/2304.08069
Mask2Former: https://arxiv.org/abs/2112.01527
