Large Language Models | Haruk1y Wiki

LLM Overview

Covers the big picture of Large Language Models and how pretraining, post-training, inference, RAG, and agents relate to one another.

Covers self-attention, multi-head attention, FFN, residual connections, and layer norm.

Walks through the internal processing of a modern Transformer / LLM, from tokenization to logits and generation, explained in natural Japanese.

Builds intuition for Query, Key, Value, the causal mask, and attention weights in Transformer Self-Attention.

Covers next-token prediction, data curation, curriculum, and compute scale.

Covers scaling laws for compute, data, and parameters, along with Chinchilla optimal.

Covers tokenizers: BPE, SentencePiece, tiktoken, and support for multilingual text and code.

Covers sparse MoE LLMs: the router, top-k routing, load balancing, and Switch / Mixtral / DeepSeek-V3.

Covers RoPE, ALiBi, NTK scaling, YaRN, and long context training.

Covers instruction tuning, chat templates, SFT data design, and catastrophic forgetting.

Covers PEFT methods such as LoRA, QLoRA, Adapter, and Prompt tuning.

Covers few-shot prompting, Chain-of-Thought, self-consistency, ToT, and prompt engineering.

Covers RAG architecture, embeddings, ranking, hybrid search, and agentic RAG.

Covers o1, DeepSeek-R1, long CoT, test-time compute scaling, and reasoning RL.

Covers LLM inference optimization, centered on KV cache, FlashAttention, PagedAttention, quantization, and batching.

Covers the KV cache, which speeds up autoregressive decoding in LLMs, and how it relates to prefill, decode, TTFT, GQA / MQA, and PagedAttention.

Covers the sampling controls used in LLM decoding: temperature, top-k, top-p, min-p, and typical sampling.

Covers speculative decoding, which speeds up LLM decoding by pairing a small draft model with a large target model.

Covers model merging, which combines multiple fine-tuned checkpoints in weight space, including Model Soups, Task Arithmetic, TIES, DARE, and SLERP.

Covers MMLU, HumanEval, GPQA, Chatbot Arena, LLM-as-a-Judge, and benchmark contamination.