Generative AI Model Ranking Matrix
Snapshot · April 2026
Side-by-side comparison of frontier language, video, and image generation models — benchmarks, capabilities, release dates, and price.
Frontier LLMs — coding & reasoning
Higher is better on benchmarks. Price is $/1M input tokens (lower is better).
| Model | Vendor | License | Capabilities | Released | Context | SWE-Bench Verified | SWE-Bench Pro | Terminal-Bench | Reasoning* | $/M in | Best for |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.7 | Anthropic | closed | VTRC | 2026-04-16 | 1M | 87.6 | 64.3 | 69.4 | 95 | 5 | Leads SWE-Bench Verified (87.6) & Pro (64.3); new tokenizer (~35% more tokens) |
| Claude Opus 4.6 | Anthropic | closed | VTRC | 2026-02 | 1M | 80.8 | 57.3 | — | 92 | 15 | Large-codebase reasoning, multi-file refactors |
| Claude Sonnet 4.6 | Anthropic | closed | VTRC | 2025-12 | 1M | 79.6 | — | — | 86 | 3 | Default coding driver — 98% of Opus at ⅕ cost |
| Claude Haiku 4.5 | Anthropic | closed | VTRC | 2025-10-15 | 200k | 73.3 | — | — | 78 | 1 | 4–5× faster than Sonnet 4.5; cheap multi-agent driver |
| GPT-5.5 | OpenAI | closed | VATRC | 2026-04-24 | 1M | 87.6 | 58.6 | 82.7 | 95 | 5 | SOTA Terminal-Bench 82.7; GDPval 84.9; OSWorld 78.7; ties Opus 4.7 on SWE-V |
| GPT-5.3 Codex | OpenAI | closed | VTRC | 2026-03 | 400k | 80.0 | 56.8 | 77.3 | 91 | 10 | Full SDLC agent: debug, terminal, PRDs, tests |
| Gemini 3.1 Pro | Google | closed | VATRC | 2026-03 | 2M | 78.0 | — | — | 94 | 7 | Adjustable Deep Think; agentic browsing (BrowseComp 85.9) |
| Grok 4.20 Beta 2 | xAI | closed | VT | 2026-03-03 | 256k | 75 | — | — | 80 | 2 | 4-agent backbone · IFBench #1 (83); 8% of Opus cost |
| Qwen3.6-Max-Preview | Alibaba | closed | VTR | 2026-04-20 | 1M | 79.5 | — | 77.1 | 89 | 6 | Agent loops — preserve_thinking across tool calls |
| DeepSeek V4-Pro | DeepSeek | open | TR | 2026-04-27 | 128k | 80.6 | 55.4 | 67.9 | 92 | 0.9 | 1.6T MoE / 49B active · matches Opus 4.6 at fraction of cost |
| DeepSeek V4-Flash | DeepSeek | open | TR | 2026-04-27 | 128k | — | — | — | — | 0.3 | Fresh · 158B fast variant of V4 |
| Kimi K2.6 | Moonshot | open | VTR | 2026-04-29 | 256k | 80.2 | 58.6 | 66.7 | 94 | 0.6 | 1.1T MoE · ties GPT-5.5 on SWE-Pro at $0.6/M; HLE 54 leads all |
| GLM-5.1 | Zhipu / Z.ai | open (MIT) | VTR | 2026-04-07 | 128k | 79.0 | 58.4 | — | 88 | 0.11 | 754B · #1 SWE-Bench Pro among open weights |
| MiniMax M2.7 | MiniMax | open | TR | 2026-04-20 | 205k | 78 | 56.22 | 57.0 | 87 | 0.3 | 229B · self-evolving; matches Codex on SWE-Pro |
| Qwen3.5-397B-A17B | Alibaba | open | VTR | 2026-04-24 | 256k | 80.0 | — | 54.0 | 92 | — | 403B MoE / 17B active · GPQA 88.4, MMLU-Pro 87.8, SWE-V 80.0 |
| Qwen3.6-35B-A3B | Alibaba | open | VTR | 2026-04-24 | 256k | 73.4 | 49.5 | — | 84 | — | 36B / 3B active · 73.4 SWE-V on tiny active params |
| MiMo-V2.5-Pro | Xiaomi | open | TR | 2026-04-29 | 1M | — | 57.2 | — | 89 | — | 1.02T MoE / 42B active · matches Opus 4.6 on SWE-Pro w/ 40% fewer tokens |
| Tencent Hy3-preview | Tencent | open (preview) | VTR | 2026-04-24 | 128k | 74.4 | — | 54.4 | 88 | — | 295B / 21B active · +40pts SWE-V vs Hy2; topped Tsinghua math PhD exam |
| Mistral Large 3 | Mistral | open (Apache 2.0) | VT | 2025-12-02 | 256k | — | — | — | 50 | 2 | 675B MoE / 41B active · non-reasoning (AIME ~40, GPQA ~44) |
| Mistral Medium 3.5 | Mistral | open weights | TR | 2026-04-30 | 128k | 77.6 | — | — | 80 | 0.6 | 128B · SWE-V 77.6 beats Devstral 2 + Qwen3.5; τ³-Telecom 91.4 |
| Devstral-2-123B | Mistral | open weights | TC | 2026-02-25 | 128k | — | — | — | — | 0.6 | 125B · code-specialized variant |
| Gemma 4 31B-it | Google | open | VT | 2026-04-29 | 256k | 52 | — | — | 87 | — | 31B dense · AIME 89.2, GPQA 84.3, τ²-Retail 86.4 (12× jump from G3) |
| Nemotron-3-Nano-Omni | NVIDIA | open | VATR | 2026-04-29 | 128k | — | — | — | — | — | Fresh (~17h) · 30B-A3B any-to-any reasoning |
| Meta Avocado | Meta | rumored / closed | tbd | 2026-Q2 (est.) | — | — | — | — | — | — | Llama successor, slipping to May/Jun 2026 |
Capabilities key: **V** Vision (image input) · **A** Audio · **T** Tools / function calling · **R** Extended reasoning · **C** Computer / browser use
*Reasoning is a composite score (GPQA Diamond / HLE / AIME / ARC-AGI-2, normalized 0–100, directional).
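As a rough value comparison, the SWE-Bench Verified and $/M in columns above can be combined into a points-per-dollar ratio. A minimal sketch (model names and figures copied from the table; the ratio itself is an illustrative metric of this note, not a published benchmark, and real cost also depends on output tokens, caching, and tokens consumed per task):

```python
# Benchmark points per $1/M input tokens, from the LLM table above.
models = {
    # name: (swe_bench_verified, usd_per_million_input_tokens)
    "Claude Opus 4.7": (87.6, 5.0),
    "Claude Sonnet 4.6": (79.6, 3.0),
    "GPT-5.5": (87.6, 5.0),
    "DeepSeek V4-Pro": (80.6, 0.9),
    "Kimi K2.6": (80.2, 0.6),
    "GLM-5.1": (79.0, 0.11),
}

def points_per_dollar(score: float, price: float) -> float:
    """SWE-Bench Verified points per $1/M input tokens (higher = better value)."""
    return score / price

ranked = sorted(models.items(),
                key=lambda kv: points_per_dollar(*kv[1]),
                reverse=True)

for name, (score, price) in ranked:
    print(f"{name:20s} {points_per_dollar(score, price):8.1f} pts per $/M")
```

Unsurprisingly, the cheap open-weights models dominate this ratio; the frontier closed models buy the last few benchmark points at a steep premium.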
Video generation models
Quality is Artificial Analysis Image-to-Video rank (1 = best).
| Model | Vendor | License | Released | I2V Rank | Max len (s) | Resolution | Audio | $/gen | Best for |
|---|---|---|---|---|---|---|---|---|---|
| Kling 3.0 | Kuaishou | closed | 2026-02 | 1 | 15 (multi-shot) | 1080p | synced dialogue + SFX | 1.00 | Cinematic shots; #1 general-purpose |
| Veo 3.1 | Google | closed | 2026-01 | 2 | 8 | 1080p | yes | 0.75 | High-fidelity realism, prompt adherence |
| Sora 2 | OpenAI | closed | 2025-10 | 3 | 12 | 1080p | yes | 0.90 | Imaginative T2V; ChatGPT-integrated |
| Seedance 2.0 | ByteDance | closed | 2026-04 | 4 | 15 | 1080p | yes | 0.50 | Product ads, e-comm, character consistency |
| LTX-2 | Lightricks | open | 2025-Q4 | 5 | 10 | 4K@50fps | native sync | 0.20 | Open-weights leader; 4K + audio |
| Wan 2.2 | Alibaba | open | 2025-Q3 | 6 | 6 | 720p | — | 0.05 | Runs on a 4070; novel MoE denoiser |
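Since every model caps clip length, covering a longer sequence means stitching multiple generations. A back-of-envelope cost sketch using the Max len and $/gen columns above (it ignores retries, overlap frames for continuity, and any per-resolution pricing tiers):

```python
import math

# Rough cost to cover a target duration by stitching clips,
# using "Max len (s)" and "$/gen" from the video table.
video_models = {
    # name: (max_clip_seconds, usd_per_generation)
    "Kling 3.0": (15, 1.00),
    "Veo 3.1": (8, 0.75),
    "Sora 2": (12, 0.90),
    "Seedance 2.0": (15, 0.50),
    "LTX-2": (10, 0.20),
    "Wan 2.2": (6, 0.05),
}

def stitch_cost(target_seconds: int, max_len: int, price: float) -> float:
    """Generations needed to cover the duration, times price per generation."""
    return math.ceil(target_seconds / max_len) * price

for name, (max_len, price) in video_models.items():
    print(f"{name:14s} 60s ≈ ${stitch_cost(60, max_len, price):.2f}")
```

Note how clip length interacts with price: an 8-second cap at $0.75/gen can cost more per minute than a 15-second cap at $1.00/gen.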
Image generation models
Quality is directional ELO-style score from public leaderboards. Speed is typical wall-clock per image.
| Model | Vendor | License | Released | Max res | Quality* | Speed (s) | $/img | Best for |
|---|---|---|---|---|---|---|---|---|
| Midjourney v8 | Midjourney | closed | 2026-03 | 2K | 95 | 10 | 0.04 | Aesthetic king · rewritten engine, 5× faster than v7 |
| FLUX.2 [pro] | Black Forest Labs | closed (API) | 2026-Q1 | 2K | 93 | 4.5 | 0.04 | Photoreal commercial — best skin, lighting, materials |
| FLUX.2 [dev] | Black Forest Labs | open weights | 2026-Q1 | 2K | 89 | 6 | 0 | Top open-weights photoreal model |
| GPT Image 2 | OpenAI | closed | 2026-Q1 | 1K+ | 92 | 6 | 0.04 | Best prompt adherence — complex composed scenes |
| Imagen 4 Ultra | Google | closed | 2025-Q4 | 2K | 94 | 5 | 0.04 | Photoreal flagship; strong text rendering |
| Imagen 4 Fast | Google | closed | 2026-03 | 1K | 86 | 2 | 0.02 | Cheapest fast quality at $0.02/img |
| Nano Banana 2 | Google (Gemini 3.1 Flash Image) | closed | 2026-03 | 1K | 82 | 1.5 | 0.02 | Fastest end-to-end (~1–3s) · in-chat editing |
| Recraft V4 | Recraft | closed | 2025-Q4 | 2K | 88 | 5 | 0.04 | Brand / design assets, vector-style outputs |
| Ideogram 3 | Ideogram | closed | 2025-Q4 | 1K | 84 | 5 | 0.03 | Typography & in-image text rendering |
| Seedream 4.5 | ByteDance | closed | 2026-Q1 | 2K | 90 | 5 | 0.03 | Asian aesthetics, character consistency |
| Stable Diffusion 4 | Stability AI | open | 2026-Q1 | 2K | 80 | 6 | 0 | Open ecosystem · ControlNet / LoRA backbone |
| Hunyuan Image 3 | Tencent | open | 2026-Q1 | 2K | 83 | 5 | 0 | Open Tencent flagship; bilingual prompts |
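For bulk workloads, the $/img and Speed columns above translate directly into budget and throughput figures. A minimal sketch (figures copied from the table; it assumes serial generation and leaves the self-hosting cost of the $0 open-weights models out of scope):

```python
# Images per minute and per dollar, from the image table's
# "Speed (s)" and "$/img" columns.
image_models = {
    # name: (usd_per_image, seconds_per_image)
    "Midjourney v8": (0.04, 10),
    "FLUX.2 [pro]": (0.04, 4.5),
    "GPT Image 2": (0.04, 6),
    "Imagen 4 Fast": (0.02, 2),
    "Nano Banana 2": (0.02, 1.5),
}

def per_minute(seconds_per_image: float) -> float:
    """Throughput at one image at a time."""
    return 60 / seconds_per_image

def per_dollar(usd_per_image: float) -> float:
    """How many images $1 buys."""
    return 1 / usd_per_image

for name, (price, secs) in image_models.items():
    print(f"{name:14s} {per_minute(secs):5.1f} img/min  {per_dollar(price):5.1f} img/$")
```

At these rates the fast tier (Imagen 4 Fast, Nano Banana 2) generates 30–40 images a minute at 50 images per dollar, an order of magnitude above the flagship tier on both axes.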