Generative AI Model Ranking Matrix
Living benchmark · refreshed nightly
Frontier language, video, and image generation models compared side by side: benchmarks, capabilities, release dates, and price.
Frontier LLMs: coding & reasoning
| Model | Vendor | Params (B) | License | Capabilities | Released | Context | SWE-Bench Verified | SWE-Bench Pro | Terminal-Bench | Reasoning* | $/M in | Best for |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mistral Medium 3.5 | Mistral | 128 | open weights | TR | 2026-04-30 | 128k | 77.6 | N/A | N/A | 80 | 0.6 | 128B · SWE-V 77.6 beats Devstral 2 + Qwen3.5; τ³-Telecom 91.4 |
| Kimi K2.6 | Moonshot | 1000 | open | VTR | 2026-04-29 | 256k | 80.2 | 58.6 | 66.7 | 94 | 0.6 | 1.1T MoE · ties GPT-5.5 on SWE-Pro at $0.6/M; HLE 54 leads all |
| MiMo-V2.5-Pro | Xiaomi | 1023.2 | open | TR | 2026-04-29 | 1M | N/A | 57.2 | N/A | 89 | N/A | 1.02T MoE / 42B active · matches Opus 4.6 on SWE-Pro w/ 40% fewer tokens |
| Gemma 4 31B-it | Google | 31 | open | VT | 2026-04-29 | 256k | 52 | N/A | N/A | 87 | N/A | 31B dense · AIME 89.2, GPQA 84.3, τ²-Retail 86.4 (12× jump from G3) |
| Nemotron-3-Nano-Omni | NVIDIA | 30 | open | VATR | 2026-04-29 | 128k | N/A | N/A | N/A | N/A | N/A | Fresh (~17h) · 30B-A3B any-to-any reasoning |
| DeepSeek V4-Pro | DeepSeek | 861.6 | open | TR | 2026-04-27 | 128k | 80.6 | 55.4 | 67.9 | 92 | 0.9 | 1.6T MoE / 49B active · matches Opus 4.6 at fraction of cost |
| DeepSeek V4-Flash | DeepSeek | 158.1 | open | TR | 2026-04-27 | 128k | N/A | N/A | N/A | N/A | 0.3 | Fresh · 158B fast variant of V4 |
| GPT-5.5 | OpenAI | N/A | closed | VATRC | 2026-04-24 | 1M | 87.6 | 58.6 | 82.7 | 95 | 5 | SOTA Terminal-Bench 82.7; GDPval 84.9; OSWorld 78.7; ties Opus 4.7 on SWE-V |
| Qwen3.5-397B-A17B | Alibaba | 397 | open | VTR | 2026-04-24 | 256k | 80 | N/A | 54 | 92 | N/A | 403B MoE / 17B active · GPQA 88.4, MMLU-Pro 87.8, SWE-V 80.0 |
| Qwen3.6-35B-A3B | Alibaba | 35 | open | VTR | 2026-04-24 | 256k | 73.4 | 49.5 | N/A | 84 | N/A | 36B / 3B active · 73.4 SWE-V on tiny active params |
| Tencent Hy3-preview | Tencent | 298.8 | open (preview) | VTR | 2026-04-24 | 128k | 74.4 | N/A | 54.4 | 88 | N/A | 295B / 21B active · +40pts SWE-V vs Hy2; topped Tsinghua math PhD exam |
| Qwen3.6-Max-Preview | Alibaba | N/A | closed | VTR | 2026-04-20 | 1M | 79.5 | N/A | 77.1 | 89 | 6 | Agent loops: preserve_thinking across tool calls |
| MiniMax M2.7 | MiniMax | 228.7 | open | TR | 2026-04-20 | 205k | 78 | 56.22 | 57 | 87 | 0.3 | 229B · self-evolving; matches Codex on SWE-Pro |
| Claude Opus 4.7 | Anthropic | N/A | closed | VTRC | 2026-04-16 | 1M | 87.6 | 64.3 | 69.4 | 95 | 5 | Ties GPT-5.5 on SWE-Bench Verified (87.6) and leads Pro (64.3); new tokenizer (~35% more tokens) |
| GLM-5.1 | Zhipu / Z.ai | 753.9 | open (MIT) | VTR | 2026-04-07 | 128k | 79 | 58.4 | N/A | 88 | 0.11 | 754B · #1 SWE-Bench Pro among open weights |
| Meta Avocado | Meta | N/A | rumored / closed | tbd | 2026-Q2 (est.) | N/A | N/A | N/A | N/A | N/A | N/A | Llama successor, slipping to May/Jun 2026 |
| Grok 4.20 Beta 2 | xAI | N/A | closed | VT | 2026-03-03 | 256k | 75 | N/A | N/A | 80 | 2 | 4-agent backbone · IFBench #1 (83); 8% of Opus cost |
| GPT-5.3 Codex | OpenAI | N/A | closed | VTRC | 2026-03 | 400k | 80 | 56.8 | 77.3 | 91 | 10 | Full SDLC agent: debug, terminal, PRDs, tests |
| Gemini 3.1 Pro | Google | N/A | closed | VATRC | 2026-03 | 2M | 78 | N/A | N/A | 94 | 7 | Adjustable Deep Think; agentic browsing (BrowseComp 85.9) |
| Devstral-2-123B | Mistral | 123 | open weights | TC | 2026-02-25 | 128k | N/A | N/A | N/A | N/A | 0.6 | 125B · code-specialized variant |
| Claude Opus 4.6 | Anthropic | N/A | closed | VTRC | 2026-02 | 1M | 80.8 | 57.3 | N/A | 92 | 15 | Large-codebase reasoning, multi-file refactors |
| Mistral Large 3 | Mistral | 675 | open (Apache 2.0) | VT | 2025-12-02 | 256k | N/A | N/A | N/A | 50 | 2 | 675B MoE / 41B active · non-reasoning (AIME ~40, GPQA ~44) · Apache 2.0 |
| Claude Sonnet 4.6 | Anthropic | N/A | closed | VTRC | 2025-12 | 1M | 79.6 | N/A | N/A | 86 | 3 | Default coding driver. 98% of Opus at ⅕ cost |
| Claude Haiku 4.5 | Anthropic | N/A | closed | VTRC | 2025-10-15 | 200k | 73.3 | N/A | N/A | 78 | 1 | 4–5× faster than Sonnet 4.5; cheap multi-agent driver |
| NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16 live | NVIDIA | 560.5 | open | N/A | 2026-06-10 | N/A | N/A | N/A | N/A | N/A | N/A | NVIDIA's 550B MoE model with novel Latent-MoE architecture and multi-token prediction, frontier-scale pretraining from a major lab. |
| Qwen3-Coder-Next live | Alibaba | 79.7 | open | N/A | 2026-02-03 | N/A | N/A | N/A | N/A | N/A | N/A | Qwen3-Coder-Next introduces a genuinely novel hybrid architecture combining Gated DeltaNet linear attention with MoE (80B total/3B active params) and 256k context, representing a significant architectural innovation for frontier coding agents beyond standard transformer designs. |
| claude-opus-4.8 live | Anthropic | N/A | closed | N/A | 2026-05-27 | N/A | N/A | N/A | N/A | N/A | N/A | Anthropic's flagship Opus-class model with 1M context, multimodal inputs, and reasoning support represents a frontier-tier general-purpose system from a major lab. |
| gemini-3.5-flash live | Google | N/A | closed | N/A | 2026-05-19 | N/A | N/A | N/A | N/A | N/A | N/A | Google Gemini 3.5 Flash is a frontier-class multimodal model from a major lab with a 1M token context, strong coding/reasoning capabilities, and competitive positioning against top-tier models at efficiency pricing. |
| MiMo-V2.5-Pro-FP4-DFlash live | Xiaomi | 554.3 | open | N/A | 2026-06-08 | N/A | N/A | N/A | N/A | N/A | N/A | MiMo-V2.5-Pro is a 554B MoE model from Xiaomi with notable inference innovations (MXFP4 quantization + block-diffusion speculative decoding), representing frontier-scale deployment engineering with meaningful novelty. |
| NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4 live | NVIDIA | 335 | open | N/A | 2026-06-10 | N/A | N/A | N/A | N/A | N/A | N/A | NVIDIA's 550B-parameter sparse MoE (55B active) with novel Nemotron-H hybrid architecture, latent-MoE, and MTP represents a frontier-class general-purpose model from a major lab with genuine architectural innovation. |
| claude-fable-latest live | Anthropic | N/A | closed | N/A | 2026-06-09 | N/A | N/A | N/A | N/A | N/A | N/A | Anthropic's Claude Fable is a frontier-class proprietary model with 1M context and premium pricing indicative of large-scale deployment, warranting inclusion despite unknown parameter count. |
| claude-fable-5 live | Anthropic | N/A | closed | N/A | 2026-06-09 | N/A | N/A | N/A | N/A | N/A | N/A | Frontier-class Anthropic model with 1M context, multimodal inputs, reasoning support, and premium pricing consistent with a major general-purpose release. |
| Minimax-M3 live | MiniMax | N/A | open | N/A | 2026-05-31 | N/A | N/A | N/A | N/A | N/A | N/A | MiniMax-M3 is a frontier-class multimodal model with a 1M-token context window, video understanding, and agentic capabilities from a major lab, representing meaningful scale and capability advancement. |
| grok-4.3 live | xAI | N/A | closed | N/A | 2026-04-30 | N/A | N/A | N/A | N/A | N/A | N/A | Grok 4.3 is a frontier-class reasoning model from xAI with 1M context, multimodal inputs, and agentic capabilities, representing a major lab release warranting inclusion. |
| gemini-pro-latest live | Google | N/A | closed | N/A | 2026-04-27 | N/A | N/A | N/A | N/A | N/A | N/A | Google Gemini Pro latest is a frontier-class general-purpose model with a 1M context window and proprietary scale; it represents Google's leading production offering. |
| gemini-flash-latest live | Google | N/A | closed | N/A | 2026-04-27 | N/A | N/A | N/A | N/A | N/A | N/A | Google Gemini Flash latest is a frontier-class general-purpose model from a major lab with 1M token context, representing a significant production deployment with ongoing updates. |
| claude-sonnet-latest live | Anthropic | N/A | closed | N/A | 2026-04-27 | N/A | N/A | N/A | N/A | N/A | N/A | Claude Sonnet represents Anthropic's frontier-class general-purpose model family with 1M context, competitive pricing, and strong capabilities placing it among top-tier commercial models. |
| qwen3.7-max live | Alibaba | N/A | closed | N/A | 2026-05-21 | N/A | N/A | N/A | N/A | N/A | N/A | Qwen3.7-Max is a flagship frontier model from Alibaba with 1M context, agent-centric capabilities, and competitive pricing, representing a significant general-purpose release warranting inclusion. |
| gemini-3.1-flash-lite live | Google | N/A | closed | N/A | 2026-05-07 | N/A | N/A | N/A | N/A | N/A | N/A | Google Gemini 3.1 Flash Lite is a frontier-class multimodal model from a major lab with million-token context and broad modality support, warranting inclusion despite being an efficiency-tier variant. |
| ring-2.6-1t live | InclusionAI | 63 | closed | N/A | 2026-05-08 | N/A | N/A | N/A | N/A | N/A | N/A | 1T-parameter MoE with 63B active params featuring 262K context optimized for agentic workflows represents meaningful scale and architectural interest at the frontier. |
| mistral-medium-3-5 live | Mistral | 128 | closed | N/A | 2026-04-30 | N/A | N/A | N/A | N/A | N/A | N/A | Mistral Medium 3.5 is a frontier-class 128B dense multimodal model from a major lab with strong general-purpose capabilities, long context, and competitive pricing positioning it as a significant release. |
| gpt-latest live trusted-lab | OpenAI | N/A | closed | N/A | 2026-04-27 | N/A | N/A | N/A | N/A | N/A | N/A | Likely a frontier-class OpenAI model with high significance and scale, but the alias/redirect nature and unknown params prevent confident composite scoring or novelty assessment. |
| diffusiongemma-26B-A4B-it-NVFP4 live trusted-lab | NVIDIA | 14.4 | open | N/A | 2026-06-11 | N/A | N/A | N/A | N/A | N/A | N/A | DiffusionGemma represents a genuinely novel discrete diffusion-based LLM architecture from Google DeepMind with MoE and parallel token generation, but this specific entry is an NVIDIA NVFP4 quantized repackage of the base model, reducing novelty credit slightly; composite=20 falls just below include threshold. |
| Kimi-K2.7-Code live trusted-lab | Moonshot | N/A | open | N/A | 2026-06-12 | N/A | N/A | N/A | N/A | N/A | N/A | Coding-focused MoE variant from a credible lab with 262K context and multimodal capabilities, but unknown parameter count and niche specialization prevent confident frontier-class inclusion. |
| qwen3.7-plus live trusted-lab | Alibaba | N/A | closed | N/A | 2026-06-03 | N/A | N/A | N/A | N/A | N/A | N/A | Qwen3.7-Plus appears to be a capable multimodal model from a major lab with 1M context, but unknown parameter count, proprietary/closed license, and 'cost-effective' positioning suggest it's a mid-tier offering rather than frontier-class, warranting review rather than automatic inclusion. |
| kimi-latest live trusted-lab | Moonshot | N/A | closed | N/A | 2026-04-27 | N/A | N/A | N/A | N/A | N/A | N/A | Kimi is a notable frontier-class model from MoonshotAI with a very long 262K context window, but unknown parameter count, proprietary nature, and the 'latest' alias routing make definitive scoring difficult. |
| nemotron-3-nano-omni-30b-a3b-reasoning:free live trusted-lab | NVIDIA | 30 | closed | N/A | 2026-04-28 | N/A | N/A | N/A | N/A | N/A | N/A | 30B MoE multimodal reasoning model from NVIDIA with broad modality support and long context is notable but falls short of frontier-class scale and lacks sufficient architectural detail to confirm top-tier novelty. |
| DeepSeek-V4-Flash-NVFP4 live trusted-lab | NVIDIA | 166.7 | open | N/A | 2026-06-10 | N/A | N/A | N/A | N/A | N/A | N/A | NVFP4 quantization of DeepSeek-V4-Flash is an optimization/repackage with modest novelty, but the 284B MoE base model and NVIDIA's production-grade quantization tooling give it meaningful significance and scale just below the include threshold. |
| gpt-chat-latest live trusted-lab | OpenAI | N/A | closed | N/A | 2026-05-05 | N/A | N/A | N/A | N/A | N/A | N/A | An alias to OpenAI's latest ChatGPT model carries frontier significance but lacks transparency on params, architecture novelty, or confirmed scale, making definitive scoring unreliable. |
Capabilities key: V = Vision (image input) · A = Audio · T = Tools / function calling · R = Extended reasoning · C = Computer / browser use
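
The letter codes in the Capabilities column can be expanded or filtered mechanically. The following is a minimal sketch, not the site's own filter logic: the function names (`capability_letters`, `decode_capabilities`, `has_capabilities`) are hypothetical, and treating `N/A` and `tbd` as "no known capabilities" is an assumption.

```python
# Letter codes copied from the capabilities key for the LLM table.
CAPABILITY_KEY = {
    "V": "Vision (image input)",
    "A": "Audio",
    "T": "Tools / function calling",
    "R": "Extended reasoning",
    "C": "Computer / browser use",
}

# Assumption: these cell values mean "capabilities not listed".
_PLACEHOLDERS = {"", "n/a", "tbd"}


def capability_letters(code: str) -> set[str]:
    """Return the set of capability letters in a cell such as "VTRC"."""
    cleaned = code.strip()
    if cleaned.lower() in _PLACEHOLDERS:
        return set()
    unknown = sorted(set(cleaned) - set(CAPABILITY_KEY))
    if unknown:
        raise ValueError(f"unknown capability letters: {unknown}")
    return set(cleaned)


def decode_capabilities(code: str) -> list[str]:
    """Expand a capabilities cell into readable labels, in key order."""
    letters = capability_letters(code)
    return [label for letter, label in CAPABILITY_KEY.items() if letter in letters]


def has_capabilities(code: str, required: str) -> bool:
    """True if the cell contains every letter in `required`."""
    return set(required) <= capability_letters(code)


if __name__ == "__main__":
    print(decode_capabilities("VTR"))
    print(has_capabilities("VATRC", "VC"))
```

Placeholder cells are normalized before any letter comparison so that the "A" in `N/A` is never mistaken for the Audio code.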
*Reasoning is a composite score (GPQA Diamond / HLE / AIME / ARC-AGI-2, normalized 0–100, directional).
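
The page does not say how the Reasoning composite is weighted or normalized. One plausible reading is sketched below, assuming each benchmark is already on a 0–100 percent scale and the composite is an unweighted mean over whichever benchmarks are reported. The function name `composite_reasoning` and the sample scores are hypothetical and do not come from the table.

```python
from statistics import fmean
from typing import Optional

# Benchmarks named in the Reasoning footnote.
BENCHMARKS = ("GPQA Diamond", "HLE", "AIME", "ARC-AGI-2")


def composite_reasoning(scores: dict[str, Optional[float]]) -> Optional[float]:
    """Unweighted mean of the reported benchmark scores (assumed 0-100 each).

    Missing or None entries are skipped; returns None if nothing is reported.
    Equal weighting is an assumption, not the page's documented method.
    """
    available = [
        scores[name] for name in BENCHMARKS if scores.get(name) is not None
    ]
    if not available:
        return None
    for value in available:
        if not 0 <= value <= 100:
            raise ValueError(f"score out of 0-100 range: {value}")
    return fmean(available)


if __name__ == "__main__":
    # Hypothetical scores for illustration only.
    sample = {"GPQA Diamond": 80.0, "HLE": 40.0, "AIME": 90.0, "ARC-AGI-2": None}
    print(composite_reasoning(sample))
```

Because unreported benchmarks are skipped rather than counted as zero, two models scored on different subsets are only roughly comparable, which fits the footnote's "directional" caveat.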
Video generation models
I2V (image-to-video) rank: 1 = best.
| Model | Vendor | License | Released | I2V Rank | Max len (s) | Resolution | Audio | $/gen | Best for |
|---|---|---|---|---|---|---|---|---|---|
| Kling 3.0 | Kuaishou | closed | 2026-02 | 1 | 15 (multi-shot) | 1080p | synced dialogue + SFX | 1.00 | Cinematic shots; #1 general-purpose |
| Veo 3.1 | Google | closed | 2026-01 | 2 | 8 | 1080p | yes | 0.75 | High-fidelity realism, prompt adherence |
| Sora 2 | OpenAI | closed | 2025-10 | 3 | 12 | 1080p | yes | 0.90 | Imaginative T2V; ChatGPT-integrated |
| Seedance 2.0 | ByteDance | closed | 2026-04 | 4 | 15 | 1080p | yes | 0.50 | Product ads, e-comm, character consistency |
| LTX-2 | Lightricks | open | 2025-Q4 | 5 | 10 | 4K@50fps | native sync | 0.20 | Open-weights leader; 4K + audio |
| Wan 2.2 | Alibaba | open | 2025-Q3 | 6 | 6 | 720p | N/A | 0.05 | Runs on a 4070; novel MoE denoiser |
Image generation models
*Quality is a directional leaderboard score.
| Model | Vendor | License | Released | Max res | Quality* | Speed (s) | $/img | Best for |
|---|---|---|---|---|---|---|---|---|
| Midjourney v8 | Midjourney | closed | 2026-03 | 2K | 95 | 10 | 0.04 | Aesthetic king · rewritten engine, 5× faster than v7 |
| FLUX.2 [pro] | Black Forest Labs | closed (API) | 2026-Q1 | 2K | 93 | 4.5 | 0.04 | Photoreal commercial: best skin, lighting, materials |
| FLUX.2 [dev] | Black Forest Labs | open weights | 2026-Q1 | 2K | 89 | 6 | 0 | Top open-weights photoreal model |
| GPT Image 2 | OpenAI | closed | 2026-Q1 | 1K+ | 92 | 6 | 0.04 | Best prompt adherence: complex composed scenes |
| Imagen 4 Ultra | Google | closed | 2025-Q4 | 2K | 94 | 5 | 0.04 | Photoreal flagship; strong text rendering |
| Imagen 4 Fast | Google | closed | 2026-03 | 1K | 86 | 2 | 0.02 | Cheapest fast quality at $0.02/img |
| Nano Banana 2 | Google (Gemini 3.1 Flash Image) | closed | 2026-03 | 1K | 82 | 1.5 | 0.02 | Fastest end-to-end (~1–3s) · in-chat editing |
| Recraft V4 | Recraft | closed | 2025-Q4 | 2K | 88 | 5 | 0.04 | Brand / design assets, vector-style outputs |
| Ideogram 3 | Ideogram | closed | 2025-Q4 | 1K | 84 | 5 | 0.03 | Typography & in-image text rendering |
| Seedream 4.5 | ByteDance | closed | 2026-Q1 | 2K | 90 | 5 | 0.03 | Asian aesthetics, character consistency |
| Stable Diffusion 4 | Stability AI | open | 2026-Q1 | 2K | 80 | 6 | 0 | Open ecosystem · ControlNet / LoRA backbone |
| Hunyuan Image 3 | Tencent | open | 2026-Q1 | 2K | 83 | 5 | 0 | Open Tencent flagship; bilingual prompts |