Generative AI Model Ranking Matrix

Living benchmark · refreshed nightly

Frontier language, video, and image generation models compared side by side: benchmarks, capabilities, release dates, and price.

Frontier LLMs: coding & reasoning

Model | Vendor | Params (B) | License | Capabilities | Released | Context | SWE-Bench Verified | SWE-Bench Pro | Terminal-Bench | Reasoning* | $/M input tokens | Best for
Mistral Medium 3.5 | Mistral | 128 | open weights | TR | 2026-04-30 | 128k | 77.6 | N/A | N/A | 80 | 0.6 | 128B · SWE-V 77.6 beats Devstral 2 + Qwen3.5; τ³-Telecom 91.4
Kimi K2.6 | Moonshot | 1000 | open | VTR | 2026-04-29 | 256k | 80.2 | 58.6 | 66.7 | 94 | 0.6 | 1.1T MoE · ties GPT-5.5 on SWE-Pro at $0.6/M; HLE 54 leads all
MiMo-V2.5-Pro | Xiaomi | 1023.2 | open | TR | 2026-04-29 | 1M | N/A | 57.2 | N/A | 89 | N/A | 1.02T MoE / 42B active · matches Opus 4.6 on SWE-Pro w/ 40% fewer tokens
Gemma 4 31B-it | Google | 31 | open | VT | 2026-04-29 | 256k | 52 | N/A | N/A | 87 | N/A | 31B dense · AIME 89.2, GPQA 84.3, τ²-Retail 86.4 (12× jump from G3)
Nemotron-3-Nano-Omni | NVIDIA | 30 | open | VATR | 2026-04-29 | 128k | N/A | N/A | N/A | N/A | N/A | Fresh (~17h) · 30B-A3B any-to-any reasoning
DeepSeek V4-Pro | DeepSeek | 861.6 | open | TR | 2026-04-27 | 128k | 80.6 | 55.4 | 67.9 | 92 | 0.9 | 1.6T MoE / 49B active · matches Opus 4.6 at fraction of cost
DeepSeek V4-Flash | DeepSeek | 158.1 | open | TR | 2026-04-27 | 128k | N/A | N/A | N/A | N/A | 0.3 | Fresh · 158B fast variant of V4
GPT-5.5 | OpenAI | N/A | closed | VATRC | 2026-04-24 | 1M | 87.6 | 58.6 | 82.7 | 95 | 5 | SOTA Terminal-Bench 82.7; GDPval 84.9; OSWorld 78.7; ties Opus 4.7 on SWE-V
Qwen3.5-397B-A17B | Alibaba | 397 | open | VTR | 2026-04-24 | 256k | 80 | N/A | 54 | 92 | N/A | 403B MoE / 17B active · GPQA 88.4, MMLU-Pro 87.8, SWE-V 80.0
Qwen3.6-35B-A3B | Alibaba | 35 | open | VTR | 2026-04-24 | 256k | 73.4 | 49.5 | N/A | 84 | N/A | 36B / 3B active · 73.4 SWE-V on tiny active params
Tencent Hy3-preview | Tencent | 298.8 | open (preview) | VTR | 2026-04-24 | 128k | 74.4 | N/A | 54.4 | 88 | N/A | 295B / 21B active · +40pts SWE-V vs Hy2; topped Tsinghua math PhD exam
Qwen3.6-Max-Preview | Alibaba | N/A | closed | VTR | 2026-04-20 | 1M | 79.5 | N/A | 77.1 | 89 | 6 | Agent loops: preserve_thinking across tool calls
MiniMax M2.7 | MiniMax | 228.7 | open | TR | 2026-04-20 | 205k | 78 | 56.22 | 57 | 87 | 0.3 | 229B · self-evolving; matches Codex on SWE-Pro
Claude Opus 4.7 | Anthropic | N/A | closed | VTRC | 2026-04-16 | 1M | 87.6 | 64.3 | 69.4 | 95 | 5 | Co-leads SWE-Bench Verified (87.6, tied with GPT-5.5) and leads Pro (64.3); new tokenizer (~35% more tokens)
GLM-5.1 | Zhipu / Z.ai | 753.9 | open (MIT) | VTR | 2026-04-07 | 128k | 79 | 58.4 | N/A | 88 | 0.11 | 754B · 58.4 on SWE-Bench Pro, just behind Kimi K2.6 (58.6) among open weights
Meta Avocado | Meta | N/A | rumored / closed | tbd | 2026-Q2 (est.) | N/A | N/A | N/A | N/A | N/A | N/A | Llama successor, slipping to May/Jun 2026
Grok 4.20 Beta 2 | xAI | N/A | closed | VT | 2026-03-03 | 256k | 75 | N/A | N/A | 80 | 2 | 4-agent backbone · IFBench #1 (83); 8% of Opus cost
GPT-5.3 Codex | OpenAI | N/A | closed | VTRC | 2026-03 | 400k | 80 | 56.8 | 77.3 | 91 | 10 | Full SDLC agent: debug, terminal, PRDs, tests
Gemini 3.1 Pro | Google | N/A | closed | VATRC | 2026-03 | 2M | 78 | N/A | N/A | 94 | 7 | Adjustable Deep Think; agentic browsing (BrowseComp 85.9)
Devstral-2-123B | Mistral | 123 | open weights | TC | 2026-02-25 | 128k | N/A | N/A | N/A | N/A | 0.6 | 125B · code-specialized variant
Claude Opus 4.6 | Anthropic | N/A | closed | VTRC | 2026-02 | 1M | 80.8 | 57.3 | N/A | 92 | 15 | Large-codebase reasoning, multi-file refactors
Mistral Large 3 | Mistral | 675 | open (Apache 2.0) | VT | 2025-12-02 | 256k | N/A | N/A | N/A | 50 | 2 | 675B MoE / 41B active · non-reasoning (AIME ~40, GPQA ~44) · Apache 2.0
Claude Sonnet 4.6 | Anthropic | N/A | closed | VTRC | 2025-12 | 1M | 79.6 | N/A | N/A | 86 | 3 | Default coding driver. 98% of Opus at ⅕ cost
Claude Haiku 4.5 | Anthropic | N/A | closed | VTRC | 2025-10-15 | 200k | 73.3 | N/A | N/A | 78 | 1 | 4–5× faster than Sonnet 4.5; cheap multi-agent driver
NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16 (live) | NVIDIA | 560.5 | open | N/A | 2026-06-10 | N/A | N/A | N/A | N/A | N/A | N/A | NVIDIA's 550B MoE model with novel Latent-MoE architecture and multi-token prediction; frontier-scale pretraining from a major lab.
Qwen3-Coder-Next (live) | Alibaba | 79.7 | open | N/A | 2026-02-03 | N/A | N/A | N/A | N/A | N/A | N/A | Qwen3-Coder-Next introduces a genuinely novel hybrid architecture combining Gated DeltaNet linear attention with MoE (80B total/3B active params) and 256k context, representing a significant architectural innovation for frontier coding agents beyond standard transformer designs.
claude-opus-4.8 (live) | Anthropic | N/A | closed | N/A | 2026-05-27 | N/A | N/A | N/A | N/A | N/A | N/A | Anthropic's flagship Opus-class model with 1M context, multimodal inputs, and reasoning support; a frontier-tier general-purpose system from a major lab.
gemini-3.5-flash (live) | Google | N/A | closed | N/A | 2026-05-19 | N/A | N/A | N/A | N/A | N/A | N/A | Google Gemini 3.5 Flash is a frontier-class multimodal model from a major lab with a 1M token context, strong coding/reasoning capabilities, and competitive positioning against top-tier models at efficiency pricing.
MiMo-V2.5-Pro-FP4-DFlash (live) | Xiaomi | 554.3 | open | N/A | 2026-06-08 | N/A | N/A | N/A | N/A | N/A | N/A | MiMo-V2.5-Pro is a 554B MoE model from Xiaomi with notable inference innovations (MXFP4 quantization + block-diffusion speculative decoding), representing frontier-scale deployment engineering with meaningful novelty.
NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4 (live) | NVIDIA | 335 | open | N/A | 2026-06-10 | N/A | N/A | N/A | N/A | N/A | N/A | NVIDIA's 550B-parameter sparse MoE (55B active) with novel Nemotron-H hybrid architecture, latent-MoE, and MTP; a frontier-class general-purpose model from a major lab with genuine architectural innovation.
claude-fable-latest (live) | Anthropic | N/A | closed | N/A | 2026-06-09 | N/A | N/A | N/A | N/A | N/A | N/A | Anthropic's Claude Fable is a frontier-class proprietary model with 1M context and premium pricing indicative of large-scale deployment, warranting inclusion despite unknown parameter count.
claude-fable-5 (live) | Anthropic | N/A | closed | N/A | 2026-06-09 | N/A | N/A | N/A | N/A | N/A | N/A | Frontier-class Anthropic model with 1M context, multimodal inputs, reasoning support, and premium pricing consistent with a major general-purpose release.
Minimax-M3 (live) | MiniMax | N/A | open | N/A | 2026-05-31 | N/A | N/A | N/A | N/A | N/A | N/A | MiniMax-M3 is a frontier-class multimodal model with a 1M-token context window, video understanding, and agentic capabilities from a major lab, representing meaningful scale and capability advancement.
grok-4.3 (live) | xAI | N/A | closed | N/A | 2026-04-30 | N/A | N/A | N/A | N/A | N/A | N/A | Grok 4.3 is a frontier-class reasoning model from xAI with 1M context, multimodal inputs, and agentic capabilities, representing a major lab release warranting inclusion.
gemini-pro-latest (live) | Google | N/A | closed | N/A | 2026-04-27 | N/A | N/A | N/A | N/A | N/A | N/A | Google Gemini Pro latest is a frontier-class general-purpose model with a 1M context window and proprietary scale; it represents Google's leading production offering.
gemini-flash-latest (live) | Google | N/A | closed | N/A | 2026-04-27 | N/A | N/A | N/A | N/A | N/A | N/A | Google Gemini Flash latest is a frontier-class general-purpose model from a major lab with 1M token context, representing a significant production deployment with ongoing updates.
claude-sonnet-latest (live) | Anthropic | N/A | closed | N/A | 2026-04-27 | N/A | N/A | N/A | N/A | N/A | N/A | Claude Sonnet represents Anthropic's frontier-class general-purpose model family with 1M context, competitive pricing, and strong capabilities placing it among top-tier commercial models.
qwen3.7-max (live) | Alibaba | N/A | closed | N/A | 2026-05-21 | N/A | N/A | N/A | N/A | N/A | N/A | Qwen3.7-Max is a flagship frontier model from Alibaba with 1M context, agent-centric capabilities, and competitive pricing, representing a significant general-purpose release warranting inclusion.
gemini-3.1-flash-lite (live) | Google | N/A | closed | N/A | 2026-05-07 | N/A | N/A | N/A | N/A | N/A | N/A | Google Gemini 3.1 Flash Lite is a frontier-class multimodal model from a major lab with million-token context and broad modality support, warranting inclusion despite being an efficiency-tier variant.
ring-2.6-1t (live) | InclusionAI | 63 | closed | N/A | 2026-05-08 | N/A | N/A | N/A | N/A | N/A | N/A | 1T-parameter MoE with 63B active params and 262K context, optimized for agentic workflows; meaningful scale and architectural interest at the frontier.
mistral-medium-3-5 (live) | Mistral | 128 | closed | N/A | 2026-04-30 | N/A | N/A | N/A | N/A | N/A | N/A | Mistral Medium 3.5 is a frontier-class 128B dense multimodal model from a major lab with strong general-purpose capabilities, long context, and competitive pricing positioning it as a significant release.
gpt-latest (live, trusted-lab) | OpenAI | N/A | closed | N/A | 2026-04-27 | N/A | N/A | N/A | N/A | N/A | N/A | Likely a frontier-class OpenAI model with high significance and scale, but the alias/redirect nature and unknown params prevent confident composite scoring or novelty assessment.
diffusiongemma-26B-A4B-it-NVFP4 (live, trusted-lab) | NVIDIA | 14.4 | open | N/A | 2026-06-11 | N/A | N/A | N/A | N/A | N/A | N/A | DiffusionGemma represents a genuinely novel discrete diffusion-based LLM architecture from Google DeepMind with MoE and parallel token generation, but this specific entry is an NVIDIA NVFP4 quantized repackage of the base model, reducing novelty credit slightly; composite=20 falls just below include threshold.
Kimi-K2.7-Code (live, trusted-lab) | Moonshot | N/A | open | N/A | 2026-06-12 | N/A | N/A | N/A | N/A | N/A | N/A | Coding-focused MoE variant from a credible lab with 262K context and multimodal capabilities, but unknown parameter count and niche specialization prevent confident frontier-class inclusion.
qwen3.7-plus (live, trusted-lab) | Alibaba | N/A | closed | N/A | 2026-06-03 | N/A | N/A | N/A | N/A | N/A | N/A | Qwen3.7-Plus appears to be a capable multimodal model from a major lab with 1M context, but unknown parameter count, proprietary/closed license, and 'cost-effective' positioning suggest it's a mid-tier offering rather than frontier-class, warranting review rather than automatic inclusion.
kimi-latest (live, trusted-lab) | Moonshot | N/A | closed | N/A | 2026-04-27 | N/A | N/A | N/A | N/A | N/A | N/A | Kimi is a notable frontier-class model from MoonshotAI with a very long 262K context window, but unknown parameter count, proprietary nature, and the 'latest' alias routing make definitive scoring difficult.
nemotron-3-nano-omni-30b-a3b-reasoning:free (live, trusted-lab) | NVIDIA | 30 | closed | N/A | 2026-04-28 | N/A | N/A | N/A | N/A | N/A | N/A | 30B MoE multimodal reasoning model from NVIDIA with broad modality support and long context is notable but falls short of frontier-class scale and lacks sufficient architectural detail to confirm top-tier novelty.
DeepSeek-V4-Flash-NVFP4 (live, trusted-lab) | NVIDIA | 166.7 | open | N/A | 2026-06-10 | N/A | N/A | N/A | N/A | N/A | N/A | NVFP4 quantization of DeepSeek-V4-Flash is an optimization/repackage with modest novelty, but the 284B MoE base model and NVIDIA's production-grade quantization tooling give it meaningful significance and scale just below the include threshold.
gpt-chat-latest (live, trusted-lab) | OpenAI | N/A | closed | N/A | 2026-05-05 | N/A | N/A | N/A | N/A | N/A | N/A | An alias to OpenAI's latest ChatGPT model carries frontier significance but lacks transparency on params, architecture novelty, or confirmed scale, making definitive scoring unreliable.
Capabilities key: V = Vision (image input) · A = Audio · T = Tools / function calling · R = Extended reasoning · C = Computer / browser use
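The capability codes in the table (for example VTRC or VATR) are just concatenations of the letters in the key above. A minimal sketch of decoding them follows; the function name `decode_capabilities` and the choice to treat N/A and tbd as "no capabilities listed" are assumptions made for illustration, not part of the page.

```python
# Letter meanings copied from the capabilities key.
CAPABILITY_KEY = {
    "V": "Vision (image input)",
    "A": "Audio",
    "T": "Tools / function calling",
    "R": "Extended reasoning",
    "C": "Computer / browser use",
}


def decode_capabilities(code: str) -> list[str]:
    """Expand a code such as 'VTRC' into capability names, in the order given."""
    cleaned = code.strip().upper()
    # Assumption: placeholder cells mean nothing is listed yet.
    if cleaned in {"", "N/A", "TBD"}:
        return []
    unknown = sorted(set(cleaned) - set(CAPABILITY_KEY))
    if unknown:
        raise ValueError(f"unknown capability letters: {unknown}")
    return [CAPABILITY_KEY[letter] for letter in cleaned]


print(decode_capabilities("VTRC"))
```

Raising on unknown letters, rather than skipping them, makes a mistyped code visible instead of silently dropping a capability.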
*Reasoning is a composite score (GPQA Diamond / HLE / AIME / ARC-AGI-2, normalized 0–100, directional).
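The footnote names the four benchmarks behind the Reasoning column but not how they are weighted or normalized. The sketch below shows one possible reading, an equal-weight mean over whichever scores are available, clamped to 0–100. The equal weighting, the clamping, and the name `composite_reasoning` are all assumptions; the page's actual formula is not stated.

```python
from statistics import mean
from typing import Mapping, Optional

# Benchmarks listed in the footnote.
BENCHMARKS = ("GPQA Diamond", "HLE", "AIME", "ARC-AGI-2")


def composite_reasoning(scores: Mapping[str, Optional[float]]) -> Optional[float]:
    """Equal-weight mean of the available benchmark scores (assumed 0-100 scale).

    Returns None when no benchmark score is available, mirroring an N/A cell.
    """
    present = [scores[name] for name in BENCHMARKS if scores.get(name) is not None]
    if not present:
        return None
    # Clamp each score so an out-of-range input cannot push the result past 0-100.
    clamped = [min(100.0, max(0.0, value)) for value in present]
    return mean(clamped)


# Scores quoted in the Gemma 4 31B-it row; the other two benchmarks are not given.
print(composite_reasoning({"GPQA Diamond": 84.3, "AIME": 89.2}))
```

For the Gemma 4 31B-it figures the equal-weight mean is about 86.75, close to the 87 shown in the table, but one matching row does not confirm that the page weights the benchmarks equally.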

Video generation models

I2V (image-to-video) rank: 1 = best
Model | Vendor | License | Released | I2V rank | Max len (s) | Resolution | Audio | $/gen | Best for
Kling 3.0 | Kuaishou | closed | 2026-02 | 1 | 15 (multi-shot) | 1080p | synced dialogue + SFX | 1.00 | Cinematic shots; #1 general-purpose
Veo 3.1 | Google | closed | 2026-01 | 2 | 8 | 1080p | yes | 0.75 | High-fidelity realism, prompt adherence
Sora 2 | OpenAI | closed | 2025-10 | 3 | 12 | 1080p | yes | 0.90 | Imaginative T2V; ChatGPT-integrated
Seedance 2.0 | ByteDance | closed | 2026-04 | 4 | 15 | 1080p | yes | 0.50 | Product ads, e-comm, character consistency
LTX-2 | Lightricks | open | 2025-Q4 | 5 | 10 | 4K@50fps | native sync | 0.20 | Open-weights leader; 4K + audio
Wan 2.2 | Alibaba | open | 2025-Q3 | 6 | 6 | 720p | N/A | 0.05 | Runs on a 4070; novel MoE denoiser

Image generation models

*Quality = directional leaderboard score
Model | Vendor | License | Released | Max res | Quality* | Speed (s) | $/img | Best for
Midjourney v8 | Midjourney | closed | 2026-03 | 2K | 95 | 10 | 0.04 | Aesthetic king · rewritten engine, 5× faster than v7
FLUX.2 [pro] | Black Forest Labs | closed (API) | 2026-Q1 | 2K | 93 | 4.5 | 0.04 | Photoreal commercial: best skin, lighting, materials
FLUX.2 [dev] | Black Forest Labs | open weights | 2026-Q1 | 2K | 89 | 6 | 0 | Top open-weights photoreal model
GPT Image 2 | OpenAI | closed | 2026-Q1 | 1K+ | 92 | 6 | 0.04 | Best prompt adherence: complex composed scenes
Imagen 4 Ultra | Google | closed | 2025-Q4 | 2K | 94 | 5 | 0.04 | Photoreal flagship; strong text rendering
Imagen 4 Fast | Google | closed | 2026-03 | 1K | 86 | 2 | 0.02 | Cheapest fast quality at $0.02/img
Nano Banana 2 | Google (Gemini 3.1 Flash Image) | closed | 2026-03 | 1K | 82 | 1.5 | 0.02 | Fastest end-to-end (~1–3s) · in-chat editing
Recraft V4 | Recraft | closed | 2025-Q4 | 2K | 88 | 5 | 0.04 | Brand / design assets, vector-style outputs
Ideogram 3 | Ideogram | closed | 2025-Q4 | 1K | 84 | 5 | 0.03 | Typography & in-image text rendering
Seedream 4.5 | ByteDance | closed | 2026-Q1 | 2K | 90 | 5 | 0.03 | Asian aesthetics, character consistency
Stable Diffusion 4 | Stability AI | open | 2026-Q1 | 2K | 80 | 6 | 0 | Open ecosystem · ControlNet / LoRA backbone
Hunyuan Image 3 | Tencent | open | 2026-Q1 | 2K | 83 | 5 | 0 | Open Tencent flagship; bilingual prompts