Galactai Intelligence Index

VERIFIED · MARCH 2026

Benchmarks sourced from official model cards and technical reports. Cloud models shown for reference only — open-weight models ranked by Galactai Universal Index. Open-weight figures use confirmed primary-source numbers; estimates are clearly marked †. VRAM includes ~15% KV-cache & framework overhead. Scroll down for Galactai Universal Index, Galactai Coding Index and Galactai ClawBot Index — composite rankings computed from the benchmark data.

Cloud-only API
Open-weight (locally runnable)
Estimated / unverified

Performance Radar

6 axes, all normalised 0–100. Click any node to see model details. Dashed line = missing or estimated data. † = estimated value in popup.

Columns: Model · Company · Released · Architecture · License · MMLU-Pro (reasoning) · SWE-bench (real coding) · AIME 2025 (math) · Context · VRAM BF16 (full precision) · VRAM INT4 (4-bit quant) · RAM INT4 (CPU inference)

Quantization Performance Impact

  • BF16 (full precision): baseline quality · 2.0 bytes/param · speed 1×
  • INT8 (8-bit): < 2% loss · 1.0 bytes/param · speed ~1.5×
  • INT4 (4-bit Q4_K_M): 1–3% loss* · 0.5 bytes/param · speed ~2.7×
  • INT3 (3-bit or lower): 8–15% loss · 0.375 bytes/param · not recommended

* For large MoE models (≥200B), INT4 loss is typically <2% on MMLU-Pro. Smaller models (7–13B) show higher variance. Exception: Kimi K2.5 Thinking uses QAT (quantization-aware training) for lossless INT4. Speed multiplier measured on single H100 vs BF16.
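As a quick sanity check, the bytes-per-parameter figures above translate directly into weight-only memory. A minimal sketch (the 230B size is the MiniMax-class example used on this page; note these figures exclude the ~15% KV-cache/framework overhead added in the VRAM columns):

```python
# Weight-only memory per precision, from the bytes/param figures above.
# Excludes the ~15% KV-cache/framework overhead used in the VRAM columns.
BYTES_PER_PARAM = {"BF16": 2.0, "INT8": 1.0, "INT4": 0.5, "INT3": 0.375}

def weight_gb(params_billion: float, precision: str) -> float:
    """Weight memory in GB for a model of `params_billion` billion parameters."""
    return params_billion * BYTES_PER_PARAM[precision]

# e.g. a 230B dense model across all four precisions:
for prec in BYTES_PER_PARAM:
    print(f"230B @ {prec}: {weight_gb(230, prec):.0f} GB")
```

At 230B this gives 460 GB in BF16 down to 115 GB at INT4, which is the 4× reduction cited in the deployment rules below.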

Practical Deployment Rules

  • Large MoE + INT4 = sweet spot. For 200B+ MoE models, MMLU-Pro degrades only ~1–2% while VRAM drops 4× and throughput roughly doubles. Use Q4_K_M (GGUF) for Ollama/llama.cpp; GPTQ or AWQ for GPU-only stacks.
  • Dense models suffer more at INT4. Dense architectures show higher quantization loss than large MoE models, but deliver more coherence per GB of active VRAM. MiniMax M2.5 (dense, 230B) fits in ~132 GB at INT4 (4× H100) and is the only model here feasible on a small 4-GPU cloud instance.
  • Kimi K2.5 is the outlier. At 1.04T parameters, INT4 still needs ~598 GB (~8× H100 80GB). Practical for cloud but not local. Its QAT-trained Thinking variant offers lossless INT4 — unique among these models.
  • MoE active-param cost ≠ total-param cost. DeepSeek V3.2 (671B, 37B active) generates tokens at similar speed to a ~37B dense model. Total VRAM still needs to hold all expert weights.
  • Never run BF16 locally for these models. The smallest here (MiniMax 230B) requires ~529 GB BF16 — that's 7× H100 80GB just for weights. Always quantize. FP8 is a good middle ground on H100/H200 hardware.
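The rules above reduce to a simple capacity check using the multipliers from this page's footnote (BF16 ≈ params × 2.3 GB, INT4 ≈ params × 0.58 GB, both including the ~15% overhead). A sketch, assuming the 80 GB H100 variant:

```python
import math

H100_GB = 80  # assumed: H100 80GB variant

# GB per billion params, from the page footnote (incl. ~15% overhead)
GB_PER_B = {"BF16": 2.3, "INT4": 0.58}

def vram_gb(total_params_b: float, precision: str = "INT4") -> float:
    """Estimated deployment VRAM in GB (weights plus ~15% overhead)."""
    return total_params_b * GB_PER_B[precision]

def h100s_needed(total_params_b: float, precision: str = "INT4") -> int:
    """Minimum H100 count whose combined VRAM holds the model."""
    return math.ceil(vram_gb(total_params_b, precision) / H100_GB)

# Kimi K2.5 (1.04T total params) at INT4:
print(f"{vram_gb(1040):.0f} GB -> {h100s_needed(1040)}x H100")
```

For Kimi K2.5 the footnote multiplier gives ~603 GB and 8 GPUs, in line with the ~598 GB / ~8× H100 figure quoted above; for MiniMax 230B in BF16 it gives exactly the ~529 GB cited in the last rule.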

⚠ Changes

🔴 Kimi K2.5 context window: 1M tokens → 256K tokens. Official HuggingFace card specifies 256K; 1M was incorrect.
🔴 Kimi K2.5 architecture: unknown / "~1.2T" → 1.04T total, 32B active. Confirmed: 384 experts, 8 activated per token (Kimi blog, Jan 2026).
🔴 DeepSeek V3.2 SWE-bench: 72.8% (estimated) → 70.0% (official HF card). Confirmed from HuggingFace leaderboard submission.
🟡 DeepSeek V3.2 AIME 2025: 89.3 (Exp version) → 93.1 (official V3.2 thinking). arXiv paper Table 3 shows 93.1 for DeepSeek-V3.2 thinking mode.
🔴 Llama 4 Maverick MMLU-Pro: 92.1 (unsourced) → 59.6 (official model card). April 2025 model; MMLU-Pro significantly lower than newer models.
🔴 Llama 4 Maverick AIME / SWE-bench: 91.0 / 70% (unsourced) → not officially published. Meta did not report AIME 2025 or SWE-bench in official benchmarks.
🟡 Kimi K2.5 SWE-bench: ~80% (estimated) → 70.8% (official HF card). Confirmed from the official Kimi K2.5 HuggingFace model card.
🟢 Qwen 3.5 Max: all figures confirmed. MMLU-Pro = 87.8, SWE = 76.4, AIME = 91.3, verified via NVIDIA NIM card & Hugging Face.
Legend: Confirmed = from an official model card, arXiv paper, or NVIDIA NIM card · † = estimated or unverified, not from a primary source · ! = corrected from a previous version · * = condition applies, see tooltip. SWE-bench Verified (real GitHub issues) is the primary coding benchmark; HumanEval is excluded as saturated at this tier. AIME 2025 is pass@1 unless noted; the DeepSeek V3.2 score is thinking mode. VRAM estimates: BF16 ≈ params × 2.3 GB; INT4 ≈ params × 0.58 GB (incl. 15% overhead).

Cloud models need not apply.

The Galactai ClawBot Index is the AI benchmark where running on your hardware is not optional — it's the entire point. No cloud. No subscription. No compromise. If it can't run locally, it doesn't qualify.

📎 Cite this index

Academic reference (CC BY 4.0 — free to use with attribution)

Galactai (2026). Galactai Intelligence Index: LLM benchmark for local deployment. galactai.com. Retrieved from https://galactai.com/
@misc{galactai2026,
  title  = {{Galactai Intelligence Index}},
  author = {{Galactai}},
  year   = {2026},
  url    = {https://galactai.com/},
  note   = {Open benchmark dataset, CC BY 4.0}
}

Raw data: galactai.com/data.json (JSON, CC BY 4.0)

🔗 Embed on your page

Live benchmark — always reflects the latest data

<iframe
  src="https://galactai.com/"
  width="100%" height="700"
  style="border:none;border-radius:12px"
  title="Galactai Intelligence Index">
</iframe>

Every embed and citation helps keep this benchmark free, independent, and open. Citations in blog posts and research create organic reach — the dataset's CC BY 4.0 license requires attribution, naturally propagating the Galactai name.