Cursor Composer 2.5: 79.8% SWE-Bench at Under $1 per Task
Cursor Composer 2.5 hits 79.8% SWE-Bench Multilingual at under $1/task, matching frontier coding benchmarks at 11× lower cost than competitors.

Cursor's Composer 2.5 hits 79.8% on SWE-Bench Multilingual at under $1/task, 11× cheaper than rivals, via Kimi K2.5 fine-tuned on 25× more synthetic tasks.
Google Research's ReasoningBank separates success and failure trajectory memory for agents, yielding +8.3pp on WebArena and 57.4% on SWE-Bench with +4.3% token overhead.
Poolside AI's Laguna XS.2, a 33B MoE coding agent model, launches as Apache 2.0 and ranks #12 on SWE-Bench Pro.
TACO reduces terminal-agent token overhead by ~10% on SWE-Bench by learning trajectory-derived compression rules for long-horizon reasoning.
Alibaba's Apache 2.0 27B model outperforms Qwen3.5-397B-A17B on all major coding tasks and runs locally on 18 GB RAM; 'bye bye subscription era' claims are spreading.