
Cursor Composer 2.5: 79.8% SWE-Bench at Under $1/Task
Cursor's Composer 2.5 hits 79.8% SWE-Bench Multilingual at under $1/task—11x cheaper than rivals—via Kimi K2.5 fine-tuned on 25x more synthetic tasks.

OpenAI extends Codex to iOS and Android on all plans, letting developers monitor and redirect multi-step coding tasks while away from their computer.

LangChain's most ambitious product day: Engine auto-fixes agent traces, SmithDB delivers 12x observability performance, 5 more products at Interrupt 2026.

OpenAI's Daybreak combines frontier models and Codex to automate vulnerability detection, backlog triage, and continuous software security at enterprise scale.

Anthropic ships Agent View for Claude Code: a unified terminal dashboard for managing parallel sessions with status indicators, inline replies, and /goal.

Anthropic ships dreaming, outcomes, and multi-agent orchestration to Claude Managed Agents — the first production off-cycle memory consolidation for long-running agents.

Claude Code hits 100+ skills and Google Gemma 4 independently adopts the same skill.md format — cross-vendor agent extension standardization is happening.

Anthropic's Claude for Creative Work lets Claude operate inside Adobe Creative Cloud, Blender, Autodesk Fusion, Ableton, and Canva — moving AI from content generator to software operator.

Warp's full Rust codebase lands on GitHub under AGPL v3 with an agent-first contribution model: Oz handles coding, you review the output. OpenAI sponsors.

Anthropic moves Claude Security to public beta for Enterprise — Opus 4.7 scans codebases, validates findings, and suggests patches, no integration required.

Anthropic launches 8 MCP-based connectors for Claude: Blender, Adobe CC, Autodesk Fusion, Ableton, and more. Any MCP-compatible model can use them.

GitHub Next's ACE puts multiplayer microVM sessions at the centre of agent-driven coding — making team alignment, not implementation, the bottleneck.

Matt Pocock's two-hour AI Engineer workshop argues 30-year-old software fundamentals matter more under AI, not less — and outlines a complete methodology to prove it.

Anthropic published a post-mortem on three sequential Claude Code harness changes from March–April that degraded output quality, fixed in v2.1.116+.

ml-intern reads arXiv, cleans datasets, runs SFT/GRPO, diagnoses failures, and iterates, pushing GPQA from 10% to 32% in under 10 hours for roughly $1 of compute.

Anthropic's Claude Enterprise tier now includes cross-session persistent memory, bringing it into direct competition with OpenAI's newly announced GPT-5.1 memory features.