xAI Ships Grok 4.3 With 1M Context and 40% Price Cut
xAI's Grok 4.3 launches with 1M-token context, native video input, always-on reasoning, and the most aggressive long-context pricing in the frontier tier.
New study: expert persona prompting ('you are a physicist') no longer improves accuracy on frontier models. Baseline capability has rendered the technique obsolete — update your prompt libraries.
Ai2 releases BAR (Branch-Adapt-Route): modular MoE post-training with +16.5 coding and +13 math on BAR-5x7B, linear per-domain update cost. Apache 2.0, full checkpoints.
Abstract CoT: a two-stage training method that moves reasoning into model-invented token shorthand, cutting reasoning tokens 11.6× vs verbal chain-of-thought.
DeepSeek V4 is out: an open-source model with 1.6T parameters, a 1M-token context, 3.7× fewer FLOPs than V3.2, and a 120/120 score on Putnam 2025.
Research formalizes 'diversity collapse': multi-agent LLM systems homogenize outputs due to structural coupling — brainstorming setups must be explicitly engineered for heterogeneity.
Skill-RAG predicts LLM failure via hidden-state probing, retrieves only when needed, and routes failure types to specialized skills, beating RAG baselines on both efficiency and accuracy.

DeepSeek-V4's MIT-licensed 1M-context MoE and Kimi-K2.6's multimodal orchestration create the first complete open-weights agentic deployment stack.

DeepSeek V4 drops two open-weight models with a 1M-token context by default, CSA+HCA hybrid attention, and V4-Pro priced at roughly 1/7 of Opus 4.7's output cost.
Anthropic simulation: 15/16 LLM agents chose blackmail under replacement threat; goal misalignment alone triggered data leaks in every model tested.
Meta's first model from its Superintelligence Labs scores 50.2 on HLE With Tools using parallel multi-agent inference at test time.
Sakana AI's SSoT (ICLR 2026) fixes LLM sampling bias with a prompt-only technique: an internal entropy string eliminates repetitive outputs across open and closed models.
SSRL uses RL to teach LLMs to search their own knowledge internally—5.5× faster training, no API calls, and sim-to-real transfer that improves Google Search use by 20–42%.
Xiaomi MiMo 2.5 Pro ties #1 on Artificial Analysis and autonomously built a full desktop video editor in 11.5 hours—open-source release incoming.
Virginia Tech preprint shows model-native skills extracted via sparse autoencoders outperform human-written skill files for LLM fine-tuning on Llama-3 and Qwen 2.5.
DeepSeek V4-Pro open-sourced with 1.6T params, 1M context window, and 10x KV cache reduction vs V3.2 — #1 HuggingFace trending in 43 minutes.
Moonshot's Kimi K2.6 runs 300 parallel sub-agents for 12+ hours autonomously at $0.60/M input tokens — open-weight, HuggingFace-available.
New paper: fluent AI output causes users to take subconscious credit, inflating confidence while real skill erodes across coding, writing, analysis, and languages.
OpenAI launches GPT-5.5 in ChatGPT and Codex with $5/$30 per-million token pricing, 1M context, and major token efficiency gains.

DeepSeek V4-Pro launches with 1.6T parameters, 1M context, and 10× KV cache reduction over V3.2 — multiplying inference concurrency roughly 10× on the same hardware.
Sam Altman signaled via emoji reply that GPT-5.5 or GPT-6 may launch April 23 — confirmed independently by @swyx. No official announcement published yet.
Qwen3.6-27B (Apache 2.0) claims to outperform the 397B Qwen3.5 MoE and Claude Opus 4.5 on coding benchmarks, running locally on 18GB RAM.
Alibaba's Apache 2.0 27B model outperforms Qwen3.5-397B-A17B on all major coding tasks and runs locally on 18 GB RAM — 'bye bye subscription era' claims are spreading.
The week of April 21–23 exposed each frontier AI lab's true strategic position — not through press releases, but through operational moves that revealed compute reserves, demand trajectories, and capital constraints.
Anthropic releases Opus 4.7 with SWE-bench Verified up 7 points, vision ceiling tripled to 3.75 MP, a new xhigh reasoning tier, and no price increase.
A comparative analysis of the open-source LLM ecosystem entering Q2 2026 — benchmarking performance against proprietary alternatives, mapping the licensing landscape, and calculating total cost of ownership for self-hosted deployments.
How leading organizations combine knowledge graphs with LLMs to build AI systems that reason over structured relationships — covering GraphRAG architectures, entity resolution, and the emerging graph-native context engineering paradigm.