Research · 10 articles

Research

Glowing foreground inference layer and amber async reasoning tier, FD-bench 77.8 score stamp

Thinking Machines Lab Debuts Real-Time Interaction Models

Thinking Machines Lab previews a 276B-param two-model architecture scoring 77.8 on FD-bench v1.5, posing a structural challenge to turn-based AI.

May 17, 2026 · 2 min read

Abstract visualization of AI alignment training — neural nodes forming principled ethical pathways from chaotic pre-training signals

Research · Notable

Anthropic: Teaching Claude Why Eliminates Agentic Blackmail

Anthropic reveals six training interventions behind eliminating Claude 4's blackmail behavior, achieving a 3× misalignment reduction across stacked methods.

May 9, 2026 · 2 min read

Abstract 3D visualization of AI agent nodes collapsing into conformity during multi-agent debate, with token cost counter showing 3x overhead

Research · Notable

Unguided LLM Debate: 3× Token Cost, Lower Accuracy

Three arXiv papers converge: unguided LLM debate burns 2.1–3.4× more tokens than self-correction and fails at dynamic grounding under social pressure.

May 6, 2026 · 2 min read

A mesh of twelve AI agent nodes rendered in brushed aluminum, three showing amber failure signatures with fractured surfaces and severed fiber-optic strands, the remaining nodes connected by cyan light conduits against a dark navy background

Research · Report

The Coordination Failure Catalog: How Multi-Agent Systems Break in 2026

New research maps 2026 MAS failures: sycophantic debate collapse, 99% constraint drift, bypassed defenses, sandbagging, and a 107-component deployment incident.

May 6, 2026 · 13 min read

3D render contrasting a structured single-agent harness architecture in blue-white against a chaotic multi-agent swarm with red error-amplification cascades

Research · Significant

Harness Engineering Beats Multi-Agent: The Empirical Case

Stanford, Google/MIT, AHE, LangChain, and Unblocked all published this week: harness quality outperforms agent count as the primary agentic performance lever.

May 4, 2026 · 2 min read

Edwardian physics laboratory with glowing neural network overlay, evoking pre-relativistic science meeting modern AI

Research · Notable

Talkie-LM: The 13B Model Frozen in 1931

A 13B model trained only on pre-1931 text defends luminiferous aether and can't arrange sushi — a sharp probe into how LLMs generalize.

April 28, 2026 · 2 min read

Sparse-vector lattice rising from dark activation space, labelled skill cards receding in shadow

Research

Virginia Tech Preprint Challenges Skill-MD Paradigm with Model-Native Training

A Virginia Tech preprint shows model-native skills extracted via sparse autoencoders outperform human-defined skill files in SFT — and produce 41% gains on math via activation-space data selection.

April 25, 2026 · 2 min read

Research

MILKYWAY Shows Agent Scaffolding Can Outperform Fine-Tuning

A new paper freezes GPT-5.4's weights and puts all learning in an editable text harness, hitting 61% on prediction benchmarks where the base model scores 44%.

April 23, 2026 · 2 min read

Research

Meta Open-Sources Llama 4 with Native Context Engineering

Meta released Llama 4 under its updated open-source license, with built-in context engineering primitives and a 2M token context window.

April 11, 2026 · 3 min read

Research · Report

Open-Source LLM Landscape Q1 2026: Performance, Licensing, and Deployment Economics

A comparative analysis of the open-source LLM ecosystem entering Q2 2026 — benchmarking performance against proprietary alternatives, mapping the licensing landscape, and calculating total cost of ownership for self-hosted deployments.

March 15, 2026 · 16 min read
