NanoGPT-Bench: Coding Agents Recover Only 9.3% of Human AI R&D Progress
NanoGPT-Bench finds that coding agents, including Codex and Claude Code, recover just 9.3% of human AI R&D progress: they tune hyperparameters but miss algorithmic research breakthroughs.
Aleph (Logic International) aces PutnamBench, VeriSoftBench, and Verina. Fully autonomous formal verification agent achieves new SOTA across all major theorem benchmarks.
New study: expert persona prompting ('you are a physicist') no longer improves accuracy on frontier models. Baseline capability has rendered the technique obsolete — update your prompt libraries.
Yann LeCun left Meta in late 2025 and founded Paris-based AMI Labs, valued at $3.5B pre-launch, with a mission focused on world models over LLMs.
OpenAI's GPT-5.4 Pro contributed to solving an Erdős problem open for 60 years — a concrete milestone for AI-assisted frontier mathematics research.