NVIDIA RTX Spark: 128GB Unified Memory for On-Device AI
NVIDIA RTX Spark brings ~1 PFLOP FP4 and up to 128GB unified memory to laptops and desktops — enough to run frontier-sized open models on-device. Shipping fall 2026.
Google Magenta RealTime 2: open-weights real-time music at ~200ms latency, text/audio/MIDI control, 2.4B params — runs on a MacBook without a GPU. Previous latency: ~3 seconds.
Google's Gemma 4 12B is encoder-free multimodal — text, audio, video, image — in 16GB VRAM under Apache 2.0. Day-0 in Transformers, llama.cpp, MLX, and Red Hat OpenShift.
Apple plans to use WWDC 2026 to showcase on-device AI via custom silicon and a distilled Gemini model — framing local inference as its competitive edge over cloud-dependent rivals.
Google shipped the Chrome on-device Prompt API over objections from W3C, Mozilla, WebKit, and Microsoft, letting any site query Gemini Nano directly.

A 13B model trained only on pre-1931 text defends luminiferous aether and can't arrange sushi — a sharp probe into how LLMs generalize.
Apple puts two silicon engineers at the top, signaling a strategic pivot from cloud-AI velocity to on-device inference — a bet on winning a different race entirely.