7 articles

#local-inference

llama.cpp Merges Multi-Token Prediction: 78% Throughput Gain on Qwen3.6

llama.cpp merges MTP support, boosting Qwen3.6-27B throughput 78% on A10G with zero accuracy loss — no second model needed, just two CLI flags to activate.

May 20, 2026 · 1 min read

Tools · Breaking

Nous Hermes Agent v0.14.0: OAuth Proxy Turns Subscriptions Into Local API Endpoints

Nous Hermes Agent v0.14.0 exposes Claude Pro, ChatGPT Pro, and SuperGrok as local OpenAI-compatible endpoints, eliminating the pay-twice problem for subscription holders using coding agents.

May 20, 2026 · 1 min read

Technology · Breaking

Qwen3 35B MoE Distilled from Claude Opus Released Free as Quantized GGUF

Qwen3 35B MoE distilled from Claude Opus is available free as a quantized GGUF — near-frontier local inference capability at zero cost.

April 29, 2026 · 1 min read

Tools · Breaking

Shimmy v1.9.0: Single 4.8MB Binary Runs All GPU Backends for Local LLM Inference

Shimmy v1.9.0 is a 4.8MB single-binary, OpenAI-compatible local inference server that bundles all GPU backends and claims a 142x size advantage over Ollama.

April 29, 2026 · 1 min read

Technology · Breaking

DeepSeek V4 Flash on 2-bit GGUF: First Frontier-Quality Local Inference

Developers running DeepSeek V4 Flash with 2-bit selective GGUF via llama.cpp describe it as 'the first time I feel I have a frontier model running on my computer' — a milestone for local AI.

April 28, 2026 · 1 min read

Technology · Breaking

Intel Ships INT4 DeepSeek-V4 Pro and Flash via AutoRound — No MXFP4 Required

Intel releases W4A16 INT4 quantizations of DeepSeek-V4-Pro and Flash via AutoRound — no MXFP4 hardware required, expanding which hardware can self-host DeepSeek V4 at near-full quality.

April 28, 2026 · 1 min read

Technology · Breaking

Qwen3.6-27B Released: Strongest Dense Local Model Under Apache 2.0

Qwen3.6-27B drops quietly under Apache 2.0, with an AAII score of 46, optimization for M-series local inference, and strong agentic coding. It is billed as the best dense local model available.

April 27, 2026 · 1 min read
