Alibaba's AgenticQwen-30B (3B Active) Matches Qwen3-235B on Tool-Use
AgenticQwen-30B-A3B scores 50.2 avg matching Qwen3-235B on tool-use benchmarks. Dual RL flywheels flip the cost curve for production agents.
AgenticQwen-30B-A3B scores 50.2 avg matching Qwen3-235B on tool-use benchmarks. Dual RL flywheels flip the cost curve for production agents.
Alibaba Happy Horse tops Artificial Analysis video leaderboard but fails independent real-world tests for physics and prompt fidelity — a benchmark contamination signal.
Qwen3.6-27B (Apache 2.0) beats Qwen3.5-397B on coding benchmarks, validating that targeted training recipe outperforms brute parameter scale for agentic coding tasks.
Qwen3.6-27B (Apache 2.0) claims to outperform the 397B Qwen3.5 MoE and Claude Opus 4.5 on coding benchmarks, running locally on 18GB RAM.
Alibaba's Apache 2.0 27B model outperforms Qwen3.5-397B-A17B on all major coding tasks and runs locally on 18 GB RAM — 'bye bye subscription era' claims are spreading.
Curated AI insights — sent when there's something worth your inbox.