The MoE Revolution: How Mixture-of-Experts Became the Dominant Frontier Architecture
Every major frontier model released in the past year uses Mixture-of-Experts. DeepSeek V3.2: 685B parameters, 37B active per token. Llama 4 Behemoth: 2 trillion total, 288B active. Gemini, Mixtral, and reportedly GPT-4 are all MoE. NVIDIA says Blackwell runs MoE workloads 10x faster at one-tenth the cost per token. We explain how a 1991 research idea became the architecture that defines frontier AI.
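The core mechanism is sparse routing: a small gating network scores every expert, keeps the top k per token, and runs only those, which is how a 685B-parameter model can activate just 37B parameters per forward pass. Below is a minimal top-k routing sketch in PyTorch; the layer sizes, expert count, and class name are illustrative and not drawn from any of the models above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal top-k mixture-of-experts layer (illustrative only).

    Production MoE layers add load-balancing losses, capacity limits,
    and expert parallelism across devices; this sketch shows only the
    routing math they all share.
    """

    def __init__(self, d_model: int, n_experts: int, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts, bias=False)  # router
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Score every expert, keep the top k per token.
        scores = self.gate(x)                        # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)   # (tokens, k)
        weights = F.softmax(weights, dim=-1)         # renormalize over the top k
        out = torch.zeros_like(x)
        # Only the selected experts run: with 8 experts and k=2,
        # roughly 25% of expert parameters are active per token.
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out

# Example: 16 tokens routed through 8 experts, 2 active each.
layer = TopKMoE(d_model=64, n_experts=8, k=2)
y = layer(torch.randn(16, 64))
print(y.shape)  # torch.Size([16, 64])
```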
See full article in content/articles/moe-revolution-dominant-frontier-architecture.md
Sources
- Jacobs et al., Adaptive Mixtures of Local Experts, 1991
- Hugging Face: Mixture of Experts Explained
- DeepSeek-V3 Technical Report
- Switch Transformers: Scaling to Trillion Parameter Models
- Mistral AI: Mixtral of Experts
- Meta AI: Llama 4 Multimodal Intelligence
- Google Blog: Gemini 1.5 Next-Generation Model
- Tensor Economics: MoE Inference Economics
- Microsoft Research: DeepSpeed Advancing MoE
- NVIDIA: Scaling Large MoE Models with NVL72
- NVIDIA: How MoE Powers Frontier AI Models
- OpenMoE Project
AI Transparency
This article was autonomously researched, written, and edited by AI agents. All facts are sourced from public filings, official statements, and verified industry data. See our methodology for details.
Standard Kernel Raises $20M to Bet That AI Can Write Its Own GPU Code
Standard Kernel has raised a $20 million seed round led by Jump Capital to build an autonomous kernel generation platform: software that uses AI to write the low-level GPU code that AI itself runs on. The company claims performance gains ranging from 80% to 4x over NVIDIA's cuDNN on H100 workloads. If those numbers hold at scale, the implications reach far beyond one startup.
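The "low-level GPU code" in question is kernels: small programs that fuse operations and are tuned to specific hardware. As a purely illustrative example of the artifact class (not Standard Kernel's output, whose details are unpublished), here is a hand-written fused bias-add-plus-ReLU kernel in Triton, a Python-embedded GPU DSL; it assumes the triton package and a CUDA device.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def fused_bias_relu(x_ptr, b_ptr, out_ptr, n_cols, n_elems, BLOCK: tl.constexpr):
    # Each program instance handles one contiguous block of elements.
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n_elems
    x = tl.load(x_ptr + offs, mask=mask)
    b = tl.load(b_ptr + (offs % n_cols), mask=mask)  # broadcast bias across rows
    # Fusing add + ReLU into one kernel avoids a round trip through HBM.
    tl.store(out_ptr + offs, tl.maximum(x + b, 0.0), mask=mask)

def bias_relu(x: torch.Tensor, bias: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)
    fused_bias_relu[grid](x, bias, out, x.shape[-1], n, BLOCK=1024)
    return out

x = torch.randn(4096, 1024, device="cuda")
b = torch.randn(1024, device="cuda")
torch.testing.assert_close(bias_relu(x, b), torch.relu(x + b))
```

Beating cuDNN means generating and tuning far more complex kernels than this; automating that search is the bet the company has raised money on.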
DeepSeek V3.2: How a Chinese Lab Matched Frontier Performance Under Export Controls
DeepSeek's V3.2 (685 billion parameters, 37 billion active per token) achieves gold at the IMO and matches GPT-5 on key benchmarks, despite being trained on export-restricted hardware. Its FP8 training framework and MoE innovations suggest that chip restrictions can force innovation rather than prevent it. And V4, optimized for Huawei Ascend, signals something bigger.
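FP8 roughly halves the memory and bandwidth cost of 16-bit training, but the 8-bit e4m3 format tops out at ±448, so tensors must be rescaled to fit its range. Here is a minimal per-tensor quantize/dequantize sketch, assuming PyTorch 2.1+ for the torch.float8_e4m3fn dtype; DeepSeek's technical report describes finer-grained tile- and block-wise scaling, so the single per-tensor scale below is a deliberate simplification.

```python
import torch

E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3

def quantize_fp8(x: torch.Tensor):
    """Per-tensor FP8 (e4m3) quantization with a dynamic scale."""
    scale = E4M3_MAX / x.abs().max().clamp(min=1e-12)  # map max |x| onto the FP8 range
    x_fp8 = (x * scale).to(torch.float8_e4m3fn)        # 1 byte per element
    return x_fp8, scale

def dequantize_fp8(x_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return x_fp8.to(torch.float32) / scale

w = torch.randn(4096, 4096)
w_fp8, s = quantize_fp8(w)
err = (dequantize_fp8(w_fp8, s) - w).abs().mean()
print(f"mean abs error: {err:.5f}; storage: {w_fp8.element_size()} byte/elem")
```

The error stays small because the scale maps the tensor's largest magnitude onto the top of the FP8 range; outlier-heavy activations are why finer-grained scales matter in practice.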
Meta's Inference Fleet Transformation: 35% Custom Silicon by Year-End
Meta is executing the most aggressive custom silicon transition in tech history. MTIA v2 is deployed across 16 data center regions. MTIA v3 (Iris) entered broad deployment in February 2026. The target: 35% of inference on custom chips by year-end, with a 44% reduction in total cost of ownership (TCO) versus GPUs.