![Mixtral of Experts - Summary](/blog/content/images/size/w720/2024/04/2-1.png)
Mixtral of Experts - Summary
The paper introduces Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model that outperforms Llama 2 70B and GPT-3.5 across most benchmarks. At every layer, a router network selects two of eight experts for each token, giving the model access to 47B parameters while actively using only about 13B per token, enhancing inference efficiency.
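
To make the routing idea concrete, here is a minimal sketch of a top-2 sparse MoE feed-forward layer in PyTorch. It is not the authors' implementation: the layer sizes, class and variable names, and the plain-MLP experts are illustrative assumptions (Mixtral itself uses SwiGLU experts), but the pattern of scoring experts with a linear router, keeping the top two, and mixing their outputs with softmax weights follows the paper's description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    """Sketch of a sparse MoE feed-forward layer with top-2 routing (hypothetical sizes)."""
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router: a linear layer that produces one score per expert for each token
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Experts: independent feed-forward blocks (plain MLPs stand in for SwiGLU here)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        logits = self.router(x)                          # (tokens, n_experts)
        top_vals, top_idx = logits.topk(self.top_k, -1)  # keep the two best experts per token
        weights = F.softmax(top_vals, dim=-1)            # normalize over the selected experts only
        out = torch.zeros_like(x)
        # Each token's output is the weighted sum of its two selected experts;
        # only those experts run, so ~13B of 47B parameters are active per token in Mixtral.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: route 10 tokens of width 64 through 8 experts, 2 active per token
layer = Top2MoELayer()
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64])
```

The key design point is that the parameter count grows with the number of experts, but the compute per token only grows with the number of *selected* experts, which is why Mixtral can match much larger dense models at a fraction of the active parameters.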