What are Mixture-of-Experts Models | ft. Aritra

Hugging Face

In this clip, Aritra Roy Gosthipaty from the Hugging Face Transformers team breaks down one of the most important (and often misunderstood) architectures in modern AI: Mixture-of-Experts models.

The main MoE explainer: what these models are, why they became mainstream, and why the ecosystem shifted around them.
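
As context for the clip, here is a minimal, illustrative sketch of the core idea (not code from the video; the layer sizes and names below are made up for illustration): a learned router scores each token against a set of expert feed-forward networks and routes it to only the top-k of them, so the model carries many parameters but activates just a fraction per token.

```python
# Illustrative sparse MoE layer with top-2 routing, in the spirit of
# Shazeer et al. (2017) and Mixtral. All dimensions are arbitrary.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each expert is an ordinary feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        # The router scores every token against every expert.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x):  # x: (num_tokens, d_model)
        logits = self.router(x)                                # (tokens, experts)
        weights, indices = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                   # renormalize over the top-k
        out = torch.zeros_like(x)
        # Only the selected experts run for each token: the layer holds
        # num_experts expert blocks, but each token activates just top_k of them.
        for e, expert in enumerate(self.experts):
            token_idx, slot = torch.where(indices == e)
            if token_idx.numel() > 0:
                out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * expert(x[token_idx])
        return out

tokens = torch.randn(10, 64)
print(MoELayer()(tokens).shape)  # torch.Size([10, 64])
```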

Chapters:
- 00:00 Why Mixture-of-Experts Models Matter
- 00:14 Mixture-of-Experts Layers
- 01:07 vLLM and Serving Stacks
- 01:51 DeepSeek-V2
- 02:55 Mixtral 8x7B
- 03:20 Switch Transformers
- 04:25 Inference Providers
- 05:12 Unsloth Kernels

Sources mentioned:
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer — https://arxiv.org/abs/1701.06538
- vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention — https://arxiv.org/abs/2309.06180
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model — https://arxiv.org/abs/2405.04434
- Mixtral of Experts — https://arxiv.org/abs/2401.04088
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity — https://arxiv.org/abs/2101.03961
- Inference Providers — https://huggingface.co/docs/inference-providers/index
- Unsloth Docs — https://unsloth.ai/docs

Listen to the full podcast on Spotify: https://open.spotify.com/show/2BWAr3zLa2xhUqoHlg8DAD?si=-nXiwfyyQfaowCqb58Ig-w

Watch the full conversation on YouTube: https://youtu.be/O3Ul6H20pLI