What are Mixture-of-Experts Models | ft. Aritra

Hugging Face

In this clip, Aritra Roy Gosthipaty from the Hugging Face Transformers team breaks down one of the most important (and often misunderstood) architectures in modern AI: Mixture-of-Experts models.

The main MoE explainer: what these models are, why they became mainstream, and why the ecosystem shifted around them.
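
As context for the clip, here is a minimal, illustrative sketch of the core idea (not code from the video; the layer sizes and names below are made up for illustration): a learned router scores each token against a set of expert feed-forward networks and routes it to only the top-k of them, so the model carries many parameters but activates just a fraction per token.

```python
# Illustrative sparse MoE layer with top-2 routing, in the spirit of
# Shazeer et al. (2017) and Mixtral. All dimensions are arbitrary.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each expert is an ordinary feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        # The router scores every token against every expert.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x):  # x: (num_tokens, d_model)
        logits = self.router(x)                                # (tokens, experts)
        weights, indices = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                   # renormalize over the top-k
        out = torch.zeros_like(x)
        # Only the selected experts run for each token: the layer holds
        # num_experts expert blocks, but each token activates just top_k of them.
        for e, expert in enumerate(self.experts):
            token_idx, slot = torch.where(indices == e)
            if token_idx.numel() > 0:
                out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * expert(x[token_idx])
        return out

tokens = torch.randn(10, 64)
print(MoELayer()(tokens).shape)  # torch.Size([10, 64])
```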

Chapters:
- 00:00 Why Mixture-of-Experts Models Matter
- 00:14 Mixture-of-Experts Layers
- 01:07 vLLM and Serving Stacks
- 01:51 DeepSeek-V2
- 02:55 Mixtral 8x7B
- 03:20 Switch Transformers
- 04:25 Inference Providers
- 05:12 Unsloth Kernels

Sources mentioned:
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer — https://arxiv.org/abs/1701.06538
- vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention — https://arxiv.org/abs/2309.06180
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model — https://arxiv.org/abs/2405.04434
- Mixtral of Experts — https://arxiv.org/abs/2401.04088
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity — https://arxiv.org/abs/2101.03961
- Inference Providers — https://huggingface.co/docs/inference-providers/index
- Unsloth Docs — https://unsloth.ai/docs

Listen to the full podcast on Spotify: https://open.spotify.com/show/2BWAr3zLa2xhUqoHlg8DAD?si=-nXiwfyyQfaowCqb58Ig-w

Watch the full conversation on YouTube: https://youtu.be/O3Ul6H20pLI