Sparse Mixture of Experts - The transformer behind the most efficient LLMs (DeepSeek, Mixtral)
Video Description
In this video, we discuss Mixture of Experts Transformers - the backbone behind popular LLMs like DeepSeek V3, Mixtral 8x22B, and more. You will learn concepts like Dense MOEs, Sparse MOEs, Top-K Routing, Noisy Routing, Expert Capacity, Switch Transformers, Auxiliary load balancing losses, and many more. Everything is presented visually to help conceptualize what is going on, and code snippets are provided to make it more concrete!

Follow on Twitter: https://x.com/neural_avb

To support this channel, you can buy me a coffee at: https://ko-fi.com/neuralavb

Join the channel on Patreon to receive updates about the channel and get access to bonus content used in all my videos. You will get the slides, notebooks, code snippets, word docs, and animations that went into producing this video. Here is the link: https://www.patreon.com/NeuralBreakdownwithAVB

Visit AI Agent Store Page: https://aiagentstore.ai/?ref=avishek

#pytorch #transformers #deepseek

Videos and playlists you might like:
Attention to Transformers playlist: https://www.youtube.com/playlist?list=PLGXWtN1HUjPfq0MSqD5dX8V7Gx5ow4QYW
Guide to fine-tuning open source LLMs: https://youtu.be/bZcKYiwtw1I
Generative Language Modeling from scratch: https://youtu.be/s3OUzmUDdg8

References and additional links:
Sparse Mixture of Experts paper: https://arxiv.org/abs/1701.06538
Mixtral of Experts: https://arxiv.org/abs/2401.04088
DeepSeek V2: https://arxiv.org/abs/2405.04434
DeepSeek V3: https://arxiv.org/abs/2412.19437
Switch Transformers / Expert Capacity: https://arxiv.org/abs/2101.03961
A Blog post: https://brunomaga.github.io/Mixture-of-Experts
A visual guide: https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-mixture-of-experts
Survey paper: https://arxiv.org/pdf/2407.06204

Timestamps:
0:00 - Intro
1:52 - Mixture of Experts Intuition
4:53 - Transformers 101
9:20 - Dense MOEs
14:50 - Sparse MOEs
16:34 - Router Collapse and Top-K Routing
19:20 - Noisy TopK, Load Balancing
20:56 - Routing Analysis by Mixtral
22:30 - Auxiliary Losses & DeepSeek
24:05 - Expert Capacity
26:07 - 6 Points to Remember