Stanford CS25: V4 I Demystifying Mixtral of Experts

Stanford Online May 16, 2024


Video Description

April 25, 2024
Speaker: Albert Jiang, Mistral AI / University of Cambridge

Demystifying Mixtral of Experts

In this talk I will introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every token, at each layer, a router network selects two experts to process the current state and combines their outputs. Even though each token only sees two experts, the selected experts can be different at each timestep. As a result, each token has access to 47B parameters, but only uses 13B active parameters during inference. I will go into the architectural details and analyse the expert routing decisions made by the model.

About the speaker: Albert Jiang is an AI scientist at Mistral AI, and a final-year PhD student at the computer science department of Cambridge University. He works on language model pretraining and reasoning at Mistral AI, and language models for mathematics at Cambridge.

More about the course can be found here: https://web.stanford.edu/class/cs25/
View the entire CS25 Transformers United playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rNiJRchCzutFw5ItR_Z27CM
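
The routing step described in the abstract (a router picks two of eight experts per token and combines their outputs) can be sketched roughly as follows. This is a toy illustration in plain Python, not Mistral AI's implementation: the names `make_expert` and `top2_moe`, the tiny dimension, and the random single-matrix "experts" are all hypothetical stand-ins, and normalising the gate weights with a softmax over only the two selected logits is an assumption the abstract itself does not spell out.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a short list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(dim, rng):
    # Hypothetical stand-in for a feedforward block: one random linear map.
    w = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

    def expert(x):
        return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]

    return expert


def top2_moe(x, gate_weights, experts, k=2):
    # Router: one logit per expert, computed as a dot product with the token state.
    logits = [sum(g * x_j for g, x_j in zip(row, x)) for row in gate_weights]
    # Keep the indices of the k highest-scoring experts.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Assumption: gate weights are a softmax over the selected logits only.
    weights = softmax([logits[i] for i in top])
    # Only the selected experts run; their outputs are combined by weighted sum.
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = experts[i](x)
        out = [o + w * y_j for o, y_j in zip(out, y)]
    return out, top, weights


rng = random.Random(0)
dim, n_experts = 4, 8
experts = [make_expert(dim, rng) for _ in range(n_experts)]
gate_weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_experts)]

token = [0.5, -1.0, 0.25, 2.0]
out, chosen, weights = top2_moe(token, gate_weights, experts)
print("experts used:", chosen, "gate weights:", weights)
```

Because only the two chosen experts are evaluated for a given token, per-token compute scales with the active parameters rather than with the full set of experts, which is the distinction the abstract draws between 47B accessible and 13B active parameters.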
