Mistral / Mixtral Explained: Sliding Window Attention, Sparse Mixture of Experts, Rolling Buffer

Umar Jamil November 25, 2024

About

I'm a Machine Learning Engineer from Milan, Italy, teaching complex deep learning and machine learning concepts to my cat, 奥利奥 (Oreo). I also speak Chinese.

