Mamba and S4 Explained: Architecture, Parallel Scan, Kernel Fusion, Recurrent, Convolution, Math
Umar Jamil
@umarjamilai
About
I'm a Machine Learning Engineer from Milan, Italy, teaching complex deep learning and machine learning concepts to my cat, 奥利奥. I also speak Chinese.
Video Description
Explanation of the paper "Mamba: Linear-Time Sequence Modeling with Selective State Spaces".

In this video I will explain Mamba, a new sequence modeling architecture that can compete with the Transformer. I will start by introducing the main sequence modeling architectures (RNN, CNN and Transformer) and then dive deep into State Space Models. Fully understanding State Space Models requires some background in differential equations, so I will give a brief introduction to differential equations (in 5 minutes!) and then derive the recurrent formula and the convolutional formula from first principles. I will also prove mathematically (with the help of visual diagrams) why State Space Models can be run as a convolution. I will explain what the HIPPO matrix is and how it helps the model "memorize" the input history in a finite state.

In the second part of the video, I will explore Mamba and in particular the Selective Scan algorithm. I will first explain what the scan operation is and how it can be parallelized, and then show how the authors further improved the algorithm with kernel fusion and activation recomputation. I will also give a brief lesson on the GPU memory hierarchy and why some operations may be IO-bound. In the last part of the video, we will look at the architecture of Mamba and some performance results that compare it with the Transformer.
Slides PDF and Parallel Scan (Excel file): https://github.com/hkproj/mamba-notes

Chapters
00:00:00 - Introduction
00:01:46 - Sequence modeling
00:07:12 - Differential equations (basics)
00:11:38 - State Space Models
00:13:53 - Discretization
00:23:08 - Recurrent computation
00:26:32 - Convolutional computation
00:34:18 - Skip connection term
00:35:21 - Multidimensional SSM
00:37:44 - The HIPPO theory
00:43:30 - The motivation behind Mamba
00:46:56 - Selective Scan algorithm
00:51:34 - The Scan operation
00:54:24 - Parallel Scan
00:57:20 - Innovations in Selective Scan
00:58:00 - GPU Memory Hierarchy
01:01:23 - Kernel Fusion
01:01:48 - Activations recomputation
01:06:48 - Mamba architecture
01:10:18 - Performance considerations
01:12:54 - Conclusion
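
The description says that a State Space Model can be computed as a recurrence, as a convolution, or with a scan. The sketch below is a minimal illustration of that equivalence, not the code from the video or from the Mamba repository. It assumes a one-dimensional state and already-discretized, time-invariant parameters; the names `a_bar`, `b_bar`, `c` and all function names are hypothetical, chosen only for this example.

```python
from itertools import accumulate


def ssm_recurrent(a_bar, b_bar, c, xs):
    """Recurrent form: h_t = a_bar * h_{t-1} + b_bar * x_t, y_t = c * h_t, with h_{-1} = 0."""
    h = 0.0
    ys = []
    for x in xs:
        h = a_bar * h + b_bar * x
        ys.append(c * h)
    return ys


def ssm_kernel(a_bar, b_bar, c, length):
    """Convolution kernel obtained by unrolling the recurrence: K_k = c * a_bar**k * b_bar."""
    return [c * (a_bar ** k) * b_bar for k in range(length)]


def ssm_convolutional(a_bar, b_bar, c, xs):
    """Convolutional form: y_t = sum over k of K_k * x_{t-k} (causal, so k runs from 0 to t)."""
    kernel = ssm_kernel(a_bar, b_bar, c, len(xs))
    return [
        sum(kernel[k] * xs[t - k] for k in range(t + 1))
        for t in range(len(xs))
    ]


def combine(left, right):
    """Compose two affine updates h -> a*h + b, applying `left` first and `right` second."""
    a1, b1 = left
    a2, b2 = right
    return (a2 * a1, a2 * b1 + b2)


def ssm_scan(a_bar, b_bar, c, xs):
    """Scan form: a prefix scan over (a, b) pairs using the associative `combine`.

    `accumulate` evaluates the scan sequentially here. Because `combine` is
    associative, the same prefixes could instead be computed in a tree-shaped,
    parallel order, which is the idea behind a parallel scan.
    """
    pairs = [(a_bar, b_bar * x) for x in xs]
    prefixes = accumulate(pairs, combine)
    # With h_{-1} = 0, the state h_t is the offset component of each prefix.
    return [c * b for _, b in prefixes]
```

For example, with `a_bar = 0.5`, `b_bar = 1.0`, `c = 2.0` and the input `[1.0, 2.0, 3.0, 4.0]`, all three functions return `[2.0, 5.0, 8.5, 12.25]`. The convolutional form only applies when the parameters do not depend on the input; once they vary per time step, as in a selective model, the pairs passed to the scan would differ at each position and a fixed kernel no longer exists.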