Mamba: Linear-Time Sequence Modeling with Selective State Spaces (Paper Explained)

Yannic Kilcher December 24, 2023

Yannic Kilcher


About

I make videos about machine learning research papers, programming, issues of the AI community, and the broader impact of AI in society.

Twitter: https://twitter.com/ykilcher
Discord: https://ykilcher.com/discord
BitChute: https://www.bitchute.com/channel/yannic-kilcher
LinkedIn: https://www.linkedin.com/in/ykilcher
BiliBili: https://space.bilibili.com/2017636191

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Video Description

#mamba #s4 #ssm

OUTLINE:
0:00 - Introduction
0:45 - Transformers vs RNNs vs S4
6:10 - What are state space models?
12:30 - Selective State Space Models
17:55 - The Mamba architecture
22:20 - The SSM layer and forward propagation
31:15 - Utilizing GPU memory hierarchy
34:05 - Efficient computation via prefix sums / parallel scans
36:01 - Experimental results and comments
38:00 - A brief look at the code

Paper: https://arxiv.org/abs/2312.00752

Abstract:
Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models (SSMs) have been developed to address Transformers' computational inefficiency on long sequences, but they have not performed as well as attention on important modalities such as language. We identify that a key weakness of such models is their inability to perform content-based reasoning, and make several improvements. First, simply letting the SSM parameters be functions of the input addresses their weakness with discrete modalities, allowing the model to selectively propagate or forget information along the sequence length dimension depending on the current token. Second, even though this change prevents the use of efficient convolutions, we design a hardware-aware parallel algorithm in recurrent mode. We integrate these selective SSMs into a simplified end-to-end neural network architecture without attention or even MLP blocks (Mamba). Mamba enjoys fast inference (5× higher throughput than Transformers) and linear scaling in sequence length, and its performance improves on real data up to million-length sequences. As a general sequence model backbone, Mamba achieves state-of-the-art performance across several modalities such as language, audio, and genomics. On language modeling, our Mamba-3B model outperforms Transformers of the same size and matches Transformers twice its size, both in pretraining and downstream evaluation.

Authors: Albert Gu, Tri Dao

Links:
Homepage: https://ykilcher.com
Merch: https://ykilcher.com/merch
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://ykilcher.com/discord
LinkedIn: https://www.linkedin.com/in/ykilcher
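
As a rough illustration of two ideas from the abstract and outline (SSM parameters that depend on the input, and computing the recurrence via prefix sums), here is a minimal pure-Python sketch. It is not the paper's implementation: the scalar state, the function names (`selective_params`, `sequential_scan`, `scan_via_prefix`, `combine`) and the weights `w_delta`, `w_b`, `w_c` are all hypothetical simplifications chosen for this example. The recurrence assumed here is h_t = a_t * h_(t-1) + b_t * x_t with output y_t = c_t * h_t, where a_t, b_t and c_t are computed from the current input x_t.

```python
import math
from itertools import accumulate


def selective_params(x, w_delta=1.0, w_b=1.0, w_c=1.0):
    """Hypothetical input-dependent parameters for a 1-D state (illustration only)."""
    delta = math.log1p(math.exp(w_delta * x))  # softplus keeps the step size positive
    a = math.exp(-delta)  # decay in (0, 1): a larger step forgets more of the old state
    b = delta * w_b       # how strongly the current input is written into the state
    c = w_c * x           # input-dependent readout weight
    return a, b, c


def sequential_scan(xs):
    """Plain recurrence: h_t = a_t * h_{t-1} + b_t * x_t, y_t = c_t * h_t."""
    h = 0.0
    ys = []
    for x in xs:
        a, b, c = selective_params(x)
        h = a * h + b * x
        ys.append(c * h)
    return ys


def combine(left, right):
    """Associative operator on (decay, accumulated input) pairs."""
    a1, u1 = left
    a2, u2 = right
    return (a1 * a2, a2 * u1 + u2)


def scan_via_prefix(xs):
    """Same outputs, expressed as a prefix scan over `combine`."""
    params = [selective_params(x) for x in xs]
    elems = [(a, b * x) for (a, b, _), x in zip(params, xs)]
    # accumulate runs left to right; a parallel scan could regroup these
    # combinations in a tree because `combine` is associative.
    prefixes = accumulate(elems, combine)
    return [c * h for (_, h), (_, _, c) in zip(prefixes, params)]


if __name__ == "__main__":
    xs = [0.5, -1.0, 2.0, 0.0]
    print(sequential_scan(xs))
    print(scan_via_prefix(xs))
```

Both functions should produce the same outputs up to floating-point rounding. The point of the second form is that `combine` is associative, which is the property a parallel prefix scan relies on; `itertools.accumulate` itself still runs sequentially.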
