Attention in transformers, step-by-step | Deep Learning Chapter 6

3Blue1Brown April 7, 2024

3Blue1Brown

@3blue1brown

About

My name is Grant Sanderson. Videos here cover a variety of topics in math, or adjacent fields like physics and CS, all with an emphasis on visualizing the core ideas. The goal is to use animation to help elucidate and motivate otherwise tricky topics, and for difficult problems to be made simple with changes in perspective. For more information, other projects, FAQs, and inquiries see the website: https://www.3blue1brown.com

Video Description

Demystifying attention, the key mechanism inside transformers and LLMs. Covers self-attention, multiple heads, and cross-attention.

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support
Special thanks to these supporters: https://www.3blue1brown.com/lessons/attention#thanks
An equally valuable form of support is to simply share the videos.

The first pass for the translated subtitles here is machine-generated and, therefore, notably imperfect. To contribute edits or fixes, visit https://www.criblate.com

Russian audio track: Vlad Burmistrov.

------------------

Here are a few other relevant resources:

Build a GPT from scratch, by Andrej Karpathy
https://youtu.be/kCc8FmEb1nY

If you want a conceptual understanding of language models from the ground up, @vcubingx just started a short series of videos on the topic:
https://youtu.be/1il-s4mgNdI?si=XaVxj6bsdy3VkgEX

If you're interested in the herculean task of interpreting what these large networks might actually be doing, the Transformer Circuits posts by Anthropic are great. In particular, it was only after reading one of these that I started thinking of the combination of the value and output matrices as being a combined low-rank map from the embedding space to itself, which, at least in my mind, made things much clearer than other sources.
https://transformer-circuits.pub/2021/framework/index.html

Site with exercises related to ML programming and GPTs
https://www.gptandchill.ai/codingproblems

History of language models by Brit Cruise, @ArtOfTheProblem
https://youtu.be/OFS90-FX6pg

An early paper on how directions in embedding spaces have meaning:
https://arxiv.org/pdf/1301.3781.pdf

------------------

Timestamps:
0:00 - Recap on embeddings
1:39 - Motivating examples
4:29 - The attention pattern
11:08 - Masking
12:42 - Context size
13:10 - Values
15:44 - Counting parameters
18:21 - Cross-attention
19:19 - Multiple heads
22:16 - The output matrix
23:19 - Going deeper
24:54 - Ending

------------------

These animations are largely made using a custom Python library, manim. See the FAQ comments here:
https://3b1b.co/faq#manim
https://github.com/3b1b/manim
https://github.com/ManimCommunity/manim/

All code for specific videos is visible here:
https://github.com/3b1b/videos/

The music is by Vincent Rubinetti.
https://www.vincentrubinetti.com
https://vincerubinetti.bandcamp.com/album/the-music-of-3blue1brown
https://open.spotify.com/album/1dVyjwS8FBqXhRunaG5W5u

------------------

3blue1brown is a channel about animating math, in all senses of the word animate. If you're reading the bottom of a video description, I'm guessing you're more interested than the average viewer in lessons here. It would mean a lot to me if you chose to stay up to date on new ones, either by subscribing here on YouTube or otherwise following on whichever platform below you check most regularly.

Mailing list: https://3blue1brown.substack.com
Twitter: https://twitter.com/3blue1brown
Instagram: https://www.instagram.com/3blue1brown
Reddit: https://www.reddit.com/r/3blue1brown
Facebook: https://www.facebook.com/3blue1brown
Patreon: https://patreon.com/3blue1brown
Website: https://www.3blue1brown.com
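
------------------

The remark in the resources above, about treating the value and output matrices as one combined low-rank map from the embedding space to itself, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the names `W_V` and `W_O`, the toy sizes (embedding dimension 4, head dimension 2), and the matrix entries are hypothetical, and the larger sizes in `param_counts` (12288 and 128) are assumed GPT-3-like values rather than figures taken from this description.

```python
from fractions import Fraction


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rank(m):
    """Rank of a matrix via Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in row] for row in m]
    r = 0  # number of pivot rows found so far
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def param_counts(d_model, d_head):
    """Parameters in the factored form versus one full square matrix."""
    factored = 2 * d_model * d_head  # W_V plus W_O
    full = d_model * d_model         # a single unconstrained d_model x d_model map
    return factored, full


# Hypothetical toy matrices: W_V maps the embedding space down to the head
# space (d_head x d_model); W_O maps back up (d_model x d_head).
W_V = [[1, 0, 2, -1],
       [0, 1, 1, 3]]
W_O = [[1, 2],
       [0, 1],
       [3, 0],
       [1, 1]]

# The combined map sends embedding vectors to embedding vectors,
# but its rank cannot exceed the head dimension.
combined = matmul(W_O, W_V)

print(len(combined), "x", len(combined[0]), "matrix with rank", rank(combined))
print(param_counts(12288, 128))  # assumed GPT-3-like sizes
```

The point of the factored form is that the square map is constrained to a rank no larger than the head dimension, which needs far fewer parameters than a full square matrix of the same shape.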
