Attention in transformers, step-by-step | Deep Learning Chapter 6
3Blue1Brown
@3blue1brownAbout
My name is Grant Sanderson. Videos here cover a variety of topics in math, or adjacent fields like physics and CS, all with an emphasis on visualizing the core ideas. The goal is to use animation to help elucidate and motivate otherwise tricky topics, and for difficult problems to be made simple with changes in perspective. For more information, other projects, FAQs, and inquiries see the website: https://www.3blue1brown.com
Latest Posts
Video Description
Demystifying attention, the key mechanism inside transformers and LLMs. Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support Special thanks to these supporters: https://www.3blue1brown.com/lessons/attention#thanks An equally valuable form of support is to simply share the videos. Demystifying self-attention, multiple heads, and cross-attention. Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support The first pass for the translated subtitles here is machine-generated and, therefore, notably imperfect. To contribute edits or fixes, visit https://www.criblate.com Звуковая дорожка на русском языке: Влад Бурмистров. ------------------ Here are a few other relevant resources Build a GPT from scratch, by Andrej Karpathy https://youtu.be/kCc8FmEb1nY If you want a conceptual understanding of language models from the ground up, @vcubingx just started a short series of videos on the topic: https://youtu.be/1il-s4mgNdI?si=XaVxj6bsdy3VkgEX If you're interested in the herculean task of interpreting what these large networks might actually be doing, the Transformer Circuits posts by Anthropic are great. In particular, it was only after reading one of these that I started thinking of the combination of the value and output matrices as being a combined low-rank map from the embedding space to itself, which, at least in my mind, made things much clearer than other sources. https://transformer-circuits.pub/2021/framework/index.html Site with exercises related to ML programming and GPTs https://www.gptandchill.ai/codingproblems History of language models by Brit Cruise, @ArtOfTheProblem https://youtu.be/OFS90-FX6pg An early paper on how directions in embedding spaces have meaning: https://arxiv.org/pdf/1301.3781.pdf ------------------ Timestamps: 0:00 - Recap on embeddings 1:39 - Motivating examples 4:29 - The attention pattern 11:08 - Masking 12:42 - Context size 13:10 - Values 15:44 - Counting parameters 18:21 - Cross-attention 19:19 - Multiple heads 22:16 - The output matrix 23:19 - Going deeper 24:54 - Ending ------------------ These animations are largely made using a custom Python library, manim. See the FAQ comments here: https://3b1b.co/faq#manim https://github.com/3b1b/manim https://github.com/ManimCommunity/manim/ All code for specific videos is visible here: https://github.com/3b1b/videos/ The music is by Vincent Rubinetti. https://www.vincentrubinetti.com https://vincerubinetti.bandcamp.com/album/the-music-of-3blue1brown https://open.spotify.com/album/1dVyjwS8FBqXhRunaG5W5u ------------------ 3blue1brown is a channel about animating math, in all senses of the word animate. If you're reading the bottom of a video description, I'm guessing you're more interested than the average viewer in lessons here. It would mean a lot to me if you chose to stay up to date on new ones, either by subscribing here on YouTube or otherwise following on whichever platform below you check most regularly. Mailing list: https://3blue1brown.substack.com Twitter: https://twitter.com/3blue1brown Instagram: https://www.instagram.com/3blue1brown Reddit: https://www.reddit.com/r/3blue1brown Facebook: https://www.facebook.com/3blue1brown Patreon: https://patreon.com/3blue1brown Website: https://www.3blue1brown.com
Essential Tools for Deep Learning
AI-recommended products based on this video

AocBook 15.6'' FHD Laptop, Intel N95, Nvidia GTX 1060 4GB, 32GB DDR4 RAM, M.2 SSD, Sleek Notebook with Type-C, HDMI, RJ45 Ethernet, Backlit Keyboard, Fingerprint (32GB DDR4 | 1TB SSD)

Google Pixel Buds Pro 2 - Noise Canceling Earbuds - Up to 31 Hour Battery Life with Charging Case - Bluetooth Headphones - Compatible with Android - Hazel

Deeyaple USB C to Aux, 4FT/1.2M, Type C to 3.5mm Audio Cable Headphone Jack Cable for Car Mobile Phone, iPhone 16 15, iPad Pro, Samsung Galaxy S24 S23 S2010, Google Pixel,Oneplus Grey (1)

Car Carplay Woven Cable for iPhone 16 15 3.3FT USB A to USB C 3.2 Gen 2 Carplay Adapter Wire for iPhone 16 15 Pro Max, iPad Pro/Air, Samsung Galaxy S25/S24/S23/S22/S21 Google Pixel, Car Charger Cable

Skytech Archangel Gaming PC Desktop – AMD Ryzen 5 3600 3.6 GHz, NVIDIA RTX 3060, 1TB NVME SSD, 16GB DDR4 RAM 3200, 600W Gold PSU, 11AC Wi-Fi, Windows 11 Home 64-bit

Skytech Blaze 3.0 Gaming PC Desktop – Intel Core i5 12400F 2.5 GHz, NVIDIA RTX 3060, 500GB NVME SSD, 16GB DDR4 RAM 3200, 600W Gold PSU, 11AC Wi-Fi, Windows 11 Home 64-bit

NVIDIA RTX A2000 - Graphics Card - RTX A2000-6 GB GDDR6 - PCIe 4.0 x16-4 x Mini DisplayPort

NVIDIA RTX A4500

10.1 Inch Touch Portable Monitor IPS Screen 1366x768P 60Hz 400 Brightness 99% sRGB HDMI USB-C Monitors Switch for Xbox PS3/4/5 Laptop Compatible with Raspberry Pi, Mini Touch Screen




















