But how do AI images and videos actually work? | Guest video by Welch Labs

3Blue1Brown July 25, 2025

3Blue1Brown

@3blue1brown

About

My name is Grant Sanderson. Videos here cover a variety of topics in math, or adjacent fields like physics and CS, all with an emphasis on visualizing the core ideas. The goal is to use animation to help elucidate and motivate otherwise tricky topics, and for difficult problems to be made simple with changes in perspective. For more information, other projects, FAQs, and inquiries see the website: https://www.3blue1brown.com

Video Description

Diffusion models, CLIP, and the math of turning text into images

Welch Labs Book: https://www.welchlabs.com/resources/imaginary-numbers-book

Sections
0:00 - Intro
3:37 - CLIP
6:25 - Shared Embedding Space
8:16 - Diffusion Models & DDPM
11:44 - Learning Vector Fields
22:00 - DDIM
25:25 - Dall E 2
26:37 - Conditioning
30:02 - Guidance
33:39 - Negative Prompts
34:27 - Outro
35:32 - About guest videos

Special Thanks to:
Jonathan Ho - Jonathan is the author of the DDPM paper and the Classifier-Free Guidance paper.
https://arxiv.org/pdf/2006.11239
https://arxiv.org/pdf/2207.12598
Preetum Nakkiran - Preetum has an excellent introductory diffusion tutorial: https://arxiv.org/pdf/2406.08929
Chenyang Yuan - Many of the animations in this video were implemented using manim and Chenyang’s smalldiffusion library: https://github.com/yuanchenyang/smalldiffusion
Chenyang also has a terrific tutorial and MIT course on diffusion models:
https://www.chenyang.co/diffusion.html
https://www.practical-diffusion.org/

Other References
All of Sander Dieleman’s diffusion blog posts are fantastic: https://sander.ai/
CLIP Paper: https://arxiv.org/pdf/2103.00020
DDIM Paper: https://arxiv.org/pdf/2010.02502
Score-Based Generative Modeling: https://arxiv.org/pdf/2011.13456
Wan2.1: https://github.com/Wan-Video/Wan2.1
Stable Diffusion: https://huggingface.co/stabilityai/stable-diffusion-2
Midjourney: https://www.midjourney.com/
Veo: https://deepmind.google/models/veo/
DallE 2 paper: https://cdn.openai.com/papers/dall-e-2.pdf

Code for this video: https://github.com/stephencwelch/manim_videos/tree/master/_2025/sora

Written by: Stephen Welch, with very helpful feedback from Grant Sanderson
Produced by: Stephen Welch, Sam Baskin, and Pranav Gundu

Technical Notes
The noise videos in the opening have been passed through a VAE (the diffusion process actually happens in a compressed “latent” space), which acts very much like a video compressor. This is why the noise videos don’t look like pure salt and pepper.
6:15 CLIP: Although directly minimizing cosine similarity would push our vectors 180 degrees apart on a single batch, in practice CLIP needs to maximize the uniformity of concepts over the hypersphere it's operating on. For this reason, we animated these vectors as orthogonal-ish. See: https://proceedings.mlr.press/v119/wang20k/wang20k.pdf

Per Chenyang Yuan: at 10:15, the blurry image that results when removing random noise in DDPM is probably due to a mismatch in noise levels when calling the denoiser. When the denoiser is called on x_{t-1} during DDPM sampling, it expects x_{t-1} to have a certain noise level (let's call it sigma_{t-1}). If you generate x_{t-1} from x_t without adding noise, then the noise present in x_{t-1} is always smaller than sigma_{t-1}. This causes the denoiser to remove too much noise, thus pointing towards the mean of the dataset.

The text conditioning input to Stable Diffusion is not the 512-dim text embedding vector, but the output of the layer before that, [with dimension 77x512](https://stackoverflow.com/a/79243065)

For the vectors at 31:40: some implementations use f(x, t, cat) + alpha(f(x, t, cat) - f(x, t)), and others use f(x, t) + alpha(f(x, t, cat) - f(x, t)), where an alpha value of 1 corresponds to no guidance. I chose the second format here to keep things simpler.

At 30:30, the unconditional t=1 vector field looks a bit different from how it looked at the 17:15 mark. This is because different models were trained for different parts of the video, and the difference is likely a result of different random initializations.

Premium Beat Music ID: EEDYZ3FP44YX8OWT
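A minimal sketch of the two guidance conventions from the 31:40 note, assuming the model outputs are plain lists of numbers. The function names and the example values are hypothetical and are not taken from the video's code. Algebraically, the first form with scale alpha equals the second form with scale alpha + 1, so "no guidance" is alpha = 0 in the first form and alpha = 1 in the second.

```python
def guide_from_conditional(f_cond, f_uncond, alpha):
    """First convention: f(x, t, cat) + alpha * (f(x, t, cat) - f(x, t)).

    alpha = 0 returns the conditional output unchanged.
    """
    return [c + alpha * (c - u) for c, u in zip(f_cond, f_uncond)]


def guide_from_unconditional(f_cond, f_uncond, alpha):
    """Second convention: f(x, t) + alpha * (f(x, t, cat) - f(x, t)).

    alpha = 1 returns the conditional output unchanged.
    """
    return [u + alpha * (c - u) for c, u in zip(f_cond, f_uncond)]


# Hypothetical 2D model outputs at a single point (x, t).
f_cond = [1.0, 2.0]    # stands in for f(x, t, cat)
f_uncond = [0.5, 0.5]  # stands in for f(x, t)

# Scale 2 in the first form matches scale 3 in the second form.
first = guide_from_conditional(f_cond, f_uncond, 2.0)
second = guide_from_unconditional(f_cond, f_uncond, 3.0)
print(first, second)
```

Under this reading, a guidance value quoted for one implementation has to be shifted by 1 before it is reused in an implementation that follows the other convention.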
