Building makemore Part 2: MLP

Andrej Karpathy, September 12, 2022

Video Description

We implement a multilayer perceptron (MLP) character-level language model. In this video we also introduce many basics of machine learning (e.g. model training, learning rate tuning, hyperparameters, evaluation, train/dev/test splits, under/overfitting, etc.).

Links:
- makemore on GitHub: https://github.com/karpathy/makemore
- Jupyter notebook I built in this video: https://github.com/karpathy/nn-zero-to-hero/blob/master/lectures/makemore/makemore_part2_mlp.ipynb
- Colab notebook (new): https://colab.research.google.com/drive/1YIfmkftLrz6MPTOO9Vwqrop2Q5llHIGK?usp=sharing
- Bengio et al. 2003 MLP language model paper (PDF): https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf
- my website: https://karpathy.ai
- my Twitter: https://twitter.com/karpathy
- (new) Neural Networks: Zero to Hero series Discord channel: https://discord.gg/3zy8kqD9Cp, for people who'd like to chat more and go beyond YouTube comments

Useful links:
- PyTorch internals ref: http://blog.ezyang.com/2019/05/pytorch-internals/

Exercises:
- E01: Tune the hyperparameters of the training to beat my best validation loss of 2.2.
- E02: I was not careful with the initialization of the network in this video. (1) What is the loss you'd get if the predicted probabilities at initialization were perfectly uniform? What loss do we achieve? (2) Can you tune the initialization to get a starting loss that is much more similar to (1)?
- E03: Read the Bengio et al. 2003 paper (link above), implement and try any idea from the paper. Did it work?

Chapters:
00:00:00 intro
00:01:48 Bengio et al. 2003 (MLP language model) paper walkthrough
00:09:03 (re-)building our training dataset
00:12:19 implementing the embedding lookup table
00:18:35 implementing the hidden layer + internals of torch.Tensor: storage, views
00:29:15 implementing the output layer
00:29:53 implementing the negative log likelihood loss
00:32:17 summary of the full network
00:32:49 introducing F.cross_entropy and why
00:37:56 implementing the training loop, overfitting one batch
00:41:25 training on the full dataset, minibatches
00:45:40 finding a good initial learning rate
00:53:20 splitting up the dataset into train/val/test splits and why
01:00:49 experiment: larger hidden layer
01:05:27 visualizing the character embeddings
01:07:16 experiment: larger embedding size
01:11:46 summary of our final code, conclusion
01:13:24 sampling from the model
01:14:55 Google Colab (new!!) notebook advertisement
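The dataset steps named in the chapters ("(re-)building our training dataset" and "splitting up the dataset into train/val/test splits") can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the notebook's code: it assumes a context length (`block_size`) of 3 characters, a `.` token used both as left padding and as the end-of-name marker, an 80/10/10 split with a fixed shuffle seed, and a toy word list standing in for the real names dataset. The helpers `build_vocab`, `build_dataset` and `split_words` are illustrative names, not necessarily the ones used in the video.

```python
import random

def build_vocab(words):
    """Map each character to an index; index 0 is reserved for the '.' token."""
    chars = sorted(set("".join(words)))
    stoi = {ch: i + 1 for i, ch in enumerate(chars)}
    stoi["."] = 0
    return stoi

def build_dataset(words, stoi, block_size=3):
    """Turn words into (context, target) pairs of character indices.

    Each context holds the previous `block_size` characters, left-padded
    with '.', and each target is the character that follows it; a final
    '.' target marks the end of the word.
    """
    X, Y = [], []
    for w in words:
        context = [0] * block_size  # fully padded context at the start of a word
        for ch in w + ".":
            ix = stoi[ch]
            X.append(list(context))
            Y.append(ix)
            context = context[1:] + [ix]  # slide the window by one character
    return X, Y

def split_words(words, train_frac=0.8, dev_frac=0.1, seed=42):
    """Shuffle words (reproducibly) and split them into train/dev/test lists."""
    words = list(words)
    random.Random(seed).shuffle(words)
    n1 = int(train_frac * len(words))
    n2 = int((train_frac + dev_frac) * len(words))
    return words[:n1], words[n1:n2], words[n2:]

# Toy word list (hypothetical) standing in for the real names dataset.
words = ["emma", "olivia", "ava", "isabella", "sophia",
         "mia", "amelia", "harper", "evelyn", "abigail"]
stoi = build_vocab(words)  # one shared vocabulary for every split
train, dev, test = split_words(words)
Xtr, Ytr = build_dataset(train, stoi)
X, Y = build_dataset(["emma"], stoi)
print(X)  # contexts for "emma."
print(Y)  # next-character targets for "emma."
```

Building the vocabulary once from all words, before splitting, keeps the character-to-index mapping identical across the train, dev and test sets; splitting by word rather than by example keeps all contexts from one name inside a single split.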
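Exercise E02 (1) can be worked out from the definition of the average negative log likelihood: if each of V characters is predicted with probability 1/V, every example contributes −log(1/V) = log V. The sketch below assumes V = 27 (26 lowercase letters plus the `.` token) and computes the loss from raw logits with a hand-written, numerically stable log-softmax; it is a stdlib-only stand-in for `F.cross_entropy`, and `cross_entropy` / `mean_loss` are hypothetical helper names. It also hints at why part (2) is possible: when the output logits are all equal (for example all zero, which very small output weights and biases approximately produce), the starting loss equals the uniform baseline, whereas confidently wrong logits cost far more.

```python
import math

def cross_entropy(logits, target):
    """Negative log likelihood of `target` under softmax(logits).

    Subtracting the max logit before exponentiating keeps exp() from
    overflowing; the result is mathematically unchanged.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def mean_loss(batch_logits, targets):
    """Average cross-entropy over a batch of examples."""
    total = sum(cross_entropy(l, t) for l, t in zip(batch_logits, targets))
    return total / len(targets)

V = 27  # assumed vocabulary size: 26 letters + '.'
uniform_loss = -math.log(1.0 / V)  # equals math.log(V), about 3.2958

# All-zero logits give a perfectly uniform softmax, so the loss equals the
# baseline regardless of which targets appear.
targets = [0, 5, 13, 26]
zero_logits = [[0.0] * V for _ in targets]
print(uniform_loss, mean_loss(zero_logits, targets))

# A confidently wrong prediction costs far more than the uniform baseline.
wrong = [0.0] * V
wrong[1] = 10.0   # model is sure the next character is index 1
wrong[0] = -10.0  # ...and very sure it is not index 0, the true target
print(cross_entropy(wrong, 0))
```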