BERT explained: Training, Inference, BERT vs GPT/LLaMA, Fine-tuning, [CLS] token
Umar Jamil
About
I'm a Machine Learning Engineer from Milan, Italy, teaching complex deep learning and machine learning concepts to my cat, 奥利奥 (Oreo). I also speak Chinese.
Video Description
Full explanation of the BERT model, including a comparison with other language models like LLaMA and GPT. I cover topics like: training, inference, fine-tuning, Masked Language Model (MLM), Next Sentence Prediction (NSP), the [CLS] token, sentence embeddings, text classification, question answering, and the self-attention mechanism. Everything is explained visually, step by step. I also review the background knowledge needed to understand BERT, starting from an introduction to large language models (LLMs) and the attention mechanism.

Slides (PDF): https://github.com/hkproj/bert-from-scratch
BERT paper: https://arxiv.org/abs/1810.04805

Chapters
00:00 - Introduction
02:00 - Language Models
03:10 - Training (Language Models)
07:23 - Inference (Language Models)
09:15 - Transformer architecture (Encoder)
10:28 - Input Embeddings
14:17 - Positional Encoding
17:14 - Self-Attention and causal mask
29:14 - BERT (overview)
32:08 - BERT vs GPT/LLaMA
34:25 - Left context and right context
36:36 - BERT pre-training
37:05 - Masked Language Model
45:01 - [CLS] token
48:26 - BERT fine-tuning
49:00 - Text classification
50:50 - Question answering
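For readers who want to try the Masked Language Model objective discussed in the video, here is a minimal Python sketch (not from the video) using the Hugging Face transformers library; the checkpoint name "bert-base-uncased" is an assumption, and any pre-trained BERT checkpoint would work:

# A minimal sketch of BERT's fill-mask (MLM) behavior, assuming the
# Hugging Face transformers library is installed. The checkpoint
# "bert-base-uncased" is an assumption; any BERT checkpoint works.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Unlike GPT/LLaMA, BERT uses both left and right context to predict [MASK].
for prediction in fill_mask("The capital of Italy is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))

Each prediction is a dict containing a candidate token for the masked position and its probability score, which illustrates the bidirectional (left-and-right context) prediction that distinguishes BERT from decoder-only models like GPT and LLaMA.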