Transformers explained | The architecture behind LLMs

AI Coffee Break with Letitia • January 21, 2024
About

Lighthearted bite-sized ML videos for your AI Coffee Break! 📺 Mostly videos about the latest technical advancements in AI, such as large language models (LLMs), text-to-image models, and everything cool in natural language processing, computer vision, etc.! We try to post twice a month! 🤞 But you know, Letitia has a full-time job, and Ms. Coffee Bean tends to enjoy time off to go out and have fun. 😄

Disclaimer: Opinions expressed are solely my own and do not express the views or opinions of my employer.

Impressum: https://aicoffeebreak.com/impressum.html

Video Description

All you need to know about the transformer architecture: how to structure the inputs, attention (queries, keys, values), positional embeddings, residual connections. Bonus: an overview of the differences between recurrent neural networks (RNNs) and transformers.

Correction at 9:19: the order of multiplication should be the opposite, x1 (vector) * Wq (matrix) = q1 (vector); otherwise we do not get the 1x3 dimensionality at the end. Sorry for messing up the animation! (A dimension check is sketched below the description.)

Check this out for a super cool transformer visualisation! 👀 https://poloclub.github.io/transformer-explainer/

➡️ AI Coffee Break Merch! 🛍️ https://aicoffeebreak.creator-spring.com/

Outline:
00:00 Transformers explained
00:47 Text inputs
02:29 Image inputs
03:57 Next word prediction / Classification
06:08 The transformer layer: 1. MLP sublayer
06:47 2. Attention explained
07:57 Attention vs. self-attention
08:35 Queries, Keys, Values
09:19 Correction: order of multiplication should be the opposite, x1 (vector) * Wq (matrix) = q1 (vector)
11:26 Multi-head attention
13:04 Attention scales quadratically
13:53 Positional embeddings
15:11 Residual connections and normalization layers
17:09 Masked language modelling
17:59 Difference to RNNs

Thanks to our patrons who support us in Tiers 2, 3, and 4: 🙏 Dres. Trost GbR, Siltax, Vignesh Valliappan, @Mutual_Information, Kshitij

Our old Transformer explained video: 📺 https://youtu.be/FWFA4DGuzSc
📺 Tokenization explained: https://youtu.be/D8j1c4NJRfo
📺 Word embeddings: https://youtu.be/YkK5IKgxp-c
📽️ Replacing Self-Attention: https://www.youtube.com/playlist?list=PLpZBeKTZRGPM8PNRyv6fNMcAW3dMDq_A-
📽️ Position embeddings: https://www.youtube.com/playlist?list=PLpZBeKTZRGPOQtbCIES_0hAvwukcs-y-x
@SerranoAcademy Transformer series: https://www.youtube.com/watch?v=OxCpWwDCDFQ&list=PLs8w1Cdi-zva4fwKkl9EK13siFvL9Wewf

📄 Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. "Attention is all you need." Advances in Neural Information Processing Systems 30 (2017).

🔥 Optionally, pay us a coffee to help with our coffee bean production! ☕
Patreon: https://www.patreon.com/AICoffeeBreak
Ko-fi: https://ko-fi.com/aicoffeebreak

🔗 Links:
AICoffeeBreakQuiz: https://www.youtube.com/c/AICoffeeBreak/community
Twitter: https://twitter.com/AICoffeeBreak
Reddit: https://www.reddit.com/r/AICoffeeBreak/
YouTube: https://www.youtube.com/AICoffeeBreak

#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research

Music 🎵: Sunset n Beachz - Ofshane
Video editing: Nils Trost
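To make the corrected multiplication order concrete, here is a minimal NumPy sketch of the dimension check. The concrete sizes (a 1x4 input vector x1 and a 4x3 projection matrix Wq) are illustrative assumptions; only the resulting 1x3 shape of q1 comes from the correction above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed illustrative shapes: x1 is a 1x4 row vector (one token
# embedding), Wq is a 4x3 learned query projection.
x1 = rng.standard_normal((1, 4))
Wq = rng.standard_normal((4, 3))

# Corrected order: vector times matrix gives the 1x3 query vector.
q1 = x1 @ Wq
print(q1.shape)  # (1, 3)

# The reversed order would not even type-check here:
# Wq @ x1 is (4, 3) @ (1, 4), whose inner dimensions do not match.
```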
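Since the outline centers on queries, keys, and values, here is also a minimal single-head, unmasked self-attention sketch in the spirit of "Attention is all you need"; the toy sizes (5 tokens, model width 8, head width 3) are assumptions for illustration, not the video's exact setup. Note the n x n score matrix, which is why attention scales quadratically with sequence length (13:04 in the outline).

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (unmasked)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # project every token
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # n x n: quadratic in sequence length
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys, row-wise
    return weights @ V                         # each token mixes all value vectors

rng = np.random.default_rng(0)
n, d_model, d_head = 5, 8, 3                   # assumed toy sizes
X = rng.standard_normal((n, d_model))          # one embedding per token
Wq, Wk, Wv = (rng.standard_normal((d_model, d_head)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)     # (5, 3)
```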
