Retrieval Augmented Generation (RAG) Explained: Embedding, Sentence BERT, Vector Database (HNSW)

Umar Jamil · November 27, 2023

About

I'm a Machine Learning Engineer from Milan, Italy, teaching complex deep learning and machine learning concepts to my cat, 奥利奥 (Oreo). I also speak Chinese.

Video Description

Get your $5 coupon for Gradient: https://gradient.1stcollab.com/umarjamilai

In this video we explore the entire Retrieval Augmented Generation (RAG) pipeline. I start by reviewing language models, their training and inference, and then explore the main ingredient of a RAG pipeline: embedding vectors. We will see what embedding vectors are, how they are computed, and how we can compute embedding vectors for entire sentences. We will also explore what a vector database is, along with the popular HNSW (Hierarchical Navigable Small Worlds) algorithm that vector databases use to find embedding vectors given a query.

Download the PDF slides: https://github.com/hkproj/retrieval-augmented-generation-notes
Sentence BERT paper: https://arxiv.org/pdf/1908.10084.pdf

Chapters
00:00 - Introduction
02:22 - Language Models
04:33 - Fine-Tuning
06:04 - Prompt Engineering (Few-Shot)
07:24 - Prompt Engineering (QA)
10:15 - RAG pipeline (introduction)
13:38 - Embedding Vectors
19:41 - Sentence Embedding
23:17 - Sentence BERT
28:10 - RAG pipeline (review)
29:50 - RAG with Gradient
31:38 - Vector Database
33:11 - K-NN (Naive)
35:16 - Hierarchical Navigable Small Worlds (Introduction)
35:54 - Six Degrees of Separation
39:35 - Navigable Small Worlds
43:08 - Skip-List
45:23 - Hierarchical Navigable Small Worlds
47:27 - RAG pipeline (review)
48:22 - Closing
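
To make the sentence-embedding step concrete, here is a minimal sketch of computing sentence embeddings with the sentence-transformers library, which implements Sentence BERT. The model name "all-MiniLM-L6-v2" is one common choice for illustration, not necessarily the model used in the video.

```python
# Minimal sketch: sentence embeddings with Sentence BERT via the
# sentence-transformers library. Model name is an illustrative choice.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "RAG retrieves documents relevant to a query.",
    "Embedding vectors capture the meaning of a sentence.",
]

# Each sentence is mapped to a fixed-size vector
# (384 dimensions for this particular model).
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 384)
```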
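
The naive K-NN search from the chapter list can be sketched in a few lines: compare the query embedding against every stored embedding with cosine similarity and keep the top k. The data below is random and purely illustrative; the point is that this brute-force search is O(N) per query, which is what motivates HNSW.

```python
# Naive k-NN over embedding vectors: compare the query against every
# stored vector (O(N) per query). Illustrative sketch with random data.
import numpy as np

rng = np.random.default_rng(0)
database = rng.normal(size=(10_000, 384))   # stored sentence embeddings
query = rng.normal(size=384)                # query embedding

# Cosine similarity = dot product of L2-normalized vectors.
db_norm = database / np.linalg.norm(database, axis=1, keepdims=True)
q_norm = query / np.linalg.norm(query)
scores = db_norm @ q_norm

k = 3
top_k = np.argsort(-scores)[:k]  # indices of the k most similar vectors
print(top_k, scores[top_k])
```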
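
For the HNSW portion, here is a minimal sketch of the approximate counterpart of the search above using the hnswlib library. hnswlib is one common HNSW implementation, not necessarily the one discussed in the video, and the parameter values (M, ef_construction, ef) are illustrative defaults that trade speed against recall.

```python
# Approximate nearest-neighbour search with HNSW via hnswlib.
# Parameter values are illustrative, not tuned.
import hnswlib
import numpy as np

dim = 384
rng = np.random.default_rng(0)
data = rng.normal(size=(10_000, dim)).astype(np.float32)

# Build the hierarchical graph index over the stored embeddings.
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=data.shape[0], M=16, ef_construction=200)
index.add_items(data, np.arange(data.shape[0]))

index.set_ef(50)  # search-time quality/speed trade-off
labels, distances = index.knn_query(data[:1], k=3)
print(labels, distances)
```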
