PC97.com - Plus+ Channels for Creators and Shoppers

Reinforcement Learning from Human Feedback explained with math derivations and the PyTorch code.

50k

1:26:21

Mistral / Mixtral Explained: Sliding Window Attention, Sparse Mixture of Experts, Rolling Buffer

Umar Jamil, Dec 27, 2023

35k

1:12:53

Distributed Training with PyTorch: complete tutorial with cloud infrastructure and code

Umar Jamil, Dec 19, 2023

27k

49:24

Retrieval Augmented Generation (RAG) Explained: Embedding, Sentence BERT, Vector Database (HNSW)

Umar Jamil, Nov 27, 2023

74k

54:52

BERT explained: Training, Inference, BERT vs GPT/LLamA, Fine tuning, [CLS] token

Umar Jamil, Oct 26, 2023

63k

1:10:55

LLaMA explained: KV-Cache, Rotary Positional Embedding, RMS Norm, Grouped Query Attention, SwiGLU

Umar Jamil, Aug 24, 2023

95k

26:55

LoRA: Low-Rank Adaptation of Large Language Models - Explained visually + PyTorch code from scratch

Umar Jamil, Jul 25, 2023

39k

27:12

Variational Autoencoder - Model, ELBO, loss function and maths explained easily!

Umar Jamil, Jun 7, 2023

51k

2:59:24

Coding a Transformer from scratch on PyTorch, with full explanation, training and inference.

Umar Jamil, May 25, 2023

285k

Featured Results

Titans: Learning to Memorize at Test Time

Direct Preference Optimization (DPO) explained: Bradley-Terry model, log probabilities, math

Reinforcement Learning from Human Feedback explained with math derivations and the PyTorch code.

Mistral / Mixtral Explained: Sliding Window Attention, Sparse Mixture of Experts, Rolling Buffer

Distributed Training with PyTorch: complete tutorial with cloud infrastructure and code

Retrieval Augmented Generation (RAG) Explained: Embedding, Sentence BERT, Vector Database (HNSW)

BERT explained: Training, Inference, BERT vs GPT/LLamA, Fine tuning, [CLS] token

LLaMA explained: KV-Cache, Rotary Positional Embedding, RMS Norm, Grouped Query Attention, SwiGLU

LoRA: Low-Rank Adaptation of Large Language Models - Explained visually + PyTorch code from scratch

Variational Autoencoder - Model, ELBO, loss function and maths explained easily!

Coding a Transformer from scratch on PyTorch, with full explanation, training and inference.
