Learning at test time in LLMs [Jonas Hübotter]
Machine Learning Street Talk
MLST is the leading highly technical AI podcast. Subscribe now! Welcome! We bring you the latest in advanced AI research, from the best AI experts in the world. Our approach is unrivalled in terms of scope and rigour – we believe in diversity of ideas (which is to say, not just LLMs!) and we also cover other promising alternative paths to AGI, as well as CogSci, CompSci, Neuro, Mathematics, Philosophy of Mind and Language. Support us on Patreon for early access, exclusive content, private Discord, biweekly calls and much more! https://www.patreon.com/mlst Donate here: https://www.paypal.com/donate/?hosted_button_id=K2TYRVPBGXVNA Please email us to learn about sponsorship packages and deals. tim at mlst.ai (please put your budget in the subject line) Podcast booking agencies - *don't contact us* - we wouldn't even interview anyone who needed a booking agent. Media/influence agencies - *don't contact us* - we only work directly with brands/sponsors.
Video Description
Jonas Hübotter from ETH Zurich presents SIFT (Select Informative data for Fine-Tuning), a breakthrough algorithm that dramatically improves language model performance through test-time adaptation. Using intelligent data selection, SIFT achieves state-of-the-art results with a 3.8B-parameter model - 30x smaller than previous approaches. The system combines a parametric controller with non-parametric memory to optimize the selection of training examples, showing impressive results across mathematics, coding, and legal domains. This novel approach points toward more efficient and adaptable AI systems that can continuously improve through interaction.

This was the first physical meetup of Tufa AI Labs. Are you an ML researcher interested in joining or presenting at one of these sessions? Please get in touch with Benjamin Crouzier [email protected] - https://tufalabs.ai/

SLIDES:
https://www.dropbox.com/scl/fi/sys3iasc63lgj8lm5t0ld/JONAS_SLIDES.pdf?rlkey=ak6ir61a2pyhrfuwyvgrdvq66&st=9cloopv9&dl=0

SPONSOR MESSAGE:
CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.
https://centml.ai/pricing/

Jonas Hübotter
Doctoral Researcher at ETH Zurich working on Active Fine-Tuning and Local Learning.
https://jonhue.github.io/

Test-Time Training on Nearest Neighbors for Large Language Models
https://arxiv.org/abs/2305.18466 (IMPORTANT BACKGROUND READING)

TOC:
1. SIFT Algorithm Core Concepts
[00:00:00] 1.1 Introduction to Test-Time Adaptation and SIFT Algorithm
[00:02:45] 1.2 The Pile Benchmark and Parameter Efficiency
[00:07:00] 1.3 Local Learning Models and Vapnik's Principle
[00:12:33] 1.4 SIFT Performance and Domain-Specific Comparisons

2. Training and Data Selection Methods
[00:22:50] 2.1 Data Selection and Error Measurement Methods
[00:32:33] 2.2 Non-IID Training Experiments on MNIST

3. Scaling, Implementation, and Audience Q&A
[00:35:50] 3.1 Scaling Experiments to Larger Datasets and Models
[00:42:30] 3.2 Model Scaling and Performance Across Architectures
[00:44:25] 3.3 Exploration-Exploitation Trade-offs in Fine-Tuning
[00:47:54] 3.4 Two-Stage Local Learning Architecture and SIFT Implementation

SHOWNOTES (transcript, references, best quotes etc.):
https://www.dropbox.com/scl/fi/os3ny3sy446u07yldz0zg/JONAS_PRESENTS.pdf?rlkey=kmu2pxfx8xbmiy283diof0kj1&st=xi5ouc9t&dl=0

REFS:
[0:00:25] 'Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs' - Paper introducing the SIFT algorithm for optimizing LLM performance through test-time fine-tuning (Jonas Hübotter, Sascha Bongni, Ido Hakimi, Andreas Krause)
https://arxiv.org/pdf/2410.08020

[0:02:45] The Pile: An 800GB Dataset of Diverse Text for Language Modeling - A comprehensive dataset comprising 22 diverse high-quality subsets for training large-scale language models (Leo Gao et al.)
https://arxiv.org/abs/2101.00027

[0:03:20] Language Models are Unsupervised Multitask Learners (GPT-2) (Alec Radford et al.)
https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf

[0:11:05] Vladimir Vapnik's principle from statistical learning theory: 'When solving a problem of interest, do not solve a more general problem as an intermediate step. Try to get the answer that you really need, but not a more general one.'
https://www.amazon.com/Statistical-Learning-Information-Science-Statistics/dp/0387987800

[0:22:05] 'The Linear Representation Hypothesis and the Geometry of Large Language Models' - Paper discussed at ICML (Kiho Park et al.)
https://arxiv.org/abs/2311.03658

[0:23:20] On Choosing and Bounding Probability Metrics - Paper discussing total variation (TV) distance and its applications in probability theory (Alison L. Gibbs and Francis Edward Su)
https://arxiv.org/pdf/math/0209021

[0:33:25] MNIST dataset - Standard database of handwritten digits containing 60,000 training images and 10,000 test images of size 28x28 pixels (Yann LeCun, Corinna Cortes)
https://yann.lecun.com/exdb/mnist/

[0:35:50] CIFAR-100 dataset - A dataset of 32x32 colour images in 100 classes, with 600 images per class (Alex Krizhevsky)
https://www.cs.toronto.edu/~kriz/cifar.html

[0:36:00] ImageNet - Large-scale hierarchical image database with over 14 million images organized according to the WordNet hierarchy (Jia Deng et al.)
https://ieeexplore.ieee.org/document/5206848

[0:42:55] Llama 2: Collection of foundation and fine-tuned chat models ranging from 7B to 70B parameters (Hugo Touvron et al.)
https://arxiv.org/abs/2307.09288

[0:43:35] Scaling Instruction-Finetuned Language Models - Paper introducing Flan-T5, showing performance improvements through instruction finetuning (Hyung Won Chung et al.)
https://arxiv.org/abs/2210.11416

[0:45:10] Active Few-Shot Fine-Tuning - Paper discussing exploration-exploitation trade-offs in the context of fine-tuning neural networks (Jonas Hübotter et al.)
https://arxiv.org/abs/2402.15898
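For readers who want a concrete picture of the test-time adaptation loop described above (retrieve informative examples for the current prompt, fine-tune briefly, then answer), here is a minimal, self-contained Python sketch. The greedy relevance-minus-redundancy score, and all function names and parameters below, are illustrative assumptions standing in for SIFT's actual selection objective, not the algorithm from the paper.

```python
# Minimal sketch of test-time data selection in the spirit of SIFT /
# nearest-neighbor test-time training. The scoring rule and all names are
# illustrative assumptions, not the paper's exact method.
import numpy as np

def select_examples(query_emb, data_embs, k=8, lam=0.5):
    """Greedily pick k training examples that are similar to the query
    while penalizing redundancy with already-selected examples."""
    selected = []
    for _ in range(k):
        best_i, best_score = None, -np.inf
        for i in range(len(data_embs)):
            if i in selected:
                continue
            relevance = float(data_embs[i] @ query_emb)           # similarity to the query
            redundancy = max((float(data_embs[i] @ data_embs[j])  # similarity to picks so far
                              for j in selected), default=0.0)
            score = relevance - lam * redundancy
            if score > best_score:
                best_i, best_score = i, score
        selected.append(best_i)
    return selected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(1000, 64))
    data /= np.linalg.norm(data, axis=1, keepdims=True)           # unit-normalize embeddings
    query = data[0] + 0.1 * rng.normal(size=64)
    query /= np.linalg.norm(query)
    idx = select_examples(query, data, k=8)
    print("Selected example indices:", idx)
    # A test-time adaptation loop would now run a few gradient steps on these
    # examples (e.g., with lightweight adapters) before answering the query.
```

The point of the sketch is the trade-off the talk emphasizes: plain nearest-neighbor retrieval can pick near-duplicate examples, whereas a selection rule that also discounts redundancy covers the prompt with more informative, complementary data before the brief fine-tuning step.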