Why AI evals are the hottest new skill for product builders | Hamel Husain & Shreya Shankar
Lenny's Podcast
@lennyspodcast
Interviews with world-class product leaders and growth experts to uncover concrete, actionable, and tactical advice to help you build, launch, and grow your own product.
Video Description
Hamel Husain and Shreya Shankar teach the world’s most popular course on AI evals and have trained over 2,000 PMs and engineers (including many teams at OpenAI and Anthropic). In this conversation, they demystify the process of developing effective evals, walk through real examples, and share practical techniques that’ll help you improve your AI product.

*What you’ll learn:*
1. WTF evals are
2. Why they’ve become the most important new skill for AI product builders
3. A step-by-step walkthrough of how to create an effective eval
4. A deep dive into error analysis, open coding, and axial coding
5. Code-based evals vs. LLM-as-judge
6. The most common pitfalls and how to avoid them
7. Practical tips for implementing evals with minimal time investment (30 minutes per week after initial setup)
8. Insight into the debate between “vibes” and systematic evals

*Brought to you by:*
Fin—The #1 AI agent for customer service: https://fin.ai/lenny
Dscout—The UX platform to capture insights at every stage: from ideation to production: https://www.dscout.com/
Mercury—The art of simplified finances: https://mercury.com/

*Transcript:* https://www.lennysnewsletter.com/p/why-ai-evals-are-the-hottest-new-skill

*My biggest takeaways (for paid newsletter subscribers):* https://www.lennysnewsletter.com/i/173871171/my-biggest-takeaways-from-this-conversation

*Where to find Shreya Shankar*
• X: https://x.com/sh_reya
• LinkedIn: https://www.linkedin.com/in/shrshnk/
• Website: https://www.sh-reya.com/
• Maven course: https://bit.ly/4myp27m

*Where to find Hamel Husain*
• X: https://x.com/HamelHusain
• LinkedIn: https://www.linkedin.com/in/hamelhusain/
• Website: https://hamel.dev/
• Maven course: https://bit.ly/4myp27m

*Where to find Lenny:*
• Newsletter: https://www.lennysnewsletter.com
• X: https://twitter.com/lennysan
• LinkedIn: https://www.linkedin.com/in/lennyrachitsky/

*In this episode, we cover:*
(00:00) Introduction to Hamel and Shreya
(04:57) What are evals?
(09:56) Demo: Examining real traces from a property management AI assistant
(16:51) Writing notes on errors
(23:54) Why LLMs can’t replace humans in the initial error analysis
(25:16) The concept of a “benevolent dictator” in the eval process
(28:07) Theoretical saturation: when to stop
(31:39) Using axial codes to help categorize and synthesize error notes
(44:39) The results
(46:06) Building an LLM-as-judge to evaluate specific failure modes
(48:31) The difference between code-based evals and LLM-as-judge
(52:10) Example: LLM-as-judge
(54:45) Testing your LLM judge against human judgment
(01:00:51) Why evals are the new PRDs for AI products
(01:05:09) How many evals you actually need
(01:07:41) What comes after evals
(01:09:57) The great evals debate
(01:15:15) Why dogfooding isn’t enough for most AI products
(01:18:23) OpenAI’s Statsig acquisition
(01:23:02) The Claude Code controversy and the importance of context
(01:24:13) Common misconceptions around evals
(01:22:28) Tips and tricks for implementing evals effectively
(01:30:37) The time investment
(01:33:38) Overview of their comprehensive evals course
(01:37:57) Lightning round and final thoughts

*LLM Log Open Codes Analysis Prompt:*
_Please analyze the following CSV file. There is a metadata field which has a nested field called z_note that contains open codes for analysis of LLM logs that we are conducting. Please extract all of the different open codes. From the z_note field, propose 5-6 categories that we can create axial codes from._

*Referenced:*
• Building eval systems that improve your AI product: https://www.lennysnewsletter.com/p/building-eval-systems-that-improve
• Mercor: https://mercor.com/
• Brendan Foody on LinkedIn: https://www.linkedin.com/in/brendan-foody-2995ab10b
• Nurture Boss: https://nurtureboss.io/
• Braintrust: https://www.braintrust.dev/
• Andrew Ng on X: https://x.com/andrewyng
• Carrying Out Error Analysis: https://www.youtube.com/watch?v=JoAxZsdw_3w
• Julius AI: https://julius.ai/
• Brendan Foody on X—“evals are the new PRDs”: https://x.com/BrendanFoody/status/1939764763485171948
...References continued at: https://www.lennysnewsletter.com/p/why-ai-evals-are-the-hottest-new-skill

*Recommended books:*
• Pachinko: https://www.amazon.com/Pachinko-National-Book-Award-Finalist/dp/1455563935
• Apple in China: The Capture of the World’s Greatest Company: https://www.amazon.com/Apple-China-Capture-Greatest-Company/dp/1668053373/
• Machine Learning: https://www.amazon.com/Machine-Learning-Tom-M-Mitchell/dp/1259096955
• Artificial Intelligence: A Modern Approach: https://www.amazon.com/Artificial-Intelligence-Modern-Approach-Global/dp/1292401133/

_Production and marketing by https://penname.co/._
_For inquiries about sponsoring the podcast, email [email protected]._
Lenny may be an investor in the companies discussed.
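The open-coding prompt in this description asks an LLM both to pull every open code out of a CSV of LLM logs and to propose 5-6 axial-code categories. The extraction half is mechanical and could be done in code first, so the model (or a human) only sees a clean, deduplicated list. The following is a minimal Python sketch, not the guests' own tooling. It assumes the export has a column named `metadata` holding a JSON object string, and that `z_note` (the field named in the prompt) stores one free-text open code per row. The function name `extract_open_codes` and the sample rows are hypothetical.

```python
import csv
import io
import json
from collections import Counter


def extract_open_codes(csv_text: str, metadata_column: str = "metadata") -> Counter:
    """Count the open-code notes stored under metadata["z_note"].

    Assumption: each row's metadata column is a JSON object string.
    Rows with missing or malformed metadata, or no non-empty z_note, are skipped.
    """
    counts: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = row.get(metadata_column) or ""
        try:
            metadata = json.loads(raw)
        except json.JSONDecodeError:
            continue  # empty or malformed JSON: skip this trace
        if not isinstance(metadata, dict):
            continue  # JSON that is not an object has no nested fields
        note = metadata.get("z_note")
        if isinstance(note, str) and note.strip():
            counts[note.strip()] += 1  # identical notes are merged and counted
    return counts


# Hypothetical export: four traces, one without a note.
SAMPLE_CSV = (
    "trace_id,metadata\n"
    '1,"{""z_note"": ""wrong tour date in reply""}"\n'
    '2,"{""z_note"": ""missed handoff to a human""}"\n'
    '3,"{""z_note"": ""wrong tour date in reply""}"\n'
    '4,"{}"\n'
)

if __name__ == "__main__":
    # Print each distinct open code with how often it appeared.
    for code, n in extract_open_codes(SAMPLE_CSV).most_common():
        print(n, code)
```

Counting duplicates also gives a rough frequency for each note, which may help prioritize; grouping the distinct notes into axial codes is still left to the prompt (or a person).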