Which LLM??? LLM Evaluation in Azure AI Foundry

Tech with Kirk September 10, 2025

Tech with Kirk

@techwithkirk

About

Hi, I'm Kirk. I make hands-on content about AI systems, cloud-native tooling, and platform engineering. Subscribe if you're building smarter systems. If you find this content helpful, please consider buying me a coffee. Buy Me Coffee: https://buymeacoffee.com/techwithkirk

Video Description

Not all Large Language Models (LLMs) are created equal, so how do you know which one you can trust for your projects? In this video, I’ll walk you through how to evaluate LLMs using Azure AI Foundry.

We’ll cover:
- Why evaluation matters (and what can go wrong if you skip it)
- What a golden dataset and ground truth are
- Key evaluation metrics like semantic similarity, relevance, coherence, precision, recall, and F1 score
- How to use LLM-as-a-judge (yes, an AI judging another AI 🤯)

Whether you’re building apps, chatbots, or AI pipelines, this will give you the tools to trust your model before deploying it.

If this was helpful, don’t forget to like, subscribe, and hit the bell for more AI engineering content!

#LLM #AI #Azure #MachineLearning #Evaluation

00:00 Evaluate Your LLM
00:19 Introduction
00:41 Why Evaluation Matters
01:03 LLM as Judge Concept
01:53 Evaluation Process
02:25 Evaluation Metrics
03:52 Practical Demo in Azure AI Foundry
09:45 Working with Datasets
11:23 Running Evaluations
19:43 Conclusion
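
If you want a feel for precision, recall, and F1 before watching, here is a minimal Python sketch that scores a model response against a ground-truth answer by word overlap. It is only an illustration: the `token_f1` helper, the whitespace tokenization, and the one-row `golden_dataset` are assumptions made for this example, not the exact method used in the video or inside Azure AI Foundry.

```python
from collections import Counter


def token_f1(response: str, ground_truth: str) -> dict:
    """Score a response against a ground-truth answer by shared words.

    Hypothetical helper: tokenization is a plain lowercase whitespace
    split, which is an assumption made for this sketch.
    """
    resp_tokens = response.lower().split()
    truth_tokens = ground_truth.lower().split()

    # Multiset intersection: each word counts as often as it appears in both.
    shared = sum((Counter(resp_tokens) & Counter(truth_tokens)).values())
    if shared == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}

    precision = shared / len(resp_tokens)   # how much of the response is correct
    recall = shared / len(truth_tokens)     # how much of the truth was covered
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# A tiny made-up "golden dataset": each row pairs a query with its
# ground-truth answer and the response produced by the model under test.
golden_dataset = [
    {
        "query": "What is the capital of France?",
        "ground_truth": "Paris is the capital of France",
        "response": "The capital of France is Paris",
    },
]

for row in golden_dataset:
    print(row["query"], token_f1(row["response"], row["ground_truth"]))
```

Word overlap ignores word order and meaning, which is one reason metrics like semantic similarity, relevance, and coherence are often scored by an LLM acting as a judge instead.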