Which LLM??? LLM Evaluation in Azure AI Foundry

Tech with Kirk • September 10, 2025

Tech with Kirk

@techwithkirk

About

Hi, I'm Kirk. I make hands-on content about AI systems, cloud-native tooling, and platform engineering. Subscribe if you're building smarter systems. If you find this content helpful, please consider buying me a coffee. Buy Me a Coffee: https://buymeacoffee.com/techwithkirk

Video Description

Not all Large Language Models (LLMs) are created equal, so how do you know which one you can trust for your projects? In this video, I’ll walk you through how to evaluate LLMs using Azure AI Foundry.

We’ll cover:
- Why evaluation matters (and what can go wrong if you skip it)
- What a golden dataset and ground truth are
- Key evaluation metrics such as semantic similarity, relevance, coherence, precision, recall, and F1 score
- How to use LLM-as-a-judge (yes, an AI judging another AI 🤯)

Whether you’re building apps, chatbots, or AI pipelines, this will give you the tools to verify that you can trust your model before deploying it.

If this was helpful, don’t forget to like, subscribe, and hit the bell for more AI engineering content!

#LLM #AI #Azure #MachineLearning #Evaluation

00:00 Evaluate Your LLM
00:19 Introduction
00:41 Why Evaluation Matters
01:03 LLM as Judge Concept
01:53 Evaluation Process
02:25 Evaluation Metrics
03:52 Practical Demo in Azure AI Foundry
09:45 Working with Datasets
11:23 Running Evaluations
19:43 Conclusion
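To make the precision, recall, and F1 metrics listed above concrete, here is a minimal Python sketch that scores model responses against a golden dataset by token overlap. It is an illustration under stated assumptions, not Azure AI Foundry's implementation: the tokenizer (lowercased alphanumeric words, with no stop-word or article removal), the `token_f1` helper, and the two-row `golden_dataset` are all hypothetical, chosen only to show how each score is computed.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplified tokenizer (an assumption): lowercase, keep alphanumeric runs, drop punctuation.
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(response: str, ground_truth: str) -> dict[str, float]:
    """Token-overlap precision, recall, and F1 between a response and its ground truth."""
    resp_tokens = tokenize(response)
    truth_tokens = tokenize(ground_truth)
    # Multiset intersection: each shared token counts as many times as it appears in both.
    common = Counter(resp_tokens) & Counter(truth_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / len(resp_tokens)  # share of response tokens found in the ground truth
    recall = overlap / len(truth_tokens)    # share of ground-truth tokens the response covered
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical golden dataset for illustration: each row pairs a query,
# the model's response, and the human-verified ground-truth answer.
golden_dataset = [
    {
        "query": "What is the capital of France?",
        "response": "The capital of France is Paris.",
        "ground_truth": "Paris is the capital of France.",
    },
    {
        "query": "Who wrote Hamlet?",
        "response": "Hamlet was written by Christopher Marlowe.",  # deliberately wrong answer
        "ground_truth": "William Shakespeare wrote Hamlet.",
    },
]

if __name__ == "__main__":
    scores = [token_f1(row["response"], row["ground_truth"]) for row in golden_dataset]
    for row, s in zip(golden_dataset, scores):
        print(f"{row['query']!r}: P={s['precision']:.2f} R={s['recall']:.2f} F1={s['f1']:.2f}")
    # Aggregate across the dataset, as an evaluation run would report a mean score.
    mean_f1 = sum(s["f1"] for s in scores) / len(scores)
    print(f"mean F1 = {mean_f1:.2f}")
```

In this toy dataset the paraphrased Paris answer scores F1 = 1.00 because every word overlaps, while the wrong Hamlet answer scores F1 = 0.20 (precision 1/6, recall 1/4) because only "hamlet" is shared. That same property is the metric's limitation: token overlap rewards shared wording rather than meaning, so a correct answer phrased differently can score low, which is one reason to pair it with semantic similarity or an LLM-as-a-judge metric.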