Gemini 2.5 Pro - It’s a Darn Smart Chatbot … (New Simple High Score)

AI Explained March 28, 2025

About

Covering the biggest news of the century - the arrival of smarter-than-human AI. The author of Simple Bench, exposing the remaining human-LLM reasoning gap. Solo Developer of LM Council: lmcouncil.ai. Join me at AI Insiders, with exclusive videos and a 1000+ network of AI enthusiasts and professionals: https://www.patreon.com/AIExplained

Video Description

Gemini gets a new record on Simple Bench, and several other benchmarks. I’ll go deep to explore its nuances, including how it deceptively reverse engineers answers, does better on certain coding benchmarks than others, may have a universal ‘conceptual language’ … https://weave-docs.wandb.ai/?utm_source=sponsorship&utm_medium=simple_bench&utm_campaign=ai_explained … and more. Plus practical tips, a note on security and Kling vs Veo 2 guest appearance.

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
00:36 - Fiction Bench
02:41 - Practicality - YouTube urls + Security - cut-off date
03:42 - Coding
06:22 - WeirdML Bench
07:01 - Simple Bench Record High
11:23 - Reverse Engineering!
13:22 - Anthropic Paper
17:49 - 3 Caveats

Gemini 2.5 Updated: https://deepmind.google/technologies/gemini/
Fiction Live Bench: https://fiction.live/stories/Fiction-liveBench-Feb-19-2025/oQdzQvKHw8JyXbN87
https://simple-bench.com/
WeirdML: https://htihle.github.io/weirdml.html
https://x.com/htihle/status/1905014058228625542
Anthropic Thoughts: https://www.anthropic.com/research/tracing-thoughts-language-model
https://transformer-circuits.pub/2025/attribution-graphs/biology.html#dives-cot
https://aistudio.google.com/prompts/new_chat
Search Study: https://www.cjr.org/tow_center/we-compared-eight-ai-search-engines-theyre-all-bad-at-citing-news.php
Live bench: https://livebench.ai/#/
Paper: https://arxiv.org/pdf/2406.19314
LiveCode Bench: https://livecodebench.github.io/
SWE-Verified: https://arxiv.org/pdf/2310.06770

Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Podcast: https://aiexplainedopodcast.buzzsprout.com/
