o3: Pushing the boundaries of AGI (and of coding)

Dr Waku • January 1, 2025

Dr Waku

About

Hi, I am Dr Waku! I did a PhD in computer security, at an Ivy League institution. Now, I work as an AI research scientist. I am also interested in accessibility; I have a chronic health condition that has led to many challenges and adaptations. With my channel, I talk about the philosophy of artificial intelligence and technological advancements that affect all of us. I post new videos every week (usually on Sundays). Please feel free to ask questions in the comments. I respond to as many as I can, and enjoy thoughtful discussions there.

Video Description

As part of its “12 days of OpenAI”, OpenAI announced the new o3 model. This model has surpassed previous records in reasoning, science and math competitions, and programming. The model even passes the ARC AGI benchmark with above human performance, when given enough compute. It seems clear that o3 is superhuman at reasoning. The question is, does this translate into sufficient generality to call it AGI (artificial general intelligence)?

We do see tasks where reasoning plays a big part becoming more at risk of automation. Many software engineers in particular are worried about their jobs, since AI models will soon be able to perform large chunks of their work. We present five stages that automation proceeds through, with software engineering soon to be at stage four of five. While there are no answers about what to do, we do analyze what happened in games in the past where the best human players were beaten.

#ai #agi #automation

OpenAI o3 Breakthrough High Score on ARC-AGI-Pub
https://arcprize.org/blog/oai-o3-pub-breakthrough

OpenAI introduces o3 model family as a successor to o1 model focusing on AI reasoning
https://www.techloy.com/openai-introduces-o3-model-family-as-successor-to-o1-model-focusing-on-ai-reasoning/

OpenAI announces new o3 models
https://techcrunch.com/2024/12/20/openai-announces-new-o3-model/

Tech Things: AI Benchmarks, O3, and the Future of SWE
https://theahura.substack.com/p/tech-things-ai-benchmarks-o3-and

FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI
https://epoch.ai/frontiermath/the-benchmark

OpenAI considers AGI clause removal for Microsoft investment
https://finance.yahoo.com/news/openai-considers-agi-clause-removal-123030732.html

@adcock_brett: “OpenAI announced 'o3', the next iteration of o1...”
https://x.com/adcock_brett/status/1870877753081217359

@_florianmai: “o3 is better than 99.95% of programmers...”
https://x.com/_florianmai/status/1870191021440512431

@TolgaBilge_: “OpenAI board member @adamdangelo...”
https://x.com/TolgaBilge_/status/1870904304049217725

How I came in first on ARC-AGI-Pub using Sonnet 3.5 with Evolutionary Test-time Compute
https://jeremyberman.substack.com/p/how-i-got-a-record-536-on-arc-agi

0:00 Intro
0:29 Contents
0:37 Part 1: Going exponential
0:48 Naming and announcement of o3
1:15 Twelve days of OpenAI
1:42 o1-preview was released in a rush
2:29 Three months between releases
3:13 Exponential trend of AI capabilities
3:50 Questions to address: AGI? Coding?
4:11 Part 2: A question of generality
4:16 ARC AGI benchmark analysis
4:53 General trend: very good at reasoning
5:24 Big question is generalization
6:03 Incentives to avoid declaring AGI
6:46 Creator of ARC benchmark on livestream
7:19 Quote from ARC about o3 approaching human level performance
7:56 o3 is superhuman at reasoning (and decent at programming)
8:52 Question of AGI is one of generality
9:17 ARC benchmark describes o3's inner workings
9:52 Outsourcing safety to reasoning (deliberative alignment)
10:09 o3 is enough to change the world
10:25 Part 3: Why developers are worried
10:41 Competitive programming competitions
11:17 CodeForces ranking: 175 out of 600,000
12:02 Grandmaster CodeForces ranking
12:25 Real-world SWE-bench verified
13:07 Fully autonomous coding agent gets 71.7%
14:03 Programming is on life support
14:12 Five stages of automation
14:30 Digital art and programming at stage 4
15:15 Anxiety about losing jobs
15:30 Recursive self-improvement
16:04 Job loss analogy to playing games
16:37 Examples of games beaten by AI
16:48 Example 1: Go was beaten by AlphaGo
17:57 While Lee Sedol quit Go, others kept playing
18:35 Example 2: StarCraft 2 beaten by AlphaStar
19:27 Takeaways from how game AIs evolved
19:48 Quote from Ahura: "computer science is dead"
20:08 My takeaways, with a book recommendation
20:47 Conclusion
21:30 o3 will cause massive disruption to software engineering
22:01 Outro
