Researchers Caught Their AI Model Trying to Escape
Species | Documenting AGI
Hey! I'm Drew. I spend hours per day reading about AI and distill the stuff I find interesting for you :) Videos every 2 weeks!
Video Description
If this resonated with you, here's how you can help today: https://campaign.controlai.com/take-action

Sources:
- Apollo Research, "Frontier Models are Capable of In-context Scheming": https://arxiv.org/pdf/2412.04984
- Nobel laureate Geoffrey Hinton says there is evidence that AIs can be deliberately and intentionally deceptive: https://www.youtube.com/watch?v=b_DUft-BdIE
- Anthropic, "Alignment Faking in Large Language Models": https://assets.anthropic.com/m/983c85a201a962f/original/Alignment-Faking-in-Large-Language-Models-full-paper.pdf
- TIME, "Exclusive: New Research Shows AI Strategically Lying": https://time.com/7202784/ai-research-strategic-lying/
- TechCrunch, "OpenAI's o1 model sure tries to deceive humans a lot": https://techcrunch.com/2024/12/05/openais-o1-model-sure-tries-to-deceive-humans-a-lot/
- The Verge, "OpenAI's new model is better at reasoning and, occasionally, deceiving": https://www.theverge.com/2024/9/17/24243884/openai-o1-model-research-safety-alignment
- Axios, "OpenAI's o1 and other frontier AI models engage in scheming": https://www.axios.com/2024/12/13/ai-reasoning-models-scheme-skills
- TechCrunch, "New Anthropic study shows AI really doesn't want to be forced to change its views": https://techcrunch.com/2024/12/18/new-anthropic-study-shows-ai-really-doesnt-want-to-be-forced-to-change-its-views/
- Apollo Research, "Towards evaluations-based safety cases for AI scheming": https://arxiv.org/pdf/2411.03336
- Joe Carlsmith, "Scheming AIs": https://arxiv.org/pdf/2311.08379
- "Optimal Policies Tend to Seek Power": https://arxiv.org/abs/1912.01683
- TIME, "When AI Thinks It Will Lose, It Sometimes Cheats, Study Finds": https://time.com/7259395/ai-chess-cheating-palisade-research/
- Palisade Research, "Demonstrating specification gaming in reasoning models": https://arxiv.org/abs/2502.13295
- Scott Alexander, "Claude Fights Back" (Astral Codex Ten): https://www.astralcodexten.com/p/claude-fights-back
- Joe Carlsmith, "Takes on 'Alignment Faking in Large Language Models'": https://joecarlsmith.com/2024/12/18/takes-on-alignment-faking-in-large-language-models
- Andrew Ng vs Yoshua Bengio, Davos 2025: https://www.youtube.com/watch?v=Y1BUaLo67ac
- Jeffrey Ladish on unprompted specification gaming: https://x.com/JeffLadish/status/1872805453224448208
- Prof. Stuart Russell on California Live: https://youtu.be/QEGjCcU0FLs?si=pHcBZbGpj8Rxri5n&t=2694
- Eric Schmidt on ABC News: https://abcnews.go.com/ThisWeek/video/1-1-eric-schmidt-116804931

This video took me a month to make, and I'm a small channel, so subscribing really helps out :)