Researchers Caught Their AI Model Trying to Escape
Species | Documenting AGI
@aispecies
About
Hey! I'm Drew. I spend hours per day reading about AI and distill the stuff I find interesting for you :) Videos every 2 weeks!
Video Description
If this resonated with you, here’s how you can help today: https://campaign.controlai.com/take-action

Sources:
- Apollo Research - "Frontier Models are Capable of In-context Scheming" https://arxiv.org/pdf/2412.04984
- Nobel laureate Geoffrey Hinton says there is evidence that AIs can be deliberately and intentionally deceptive https://www.youtube.com/watch?v=b_DUft-BdIE
- Anthropic - “Alignment Faking in Large Language Models” https://assets.anthropic.com/m/983c85a201a962f/original/Alignment-Faking-in-Large-Language-Models-full-paper.pdf
- Exclusive: New Research Shows AI Strategically Lying | TIME https://time.com/7202784/ai-research-strategic-lying/
- OpenAI's o1 model sure tries to deceive humans a lot | TechCrunch https://techcrunch.com/2024/12/05/openais-o1-model-sure-tries-to-deceive-humans-a-lot/
- OpenAI’s new model is better at reasoning and, occasionally, deceiving | The Verge https://www.theverge.com/2024/9/17/24243884/openai-o1-model-research-safety-alignment
- OpenAI's o1 and other frontier AI models engage in scheming | Axios https://www.axios.com/2024/12/13/ai-reasoning-models-scheme-skills
- New Anthropic study shows AI really doesn't want to be forced to change its views | TechCrunch https://techcrunch.com/2024/12/18/new-anthropic-study-shows-ai-really-doesnt-want-to-be-forced-to-change-its-views/
- Apollo Research - “Towards evaluations-based safety cases for AI scheming” https://arxiv.org/pdf/2411.03336
- Joe Carlsmith - “Scheming AIs” https://arxiv.org/pdf/2311.08379
- “Optimal Policies Tend to Seek Power” https://arxiv.org/abs/1912.01683
- When AI Thinks It Will Lose, It Sometimes Cheats, Study Finds | TIME https://time.com/7259395/ai-chess-cheating-palisade-research/
- Palisade Research - “Demonstrating specification gaming in reasoning models” https://arxiv.org/abs/2502.13295
- Claude Fights Back - by Scott Alexander - Astral Codex Ten https://www.astralcodexten.com/p/claude-fights-back
- Takes on "Alignment Faking in Large Language Models" - Joe Carlsmith https://joecarlsmith.com/2024/12/18/takes-on-alignment-faking-in-large-language-models
- Andrew Ng vs Yoshua Bengio | Davos 2025 https://www.youtube.com/watch?v=Y1BUaLo67ac
- Jeffrey Ladish on unprompted specification gaming: https://x.com/JeffLadish/status/1872805453224448208
- Prof. Stuart Russell on California Live: https://youtu.be/QEGjCcU0FLs?si=pHcBZbGpj8Rxri5n&t=2694
- Eric Schmidt on ABC News https://abcnews.go.com/ThisWeek/video/1-1-eric-schmidt-116804931

This video took me a month to make, and I'm a small channel, so subscribing really helps out :)