François Chollet on OpenAI o-models and ARC
Video Description
François Chollet discusses the outcomes of the 2024 ARC-AGI (Abstraction and Reasoning Corpus) Prize competition, in which accuracy on the private evaluation set rose from 33% to 55.5%. He and the host explore the two core solution paradigms, program synthesis ("induction") and direct prediction ("transduction"), and how the most successful solutions combine both. Chollet emphasizes that human-like reasoning requires both fuzzy pattern matching (deep learning) and discrete, step-by-step symbolic processes. He also reveals his departure from Google to establish a new research lab focused on program synthesis, and previews the next-generation ARC-2 benchmark.

SPONSOR MESSAGES:
***
CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. https://centml.ai/pricing/

Tufa AI Labs is a brand-new research lab in Zurich, started by Benjamin Crouzier and focused on o-series-style reasoning and AGI. Interested in working on reasoning, or in getting involved in their events? They are hosting an event in Zurich on January 9th with the ARChitects; join if you can. Go to https://tufalabs.ai/
***

Read about the recent o3 result on ARC here (Chollet knew about it at the time of the interview but wasn't allowed to say): https://arcprize.org/blog/oai-o3-pub-breakthrough

TOC:
1. Introduction
[00:00:00] 1.1 Is o1 reasoning?

2. Interview Starts: ARC Competition 2024 Results and Evolution
[00:02:30] 2.1 ARC Prize 2024: Reflecting on the Narrative Shift Toward System 2
[00:05:33] 2.2 Comparing Private Leaderboard vs. Public Leaderboard Solutions
[00:08:21] 2.3 Two Winning Approaches: Deep Learning–Guided Program Synthesis and Test-Time Training

3. Transduction vs. Induction in ARC
[00:11:08] 3.1 Test-Time Training, Overfitting Concerns, and Developer-Aware Generalization
[00:14:39] 3.2 Gradient Descent Adaptation vs. Discrete Program Search

4. ARC-2 Development and Future Directions
[00:18:55] 4.1 Ensemble Methods, Benchmark Flaws, and the Need for ARC-2
[00:20:39] 4.2 Human-Level Performance Metrics and Private Test Sets
[00:24:48] 4.3 Task Diversity, Redundancy Issues, and Expanded Evaluation Methodology

5. Program Synthesis Approaches
[00:25:22] 5.1 Induction vs. Transduction: Different Solutions for Different Task Types
[00:27:15] 5.2 Challenges of Writing Algorithms for Perceptual vs. Algorithmic Tasks
[00:29:27] 5.3 Combining Induction and Transduction (Kevin Ellis's Paper)
[00:32:09] 5.4 Multi-View Insight and Overfitting Regulation

6. Latent Space and Graph-Based Synthesis
[00:33:21] 6.1 Clément Bonnet's Latent Program Search Approach
[00:35:14] 6.2 Decoding to Symbolic Form and Local Discrete Search
[00:36:19] 6.3 Graph of Operators vs. Token-by-Token Code Generation
[00:40:54] 6.4 Iterative Program Graph Modifications and Reusable Functions

7. Compute Efficiency and Lifelong Learning
[00:43:09] 7.1 Symbolic Process for Architecture Generation
[00:45:37] 7.2 Logarithmic Relationship of Compute and Accuracy
[00:47:24] 7.3 Learning New Building Blocks for Future Tasks

8. AI Reasoning and Future Development
[00:48:19] 8.1 Consciousness as a Self-Consistency Mechanism in Iterative Reasoning
[00:51:34] 8.2 Reconciling Symbolic and Connectionist Views
[00:55:17] 8.3 System 2 Reasoning Necessitates Awareness and Consistency
[00:58:09] 8.4 Novel Problem Solving, Abstraction, and Reusability

9. Program Synthesis and Research Lab
[01:00:57] 9.1 François Leaving Google to Focus on Program Synthesis
[01:04:59] 9.2 Democratizing Programming and Natural Language Instruction

10. Frontier Models and o1 Architecture
[01:09:42] 10.1 Search-Based Chain of Thought vs. Standard Forward Pass
[01:11:59] 10.2 o1's Natural Language Program Generation and Test-Time Compute Scaling
[01:14:39] 10.3 Logarithmic Gains with Deeper Search

11. ARC Evaluation and Human Intelligence
[01:17:59] 11.1 LLMs as Guessing Machines and Agent Reliability Issues
[01:20:06] 11.2 ARC-2 Human Testing and Correlation with g-Factor
[01:21:20] 11.3 Closing Remarks and Future Directions

SHOWNOTES PDF: https://www.dropbox.com/scl/fi/epf2pysdd9uxc77c9shqr/CHOLLETNEURIPS2.pdf?rlkey=9knnyuj9o28ke7qpezrtmspsd&dl=0
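
To make the induction/transduction distinction concrete, here is a minimal Python sketch. It is hypothetical, not code from the episode or from any ARC Prize entry: `induce`, `transduce`, `solve`, and the tiny primitive set are all invented for illustration. `induce` searches a small space of candidate programs for one that reproduces every training pair, `transduce` stands in for a learned model that predicts the test output directly, and `solve` combines them by preferring an induced program and falling back to direct prediction.

```python
# Hypothetical illustration of induction vs. transduction on an ARC-like task.
# The primitives and the fallback heuristic are invented for this sketch; real
# ARC Prize entries use far larger program spaces and learned predictors.

Grid = list[list[int]]  # an ARC grid: a small 2D array of color indices

# A tiny space of candidate programs that induction searches over.
PRIMITIVES = {
    "identity":  lambda g: g,
    "transpose": lambda g: [list(r) for r in zip(*g)],
    "flip_h":    lambda g: [row[::-1] for row in g],
    "flip_v":    lambda g: g[::-1],
}

def induce(train_pairs):
    """Induction: find an explicit program consistent with every training pair."""
    for name, fn in PRIMITIVES.items():
        if all(fn(x) == y for x, y in train_pairs):
            return name, fn
    return None  # no program in this tiny space explains the examples

def transduce(train_pairs, test_input):
    """Transduction stand-in: predict the output directly, with no explicit
    program. Here it is a nearest-neighbor guess; real systems use a neural
    net conditioned on the training pairs and the test input."""
    _, best_output = min(
        train_pairs,
        key=lambda p: sum(a != b
                          for row_x, row_t in zip(p[0], test_input)
                          for a, b in zip(row_x, row_t)),
    )
    return best_output

def solve(train_pairs, test_input):
    """Ensemble: prefer an induced program, fall back to direct prediction."""
    induced = induce(train_pairs)
    if induced is not None:
        _, fn = induced
        return fn(test_input)
    return transduce(train_pairs, test_input)

# Toy task: every output grid is the horizontal mirror of its input.
train = [([[1, 2], [3, 4]], [[2, 1], [4, 3]]),
         ([[5, 0], [0, 5]], [[0, 5], [5, 0]])]
print(solve(train, [[7, 8], [9, 7]]))  # -> [[8, 7], [7, 9]]
```

As discussed in the episode (see 5.1 and 5.3 above), induction tends to suit crisply algorithmic tasks and transduction perceptual ones, which is why combined approaches such as the one in Kevin Ellis's paper cover more tasks than either paradigm alone.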