Sonnet 4.5 & the AI Plateau Myth — Sholto Douglas (Anthropic)

The MAD Podcast with Matt Turck

Sholto Douglas, a key researcher at Anthropic, reveals the breakthroughs behind Claude Sonnet 4.5—the world's leading coding model—and why we might be just 2-3 years from AI matching human-level performance on most computer-facing tasks.

You'll discover why RL on language models suddenly started working in 2024, how agents maintain coherency across 30-hour coding sessions through self-correction and memory systems, and why the "bitter lesson" of scale keeps proving clever priors wrong.

Sholto shares his path from top-50 world fencer to Google's Gemini team to Anthropic, explaining why great blog posts sometimes matter more than PhDs in AI research. He discusses the culture at big AI labs and why Anthropic is laser-focused on coding (it's the fastest path to both economic impact and AI-assisted AI research). He also explains how the training pipeline is still "held together by duct tape" with massive room to improve, and why every benchmark created shows continuous rapid progress with no plateau in sight.

Bold predictions: individuals will soon manage teams of AI agents working 24/7, robotics is about to experience coding-level breakthroughs, and policymakers should urgently track AI progress on real economic tasks. A clear-eyed look at where AI stands today and where it's headed in the next few years.

Anthropic
Website - https://www.anthropic.com
Twitter - https://x.com/AnthropicAI

Sholto Douglas
LinkedIn - https://www.linkedin.com/in/sholto
Twitter - https://x.com/_sholtodouglas

FIRSTMARK
Website - https://firstmark.com
Twitter - https://twitter.com/FirstMarkCap

Matt Turck (Managing Director)
LinkedIn - https://www.linkedin.com/in/turck/
Twitter - https://twitter.com/mattturck


LISTEN ON:
Spotify - https://open.spotify.com/show/7yLATDSaFvgJG80ACcRJtq
Apple - https://podcasts.apple.com/us/podcast/the-mad-podcast-with-matt-turck/id1686238724


00:00 - Intro
01:09 - The Rapid Pace of AI Releases at Anthropic
02:49 - Understanding Opus, Sonnet, and Haiku Model Tiers
04:14 - Sholto's Journey: From Australian Fencer to AI Researcher
12:01 - The Growing Pool of AI Talent
16:16 - Breaking Into AI Research Without Traditional Credentials
18:29 - What "Taste" Means in AI Research
23:05 - Moving to Google and Building Gemini's Inference Stack
25:08 - How Anthropic Differs from Other AI Labs
31:46 - Why Anthropic Is Laser-Focused on Coding
36:40 - Inside a 30-Hour Autonomous Coding Session
38:41 - Examples of What AI Can Build in 30 Hours
43:13 - The Breakthroughs That Enabled 30-Hour Runs
46:28 - What's Actually Driving the Performance Gains
47:42 - Pre-Training vs Reinforcement Learning Explained
52:11 - Test-Time Compute and the New Scaling Paradigm
55:55 - Why RL on LLMs Finally Started Working
59:38 - Are We on Track to AGI?
1:02:05 - Why the "Plateau" Narrative Is Wrong
1:03:41 - Sonnet's Performance Across Economic Sectors
1:05:47 - Preparing for a World of 10-100x Individual Leverage