Try an interactive version of this dialog: sign up at solve.it.com, click Upload, and paste this URL.

The Dangerous Illusion of AI Coding? - Jeremy Howard

Transcripts

# video_transcript = get_yt_transcript(video_url)
# video_transcript

Preread

3-4 paragraph pre-read.

In this conversation, Jeremy Howard — deep learning pioneer, co-founder of fast.ai, and creator of ULMFiT — sits down with Dr. Tim Scarfe at his home in Moreton Bay, Queensland. The discussion ranges from Jeremy's early work on transfer learning and fine-tuning language models, to sharp critiques of how AI coding tools are being used (and misused) in industry today. Jeremy is characteristically direct: he finds much of the current hype around AI-assisted software development not just overblown, but potentially harmful.

The central tension in the conversation is the distinction between coding and software engineering. Jeremy argues that LLMs are reasonably good at the former — translating a specification into syntax is essentially a pattern-matching problem — but are poor at the latter, which requires designing systems, choosing abstractions, and building up layers of understanding over time. He uses the metaphor of a slot machine to describe the AI coding experience: an illusion of control, occasional wins, and studies showing only a tiny uptick in what teams are actually shipping.

A deeper concern runs through the whole conversation: what happens to human skill and organisational knowledge when cognitive work is routinely delegated to AI? Drawing on cognitive science, spaced repetition research, and his own experience, Jeremy argues that learning requires friction — and that AI tools, by removing that friction, risk producing developers (and companies) that are quietly making themselves obsolete. He calls this "understanding debt."

The conversation ends on a more constructive note, with Jeremy making the case for interactive, exploratory programming environments — notebooks, REPLs, and tools like Solveit — as the right model for human-AI collaboration. Rather than handing a task off to an AI in a terminal, the goal should be a rich, stateful loop where both human and AI are inside the same live environment, building understanding together. This, he argues, is what leaves you energised rather than drained.

Central ideas list

Some central ideas:

1. Coding ≠ Software Engineering LLMs are decent at coding (a style-transfer problem — translating specs into syntax), but software engineering — designing systems, choosing abstractions, decomposing problems — is a fundamentally different skill they're poor at.

2. The "Slot Machine" Illusion of AI Coding AI coding tools give an illusion of control. You craft prompts, tweak MCPs, pull the lever — and get code nobody understands. Studies (including one Jeremy ran) show only a tiny uptick in actual shipping, despite the excitement.

3. Understanding Debt & the Atrophy of Human Skills When you delegate cognitive work to LLMs, you erode your own (and your organisation's) knowledge. Jeremy frames this as "understanding debt" — the Anthropic study he cites found most developers weren't learning, because there was no productive friction.

4. Interactive, Exploratory Environments Matter Jeremy argues that tools like notebooks (and Solveit!) — where humans can manipulate, observe, and interact with a system in real time — are how deep understanding is built. This is why he finds pure terminal-based AI coding tools philosophically troubling.

5. AI Creativity is Compositional, Not Generative LLMs are good at combining things within their training distribution, but can't genuinely extrapolate outside it. He calls this "compositional creativity" — impressive in scope, but not true novelty.

6. Power Centralisation is the Real Existential Risk His AI safety concern isn't rogue AI — it's that powerful AI technology will be monopolised by power-hungry actors if centralised.

Summarize the central ideas and concepts that Jeremy Howard talks about in this podcast. Ensure the list is comprehensive. Think through this. Adhere to the format I started for you and extend the list.

Here's the extended list:

1. Coding ≠ Software Engineering LLMs are decent at coding (a style-transfer problem — translating specs into syntax), but software engineering — designing systems, choosing abstractions, decomposing problems — is a fundamentally different skill they're poor at.

2. The "Slot Machine" Illusion of AI Coding AI coding tools give an illusion of control. You craft prompts, tweak MCPs, pull the lever — and get code nobody understands. Studies (including one Jeremy ran) show only a tiny uptick in actual shipping, despite the excitement.

3. Understanding Debt & the Atrophy of Human Skills When you delegate cognitive work to LLMs, you erode your own (and your organisation's) knowledge. Jeremy frames this as "understanding debt" — the Anthropic study he cites found most developers weren't learning, because there was no productive friction.

4. Interactive, Exploratory Environments Matter Jeremy argues that tools like notebooks (and Solveit!) — where humans can manipulate, observe, and interact with a system in real time — are how deep understanding is built. This is why he finds pure terminal-based AI coding tools philosophically troubling.

5. AI Creativity is Compositional, Not Generative LLMs are good at combining things within their training distribution, but can't genuinely extrapolate outside it. He calls this "compositional creativity" — impressive in scope, but not true novelty. The Claude-written C compiler is his key example: it quietly copied a quirk from LLVM that even Chris Lattner considers a mistake.

6. Power Centralisation is the Real Existential Risk His AI safety concern isn't rogue AI — it's that powerful AI technology will be monopolised by power-hungry actors if centralised. The "even if" argument: regardless of how powerful AI turns out to be, concentrating it in the hands of one company or government is dangerous.

7. LLMs "Cosplay" Understanding — and Both Sides Are Right The online debate ("LLMs understand nothing" vs "look what it just did!") is a false dichotomy. LLMs pretend to understand, and that pretense is fine while you're inside the training distribution — it's only when you fall outside it that you suddenly notice they're "worse than stupid." He invokes Searle's Chinese Room and Dennett.

8. LLMs Build Hierarchies of Abstraction Jeremy's original hypothesis for ULMFiT: to predict the next word well, a model must build a hierarchical structural understanding of the world — objects, people, hierarchies, institutions — because you can't compress text without that implicit knowledge.

9. Transfer Learning Was the Key Insight Nobody Took Seriously ULMFiT's core contribution was showing that pre-training on a general-purpose corpus (not task-specific data) and then fine-tuning works universally. The techniques — gradual unfreezing, discriminative learning rates, fine-tuning batch norms — were also novel and controversial at the time.

10. Slope vs. Intercept — Grow, Don't Just Produce Drawing on John Ousterhout's maxim "a little bit of slope makes up for a lot of intercept": the only thing Jeremy says he cares about in his staff is whether their personal capabilities are growing. Optimising purely for AI-assisted output today sacrifices the growth trajectory that makes the company viable long-term.

11. Desirable Difficulty and the Necessity of Friction Memories and skills aren't formed without hard work — referencing Ebbinghaus, spaced repetition, and Anki. AI coding removes friction almost entirely, which is precisely why (per the Anthropic study) people using it stop learning. No friction = no growth.

12. Knowledge is Embodied, Perspectival, and Protean Citing César Hidalgo and philosopher Mazviita Chirimuuta: knowledge isn't an abstract thing that can live on Wikipedia — it's alive, perspectival, and exists in people and organisations. Delegating cognition to LLMs therefore literally erodes organisational knowledge.

13. AI Benefits the Extremes, Not the Middle Very junior developers (who can now ship something) and very senior developers (who can offload typing and research) benefit most from AI coding. It's the intermediate developers — the majority — who are most at risk of stagnating or making themselves obsolete.

14. The Danger of Owning Code Nobody Understands His IPykernel experiment: AI successfully fixed a complex multi-threaded bug over two weeks — but left him with working code that no-one understands. Does it have memory leaks? Will it break? There's no engineering theory for what to do in this situation.

15. Small, Well-Understood Components Are the Path Forward Instagram and WhatsApp dominating their sectors with tiny teams is evidence that lean, well-engineered software beats bloated complexity. Large-company software quality (Windows, macOS) has visibly deteriorated. The answer is finding the right-sized pieces — which requires decades of engineering judgment AI doesn't have.

16. The "No Silver Bullet" Parallel Fred Brooks' classic essay predicted exactly the current hype — that a new tool would eliminate the need for programmers. He estimated a 30% improvement ceiling. The insight: the vast majority of software engineering isn't typing code.

17. Notebooks Are Not the Problem — The Environment Is the Solution Jeremy's rebuttal to Joel Grus: every criticism of notebooks can be solved without abandoning them (e.g. nbdev for CI, nb-merge-driver for git). Traditional software engineering, with its dead text files and no direct feedback loop, is the inferior paradigm — inspired by Bret Victor's philosophy of direct, visceral engagement with computation.

18. Solveit as a New Paradigm for Human-AI Collaboration By placing both human and AI inside a Python interpreter, you get a richer interaction: the human talks to the AI, the AI talks to the computer, they build tools for each other in real time. This is what makes it energising rather than draining — the opposite of the Claude Code marathon experience.

Exploration

When Jeremy talks about learning in humans, what is the recommended approach? What role does memorization via spaced repetition play?

Jeremy draws on several interconnected ideas about how humans actually learn effectively:

The core principle: desirable difficulty Learning requires friction. Memories and skills aren't formed unless the work of forming them is genuinely hard. This is why AI tools that remove all friction are, in his view, pedagogically dangerous — if it's too easy, nothing sticks.

Spaced repetition as the gold standard for memorisation Jeremy spent 10 years studying Chinese specifically to understand learning, using Anki. The key insight from Ebbinghaus (19th century) and Piotr Wozniak (SuperMemo): the algorithm schedules each card just before the moment you'd forget it — which means reviews are always hard work. That difficulty is the point, not a bug. He says it was "absolutely exhausting" but worked so well that he still remembers Chinese 15+ years later without any further study.
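The scheduling idea can be sketched in a few lines of Python — a simplified SM-2-style rule in the spirit of Wozniak's SuperMemo, not Anki's exact algorithm (the ease values and adjustment sizes here are illustrative):

```python
# Simplified spaced-repetition scheduler, loosely in the spirit of
# SuperMemo's SM-2. The ease factor and its adjustments are illustrative,
# not Anki's exact parameters.

def next_interval(interval_days, ease, quality):
    """Return (new_interval, new_ease) after a review.

    quality: 0 (forgot it entirely) .. 5 (perfect recall).
    """
    if quality < 3:                       # failed the card: start over
        return 1, ease
    # successful recall: grow the interval multiplicatively
    new_interval = max(1, round(interval_days * ease))
    # easy recall nudges the ease up; hard recall nudges it down
    new_ease = max(1.3, ease + 0.1 - (5 - quality) * 0.08)
    return new_interval, new_ease

# A card recalled successfully again and again gets pushed further out,
# so each review lands near the edge of forgetting — and stays hard work.
interval, ease = 1, 2.5
schedule = []
for quality in [4, 4, 5, 3]:
    interval, ease = next_interval(interval, ease, quality)
    schedule.append(interval)
print(schedule)  # intervals grow roughly geometrically
```

The multiplicative growth is the whole trick: reviews get exponentially sparser while recall stays just barely achievable — which is exactly the "always hard work" property Jeremy describes.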

But memorisation isn't the whole story Wozniak's deeper claim — which Jeremy endorses — is that creativity itself comes from having a rich store of remembered things and recombining them in interesting ways. So memorisation isn't rote drudgery; it's building the raw material for insight.

Building mental models through interaction He cites Feynman: great scientists build intuition by interacting with what they're studying — even if (like Feynman with quantum mechanics) they have to find physical analogies like spinning plates.

Would you like to dig into any of these threads further — or explore how he connects this back to software development and AI tools?

How do the modern ways in which work is delegated to AI agents interfere with human learning?

Jeremy's central worry is what Tim Scarfe calls "understanding debt" — and it operates on a few levels:

Friction removal kills memory formation The Anthropic study found that most developers using AI coding tools simply weren't learning, because the process had no productive resistance. As Jeremy puts it (drawing on Ebbinghaus): memories don't form unless forming them is hard work.

The default attractor is autopilot Even if skilled practitioners can use AI tools in a reflective, learning-oriented way, the natural pull of the tools is toward delegation — just hand the task off and accept the output. Tim's analogy: like a self-driving car, once you stop engaging, competence quietly atrophies.

Organisational knowledge erodes too It's not just individuals. Citing César Hidalgo: knowledge is embodied in people and organisations, not in documents. When you outsource cognitive tasks, the organisation loses its ability to evolve and adapt — it cuts off the feedback loop through which edge cases and surprises filter upward and improve collective understanding.

The "slope vs. intercept" framing Jeremy's sharpest point: maximising output today (intercept) at the cost of personal growth (slope) is a path to obsolescence — for individuals and companies alike.

What angle interests you most — the individual learning side, or the organisational knowledge side?

Give me a workflow for learning a new skill based on Jeremy Howard's insights and insights from other human learning experts.

Here's a synthesised workflow, drawing on Jeremy alongside Ericsson (deliberate practice), Wozniak, and Feynman:


1. Immerse first, then focus Get a broad picture before drilling down — this gives context for details to "hang on." Jeremy's ULMFiT intuition: pre-train on something general before specialising.

2. Interact with the material directly Don't just read — poke at it. Run code, draw diagrams, build toy versions. Feynman studied spinning plates to understand quantum spin. The feedback loop is everything.

3. Work at the edge of your ability Tasks should be just hard enough to require real effort (Ericsson's deliberate practice). Too easy = no learning. Too hard = no progress. Find the productive friction zone.

4. Explain it simply The Feynman Technique: if you can't explain it to a novice, you don't understand it yet. Gaps in your explanation reveal gaps in your model.

5. Consolidate with spaced repetition Once you understand something, use Anki to retain it long-term — scheduled just before you'd forget it. This is the memorisation layer that fuels later creativity.

6. Build in small, verified steps Jeremy's coding philosophy applies to any skill: each step should work and be understood before moving on.


Which of these steps would you like to explore further, or apply to a specific skill you're learning?

What is the approach that works with learning human movements like those in jiu jitsu?

Motor learning has some important differences from cognitive learning — the body needs its own kind of "reps."

The core principle: drilling to automaticity Movements need to be practised slowly and correctly until they become unconscious. In BJJ this means isolated drilling before live rolling — you can't think your way through a triangle choke in real time.

The spacing effect still applies Short, frequent sessions beat long infrequent ones. Your nervous system consolidates motor patterns during rest and sleep, not during practice itself.

Live sparring = the friction layer This is where Jeremy's "reality pushes back" principle matters most. Drilling builds the pattern; a resisting partner stress-tests your actual understanding of it. Without that friction, you develop a false sense of competence.

Chunking Experts don't think in individual movements — they perceive whole sequences as single units. Learning BJJ well means building these chunks gradually, not trying to absorb whole systems at once.


There's a direct parallel here to Jeremy's workflow: drill small pieces (like building in small verified steps), then test against resistance (friction/feedback), then rest and consolidate.

Are you learning BJJ yourself, or is this more about understanding the principles for another movement skill?

What proportion of time should be spent in each area? What proportion of learning vs doing? And memorization vs learning new knowledge?

There's no single universal answer, but here's what the evidence suggests:

Learning vs. Doing Most experts land around 70–80% doing, 20–30% studying. Jeremy's philosophy strongly favours this — you build intuition through interaction, not passive consumption. The doing is the learning.

New knowledge vs. spaced repetition review Wozniak's research suggests something like 60–70% new material, 30–40% review — but this shifts over time. Early in a skill, new material dominates. Once you have a solid foundation, review becomes more important to consolidate it.

A rough weekly split might look like:

  • ~60% hands-on practice (building, experimenting)
  • ~20% structured study (courses, reading, worked examples)
  • ~20% review/consolidation (Anki, re-explaining to yourself)

The important caveat: Jeremy would push back on over-optimising these ratios. His real emphasis is on quality of engagement — are you in the friction zone? Are you interacting directly with the material? — rather than clock-watching.

What skill are you thinking about applying this to? That would help make the proportions more concrete.

What are the different types of creativity? How does compositional and generative creativity fit into this?

The richest framework here comes from philosopher Margaret Boden, who's directly cited in the podcast. She identifies three types:

  1. Combinatorial — making novel connections between existing ideas (e.g. metaphor, analogy)
  2. Exploratory — pushing to the edges of an existing conceptual space
  3. Transformative — actually restructuring the conceptual space itself (the rarest, most radical kind)

Jeremy's "compositional creativity" maps closely onto Boden's combinatorial type — recombining the entirety of human knowledge in interesting ways. He argues LLMs are surprisingly capable here, given the sheer size of their training corpus.

The contrast Jeremy draws is essentially combinatorial vs. transformative — LLMs can interpolate richly within their training distribution, but can't genuinely restructure or go outside it.

"Generative creativity" isn't a formal term from Boden — it tends to be used loosely to mean producing novel outputs. The key question Jeremy raises is: novel to whom, and how? Output that looks novel may still just be interpolation.

Where does this connect for you — are you thinking about this in the context of AI capabilities, or human creativity more broadly?

Do humans extrapolate in ways that AI doesn't?

Yes — and in several important ways:

Causal reasoning from sparse data Humans can observe a single striking example and update a whole world-model. We build causal models ("why did that happen?"), not just statistical ones. LLMs see correlations in text; humans build theories about mechanisms.

Reasoning from first principles When humans hit the edge of what they know, they can reason outward — combining logic, physical intuition, and analogy to reach genuinely new ground. Jeremy hits this every day in his R&D work, and it's precisely where he says LLMs suddenly "go dumb."

Embodied grounding Human abstraction is anchored in sensorimotor experience — we understand "push" partly because we've pushed things. This grounds our extrapolation in reality in a way text-trained models lack.

Transformative restructuring Boden's highest tier — humans occasionally reorganise their entire conceptual framework (think Copernicus, Einstein). This isn't interpolation; it's throwing away the old map.

The honest caveat though: we don't fully understand how humans do this — and it's rarer than we like to think. Most human "creativity" is also combinatorial.

Which of these angles interests you most?

Give me concrete examples of each type of extrapolation?

Causal reasoning from sparse data A doctor sees one unusual patient reaction to a drug and hypothesises a new mechanism — before any statistical evidence exists. One data point, whole new theory.

Reasoning from first principles Elon Musk famously asked: instead of buying rocket parts at market price, what would it cost to buy the raw materials and build from scratch? No precedent — just decomposing the problem to its foundations.

Embodied grounding A surgeon learning robotic surgery can transfer their feel for tissue resistance to interpret haptic feedback, even in a novel tool. The abstraction is grounded in years of physical experience.

Transformative restructuring Einstein didn't just extend Newtonian mechanics — he discarded the assumption that time was absolute. That's not extrapolation from existing data; it's replacing the conceptual framework entirely.


Notice these get progressively rarer and harder as you go down the list. Most human creativity sits in the top two. The bottom one happens perhaps a handful of times per century.

Which of these resonates most with what you were originally thinking about?

Does Jeremy talk about grounding?

Not explicitly — he doesn't use the word "grounding." But the concept is lurking in several places:

  • His point that LLMs "cosplay understanding" implies they lack real-world grounding — their knowledge is purely statistical/textual
  • Tim raises the embodied knowledge angle (citing Chirimuuta), arguing knowledge is alive and perspectival, not abstract
  • Jeremy's Feynman example — studying spinning plates to grasp quantum spin — is essentially an argument for embodied grounding in human learning

So grounding comes up as a gap in LLMs rather than something Jeremy theorises about directly.

Is this in the context of AI limitations, or human learning — or both?

Vocabulary List

Vocabulary list based on concepts Jeremy mentions, such as desirable difficulty. Try to be comprehensive. I know that Jeremy talks about a lot of important concepts.

Here's a vocabulary list drawn from the concepts Jeremy (and Tim) discuss:


Desirable Difficulty — The educational principle that learning requires effortful struggle to stick. Too-easy revision doesn't form lasting memories.

Spaced Repetition — Scheduling review of material just before you'd forget it, maximising retention efficiency (Ebbinghaus, Anki, SuperMemo).

Transfer Learning — Using a model pre-trained on one task/corpus as a starting point for a new task, rather than training from scratch.

Discriminative Learning Rates — Assigning different learning rates to different layers of a network during fine-tuning, since earlier layers need less updating.

Gradual Unfreezing — Fine-tuning neural network layers one at a time from the top, progressively involving deeper layers.

Regularization — Techniques that constrain a model's flexibility to prevent overfitting (e.g. dropout, weight decay).

Compositional Creativity — Creativity that recombines existing knowledge in novel ways, staying within the training distribution. Jeremy argues LLMs are limited to this.

Training Distribution — The space of data a model was trained on; LLMs degrade sharply outside it.

Style Transfer — Reframing one thing in the form/syntax of another — Jeremy argues coding is essentially this.

Understanding Debt — Accumulating code or systems that work but that no-one understands, analogous to technical debt.

Slope vs. Intercept — John Ousterhout's idea: a high growth rate (slope) beats a high starting point (intercept) over time.

Hierarchy of Abstractions — Jeremy's model of how LLMs must implicitly build layered world-knowledge (objects → people → institutions…) to predict text well.

Desirable Friction — Related to desirable difficulty: the productive resistance that forces you to build genuine understanding.

Knowledge Erosion — The organisational loss of embedded, embodied knowledge when cognitive tasks are delegated to AI.

Exploratory Programming — A development style where you interactively probe and build in small increments, observing results in real time (notebooks, REPLs).

Dead Code / Dead Files — Jeremy's term for static text-file-based software artefacts that offer no live feedback or interaction.

Cosplaying Understanding — Jeremy's vivid phrase for how LLMs convincingly simulate comprehension without genuine understanding.

The Chinese Room — Searle's thought experiment: a system can produce correct outputs without any internal understanding of meaning.

Functionalism — The philosophical view that what matters is what a system does, not what it's made of — Tim raises this as the bull case for AI-generated code.

Ontological Irreducibility — Searle's claim that understanding has a phenomenal component that can't be reduced to mere function.

Goodhart's Law — When a measure becomes a target, it ceases to be a good measure; optimising for a proxy metric can miss the real goal.

No Silver Bullet — Fred Brooks' essay arguing there's no single tool that will dramatically transform software productivity, because typing is not the bottleneck.

Vibe Coding — Coding by feel with AI, without deeply understanding what the code does — Jeremy and Rachel are sceptical of its productivity claims.

Specification Acquisition — Tim's framing: conversing with an LLM is fundamentally a process of progressively nailing down what you actually want.

Pretext Task — A self-supervised learning task (e.g. predict masked words/pixels) used to pre-train a model on unlabelled data.

Continual Learning — Training a model on new tasks while retaining previously learned knowledge, without catastrophic forgetting.
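The two ULMFiT fine-tuning terms above can be sketched in plain Python (no deep-learning framework needed to show the scheduling logic). The per-layer decay factor of 2.6 is the value reported in the ULMFiT paper; everything else here is illustrative:

```python
# Sketch of two ULMFiT fine-tuning ideas: discriminative learning rates
# and gradual unfreezing. The 2.6 decay factor is from the ULMFiT paper;
# the rest of the numbers are illustrative.

def discriminative_lrs(base_lr, n_layers, factor=2.6):
    """Top (last) layer gets base_lr; each earlier layer gets it divided
    by `factor` again, since early layers encode general features that
    need less updating."""
    return [base_lr / factor ** (n_layers - 1 - i) for i in range(n_layers)]

def gradual_unfreezing(n_layers):
    """Yield, per epoch, the indices of trainable layers: the top layer
    first, then progressively deeper layers join in."""
    for epoch in range(n_layers):
        yield list(range(n_layers - 1 - epoch, n_layers))

lrs = discriminative_lrs(0.01, 4)
print([round(lr, 5) for lr in lrs])   # earliest layer gets the smallest lr

for trainable in gradual_unfreezing(3):
    print(trainable)                  # [2], then [1, 2], then [0, 1, 2]
```

Both tricks serve the same goal: fine-tune without catastrophically overwriting the general-purpose knowledge built up in pre-training.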


Want me to add definitions for any others, or organise these into themed groups?

Content references

yt_info([video_url], 'title', 'description')

Nicely summarize the references from the video description. I've shared it above. Include the links.

Here are the references from the video, organized by type:


📄 Blog Posts


📚 Books


🔬 Research Papers & Studies


🎥 Videos


πŸ“ Other

People References

List some of the people that Jeremy Howard references in this podcast episode. Link to their Wikipedia pages if they exist.

Here are the key people Jeremy Howard (and Tim Scarfe) reference in the episode:

ML / AI researchers:

Philosophy / Cognitive Science:

Computing pioneers:

Other:

Include the people that might not have a Wiki page.

Here are the additional people mentioned who likely don't have Wikipedia pages:

fast.ai / directly connected:

  • Rachel Thomas — Jeremy's wife and fast.ai co-founder; her METR study and article on AI coding addiction are referenced
  • Bryan McCann — cited alongside Andrew Dai for early pre-training work
  • Andrew M. Dai — Google researcher, early transfer learning work

Tech:

  • Joel Grus — the "I Don't Like Notebooks" talk (works at Allen Institute for AI)
  • Arvind Narayanan — the "snake oil" AI critic Jeremy co-wrote the AI risk rebuttal with (does have a page)

Philosophy / Neuroscience:

  • Mazviita Chirimuuta — philosopher of neuroscience, "knowledge is protean"

AI Safety:

  • Dan Hendrycks — "offense-defense asymmetry" (actually does have a page)

With pages I missed first time: