We spent decades developing principles for readable, maintainable code. But if AI is writing the code AND maintaining it, do those principles still matter?
You've probably seen code like this by now.
A function that calls an API method that doesn't exist, written with complete confidence. Variable names that are technically descriptive (userDataFromAPIResponseAfterValidation) but somehow make the code harder to understand, not easier. The same check written three different ways in the same file because the AI forgot what it did 50 lines ago.
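That last pattern is worth seeing concretely. Here is a hypothetical sketch (the module and helper names are invented for illustration): three functions that each guard against a missing user, each with a different idiom, and the idioms quietly disagree on edge cases.

```python
# Hypothetical AI-generated module: the same "is this user valid?"
# check appears three times, three different ways.

def get_display_name(user):
    if user is None:                  # idiom 1: explicit None check
        return "guest"
    return user.get("name", "guest")

def get_email(user):
    if not user:                      # idiom 2: truthiness check
        return None
    return user.get("email")

def get_role(user):
    if not isinstance(user, dict):    # idiom 3: type check
        return "viewer"
    return user.get("role", "viewer")
```

Each function works on its own and would pass a quick test. But feed them an empty dict and they diverge: `get_display_name({})` proceeds and returns the default, while `get_email({})` bails out early because `{}` is falsy. The code works; the coherence is gone.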
The code works. The tests pass. It ships.
And at some point you've probably asked a question that would have been heresy five years ago: Does it matter?
Let's get the obvious stuff out of the way. By traditional metrics, AI-generated code is measurably worse than human-written code.
CodeRabbit's 2025 analysis of 470 real-world pull requests found AI-generated PRs have 1.75x more logic errors, 1.64x more maintainability problems, and 1.57x more security vulnerabilities.
GitClear's research across 211 million lines of code found the percentage of code that gets refactored has dropped from 24% to under 10%. We're not improving code anymore. We're just adding more of it.
By every metric we've developed to measure code quality, AI is making things worse.
Case closed, right?
Not so fast. All of those metrics (readability, maintainability, cyclomatic complexity, code duplication) were developed by humans, for humans. They all assume the same thing: that humans will need to read this code later.
Clean code principles emerged from decades of painful experience. We learned that clever one-liners become debugging nightmares. That functions should do one thing because human working memory can only hold so much. That meaningful variable names matter because x tells you nothing six months from now. That DRY (Don't Repeat Yourself) exists because humans are bad at keeping multiple copies in sync.
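A tiny sketch of what those principles buy us, using an invented discount rule: the first version inlines the logic wherever it's needed, so the copies drift apart the first time the rule changes; the second gives the rule one home and a name that still means something six months from now.

```python
# Before: the discount rule is copied at every call site.
def checkout_total(prices):
    total = sum(prices)
    if total > 100:
        total = total * 0.9   # what is 100? what is 0.9?
    return total

# After: one small function, one responsibility, meaningful names.
BULK_DISCOUNT_THRESHOLD = 100
BULK_DISCOUNT_RATE = 0.9

def apply_bulk_discount(total):
    """Apply the bulk discount if the order qualifies."""
    if total > BULK_DISCOUNT_THRESHOLD:
        return total * BULK_DISCOUNT_RATE
    return total

def checkout_total_dry(prices):
    return apply_bulk_discount(sum(prices))
```

Nothing here is clever. That's the point: every one of these choices exists to compensate for human working memory, not for any property of the machine running the code.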
But what if humans aren't the ones reading the code anymore?
LLMs don't read code the way we do.
Research shows that LLM "confusion" correlates with human confusion, but that doesn't mean they need the same solutions we do. The cognitive constraints that drove clean code principles don't apply the same way.
When we write userDataFromAPIResponseAfterValidation, we're trying to compress information into a variable name because our eyes will only glance at it for a moment. But an LLM processes the entire context window at once. It doesn't need the compression.
Here's an idea that's been floating around: maybe we should stop maintaining AI-generated code entirely.
The traditional economics of software said that code is expensive to write and cheap to run, so you protect your investment by maintaining it carefully. But AI inverts this equation.
One developer reported investing years in building systems where complete applications could be regenerated from specifications. The result? A 20,000-line system rebuilt in three days. Not refactored. Rebuilt from scratch.
"The code is disposable. The specifications are permanent."
If you can regenerate a system faster than you can understand it, why bother understanding it? If a "refactoring" means describing what you want and generating fresh code, who cares about the intermediate representation?
In a disposable world you don't refactor, you regenerate. If an app built with AI assistance breaks or needs a new feature, you don't dive into the spaghetti code. You just ask the model to build it again, but better.
Here's where it gets uncertain: context windows are expanding fast.
In 2022, GPT-3 had a 4K token context window. Maybe a few hundred lines of code. Today, Claude can handle 200K tokens. Google's Gemini offers 2 million. Some models claim 100 million tokens, roughly 10 million lines of code, or an entire large codebase in a single context.
If an AI can "see" your entire repository at once, does modular organization matter the same way? We broke code into modules partly because humans needed to hold one piece in their head at a time. An AI with a 100-million-token context window has no such limitation.
But before you assume this solves everything, there's a catch. Chroma's research on "context rot" found that models don't use their context uniformly: performance becomes increasingly unreliable as input length increases. A model claiming 200K tokens typically becomes unreliable around 130K, with sudden performance drops rather than gradual degradation.
So yes, context windows are exploding. But whether models can actually use that context effectively is another question.
Okay, I've made the contrarian case. Now let me argue against myself.
Clean code helps AI too. Research on variable naming found that descriptive names achieve 0.874 semantic similarity with AI models, while obfuscated names score only 0.802. Clean code isn't just for humans. It gives AI better signal to work with.
AI is trained on code. Every sloppy pattern AI generates came from code it was trained on. Bad code creates bad AI creates worse code. It's a feedback loop. If we abandon clean code principles, we poison the training well for the next generation of models.
Someone's still debugging. The 67% of developers who report spending more time debugging AI code? They're humans. For the foreseeable future, humans remain in the loop, especially when things break. And things always break.
Safety-critical systems exist. Healthcare, finance, aviation, defense. These domains require human audits, regulatory compliance, and the ability to explain why code does what it does. "The AI wrote it and we can't quite explain it" doesn't fly when someone's life depends on the software.
The disposable code thesis works until it doesn't. You can regenerate your codebase from specifications, right up until you can't. Until the specification misses an edge case that emerged over years of production use. Until the regenerated version subtly behaves differently in ways you don't notice until customer data is corrupted. As Honeycomb noted: "Disposable code is here to stay, but durable code is what runs the world."
I think we're asking the wrong question.
The question isn't "does clean code still matter?" The question is "what kind of code are we writing?"
The problem is that AI makes it easy to treat all code the same way. The same workflow that generates a quick script can generate a core service. The friction that used to slow us down and force us to think is gone.
So what do we do with this?
Invest in specifications, not just code. If code is becoming more disposable, specifications become the durable artifact. What does this system actually need to do? What are the constraints? What edge cases matter? The companies that get this right will be able to regenerate their systems cleanly. The ones that don't will have AI-generated code they can't regenerate or understand.
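One minimal way to make a specification durable, sketched here with an invented slugify contract: write the behavior down as executable checks that any regenerated implementation must pass. The edge cases are the part you learned the hard way; the implementation is the part you can throw away.

```python
import re

# Today's (disposable) implementation. Tomorrow's might be regenerated
# from scratch -- and that's fine, as long as it passes the spec.
def slugify(title):
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

# The specification: edge cases accumulated in production, written
# down so they survive the next regeneration. All cases are invented
# for this example.
SPEC_CASES = {
    "Hello, World!": "hello-world",
    "  leading and trailing  ": "leading-and-trailing",
    "Already-Slugged": "already-slugged",
}

def satisfies_spec(impl):
    """Return True if impl produces the required output for every case."""
    return all(impl(raw) == want for raw, want in SPEC_CASES.items())
```

The dict of cases, not the regex, is the asset. Regenerate `slugify` as often as you like; `satisfies_spec` is what tells you the regeneration didn't silently change behavior.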
Develop judgment about contexts. Not all code needs the same treatment. Get comfortable asking: Is this throwaway or infrastructure? Prototype or production? The answer determines how much you need to care about what the AI produces.
Learn to read AI code, not just write prompts. The people who thrive won't be prompt engineers who can't code. They'll be engineers who can read AI output critically, catching the subtle wrongness, the almost-right patterns that will cause problems later. This requires understanding clean code principles even if you're not writing clean code yourself.
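Here's a hypothetical example of that "subtle wrongness" (the function is invented): the code reads cleanly, has a confident docstring, and happens to pass a lazy smoke test, but it doesn't do what the docstring promises.

```python
# Plausible AI output: looks clean, but set() does not preserve
# insertion order, so the docstring is quietly wrong.
def unique_almost_right(items):
    """Return items with duplicates removed, preserving order."""
    return list(set(items))

# The version a critical reader would insist on: first occurrence wins,
# order is actually preserved.
def unique(items):
    """Return items with duplicates removed, preserving order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result
```

Catching the difference requires knowing why order-preserving deduplication is usually what callers want, and why `list(set(...))` can coincidentally look correct on small test inputs. That judgment is clean-code literacy, even if you never type the fix yourself.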
Accept uncertainty. We don't know how context windows will scale. We don't know if AI will get better at maintaining coherent architecture. We don't know if the "disposable code" thesis will prove out or collapse under its own weight.
Clean code principles aren't arbitrary. They came from decades of learning what makes software maintainable by humans with limited memory and attention.
But they are human principles. Designed for human cognition. And we're in an era where the relationship between humans and code is changing.
Maybe AI will get good enough that code becomes truly disposable, regenerated as easily as we save versions of a document. Maybe context windows will expand until architecture doesn't matter because the AI can see everything at once. Maybe we're solving short-term problems that won't exist in two years.
Or maybe we're building a generation of systems that no one understands, creating technical debt that will take decades to unwind, training the next generation of developers to accept code they can't read.
I don't know which it is. I'm not sure anyone does.
But I know the question is worth asking: Clean code was written for humans. What happens when humans aren't the ones reading it anymore?
The future probably isn't "clean code is dead" or "clean code forever." It's learning to tell the difference between disposable scripts and durable infrastructure, and treating each appropriately. That judgment is the new skill.