Why refactors silently regress

A refactor is supposed to change code without changing behaviour. That definition is the trap. When you ask an LLM to "clean up this function," it can — and often does — change behaviour while improving readability. The new version compiles, the test suite still passes, and a subtle edge case is now wrong. You won't notice until production tells you.

Three classic ways an LLM-driven refactor silently regresses:

| Failure mode | What it looks like |
| --- | --- |
| Loosened types | Original code took `string`. Refactor takes `string \| undefined`. Now an upstream `undefined` slips through where it used to throw. |
| Reordered exception handling | Original code threw before any side-effect. Refactor moves the side-effect first, leaving the system in a partial state on failure. |
| Default values inserted | Original code required `size` to be passed. Refactor adds `size = 10` as a default. Callers that omit `size` silently get the wrong page size. |

None of these break tests if the test suite doesn't cover the boundary. None of these are obvious in code review if the diff is large. All of these are exactly the kind of "improvement" an LLM happily makes when you say "refactor this for clarity."
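
To make the reordered-exception-handling case concrete, here is a minimal sketch. The `Ledger` type and both `withdraw` functions are hypothetical and exist only for this illustration. A happy-path test cannot tell the two versions apart; only the failure path shows the difference.

```typescript
// Hypothetical in-memory ledger, used only for this illustration.
type Ledger = { balance: number; log: string[] };

// Original: validates first, so a failure leaves the ledger untouched.
function withdrawOriginal(ledger: Ledger, amount: number): void {
  if (amount > ledger.balance) {
    throw new RangeError("insufficient funds");
  }
  ledger.log.push(`withdraw ${amount}`);
  ledger.balance -= amount;
}

// "Cleaned up" version: the log write has moved above the check.
function withdrawRefactored(ledger: Ledger, amount: number): void {
  ledger.log.push(`withdraw ${amount}`);
  if (amount > ledger.balance) {
    throw new RangeError("insufficient funds");
  }
  ledger.balance -= amount;
}

// Runs one withdrawal against a fresh ledger and returns the resulting state,
// swallowing the error so the state left behind can be inspected.
function stateAfter(
  withdraw: (ledger: Ledger, amount: number) => void,
  amount: number,
): Ledger {
  const ledger: Ledger = { balance: 100, log: [] };
  try {
    withdraw(ledger, amount);
  } catch (_err) {
    // The error path is exactly what we want to look at.
  }
  return ledger;
}
```

With `amount = 50`, both versions leave the same ledger behind. With `amount = 500`, both throw the same `RangeError`, but the refactored version has already written a log entry for a withdrawal that never happened.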

[Diagram: unlocked vs. locked refactor flow]

The fix is to write refactor prompts as constraint-locked transformations. You don't say "improve this." You say "do exactly this transformation while preserving exactly these properties." The properties become locks the model can't unlock.

Common locks that matter for refactors:

- Public signature unchanged. Function name, parameter names, parameter types, return type: all identical.
- Error semantics preserved. Same exception types, same conditions, same messages.
- No new dependencies. No new imports, no new packages.
- No new defaults. If a parameter was required, it stays required.
- Idempotency preserved. If the function was idempotent, it remains so.
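
One way to keep the lock list from drifting between prompts is to store it as data and generate the prompt from it. The sketch below assumes nothing about any particular model API: `RefactorLock`, `LOCKS`, and `buildRefactorPrompt` are hypothetical names, and the prompt wording is only one plausible phrasing.

```typescript
// A lock is a named property the refactor must preserve.
interface RefactorLock {
  name: string;
  rule: string;
}

// The five locks listed above, expressed as data.
const LOCKS: RefactorLock[] = [
  { name: "Public signature unchanged", rule: "Function name, parameter names, parameter types, and return type stay identical." },
  { name: "Error semantics preserved", rule: "Same exception types, same conditions, same messages." },
  { name: "No new dependencies", rule: "No new imports, no new packages." },
  { name: "No new defaults", rule: "If a parameter was required, it stays required." },
  { name: "Idempotency preserved", rule: "If the function was idempotent, it remains so." },
];

// Builds a prompt that names one transformation and lists every lock explicitly.
function buildRefactorPrompt(
  transformation: string,
  source: string,
  locks: RefactorLock[],
): string {
  const lockLines = locks.map((lock, i) => `${i + 1}. ${lock.name}: ${lock.rule}`);
  return [
    `Transformation: ${transformation}`,
    "Constraints (every one must hold; if one cannot, stop and explain instead of changing the code):",
    ...lockLines,
    "Code:",
    source,
  ].join("\n");
}
```

Calling `buildRefactorPrompt("Add TypeScript type annotations", source, LOCKS)` yields a prompt in which the transformation is a single named operation and everything else is a numbered constraint, which is the shape the paragraph above describes.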

The next lesson applies exactly these locks to a JavaScript-to-TypeScript refactor. The model tightens types without breaking any lock. The lock list does the work, not the model's judgement.

A useful framing: a refactor prompt is closer to a contract than a request. You're not asking the model to make decisions about what's better. You're handing it a list of properties that must hold and asking it to produce code that holds them. The model is good at following contracts. It is bad at "use your judgement."
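
A contract is only useful if you can check it. The following is a minimal sketch of one such check for the "error semantics preserved" lock: run the original and the refactored function over the same boundary inputs and compare outcomes, including the error type and message. All names here (`Outcome`, `observe`, `firstDivergence`, and the two `trim` functions) are hypothetical, and comparing via `JSON.stringify` assumes return values are JSON-serializable.

```typescript
// What a single call produced: a value, or an error's name and message.
type Outcome =
  | { ok: true; value: unknown }
  | { ok: false; errorName: string; message: string };

// `any` is deliberate: boundary inputs are often values the types forbid.
type Fn = (input: any) => unknown;

// Calls fn and records either its return value or the error it threw.
function observe(fn: Fn, input: unknown): Outcome {
  try {
    return { ok: true, value: fn(input) };
  } catch (e) {
    const err = e instanceof Error ? e : new Error(String(e));
    return { ok: false, errorName: err.name, message: err.message };
  }
}

// Returns the index of the first input on which the two versions disagree,
// or -1 if they agree on every input.
function firstDivergence(original: Fn, refactored: Fn, inputs: unknown[]): number {
  return inputs.findIndex(
    (input) =>
      JSON.stringify(observe(original, input)) !==
      JSON.stringify(observe(refactored, input)),
  );
}

// Example pair: the refactor loosened the parameter type and stopped throwing.
function trimOriginal(s: string): string {
  if (typeof s !== "string") throw new TypeError("expected a string");
  return s.trim();
}

function trimRefactored(s?: string): string {
  return (s ?? "").trim();
}
```

On ordinary strings the two `trim` versions agree, so `firstDivergence` returns `-1`. Add `undefined` to the input list and it returns that input's index, because the original throws a `TypeError` while the refactor quietly returns an empty string.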

This is why refactors break most often when engineers use the most casual prompts: "tidy this up," "make this cleaner," "modernize this." Every casual word leaves room for the model to "improve" something that was load-bearing.

Next up: the type-safety lock pattern in detail.
