The Rule That Outlived the Reason
Essay · 21 March 2026

In which the rule outlives the reason.

Version Eleven

In the winter, a line entered the system prompt:

"Never Active products don't indicate risk."

At the time the sentence felt obvious. An LLM response had triggered an alert that made no operational sense, the kind of technically correct signal that collapses under the slightest scrutiny from anyone with institutional knowledge. The model had identified language that correlated with risk and applied that pattern faithfully, even though the customer had never adopted the product in the first place. I pulled up the underlying record and scrolled through the transcript and the surrounding data, looking for the moment that had confused the system, and the conclusion arrived quickly: the model wasn't wrong about the language, but the framing of the problem was off. A customer who never started using something cannot meaningfully be described as about to stop.

So I wrote the rule and committed it to the prompt.

I named the system TARS, after the Interstellar robot, because I am exactly that kind of person. Every fifteen minutes a cron job wakes up, pulls from our CRM, call transcripts, chat logs, meeting notes, and project management tools; analyzes five accounts; and goes back to sleep. Quietly, continuously, across the full portfolio.
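Stripped of the integrations, the loop is almost embarrassingly small. A minimal sketch of the cycle in Python, with hypothetical helper names standing in for the real CRM, transcript, and chat plumbing:

```python
import time

BATCH_SIZE = 5               # accounts analyzed per wake-up
INTERVAL_SECONDS = 15 * 60   # the cron cadence

def pick_next_accounts(n: int) -> list[str]:
    """Hypothetical: choose the n accounts most overdue for a fresh look."""
    return []

def analyze(account: str) -> None:
    """Hypothetical: pull CRM fields, transcripts, chat logs, and notes,
    run them through the model with the system prompt, store the result."""
    ...

def run_forever() -> None:
    # The real thing is a cron entry; a sleep loop stands in for it here.
    while True:
        for account in pick_next_accounts(BATCH_SIZE):
            analyze(account)
        time.sleep(INTERVAL_SECONDS)
```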

Everything surrounding that first rule has already begun to fade. The particular transcript that triggered the problem, the order in which the logic became clear, all of it dissolves into the background noise of dozens of similar afternoons. Only the rule remains intact. Every transcript the system analyzes now encounters that sentence first, quietly adjusting the model's interpretation before the rest of the analysis unfolds.

That persistence produces a strange inversion of how organizations usually forget things.

Institutional knowledge tends to disappear when the artifact disappears. A teammate leaves, a document gets archived, a wiki page falls out of date, and the reasoning that once guided a decision slowly dissolves into rumor. Months later the same question surfaces again and someone reconstructs the answer from scratch, unaware that the organization solved the problem once already.

Here the opposite occurs. The reasoning evaporates over time, but the rule survives.

And rules like this don't only accumulate in the prompt file. They hide inside the system's architecture, in decisions that looked like engineering but were actually beliefs.

I noticed the problem when two accounts diverged in ways the system couldn't explain. One week, an account's competitive risk analysis refreshed three times in five days because a customer had mentioned a competitor's name on an otherwise positive call. Each recomputation returned roughly the same assessment. Meanwhile, a different account's churn risk sat unchanged for two weeks while the buyer's team was quietly reorganizing, something the system absolutely would have flagged if it had bothered to look again. Every risk dimension shared the same cache expiration window, but the signals moved at different speeds.

The fix felt like plumbing: churn risk now expires after seven days, competitive risk after fourteen. High-value accounts get their windows divided by three. Churn is volatile because a tense QBR, a reorg, an unanswered renewal conversation can shift the picture in days. Competitive risk moves slower because vendor evaluation conversations don't reverse themselves week to week. How often the system decides to look is, in effect, an embedded belief about how fast things change. I encoded that belief once, tested it loosely, and moved on. The system applies it hundreds of times a week without revisiting the question. And when the analysis arrives matters as much as what it finds: an overnight run meant people got Monday-morning assessments generated Saturday at 2am, and they treated those the way you'd treat a Monday-morning email sent at 3am.
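Written out, the belief is only a few lines. A sketch of the windows as described above; the constant and function names are mine, not the system's:

```python
from datetime import timedelta

# Expiration windows per risk dimension: the embedded belief about
# how fast each kind of signal actually moves.
BASE_TTL = {
    "churn": timedelta(days=7),         # volatile: reorgs, tense QBRs, stalled renewals
    "competitive": timedelta(days=14),  # slower: vendor evaluations don't reverse weekly
}

def expiration_window(dimension: str, high_value: bool) -> timedelta:
    """How long a cached assessment is trusted before the system looks again."""
    ttl = BASE_TTL[dimension]
    if high_value:
        ttl /= 3   # high-value accounts get re-examined three times as often
    return ttl
```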

A similar thing happened with the Slack sync. It originally processed channels in alphabetical order.

I realize that doesn't sound like the beginning of a disaster. But with hundreds of channels and API rate limits, the sync would reliably run out of time somewhere around the letter M. Every single run. Accounts whose Slack channels happened to start with N through Z were structurally underserved, not occasionally or randomly, but systematically, as a feature of the alphabet. I only caught this after noticing that some accounts consistently had rich Slack context in their analyses while others had none, and the dividing line was, absurdly, the first letter of the channel name.

A rotation cursor fixed the immediate problem by picking up where the previous run left off. But the broader lesson stayed with me: any system with bounded execution time and ordered processing will create invisible hierarchies. Alphabetical ordering feels neutral. It isn't. And the hierarchy it created was perfectly silent, distinguishable from normal operation only if you already suspected something was wrong.
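The cursor itself is a few lines of bookkeeping. A sketch, with the persistence of the cursor between runs left out and the names purely illustrative:

```python
def channels_for_this_run(all_channels: list[str], cursor: int, budget: int) -> tuple[list[str], int]:
    """Process `budget` channels per run, starting where the last run stopped,
    so no stretch of the alphabet is permanently starved."""
    ordered = sorted(all_channels)               # stable order, no longer a hierarchy
    if not ordered:
        return [], 0
    start = cursor % len(ordered)
    rotated = ordered[start:] + ordered[:start]  # wrap around past the end
    batch = rotated[:budget]
    return batch, (start + len(batch)) % len(ordered)
```

Each run stores the returned cursor, and the next run starts there instead of at the letter A.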

The most surprising version of this pattern involved the system prompt's persona instruction. The prompt tells the model to "think like an operator, not a chatbot." For months I assumed this was a tone instruction, a way to make the output sound less like a help desk and more like a colleague. Then I actually compared the outputs side by side, and the difference was larger than I expected.

Without the operator framing, the model hedges. Assessments arrive padded with qualifiers: potential risk factors, possible concerns, indicators that may suggest elevated attention. These are the kinds of sentences that are technically defensible and practically meaningless. Someone reading "there may be potential risk factors" across forty accounts learns exactly nothing about which ones need attention this week.

With the operator persona, the model commits. It names the evidence, identifies the source, and makes a call about what matters and what doesn't. My best explanation is that the persona creates pressure toward specificity. An "operator" who hedges everything isn't actually performing the role the prompt describes, and the model seems to resolve that tension by grounding itself in concrete evidence rather than retreating to vague qualification. Persona shapes how the model reasons, not just how it sounds.

Inside the prompt file, lines accumulate gradually. The filename or a random comment gives the game away:

2026-02-21-v11

Over time the prompt begins to resemble sediment more than design. Buried inside it are fragments of conversations the organization has already forgotten: a renewal misread as churn, a transcript mentioning a competitor that turned out to be harmless, a product adoption metric missing from a regional account that once triggered a cascade of false alarms. Each line compresses the outcome of a discussion that once occupied a room full of people.

Rules behave differently from principles in that respect.

Principles carry their lineage with them. Someone can usually reconstruct the argument that produced them. Principles have parents.

Rules accumulate through encounter. An unexpected case appears, the system reacts poorly, and a constraint enters the prompt to prevent the mistake from repeating itself. The only evidence that the lesson ever mattered is the lingering discomfort people feel when they consider deleting it.

Scar tissue might be closer to the right term than memory. Each rule marks a place where the organization's assumptions turned out to be incomplete. Scar tissue protects the system from reopening the same wound twice. It also conceals the original injury.

And the scar tissue isn't confined to the prompt. The cache windows, the rotation cursor, the persona instruction: each of these is a rule embedded in architecture rather than language, harder to find and harder to question because it doesn't look like a rule at all. It looks like engineering.

By the time the prompt reaches version eleven, most of the scenes that produced its instructions have already disappeared. The model still behaves correctly because the rules execute every time the system runs, but the reasoning that gave those rules their meaning has begun to dissolve. What remains is a system that remembers certain operational lessons more faithfully than the organization itself does, applying them hundreds of times a week without hesitation or explanation.

Documentation preserves explanations. Playbooks preserve procedures. Prompt rules preserve reflexes, responding automatically to injuries the system no longer remembers receiving.

I've thought about deleting some of the older rules. There are lines in the prompt I no longer fully recognize, instructions that feel right in the way that all caution feels right, without my being able to reconstruct the scene that made them necessary. Now those sentences run without me, in contexts I didn't anticipate, shaping conclusions I won't always see.

What stops me is the discomfort of not knowing. The rule might be protecting the system from something the organization has forgotten how to name.

Or it might be a ghost.

Footnotes

Most of the prompt rules I've written don't correct the model's reasoning so much as redefine the boundaries of the problem it's trying to solve. The model was right about the language. The category was wrong. And no amount of pattern recognition fixes a misframed question.

The constraint TARS was built to address was never analytical capability. Anyone managing a book of business will, on a good week, actually review call transcripts for maybe eight of their forty accounts. The other thirty-two get whatever the person can hold in working memory from the last time they looked, supplemented by CRM fields that may or may not reflect reality. Multiply this across a team of ten and the math gets ugly fast: at any given moment, the organization's understanding of most of its revenue is weeks stale and selectively remembered. Not because anyone is lazy. Because the day fills up, and the transcripts keep coming, and there are only so many hours before the next call.

There's also a 24-hour debounce: the system won't reanalyze an account it just analyzed, even if the expiration window has technically lapsed, unless genuinely new evidence has appeared or the renewal is within thirty days. Without this, every minor data change would trigger a recomputation, burning tokens to produce an analysis barely distinguishable from yesterday's. The debounce is the system's way of knowing when to leave something alone, which is a harder skill than it sounds.

Older enterprise systems often accumulate logic in the same way. Deep inside billing platforms or CRM automation flows sit conditions that begin with phrases like "temporary workaround" or "special handling." Years later the surrounding context disappears while the logic remains, quietly redirecting transactions no one remembers debugging. Engineers sometimes refer to these sections of code as defensive layers: protections against edge cases the system once encountered and never wants to encounter again. Prompt files are beginning to develop the same geology, except the defensive logic now encodes judgment, not computation.

The system also can't tell the difference between a quiet account where everything is fine and a quiet account where the customer has already made the decision to leave. Silence looks identical in the data regardless of what's producing it. Absence of signal isn't the same as absence of risk, and the model handles that ambiguity unevenly. I keep coming back to this because I'm not sure the scar tissue model accounts for it. You can write a rule for a pattern you've seen before. You can't write a prompt instruction for a signal that doesn't exist.


