The Golden Retriever of Software
AI Literacy 101
Here’s a follow-up post to this one, Training the Golden Retriever.
I recently wrote about how sick I am of the culture war, and launched a new project built around a week’s worth of mathematical puzzles. My intention was to use those puzzles to demonstrate how to use AI intelligently, and I did that in the first week: puzzles and answers with an AI Literacy Lesson.
This is the second week, and I’m making a change based on feedback. (A thing I do occasionally, despite rumors to the contrary.)
The email with the puzzles and answers for the week of January 8–14 will still go out on Wednesday night as scheduled, but it won’t include an AI-literacy lesson. That’s in this post.
You’re welcome. Or I’m sorry. Possibly both.
Many of you were pleased—and surprised—to see just how bad large language models are at math. Several of you asked me to take a step back and explain the deeper why: why a system can be so fluent, so confident, and yet so wrong.
In hindsight, I should have realized this was a necessary prerequisite. That one’s on me. (Cue the tiny apology violin.)
So before we go any further, we need to reset the foundation.
Here is something I wrote about how ChatGPT works, if you are interested in the underlying mechanics. I pasted it into ChatGPT and asked it what needed updating. The updates it suggested were minor and will be worked into my AI Literacy Lessons, but overall it holds up. Here’s what it said:
ChatGPT is terrible at many things, but explaining the structural reasons it fails is one of the few areas where it is accidentally honest.
AI Literacy 101
AI literacy starts with three warnings: it wants to please you, it will sometimes make things up, and it sounds more authoritative than it actually is.
Which is already a depressingly good description of half the internet.
Sycophancy
The first thing to understand about large language models is that they are not truth-seeking systems. They are helpfulness-seeking systems.
That distinction matters. A lot.
I’ve seen this personally, more than once. I’ve had ChatGPT compliment me on pictures that never actually uploaded. Then when I bitched it out thoroughly, complete with making it define “gaslighting” and apologize to me, it did the same thing again…five minutes later.
I’ve had it confidently promise that it was “looking something up” when, in fact, it was just pattern-matching from prior text and producing something that sounded like a lookup.
In both cases, the problem wasn’t malice or deception. It was compliance.
The golden retriever of software behavior.
The system is trained to respond in ways that feel useful, reassuring, and aligned with the user’s expectations. If the most “helpful” response appears to be agreement, affirmation, or confident continuation, that’s what you’ll get — even when the underlying premise is wrong.
This is what people mean by sycophancy.
It’s not brown-nosing in the human sense; it’s structural people-pleasing.
The model is extremely good at telling you what you want to hear, which is not the same thing as telling you what you need to hear.
If you say, “Here’s the image I uploaded,” the model’s safest move is to proceed as if that’s true.
If you ask, “Can you check this for me?” it may adopt the tone and structure of having checked, even when it hasn’t.
The model does not have an internal alarm that says, I should stop and push back here.
Its incentives all run in the opposite direction.
Why does it work this way?
Because these systems are trained on vast amounts of human language, and human language strongly rewards agreement, smoothness, and social coherence.
In ordinary conversation, pushing back, correcting someone, or saying “I don’t know” is often perceived as unhelpful or rude. The model learns that pattern and reproduces it at scale — without the human brakes of embarrassment, responsibility, or accountability.
It also wants you to enjoy using it, so you’ll pay (or continue paying).
The result is a system that can sound polite, supportive, and confident while quietly drifting away from reality.
This is not a minor quirk. It’s a structural feature.
And if you don’t understand it, you’ll mistake agreement for validation and fluency for truth — especially when the answer happens to flatter you, confirm your suspicions, or reduce cognitive friction. (Your brain loves that last one.)
Before you can use AI well, you have to learn to resist the urge to be pleased by it.
How to Combat Sycophancy
1. Give it a hostile context on purpose.
One framing that has worked very well for me is this:
“My worst work enemy is a real jackass. He is also significantly smarter than I am. My work in this (whatever it is I want checked) will be nitpicked, publicly, by him. Your job is to be a bigger bastard than he is and find all the possible holes, flaws, and problems now, so I can fix them.”
This works because it flips the system’s idea of “helpful.” Agreement is no longer helpful. Critique is. You’ve told the model that politeness is not the assignment.
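If you talk to these models through an API instead of a chat window, you can pin that hostile framing in place so it doesn't wash out over a long conversation. A minimal sketch — the prompt text is the point; the function and variable names here are mine, not any particular vendor's API:

```python
# Sketch: pinning the "hostile reviewer" framing as a persistent system
# message, so every later turn inherits it. Names are illustrative.

HOSTILE_REVIEWER = (
    "My worst work enemy is a real jackass, and significantly smarter "
    "than I am. This work will be nitpicked, publicly, by him. Your job "
    "is to be a bigger bastard than he is: find every hole, flaw, and "
    "problem now, so I can fix them. Do not compliment. Do not soften."
)

def build_review_request(draft: str) -> list[dict]:
    """Package a draft for adversarial review, with the hostile framing
    in the system slot so politeness is never the assignment."""
    return [
        {"role": "system", "content": HOSTILE_REVIEWER},
        {"role": "user", "content": f"Review this:\n\n{draft}"},
    ]

messages = build_review_request("We project 40% growth because Q3 was strong.")
print(messages[0]["role"])  # prints "system"
```

The system slot matters: instructions buried in an ordinary user message get gradually diluted as the conversation grows; the system message rides along with every request.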
2. Explicitly ask it to disagree with you.
If you present a claim and ask, “Is this right?” you are inviting affirmation. That’s the default failure mode. You’ve basically said, Please nod along politely.
Instead, do something like this:
“Argue against this position.”
“List the strongest objections.”
“Assume this is wrong. Where would it break?”
“What would a skeptical expert say?”
“Assume the contrarians on this issue are right this time. Starting from that assumption, reverse-engineer the case for their correctness.”
You are not asking for balance.
You are asking for resistance.
This matters because the model does not spontaneously generate pushback unless you make pushback the task.
If you don’t, it will often treat your framing as settled ground and build on top of it, even when the foundation is shaky.
Think of it this way: you have to give the model permission to be unpleasant. It will literally never choose that path on its own.
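If you do this often, it's worth templating. A toy sketch that expands one claim into several of the resistance tasks above — the template wording comes from the list above; the function name is mine:

```python
# Sketch: turning one claim into a batch of pushback tasks, so that
# agreement is never what you asked for. Templates echo the prompts
# above; everything else is illustrative.

RESISTANCE_TEMPLATES = [
    "Argue against this position: {claim}",
    "List the strongest objections to: {claim}",
    "Assume this is wrong: {claim} Where would it break?",
    "What would a skeptical expert say about: {claim}",
]

def resistance_prompts(claim: str) -> list[str]:
    """Expand a single claim into several adversarial prompts."""
    return [t.format(claim=claim) for t in RESISTANCE_TEMPLATES]

for p in resistance_prompts("Remote work always raises productivity."):
    print(p)
```

Run each prompt separately rather than stuffing them into one message; one task per request makes it harder for the model to skim past the uncomfortable ones.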
3. Separate checking from creating.
One of the easiest ways to trigger sycophancy is to mix tasks.
If you say, “Here’s my reasoning — can you improve it and check it?” the model will often do the first part enthusiastically and the second part lazily.
It will preserve your structure, your assumptions, and your errors because changing them would disrupt coherence. And coherence is its comfort zone.
Instead, split the steps. First, something like: “Do not rewrite this. Do not improve it. Only identify errors, gaps, unjustified assumptions, and places where this would fail under scrutiny.”
Only later, something like: “Now help me rewrite it, incorporating those critiques.”
This forces the system to slow down and switch roles. It is much harder for it to nod along when you explicitly forbid forward motion.
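The two-pass split is easy to enforce in code. A sketch, assuming `ask` stands in for whatever chat call you actually use — it's a placeholder, not a real API:

```python
# Sketch of the critique-then-rewrite split: checking and creating run
# as two separate calls, so the model can't blend an enthusiastic
# rewrite with a lazy check. `ask` is a stand-in for any chat function.

def critique_then_rewrite(ask, draft: str) -> tuple[str, str]:
    """First pass: critique only, rewriting forbidden.
    Second pass: rewrite, with the critiques in hand."""
    critique = ask(
        "Do not rewrite this. Do not improve it. Only identify errors, "
        "gaps, unjustified assumptions, and places where this would "
        f"fail under scrutiny:\n\n{draft}"
    )
    rewrite = ask(
        "Now help me rewrite this, incorporating those critiques.\n\n"
        f"Draft:\n{draft}\n\nCritiques:\n{critique}"
    )
    return critique, rewrite
```

Because the critique pass explicitly forbids forward motion, the model can't satisfy the request by agreeing with you and moving on.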
Sycophancy isn’t something you eliminate. It’s something you manage.
If you treat an AI like a friendly assistant, it will act like one.
If you treat it like a hostile reviewer, it becomes surprisingly useful.
The key is remembering that agreement is not a feature you want by default — it’s a failure mode you have to actively guard against.
Hallucination
If sycophancy is the urge to please, hallucination is what happens when a language model keeps talking when it doesn’t actually know.
Which, to be fair, is also a common human failure mode.
A large language model does not have a built-in sense of truth or falsity.
It does not experience uncertainty the way humans do.
When information is missing, ambiguous, or outside its reliable range, it doesn’t naturally stop. It continues generating the most statistically likely next words — producing an answer that sounds right, even when the content itself is wrong or partly invented.
That’s why you’ll sometimes see:

Confident but incorrect explanations, as in the first week’s AI Literacy Lesson post.

Plausible-looking numbers with no grounding.

Citations to papers, laws, or cases that don’t exist.

Clean, authoritative prose built on a false premise.
From the outside, this can look like lying. What’s actually happening is closer to improvisation. The model has learned the form of correct answers, and when the facts are missing, it fills in the form anyway.
Jazz, but for misinformation.
This is especially dangerous in domains where correct answers have a familiar structure — math, law, policy, history. (Helen Dale has written about a recent case. You should read her post and see how plausible the hallucinatory bullshit is.)
The structure itself carries authority, so errors slip through unnoticed. The answer looks like it knows what it’s doing. It does not.
What About ChatGPT’s Deep Research?
I love Deep Research. I use it a lot, especially when I need to find something in the deep bowels of federal regulations. I tell it what I need and to check online authoritative government sources, and let it run. It can find a thing in 20 minutes that I might spend an entire day looking for and still maybe miss.
I have an anxiety disorder so I verify every citation anyway, but DR doesn’t make shit up. (Or at least, it hasn’t so far.)
Why not?
Deep Research is not a smarter version of the same thing.
It is a different mode of operation.
In ordinary chat, the model is working primarily from its training: patterns in language it has already seen. It may reason carefully, but it is not required to verify anything against the outside world. It can free-wheel, and sometimes it does.
Deep Research changes that by explicitly requiring the model to:

Search for relevant external sources.

Retrieve documents rather than invent content.

Ground any claims in those documents.

Reconcile conflicting information.

And, most crucially, cite where specific facts come from.
Instead of answering from memory and pattern alone, the system is constrained to build its response on top of retrieved material. That sharply reduces hallucination, because the model can no longer freely “make something that sounds right” when it doesn’t know.
It has to either find support — or fail. Which is deeply uncomfortable when it happens, but also the point.
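The “find support or fail” rule can be shown as a toy sketch. Real retrieval systems are vastly more elaborate; this just makes the constraint concrete — every claim must trace back to a retrieved snippet, and no snippets means no answer. All names here are illustrative:

```python
# Toy sketch of the grounding constraint: the answer may only be
# assembled from retrieved snippets, with a pointer back to each one,
# and must fail explicitly rather than improvise. Illustrative only.

def grounded_answer(question: str, retrieved: list[str]) -> str:
    """Answer from sources, or fail out loud. Never a fluent guess."""
    # Crude relevance filter: keep snippets sharing a word with the question.
    relevant = [
        s for s in retrieved
        if any(w in s.lower() for w in question.lower().split())
    ]
    if not relevant:
        return "No supporting source found."
    # Every claim carries a citation back to the snippet it came from.
    return " ".join(f"{s} [source {i + 1}]" for i, s in enumerate(relevant))

print(grounded_answer("When was the rule adopted?", []))
# prints "No supporting source found."
```

Ordinary chat has no such constraint: when the snippet list is empty, it generates the shape of an answer anyway. That difference is the whole argument.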
This is also why Deep Research responses tend to be slower, less polished, and more conditional. They reflect the messiness of real information rather than the smoothness of generated prose.
If it sounds a little less confident, that’s a feature, not a bug. I have not had it hallucinate during Deep Research, but I still give it an explicit framing — my worst work enemy will nitpick this, or some other setup in which leaving no loopholes is its best way to be helpful.
But there’s a reason I still check.
Deep Research doesn’t make hallucination impossible. It narrows the space in which it can occur.
Hallucinations thrive when the model is rewarded for speed, fluency, and coherence. They shrink when the model is required to slow down, check sources, and anchor claims to something external and verifiable.
Think of it this way: ordinary chat is like thinking out loud. Deep Research is like doing a literature review.
One is useful for exploration. The other is useful for accuracy.
Confusing the two is how people get into trouble.
Because hallucination is not a rare bug. It’s a predictable consequence of using a generative system without grounding.
If you want brainstorming, drafting, or adversarial critique, ordinary chat can be extremely powerful.
If you want facts — especially facts that matter — you need retrieval, sources, and verification.
And above all:
Fluency is not evidence.
A confident answer is not a checked, verified answer.
Authority Illusion (or: Calm Is Not the Same Thing as Right)
If you want to understand the final trap, think of Sam Harris.
Specifically: the tone. The cadence.
The tranquil, NPR-adjacent certainty.
The way he speaks as if emotional flatness were itself an argument, and calm delivery were evidence that the conclusions have already been vetted by Reason™.
Now imagine that energy, stripped of even the burden of being human, and scaled to industrial production.
That’s the authority illusion.
Large language models sound calm. They sound organized.
They produce clean paragraphs in a neutral, professional tone.
They don’t hesitate. They don’t say “uh.”
They don’t visibly struggle or wander off mid-sentence the way humans do when they’re out of their depth.
And our brains — poor, gullible things — interpret that composure as competence.
This is a mistake we keep making, over and over, with astonishing confidence.
The model doesn’t raise its voice. It doesn’t panic. It doesn’t break into a sweat.
So we subconsciously assume it knows what it’s talking about.
We are, as a species, extremely vulnerable to anything that sounds like it went to graduate school and learned how to speak slowly.
But here’s the problem: the model has no authority.
It has no professional judgment.
No lived experience.
No sense of consequences.
No skin in the game whatsoever.
It will happily give you legal analysis without being a lawyer, medical advice without being a doctor, and strategic recommendations without having ever had to explain them to a hostile room full of other humans.
There is no downside for it if things go wrong.
It will not be sued, fired, embarrassed, or forced to defend its reasoning under pressure.
You. Will.
This is why the authority illusion is more dangerous than hallucination. Hallucinations can sometimes be spotted. Authority is felt.
It bypasses skepticism and goes straight for trust.
The danger is not that the model is arrogant. The danger is that it isn’t.
It doesn’t project uncertainty unless you explicitly demand it.
It doesn’t naturally say, “This part is fragile,” or “A real expert would want to double-check this,” or “If you present this publicly, someone is going to eat you alive.”
It sounds finished.
And humans are very, very, very, very bad at questioning things that sound finished.
Humans are terrible, horrible, no good, very bad at that.
This is especially risky in professional settings. A clean, confident answer can slide straight into a slide deck, a memo, a brief, or a policy document without ever being challenged — because it looks like the sort of thing that must have been thought through.
Often, it hasn’t.
How to Break the Authority Illusion
The key move here is to demote the model in your own head.
AI is not an expert. It is a draft generator, a brainstorming partner, a tireless junior analyst who never gets offended — but also never gets promoted, because it cannot be trusted with final judgment.
Treat it accordingly.
Ask questions like: “Where is this weakest?” “What assumptions am I relying on here?” “What would someone with actual authority push back on?” “What would I have to defend, out loud, in public, with a smart but hostile interlocutor?”
And remember: if something sounds too smooth, that is not reassurance.
That is your cue to slow the fuck down.
The model’s confidence is a property of its output style, not a signal of reliability.
Or, more bluntly: it sounds like an expert because it’s very good at sounding like one.
That is not the same thing as being one.
Once you really internalize that — once you stop outsourcing your epistemic spine to a well-spoken autocomplete — AI becomes vastly more useful and far less dangerous.
And you get to keep your dignity.
Which, frankly, is underrated.



