
Most of us want our deeply held beliefs confirmed. Why?
Because we believe those beliefs reflect the truth. Confirmation feels good. It’s soothing. Comforting. It reassures us that our view of the world is not only correct but virtuous.
That desire, while entirely human and entirely understandable, is also deeply dangerous. It makes us vulnerable to one of the most common and corrosive cognitive traps: confirmation bias.
And confirmation bias is probably the best way to step on a rake.
Consider the “very fine people” controversy surrounding President Trump’s 2017 remarks about Charlottesville. During the 2024 campaign, at least half a dozen readers emailed me that infamous clip. They were trying to warn me that I was, even if reluctantly and only in a lesser-of-two-evils kind of way, supporting a racist presidential candidate.
I sent the same reply to each of them: the clip they sent had been taken out of context, and in the very next breath Trump says, “I’m not talking about the neo-Nazis and the white supremacists, because they should be condemned totally.”
And I added — in complete candor and good faith — that I was fully aware of Trump’s characterological weaknesses. He’s a hedonist, a womanizer, and it wouldn’t shock me if he were a racist too. Did they have any actual evidence of racism to send me?
None of them ever did.
Their belief that Trump is racist had been emotionally confirmed by a misleading clip. And that was enough. They didn’t even try to find real evidence, even when I told them I was open to persuasion.
That’s confirmation bias in action.
And confirmation bias gets even more seductive when it wears a lab coat.
When it comes in the form of a “study” that seems to back up what you already believe.
But do you actually know how to read a study? Could you tell the difference between a study that proves something and one that just... implies it?
If your answer is “no,” then you’re at serious risk of clinging to bullshit — of comforting yourself with falsehoods — the same way the Wokes do when they pass around that misleading Trump clip.
And when you try to argue with someone who does know how to evaluate a study, you’re going to look just as gullible. Just as sloppy. Just as desperate to be reassured.
The rest of this post attempts to teach you how to read a study, evaluate its methods, and decide how much weight its conclusions really deserve.
The Study That Went Substack-Viral
An article about a study titled “They Don’t Read Very Well” recently went Substack-viral. The article was written by a kind person; please don’t pile on.
The study itself was conducted by two current university professors and one “semi-retired” former professor, and it was funded by a university. By the standards of most of the center-right and right folks on Substack, whose work usually starts from the premise that universities are Communist kindergartens incapable of producing anything more creative than a protest song and more valuable than a box of hair, it should have been dismissed out of hand.
But because the study appeared to show something damning about universities — that English majors are, in fact, terrible readers — all that usual skepticism vanished.
Most of these people would look out the window before believing a college professor who said the sky was blue. But they pounced on this study like it had descended from Sinai.
I did examine the study — closely. And it has many flaws.
Before I go further, let me clarify: a flawed study can still be valuable. You just have to weigh it properly.
A replicated study with a large, representative sample, rigorous design, double-blind controls, and statistically significant results should be weighed far more heavily than a non-replicated study with a small, narrow sample and weaker methodology.
The former can prove something.
The latter can, at best, suggest something.
And often, researchers don’t have the resources to do the gold-standard kind of study — which is fine. You can still learn something from lower-powered or exploratory research.
I repeat: a study that can only suggest is not worthless.
But you have to know the difference. And you have to be willing to say the difference out loud.
That’s one of the clearest signs of an honest researcher: they’ll tell you how limited their own study is. They’ll highlight its flaws. They’ll explain how seriously (or not) to take the results.
And if they’re doing it right, they’ll err slightly on the side of humility. When you finish reading their methods and limitations section, you should find yourself thinking: “Honestly, they’re being a little too hard on themselves.”
But to know whether a study deserves that kind of trust, you need to understand what you’re looking at.
So let’s look at this one.
Facts About Their Sample
They give us information about their sample in percentage terms, which is standard but can be inadvertently misleading. I’ve done the math to convert those percentages into actual numbers for our examination, rounding to the nearest whole person. For example, the study reports that 19% of participants were sophomores; with 85 participants total, that works out to 16.15, so we’ll call it 16.
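If you want to check that arithmetic yourself, here is a minimal Python sketch; the function name is mine, and the 85-participant total and the 19% figure are the ones reported above.

```python
# Convert a reported percentage into an approximate head count,
# rounding to the nearest whole person.
def approx_count(percent: float, total: int = 85) -> int:
    return round(total * percent / 100)

# The study reports that 19% of its 85 participants were sophomores:
# 85 * 0.19 = 16.15, which rounds to 16.
print(approx_count(19))  # -> 16
```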
Here’s what their sample looked like:
Majors:
35 English Education majors
50 English majors with an emphasis such as literature or creative writing
Class Standing:
30 seniors
29 juniors
16 sophomores
3 freshmen
7 unknown
Sex:
57 female
28 male
ACT Scores:
The national average ACT reading score for incoming freshmen in 2015, the year of the study, was 21.4. (Out of a possible 36.)
The average ACT reading score for incoming freshmen from both universities was 22.4.
But here’s the thing: the authors don’t provide the ACT scores of the actual participants. They also don’t tell us the average score for English majors specifically, at either university.
So what’s the point of giving that 22.4 figure?
It seems designed to suggest that these students were “better-than-average” readers. But that’s a misleading impression — whether it was intentional or not. That 22.4 figure represents all incoming freshmen at those schools, not the 85 English majors who participated in the study (only 3 of whom were freshmen anyway). We have no idea how strong (or weak) the actual participants’ ACT scores were.
Other details:
Their sample was described as “almost all” white and “almost all” graduates of Kansas public high schools.
58 students were chosen from one Kansas regional university (KRU1) and 27 from another (KRU2), neither of which is named.
At KRU1, students were “volunteers” from seven English classes. At KRU2, the researchers “set up outside the English Department and asked individual students to participate.”
Evaluation of This Sample
Frankly, this is a joke. Let’s start with the recruitment methods.
They don’t define what they mean by “volunteers.” They give no details about the classes beyond “English.” Were these general ed classes? Advanced seminars? Classes for future teachers? We have no idea.
And that’s a problem — because how those classes were chosen matters. A lot.
Did professors offer extra credit for participating? That would bias the sample toward lower-performing students who need a grade boost, and away from high-performing students who don’t.
Were volunteers recruited from early morning classes — more likely to have serious, non-party-animal students? Evening classes — more likely to have mature adults going back to school while working? Lecture halls, or seminar rooms? Was the class instructor present when recruitment happened? Every one of those factors could tilt the participant pool — and they tell us nothing.
At KRU2, the approach was even worse: research assistants stood outside the department and asked individual students to participate.
Let’s be honest: if you send research assistants to stand outside a building and ask passersby to participate, who are they most likely to approach? Someone friendly-looking. Approachable. Probably female. Probably someone who makes eye contact.
Am I saying that cute girls are worse readers? No.
But let’s be real: people who are not conventionally attractive, who are socially awkward, eccentric, or simply shy often take refuge in books — and this recruitment method makes it less likely they were sampled. It skews toward extroversion, availability, and physical approachability.
Self-selection also introduces serious bias. Some diagnostic questions to make that clear:
Are bookworms more often introverts or extroverts?
Who’s more likely to raise their hand when a professor asks for volunteers — the quiet A-student or the guy who talks over everyone in class?
Who tends to have better focus: students who juggle a job and school, or those who don’t?
Who has time to do extra, unpaid work for a stranger with a clipboard?
And here’s the key one:
Would the most serious English majors — the ones who actually do the reading, carefully and completely — be more or less likely to have time to volunteer?
If you want to believe the results of the study and you think I’m being too rigid on this, here’s a thought experiment. Suppose the students had been recruited the same two ways — self-selected from classrooms, or approached on foot — but from the College Republicans club instead of English classes.
Would you trust the results of that study as a window into the average young conservative’s reading ability? Would you conclude that there is a terrible illiteracy crisis among young conservatives?
Or would you suddenly be very concerned about the sampling bias?
Would you demand more detail about the participant pool? Would you object if the study’s conclusions were generalized across all conservative students?
Be honest.
Now, here's a subtle but important wrinkle:
The authors were studying reading skill. And yet 49 of the 85 participants — 58% — “often described their reading process as skimming and/or relying on SparkNotes.”
That’s a striking number. The fact that so many English majors apparently don’t do their reading in full is a real and serious problem, one worth investigating on its own terms.
But here’s the kicker: they wanted to test reading skill, and they made no effort to screen for whether the students they sampled actually read books.
That’s like designing a study to test sprinting speed and then recruiting people who don’t own running shoes.
They tested the reading skill of students who don’t actually read. That isn’t a variable they controlled for; it’s just poor design. At minimum, the study should have either screened participants for reading habits or controlled for that variable in the analysis. It did neither.
Let’s move now to how they actually tested “reading skill” — and what that phrase meant in practice.
Before I Critique Their Methods
Before I evaluate their methods, I want to acknowledge something up front: designing education studies is not easy. It’s not trivial. It’s not plug-and-play.
There are a lot of reasons for this, and they vary at every level. With young children, for example, parental involvement, home environment, trauma, nutrition, sleep — all of it matters, and all of it is well beyond what most teachers can compensate for. As kids grow, some of those external forces fade, but others intensify: peer influence, mental health, the shift from extrinsic to intrinsic motivation. It’s messy. It’s human. It doesn’t fit neatly into spreadsheets.
But the single biggest challenge — the one that cuts across every age and level — is this: individual teachers create the classroom experience, and individual teachers vary wildly.
Forgive me if this sounds arrogant, but I have enough experience to say this confidently: I’m an excellent teacher. And I think there are two ways great teaching happens.
I represent the first: a lucky overlap of two traits.
First, I love mathematics. That’s the passion half.
Second, I had to fight to become highly skilled at mathematics. I wasn’t born a math prodigy. I worked. I failed. I climbed. Which means I can spot where a student is stuck — often with just a few questions. That’s theory-of-mind.
Passion and the ability to mentally model where a student is combine to make for powerful teaching.
My friend Josh is even better. He’s taught me many things, none of which he loves as much as I love math. But he’s that second kind of excellent teacher: the natural. Some people are born with perfect pitch. Some with unerring hand-eye coordination. Josh was born to teach.
Here’s why that matters: if you wanted to prove some harebrained educational scheme was brilliant, it wouldn’t be hard. You’d just pay me and Josh — or people like us — to implement it during the semester you gathered your data.
Good teachers can make almost anything look good.
Bad teachers can make even great curricula fail.
But even that isn’t the whole story — because the students’ motivation still matters, at least as much as the quality of teaching, and sometimes more.
You can put a brilliant teacher in front of a disengaged, indifferent student, and they’ll only get so far. A highly motivated student, on the other hand, can learn in almost any environment — they’ll scrape together knowledge from handouts, YouTube, PDFs, Discord, wherever they can.
All of this complicates the work of trying to measure whether students are actually learning — and what they’re capable of. Especially when you’re talking about something like reading skill, which depends not just on ability but on habit, on discipline, on whether they’ve trained their brains to focus and absorb.
And that brings us back to this study.
Critique of Their Methods
Most of the participants were graduates of Kansas public high schools — a system that likely offered highly variable preparation, navigated with equally variable levels of effort.
That context matters.
If you want to measure college students’ reading skills, you first have to grapple with the fact that many of them may not actually read. You have to know what you’re measuring, and what you’re not, and design your methods accordingly.
This study didn’t do those things.
To their credit, they did try to supplement with an external reading test.
And here’s where things get even more tangled: the students took the Degrees of Reading Power test — a standardized reading assessment designed to measure 10th-grade-level literacy. According to the authors, 64% scored in the 90–100 range, which is objectively solid. Another 17% scored in the 80–89 range. That means more than 80% of these students did quite well on a conventional, national literacy test.
But the authors then tell us that 59% of the students they labeled as “problematic readers” scored above 90% on that same test. So what does that tell us? One of two things — and maybe both: either the DRP test doesn’t measure the kind of reading skill they care about, or their own test is picking up something else entirely — stress response, lack of motivation, unfamiliarity with the format, performance anxiety, whatever.
You can’t have it both ways. If these students can crush a nationally normed 10th-grade literacy test but still fail your custom task, then the takeaway shouldn’t be “they can’t read.” The takeaway should be: you’re measuring something different — and you need to be honest about what that is.
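To make that tension concrete, here is a quick back-of-the-envelope calculation; the percentages are the ones the authors report, and the rounding is mine.

```python
# Head counts implied by the percentages reported in the paper.
total_participants = 85
problematic_readers = 49  # students the authors labeled "problematic"

scored_90_to_100 = round(0.64 * total_participants)       # about 54 students
scored_80_to_89 = round(0.17 * total_participants)        # about 14 students
problematic_above_90 = round(0.59 * problematic_readers)  # about 29 students

print(scored_90_to_100, scored_80_to_89, problematic_above_90)  # 54 14 29
```

In other words, roughly 29 of the 49 students classified as problematic readers had just done very well on a standardized literacy test.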
The authors used a think-aloud protocol: each student was asked to read the first seven paragraphs of Bleak House out loud — sentence by sentence — and then paraphrase each sentence in plain English. This happened in a one-on-one, recorded session with a facilitator who wasn’t allowed to offer help but was instructed to prompt for interpretation every few sentences. Students were explicitly told they could use dictionaries, phones, or outside websites as needed. They were also told they didn’t have to finish the entire passage.
Afterward, the researchers transcribed and coded the recordings using 18 codes for narration issues (like skipping or mispronouncing words) and 62 codes for comprehension (like misunderstanding a metaphor or failing to identify a setting). Based on this analysis, they divided students into three categories:
Proficient (4 students, 5% of participants): could interpret most of the literal prose, use vocabulary correctly, recognize figurative language, and demonstrate recursive reading strategies — meaning they circled back and corrected themselves when confused.
Competent (32 students, 38% of participants): could grasp some vocabulary and figures of speech, but often resorted to vague generalizations or guessed incorrectly without checking or revising.
Problematic (49 students, 58% of participants): struggled with literal comprehension, misread figurative language, skipped difficult sentences, and often summarized entire paragraphs with comments like “there’s just fog everywhere.”
The distinction wasn’t just about accuracy — it was about reading behavior. Proficient students stayed engaged with the text, asked questions, and used resources effectively. Problematic readers simplified, guessed, or gave up — and then, bafflingly, claimed they’d have no trouble reading the entire novel.
There are several serious problems with the method used in this study, even if you take their goals at face value. First, not everyone is comfortable reading aloud, especially in a high-pressure one-on-one setting with a stranger holding a clipboard. Reading aloud is a performance skill, not just a literacy skill. It layers on anxiety, self-consciousness, and the distraction of hearing your own voice. Any or all of these factors can impair comprehension. If a student stumbles over words or loses their place while reading out loud, that doesn’t necessarily mean they don’t understand the text. It might just mean they’re nervous.
Second, the demand to interpret sentence-by-sentence in real time is not how anyone reads naturally. Competent readers often get the gist of a paragraph first, then circle back to untangle the details — especially with dense, figurative prose like Dickens. Forcing students to pause after each sentence and explain it immediately isn’t testing their reading skill; it’s testing their ability to verbalize an interpretation on the spot, without context or momentum. That’s not the same thing. Add to that the artificiality of a one-off test with no stakes, limited engagement, and no follow-up — and what you’re really measuring isn’t reading. It’s compliance and performance under pressure.
Their Actual Findings
The authors don’t include a limitations section. That’s a huge problem.
It suggests they believe this study is far more solid and broadly applicable than it actually is. When evaluating a study, this omission shouldn’t just raise a red flag.
It should make you wonder if there’s a red flag factory operating behind the scenes.
Given the many weaknesses in this study, I’m not going to comment on the results themselves, because I don’t think they mean very much.
But I will comment on what I think the actual value of this study is.
To me, the most revealing part of this study isn’t the reading scores. It’s the wild overconfidence. Fifty-eight percent of the participants were labeled “problematic readers” — meaning they struggled with literal comprehension, skipped unknown words, butchered metaphors, and gave up mid-sentence — yet every single one of them still said they could read Bleak House on their own.
They couldn’t.
But they thought they could. And that gap between competence and confidence is worth paying attention to.
It’s also worth noting that students were explicitly allowed to use phones, Google, and outside websites during the test. Even with unlimited access to the full informational power of the internet, most of them still couldn’t demonstrate basic reading proficiency.
That’s not a knock on their intelligence; it’s a knock on the myth that the internet enables self-teaching, and that digital natives are unusually skilled at this.
I’ve taught myself a ton of math, coding, and technical skills online, so I know that I am biased. I know I’m risking war in the Substack Notes trenches by saying this out loud. But I think this speaks to something deeper about the nature of STEM vs other disciplines.
In math, you get the right answer or you don’t. In code, it compiles or it doesn’t. There's no rhetorical fluff to hide behind. No vibes. Just output. Yes, LLMs can write boilerplate code and help you debug or clean up syntax. But they don’t write good code. They don’t design clever structures. They’re surprisingly bad at logic. Maybe that’ll change — maybe not. Either way, it’ll still take skilled, insightful humans to use them well, which means humans who can code well themselves.
That clarity — that feedback loop — makes the internet an incredible tool for self-teaching STEM. But in literature and the humanities? It seems like the same digital natives who’ve been online since birth haven’t absorbed the most basic habits of digital resourcefulness.
They don’t even seem curious. They’re not trained to double-check, cross-reference, or dig.
They skim. They SparkNotes. They shrug and move on.
So yes, the study is methodologically weak. Yes, the sample is flawed. But there is value here: just not the value the authors think they found.
What we’re really looking at is a one-time snapshot (which is the important caveat to remember) of typical middle-American college kids: overconfident, under-skilled, and astonishingly lazy about using the tools literally at their fingertips.
If you want to sound the alarm, don’t make it about reading skill; these kids don’t read anyway, and the study neither screened nor controlled for that.
Make it about a generation that doesn’t read. And doesn’t seem to know how to look things up, either.
How To Evaluate A Study
So how do you actually evaluate a study? I hope what I just wrote serves as a decent model, but here’s a checklist of sorts. A few core principles — none of which require a PhD, just some basic skepticism and a willingness to slow down.
1. Look at the sampling methods, size, and demographics.
Small or narrow samples aren’t automatically a problem — as long as the authors acknowledge the limitations and don’t pretend their results apply to the entire population. In this case, the biggest issue isn’t that they only had 85 students. It’s that they tried to draw sweeping conclusions about English majors based on 85 self-selected students from two regional Kansas universities — most of whom don’t actually read.
Now imagine the same study done with 2,000 students across a range of schools — state schools, private liberal arts colleges, an Ivy. Even with some methodological flaws, the sheer size and diversity of the sample would make the findings more stable, more generalizable, and less likely to be distorted by quirks of who showed up. That’s the kind of thing you want to keep in mind: the smaller and more homogeneous the group, the more cautious you should be in accepting the results at face value.
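As a rough illustration of why size matters, here is a textbook margin-of-error calculation (nothing from the study itself, and it assumes a properly random sample, which this one was not); the 58% figure is used purely as an example proportion.

```python
# Approximate 95% margin of error for an estimated proportion.
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    return z * math.sqrt(p * (1 - p) / n)

p = 0.58  # the study's headline "problematic readers" share, used as an example
for n in (85, 2000):
    print(f"n = {n}: 58% plus or minus {margin_of_error(p, n) * 100:.1f} points")
# n = 85: 58% plus or minus 10.5 points
# n = 2000: 58% plus or minus 2.2 points
```

And even that understates the problem: self-selection bias does not shrink as the sample grows, which is why the recruitment issues above matter at least as much as the head count.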
2. Ask yourself a question that cuts against your bias.
For this study, I posed the College Republicans question: would people on the right trust the results if the same self-selection and walk-up-to-individuals-and-ask methods were used to sample from a College Republican club? Would you suddenly care a lot more about who volunteered, who got approached, and how the sample might be skewed?
Be honest about it. If the answer is “I’d throw it out entirely,” then you should weigh this study lightly too. If the answer is “I’d still consider it, but cautiously,” great — that’s the level you should use here as well. This is the fastest way to check your own blind spots.
3. Think about whether the methods make sense — especially if it’s something you know.
In this case, if you’ve read Dickens, or tried to teach reading (even to just your own kids), or even just been a student in a literature class, you know how variable that experience can be. So ask: does it make sense to use a think-aloud method, with no screening for actual reading habits, and still try to measure reading skill? Is reading Dickens aloud for 20 minutes in front of a stranger really a valid way to measure a student’s ability to comprehend long-form literary prose?
Even if your answer is “maybe,” the question itself will slow you down and help you engage more critically.
4. If it’s a topic you don’t understand well, look for signs of good experimental design.
Was the study single-blind or double-blind? That’s your cue for quality control.
Single-blind means the participants didn’t know what the researchers were looking for, so they couldn’t consciously or unconsciously skew their answers.
Double-blind means the researchers themselves didn’t know which group the participants were in when measuring outcomes — which helps avoid unintentional bias in interpretation.
This study was neither. The facilitators knew what was being tested. The students knew they were being recorded. There was no real attempt to control for bias on either side.
5. Always check for a limitations section — and if it’s not there, be suspicious.
This is a big one. No study is perfect. Every study has flaws. So when researchers are upfront about what those flaws are — sample size, generalizability, measurement quirks — it’s a sign they’re taking the work seriously. It means they’re not trying to sell you something.
When that section is missing, like it is here? That’s not just a red flag — that’s a bad smell.
6. Look at what the study claims, not just what it finds.
Sometimes the methods are fine and the findings are real — but the claims go way beyond the evidence. Watch for slippery generalizations or sweeping cultural takes based on narrow data.
In this study, the authors found that some students at two regional universities had trouble reading Bleak House. That’s it. But they leap from that to a broader narrative about diploma inflation, national literacy collapse, and long-term career failure. That’s not analysis. That’s marketing.
You can trust a finding without trusting the framing. Learn to separate them.
7. Ask: what isn’t being measured?
Every study makes trade-offs. What’s being left out?
Here, they measured students reading out loud — not silent comprehension, not annotation, not essay writing, not long-term retention. They didn’t track whether students got better over time. They didn’t test for whether students even read books regularly. So while it’s tempting to say this proves “English majors can’t read,” what it actually proves is that this group struggled with a very specific task, under very specific conditions, at a single moment in time.
A study can’t tell you what it didn’t measure. Always ask what’s missing.
Conclusion: A Little Knowledge, a Lot of Rakes
I’m happy to go deeper on this study — or others like it — in the future. Feel free to comment or email if you have one in mind.
But let me be blunt: analyzing a study properly takes real time, real effort, and a non-zero amount of sanity preservation. So if I do go further, it'll probably be behind the paywall.
To recap: this particular study does not prove that English majors can’t read. It shows that a small, self-selected, and poorly controlled group of English majors — many of whom don’t actually read — struggled to read a dense, symbolic passage from Bleak House even when they were allowed to use Google.
That’s not a literacy crisis. That’s a motivation and methodology crisis.
Now, to be as unambiguous as possible: I’m not saying we don’t have a literacy problem in this country. I suspect we probably do. And I’m perfectly willing to be persuaded of that — by real data. Just like I was open to being persuaded that Trump is racist.
But this? This is no more proof of mass illiteracy among English majors than the “very fine people” clip is proof that Trump is a white supremacist. It’s emotionally satisfying, not intellectually rigorous.
The deeper value of this study lies in what it unintentionally reveals — about the yawning gap between how competent students think they are and how competent they actually are. And about how badly the internet is failing as a self-teaching tool in disciplines where there’s no binary feedback loop. These students had every resource at their fingertips and still floundered.
But the real point of this essay isn’t even about them.
It’s about you and me.
It’s about how easy it is to fall for something that confirms what you already believe. It feels good. It’s emotionally satisfying. It saves time and mental energy. And it’s also how you end up spreading garbage.
Stepping on a rake you didn’t even see coming.
This is human. It’s not a left-wing problem or a right-wing problem. It’s a people problem. But it’s one we can get better at recognizing — and resisting — if we know what to look for.
If you found this post useful, you might also like my How to Not Suck at Math series. It’s exactly what it sounds like: I start at the very beginning — counting, place value, number sense — and work my way up. We’re currently working through basic algebra, explained in plain English, without any of the jargon or shame that ruins math for so many people.
Thanks for reading — and for wanting to get better at thinking clearly.
This post referred to my series, “How to Not Suck At Math.” The posts in that series are listed here. The first five are available to everyone; parts six and onward are behind the paywall. If you cannot afford a paid subscription, email hollymathnerd at gmail dot com and I will give you a year’s access for free.