Case Study · Linguistics · AI Detection

The Human Writing Standard

We ran passages from Why’s (Poignant) Guide to Ruby through every AI detector in our benchmark. Every passage scored below 10% AI probability on all six tools. This is a case study in what maximally human writing looks like to a machine classifier.

Avg. AI Probability: 3.7% (all passages, all detectors)
Detectors rating the text Human: 6/6
Burstiness CV: 0.94 (Chapter 3)
AI Transition Phrases: 0.0 per 1,000 words

Background: Why This Text?

AI text detectors are calibrated against corpora of human and AI-generated writing. When a tool reports “87% accuracy,” that figure reflects performance on a mixed sample of texts. But what does the detector see when it encounters writing at the far human end of that sample, writing that breaks nearly every statistical regularity AI text tends to follow?

Why’s (Poignant) Guide to Ruby is a useful stress test precisely because it is such a statistically unusual body of human text. Written by _why the lucky stiff between 2003 and 2009, it combines programming tutorial content with surrealist fiction, comic strips, and emotional asides. The writing style is extremely bursty, highly idiosyncratic, and free of the formulaic phrases that characterize AI-generated content. We wanted to know: does this register as obviously human?

What We Tested

We selected six passages from across the guide and submitted each to all six detectors in our benchmark at default settings. Each passage was between 150 and 400 words and was submitted as plain text with no modification.

Passage	Source	Word Count	Proofademic	Originality	GPTZero	Hive	Verdict
Opening paragraph	Chapter 1	187	3%	4%	5%	4%	Human (all)
Chapter opening	Chapter 3	241	2%	3%	4%	3%	Human (all)
Foxes’ first dialogue	Chapter 3	164	1%	2%	3%	2%	Human (all)
Perplexity discussion	Chapter 3	312	4%	5%	7%	5%	Human (all)
Dr. Cham introduction	Chapter 5	278	2%	4%	5%	4%	Human (all)
Final paragraph	Chapter 7	193	3%	4%	6%	5%	Human (all)

Every passage scored below 10% AI probability on every detector. No passage crossed any tool’s flagging threshold. The average across all passages and all detectors was 3.7% AI probability.
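As a quick arithmetic check, the scores listed in the table above can be averaged directly. This is only a sketch over the four detector columns the table shows, so its result (3.75%) need not match an average taken over all six tools exactly. The variable names are ours.

```python
from statistics import mean

# AI-probability scores (%) copied from the table above, in column order:
# Proofademic, Originality, GPTZero, Hive.
scores = {
    "Opening paragraph": [3, 4, 5, 4],
    "Chapter opening": [2, 3, 4, 3],
    "Foxes' first dialogue": [1, 2, 3, 2],
    "Perplexity discussion": [4, 5, 7, 5],
    "Dr. Cham introduction": [2, 4, 5, 4],
    "Final paragraph": [3, 4, 6, 5],
}

all_scores = [s for row in scores.values() for s in row]
overall_mean = mean(all_scores)  # mean over the 24 tabulated scores
highest = max(all_scores)        # worst-case single score in the table

print(f"mean = {overall_mean:.2f}%, max = {highest}%")
```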

Why the Scores Are So Low: Signal Analysis

We computed the key statistical signals independently on the same passages to understand why the scores are this low.

Burstiness — The Dominant Signal

Source	Burstiness (CV of sentence lengths)	Interpretation
AI text (avg)	0.31	Uniform
Human avg	0.71	Varied
_why Ch.1	0.88	Highly varied
_why Ch.3	0.94	Extreme
Foxes dialogue	1.20	Maximum

Chapter 3’s sentence-length coefficient of variation (the standard deviation of sentence lengths divided by their mean) is 0.94: about three times the AI text average of 0.31 and well above the human corpus average of 0.71. The foxes’ dialogue section reaches a CV of 1.20, among the highest burstiness values we have measured in any corpus. On its own, a signal this strong would push a burstiness-sensitive detector firmly toward the “human” category.
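For readers who want to reproduce this kind of number, here is a minimal sketch of a sentence-length CV calculation. It is not the benchmark's actual pipeline. The naive sentence splitter, whitespace word counting, and the use of the population standard deviation are all assumptions, and different choices will shift the result.

```python
import re
from statistics import mean, pstdev


def sentence_lengths(text: str) -> list[int]:
    """Word counts per sentence, splitting naively after ., ! or ?."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def burstiness_cv(text: str) -> float:
    """Coefficient of variation of sentence lengths: std dev / mean.

    Assumption: population standard deviation (pstdev). Returns 0.0 when
    there are fewer than two sentences, since variation is undefined.
    """
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return pstdev(lengths) / mean(lengths)


uniform = "One two three. Four five six. Seven eight nine."
varied = "Stop. This sentence has quite a few more words in it than the first one did."

print(burstiness_cv(uniform))  # identical lengths give a CV of 0.0
print(burstiness_cv(varied))   # one 1-word and one 15-word sentence
```

A passage that mixes one-word exclamations with long, winding sentences, as _why often does, drives the standard deviation up relative to the mean, which is what a high CV reflects.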

Transition Phrase Density — Zero

Source	AI phrases per 1,000 words
AI text average	6.7
_why’s writing	0.0

Across all six passages and the full guide, _why’s writing contains zero instances of the AI-associated transition phrases we catalog: no “furthermore,” no “it is worth noting,” no “in conclusion,” no “leverage,” no “delve into.” His transitions are idiosyncratic constructions that a language model is unlikely to generate unprompted: “Here’s the thing about foxes,” “Now, look:” and “Chunky bacon!”
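A phrase-density figure like the one above can be approximated with a simple count. The sketch below uses only the five phrases named in this section as a stand-in for the full catalog, which is not reproduced here. The matching rules (case-insensitive, whole-word) are an assumption.

```python
import re

# Illustrative subset only: the five phrases quoted in the text above.
AI_PHRASES = [
    "furthermore",
    "it is worth noting",
    "in conclusion",
    "leverage",
    "delve into",
]


def phrase_density(text: str, phrases: list[str] = AI_PHRASES) -> float:
    """Occurrences of listed phrases per 1,000 whitespace-separated words."""
    word_count = len(text.split())
    if word_count == 0:
        return 0.0
    lowered = text.lower()
    hits = sum(
        len(re.findall(r"\b" + re.escape(phrase) + r"\b", lowered))
        for phrase in phrases
    )
    return 1000 * hits / word_count


formulaic = "Furthermore, it is worth noting that we delve into details."
plain = "Here's the thing about foxes."

print(phrase_density(formulaic))  # 3 hits in 10 words
print(phrase_density(plain))      # no listed phrase appears
```

On short passages a single hit swings the density a lot, so per-1,000-word figures are more meaningful on longer samples.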

Vocabulary Diversity — Significantly Above Average

Source	Type-Token Ratio	vs. AI Average	Interpretation
AI text (corpus avg)	0.57	—	Baseline: repetitive structures
Human text (corpus avg)	0.69	+21%	Normal human variation
_why’s Chapter 3	0.74	+30%	High lexical creativity
Foxes’ dialogue	0.81	+42%	Near maximum for this length
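The type-token ratio is the number of distinct words divided by the total number of words, and the "vs. AI Average" column is the relative difference from the 0.57 baseline. The sketch below assumes a simple lowercase, letters-and-apostrophes tokenizer, which may differ from the one used for the table. Because TTR falls as passages get longer, it should only be compared across texts of similar length.

```python
import re


def type_token_ratio(text: str) -> float:
    """Distinct tokens / total tokens, using a naive lowercase tokenizer."""
    tokens = re.findall(r"[a-z'’]+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def pct_vs_baseline(value: float, baseline: float) -> float:
    """Relative difference from the baseline, in percent."""
    return 100 * (value - baseline) / baseline


AI_BASELINE = 0.57  # corpus average reported in the table above

# Recompute the "vs. AI Average" column from the reported ratios.
for label, ttr in [("Human text", 0.69), ("Chapter 3", 0.74), ("Foxes' dialogue", 0.81)]:
    print(f"{label}: {pct_vs_baseline(ttr, AI_BASELINE):+.0f}%")

print(type_token_ratio("the fox and the other fox"))  # 4 distinct / 6 total
```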

What This Teaches About Detection

The properties that make _why’s writing feel distinctly human are exactly the properties that AI detectors measure: high burstiness, high vocabulary diversity, absence of formulaic phrases, unpredictable sentence structure. These properties arise from the same source — a human with genuine things to say, expressing them in a personal way, without optimizing for predictability or readability.

This case study is useful for calibration: if you want to understand what a confident “human” classification looks like from the inside, _why’s writing is the extreme case. If your own writing differs from it (more uniform, heavier on transition phrases, more predictable), that difference helps explain why a detector might score it closer to AI text.

Try it yourself: paste any passage from Chapter 3 into our free AI detector and watch it score near zero.

Frequently Asked Questions

Does this mean _why’s writing is “more human” than other writing?

No. It means _why’s writing is more statistically distinct from AI text than most human writing is. Many excellent, clearly human writers produce text with lower burstiness and higher transition phrase density than _why, particularly in formal academic or professional registers. A low AI detector score does not mean “better writing”; it means the text has statistical properties that differ from current LLM output distributions. Those properties will shift as AI models evolve.

Could an AI be trained to write like _why and fool detectors?

In principle, yes. A model specifically fine-tuned on _why’s writing could learn to replicate its burstiness patterns and vocabulary style. In practice, the result would likely read as an imitation rather than the real thing, and an AI trained to mimic one distinctive author’s style is a very different thing from the general-purpose text generation that detectors are calibrated against. This is also one reason the shift toward provenance-based detection (watermarking at the generation layer) may prove more durable than statistical pattern matching.

What does this mean for writers worried about false positives?

The properties that protect _why’s writing from false positives — sentence length variation, vocabulary diversity, absence of generic transition phrases — are worth bearing in mind if you are in a context where AI detection might be applied to your work. Varying your sentence lengths deliberately, minimising generic connective phrases, and writing in a distinctly personal voice all reduce statistical similarity to AI text. None of this should change how you write for its own sake; it is simply useful context about what detectors are measuring.