
The Human Writing Standard

We ran Why’s (Poignant) Guide to Ruby through every AI detector in our benchmark. Chapter 3 scored 2–5% AI probability across all six tools. This is a case study in what maximally human writing looks like to a machine classifier.

Avg. AI probability: 3%
Detectors returning “Human”: 6/6
Burstiness CV: 0.94
AI transition phrases: 0

Background: Why This Text?

AI text detectors are calibrated against corpora of human and AI-generated writing. When a tool reports “87% accuracy,” that figure reflects performance on a mixed sample of texts. But what does the detector see when it encounters writing that is extremely human — writing that violates every statistical norm that AI text follows?

Why’s (Poignant) Guide to Ruby is a useful stress test precisely because it is one of the most statistically unusual bodies of human text in existence. Written by _why the lucky stiff between 2003 and 2009, it combines programming tutorial content with surrealist fiction, comic strips, and emotional asides. The writing style is maximally bursty, highly idiosyncratic, and entirely free of the formulaic phrases that characterize AI-generated content. We wanted to know: does this register as obviously human?

What We Tested

We selected six passages from across the guide and submitted each to all six detectors in our benchmark at default settings. Each passage was between 150 and 400 words and was submitted as plain text with no modification.

Passage | Source | Word Count | Proofademic | Originality | GPTZero | Hive | Verdict
Opening paragraph | Chapter 1 | 187 | 3% | 4% | 5% | 4% | Human (all)
Chapter opening | Chapter 3 | 241 | 2% | 3% | 4% | 3% | Human (all)
Foxes’ first dialogue | Chapter 3 | 164 | 1% | 2% | 3% | 2% | Human (all)
Perplexity discussion | Chapter 3 | 312 | 4% | 5% | 7% | 5% | Human (all)
Dr. Cham introduction | Chapter 5 | 278 | 2% | 4% | 5% | 4% | Human (all)
Final paragraph | Chapter 7 | 193 | 3% | 4% | 6% | 5% | Human (all)

Every passage scored below 10% AI probability on every detector. No passage crossed any tool’s flagging threshold. The average across all passages and all detectors was 3.7% AI probability.
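As a quick sanity check on these figures, the scores in the table can be averaged directly. This is a sketch that uses only the four detector columns that appear in the table, so it approximates the reported figure instead of reproducing it; the dictionary keys are shorthand labels introduced here for illustration.

```python
from statistics import mean

# AI-probability scores (%) per passage, in table order:
# Proofademic, Originality, GPTZero, Hive.
# Only the four detectors listed in the table are included.
scores = {
    "Opening paragraph (Ch. 1)": [3, 4, 5, 4],
    "Chapter opening (Ch. 3)": [2, 3, 4, 3],
    "Foxes' first dialogue (Ch. 3)": [1, 2, 3, 2],
    "Perplexity discussion (Ch. 3)": [4, 5, 7, 5],
    "Dr. Cham introduction (Ch. 5)": [2, 4, 5, 4],
    "Final paragraph (Ch. 7)": [3, 4, 6, 5],
}

# Flatten every passage/detector pair into one list and summarize it.
all_scores = [s for row in scores.values() for s in row]
overall_mean = mean(all_scores)
highest = max(all_scores)

print(f"mean = {overall_mean}%, highest single score = {highest}%")
```

The mean over these 24 scores is 3.75%, in line with the roughly 3.7% reported across all detectors, and the highest single score is 7%.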

Why the Scores Are So Low: Signal Analysis

We computed the key statistical signals independently on the same passages to understand why the scores are this low.

Burstiness — The Dominant Signal

Source | Burstiness (CV of sentence lengths) | Label
AI text (avg) | 0.31 | Uniform
Human avg | 0.71 | Varied
_why Ch.1 | 0.88 | Highly varied
_why Ch.3 | 0.94 | Extreme
Foxes dialogue | 1.20 | Maximum

Chapter 3’s sentence-length coefficient of variation (standard deviation divided by mean) is 0.94, roughly three times the AI text average of 0.31 and about a third higher than the human corpus average of 0.71. The foxes’ dialogue section reaches a CV of 1.20, among the highest burstiness values we have measured in any corpus. On its own, this signal would likely push a detector that weights burstiness firmly toward the “human” category.
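For readers who want to compute a comparable number, here is a minimal sketch of burstiness as the coefficient of variation of sentence lengths. The article does not say how sentences were split or which standard deviation was used, so the naive punctuation-based splitter and the population standard deviation below are assumptions, and the function names are hypothetical. Results will differ somewhat from the figures reported here.

```python
import re
import statistics


def sentence_lengths(text: str) -> list[int]:
    """Word count per sentence, using a naive split after ., ! or ?."""
    # Assumption: sentences end at ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [len(s.split()) for s in sentences if s.split()]


def burstiness_cv(text: str) -> float:
    """Coefficient of variation (std dev / mean) of sentence lengths."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # variation is undefined for fewer than two sentences
    # Assumption: population standard deviation; a sample std dev
    # (statistics.stdev) would give slightly higher values.
    return statistics.pstdev(lengths) / statistics.mean(lengths)


uniform = "One two three. Four five six. Seven eight nine."
varied = "Stop. This sentence runs on for quite a few more words than that. No!"

print(burstiness_cv(uniform))
print(burstiness_cv(varied))
```

Evenly sized sentences give a CV of zero, while mixing one-word sentences with long ones pushes the CV above 1, the range the foxes’ dialogue falls into.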

Transition Phrase Density — Zero

AI text average: 6.7 AI phrases per 1,000 words
_why’s writing: 0.0 AI phrases per 1,000 words

Across all six passages and the full guide, _why’s writing contains zero instances of the AI-associated phrases we catalog: no “furthermore,” no “it is worth noting,” no “in conclusion,” no “leverage,” no “delve into.” His transitions are idiosyncratic constructions that a language model would be unlikely to generate unprompted: “Here’s the thing about foxes,” “Now, look:” and “Chunky bacon!”

Vocabulary Diversity — Significantly Above Average

Source | Type-Token Ratio | vs. AI Average | Interpretation
AI text (corpus avg) | 0.57 | - | Baseline: repetitive structures
Human text (corpus avg) | 0.69 | +21% | Normal human variation
_why’s Chapter 3 | 0.74 | +30% | High lexical creativity
Foxes’ dialogue | 0.81 | +42% | Near maximum for this length
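The type-token ratio is the number of distinct words divided by the total number of words. The sketch below shows one way to compute it and to reproduce the "vs. AI Average" percentages from the reported ratios. The lowercasing and tokenization rule are assumptions, since the article does not describe its preprocessing, and the function names are hypothetical.

```python
import re


def type_token_ratio(text: str) -> float:
    """Distinct words (types) divided by total words (tokens)."""
    # Assumption: case-insensitive, punctuation stripped.
    tokens = re.findall(r"[\w'’]+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def percent_vs_baseline(ttr: float, baseline: float = 0.57) -> float:
    """Relative difference from the AI corpus average reported above."""
    return 100 * (ttr / baseline - 1)


print(type_token_ratio("the fox and the other fox"))  # 4 types / 6 tokens

# Ratios reported in the table above.
for label, ttr in [("Human avg", 0.69), ("_why Ch. 3", 0.74), ("Foxes", 0.81)]:
    print(f"{label}: {percent_vs_baseline(ttr):+.0f}%")
```

Type-token ratio tends to fall as a passage gets longer, because common words repeat, so ratios are only comparable between passages of similar length.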

What This Teaches About Detection

The properties that make _why’s writing feel distinctly human are exactly the properties that AI detectors measure: high burstiness, high vocabulary diversity, absence of formulaic phrases, unpredictable sentence structure. These properties arise from the same source — a human with genuine things to say, expressing them in a personal way, without optimizing for predictability or readability.

This case study is useful for calibration: if you want to understand what a confident “human” classification looks like from the inside, _why’s writing is the extreme case. When your own writing is more uniform, heavier on transition phrases, and more predictable than this, it sits statistically closer to AI text, which is why detectors may score it higher.

Try it yourself: paste any passage from Chapter 3 into our free AI detector and watch it score near zero.

Frequently Asked Questions

Does this mean _why’s writing is “more human” than other writing?

No. It means _why’s writing is more statistically distinct from AI text than most human writing. Many excellent, clearly human writers produce text with lower burstiness and higher transition phrase density than _why, particularly in formal academic or professional registers. A low AI detector score does not mean “better writing”; it means the text has statistical properties that differ from current LLM output distributions. Those properties will shift as AI models evolve.

Could an AI be trained to write like _why and fool detectors?

In principle, yes. A model specifically fine-tuned on _why’s writing corpus could learn to replicate its burstiness patterns and vocabulary style. In practice, the result would likely read as an imitation, and an AI trained to mimic one distinctive author’s style is a very different thing from the general-purpose text generation that detectors are calibrated against. This is also why provenance-based detection (watermarking at the generation layer) is likely to prove more durable than statistical pattern matching.

What does this mean for writers worried about false positives?

The properties that protect _why’s writing from false positives (sentence length variation, vocabulary diversity, absence of generic transition phrases) are worth bearing in mind if you are in a context where AI detection might be applied to your work. Varying your sentence lengths deliberately, minimizing generic connective phrases, and writing in a distinctly personal voice all reduce statistical similarity to AI text. None of this should change how you write for its own sake; it is simply useful context about what detectors are measuring.