textmetrics/readability

Canonical English-language readability scores.

All six scores in this module — Flesch Reading Ease, Flesch–Kincaid Grade Level, Gunning Fog, SMOG, Automated Readability Index, and Coleman–Liau Index — are computed from the count primitives exposed by textmetrics/count. The functions are pure, deterministic, and O(n) in input length.

Returned scores are Float values; callers should round or quantise them to fit their reporting needs. Empty or extremely small inputs return Error(ReadabilityError) rather than a non-finite number.

The syllable counter consumed by these formulas is an English-tuned heuristic (see count.syllables_in_word). Non-English text will produce scores that match textstat’s fallback behaviour but should not be interpreted as meaningful grade-level estimates.

Scores produced by these implementations agree with Python textstat (the de facto reference implementation) to within roughly ±2 points on the 0–100 Reading Ease scale and ±1 grade on the grade-level scales, measured over a corpus of fixtures drawn from the Wikipedia readability articles.

Types

Errors returned when a readability score cannot be computed because the input does not meet the minimum-size precondition of the underlying formula.

pub type ReadabilityError {
  TooFewWords(at_least: Int, got: Int)
  TooFewSentences(at_least: Int, got: Int)
}

Constructors

  • TooFewWords(at_least: Int, got: Int)

    The input had fewer than at_least words. got is the actual count.

  • TooFewSentences(at_least: Int, got: Int)

    The input had fewer than at_least sentences. got is the actual count.
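Because every score function returns Result(Float, ReadabilityError), callers typically pattern-match on both constructors. The sketch below is illustrative only: it re-declares the error type locally so it compiles on its own, and describe and score_or_message are hypothetical helpers, not part of textmetrics.

```gleam
import gleam/float
import gleam/int

// Local copy of the documented error type so this sketch is self-contained.
pub type ReadabilityError {
  TooFewWords(at_least: Int, got: Int)
  TooFewSentences(at_least: Int, got: Int)
}

/// Hypothetical helper: render an error as a user-facing message.
pub fn describe(error: ReadabilityError) -> String {
  case error {
    TooFewWords(at_least: min, got: n) ->
      "need at least " <> int.to_string(min) <> " words, got " <> int.to_string(n)
    TooFewSentences(at_least: min, got: n) ->
      "need at least "
      <> int.to_string(min)
      <> " sentences, got "
      <> int.to_string(n)
  }
}

/// Hypothetical helper: show either the score or the reason it is missing.
pub fn score_or_message(result: Result(Float, ReadabilityError)) -> String {
  case result {
    Ok(score) -> float.to_string(score)
    Error(error) -> describe(error)
  }
}
```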

Values

pub fn automated_readability_index(
  text: String,
) -> Result(Float, ReadabilityError)

Automated Readability Index (ARI), Smith & Senter (1967).

4.71 × (characters/words) + 0.5 × (words/sentences) − 21.43

The characters count is count.characters: letters, digits, and accented graphemes, excluding whitespace and punctuation. ARI and Coleman–Liau are the only formulas in this module that use this count, and therefore the only ones that treat digits as score-bearing characters; texts containing large numeric runs will score correspondingly higher. The result is clamped to [0.0, 18.0]; use automated_readability_index_unbounded for the raw value.
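The ARI formula is simple arithmetic once the three counts are known. The following is a minimal sketch, assuming the counts have already been obtained (for example from textmetrics/count, whose exact signatures are not shown here); ari_from_counts and ari_clamped are hypothetical names, and the sketch does not reproduce the module's own error handling.

```gleam
import gleam/float
import gleam/int

/// Hypothetical sketch of the raw ARI formula over precomputed counts.
/// Assumes words > 0 and sentences > 0 (no error handling here).
pub fn ari_from_counts(characters: Int, words: Int, sentences: Int) -> Float {
  let c = int.to_float(characters)
  let w = int.to_float(words)
  let s = int.to_float(sentences)
  4.71 *. { c /. w } +. 0.5 *. { w /. s } -. 21.43
}

/// Clamped variant mirroring the documented [0.0, 18.0] range.
pub fn ari_clamped(characters: Int, words: Int, sentences: Int) -> Float {
  ari_from_counts(characters, words, sentences)
  |> float.clamp(min: 0.0, max: 18.0)
}
```

For 400 characters, 100 words, and 5 sentences this gives 4.71 × 4 + 0.5 × 20 − 21.43 = 7.41. Very short words in very short sentences drive the raw value negative, which the clamped variant reports as 0.0.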

pub fn automated_readability_index_unbounded(
  text: String,
) -> Result(Float, ReadabilityError)

Automated Readability Index without the [0.0, 18.0] clamp.

pub fn coleman_liau_index(
  text: String,
) -> Result(Float, ReadabilityError)

Coleman–Liau Index, Coleman & Liau (1975).

0.0588 × L − 0.296 × S − 15.8

where

  • L = average number of letters per 100 words (characters / words × 100)
  • S = average number of sentences per 100 words (sentences / words × 100)

The output approximates the US grade level expected to read the text comfortably. Like ARI, this formula uses the count.characters definition (letters + digits), so digit-heavy text scores slightly higher than its pure-prose equivalent.

The result is clamped to [0.0, 18.0]; use coleman_liau_index_unbounded for the raw value.
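Both L and S are normalised per 100 words before being weighted. A minimal sketch, assuming precomputed counts; coleman_liau_from_counts and coleman_liau_clamped are hypothetical names used only for illustration.

```gleam
import gleam/float
import gleam/int

/// Hypothetical sketch of the raw Coleman–Liau formula.
/// Assumes words > 0 (no error handling here).
pub fn coleman_liau_from_counts(
  characters: Int,
  words: Int,
  sentences: Int,
) -> Float {
  let w = int.to_float(words)
  // L: characters per 100 words
  let l = int.to_float(characters) /. w *. 100.0
  // S: sentences per 100 words
  let s = int.to_float(sentences) /. w *. 100.0
  0.0588 *. l -. 0.296 *. s -. 15.8
}

/// Clamped variant mirroring the documented [0.0, 18.0] range.
pub fn coleman_liau_clamped(characters: Int, words: Int, sentences: Int) -> Float {
  coleman_liau_from_counts(characters, words, sentences)
  |> float.clamp(min: 0.0, max: 18.0)
}
```

With 450 characters, 100 words, and 5 sentences, L = 450 and S = 5, so the score is 26.46 − 1.48 − 15.8 = 9.18.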

pub fn coleman_liau_index_unbounded(
  text: String,
) -> Result(Float, ReadabilityError)

Coleman–Liau Index without the [0.0, 18.0] clamp.

pub fn flesch_kincaid_grade(
  text: String,
) -> Result(Float, ReadabilityError)

Flesch–Kincaid Grade Level.

0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59

The output approximates the US school grade required to comprehend the text. The result is clamped to [0.0, 18.0] (US K–12 plus graduate range) so synthetic inputs cannot produce -2.88 or 49+. Use flesch_kincaid_grade_unbounded for the raw value.
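The clamp matters because the raw formula has no natural floor: a text made entirely of one-word, one-syllable sentences yields 0.39 + 11.8 − 15.59 = −3.40. A minimal sketch of the clamped calculation, assuming precomputed counts; the function names are hypothetical.

```gleam
import gleam/float
import gleam/int

/// Hypothetical sketch of the raw Flesch–Kincaid Grade Level.
/// Assumes words > 0 and sentences > 0.
pub fn fk_grade_unbounded(words: Int, sentences: Int, syllables: Int) -> Float {
  let w = int.to_float(words)
  let words_per_sentence = w /. int.to_float(sentences)
  let syllables_per_word = int.to_float(syllables) /. w
  0.39 *. words_per_sentence +. 11.8 *. syllables_per_word -. 15.59
}

/// Clamped to the documented [0.0, 18.0] range.
pub fn fk_grade(words: Int, sentences: Int, syllables: Int) -> Float {
  fk_grade_unbounded(words, sentences, syllables)
  |> float.clamp(min: 0.0, max: 18.0)
}
```

For 100 words, 5 sentences, and 150 syllables: 0.39 × 20 + 11.8 × 1.5 − 15.59 = 9.91.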

pub fn flesch_kincaid_grade_unbounded(
  text: String,
) -> Result(Float, ReadabilityError)

Flesch–Kincaid Grade Level without the [0.0, 18.0] clamp.

pub fn flesch_reading_ease(
  text: String,
) -> Result(Float, ReadabilityError)

Flesch Reading Ease, original Flesch (1948) formula.

206.835 − 1.015 × (words/sentences) − 84.6 × (syllables/words)

Higher is easier. The classic interpretation bands:

  • 90–100 — 5th grade reader
  • 80–90 — 6th grade
  • 70–80 — 7th grade
  • 60–70 — 8th–9th grade (“plain English”)
  • 50–60 — 10th–12th grade
  • 30–50 — college
  • 0–30 — college graduate
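The bands above can be applied mechanically to a clamped score. This is a hypothetical helper, not part of the module; because the table's ranges share their endpoints, the sketch assumes a boundary score such as exactly 90.0 belongs to the easier band.

```gleam
/// Hypothetical helper: map a clamped Reading Ease score to its band.
/// Boundary scores are assigned to the easier (higher) band.
pub fn reading_ease_band(score: Float) -> String {
  case score {
    s if s >=. 90.0 -> "5th grade"
    s if s >=. 80.0 -> "6th grade"
    s if s >=. 70.0 -> "7th grade"
    s if s >=. 60.0 -> "8th–9th grade"
    s if s >=. 50.0 -> "10th–12th grade"
    s if s >=. 30.0 -> "college"
    _ -> "college graduate"
  }
}
```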

The result is clamped to [0.0, 100.0] to match the standard reporting convention used by Wikipedia, Microsoft Word, Python textstat’s default, and most readability UIs. Use flesch_reading_ease_unbounded when you need the raw formula output (which can exceed 100 for unusually short or syllable-poor text, and drop below 0 for unusually dense academic prose).

Returns TooFewWords for input with no words, and TooFewSentences for input with no sentence-shaped content.
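Both preconditions guard the denominators of the formula. A minimal sketch of that flow, assuming precomputed counts and a minimum of one word and one sentence (the exact at_least values the module reports are an assumption here); reading_ease_from_counts is a hypothetical name and the error type is re-declared locally.

```gleam
import gleam/float
import gleam/int

// Local copy of the documented error type so this sketch is self-contained.
pub type ReadabilityError {
  TooFewWords(at_least: Int, got: Int)
  TooFewSentences(at_least: Int, got: Int)
}

/// Hypothetical sketch: validate counts, apply the formula, then clamp.
pub fn reading_ease_from_counts(
  words: Int,
  sentences: Int,
  syllables: Int,
) -> Result(Float, ReadabilityError) {
  case words, sentences {
    w, _ if w < 1 -> Error(TooFewWords(at_least: 1, got: w))
    _, s if s < 1 -> Error(TooFewSentences(at_least: 1, got: s))
    _, _ -> {
      let w = int.to_float(words)
      let raw =
        206.835
        -. 1.015 *. { w /. int.to_float(sentences) }
        -. 84.6 *. { int.to_float(syllables) /. w }
      Ok(float.clamp(raw, min: 0.0, max: 100.0))
    }
  }
}
```

For 100 words, 5 sentences, and 150 syllables: 206.835 − 20.3 − 126.9 = 59.635, which falls in the 50–60 band.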

pub fn flesch_reading_ease_unbounded(
  text: String,
) -> Result(Float, ReadabilityError)

Flesch Reading Ease without the standard [0.0, 100.0] clamp. Returns the raw 206.835 − 1.015 × (words/sentences) − 84.6 × (syllables/words) value, which can exceed 100 for unusually short text and drop below 0 for unusually dense prose.

pub fn gunning_fog(
  text: String,
) -> Result(Float, ReadabilityError)

Gunning Fog Index — Robert Gunning (1952).

0.4 × ((words/sentences) + 100 × (polysyllables/words))

A polysyllable here is a word with three or more syllables. The original Gunning rules excluded proper nouns, hyphenated compounds, and inflected forms (-es / -ed / -ing) — this implementation follows Python textstat and does not apply those exclusions, so scores match textstat rather than the strict 1952 paper. Callers needing the strict variant can subtract their own exclusion count from count.polysyllables before applying the formula directly.

Output approximates the years of formal education required to understand the text on first reading. The result is clamped to [0.0, 18.0]; use gunning_fog_unbounded for the raw value.
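The strict-variant adjustment described above amounts to reducing the polysyllable count before applying the formula. A minimal sketch, assuming precomputed counts and a caller-supplied exclusion count; gunning_fog_from_counts and its exclusions parameter are hypothetical, and how exclusions are detected is left to the caller.

```gleam
import gleam/float
import gleam/int

/// Hypothetical sketch of Gunning Fog over precomputed counts.
/// `exclusions` is the caller's own count of 3+-syllable words the 1952
/// rules would drop; pass 0 for textstat-compatible behaviour.
/// Assumes words > 0 and sentences > 0.
pub fn gunning_fog_from_counts(
  words: Int,
  sentences: Int,
  polysyllables: Int,
  exclusions: Int,
) -> Float {
  let w = int.to_float(words)
  // Never let the adjusted count go negative.
  let complex = int.to_float(int.max(polysyllables - exclusions, 0))
  let raw = 0.4 *. { w /. int.to_float(sentences) +. 100.0 *. { complex /. w } }
  float.clamp(raw, min: 0.0, max: 18.0)
}
```

With 100 words, 5 sentences, and 10 polysyllables the textstat-style score is 0.4 × (20 + 10) = 12.0; excluding 5 of those words lowers it to 0.4 × (20 + 5) = 10.0.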

pub fn gunning_fog_unbounded(
  text: String,
) -> Result(Float, ReadabilityError)

Gunning Fog Index without the [0.0, 18.0] clamp.

pub fn smog(text: String) -> Result(Float, ReadabilityError)

Simple Measure of Gobbledygook (SMOG), McLaughlin (1969).

1.043 × sqrt(polysyllables × (30/sentences)) + 3.1291

SMOG is statistically reliable only for texts of 30 sentences or more — McLaughlin’s regression was calibrated on samples of that size, and applying the formula to shorter passages compounds estimation error. This implementation therefore returns TooFewSentences when the input has fewer than 30 sentences.
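The gate and the formula can be sketched together over precomputed counts. This assumes the module reports the gate as TooFewSentences(at_least: 30, got: n), as the paragraph above implies; smog_from_counts is a hypothetical name and the error type is re-declared locally.

```gleam
import gleam/float
import gleam/int

// Local copy of the documented error type so this sketch is self-contained.
pub type ReadabilityError {
  TooFewWords(at_least: Int, got: Int)
  TooFewSentences(at_least: Int, got: Int)
}

/// Hypothetical sketch of SMOG with the 30-sentence gate.
/// Assumes polysyllables >= 0.
pub fn smog_from_counts(
  sentences: Int,
  polysyllables: Int,
) -> Result(Float, ReadabilityError) {
  case sentences < 30 {
    True -> Error(TooFewSentences(at_least: 30, got: sentences))
    False -> {
      let scaled = int.to_float(polysyllables) *. { 30.0 /. int.to_float(sentences) }
      // square_root only fails for negative input, ruled out by the assumption.
      let assert Ok(root) = float.square_root(scaled)
      Ok(1.043 *. root +. 3.1291)
    }
  }
}
```

For 30 sentences containing 36 polysyllables, the square root is 6, giving 1.043 × 6 + 3.1291 = 9.3871.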

pub fn smog_g(text: String) -> Result(Float, ReadabilityError)

SMOG-G applies the same formula as SMOG to texts shorter than 30 sentences, relying on the 30 / sentences scaling that SMOG already uses internally. It addresses issue #23: real-world snippets (a Wikipedia paragraph, a press release, a tweet, an email) almost never reach 30 sentences, so the strict SMOG gate rejects all of them. SMOG-G drops the gate and returns the extrapolated grade for any non-empty input with at least one sentence.

Use smog when you have 30+ sentences and need the statistically calibrated form; use smog_g for everything else. The two agree to within ~1 grade for 30+ sentences.
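The extrapolation works by scaling the observed polysyllable count up to a notional 30-sentence sample. A minimal sketch, assuming precomputed counts and a one-sentence minimum reported as TooFewSentences(at_least: 1, got: n) (the exact error values are an assumption); smog_g_from_counts is a hypothetical name. The sketch applies the formula literally and does not model any other differences between the module's smog and smog_g.

```gleam
import gleam/float
import gleam/int

// Local copy of the documented error type so this sketch is self-contained.
pub type ReadabilityError {
  TooFewWords(at_least: Int, got: Int)
  TooFewSentences(at_least: Int, got: Int)
}

/// Hypothetical sketch of SMOG-G: no 30-sentence gate, only a non-empty check.
/// Assumes polysyllables >= 0.
pub fn smog_g_from_counts(
  sentences: Int,
  polysyllables: Int,
) -> Result(Float, ReadabilityError) {
  case sentences < 1 {
    True -> Error(TooFewSentences(at_least: 1, got: sentences))
    False -> {
      // Scale the polysyllable count to a 30-sentence sample.
      let scaled = int.to_float(polysyllables) *. { 30.0 /. int.to_float(sentences) }
      let assert Ok(root) = float.square_root(scaled)
      Ok(1.043 *. root +. 3.1291)
    }
  }
}
```

A 10-sentence snippet with 12 polysyllables is scaled by 30 / 10 = 3 to 36 polysyllables, so it receives the same grade (about 9.39) as a 30-sentence text containing 36.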
