textmetrics/readability

Canonical English-language readability scores.

All six scores in this module — Flesch Reading Ease, Flesch–Kincaid Grade Level, Gunning Fog, SMOG, Automated Readability Index, and Coleman–Liau Index — are computed from the count primitives exposed by textmetrics/count. The functions are pure, deterministic, and O(n) in input length.

Returned scores are Float values; callers should round or quantise them to fit their reporting needs. Empty or extremely small inputs return an Error holding a ReadabilityError instead of a non-finite number.
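
As an illustration of the rounding advice above, the following sketch rounds a score to one decimal place before printing it. The helper round_to_tenth and the sample text are hypothetical and not part of this module; only flesch_kincaid_grade and its Result return type come from the signatures documented here.

```gleam
import gleam/float
import gleam/int
import gleam/io
import textmetrics/readability

// Hypothetical helper (not part of textmetrics): round to one decimal place.
fn round_to_tenth(score: Float) -> Float {
  int.to_float(float.round(score *. 10.0)) /. 10.0
}

pub fn main() {
  let text = "The cat sat on the mat. It was warm there."
  case readability.flesch_kincaid_grade(text) {
    Ok(score) ->
      io.println("Grade level: " <> float.to_string(round_to_tenth(score)))
    // Too little text for the formula; no score is produced.
    Error(_) -> io.println("Text is too short to score")
  }
}
```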

The syllable counter consumed by these formulas is an English-tuned heuristic (see count.syllables_in_word). Non-English text will produce scores that match textstat’s fallback behaviour but should not be interpreted as meaningful grade-level estimates.

Reference scores produced by these implementations agree with Python textstat (the de-facto reference) to within roughly ±2 on the Reading Ease 0–100 scale and ±1 on grade-level scales, over a corpus of fixtures from the Wikipedia readability articles.

Types

Errors returned when a readability score cannot be computed because the input does not meet the minimum-size precondition of the underlying formula.

pub type ReadabilityError {
  TooFewWords(at_least: Int, got: Int)
  TooFewSentences(at_least: Int, got: Int)
}

Constructors

  • TooFewWords(at_least: Int, got: Int)

    The input had fewer than at_least words. got is the actual count.

  • TooFewSentences(at_least: Int, got: Int)

    The input had fewer than at_least sentences. got is the actual count.
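
One way a caller might turn these errors into messages is sketched below. The function name describe_error and the message wording are assumptions made for illustration; the constructor names and field labels are those listed above.

```gleam
import gleam/int
import textmetrics/readability.{
  type ReadabilityError, TooFewSentences, TooFewWords,
}

// Hypothetical helper: render a ReadabilityError for a log line or UI.
pub fn describe_error(error: ReadabilityError) -> String {
  case error {
    TooFewWords(at_least: min, got: n) ->
      "need at least "
      <> int.to_string(min)
      <> " words, got "
      <> int.to_string(n)
    TooFewSentences(at_least: min, got: n) ->
      "need at least "
      <> int.to_string(min)
      <> " sentences, got "
      <> int.to_string(n)
  }
}
```

Because the case expression covers both constructors, the compiler will flag this function if a new variant is ever added to the type.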

Values

pub fn automated_readability_index(
  text: String,
) -> Result(Float, ReadabilityError)

Automated Readability Index (ARI), Smith & Senter (1967).

4.71 × (characters/words) + 0.5 × (words/sentences) − 21.43

The characters count is letters + digits + accented graphemes (i.e. count.characters) and excludes whitespace and punctuation. Digits therefore count towards the score, as they do for Coleman–Liau; texts containing large numeric runs will score correspondingly higher.
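
To make the arithmetic concrete, here is a minimal sketch that applies the formula above to counts supplied by the caller. The function ari_from_counts is hypothetical and is not the module's implementation; it assumes words and sentences are both greater than zero.

```gleam
import gleam/int

// Hypothetical helper: ARI from precomputed counts.
// Assumes words > 0 and sentences > 0.
pub fn ari_from_counts(characters: Int, words: Int, sentences: Int) -> Float {
  let c = int.to_float(characters)
  let w = int.to_float(words)
  let s = int.to_float(sentences)
  4.71 *. { c /. w } +. 0.5 *. { w /. s } -. 21.43
}
```

For 450 characters, 100 words, and 5 sentences this gives 4.71 × 4.5 + 0.5 × 20 − 21.43, which is about 9.77.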

pub fn coleman_liau_index(
  text: String,
) -> Result(Float, ReadabilityError)

Coleman–Liau Index, Coleman & Liau (1975).

0.0588 × L − 0.296 × S − 15.8

where

  • L = average number of letters per 100 words (characters / words × 100)
  • S = average number of sentences per 100 words (sentences / words × 100)

The output approximates the US grade level expected to read the text comfortably. Like ARI, this formula uses the count.characters definition (letters + digits), so digit-heavy text scores slightly higher than its pure-prose equivalent.

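
The definitions of L and S above can be sketched as follows. The function coleman_liau_from_counts is a hypothetical illustration working from caller-supplied counts, not the module's own code, and it assumes words is greater than zero.

```gleam
import gleam/int

// Hypothetical helper: Coleman–Liau from precomputed counts.
// Assumes words > 0.
pub fn coleman_liau_from_counts(
  characters: Int,
  words: Int,
  sentences: Int,
) -> Float {
  let w = int.to_float(words)
  // L: characters per 100 words; S: sentences per 100 words.
  let l = int.to_float(characters) /. w *. 100.0
  let s = int.to_float(sentences) /. w *. 100.0
  0.0588 *. l -. 0.296 *. s -. 15.8
}
```

With 450 characters, 100 words, and 5 sentences, L is 450 and S is 5, so the result is 0.0588 × 450 − 0.296 × 5 − 15.8, or about 9.18.
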
pub fn flesch_kincaid_grade(
  text: String,
) -> Result(Float, ReadabilityError)

Flesch–Kincaid Grade Level, Kincaid et al. (1975).

0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59

The output approximates the US school grade required to comprehend the text. Negative scores are returned as-is and signal text below 1st-grade complexity.
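
A short sketch of how a negative score arises, using a hypothetical helper that applies the formula above to caller-supplied counts (it is not the module's implementation and assumes both words and sentences are greater than zero):

```gleam
import gleam/int

// Hypothetical helper: Flesch–Kincaid grade from precomputed counts.
// Assumes words > 0 and sentences > 0.
pub fn fk_grade_from_counts(words: Int, sentences: Int, syllables: Int) -> Float {
  let w = int.to_float(words)
  let s = int.to_float(sentences)
  let y = int.to_float(syllables)
  0.39 *. { w /. s } +. 11.8 *. { y /. w } -. 15.59
}
```

Six one-syllable words spread over two sentences give 0.39 × 3 + 11.8 × 1 − 15.59, which is about -2.62: the constant term outweighs the two small ratios.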

pub fn flesch_reading_ease(
  text: String,
) -> Result(Float, ReadabilityError)

Flesch Reading Ease, original Flesch (1948) formula.

206.835 − 1.015 × (words/sentences) − 84.6 × (syllables/words)

Higher is easier. The classic interpretation bands:

  • 90–100 — 5th grade reader
  • 80–90 — 6th grade
  • 70–80 — 7th grade
  • 60–70 — 8th–9th grade (“plain English”)
  • 50–60 — 10th–12th grade
  • 30–50 — college
  • 0–30 — college graduate

The raw formula is not clamped, so unusually short or syllable-poor text can produce scores above 100 (and unusually dense academic prose can produce scores below 0).

Returns TooFewWords for input with no words, and TooFewSentences for input with no sentence-shaped content.
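
The interpretation bands listed above could be applied to a score along these lines. The function reading_ease_band is hypothetical; assigning each shared boundary (for example exactly 90.0) to the easier band is an assumption of this sketch, since the listed ranges overlap at their endpoints. Scores above 100 fall into the first branch and scores below 0 into the last.

```gleam
// Hypothetical helper: map a Reading Ease score to its classic band.
// Boundary values are assigned to the easier band (an assumption).
pub fn reading_ease_band(score: Float) -> String {
  case score {
    s if s >=. 90.0 -> "5th grade"
    s if s >=. 80.0 -> "6th grade"
    s if s >=. 70.0 -> "7th grade"
    s if s >=. 60.0 -> "8th-9th grade (plain English)"
    s if s >=. 50.0 -> "10th-12th grade"
    s if s >=. 30.0 -> "college"
    _ -> "college graduate"
  }
}
```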

pub fn gunning_fog(
  text: String,
) -> Result(Float, ReadabilityError)

Gunning Fog Index, Gunning (1952).

0.4 × ((words/sentences) + 100 × (polysyllables/words))

A polysyllable here is a word with three or more syllables. The original Gunning rules excluded proper nouns, hyphenated compounds, and inflected forms (-es / -ed / -ing). This implementation follows Python textstat and does not apply those exclusions, so scores match textstat rather than Gunning's strict 1952 rules. Callers needing the strict variant can subtract their own exclusion count from count.polysyllables before applying the formula directly.

Output approximates the years of formal education required to understand the text on first reading.
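
A caller wanting the strict variant might apply the formula along the following lines. The function strict_gunning_fog and its excluded parameter are hypothetical; how the exclusion count is obtained is left to the caller, and all counts are assumed to be precomputed with words and sentences greater than zero.

```gleam
import gleam/int

// Hypothetical helper: Gunning Fog with a caller-supplied exclusion count.
// Assumes words > 0 and sentences > 0.
pub fn strict_gunning_fog(
  words: Int,
  sentences: Int,
  polysyllables: Int,
  excluded: Int,
) -> Float {
  let w = int.to_float(words)
  let s = int.to_float(sentences)
  // Never let the adjusted polysyllable count drop below zero.
  let complex = int.to_float(int.max(polysyllables - excluded, 0))
  0.4 *. { w /. s +. 100.0 *. { complex /. w } }
}
```

For 100 words, 5 sentences, and 12 polysyllables of which 2 are excluded, the strict score is 0.4 × (20 + 10), about 12.0, against about 12.8 without the exclusions.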

pub fn smog(text: String) -> Result(Float, ReadabilityError)

Simple Measure of Gobbledygook (SMOG), McLaughlin (1969).

1.043 × sqrt(polysyllables × (30/sentences)) + 3.1291

SMOG is statistically reliable only for texts of 30 sentences or more — McLaughlin’s regression was calibrated on samples of that size, and applying the formula to shorter passages compounds estimation error. This implementation therefore returns TooFewSentences when the input has fewer than 30 sentences.
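
The formula and the 30-sentence precondition can be sketched together as below. The function smog_from_counts is hypothetical and works from caller-supplied counts; it uses a plain Error(Nil) in place of this module's TooFewSentences so that the sketch stays free of assumptions about how that error is populated.

```gleam
import gleam/float
import gleam/int
import gleam/result

// Hypothetical helper: SMOG from precomputed counts.
// Returns Error(Nil) when there are fewer than 30 sentences.
pub fn smog_from_counts(
  polysyllables: Int,
  sentences: Int,
) -> Result(Float, Nil) {
  case sentences < 30 {
    True -> Error(Nil)
    False -> {
      let p = int.to_float(polysyllables)
      let s = int.to_float(sentences)
      // square_root returns a Result, so map over it to finish the formula.
      float.square_root(p *. { 30.0 /. s })
      |> result.map(fn(root) { 1.043 *. root +. 3.1291 })
    }
  }
}
```

With 36 polysyllables in exactly 30 sentences the square root is 6, giving 1.043 × 6 + 3.1291, or about 9.39.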
