textmetrics/readability
Canonical English-language readability scores.
All six scores in this module — Flesch Reading Ease, Flesch–Kincaid
Grade Level, Gunning Fog, SMOG, Automated Readability Index, and
Coleman–Liau Index — are computed from the count primitives
exposed by textmetrics/count. The functions are
pure, deterministic, and O(n) in input length.
Returned scores are Float values; callers should round or
quantise to fit their reporting needs. Empty or extremely small
inputs return an Error carrying a ReadabilityError rather than a
non-finite number.
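As a rough illustration of the rounding and error handling described above, the sketch below formats a score result for a report. The error type is re-declared locally so the snippet compiles on its own; `round_to_tenth` and `describe` are hypothetical helper names, not part of this module.

```gleam
import gleam/float
import gleam/int

// Local mirror of the documented error type so this sketch compiles alone.
pub type ReadabilityError {
  TooFewWords(at_least: Int, got: Int)
  TooFewSentences(at_least: Int, got: Int)
}

// Round to one decimal place: scale up, round to the nearest Int, scale back.
pub fn round_to_tenth(score: Float) -> Float {
  int.to_float(float.round(score *. 10.0)) /. 10.0
}

// Hypothetical reporting helper: one line of text per outcome.
pub fn describe(result: Result(Float, ReadabilityError)) -> String {
  case result {
    Ok(score) -> "score: " <> float.to_string(round_to_tenth(score))
    Error(TooFewWords(at_least: needed, got: got)) ->
      "need at least "
      <> int.to_string(needed)
      <> " words, got "
      <> int.to_string(got)
    Error(TooFewSentences(at_least: needed, got: got)) ->
      "need at least "
      <> int.to_string(needed)
      <> " sentences, got "
      <> int.to_string(got)
  }
}
```

In real use the `Result` would come from one of the scoring functions documented under Values rather than being built by hand.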
The syllable counter consumed by these formulas is an
English-tuned heuristic (see
count.syllables_in_word).
Non-English text will produce scores that match textstat’s
fallback behaviour but should not be interpreted as meaningful
grade-level estimates.
Reference scores produced by these implementations agree with
Python textstat (the de-facto reference) to within roughly
±2 on the Reading Ease 0–100 scale and ±1 on grade-level scales,
over a corpus of fixtures from the Wikipedia readability articles.
Types
Errors returned when a readability score cannot be computed because the input does not meet the minimum-size precondition of the underlying formula.
pub type ReadabilityError {
TooFewWords(at_least: Int, got: Int)
TooFewSentences(at_least: Int, got: Int)
}
Constructors
- TooFewWords(at_least: Int, got: Int)
  The input had fewer than at_least words. got is the actual count.
- TooFewSentences(at_least: Int, got: Int)
  The input had fewer than at_least sentences. got is the actual count.
Values
pub fn automated_readability_index(
text: String,
) -> Result(Float, ReadabilityError)
Automated Readability Index (ARI), Smith & Senter (1967).
4.71 × (characters/words) + 0.5 × (words/sentences) − 21.43
The characters count is letters + digits + accented graphemes
(i.e. count.characters) and excludes
whitespace and punctuation. ARI and Coleman–Liau are the only
formulas in this module that treat digits as score-bearing
characters; texts containing large numeric runs will score
correspondingly higher.
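The ARI formula can be checked by hand from raw counts. The sketch below is a standalone restatement of the arithmetic, not the module's implementation; `ari_from_counts` is a hypothetical name, and the counts are assumed to have been obtained elsewhere.

```gleam
import gleam/int

// ARI from raw counts. Assumes words and sentences are both non-zero:
// Gleam float division by zero yields 0.0 instead of failing, so any
// guard has to live in the caller.
pub fn ari_from_counts(characters: Int, words: Int, sentences: Int) -> Float {
  let chars_per_word = int.to_float(characters) /. int.to_float(words)
  let words_per_sentence = int.to_float(words) /. int.to_float(sentences)
  4.71 *. chars_per_word +. 0.5 *. words_per_sentence -. 21.43
}
```

For 430 characters, 100 words, and 5 sentences this gives 4.71 × 4.3 + 0.5 × 20 − 21.43 ≈ 8.82.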
pub fn coleman_liau_index(
text: String,
) -> Result(Float, ReadabilityError)
Coleman–Liau Index, Coleman & Liau (1975).
0.0588 × L − 0.296 × S − 15.8
where
- L = average number of letters per 100 words (characters / words × 100)
- S = average number of sentences per 100 words (sentences / words × 100)
The output approximates the US grade level expected to read the
text comfortably. Like ARI, this formula uses the
count.characters definition (letters + digits), so digit-heavy text
scores slightly higher than its pure-prose equivalent.
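To make the per-100-words scaling concrete, here is a minimal sketch that computes L and S from raw counts and applies the formula. `coleman_liau_from_counts` is a hypothetical name and the code only restates the arithmetic above.

```gleam
import gleam/int

// Coleman-Liau from raw counts. Assumes words is non-zero.
pub fn coleman_liau_from_counts(
  characters: Int,
  words: Int,
  sentences: Int,
) -> Float {
  // L: characters per 100 words; S: sentences per 100 words.
  let l = int.to_float(characters) /. int.to_float(words) *. 100.0
  let s = int.to_float(sentences) /. int.to_float(words) *. 100.0
  0.0588 *. l -. 0.296 *. s -. 15.8
}
```

With 430 characters, 100 words, and 5 sentences, L = 430 and S = 5, so the score is 25.284 − 1.48 − 15.8 ≈ 8.00.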
pub fn flesch_kincaid_grade(
text: String,
) -> Result(Float, ReadabilityError)
Flesch–Kincaid Grade Level.
0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59
The output approximates the US school grade required to comprehend the text. Negative scores are returned as-is and signal text below 1st-grade complexity.
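The following sketch restates the grade-level arithmetic on raw counts, mainly to show how a negative score arises. `fk_grade_from_counts` is a hypothetical name, and the syllable count is assumed to come from whatever counter the caller uses.

```gleam
import gleam/int

// Flesch-Kincaid grade from raw counts. Assumes words and sentences
// are both non-zero.
pub fn fk_grade_from_counts(
  words: Int,
  sentences: Int,
  syllables: Int,
) -> Float {
  let words_per_sentence = int.to_float(words) /. int.to_float(sentences)
  let syllables_per_word = int.to_float(syllables) /. int.to_float(words)
  0.39 *. words_per_sentence +. 11.8 *. syllables_per_word -. 15.59
}
```

A passage of 100 words, 5 sentences, and 150 syllables scores about 9.91. Two three-word sentences of monosyllables (6 words, 2 sentences, 6 syllables) score about −2.62, which is the kind of negative result returned as-is.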
pub fn flesch_reading_ease(
text: String,
) -> Result(Float, ReadabilityError)
Flesch Reading Ease, original Flesch (1948) formula.
206.835 − 1.015 × (words/sentences) − 84.6 × (syllables/words)
Higher is easier. The classic interpretation bands:
- 90–100: 5th grade reader
- 80–90: 6th grade
- 70–80: 7th grade
- 60–70: 8th–9th grade (“plain English”)
- 50–60: 10th–12th grade
- 30–50: college
- 0–30: college graduate
The raw formula is not clamped, so unusually short or syllable-poor
text can produce scores above 100 (and unusually dense academic
prose can produce scores below 0).
Returns TooFewWords for input with no words,
and TooFewSentences for input with no
sentence-shaped content.
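Callers who want a label rather than a number could map the score onto the bands listed above. The sketch below is one possible mapping, not something this module provides. It assumes that each band includes its lower bound, that scores above 100 fall in the easiest band, and that scores below 0 fall in the hardest; `reading_ease_band` is a hypothetical name.

```gleam
// Map a Reading Ease score to its classic interpretation band.
// Assumption: lower bounds are inclusive, and out-of-range scores are
// folded into the nearest band because the formula is not clamped.
pub fn reading_ease_band(score: Float) -> String {
  case score {
    s if s >=. 90.0 -> "5th grade"
    s if s >=. 80.0 -> "6th grade"
    s if s >=. 70.0 -> "7th grade"
    s if s >=. 60.0 -> "8th-9th grade"
    s if s >=. 50.0 -> "10th-12th grade"
    s if s >=. 30.0 -> "college"
    _ -> "college graduate"
  }
}
```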
pub fn gunning_fog(
text: String,
) -> Result(Float, ReadabilityError)
Gunning Fog Index — Robert Gunning (1952).
0.4 × ((words/sentences) + 100 × (polysyllables/words))
A polysyllable here is a word with three or more syllables. The
original Gunning rules excluded proper nouns, hyphenated compounds,
and inflected forms (-es / -ed / -ing) — this implementation
follows Python textstat and does not apply those
exclusions, so scores match textstat rather than the strict
1952 paper. Callers needing the strict variant can subtract their
own exclusion count from
count.polysyllables before applying
the formula directly.
Output approximates the years of formal education required to understand the text on first reading.
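The strict variant mentioned above might look like the sketch below. Both function names are hypothetical, the counts are assumed to be supplied by the caller, and the number of excluded words (proper nouns, hyphenated compounds, inflected forms) has to come from the caller's own tagging.

```gleam
import gleam/int

// Gunning Fog from raw counts. Assumes words and sentences are non-zero.
pub fn fog_from_counts(words: Int, sentences: Int, polysyllables: Int) -> Float {
  let words_per_sentence = int.to_float(words) /. int.to_float(sentences)
  let polysyllable_pct =
    100.0 *. int.to_float(polysyllables) /. int.to_float(words)
  0.4 *. { words_per_sentence +. polysyllable_pct }
}

// Strict variant: drop the caller's excluded words from the polysyllable
// count, never going below zero. The word total is left unchanged.
pub fn strict_fog_from_counts(
  words: Int,
  sentences: Int,
  polysyllables: Int,
  excluded: Int,
) -> Float {
  let kept = int.max(polysyllables - excluded, 0)
  fog_from_counts(words, sentences, kept)
}
```

For 100 words, 5 sentences, and 12 polysyllables the unadjusted score is 0.4 × (20 + 12) = 12.8; excluding 4 of those polysyllables lowers it to 0.4 × (20 + 8) = 11.2.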
pub fn smog(text: String) -> Result(Float, ReadabilityError)
Simple Measure of Gobbledygook (SMOG), McLaughlin (1969).
1.043 × sqrt(polysyllables × (30/sentences)) + 3.1291
SMOG is statistically reliable only for texts of 30 sentences or
more — McLaughlin’s regression was calibrated on samples of that
size, and applying the formula to shorter passages compounds
estimation error. This implementation therefore returns
TooFewSentences when the input has fewer
than 30 sentences.
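The 30-sentence guard and the normalisation to a 30-sentence sample can be sketched on raw counts as follows. `SmogError` is a trimmed local stand-in for the documented error type, and `smog_from_counts` is a hypothetical name; this is an illustration of the rule, not the module's own code.

```gleam
import gleam/float
import gleam/int
import gleam/result

// Trimmed local stand-in for the documented ReadabilityError type.
pub type SmogError {
  TooFewSentences(at_least: Int, got: Int)
}

// SMOG from raw counts, refusing inputs shorter than 30 sentences.
pub fn smog_from_counts(
  polysyllables: Int,
  sentences: Int,
) -> Result(Float, SmogError) {
  case sentences < 30 {
    True -> Error(TooFewSentences(at_least: 30, got: sentences))
    False -> {
      // Scale the polysyllable count to a 30-sentence sample.
      let scaled =
        int.to_float(polysyllables) *. 30.0 /. int.to_float(sentences)
      // square_root only fails for negative input; counts are assumed
      // non-negative, so the 0.0 fallback should never be used.
      let root = result.unwrap(float.square_root(scaled), 0.0)
      Ok(1.043 *. root +. 3.1291)
    }
  }
}
```

With 36 polysyllables in exactly 30 sentences the score is 1.043 × 6 + 3.1291 ≈ 9.39. A 60-sentence text with 72 polysyllables normalises to the same value, while a 12-sentence text is rejected.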