By Robert Delwood
Senior Programmer Writer

Microsoft Office 2007 began flagging a new type of spelling error: the contextual misspelling. Marked with blue squiggly underlines, these are words that, although technically spelled correctly, seem suspicious in their context. The Internet’s now-famous poem “Ode To My Spell Checker” has the lines “Eye halve a spell checker / It came with my pea sea,” making fun of this weakness in detecting contextual errors and taking the issue to an amusing extreme. Amusing perhaps, but it is a serious problem. It’s been estimated that these errors account for about a quarter to a third of spelling issues. In the workplace, they range from annoying and embarrassing (such as “Swine flu as a pubic health threat.”) to just plain wrong (“We’re not ready,” for “We’re now ready”).

Finding outright misspellings is relatively easy. Conventional spell checkers use a word/non-word test: if a word does not appear in a predefined list, it’s assumed to be misspelled. The dictionaries are thorough, too. The Office American English dictionary runs up to 250,000 words, and adding specialized dictionaries, such as those for legal or medical terms, can bring it to 300,000. Contextual checking is more difficult. For this, Office uses a newer approach based on patterns within the language. For example, in the phrase “the sky is …,” falling and blue are the most commonly associated words. This is not because of any linguistic rule but simply because those patterns occur most often. If the next word is neither of those two, it may be flagged: a sort of grammatical reverse Family Feud. Like the dictionaries, the pattern base is thorough; Microsoft analyzed billions of sentences looking for these patterns.
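To make the contrast between the two tests concrete, here is a minimal Python sketch on toy data. It is not how Word is implemented: the tiny dictionary, the four-sentence corpus, and the function names (`non_word_errors`, `build_continuations`, `is_suspicious`) are illustrative assumptions, and the “top two continuations” cutoff simply mirrors the “falling or blue” example.

```python
from collections import Counter

# Hypothetical toy data: a real checker uses a dictionary of roughly 250,000
# words and phrase statistics mined from billions of sentences.
DICTIONARY = {"the", "sky", "is", "blue", "falling", "green", "today"}

CORPUS = [
    "the sky is blue",
    "the sky is blue",
    "the sky is falling",
    "the sky is blue today",
]

def non_word_errors(words):
    """Word/non-word test: flag any token that is missing from the dictionary."""
    return [w for w in words if w.lower() not in DICTIONARY]

def build_continuations(corpus, context_len=3):
    """Count which word follows each context_len-word context in the corpus."""
    table = {}
    for sentence in corpus:
        tokens = sentence.split()
        for i in range(context_len, len(tokens)):
            context = tuple(tokens[i - context_len:i])
            table.setdefault(context, Counter())[tokens[i]] += 1
    return table

def is_suspicious(context, word, table, top_n=2):
    """Flag word if it is not among the top_n most common continuations of context."""
    counts = table.get(tuple(context))
    if not counts:
        return False  # unseen context: no evidence either way, so don't flag
    likely = [w for w, _ in counts.most_common(top_n)]
    return word not in likely
```

With `table = build_continuations(CORPUS)`, the word test catches “skye” in “the skye is blue” but passes “the sky is green,” because every word there is in the dictionary. Only the pattern test, `is_suspicious(["the", "sky", "is"], "green", table)`, flags “green,” since the toy corpus only ever continues that phrase with “blue” or “falling.”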

What is the contextual spell checker looking for? In short, any unlikely combination, from confused closed-class words (too for to) and wrongly split words (through out for throughout) to malapropisms, homonyms, and eggcorns. Mercifully, as a group they are called real-word errors, although malapropism is commonly used, even among linguistic elites. These further divide into fair and unfair malapropisms. An unfair one occurs in a context that automation couldn’t reasonably be expected to identify, such as employee for employer, or not for now, as in the earlier example. This also includes unusually obscure occurrences, such as tunning (the act of pouring wine into a cask or tun) for running, or anything Yogi Berra might say. Unfair malapropisms are excluded from this discussion.
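Because every word in a real-word error passes the dictionary test, a checker first has to generate candidate corrections some other way. The Python sketch below shows two simple, commonly described ideas: spotting adjacent words that also form a single dictionary word (through out → throughout) and looking up small “confusion sets” of words often typed for one another (to/too, not/now). The vocabulary, the confusion sets, and the function names are hypothetical, and nothing here is claimed to be Word’s actual method; a real checker would still need a context model to decide whether a candidate is actually better.

```python
# Hypothetical toy vocabulary; a real checker would use a full dictionary.
VOCABULARY = {"through", "out", "throughout", "the", "day", "to", "too"}

# Hypothetical confusion sets: groups of words commonly substituted for one another.
CONFUSION_SETS = [{"to", "too"}, {"not", "now"}]

def split_word_candidates(words, vocabulary=VOCABULARY):
    """Return (index, joined) for adjacent word pairs that also form one dictionary word."""
    hits = []
    for i in range(len(words) - 1):
        joined = words[i] + words[i + 1]
        if joined in vocabulary:
            hits.append((i, joined))
    return hits

def confusion_alternatives(word, sets=CONFUSION_SETS):
    """Return the other members of every confusion set that contains word."""
    alternatives = set()
    for group in sets:
        if word in group:
            alternatives |= group - {word}
    return alternatives
```

For example, `split_word_candidates(["through", "out", "the", "day"])` returns `[(0, "throughout")]`, and `confusion_alternatives("now")` returns `{"not"}`. These are only candidates; deciding between “We’re not ready” and “We’re now ready” is exactly the case the article calls unfair, because both readings are plausible.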

So how accurate is Word’s contextual detection? In general, Word’s ability to locate contextual errors is low; it can miss up to 70% of the actual occurrences. However, of the terms it does flag, it’s almost always correct. As for suggesting corrections, it has about a 70% accuracy rate. When a suggestion is offered, it’s usually, but not always, correct; in the remaining cases, it offers no suggestion.
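“Misses many errors but is almost always right when it flags one” describes low recall paired with high precision. The short Python sketch below computes both measures. The counts are hypothetical, chosen only to match the rough proportions described above (100 real errors, 31 flags, 30 of them genuine); they are not measurements of Word.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision: share of flags that are real errors. Recall: share of real errors flagged."""
    flagged = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall

# Hypothetical counts: 100 real contextual errors, 30 caught, 70 missed, 1 false alarm.
p, r = precision_recall(true_positives=30, false_positives=1, false_negatives=70)
print(f"precision={p:.2f} recall={r:.2f}")  # prints "precision=0.97 recall=0.30"
```

A checker tuned this way rarely wastes a writer’s time with false alarms, but a reviewer still has to find most of the errors on their own.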

Even with these limitations, it still has value, and Microsoft will likely improve this record with each release. The contextual error system is closely related to the company’s speech recognition work, which is active and expanding. Nevertheless, understand how the checker works and keep its limits in mind during your own reviews. For example, it’s possible to introduce real-word errors into documents by carelessly accepting suggestions. Human editors aren’t obsolete yet.

Words That Only a Logophile Could Love

The following is a brief list of related terms. Not all of them affect contextual spelling.

  • A malapropism uses an incorrect word in place of one with a similar sound, and the resulting phrase makes no sense: “Alcohol lets down your prohibitions” (for “inhibitions”).
  • An eggcorn is like a malapropism, except that the resulting phrase, although different from the original, still makes sense: “Old-timers’ disease” for “Alzheimer’s disease.”
  • A euphemism uses a less intense word in place of a stronger or offensive one: “Pre-owned” for “used.” In the late 20th century, euphemisms were used extensively as a form of doublespeak, often to mislead intentionally or even to lie outright.
  • Dysphemism (also malphemism, cacophemism) is the opposite of a euphemism: a strong word is used in place of a weaker one, such as “Egghead” for “smart person.” Taken further, the dysphemism treadmill introduces harsh or shocking words as existing words lose their impact. A cacophemism implies an intentionally offensive use of the word. Cacography is deliberate comic misspelling, usually for a verbal caricature.
  • A spoonerism is the switching of sounds among words: “Let me sew you to your sheet” for “Let me show you to your seat.”
  • A portmanteau word is one formed by blending two existing words. “Brunch” combines “breakfast” and “lunch.” “Seinfeld” even asked, “How come there’s no ‘lupper’ or ‘linner’?”
  • Catachresis is the misuse of a word, especially in a mixed metaphor. Alternatively, it’s the use of an existing word to denote something that has no name in the current language.
  • A figure of speech (or locution) is a word or phrase with a meaning not based on the literal meaning of its words. These include metaphors, similes, and personification.
  • A solecism is a grammatical error or a sentence turned into an absurdity, such as “I could care less” used to mean “I couldn’t care less.”

###