Secular soothsayers in recent years have deployed a variety of well-known tools–not just opinion polling but also various economic auguries and social indicators–to divine the public’s mood and perhaps even to illuminate what lies a few steps ahead. Often, these tools are useful. But what if we could monitor, on a day-to-day basis, the shifting prominence of specific words in the collective American consciousness? Wouldn’t such a Semantic Observation System (as we might call it) yield important clues to the public mind? It turns out that we have such clues at our fingertips, thanks to the electronic version of Merriam-Webster’s Collegiate Dictionary. The notion that words offer a key to social circumstances is a hoary one. Scholars have spent a great deal of time trying to infer, on the basis of vocabulary alone, the cultural and environmental milieu of the first “proto-Indo-European” people–that is, the people who spoke proto-Indo-European, the reconstructed language that is presumed to be ancestral to most of the languages spoken today in a vast swath from Europe to India. To give one example: The Indo-Europeans had no generalized word for sea, prompting scholars to speculate that they were originally a landlocked people.
Linguistic work usually requires years of painstaking effort. What the online Merriam-Webster’s Collegiate Dictionary makes possible is a form of instantaneous lexicographie vérité. Individuals consult the online dictionary to look up more than 6 million words a month. (Merriam-Webster Online is probably the most widely used English-language online dictionary in the world, and the service is available at no charge.) Merriam-Webster designed the dictionary with a kind of see-through-mirror function: Observers at the company can “watch” which words are being looked up in real time; they do not, of course, know who is looking those words up. Merriam-Webster also keeps cumulative track of the incoming traffic, word by word, and company executives can review a Top 100 list at any moment. The number of daily hits required for a word to reach the No. 1 spot varies according to day of week and time of year; it has been known to exceed 1,000.
Frequently, words are flung onto that list by current events. Paparazzi, princess, and cortege briefly surged as high as No. 17, No. 36, and No. 60, respectively, in September 1997, in the aftermath of the death of Princess Diana. Au pair reached No. 10 during the trial in the fall of 1997 of Louise Woodward, the young British nanny who was accused of shaking to death a baby left in her care. The recent 60 Minutes broadcast, which showed video footage of Dr. Jack Kevorkian putting a patient to death, prompted a short-lived upward blip of euthanasia to No. 102.
Not surprisingly, during the first nine months of 1998, words associated with President Bill Clinton’s travails regularly made their way onto the Top 100 list. Online-dictionary usage first registered the Clinton scandals in mid-January, when people began looking up the words suborn, perjury, and proffer in unusually large numbers. As winter gave way to spring, interest in these words was replaced by interest in impeach, impeachment, and character. By August, when Clinton submitted to a four-hour deposition with the independent counsel and then made his semiconfessional speech to the nation, the ascendant words included contrite, contrition, and mea culpa. In early September, on the weekend when Kenneth Starr made public the full text of his report, the No. 1 word on the Merriam-Webster list was salacious, with sordid at one point cresting to the No. 34 mark. Lexicographically, salacious represents the Clinton scandal’s point of furthest semantic advance. Impeachment attempted a comeback, reaching its all-time high (No. 3) Oct. 8 but, for weeks thereafter, no scandal-related terminology showed any sign of significant strength or staying power on the Merriam-Webster site–an obvious bellwether of the public’s sudden loss of interest. (At one point in November, integrity registered an anemic No. 98.) John M. Morse, president of Merriam-Webster and a professional lexicographer who has studied the diurnal fluctuations closely, observes, “From approximately early October onward, Monica-related words virtually ceased to appear on any day’s Top 100 list. What was evident from our site was that the public had tired of the whole issue.”
Words such as salacious and paparazzi, thrust briefly into prominence, must be seen against a background of what constitutes ordinary dictionary traffic, which for the most part is inertial and humdrum–the dictionary equivalent of junk DNA. Leaving out sexually oriented and offensive words (which people seem as prone to look up in online dictionaries as they do in print ones; fuck is by far the most-looked-up word in the online Merriam-Webster’s Collegiate Dictionary on any given day), what follows is the cumulative Top 40 list for the first six months of this year:
1. paradigm; 2. love; 3. thesaurus; 4. ubiquitous; 5. HTML; 6. effect; 7. gry; 8. affect; 9. home; 10. dog; 11. help; 12. time; 13. oxymoron; 14. Internet; 15. computer; 16. serendipity; 17. web; 18. etymology; 19. mail; 20. esoteric; 21. home page; 22. genealogy; 23. pedantic; 24. ensure; 25. synergy; 26. hubris; 27. hello; 28. marijuana; 29. metaphor; 30. resume; 31. dictionary; 32. heuristic; 33. caveat; 34. eclectic; 35. information; 36. epiphany; 37. weight; 38. acquiesce; 39. pragmatic; 40. egregious.
I picked through this list one morning with Morse’s help. He explained that Nos. 1, 4, 13, 16, 20, 23, 25, 26, 29, 32, 33, 34, 36, 38, 39, and 40 were most characteristic of total traffic–examples of frequently encountered but imperfectly understood words. No. 2 is probably a euphemistic search for sexual content. Nos. 3, 18, and 31 are lexicography-related words: “It’s a mystery why they are looked up so often.” Nos. 5, 9, 14, 15, 17, and 21 reflect an interest in computer terms, “either out of genuine curiosity or to test the currency of the information on the site.” Nos. 6, 8, and 24–“these all derive from classic usage problems.” No. 7–“an age-old riddle.” (The premise of the riddle is that there are “three common words” that end in -gry, hungry and angry being two of them. What is the third? In truth, there is no common third word. However, for the record there is an uncommon word gry, which is an obsolete unit of measure equal to one-eighth of a line, a line in turn being equal to one-fortieth of an inch.) Nos. 10 and 27–“probably test words selected to see how the site works.” No. 12–“there is a myth that this word is impossible to define.” (Wasn’t it Augustine who said that he knew perfectly well what time was until he had to say what it was?) No. 19–“may be in search of free e-mail.” No. 22–a popular activity on the Web. No. 28–“a little thrill-seeking?”
The Semantic Observation System remains in a relatively primitive state of development, but clearly the future holds more, not less, of this sort of thing. The statistical analysis of language was, after all, a primal urge even before the days when electronic tools made it easy, and some of the most compelling statistical data emerged from the pre-microchip Age of the Linguist-Drudge. (Examples would include the fact that 12 specific syllables account for a quarter of all the English speech you hear, and the fact that the 50 most frequently used words in English account for 45 percent of the total volume of all words used.) So we can expect refinements and advances. Meanwhile, Merriam-Webster’s tabulation of online consultations will continue. Take note: In recent days, one word has been coming on very strong–censure, which, at this writing, holds the No. 5 position among the Merriam-Webster Top 100.
To my mind, the most revealing news offered by the Merriam-Webster data has nothing to do with individual words or current events. It is the fact that ordinary people are so eager to take the trouble to look things up. America cherishes a myth of itself as a land of practical folk who evince a zest for self-improvement–even as they pursue some sort of quest for definition. Some 6 million times a month, the myth turns out to be the truth.