Kernerman Dictionary News Number 11 July
The Coming Boom in English
‘The Coming Boom in English Lexicography: Some Thoughts about the World
Wide Web (Part One)’ was published in Kernerman Dictionary News, Number
9, July 2001 (http://kdictionaries.com/newsletter/kdn9-1.html).
In July 2001, I optimistically
crawled out on a limb in these pages talking about an imminent boom in English lexicography.
Then in the July 2002 issue, my good friend and colleague Joseph Esposito
countered that traditional, legacy dictionary publishers, like OUP and
Merriam-Webster, “will muddle along, with growth becoming harder to come
by except at the expense of their smaller and declining rivals; eventually
they will stop publishing for broad markets altogether and the remaining
activity will be to focus on the scraps Microsoft leaves on the floor.”
As a graduate in
the history and philosophy of science, I have always been amazed at the
uncanny ability of Esposito, a literature graduate, to anticipate important
trends in technology. Possibly I get too preoccupied chasing down such
details as how much and when. For example, how much will online dictionary
searches replace print look-ups in the next three, five, and ten years?
Twenty percent or 50 percent within ten years? But, as Esposito wrote (in a
more general point about the future of traditional dictionary publishing),
“Who knows?…In the absence of growth, the old [dictionary] business will
be strained for capital, which will beget smaller investments, which will in
turn hasten the decline.”
Recently in the States.
Certainly during the past two years, American lexicography has shown many of
the strains Esposito wrote about. One could say these wounds were
self-inflicted through corporate ownership dramas – but no matter, the
cause of lexicography in America was not advanced, and possibly permanently
For example, Random House Webster dismantled its lexicography staff, as its relatively new Bertelsmann-owned parent grew tired of dictionary making. Webster’s New World, sold to Hungry Minds (formerly, IDG Books), was off-loaded to John Wiley. American Heritage, as part of Houghton Mifflin, went to Vivendi, which then sold it to a private American-led investment group (headed by Bain Capital and Thomas H. Lee). The new Microsoft Encarta print dictionaries (created by Bloomsbury in the UK and distributed by St. Martin’s in the US), after creating a splash of publicity, retreated in sales at retail. And the recent US$55 New Oxford American Dictionary—by all appearances an excellent product—assumed a dignified, but quiet, presence on bookshelves.
Merriam-Webster seems to have gained market share in America at the expense
of its smaller competitors, who were hobbled by corporate problems.
Currently shipping the new eleventh edition of its flagship Collegiate
Dictionary (bundled for the first time with an electronic version and a free
introductory online subscription – all for US$25.95), M-W recently boasted
a 17 percent increase in dictionary sales, while their Website bustled with
more than 150,000 daily visits. The private investor (Jacqui Safra) who now
owns Merriam, however, in a possible sign of impatience, recently brought in
an outside CEO (Gordon Macomber) to find ways to grow the business more
quickly. We will have to stay tuned for what develops at America’s leading
In the aggregate,
though, the American print dictionary market seems to stay at about the same
size, year after year, even as online look-ups are apparently booming.
Is the resilience of print dictionaries in America a good sign? Is overall
dictionary use (counting both print and digital look-ups) increasing?
Possibly, yes to both questions. If I unscientifically use myself as an
example, I now routinely consult two online dictionaries in addition to
printed standbys – the OED online (free to members of the
Quality Paperback Book Club)
and the faithful friend I once published, the Random House Webster’s
Unabridged Dictionary on CD-ROM, fully installed on my Wintel machine. If a
word still perplexes me, I search Google
or visit well-developed reference sites like Webopedia
(for technical terms). Because of the availability of multiple sources in
both print and online, I am now much more dictionary literate – as are, I
extrapolate, other serious word users.
development in the States is the long-overdue progress toward creating the
first comprehensive American National Corpus (ANC) – the first 10 million
words of which are now being made available. My friend and colleague
Wendalyn Nichols says more about this elsewhere in this newsletter.
The Bigger Picture.
Looking beyond America’s shores and around the world, lexicography seems
very much alive and well, if not booming to my optimistic drumbeat. The
continued use and exploration of corpora and the vigorous linguistic
research into world Englishes are two important signs of continued vigor.
While at Random House Webster, I helped initiate—with the assistance of
Nichols and others—an all-too-brief foray into creating entirely
American-bred ESL/EFL dictionaries, partnering with publishers like FLTRP
(Foreign Language Teaching and Research Press) in Beijing, 
under the innovative leadership of Li Pengyi.
(Houghton Mifflin created the American Heritage English as a Second
Language Dictionary in 1997 and revised it in 2002, primarily, I believe,
to reach American schools and colleges – without strong marketing
internationally.) Except for these efforts, the major American dictionary
companies still appear blissfully lackadaisical about the potential of the
global ELT market – which is probably the single largest area for growth
in the English-dictionary business. 
Maybe it would be
more accurate to say that the global reach and penetration of
English—especially as reflected on the WWW—will keep linguists and
lexicographers busier for some time to come analyzing and mediating
exchanges in which English is the lingua franca, and helping build the next
iteration of the WWW, called the Semantic Web.
In the absence of
an American corpus, for several years I have relied on the Web as a
surrogate. For example, in a pop-reference book on Yiddish I co-created,
playfully entitled the Meshuggenary,
the Web was the best source of finding current uses in English of
Yiddish-origin words and phrases (see the appendix).
I am grateful to
Michael Rundell, editor-in-chief of the Macmillan English Dictionary for
Advanced Learners, for alerting me to the forthcoming special issue of Computational
Linguistics, edited by Adam Kilgarriff and Greg Grefenstette, which will be devoted to
the question of the Web as a corpus. Rundell notes that there are “many
computational linguists who are beginning to see the Web as the only corpus
worth looking at (well, maybe I exaggerate somewhat), and as the solution to
the long-running problem of ‘data-sparseness’.”
Of course, as
Kilgarriff and Grefenstette point out, search engines like Google’s still
have significant limitations as lexicographic tools—for example, in giving
too much weight to words in the titles and headings of Web pages, and in
missing the vast volume of material now hidden from the eyes of search
engines behind “fee walls,” in archives that charge to retrieve
documents—but nonetheless it is clear that the Web presents lexicographers
with a whole new set of opportunities to research current language use and
should be considered a valid linguistic corpus.
pointed out the heating up of discussions in the UK and Europe about English
as a lingua franca – a development that looms so large throughout the
world that it can still actually seem invisible to many native
(English-centric) speakers. I noted in my first installment that an
estimated 80 percent of Web pages are written in English (though that
percentage may actually decrease somewhat over time; for example, Internet
Explorer now supports the use of dozens of special scripts of the world’s languages).
Somewhat hidden to
those outside of Europe is the growing role that English plays (and a
controversial one at that) in the affairs of the European Union. In 1970,
about 60 percent of all documents coming out of Brussels were written in
French, few if any in English. By 1997, English (45 percent) had surpassed
French (40 percent) as the most frequently used official language.
(Is it less CNN, MTV, and McDonald’s that so haunts the French than ELF,
English as a Lingua Franca? As language buffs, we can be sympathetic about
the potential decline of any vibrant language.) The Semantic Web should
further accelerate the use, promulgation, and importance of ELF. And by
providing even richer data for research, the Semantic Web should also
accelerate the business of global linguistics and lexicography.
The key step in
building the Semantic Web will be the addition of metadata URIs—Universal
Resource Identifiers—that “define or specify an entity, not necessarily
by naming its location on the Web.”
Put simplistically, the Semantic Web will establish protocols to identify
types of content on each Web page – in ways to make the content elements
computer readable and useable. As Tim Berners-Lee, who laid the groundwork
for the WWW, writes:
The Semantic Web, in naming every concept simply by a URI, lets anyone express
new concepts that they invent with minimal effort. Its unifying logical
language will enable these concepts to be progressively linked into a
universal Web. This structure will open up the knowledge and workings of
humankind to meaningful analysis by software agents, providing a new class
of tools by which we can live, work, and learn together.
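As a rough illustration of what "naming every concept by a URI" could look like in practice, here is a minimal sketch in Python. It is not drawn from the Berners-Lee article and uses no real Semantic Web library: statements are simply stored as subject–predicate–object triples, and every URI, namespace, and function name below (the example.org addresses, `Triple`, `subjects_with`) is a hypothetical stand-in invented for the example.

```python
# A toy "Semantic Web" store: each statement is a (subject, predicate, object)
# triple, and every concept is named by a URI rather than by a bare word.
# All URIs below are hypothetical; example.org is a reserved placeholder domain.
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


EX = "http://example.org/lexicon/"  # assumed namespace, not a real vocabulary

triples = [
    Triple(EX + "entry/chutzpah", EX + "prop/language", EX + "lang/English"),
    Triple(EX + "entry/chutzpah", EX + "prop/origin", EX + "lang/Yiddish"),
    Triple(EX + "entry/maven", EX + "prop/language", EX + "lang/English"),
    Triple(EX + "entry/maven", EX + "prop/origin", EX + "lang/Yiddish"),
    Triple(EX + "entry/boom", EX + "prop/language", EX + "lang/English"),
]


def subjects_with(store, predicate, obj):
    """Return, sorted, the subjects of all triples matching predicate and object."""
    return sorted(t.subject for t in store
                  if t.predicate == predicate and t.obj == obj)


# A software agent can now ask a structured question, instead of guessing
# from page text: which entries are marked as being of Yiddish origin?
yiddish_origin = subjects_with(triples, EX + "prop/origin", EX + "lang/Yiddish")
print(yiddish_origin)
```

The point of the sketch is only that once words, languages, and relations each have their own identifier, a program can answer such a question by matching identifiers rather than by parsing prose.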
I am obviously
skipping lightly over a number of important new works-in-progress that will
profoundly affect linguistic and lexicographic research in the coming years.
(Searching on Google for the phrase “Semantic Web” plus “lexicographer
or lexicography,” restricted to English pages cached in the past year,
yielded about 850 results, with a number of fascinating leads to recent
papers and conferences. Adding “linguist or linguistics” to the search
terms increases the results to 4000.)
Work is just gearing up, and while I bemoan not having more hard data and
numbers, my instincts tell me that lexicography and linguistics are on the
verge of a revolution as a result – though, sadly for me, much of this new
linguistic and lexicographic innovation may take place outside of America,
even as, ironically, American English is the driving force behind the
increasing global use of English.
There’s definitely a big growth in corpus development worldwide
(especially but by no means exclusively for use in dictionary making) –
sometimes, it seems, almost anywhere but the US. The big Japanese publishers
like Kenkyusha and Shogakukan are all partners in the ANC consortium, but
also busy with corpus development of their own. There is, for example, a
100-million-word Corpus of Professional English under development in Japan.
All this corpus work is immensely exciting, and it is going to be
interesting to see how it will influence the look and feel of future
native-speaker dictionaries of English as well.
The implications may be more far-reaching and actually concern other
languages too, not just English. For example, expect a dramatically growing
demand for bilingual dictionaries of “unorthodox language-pairs”,
including so-called non-major languages.
One of the key developments that I hope to see is the sharp increase in the
quality of these dictionaries covering odd bilingual couples.
I see a rhetorical error in the paper – rhetorical, not substantive. You
are using my paper as a pushing-off point, which is fine: I have been a
straw man before. But the contrast is imprecise:
you are referring to lexicography, I to dictionary-makers.
Lexicography is bound to grow. The current crop of dictionary companies
can't grow. Apples and oranges.
There actually may be two lacunae in what I write: (1) The boom may be more
in computational linguistics than in lexicography as such, especially when
it comes to the Semantic Web. Separately, I learned, for example, that many
students who major in linguistics go on to careers in software. (2) I am
implying if not stating that because of ELF, you could grow the
dictionary-making business; but I fudge about addressing the key question
– whether you could make it a “growth” business. It’s somewhat like
talking about the U.S. economy. If it grows at only 1-2% a year, it would
still be considered an investment crisis. You are saying you doubt that
dictionary making as a business can grow at all, and expect it to decline.
I think you can grow any business that adds value beyond the default value
of a bundled Microsoft dictionary, and learners’ dictionaries add value.
Levine: But I do not fully address the key point (because it is so hard to pin down), which is whether you could grow a dictionary business enough to attract serious investment money. The sad truth of the matter is that it may be difficult to do so. A little growth is not enough, and there hangs a tale (as you would say) – a tale of most of the corporate problems we have recently seen in American dictionary companies, including Merriam-Webster. Even if you could grow a dictionary business, under ideal circumstances, the growth may not be interesting or attractive enough to investors, American-style ones at least.
Other publishing entrepreneurs with a more worldly view might be willing to accept a little growth. What puzzled me most, for example, about the dismantling of Random House Webster was that the dictionaries could be of immense benefit to the global branding of the Random House name, which fits in with the Bertelsmann global strategy, although not necessarily with the one emanating from the Broadway headquarters. For example, I am told that the Random House name is well known in both Japan and China, largely because of the local translations of the Random House dictionaries. But these benefits were apparently not strong enough to overcome the issue of growth for Random House/Bertelsmann.
I do lament this development, not only out of self-interest but also out of selfless interest when it comes to the many good American lexicographers whose careers are being turned upside down by the stranglehold that dummies seem to have on dictionary publishing in the States today. But all this is a natural process, in which the mighty fall by the wayside – out of hubris, complacency, or too much past success – and room is created for new scrappy small guys who are willing to take risks and innovate.
Links, notes, references
 I roughly estimate that the combined size of the American monolingual dictionary business – measured in total annual sales dollars – and including dictionaries sold directly to schools – as well as electronic versions – falls somewhere in the range of US$100 million annually. This number does not appear to have grown much during the past decade. One could point out that sales of electronic dictionaries, like the Microsoft Encarta World English Dictionary, do not seem to have greatly added to the total sales dollars, because of bundling with other products and/or the purchase of an electronic product in substitution for an equivalent one in print. This is all, of course, educated guessing.
 The Random House Webster’s Easy English Dictionary, New York: 2001, paperback US$12.95, was certainly the first entirely new American English dictionary written specifically for middle school learners, a rapidly growing and underserved market around the world. In China, for example, soon every student, starting in the earliest grades, must master three core subjects: Mandarin, mathematics, and English.
 One should also mention the Newbury House Dictionary of American English, edited by Philip M. Rideout, and published by Heinle in a revised edition in 1999. There is also a basic edition. Although Heinle is not a major dictionary publisher, these two editions are, according to Nichols, giving Longman a run for its money in the intermediate ESL market. McGraw-Hill, like Houghton Mifflin, has a small ESL group as well that tries to reach the domestic U.S. market, with course materials.
 Meshuggenary: Celebrating the World of Yiddish, by Payson R. Stevens, Charles M. Levine, and Sol Steinmetz. Simon & Schuster, New York: 2002.
 Macmillan, London: 2002.
 Tim Berners-Lee, James Hendler and Ora Lassila, “The Semantic Web: A new form of Web content that is meaningful to computers will unleash a revolution of possibilities,” Scientific American, April 2002.
 Op. cit.
 Also see the recent paper delivered by David Jost of Houghton-Mifflin and Win Carus of the Dictaphone Corporation, at the DSNA (Dictionary Society of North America) conference held this past May 2003, entitled “Is the Semantic Web Possible?” (www.duke.edu/web/linguistics/dsnaabstracts.htm - jost).
– Google’s Top Dozen