Kernerman Dictionary News • Number 11 • July 2003

The American National Corpus

The ANC Consortium includes publishers, software companies, and academic institutions. Consortium members have exclusive access throughout the development period and for five years after the full corpus becomes available. Access to the corpus for development of commercial products (dictionaries and other reference publications, language-aware software, etc.) is restricted to members until the year 2007. By contrast, the ANC is freely available for academic research and education.

The data acquired so far includes about 2 million words of spoken data (the LDC Switchboard corpus and a portion of the CallHome corpus); 1.5 million words of previously unreleased newspaper data from the New York Times; a few hundred thousand words of “ephemera” (pamphlets, newsletters, etc.); several novels published by Oxford University Press USA; Berlitz Travel Guides from Langenscheidt; Verbatim magazine; government documents drawn from the web; about 5 million words from Slate magazine (Microsoft); and about 900,000 words of research papers from the Association for Computational Linguistics.

Commercial Members
Pearson Education
Langenscheidt Publishing Group
HarperCollins Publishers
Cambridge University Press
Microsoft Corporation
Shogakukan Inc.
ACL Press Inc.
Taishukan Publishing Company
Oxford University Press
Kenkyusha Ltd.
IBM Corporation
Obunsha Publishing Co. Ltd.
Bloomsbury Publishing Plc
Benesse Corporation
Sanseido Co. Ltd.
Sony Electronics Inc.
Macmillan Publishers

Academic Members
Vassar College
Northern Arizona University
New York University
Linguistic Data Consortium, University of Pennsylvania
International Computer Science Institute, University of California, Berkeley
University of Colorado at Boulder


K Dictionaries Ltd
10 Nahum Street, Tel Aviv 63503 Israel
tel: 972-3-5468102 • fax: 972-3-5468103