Kernerman Dictionary News Number 11 July
The American National Corpus
The ANC Consortium members
include publishers, software companies, and academic institutions. Consortium
members have exclusive access throughout the development period and for five
years after the full corpus becomes available. Access to the corpus for
development of commercial products (dictionaries and other reference
publications, language-aware software, etc.) is restricted to members until
the year 2007. The ANC is freely available for the purposes of academic
research and education.
data includes, so far, about 2 million words of spoken data (the LDC
Switchboard corpus and a portion of the CallHome corpus); 1.5 million words
of previously unreleased newspaper data from the New York Times; a
few hundred thousand words of “ephemera” (pamphlets, newsletters, etc.);
several novels published by Oxford University Press USA; Berlitz Travel
Guides from Langenscheidt; Verbatim magazine; government documents drawn from
the web; about 5 million words from Slate magazine (Microsoft); and about
900,000 words of research papers from the Association for Computational
Langenscheidt Publishing Group
Cambridge University Press
ACL Press Inc.
Taishukan Publishing Company
Oxford University Press
Obunsha Publishing Co. Ltd.
Bloomsbury Publishing Plc
Sanseido Co. Ltd.
Sony Electronics Inc.
Northern Arizona University
New York University
Linguistic Data Consortium, University of Pennsylvania
International Computer Science Institute, University of California, Berkeley
University of Colorado at Boulder
K Dictionaries Ltd
10 Nahum Street, Tel Aviv 63503 Israel
tel: 972-3-5468102 fax: 972-3-5468103