Kernerman Dictionary News • Number 12 • July 2004
Kernerman Dictionary Research Grants
Kiswahili is an agglutinating language, meaning that morphemes are juxtaposed to form linguistic words. In all current dictionaries, ‘orthographic words’ are decomposed into their formatives, with only the latter being lemmatised. As a result, not all native speakers of Kiswahili can look up ‘words’ in their own language – as this implies being able to cut off prefixes and suffixes – and even trained scholars often need more than one look-up round before they hit on what they are looking for (since sound changes between formatives are not always predictable).
This research project attempts to deal with all these problems simultaneously. The aim is to create the first corpus-based Kiswahili dictionary that is also intuitive in nature, and to research the feasibility of this approach in real time. Instead of lemmatising only stems, as traditional dictionaries do, the idea is to lemmatise full orthographic words (in addition to stems), and to provide full translations for these strings. In order to sensibly limit the number of items one can physically treat, the items will be selected from a frequency list derived from a large corpus. Concordance lines will be called up for each frequent orthographic word, and the various translations will be recorded in order of frequency. A user will thus be able to look up words directly, as they are spoken or written, and the translations will be arranged from most likely to least likely. An English search index will additionally enable searches in the reverse direction. Since such an approach obviously requires much more ‘space’ than a traditional stem-based dictionary, the dictionary will be developed and made available in an electronic environment right from the start, primarily on the Internet, where it is also possible to keep a log of all searches. Analysing these log files will make it possible to investigate whether this hybrid approach is feasible, and to amend it if need be.
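As a rough illustration of the workflow described above, the following Python sketch derives a frequency list of full orthographic words from a corpus, calls up concordance lines for one of them, and orders the recorded translations from most to least frequent. The toy corpus, the function names and the sample translations are all assumptions made for this illustration; they are not taken from the project itself, which would work with a large corpus and with translations recorded by a lexicographer.

```python
import re
from collections import Counter

# Toy corpus (assumption): a real project would use a large Kiswahili corpus.
CORPUS = [
    "ninakupenda sana",
    "mtoto anasoma kitabu",
    "ninakupenda wewe",
    "mwalimu anasoma gazeti",
    "ninakupenda sana rafiki",
]


def tokenize(line):
    """Split a line into lower-cased orthographic words."""
    return re.findall(r"[a-z']+", line.lower())


def frequency_list(corpus):
    """Return (word, count) pairs, most frequent first."""
    counts = Counter()
    for line in corpus:
        counts.update(tokenize(line))
    return counts.most_common()


def concordance(corpus, word, width=2):
    """Return (left context, word, right context) for each occurrence."""
    hits = []
    for line in corpus:
        tokens = tokenize(line)
        for i, token in enumerate(tokens):
            if token == word:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((left, token, right))
    return hits


def rank_translations(recorded):
    """Order the translations recorded from concordance lines by frequency."""
    return [translation for translation, _ in Counter(recorded).most_common()]


if __name__ == "__main__":
    print(frequency_list(CORPUS)[:3])
    print(concordance(CORPUS, "anasoma"))
    # Hypothetical translations a lexicographer might note down for 'anasoma'.
    print(rank_translations(["is reading", "is studying", "is reading"]))
```

Because the full word is the lookup key, a user could search for a form such as 'anasoma' exactly as written, without first stripping its prefixes to reach the stem.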
With the intuitive lemmatisation approach, native speakers and learners at the elementary and intermediate levels will for the first time be able to look up words effectively and find the meanings of ‘real’ words, which should help to develop a dictionary culture. Furthermore, the log files will be utilised to their full potential by tracking each individual’s dictionary-use behaviour, including vocabulary retention. For the first time, truly unobtrusive data will be collected and true look-up behaviour in an electronic environment will be recorded. Finally, this project will also ensure that Kiswahili, an increasingly popular language on the Internet, is kept alive in a modern online reference work based on sound
K Dictionaries Ltd