Kernerman Dictionary News • Number 12 • July 2004
Kernerman Dictionary Research Grants
Kiswahili is an agglutinating language, meaning
that morphemes are juxtaposed to form linguistic words. In all current
dictionaries, ‘orthographic words’ are decomposed into their formatives, with
only the latter being lemmatised. As a result, not all native speakers of
Kiswahili can look up ‘words’ in their own language – as this implies being
able to cut off prefixes and suffixes – and even trained scholars often need
more than one look-up round before they hit on what they are looking for
(since sound changes between formatives are not always predictable).

This research project
attempts to deal with all these problems simultaneously. The aim is to create
the first corpus-based Kiswahili dictionary that is also intuitive in nature,
and to research the feasibility of this approach in real time. Instead of
lemmatising stems as in traditional dictionaries, the idea is to lemmatise
full orthographic words (in addition to stems), and to provide full
translations for these strings. In order to sensibly limit the number of
items one can physically treat, the items will be selected from a frequency
list derived from a large corpus. Concordance lines will be called up for each
frequent orthographic word, and the various translations will be recorded in
order of frequency. A user will thus be able to look up words directly, as
they are spoken or written, and the translations will be arranged from most
likely to least likely. An English search index will additionally enable
searches in the reverse direction. Since, obviously, such an approach will
require much more ‘space’ than a traditional stem-based dictionary, the
dictionary will be developed and made available in an electronic environment
right from the start, primarily on the Internet, where it is also possible to
keep a log of all searches. Analysing these log files will make it possible
to research further whether or not this hybrid approach is feasible, and to
amend the approach if need be.

Given
the intuitive lemmatisation approach, native speakers and learners at the
elementary and intermediate levels will for the first time be able to
effectively look up words, and find meanings of ‘real’ words,
which should help to develop a dictionary culture. Furthermore, the log files
will be utilised to their full potential by tracking each individual's dictionary-use
behaviour, including vocabulary retention. For the first time, truly
unobtrusive data will be collected and true look-up behaviour in an
electronic environment will be recorded. Finally, this project will also
ensure that Kiswahili, an increasingly popular language on the Internet, is
kept alive in a modern online reference work based on sound
lexicographical principles.
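
The selection and ranking steps described above can be illustrated with a minimal Python sketch. It is not the project's actual software: the function names, the toy corpus, the hand-annotated concordance pairs and the one-query-per-entry log format are all assumptions made for illustration. The sketch derives a frequency list of full orthographic words, orders the translations of each word from most to least frequent, and counts logged queries that match no headword.

```python
from collections import Counter


def frequency_list(tokens, top_n):
    """Return the top_n most frequent orthographic words with their counts."""
    counts = Counter(token.lower() for token in tokens)
    return counts.most_common(top_n)


def rank_translations(annotated_lines):
    """Order the translations of each word from most to least frequent.

    annotated_lines is an iterable of (word, translation) pairs, one pair
    per concordance line (a hypothetical annotation format).
    """
    per_word = {}
    for word, translation in annotated_lines:
        per_word.setdefault(word, Counter())[translation] += 1
    return {
        word: [translation for translation, _ in counts.most_common()]
        for word, counts in per_word.items()
    }


def unmatched_queries(logged_queries, headwords):
    """Count logged searches that match no headword (assumed log format)."""
    return Counter(q.lower() for q in logged_queries if q.lower() not in headwords)


# Toy data, invented for illustration only.
tokens = "anasoma kitabu anasoma kitabu anasoma ninakupenda".split()
print(frequency_list(tokens, 2))
# [('anasoma', 3), ('kitabu', 2)]

concordance = [
    ("anasoma", "he/she is reading"),
    ("anasoma", "he/she is studying"),
    ("anasoma", "he/she is reading"),
]
print(rank_translations(concordance))
# {'anasoma': ['he/she is reading', 'he/she is studying']}

print(unmatched_queries(["anasoma", "wanasoma", "wanasoma"], {"anasoma"}))
# Counter({'wanasoma': 2})
```

In a sketch of this kind, frequent unmatched queries would be natural candidates for new full-word entries, which is one way the search logs could feed back into the lemmatisation approach.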
K Dictionaries Ltd