ZulMorph is a finite-state morphological analyser for Zulu, developed using the Xerox finite-state tools lexc and xfst. It also compiles with Foma.
Zulu words in their surface form are analysed to their base form. Any meaningful word can be input, and the output will be a complete morphological analysis of that word.
Words marked with “+?” could not be analysed. In most cases this is because the stem or root of the word is not yet included in the analyser's embedded lexicon.
Most words have multiple analyses, and selecting the correct one depends on context. Such disambiguation is a subsequent processing step.
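As a rough illustration of how such output might be post-processed, the sketch below groups analyses by input word and collects the words that came back marked “+?”. It assumes the analyser output is available as tab-separated lines (`word<TAB>analysis`), as lookup utilities for finite-state tools commonly print it; the exact output layout, the function name `group_analyses` and the unknown token `xyzzy` are assumptions for illustration, not part of ZulMorph.

```python
from collections import defaultdict


def group_analyses(lines):
    """Group tab-separated 'word<TAB>analysis' lines by surface word.

    Returns (analyses, unanalysed): a dict mapping each word to its list
    of analyses, and a list of words whose result ended in '+?'.
    The tab-separated layout is an assumption about the lookup output.
    """
    analyses = defaultdict(list)
    unanalysed = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines often separate the results per word
        fields = line.split("\t")
        word, analysis = fields[0], fields[-1]
        if analysis.endswith("+?"):
            # The analyser could not analyse this word.
            if word not in unanalysed:
                unanalysed.append(word)
        else:
            analyses[word].append(analysis)
    return dict(analyses), unanalysed


# Analyses taken from the tag table on this page; 'xyzzy' is a
# hypothetical token standing in for a word missing from the lexicon.
sample = [
    "abantu\ta[NPrePre][2]ba[BPre][2]ntu[NStem][1-2]",
    "",
    "zonke\tzo[QC][10]nke[QuantStem]",
    "",
    "xyzzy\t+?",
]
found, missing = group_analyses(sample)
print(found)
print(missing)
```

A word with several analyses simply ends up with several entries in its list, which is the input a later disambiguation step would work on.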
Stems marked with [Hlon] belong to ‘isiHlonipho’, a language of respect, which is a variety of Zulu used by married women to show respect towards their male in-laws and chiefs by avoiding their names and (parts of) words related to, or even just phonetically resembling, these names.
A demo of ZulMorph is available.
Zulu belongs to the Bantu language family, a ‘family’ of more than 400 languages spoken in Africa, from the Cape in the south to just north of the Equator. The Zulu language is a member of the Nguni group of languages and is spoken in South Africa in the province of KwaZulu-Natal, as well as in the northern Free State, the south-eastern part of Mpumalanga and Gauteng. Zulu is widely spoken in South Africa, with approximately 11.5 million first-language speakers, i.e. 22.7% of the population (http://mobi.statssa.gov.za/census2011/First Language.html). The ISO 639-3 code for Zulu is [zul] (http://www.sil.org/iso639-3).
Zulu is characterised by a rich agglutinating morphological structure, which is based on two principles, namely the nominal classification system, and the concordial agreement system. According to the nominal classification system, nouns are categorised by prefixal morphemes, which for analysis purposes have been sorted into classes and given numbers by scholars who have worked within the field of the Bantu language family. Table 1 shows examples of Meinhof’s (1932:48) numbering system of some of the noun class prefixes.
Table 1: Meinhof’s (1932:48) numbering system of noun class prefixes:
Noun Class Prefix | Class | Word form | English |
---|---|---|---|
u+mu- | 1 | umuntu | “person” |
a+ba- | 2 | abantu | “persons” |
u- | 1a | unozinti | “goalkeeper” |
o- | 2a | onozinti | “goalkeepers” |
u+mu- | 3 | umuzi | “homestead” |
i+mi- | 4 | imizi | “homesteads” |
i+(li)- | 5 | idolo | “knee” |
a+ma- | 6 | amadolo | “knees” |
i+si- | 7 | isinkwa | “bread” |
i+zi- | 8 | izinkwa | “breads” |
i+n- | 9 | indlovu | “elephant” |
i+zin- | 10 | izindlovu | “elephants” |
u+(lu)- | 11 | ukhezo | “wooden spoon” |
u+bu- | 14 | ubusuku | “night” |
u+ku- | 15 | ukudla | “food” |
These noun class prefixes lead to concordial agreement that links the noun to other words in the sentence such as verbs, adjectives, pronouns and so forth. For example:
In the above example, arc 1 shows the agreement between the subject of the sentence and the verb, arc 2 shows the agreement between the noun and its modifier, and arc 3 shows the agreement between the verb and the direct object of the sentence.
Noun prefixes usually indicate number: the odd class numbers designate singular and the corresponding even class numbers designate plural. However, this is not always the case, since some nouns in so-called plural classes do not have a singular form; plurals of class 11 nouns are found in class 10, while a class such as 14 is usually not associated with number at all.
The noun prefix typically consists of two parts, namely a preprefix (the initial vowel) and a basic prefix, but in some classes, such as class 1a and its plural class 2a, a basic prefix does not feature. In other instances, such as classes 11 and 14, the basic prefixes are often discarded, with the result that only the preprefix appears in the surface form.
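The prefix structure and the usual singular/plural pairing can be sketched as a small lookup, using only the data in Table 1 and the pairings described above. The names `split_prefix` and `default_plural_class` are hypothetical, and reading the parentheses in Table 1 as "basic prefix often dropped" is an assumption about the notation. The pairing is only a default: the tag table lists stems such as khosi with the class pair [9-6].

```python
# Noun class prefixes as listed in Table 1 (preprefix + basic prefix).
NOUN_CLASS_PREFIXES = {
    "1": "u+mu-", "2": "a+ba-", "1a": "u-", "2a": "o-",
    "3": "u+mu-", "4": "i+mi-", "5": "i+(li)-", "6": "a+ma-",
    "7": "i+si-", "8": "i+zi-", "9": "i+n-", "10": "i+zin-",
    "11": "u+(lu)-", "14": "u+bu-", "15": "u+ku-",
}

# Default plural class for each singular class: odd -> next even class,
# 1a -> 2a, and class 11 takes its plural in class 10.
SINGULAR_TO_PLURAL = {
    "1": "2", "1a": "2a", "3": "4", "5": "6",
    "7": "8", "9": "10", "11": "10",
}


def split_prefix(prefix):
    """Split a Table 1 prefix into (preprefix, basic_prefix, optional).

    basic_prefix is None when the class has no basic prefix (1a, 2a);
    optional is True when the basic prefix is written in parentheses.
    """
    body = prefix.rstrip("-")
    preprefix, _, basic = body.partition("+")
    optional = basic.startswith("(") and basic.endswith(")")
    basic = basic.strip("()") or None
    return preprefix, basic, optional


def default_plural_class(noun_class):
    """Return the usual plural class, or None if no pairing is listed
    (plural classes themselves, or a class such as 14)."""
    return SINGULAR_TO_PLURAL.get(noun_class)


print(split_prefix(NOUN_CLASS_PREFIXES["5"]))
print(default_plural_class("11"))
```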
Meinhof, C. 1932. Introduction to the phonology of the Bantu languages. Berlin: Dietrich Reimer/Ernst Vohsen.
Tag | Description | Example | Analysis |
---|---|---|---|
Tags dependent on class, person and/or number | | | |
1ps | first person singular | ngikusho | ngi[SC][1ps]ku[OC][15]sh[VRoot]o[VT] |
1pp | first person plural | sikhula | si[SC][1pp]khul[VRoot]a[VT] |
2ps | second person singular | ungena | u[SC][2ps]ngen[VRoot]a[VT] |
2pp | second person plural | ningashada | ni[SC][2pp]nga[Pot]shad[VRoot]a[VT] |
AC | Adjective concord | obuningi | obu[AC][14]ningi[AdjStem] |
AdjPre | Adjective prefix | sincane | si[AdjPre][7]ncane[AdjStem] |
BPre | Basic prefix | abantu | a[NPrePre][2]ba[BPre][2]ntu[NStem][1-2] |
Dem | Demonstrative pronoun | lena | le[Dem][4][Pos1] |
DemCop | Demonstrative copulative | naso | naso[DemCop][7][Pos2] |
EC | Enumerative concord | munye | mu[EC][1]nye[EnumStem] |
SCNeg | Negative subject concord | akakwazi | a[NegPre]ka[SCNeg][1a]ku[OC][15]az[VRoot]i[VT] |
NPrePre | Noun preprefix | abantu | a[NPrePre][2]ba[BPre][2]ntu[NStem][1-2] |
OC | Object concord | basifuna | ba[SC][2]si[OC][1pp]fun[VRoot]a[VT] |
PC | Possessive concord | kwakho | kwa[PC][15]kho[PronStem][2ps] |
PronStem | Pronoun stem | bethu | ba[PC][2]ithu[PronStem][1pp] |
SCPT | Past tense subject concord | sanikwa | sa[SCPT][1pp]nik[VRoot]w[PassExt]a[VT] |
QC | Quantitative concord | zonke | zo[QC][10]nke[QuantStem] |
RC | Relative concord | okuthi | oku[RC][15]th[VRoot]i[VT] |
RCPT | Relative concord past tense | ezabonakala | eza[RCPT][8]bon[VRoot]akal[NeutExt]a[VT] |
SC | Subject concord | ngikusho | ngi[SC][1ps]ku[OC][15]sh[VRoot]o[VT] |
SCHort | Hortative subject concord | masihambe | ma[HortPre]si[SCHort][1pp]hamb[VRoot]e[VTSubj] |
SCSit | Situative subject concord | bebuka | be[SCSit][2]buk[VRoot]a[VT] |
SCSubj | Subjunctive subject concord | abuke | a[SCSubj][1]buk[VRoot]e[VTSubj] |
Tags independent of class, person and/or number | | | |
AdjStem | Adjective stem | obuningi | obu[AC][14]ningi[AdjStem] |
Adv | Adverb | kukhona | ku[SC][15]khona[Adv] |
AdvPre | Adverb prefix | ngemibuzo | nga[AdvPre]i[NPrePre][4]mi[BPre][4]buzo[NStem][3-4] |
ApplExt | Applied extension | ukubhekela | u[NPrePre][15]ku[BPre][15]bhek[VRoot]el[ApplExt]a[VT] |
AugSuf | Augmentative suffix | amakhosikazi | a[NPrePre][6]ma[BPre][6]khosi[NStem][9-6]kazi[AugSuf] |
AuxVStem | Auxiliary verb stem | babeshadile | ba[SC][2]be[AuxVStem]be[SCSit][2]shad[VRoot]ile[VTPerf] |
CausExt | Causative extension | ukubhalisa | u[NPrePre][15]ku[BPre][15]bhal[VRoot]is[CausExt]a[VT] |
Conj | Conjunction | kepha | kepha[Conj] |
CopPre | Copulative prefix | yibandla | yi[CopPre]i[NPrePre][5]li[BPre][5]bandla[NStem][5-6] |
DimSuf | Diminutive suffix | indodana | i[NPrePre][9]n[BPre][9]doda[NStem][9-6]ana[DimSuf] |
EnumStem | Enumerative stem | munye | mu[EC][1]nye[EnumStem] |
ExclNeg | Exclusive negative | alikaqedi | a[NegPre]li[SC][5]ka[ExclNeg]qed[VRoot]i[VTNeg] |
Fut | Future tense | izokhipha | i[SC][9]zo[Fut]khiph[VRoot]a[VT] |
FutNeg | Future tense negative | akazubuya | a[NegPre]ka[SCNeg][1]zu[FutNeg]buy[VRoot]a[VT] |
HortPre | Hortative prefix | masihambe | ma[HortPre]si[SCHort][1pp]hamb[VRoot]e[VTSubj] |
Ideoph | Ideophone | ngqo | ngqo[Ideoph] |
ImpPre | Imperative prefix | yenza | yi[ImpPre]enz[VRoot]a[VT] |
ImpSuf | Imperative suffix | zamanani | zam[VRoot]an[RecipExt]a[VT]ni[ImpSuf] |
IntensExt | Intensive extension | bahambisisa | ba[SC][2]hamb[VRoot]isis[IntensExt]a[VT] |
ComplExt | Completive extension | bajikelela | ba[SC][2]jik[VRoot]elel[ComplExt]a[VT] |
Interrog | Interrogative | nini | nini[Interrog] |
InterrogSuf | Interrogative suffix | kwafikwaphi | kwa[SCPT][15]fik[VRoot]w[PassExt]a[VT]phi[InterrogSuf] |
LocPre | Locative prefix | kubo | ku[LocPre]bo[PronStem][2] |
LocSuf | Locative suffix | esifundeni | e[LocPre]i[NPrePre][7]si[BPre][7]funda[NStem][7-8]ini[LocSuf] |
LongPres | Long present tense | iyakhela | i[SC][4]ya[LongPres]khel[VRoot]a[VT] |
NegPre | Negative prefix | alikaqedi | a[NegPre]li[SC][5]ka[ExclNeg]qed[VRoot]i[VTNeg] |
NeutExt | Neuter extension | ukubukeka | u[NPrePre][15]ku[BPre][15]buk[VRoot]ek[NeutExt]a[VT] |
NStem | Noun stem | umuntu | u[NPrePre][1]mu[BPre][1]ntu[NStem][1-2] |
PassExt | Passive extension | ukubhalwa | u[NPrePre][15]ku[BPre][15]bhal[VRoot]w[PassExt]a[VT] |
PossKA | Possessive prefix “ka” | kadokotela | ka[PossKA]u[NPrePre][1a]dokotela[NStem][1a-2a] |
Pot | Potential | lingacela | li[SC][5]nga[Pot]cel[VRoot]a[VT] |
PotNeg | Potential negative | bangebuke | ba[SC][2]nge[PotNeg]buk[VRoot]e[VT] |
PreLoc-s | Pre-locative “s” | wasekhaya | wa[PC][1]s[PreLoc-s]e[LocPre]i[NPrePre][5]li[BPre][5]khaya[NStem][5-6] |
ProgPre | Progressive prefix | lisalandela | li[SC][5]sa[ProgPre]landel[VRoot]a[VT] |
PronSuf | Pronoun suffix | sona | so[PronStem][7]na[PronSuf] |
ProperName | Proper name | ujabulani | u[NPrePre][1a]Jabulani[NStem][1a-2a] |
QuantStem | Quantitative stem | zonke | zo[QC][10]nke[QuantStem] |
RecipExt | Reciprocal extension | ukubhekana | u[NPrePre][15]ku[BPre][15]bhek[VRoot]an[RecipExt]a[VT] |
ReflPre | Reflexive prefix | sizibuze | si[SC][7]zi[ReflPre]buz[VRoot]e[VTPerf] |
RelStem | Relative stem | ezimanzi | ezi[RC][10]manzi[RelStem] |
RelSuf | Relative suffix | abadingayo | aba[RC][2]ding[VRoot]a[VT]yo[RelSuf] |
VT | Verb terminative | niqasha | ni[SC][2pp]qash[VRoot]a[VT] |
VTNeg | Verb terminative negative | alikaqedi | a[NegPre]li[SC][5]ka[ExclNeg]qed[VRoot]i[VTNeg] |
VTPerf | Verb terminative perfect | sizibuze | si[SC][7]zi[ReflPre]buz[VRoot]e[VTPerf] |
VTSubj | Verb terminative subjunctive | abuke | a[SCSubj][1]buk[VRoot]e[VTSubj] |
VRoot | Verb root | niqasha | ni[SC][2pp]qash[VRoot]a[VT] |
Hlon | Hlonipha | uyacafuna | u[SC][1]ya[LongPres]cafun[Hlon][VRoot]a[VT] |
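The analyses in the table above follow a regular pattern: each morpheme is followed by one or more bracketed tags. A minimal sketch of splitting such a string into (morpheme, tags) pairs is given below; the function name `parse_analysis` and the regular expressions are illustrative assumptions, not part of ZulMorph itself.

```python
import re

# One segment = a run of non-bracket characters (the morpheme)
# followed by one or more [Tag] groups.
SEGMENT = re.compile(r"([^\[\]]+)((?:\[[^\]]+\])+)")
TAG = re.compile(r"\[([^\]]+)\]")


def parse_analysis(analysis):
    """Split an analysis string into a list of (morpheme, [tags]) pairs.

    Raises ValueError if part of the string does not fit the
    morpheme[Tag][Tag]... pattern.
    """
    segments = []
    pos = 0
    for match in SEGMENT.finditer(analysis):
        if match.start() != pos:
            raise ValueError(f"unexpected text at position {pos}")
        segments.append((match.group(1), TAG.findall(match.group(2))))
        pos = match.end()
    if pos != len(analysis):
        raise ValueError(f"unexpected text at position {pos}")
    return segments


for morpheme, tags in parse_analysis("ngi[SC][1ps]ku[OC][15]sh[VRoot]o[VT]"):
    print(morpheme, tags)
```

Note that concatenating the morphemes gives the underlying morpheme sequence, which need not equal the surface form: for ngemibuzo the table lists the morphemes nga, i, mi and buzo.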
Bosch, Sonja E & Laurette Pretorius. 2017. A computational approach to Zulu verb morphology within the context of lexical semantics. Lexikos 27:152-182. http://dx.doi.org/10.5788/27-1-1398
Pretorius, Laurette & Sonja Bosch. 2012. Automated extraction of morphological grammars for Nguni with special reference to Southern Ndebele. Proceedings of AFLaT and SALTMIL Workshop, LREC 2012, Istanbul. http://aflat.org/files/pretorius%20&%20bosch.pdf
Bosch, Sonja E & Laurette Pretorius. 2011. Towards Zulu corpus clean-up, lexicon development and corpus annotation by means of computational morphological analysis. South African Journal of African Languages 31(1):138-158. http://uir.unisa.ac.za/bitstream/handle/10500/5539/bosch_sajal_v31_n1_a11.pdf?sequence=1&isAllowed=y
Pretorius L and SE Bosch. 2010. Finite State Morphology of the Nguni Language Cluster: Modelling and Implementation Issues. Lecture Notes in Computer Science Volume 6062/2010, p. 123-130. Berlin, Heidelberg: Springer, ISSN 0302-9743 (Print) 1611-3349 (Online). http://link.springer.com/chapter/10.1007/978-3-642-14684-8_13
Bosch, Sonja, Pretorius, Laurette, Fleisch, Axel. 2008. Experimental Bootstrapping of Morphological Analysers for Nguni Languages. Nordic Journal of African Studies 17(2):66-88. http://www.njas.helsinki.fi/
Pretorius, Laurette & Sonja Bosch. 2008. Containing overgeneration in Zulu computational morphology. Southern African Linguistics and Applied Language Studies 26(2): 209–216.
Bosch, Sonja E, Laurette Pretorius & Jackie Jones. 2007. Towards machine-readable lexicons for South African Bantu languages. Nordic Journal of African Studies 16(2):131-145. http://www.njas.helsinki.fi/
Bosch S, Jones J, Pretorius L & Anderson W. 2007. Computational Morphological Analysers and Machine-Readable Lexicons for South African Bantu Languages. Localisation Focus - The International Journal of Localisation 6(1): 22-28. ISSN 1649-2358. http://www.localisation.ie/oldwebsite/resources/lfresearch/Vol6_1BoschJonesPretoriusAnderson.pdf
Bosch, Sonja E & Laurette Pretorius. 2006. A finite-state approach to linguistic constraints in Zulu morphological analysis. Studia Orientalia 103:205-227. http://ojs.tsv.fi/index.php/StOrE/article/download/52604/16369
Pretorius, Laurette and Sonja Bosch 2003. Towards technologically enabling the indigenous languages of South Africa: the central role of computational morphology. Interactions of the Association for Computing Machinery 10 (2) (Special Issue: HCI in the developing world): pp.56-63. http://dl.acm.org/citation.cfm?doid=637848.637863
Pretorius, Laurette and Sonja Bosch 2003. Computational aids for Zulu natural language processing. Southern African Linguistics and Applied Language Studies 21(4) pp. 265-282. https://www.tandfonline.com/doi/abs/10.2989/16073610309486348
Pretorius, Laurette and Sonja Bosch 2003. Finite-State Computational Morphology: An Analyzer Prototype for Zulu. Machine Translation 18: 195-216. https://link.springer.com/article/10.1007/s10590-004-2477-4#page-1
Bosch, Sonja E and Laurette Pretorius. 2002. The significance of computational morphological analysis for Zulu lexicography, in South African Journal of African Languages, 2002, 22.1:11-20. http://uir.unisa.ac.za/handle/10500/5646
Pretorius, Laurette and Sonja Bosch 2002. Finite-State Computational Morphology - Treatment of the Zulu Noun. South African Computer Journal, 2002, 28:30-38. http://osprey.unisa.ac.za/TechnicalReports/UNISA-TR-2001-18.pdf
Pretorius, L. and Bosch, S. (2018). ZulMorph: Finite state morphological analyser for Zulu (Version 20190103) [Software]. Web demo at https://portal.sadilar.org/FiniteState/demo/zulmorph/
Please send your feedback to Laurette Pretorius or Sonja Bosch.