About ZulMorph

ZulMorph is a finite state morphological analyser for Zulu, developed using the Xerox finite state tools lexc and xfst. It also compiles with Foma.

Zulu words in their surface form are analysed to their base form. Any meaningful word can be input, and the output will be a complete morphological analysis of that word.

Words marked with “+?” could not be analysed. In most cases this is because the stem or root of the word is not yet included in the analyser's embedded lexicon.

Most words have multiple analyses, and selecting the correct one depends on context. Such disambiguation is a subsequent processing step.
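The following is a minimal Python sketch of how analyser output could be post-processed before any disambiguation step. It assumes the output is available as tab-separated "word, analysis" lines, one analysis per line (the layout typically printed by lookup utilities such as foma's flookup); that layout, the function name group_analyses and the placeholder word "xyzzy" are assumptions for illustration, not part of ZulMorph.

```python
from collections import defaultdict

UNKNOWN = "+?"  # marker for words that could not be analysed


def group_analyses(lines):
    """Group 'word<TAB>analysis' lines by word.

    Words whose only output is the '+?' marker map to an empty list.
    The tab-separated layout is an assumption, not a documented format.
    """
    grouped = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank separator lines
        word, _, analysis = line.partition("\t")
        analysis = analysis.strip()
        analyses = grouped[word]  # registers the word even if unanalysed
        if analysis and analysis != UNKNOWN and analysis not in analyses:
            analyses.append(analysis)
    return dict(grouped)


sample = [
    "ngikusho\tngi[SC][1ps]ku[OC][15]sh[VRoot]o[VT]",
    "abantu\ta[NPrePre][2]ba[BPre][2]ntu[NStem][1-2]",
    "xyzzy\t+?",  # placeholder for a word missing from the lexicon
]
result = group_analyses(sample)
unanalysed = [word for word, analyses in result.items() if not analyses]
```

Words that end up in unanalysed are candidates for lexicon extension, while words with more than one entry in their list are the ones that need context-dependent disambiguation.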

Stems marked with [Hlon] belong to ‘isiHlonipho’, a language of respect. It is a variety of Zulu used by married women to show respect towards their male in-laws and chiefs by avoiding their names and (parts of) words related to, or even just phonetically resembling, these names.

ZulMorph demo

A demo of ZulMorph is available.

The Zulu language

Zulu belongs to the Bantu language family, a ‘family’ of more than 400 languages spoken in Africa, from the Cape in the south to just north of the Equator. The Zulu language is a member of the Nguni group of languages and is spoken in South Africa in the province of KwaZulu-Natal, as well as in the northern Free State, the south-eastern part of Mpumalanga and in Gauteng. Zulu is a widely spoken language in South Africa, with approximately 11.5 million first-language speakers, i.e. 22.7% of the population (http://mobi.statssa.gov.za/census2011/First Language.html). The ISO 639-3 code for Zulu is [zul] (http://www.sil.org/iso639-3).

Zulu morphology

Zulu is characterised by a rich agglutinating morphological structure, which is based on two principles, namely the nominal classification system, and the concordial agreement system. According to the nominal classification system, nouns are categorised by prefixal morphemes, which for analysis purposes have been sorted into classes and given numbers by scholars who have worked within the field of the Bantu language family. Table 1 shows examples of Meinhof’s (1932:48) numbering system of some of the noun class prefixes.

Table 1: Meinhof’s (1932:48) numbering system of noun class prefixes:

Noun class prefix   Class   Word form    English
u+mu-               1       umuntu       “person”
a+ba-               2       abantu       “persons”
u-                  1a      unozinti     “goalkeeper”
o-                  2a      onozinti     “goalkeepers”
u+mu-               3       umuzi        “homestead”
i+mi-               4       imizi        “homesteads”
i+(li)-             5       idolo        “knee”
a+ma-               6       amadolo      “knees”
i+si-               7       isinkwa      “bread”
i+zi-               8       izinkwa      “breads”
i+n-                9       indlovu      “elephant”
i+zin-              10      izindlovu    “elephants”
u+(lu)-             11      ukhezo       “wooden spoon”
u+bu-               14      ubusuku      “night”
u+ku-               15      ukudla       “food”

These noun class prefixes lead to concordial agreement that links the noun to other words in the sentence such as verbs, adjectives, pronouns and so forth. For example:

[Figure: The concordial agreement links the nouns to other words in the sentence.]

In the above example, arc 1 shows the agreement between the subject of the sentence and the verb, arc 2 shows the agreement between the noun and its modifier, and arc 3 shows the agreement between the verb and the direct object of the sentence.

Noun prefixes usually indicate number: the odd class numbers designate singular and the corresponding even class numbers designate plural. However, this is not always the case. Some nouns in so-called plural classes do not have a singular form, plurals of class 11 nouns are found in class 10, and a class such as 14 is usually not associated with number at all.
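The pairing rule and its exceptions can be sketched in Python as follows. This is an illustration only: the function name plural_class is hypothetical, the lookup is restricted to the classes listed in Table 1, and it returns None whenever Table 1 offers no plural counterpart. Individual nouns may still pair differently, as the [9-6] stems in the tagset show.

```python
# Classes that appear in Table 1.
TABLE1_CLASSES = {"1", "2", "1a", "2a", "3", "4", "5", "6", "7", "8",
                  "9", "10", "11", "14", "15"}

# Exceptions named in the text: class 11 nouns take their plural in
# class 10, and class 14 is usually not associated with number.
EXCEPTIONS = {"11": "10", "14": None}


def plural_class(cls):
    """Return the usual plural class for a singular class, else None."""
    if cls not in TABLE1_CLASSES:
        raise ValueError(f"class {cls!r} is not listed in Table 1")
    if cls in EXCEPTIONS:
        return EXCEPTIONS[cls]
    suffix = "a" if cls.endswith("a") else ""
    number = int(cls.rstrip("a"))
    if number % 2 == 0:
        return None  # even classes are already plural classes
    candidate = f"{number + 1}{suffix}"
    # Only report a counterpart that Table 1 actually lists.
    return candidate if candidate in TABLE1_CLASSES else None
```

For example, plural_class("1a") gives "2a" and plural_class("11") gives "10", while plural_class("14") gives None.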

The noun prefix typically consists of two parts, namely a preprefix (the initial vowel) and a basic prefix. In some classes, such as class 1a and its plural class 2a, there is no basic prefix. In other instances, such as classes 11 and 14, the basic prefix is often discarded, with the result that only the preprefix appears in the surface form.
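As a small illustration, the prefix notation used in Table 1 can be split into its two parts with a few lines of Python. The function name split_prefix is hypothetical. Reading the parentheses in entries such as i+(li)- as "basic prefix may be dropped in the surface form" is an assumption based on the description above, not something the table states.

```python
def split_prefix(notation):
    """Split Table 1 notation such as 'i+(li)-' into its parts.

    Returns (preprefix, basic_prefix, basic_is_optional).
    basic_prefix is None when the class has no basic prefix (e.g. 'u-').
    Treating parentheses as 'optional' is an assumption.
    """
    body = notation.rstrip("-")
    preprefix, plus, basic = body.partition("+")
    if not plus:
        return preprefix, None, False
    optional = basic.startswith("(") and basic.endswith(")")
    return preprefix, basic.strip("()"), optional
```

With this sketch, split_prefix("u+mu-") yields ("u", "mu", False), split_prefix("u+(lu)-") yields ("u", "lu", True), and split_prefix("o-") yields ("o", None, False).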

Reference

Meinhof, C. 1932. Introduction to the phonology of the Bantu languages. Berlin: Dietrich Reimer/Ernst Vohsen.

ZulMorph Tagset

Tag Description Example Analysis
Tags dependent on class, person and/or number
1ps first person singular ngikusho ngi[SC][1ps]ku[OC][15]sh[VRoot]o[VT]
1pp first person plural sikhula si[SC][1pp]khul[VRoot]a[VT]
2ps second person singular ungena u[SC][2ps]ngen[VRoot]a[VT]
2pp second person plural ningashada ni[SC][2pp]nga[Pot]shad[VRoot]a[VT]
AC Adjective concord obuningi obu[AC][14]ningi[AdjStem]
AdjPre Adjective prefix sincane si[AdjPre][7]ncane[AdjStem]
BPre Basic prefix abantu a[NPrePre][2]ba[BPre][2]ntu[NStem][1-2]
Dem Demonstrative pronoun lena le[Dem][4][Pos1]
DemCop Demonstrative copulative naso naso[DemCop][7][Pos2]
EC Enumerative concord munye mu[EC][1]nye[EnumStem]
SCNeg Negative subject concord akakwazi a[NegPre]ka[SCNeg][1a]ku[OC][15]az[VRoot]i[VT]
NPrePre Noun preprefix abantu a[NPrePre][2]ba[BPre][2]ntu[NStem][1-2]
OC Object concord basifuna ba[SC][2]si[OC][1pp]fun[VRoot]a[VT]
PC Possessive concord kwakho kwa[PC][15]kho[PronStem][2ps]
PronStem Pronoun stem bethu ba[PC][2]ithu[PronStem][1pp]
SCPT Past tense subject concord sanikwa sa[SCPT][1pp]nik[VRoot]w[PassExt]a[VT]
QC Quantitative concord zonke zo[QC][10]nke[QuantStem]
RC Relative concord okuthi oku[RC][15]th[VRoot]i[VT]
RCPT Relative concord past tense ezabonakala eza[RCPT][8]bon[VRoot]akal[NeutExt]a[VT]
SC Subject concord ngikusho ngi[SC][1ps]ku[OC][15]sh[VRoot]o[VT]
SCHort Hortative subject concord masihambe ma[HortPre]si[SCHort][1pp]hamb[VRoot]e[VTSubj]
SCSit Situative subject concord bebuka be[SCSit][2]buk[VRoot]a[VT]
SCSubj Subjunctive subject concord abuke a[SCSubj][1]buk[VRoot]e[VTSubj]
Tags independent of class, person and/or number
AdjStem Adjective stem obuningi obu[AC][14]ningi[AdjStem]
Adv Adverb kukhona ku[SC][15]khona[Adv]
AdvPre Adverb prefix ngemibuzo nga[AdvPre]i[NPrePre][4]mi[BPre][4]buzo[NStem][3-4]
ApplExt Applied extension ukubhekela u[NPrePre][15]ku[BPre][15]bhek[VRoot]el[ApplExt]a[VT]
AugSuf Augmentative suffix amakhosikazi a[NPrePre][6]ma[BPre][6]khosi[NStem][9-6]kazi[AugSuf]
AuxVStem Auxiliary verb stem babeshadile ba[SC][2]be[AuxVStem]be[SCSit][2]shad[VRoot]ile[VTPerf]
CausExt Causative extension ukubhalisa u[NPrePre][15]ku[BPre][15]bhal[VRoot]is[CausExt]a[VT]
Conj Conjunction kepha kepha[Conj]
CopPre Copulative prefix yibandla yi[CopPre]i[NPrePre][5]li[BPre][5]bandla[NStem][5-6]
DimSuf Diminutive suffix indodana i[NPrePre][9]n[BPre][9]doda[NStem][9-6]ana[DimSuf]
EnumStem Enumerative stem munye mu[EC][1]nye[EnumStem]
ExclNeg Exclusive negative alikaqedi a[NegPre]li[SC][5]ka[ExclNeg]qed[VRoot]i[VTNeg]
Fut Future tense izokhipha i[SC][9]zo[Fut]khiph[VRoot]a[VT]
FutNeg Future tense negative akazubuya a[NegPre]ka[SCNeg][1]zu[FutNeg]buy[VRoot]a[VT]
HortPre Hortative prefix masihambe ma[HortPre]si[SCHort][1pp]hamb[VRoot]e[VTSubj]
Ideoph Ideophone ngqo ngqo[Ideoph]
ImpPre Imperative prefix yenza yi[ImpPre]enz[VRoot]a[VT]
ImpSuf Imperative suffix zamanani zam[VRoot]an[RecipExt]a[VT]ni[ImpSuf]
IntensExt Intensive extension bahambisisa ba[SC][2]hamb[VRoot]isis[IntensExt]a[VT]
ComplExt Completive extension bajikelela ba[SC][2]jik[VRoot]elel[ComplExt]a[VT]
Interrog Interrogative nini nini[Interrog]
InterrogSuf Interrogative suffix kwafikwaphi kwa[SCPT][15]fik[VRoot]w[PassExt]a[VT]phi[InterrogSuf]
LocPre Locative prefix kubo ku[LocPre]bo[PronStem][2]
LocSuf Locative suffix esifundeni e[LocPre]i[NPrePre][7]si[BPre][7]funda[NStem][7-8]ini[LocSuf]
LongPres Long present tense iyakhela i[SC][4]ya[LongPres]khel[VRoot]a[VT]
NegPre Negative prefix alikaqedi a[NegPre]li[SC][5]ka[ExclNeg]qed[VRoot]i[VTNeg]
NeutExt Neuter extension ukubukeka u[NPrePre][15]ku[BPre][15]buk[VRoot]ek[NeutExt]a[VT]
NStem Noun stem umuntu u[NPrePre][1]mu[BPre][1]ntu[NStem][1-2]
PassExt Passive extension ukubhalwa u[NPrePre][15]ku[BPre][15]bhal[VRoot]w[PassExt]a[VT]
PossKA Possessive prefix “ka” kadokotela ka[PossKA]u[NPrePre][1a]dokotela[NStem][1a-2a]
Pot Potential lingacela li[SC][5]nga[Pot]cel[VRoot]a[VT]
PotNeg Potential negative bangebuke ba[SC][2]nge[PotNeg]buk[VRoot]e[VT]
PreLoc-s Pre-locative “s” wasekhaya wa[PC][1]s[PreLoc-s]e[LocPre]i[NPrePre][5]li[BPre][5]khaya[NStem][5-6]
ProgPre Progressive prefix lisalandela li[SC][5]sa[ProgPre]landel[VRoot]a[VT]
PronSuf Pronoun suffix sona so[PronStem][7]na[PronSuf]
ProperName Proper name ujabulani u[NPrePre][1a]Jabulani[NStem][1a-2a]
QuantStem Quantitative stem zonke zo[QC][10]nke[QuantStem]
RecipExt Reciprocal extension ukubhekana u[NPrePre][15]ku[BPre][15]bhek[VRoot]an[RecipExt]a[VT]
ReflPre Reflexive prefix sizibuze si[SC][7]zi[ReflPre]buz[VRoot]e[VTPerf]
RelStem Relative stem ezimanzi ezi[RC][10]manzi[RelStem]
RelSuf Relative suffix abadingayo aba[RC][2]ding[VRoot]a[VT]yo[RelSuf]
VT Verb terminative niqasha ni[SC][2pp]qash[VRoot]a[VT]
VTNeg Verb terminative negative alikaqedi a[NegPre]li[SC][5]ka[ExclNeg]qed[VRoot]i[VTNeg]
VTPerf Verb terminative perfect sizibuze si[SC][7]zi[ReflPre]buz[VRoot]e[VTPerf]
VTSubj Verb terminative subjunctive abuke a[SCSubj][1]buk[VRoot]e[VTSubj]
VRoot Verb root niqasha ni[SC][2pp]qash[VRoot]a[VT]
Hlon Hlonipha uyacafuna u[SC][1]ya[LongPres]cafun[Hlon][VRoot]a[VT]
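
The analysis strings in the table follow a regular shape: each morpheme is followed by one or more tags in square brackets. A minimal Python sketch of a parser for that shape is given below. The names parse_analysis and has_tag are hypothetical and not part of ZulMorph, and the sketch assumes that morphemes themselves never contain square brackets.

```python
import re

# One morpheme followed by one or more [Tag] groups.
SEGMENT = re.compile(r"([^\[\]]+)((?:\[[^\[\]]+\])+)")
TAG = re.compile(r"\[([^\[\]]+)\]")


def parse_analysis(analysis):
    """Split an analysis string into (morpheme, [tags]) pairs."""
    segments = []
    position = 0
    for match in SEGMENT.finditer(analysis):
        if match.start() != position:
            raise ValueError(f"unexpected text at offset {position}")
        segments.append((match.group(1), TAG.findall(match.group(2))))
        position = match.end()
    if position != len(analysis):
        raise ValueError(f"unexpected text at offset {position}")
    return segments


def has_tag(analysis, tag):
    """Return True if any morpheme in the analysis carries the tag."""
    return any(tag in tags for _, tags in parse_analysis(analysis))


parsed = parse_analysis("ngi[SC][1ps]ku[OC][15]sh[VRoot]o[VT]")
# parsed == [("ngi", ["SC", "1ps"]), ("ku", ["OC", "15"]),
#            ("sh", ["VRoot"]), ("o", ["VT"])]
```

Concatenating the morphemes of a parsed analysis gives the underlying morpheme sequence, which can differ from the surface word (compare ngemibuzo with nga + i + mi + buzo). A call such as has_tag(analysis, "Hlon") picks out analyses that involve isiHlonipho stems.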

References

Bosch, Sonja E & Laurette Pretorius. 2017. A computational approach to Zulu verb morphology within the context of lexical semantics. Lexikos 27:152-182. http://dx.doi.org/10.5788/27-1-1398

Pretorius, Laurette & Sonja Bosch. 2012. Automated extraction of morphological grammars for Nguni with special reference to Southern Ndebele. Proceedings of AFLaT and SALTMIL Workshop, LREC 2012, Istanbul. http://aflat.org/files/pretorius%20&%20bosch.pdf

Bosch, Sonja E & Laurette Pretorius. 2011. Towards Zulu corpus clean-up, lexicon development and corpus annotation by means of computational morphological analysis. South African Journal of African Languages 31(1):138-158. http://uir.unisa.ac.za/bitstream/handle/10500/5539/bosch_sajal_v31_n1_a11.pdf?sequence=1&isAllowed=y

Pretorius L and SE Bosch. 2010. Finite State Morphology of the Nguni Language Cluster: Modelling and Implementation Issues. Lecture Notes in Computer Science Volume 6062/2010, p. 123-130. Berlin, Heidelberg: Springer, ISSN 0302-9743 (Print) 1611-3349 (Online). http://link.springer.com/chapter/10.1007/978-3-642-14684-8_13

Bosch, Sonja, Pretorius, Laurette, Fleisch, Axel. 2008. Experimental Bootstrapping of Morphological Analysers for Nguni Languages. Nordic Journal of African Studies 17(2):66-88. http://www.njas.helsinki.fi/

Pretorius, Laurette & Sonja Bosch. 2008. Containing overgeneration in Zulu computational morphology. Southern African Linguistics and Applied Language Studies 26(2): 209–216.

Bosch, Sonja E, Laurette Pretorius & Jackie Jones. 2007. Towards machine-readable lexicons for South African Bantu languages. Nordic Journal of African Studies 16(2):131-145. http://www.njas.helsinki.fi/

Bosch S, Jones J, Pretorius L & Anderson W. 2007. Computational Morphological Analysers and Machine-Readable Lexicons for South African Bantu Languages. Localisation Focus - The International Journal of Localisation 6(1): 22-28. ISSN 1649-2358. http://www.localisation.ie/oldwebsite/resources/lfresearch/Vol6_1BoschJonesPretoriusAnderson.pdf

Bosch, Sonja E & Laurette Pretorius. 2006. A finite-state approach to linguistic constraints in Zulu morphological analysis. Studia Orientalia 103:205-227. http://ojs.tsv.fi/index.php/StOrE/article/download/52604/16369

Pretorius, Laurette and Sonja Bosch 2003. Towards technologically enabling the indigenous languages of South Africa: the central role of computational morphology. Interactions of the Association for Computing Machinery 10 (2) (Special Issue: HCI in the developing world): pp.56-63. http://dl.acm.org/citation.cfm?doid=637848.637863

Pretorius, Laurette and Sonja Bosch. 2003. Computational aids for Zulu natural language processing. Southern African Linguistics and Applied Language Studies 21(4):265-282. https://www.tandfonline.com/doi/abs/10.2989/16073610309486348

Pretorius, Laurette and Sonja Bosch 2003. Finite-State Computational Morphology: An Analyzer Prototype for Zulu. Machine Translation 18: 195-216. https://link.springer.com/article/10.1007/s10590-004-2477-4#page-1

Bosch, Sonja E and Laurette Pretorius. 2002. The significance of computational morphological analysis for Zulu lexicography. South African Journal of African Languages 22(1):11-20. http://uir.unisa.ac.za/handle/10500/5646

Pretorius, Laurette and Sonja Bosch. 2002. Finite-State Computational Morphology - Treatment of the Zulu Noun. South African Computer Journal 28:30-38. http://osprey.unisa.ac.za/TechnicalReports/UNISA-TR-2001-18.pdf

How to cite the output of ZulMorph

Pretorius, L. and Bosch, S. (2018). ZulMorph: Finite state morphological analyser for Zulu (Version 20190103) [Software]. Web demo at https://portal.sadilar.org/FiniteState/demo/zulmorph/

Contact us

Please send your feedback to Laurette Pretorius or Sonja Bosch.