ADJ

adj

adjectives

Those words that can occur in attributive position and are not verbal nouns we define as adjectives; they are almost always multisyllabic. The paradigmatic adjectives are those that end in -po, -mo, -pa, -ma, -can, -ldan, and -med. Etymologically -can, -ldan 'with', and -med 'without' are verbs, but when they occur in forms that function nominally with regard to syntax, they cannot be analyzed as verbs. So, we treat them together with the preceding syllable as an adjective. The formation of adjectives is complex, and we intend to write about it. In the meantime a search for words with this tag in the corpus will reveal some patterns. (Although words such as nag 'black', gsar 'new', and che 'big' are frequently treated like adjectives for pedagogical purposes, a single syllable in predicate position before verbal suffixes is a verb. These words may appear to occur attributively, but we see this as the formation of compounds. The compound blon-chen 'prime minister' contrasts beautifully with the noun and adjective pair blon-po chen-po 'great minister', etc.)

ADV
AdvType=Dir

adv.dir

directional adverbs

We use this tag for adverbs that end in -cad, e.g. phyin-cad 'after', sṅon-cad 'before', man-cad 'below', yan-cad 'above', slan-cad 'after'. We also include phan-tshun 'mutually' in this category, for lack of a better place to put it. Words tagged with this tag include: ཡན་ཆད་, མན་ཆད་, ཕན་ཚུན་, ཕྱིན་ཅད་, ཚུན་, ཚུན་ཅད་, སྔོན་ཅད་, སྔོན་ཆད་, སླན་ཅད་, ཕན་ཆད་, མན་ཅད་, ཡན་, ཡན་ཅད་.

ADV
AdvType=Intens

adv.intense

intensive adverbs

This tag was created for the adverbs rab (tu) 'very' and śin (tu) 'very', because as uninflected stems they do not occur independently. Subsequently the tag has also been applied to ཅུང་ཟད་, ཡེ་, ཧ་ཅང་, ཆེས་, ནན་ཏ, and ཅུང་ཙམ་.  

ADV
AdvType=Mim

adv.mim

mimetic adverb

This tag is used for onomatopoeia and phonophors. Examples include ཁོར་ཁོར་ཡུག་, ནར་ནར་, ཙབ་ཙོབ་, རྩོག་རྩོག་, ཁྲེ་ལོག་ལོག་, གཅེར་རེ་, ཅེ་རེ་, ཏབ་ཏབ་པོ, ཏབ་ཏོབ་, ཏུར་ཏུར་པོ་, རིག་རིག་, ལིང་, and ཧྲིག་ཧྲིག་.  

ADV
PronType=Dem

adv.proclausal

proclausal adverbs

A small number of words occur clause-initially and refer to the content of the previous clause. Such adverbs often begin with a demonstrative stem. These words are classed as 'proclausal adverbs'. In the following list they are presented together with their suffixes, but in our tagging we divide off these suffixes in order to be consistent with their treatment elsewhere: de (nas) 'then', de (ste) 'thereafter', gal (te) 'if', ḥo (na) 'in that case', ḥon (te) 'nevertheless', yaṅ (na) 'alternatively'. Words tagged with this tag include: དེ་, ཡང་, གལ་, ཁྱད་པ, འོ་, འོན་, དེ་བ, ལ, འང་, and སྒོས་.

ADV
AdvType=Tim

adv.temp

temporal adverbs

Temporal adverbs are those that occur in syntactic positions or have morphological structure that suggests they are nouns, that refer to time, and that are not followed by case markers. In our corpus so far the temporal adverbs are sṅon 'previously', da 'now', deṅ 'these days', mdaṅ 'yesterday', gdod 'at first', da-ruṅ 'still', phyi-ñin 'the next day', phyi-dro 'in the afternoon', and saṅ 'the next day'. There are also nouns that refer to time such as źag 'day' and gdugs 'noon', but these behave syntactically as nouns, for example by being suffixed with case markers. In the phrase saṅ gi gdugs la, saṅ also appears to function as a noun; nonetheless we have hesitantly decided that for the time being it is best to tag saṅ as a temporal adverb. Words tagged with this tag include: ད་, སྔོན་, སྔ་, ད་དུང་, ནང་པ, ཕྱི་ཉིན་, ཁ་སང་, དོ་ནུབ་, གདོད་, ཉིན་, དེ་རིང་, དེང་, ནུབ་, མཚན་མོ་, ཅིག་ཅར་, གཟོད་, ཐོག་མ་, དབྱར་, དིང་སང་, མདང་, མཚན་, སང་, དགུན་, དེང་སང་, ནངས་པ, ཕྱི་, ཡུན་རིང་, སྲོད་, གདུགས་ཚོད་, གནངས་, གུང་, གློ་བུ, ཉིན་མཚན་, ད་ལོ, ནང་, ནངས་, ཕྱི་དྲོ་, མཚན་ཡལ་, འདང་, སླན་ཆད་, ཁར་རྩང་, ཁྲུགས་, གནའ་, གཟོད་མ་, ཉིན་ཞག་ཕྲུགས་, ད་ནང་, ད་རུང་, དོ་ནུབ, ནང་བ, ནངས་བ, ནམ་བཞིག་, ནུབ་མོ, ཕྱི་ཕྱི, མཚན་ཐོག་ཐག་, འཆར་ཁ་, འདང་གསུམ་, འཕྲོ་, རྒྱ་, སང་ཉིན་, སང་དགོང་, སང་ནང་པ, སྐྱ་རེངས་, སྔ་དྲོ་, སྔ་སོ, སྔ་སོ་, སྔོན, སྔོན་མ་.

ADP
Case=Abl

case.abl

ablative case

The ablative case is the morpheme las when it follows a substantive.

ADP
Case=Agn

case.agn

agentive case

The agentive case is marked by any of the allomorphs -kyis, -gis, -gyis, -s, and -yis when suffixed to a substantive.

ADP
Case=All

case.all

allative case

The allative case is the clitic morpheme la when it occurs suffixed to substantives.

ADP
Case=Com

case.ass

associative case

The associative case is the clitic morpheme daṅ when suffixed to a substantive.

ADP
Case=Cmp

case.comp

comparative case

The comparative case is either of the allomorphs -bas or -pas suffixed to a substantive.

ADP
Case=Ela

case.ela

elative case

The elative case is the clitic morpheme -nas when suffixed to substantives.

ADP
Case=Gen

case.gen

genitive case

The genitive case is any of the allomorphs -kyi, -gyi, -gi, -yi, or -ḥi, when suffixed to a substantive.

ADP
Case=Loc

case.loc

locative case

The locative case is the clitic morpheme -na when suffixed to a substantive. Note however that when -na appears after a verbal noun ending in -pa or -ba in the meaning 'if/when', we tag this -na as 'cv.loc'.

ADP
Case=Qot

case.nare

the case -na-re

The case suffix -na-re introduces direct speech.

ADP
Case=Ter

case.term

terminative case

The terminative case is any of the allomorphs -tu, -du, -ru, -su, -r when suffixed to substantives.

PART

cl.focus

Focus clitics

This tag was invented for the focus clitics ni and kyaṅ ~ yaṅ. The words currently tagged with this tag include: ནི་, ཡང་, ཀྱང་, འང་, ཅང་, and ཕྱིར་ཡང་. It is probably not correct to tag ཕྱིར་ཡང་ with this tag, but we must consider the matter further. The word lta when it is not tagged as a verb, or a relator noun, is also tagged as a focus clitic. 

VERB
Mood=Qot

cl.quot

The quotative clitic

We invented this tag for the 'quotative clitic' with the allomorphs ces, źes, and śes. However, the forms ce-na, ces-pa, etc. reveal that more properly this should be called a 'quotative verb' and tagged with a tag that begins with 'v'. Nonetheless, it is now inconvenient to change our approach. The words tagged with this tag are: ཞེས་, ཅེས་, ཞེས་པ་, ཞེ་, ཅེས་པ་, ཤེ་, ཅེ་, ཅེས་བ་, ཞིས་ (a misspelling of ཞེས་), and ཞེས་བ. 

SCONJ
Case=Abl

cv.abl

ablative converb

The ablative converb is the morpheme las when it follows a verb.

SCONJ
Case=Agn

cv.agn

agentive converb

The agentive converb is marked by any of the allomorphs -kyis, -gis, -gyis, -s, and -yis when suffixed to a verb.

SCONJ
Case=All

cv.all

allative converb

The allative converb is the clitic morpheme la when it occurs suffixed to verbs.

SCONJ
Case=Are

cv.are

The converb -a-re

This tag is used for the converb -ta-re and its allomorphs (cf. Walter Simon, 1967, 'The Tibetan Particle "re",' Bulletin of the School of Oriental and African Studies, 30.1, pp. 117-126).

SCONJ
Case=Com

cv.ass

associative converb

The associative converb is the clitic morpheme daṅ when suffixed to a verb. Note that this tag is particularly used with the polite imperative meaning of daṅ. One also sees daṅ in the meaning 'and' directly suffixed to morphological verb stems. In such cases we see daṅ as evidence that the preceding form is functioning nominally and should be tagged [n.count], barring reasons to do otherwise.

SCONJ
Case=Cont

cv.cont

continuing converb

The continuing converb, unknown in the early literature, is formally similar to the genitive followed by -n and functionally is similar to the imperfective converb in earlier strata of literature. The words tagged so far with this tag are: གིན་, ཀྱིན་, གྱིན་.
 

SCONJ
Case=Ela

cv.ela

elative converb

The elative converb is the clitic morpheme -nas when suffixed to verbs.

PART
Mood=Ind

cv.fin

final converb

The final converb is any allomorph of the marker -ḥo, which is an explicit marker of a finite verb in the indicative mood. Words tagged with this tag include: སོ, འོ, དོ, ངོ, ནོ, ཏོ, རོ, ལོ, གོ, བོ, མོ, and པོ.

SCONJ
Case=Gen

cv.gen

genitive converb

The genitive converb is any of the allomorphs -kyi, -gyi, -gi, -yi, or -ḥi, when suffixed to a verb.

PART
Mood=Imp

cv.imp

imperative converb

The imperative converb is the morpheme that explicitly marks the imperative or prohibitive mood, which has one of the three allomorphs -śig, -cig, and -źig.  Our corpus yields the unexpected śig-pa, which we have also tagged with this tag.  

SCONJ
Case=Impf

cv.impf

imperfective converb

The imperfective converb is the morpheme that has the three allomorphs ཞིང་, ཅིང་, and ཤིང་.

SCONJ
Case=Loc

cv.loc

locative converb

The locative converb is the clitic morpheme -na when suffixed to a verb, or when suffixed to a verbal noun ending in -pa or -ba in the meaning 'if/when'.

SCONJ
Case=Odd

cv.odd

odd converbs

Odd converbs are any that we discovered after fixing the tagset. So far we have tagged the following words with this tag: ལས་ཆེ, ཨང་, ཚད་.

PART
Polarity=Xor

cv.ques

question converb

The question converb is any allomorph of the marker -ḥam, which is an explicit marker of a polar question. Words tagged with this tag include: འམ, སམ, ནམ, དམ, གམ, ངམ, ཏམ, རམ, ལམ, བམ, མམ, ངེ་, and མཾ་   

SCONJ
Case=Rung

cv.rung

The converb rung

When the morpheme ruṅ appears after a verb in the meaning 'however', it is tagged with this tag.

SCONJ
Case=Sem

cv.sem

semi-final converb

The semi-final converb is the morpheme that has the three allomorphs ཏེ, སྟེ, and དེ.  

SCONJ
Case=Ter

cv.term

terminative converb

The terminative converb is any of the allomorphs -tu, -du, -ru, -su, -r when suffixed to a verb.
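
The case and converb entries above share their forms: the same marker receives a [case.xxx] tag after a substantive and a [cv.xxx] tag after a verb. The following Python sketch illustrates that decision for a subset of the markers listed in this document. The function name `tag_marker` and the flag `host_is_verb` are hypothetical, and the sketch does not attempt the semantic judgement needed for -na after a verbal noun in the meaning 'if/when', nor does it cover markers described with only one of the two readings, such as the comparative case.

```python
# Hypothetical sketch: choose between a case tag and a converb tag.
# Marker forms and tag names come from the entries above; the function
# name and the `host_is_verb` flag are illustrative assumptions.

MARKER_TO_SUFFIX = {
    "las": "abl",
    "kyis": "agn", "gis": "agn", "gyis": "agn", "s": "agn", "yis": "agn",
    "la": "all",
    "nas": "ela",
    "kyi": "gen", "gyi": "gen", "gi": "gen", "yi": "gen", "ḥi": "gen",
    "na": "loc",
    "tu": "term", "du": "term", "ru": "term", "su": "term", "r": "term",
}


def tag_marker(marker: str, host_is_verb: bool) -> str:
    """Return 'cv.<x>' after a verb and 'case.<x>' after a substantive."""
    form = marker.lstrip("-")          # accept both 'nas' and '-nas'
    suffix = MARKER_TO_SUFFIX[form]    # raises KeyError for unlisted markers
    prefix = "cv" if host_is_verb else "case"
    return f"{prefix}.{suffix}"


print(tag_marker("-nas", host_is_verb=True))   # cv.ela
print(tag_marker("kyi", host_is_verb=False))   # case.gen
```

In practice the value of `host_is_verb` would come from the tag already assigned to the preceding word; how that is determined is outside the scope of this sketch.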

DET
PronType=Dem

d.dem

demonstratives

This tag is used for the demonstratives ḥdi 'this' and de 'that'. These two words are tagged as demonstratives also when used as determiners (i.e. we do not distinguish rgyal-po de 'that king' from de 'that one, him'). Words tagged with this tag include: དེ་, འདི་, ཡ, ཚུ, མ, ཕ, ཕྱི་, ཕ་གི་, མ་ཀི, མ་གི, མ་གི་, མ་ཀི་, འདི་པ, ཡ་གི་.  

DET

d.det

determiner

The most frequent determiner is gźan 'other'. In addition, we identify ya-re 'each one (of two)' as a determiner on the basis of the following sentence: Brgya-byin daṅ Tshaṅs-paḥi rgyal-pos lag-pa ya-re nas zin te 'The kings Indra and Brahma each took him by one of his hands'. We reckon ḥbaḥ 'sole' as a determiner on the basis of sentences such as rus-pa daṅ khrag ḥbaḥ źig gis sa rtsog-rtsog ltar ḥdug-pa mthoṅ 'They saw the ground besmirched with only bone and blood'. Words tagged with this tag include: གཞན་, སོགས་, རང་, ཀ་, འབའ་, རེ་, རེ་རེ་, ག་, ཤ་སྟག་, ཁ་, གཞིན་, ཡ་རེ་, ཤས་དག་, སྣ་རེ་. We must check gzhin

DET
PronType=Emp

d.emph

emphasis

We initially invented this category for ñid in phrases such as rgyal-po ñid 'that very king' or lus ñid 'this body'. This syntactic use of ñid must be distinguished from its use in Buddhist terminology as -ñid inside of words, e.g. stoṅ-pa-ñid 'emptiness'. Apart from ñid, we have categorized kho-na 'the very, same' and re-re 'each' as emphatics. This use of kho-na should not be confused with its function as a third person pronoun in Old Tibetan. In one case kho appears not as a personal pronoun but as what seems to be a variant of kho-na; this kho we also classify as an emphatic, viz. smras-paḥi tshig ḥdi bden na bden-paḥi tshig bden-paḥi tshig smras-pas / bdag gi lus ḥdi sṅa-ma kho bźin du rma med-par gyur cig 'If these words that I have said are true, then because of saying true words, let this my body be without wounds like before' (vol 74, page 137b). Words tagged with this tag include: ཉིད་, རབ་, ཁོ་ན་, and ཁོ་.

DET
PronType=Ind

d.indef

indefinite

This category is used for the allomorphs of the indefinite marker cig, źig, and śig as in pho-ña cig 'a messenger'. The indefinite marker, which occurs inside of noun phrases, must be distinguished from the identical-looking imperative converb (see [cv.imp]), which occurs suffixed to the imperative stems of verbs.

DET
Number=Plur

d.plural

plurals

The plural markers rnams, dag, kun, thams-cad, ḥo-cog (and its variants) and tsho are tagged as their own category 'plural'. However, plural pronouns (bdag-cag, khyed-cag, ḥu-bu-cag) are treated as one word. The plural marker -cag is not removed because to do so would result in pronominal stems which are not mutually comparable (viz. bdag is a singular pronoun, khyed a plural pronoun, and ḥu-bu has no independent life outside of ḥu-bu-cag). We also tag ḥgaḥ 'some' as a plural, although in the abstract one would perhaps prefer to call it a 'quantifier'. The three verbs (la) sogs-pa 'etc', (daṅ) ldan-pa 'having', (daṅ) bcas-pa 'together with' could be seen as similar to quantifiers or otherwise be treated as parts of the noun phrase; however, we have chosen to treat them etymologically as verbs. Words tagged with this tag include: རྣམས་, དག་, ཐམས་ཅད་, ཀུན་, ཡོངས་, སྣ་ཚོགས་, འགའ་, སྤྱི་, ཅུང་ཟད་, ཚོ་, དུ་མ, ཤས་, མཐའ་དག་, ཆུང་ཟད་, དགུ་ཅོག་, འགའ་རྫི་ར་, སོ་ཅོག་, སྣ་དགུ་. The inclusion of འགའ་རྫི་ར་ is probably a mistake.

DET
PronType=Tsam

d.tsam

tsam and words with similar distribution

In an earlier version of the tag set we had tagged tsam as [cl.tsam], but then it became clear that it only ever followed nominals, so it clearly deserves the [d.xxx] category. Words tagged with this tag include ཙམ་, སྙེད་, ཙམ་པ་, རྙེད་ (a misspelling for སྙེད་), ཙམ་པ, སྙད་, and སྙེད་པ.  

INTJ

interj

interjections

This tag is used for interjections. Words tagged with this tag include: ཀྱེ་མ་, ཀྱེ་, ཨ་ལ་ལ་, ཀྱི་ཧུད་, ཁོ་དེ, མ་ལ་, འོ་དོད་, ཧེ་, ཨེ་མ་, and ཨེ་ཨེ་.  

NOUN
Number=Sing

n.count

count nouns

To identify lexical nouns we rely on the syntactic ability of the word in question to head a noun phrase, the dictionary meaning, and (when possible) the presence of nominal suffixes such as -mo, -po and -bu. Because we treat grammatical affixes as separate words, a single word normally does not include grammatical affixes such as case markers and converbs. Nonetheless, there are well-motivated exceptions to this policy like gaṅ-na-ba 'whereabouts' and bdag-gi-ba 'that which is mine'. (We treat a -pa or -ba as part of the preceding word, regardless of the part-of-speech of the preceding word.) Another frequent category of exceptions is calques of Sanskrit terms, e.g. kun-tu-rgyu 'parivrājaka', which must be treated together as a noun because it looks like a verb phrase, but functions syntactically as a noun. In general, when two nouns occur in succession they are understood as a compound; the dvandva compound pha-ma 'parents' is treated as one noun rather than two (pha 'father' and ma 'mother') and the tatpuruṣa compound khyim-bdag 'householder' is likewise treated as one noun rather than two (khyim 'home' and bdag 'lord'). When an adjective precedes its head this is also treated as a compound. Thus, because dug btsan-po would be the expected order for 'mighty poison', we treat btsan-dug 'mighty poison' (vol. 74, page 147a) as a single word. Apposition is the one category of exceptions in which the concatenation of two nouns is not treated as a compound. An example of this type is the two words bu khyeḥu in the following sentence: deḥi tshe yul de na khyim-bdag cig la bu khyeḥu źig btsas na / 'At that time, in that land, when a child, a son, was born to a householder'. Rather than understanding khyeḥu as an adjective modifying bu, or taking bu-khyeḥu as one word, the second word simply sits after the first one to add greater specificity. The reason not to treat apposition as compounding is that apposition occurs with proper nouns.
For example, in the sentence deḥi tshe na rgyal-po Gsal-rgyal gyi btsun-mo chen-po Ḥbar-li źes bya-ba la bu-mo źig btsas-nas 'At that time a daughter was born to Ḥbar-li, the main queen of king Prasenajit', rgyal-po 'king' and Gsal-rgyal 'Prasenajit' are in apposition; to unite the two as a compound would lead to unintuitive, unwieldy, and therefore unacceptable consequences. 

NOUN
Number=Coll

n.mass

mass nouns

We divide out mass nouns from normal lexical nouns on the basis of two instances in our corpus where otherwise two nouns not in apposition would follow each other: nor-bu sbar gaṅ 'a handful of jewels' and chu sñim-pa gaṅ 'a handful of water'. Knowing that there exists this syntactic difference between normal lexical nouns and mass nouns, we tag all plausible mass nouns on the basis of their meaning (e.g. zaṅs 'copper'). A final list of mass nouns can only be securely put forward after the syntactic behavior of these words is better investigated. Words tagged with this tag include:  གསེར་, ཆུ་, རིན་པོ་ཆེ་, ནོར་, ཤ་, ཆུ, ས་, ཁྲག་, ཆང་, སྨན་, སེར་བ་, ནོར་བུ་, མར་, ངུར་སྨྲིག་, དངུལ་, ཡི་གེ་, ནས་, ལྕགས་, སྤུ་, གངས་, ཙན་དན་, གསེར་གཡུ་, གོས་, དར་, ཤེལ་, གསེར་དངུལ་, པདྨ་རཱ་ག་, མར་ཁུ, ཟངས་, ཡིག་ (I would think that this should only occur in compounds), གོས་དར་, ཆུ་སྡོར་, ཐེར་ཕྲུག་, བཟའ་, བསུ་ཆང་, བུམ་ཆུ, བཻདཱུརྱ, མཐོ་ཁྱད་, རིན་ཆེན་, སྣམ་བུ་, ཁ་བ་, གསེར་གྱི་མེ་ཏོག་ (a mistake), ཆུ་མེ་ཤིང་, ཉ་ཕྱིས་, ཐལ་བ་, ཐུད་, དབུས་ཕྲུག་, བུར་, མཐོན་མཐིང་, མར་གད་, མར་སར་, ཚོས་, ཟ་མ་བཅུད་, འཇིམ་པ་, འདག་, རིན་ཆེན་གསེར་, རྒྱགས་ཆུ་, རྡུལ་ཚོན་, རྣག་ཁྲག་, ལ་ཅ, ལ་ཆ, ལེ་བརྒན་, ཤ་ཁྲག་, སེང་ལྟེང་, སེང་ལྡེང་, སེར་, སྤེན་, and སྤེན་བད་.  

PROPN

n.prop

proper nouns

This tag is used for personal and place names.    

NOUN
NounType=Rel

n.rel

relator nouns

A relator noun is a noun, normally one syllable, which has a genitive before it and a spatial case (allative, locative, terminative) after it, e.g. deḥi naṅ na 'inside of that', deḥi druṅ du 'before him', deḥi ḥog tu 'under that', deḥi tshe na 'at that time'; relator nouns are not quantified and are not suffixed with adjectives, determiners or demonstratives. After identifying the class of relator nouns, these words are tagged as relator nouns even in syntactic contexts missing the genitive to the left or the spatial case to the right. For example, in the sentence deḥi tshe blon-po źig phyi-rol nas naṅ du ḥoṅs-pa las/ mi btson du bzuṅ-ba mthoṅ-ba daṅ / 'Then the minister went inside from outside and saw the man who had been taken to prison', the relator noun tshe is not followed by a spatial case and the relator noun naṅ is not preceded by a genitive. In the phrase bar ḥgaḥ 'sometimes' we do not consider bar a relator noun because it undergoes quantification, e.g. … bar ḥgaḥ ni gti-mug gi phyir lus btaṅ yaṅ chos kyi phyir bsod-nams kyi źiṅ daṅ lan ḥgaḥ yaṅ ma phrad-paḥi lus ḥdi ci ruṅ ? 'What is the use of this body which … sometimes has been used because of ignorance, but has not yet met an occasion (to serve) as a field of merit'. Words tagged with this tag include: ལྟ, ཕྱི, བཞིན་, ཚེ་, སྐད་, བར་, དུས་, ནང་, རྗེས་, དྲུང་, ལྟ་བུ, འདྲ་, འོག་, སྟེང་, ལྟ་བུ་, སླད་, ངང་, ཕྱོགས་, མཐའ་, སྒོ་, བཞིན, ས, སླ, མདུན, དབུས་, ཐོག་, ཁོང་, ཕྱི་བཞིན་, ཁ, ཆེད་, ཐོག་མ, སྐབས་, གོང་, རྟིང་, ཐད་, གན་, མཇུག་, འཕྲལ་, ལོགས་, ཁྲོད་, རྟེན་, སྨད་, ཁོངས་, གསེབ་, ཐ་མ, དུས, ཕུ་, དྲུད་, བསེབ་, མགོ, ཙ་, རིང་, རྩ, ལྟག་, སྐོར་, དཀྱིལ་, དབུང་, བྱང་, མཐའ་ལོགས་, ཚུན་, ཞོར་, འཁྲིས་, འོག་བ, རྒྱབ་, སྐེ་, སྐེད་, སྙིང་ཁ, སྟེངས་, སྟོད་. Several questions of delimiting relator nouns require further consideration, in particular, the approach to lta versus lta-bu.

VERB
VerbForm=Vnoun|VerbType=Aux

n.v.aux

Nominalization of auxiliary verbs

This tag is used for the nominalized version of verbs that would be tagged [v.aux]. Forms currently tagged with this tag include དགོས་པ་, ནུས་པ་, ཤེས་པ་, དཀའ་བ་, ཕོད་པ་, ཐག་པ་, and རན་པ་.  

AUX
VerbForm=Vnoun

n.v.cop

Nominalization of copulas

This tag is used for nominalized versions of verbs that would be tagged [v.cop], q.v.

VERB
Tense=Fut|VerbForm=Vnoun

n.v.fut

Nominalization of a future verb stem

This tag is used for the nominalized versions of future verb stems, i.e. a future verb stem followed by pa, ba, tshul, rgyu, mkhan, mi, or bya.

VERB
Tense=Fut,Past|VerbForm=Vnoun

n.v.fut.n.v.past

Nominalization of a future or past stem of a verb

This tag is used for nominalized forms of verb stems that would be tagged [v.fut.v.past], q.v.

VERB
Tense=Fut,Pres|VerbForm=Vnoun

n.v.fut.n.v.pres

Nominalization of a future or present stem of a verb

This tag is used for the nominalized version of a verb stem that would be tagged [v.fut.v.pres], q.v.

VERB
Mood=Imp|VerbForm=Vnoun

n.v.imp

Nominalization of an imperative stem of a verb

The imperative stem of a verb is supposed not to be nominalized. However, in rare cases there are nominalized forms which appear to come from imperative stems, and we mark them accordingly. The form ཐོངས་པ་ is the only case that has come up so far.

VERB
VerbForm=Vnoun

n.v.invar

Nominalization of [v.invar]

This tag is used for nominalized forms of verb stems that would be tagged [v.invar] q.v.

VERB
Polarity=Neg|VerbForm=Vnoun

n.v.neg

Nominalized form of the negative verb med

This tag is used for the nominalized forms of the inherently negated verb med, such as med-pa and med-sa.

VERB
Tense=Past|VerbForm=Vnoun

n.v.past

Nominalization of a past verb stem

This tag is used for the nominalized versions of past verb stems, i.e. a past verb stem followed by pa, ba, tshul, rgyu, mkhan, or mi.

VERB
Tense=Past,Pres|VerbForm=Vnoun

n.v.past.n.v.pres

Nominalization of a past or present stem of a verb

This tag is used for the nominalized version of a verb stem that would be tagged [v.past.v.pres], q.v. 

VERB
Tense=Pres|VerbForm=Vnoun

n.v.pres

Nominalization of a present verb stem

This tag is used for the nominalized versions of present verb stems, i.e. a present verb stem followed by pa, ba, tshul, rgyu, mkhan, or mi.
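
The nominalization tags above follow a regular naming pattern relative to the verb-stem tags they refer to: [v.past] corresponds to [n.v.past], and [v.fut.v.past] to [n.v.fut.n.v.past]. The Python sketch below encodes only that naming pattern. The helper name `nominalized_tag` and the set `NOMINALIZABLE` are hypothetical, and the sketch says nothing about how a nominalized form is recognized in running text.

```python
# Hypothetical helper: derive the nominalized tag name from a verb-stem tag.
# It only encodes the naming pattern visible in this tagset.

NOMINALIZABLE = {
    "v.aux", "v.cop", "v.fut", "v.fut.v.past", "v.fut.v.pres", "v.imp",
    "v.invar", "v.neg", "v.past", "v.past.v.pres", "v.pres",
}


def nominalized_tag(verb_tag: str) -> str:
    """Map e.g. 'v.fut.v.past' to 'n.v.fut.n.v.past'."""
    if verb_tag not in NOMINALIZABLE:
        raise ValueError(f"no nominalized counterpart listed for {verb_tag!r}")
    # Prefix every 'v.<stem>' component of the tag with 'n.'.
    return "n." + verb_tag.replace(".v.", ".n.v.")


print(nominalized_tag("v.pres"))        # n.v.pres
print(nominalized_tag("v.fut.v.past"))  # n.v.fut.n.v.past
```

The tag [v.cop.neg] is deliberately left out of the set, because this document lists no nominalized counterpart for it.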

PART
Polarity=Neg

neg

negation

The two negation prefixes ma and mi are classified together in their own category. The polar question prefix e is also categorized under this tag because it occupies the same syntactic position as ma and mi. For the two verbs min and med negation is inherent to their meaning; consequently 'neg' is also added to their POS-tags (i.e. min|v.cop.neg and med|v.neg, q.v.).

NUM
NumType=Card

num.card

cardinal numbers

In numbers we distinguish cardinals (gcig, gñis, gsum, etc.) and ordinals (daṅ-po, gñis-pa, gsum-pa, etc.). Other derivatives of numerals are treated according to their respective syntax, thus gcig-pa 'sole' is an adjective, gñi-ga 'both' is an indefinite pronoun, etc. In higher numbers each component digit is tagged separately; to do otherwise would prevent the computer from learning patterns, since it would have to learn each of the infinitely many possible cardinal numbers independently. When a numeral follows a noun we regard the two as separate words. In addition to obvious cases like mi lṅa 'five men', we also treat dkon-mchog gsum 'triratna' as two words. While it is true that one will almost never encounter any other numeral after the word dkon-mchog, this fact says as much about Buddhism as it does about syntax. The treatment of phrag well exemplifies our pragmatic attitude toward part-of-speech tagging. Although not a cardinal number itself, this syllable occurs inside cardinal numbers, effectively marking a certain place with a zero, e.g. stoṅ phrag drug cu '1060'. Because the internal structure of numerals is not of interest to our project and adding a new tag for phrag would add unnecessary complications to our tag-set, we treat phrag itself as a cardinal number. There are occasions when the morphology of a word suggests that it might contain a numeral (e.g. mṅon-sum 'real', phun-sum-tshogs 'marvelous'), but there is no reason to see such cases as synchronically analyzable. Numbers that end with -po are also tagged [num.card].

ADJ
NumType=Ord

num.ord

ordinal numbers

In numbers we distinguish cardinals (gcig, gñis, gsum, etc.) and ordinals (daṅ-po, gñis-pa, gsum-pa, etc.). Other derivatives of numerals are treated according to their respective syntax, thus gcig-pa 'sole' is an adjective, gñi-ga 'both' is an indefinite pronoun, etc. In higher numbers each component digit is tagged separately; to do otherwise would prevent the computer from learning patterns, since it would have to learn each of the infinitely many possible cardinal numbers independently. Some adjectives can be distinguished from ordinal numbers only in context. For example, in the phrase rdo-rje rtse-lṅa-pa 'a five-pronged vajra' the noun phrase syntax and the overall meaning of the passage dictate that lṅa-pa is part of the word rtse-lṅa-pa 'five-pronged' rather than a word 'fifth'.

PRON
PronType=Ind

p.indef

indefinite pronoun

The words la-la 'some', so-so 'each', and gñi-ga 'both' are used as indefinite pronouns in our pilot corpus. Words tagged with this tag include: ཁ་, སོ་སོ་, ལ་ལ་, རེ་རེ་, རེ་. (It looks like we are no longer putting gnyi-ga in here. I wonder why?) 

PRON
PronType=Int

p.interrog

interrogative pronouns

This is the tag used for interrogative pronouns such as su 'who', nam 'when', and gaṅ 'where'. Words tagged with this tag include: ཅི་, གང་, ཇི་, ཅི, སུ་, ནམ་, ཇི་སྲིད་, ག་, ག་རེ་, ཇི་ཙམ་, ཅི་སྲིད་, and ཙུག་. 

PRON
PronType=Prs

p.pers

personal pronouns

This tag is used for personal pronouns. Words tagged with this tag include: བདག་, ཁྱོད་, ང་, ཁྱེད་, བདག་ཅག་, ངེད་, ཁོ་, ཁོང་, རང་རེ་, ཁོ་བོ་, རང་, མོ་, ཁོ་མོ་, ཁྱེད་ཅག་, འོ་སྐོལ་, འུ་ཅག་, འོ་ཅག་. The word རང་ is only tagged as [p.pers] when it functions as a full independent pronoun in its own right; in ང་རང་ and ཁྱེད་རང་ the word རང་ is tagged [p.refl]. The word བདག་ is tagged as [p.pers] when it is a humble equivalent of ང་. When this word means 'lord' or 'the self' (philosophy) it is tagged [n.count] and when it means 'himself' it is tagged as [p.refl].
  

PRON
Reflex=Yes

p.refl

reflexive pronouns

This tag is used for the words རང་ and བདག་ when they are used as reflexive pronouns, i.e. in the meaning 'himself, herself'. The word རང་ is tagged as [p.refl] when it functions as a reflexive, in a sentence (invented) such as ཁོས་ དེ་རིང་ རང་ གི་ ཟན་ ཟོས་ སོ་ 'Today, he ate his own food' or in long form personal pronouns such as ང་རང་ and ཁྱེད་རང་. When རང་ is a full independent pronoun meaning 'I' or 'you' it is tagged [p.pers]. The word རང་ when it appears immediately after a personal pronoun is always tagged [p.refl]. The word བདག་ is tagged as [p.pers] when it is a humble equivalent of ང་; when it means 'lord' or 'the self' (philosophy) it is tagged [n.count] and when it means 'himself' it is tagged as [p.refl].
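
The one purely positional rule in this entry, that རང་ directly after a personal pronoun is always [p.refl], can be sketched in Python as follows. The function name `tag_rang` and its argument are hypothetical. In every other context the choice between [p.pers] and [p.refl] depends on meaning, so the sketch returns None there instead of guessing.

```python
from typing import Optional


def tag_rang(previous_tag: Optional[str]) -> Optional[str]:
    """Suggest a tag for རང་ from the tag of the word directly before it.

    Returns 'p.refl' when the preceding word is tagged [p.pers].
    Returns None otherwise: the decision then depends on meaning and is
    left to the annotator.
    """
    if previous_tag == "p.pers":
        return "p.refl"
    return None


print(tag_rang("p.pers"))   # p.refl
print(tag_rang("n.count"))  # None
```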
  

PUNCT

punc

punctuation

While a tsheg is considered part of the preceding syllable, all other punctuation marks are tagged as punctuation. Such marks include །, ༑, །།, ༄༅༅།, and །།།།. Punctuation marks are broken apart to aid tokenization, e.g. we tag ༄|punc ༅|punc ༅|punc །|punc. 
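
The splitting of punctuation described above can be sketched in Python. This is a minimal illustration, not the project's tokenizer: the function names are hypothetical, and the `form|tag` output format simply mirrors the notation used in this entry.

```python
from typing import List, Tuple

TSHEG = "\u0f0b"  # the tsheg stays attached to the preceding syllable


def tag_punctuation_run(run: str) -> List[Tuple[str, str]]:
    """Split a run of punctuation marks into one (mark, 'punc') pair each."""
    if TSHEG in run:
        raise ValueError("a tsheg belongs to the preceding syllable")
    return [(mark, "punc") for mark in run]


def format_tokens(tokens: List[Tuple[str, str]]) -> str:
    """Render (form, tag) pairs in the 'form|tag' notation."""
    return " ".join(f"{form}|{tag}" for form, tag in tokens)


print(format_tokens(tag_punctuation_run("༄༅༅།")))
# ༄|punc ༅|punc ༅|punc །|punc
```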

X

skt

Sanskrit or other metalinguistic elements

This tag was invented for the odd Sanskrit word that occurs in Tibetan (e.g. in mantras) not as a syntactic word in its own right, but as something metalinguistic at least vis à vis Tibetan. On analogy to this use of Sanskrit words in Tibetan we have extended the use of the tag to include all metalinguistic forms, e.g. words used not as the words they are but as the name of that word, or the name of a letter, words in Roman script, etc.

VERB
VerbType=Aux

v.aux

Auxiliary verbs

This tag is used for auxiliary verbs. By this we understand only those verbs that show no morphological stem variation and may occur directly after another verb stem. The words we tag with this tag are: ཐག་, དགོས་, ནུས་, མོད་, འདོད་, ཤེས་, སྲིད་, ཤེས་, རན་, གྲགས་, ཐང་, and ཕོད་. Note that this set is far smaller than the total number of verbs that might be considered auxiliary verbs in classical or modern Tibetan. In fact, we now feel that it was a mistake to give these verbs a separate encoding; instead they should have been tagged on the basis of their syntax (e.g. mi [neg] nus [v.pres.v.fut]) and the question of auxiliation and other syntactic uses of verb stems in Tibetan addressed systematically in a future project. However, once we arrived at this understanding it was inconvenient to change our approach.

AUX

v.cop

Copula verbs

This tag is used for copula verbs, in particular yin and lags which do not distinguish various stems. In our current view, it would have been better not to have included this tag, but rather to have tagged these verbs according to their syntactic position, e.g. ma [neg] yin [v.past], yin [v.past.v.pres] nas [cv.ela], etc. However, when we made this determination it was already inconvenient to go back on the inclusion of the tag [v.cop].

AUX
Polarity=Neg

v.cop.neg

The negative copula verb

This tag is used for the inherently negated copula verb min.

VERB
Tense=Fut

v.fut

Future stem of a verb

This tag is used for future stem verbs, e.g. bya, gzuṅ, bsgrub.

VERB
Tense=Fut,Past

v.fut.v.past

Future or past stem of a verb

This tag is used for verb stems that are both morphologically and syntactically ambiguous between the past and future, e.g. bskol.

VERB
Tense=Fut,Pres

v.fut.v.pres

Future or present stem of a verb

This tag is used for verb stems that are both morphologically and syntactically ambiguous between the future and present, e.g. gśegs in the phrase mi gśegs.

VERB
Mood=Imp

v.imp

Imperative stem of a verb

This tag is used for imperative verb stems, e.g. byos, zuṅ, sgrubs.

VERB

v.invar

Present, past, or future stem of a verb

This tag is used for verb stems that are morphologically and syntactically ambiguous among present, past, and future, e.g. gśegs in the phrase gśegs so.

VERB
Polarity=Neg

v.neg

The negative verb med

This tag is used for the inherently negated verb med.

VERB
Tense=Past

v.past

Past stem of a verb

This tag is used for past stem verbs, e.g. byas, bzuṅ, bsgrubs.

VERB
Tense=Past,Pres

v.past.v.pres

Past or present stem of a verb

This tag is used for verb stems that are both morphologically and syntactically ambiguous between the past and present, e.g. gśegs in the phrase gśegs nas.

VERB
Tense=Pres

v.pres

Present stem of a verb

This tag is used for present stem verbs, e.g. byed, ḥdzin, sgrub.
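
Each entry in this tagset pairs a project tag with a part-of-speech category and, where present, a feature string. One way to make that pairing usable by a program is a lookup table, sketched here in Python for a handful of the entries above. The names `TAG_TO_UD` and `ud_columns` are hypothetical, and writing an empty feature string as `_` is an assumption borrowed from the CoNLL-U convention, not something this document specifies.

```python
from typing import Dict, Tuple

# A few (category, features) pairs copied from the entries above.
TAG_TO_UD: Dict[str, Tuple[str, str]] = {
    "case.gen": ("ADP", "Case=Gen"),
    "cv.gen": ("SCONJ", "Case=Gen"),
    "n.prop": ("PROPN", ""),
    "v.cop.neg": ("AUX", "Polarity=Neg"),
    "v.past": ("VERB", "Tense=Past"),
    "v.pres": ("VERB", "Tense=Pres"),
}


def ud_columns(tag: str) -> Tuple[str, str]:
    """Return (category, features) for a tag, with '_' for no features."""
    category, features = TAG_TO_UD[tag]  # raises KeyError for unlisted tags
    return category, features or "_"


print(ud_columns("cv.gen"))  # ('SCONJ', 'Case=Gen')
print(ud_columns("n.prop"))  # ('PROPN', '_')
```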