Background: We can be pretty sure that unknown words belong to open classes. We therefore assign "out of vocabulary" words the default tag [n.invar], so that they will be subject to subsequent rules which apply to nominals. 
Rule: Replace the "out of vocabulary" tag [undefined] with [n.invar].
Background: Our tag set is now more or less fixed. It is easy for a human user to introduce a new tag by accident, such as [n.v.fut.v.pres] instead of [n.v.fut.n.v.pres] or [case.al] instead of [case.all]. In order to preclude this phenomenon, we throw an error when a tag is not in our approved list.
Rule: If a word has a hypothetical tag that is not in our list of allowed tags, then throw an error.
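The two rules above could be sketched as a single normalization pass. This is only a minimal sketch, assuming that the candidate tags of a word are held in a Python set; the name normalize_tags is hypothetical, and ALLOWED_TAGS is a small illustrative subset rather than the project's actual list.

```python
# Illustrative subset only; the real approved list is much longer.
ALLOWED_TAGS = {"n.invar", "n.count", "adj", "case.all", "v.past", "n.v.fut.n.v.pres"}


def normalize_tags(word, tags):
    """Replace [undefined] with [n.invar], then reject any unapproved tag."""
    tags = {"n.invar" if t == "undefined" else t for t in tags}
    unknown = sorted(tags - ALLOWED_TAGS)
    if unknown:
        # A typo such as [case.al] for [case.all] ends up here.
        raise ValueError(f"{word}: tags not in the approved list: {unknown}")
    return tags
```

Raising an exception, rather than silently dropping the tag, matches the stated aim of stopping accidental tags before they reach the lexicon.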
Background: We find in practice that it is difficult for human annotators not to analyze some intransitive verbs, or final constituents of compounds, as adjectives (e.g. maṅ 'be many', che 'be big', -chen 'big' [in compound], etc.). In order to prevent the continued reintroduction of such analyses into the lexicon, this rule deletes adjectival readings from most monosyllabic words as specified in the description of our tag set.
Rule: If a monosyllabic word other than gaṅ or mchog is associated with the tag [adj], then delete the tag [adj].
Background: The dictionaries sometimes imply that med-pa is an adjectival suffix (e.g. nor-med-pa 'impoverished'). Our project treats such cases as small embedded clauses (nor [n.count] med-pa [n.v.neg]), but annotators may mistakenly follow the dictionaries, so it is helpful to add a rule that flags mistaken instances.
Rule: If a word ending in -མེད་པ(་) is associated with the tag [adj], then delete the tag [adj].
Background: The tag [v.invar] is used for verb stems that cannot be disambiguated among future, past, and present; for example, in the phrase gśegs so the verb gśegs could be any tense (cf. present byed do, past byas so, and future byao). A rule replaces each [v.invar] with "[v.fut] ~ [v.past] ~ [v.pres]". An exactly parallel argument applies for [n.v.invar]. 
Rule: Replace [v.invar] and [n.v.invar] with "[v.fut] ~ [v.past] ~ [v.pres]" and "[n.v.fut] ~ [n.v.past] ~ [n.v.pres]" respectively.
Background: The tag [v.fut.v.past] is used for verb stems that cannot be disambiguated between future and past; for example at the end of a sentence (i.e. before a śad) the verb form bsgyur is either a future (cf. bya།) or a past (cf. byas།). A rule replaces each [v.fut.v.past] with "[v.fut] ~ [v.past]". An exactly parallel argument applies for [n.v.fut.n.v.past].
Rule: Replace [v.fut.v.past] and [n.v.fut.n.v.past] with "[v.fut] ~ [v.past]" and "[n.v.fut] ~ [n.v.past]" respectively.
Background: The tag [v.fut.v.pres] is used for verb stems that cannot be disambiguated between future and present; for example, in the phrase mi [neg] gśegs the verb gśegs could be either present (cf. mi byed) or future (cf. mi bya). (Both ma gśegs and ma byas are unambiguous pasts.) A rule replaces each [v.fut.v.pres] with "[v.fut] ~ [v.pres]". An exactly parallel argument applies for [n.v.fut.n.v.pres].
Rule: Replace [v.fut.v.pres] and [n.v.fut.n.v.pres] with "[v.fut] ~ [v.pres]" and "[n.v.fut] ~ [n.v.pres]" respectively.
Background: The tag [v.past.v.pres] is used for verb stems that cannot be disambiguated between past and present; for example, in the phrase gśegs nas [cv.ela], the verb gśegs is either a past (cf. byas nas) or a present (cf. byed nas). A rule replaces each [v.past.v.pres] with "[v.past] ~ [v.pres]". An exactly parallel argument applies for [n.v.past.n.v.pres].
Rule: Replace [v.past.v.pres] and [n.v.past.n.v.pres] with "[v.past] ~ [v.pres]" and "[n.v.past] ~ [n.v.pres]" respectively.
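The four replacement rules for ambiguous stems can be read as one lookup table. The following is a minimal sketch under the assumption that a word's tags are kept as an ordered list; the names EXPANSIONS and expand are hypothetical.

```python
# Each ambiguous-stem tag maps to the alternatives it stands for.
EXPANSIONS = {
    "v.invar": ["v.fut", "v.past", "v.pres"],
    "v.fut.v.past": ["v.fut", "v.past"],
    "v.fut.v.pres": ["v.fut", "v.pres"],
    "v.past.v.pres": ["v.past", "v.pres"],
    "n.v.invar": ["n.v.fut", "n.v.past", "n.v.pres"],
    "n.v.fut.n.v.past": ["n.v.fut", "n.v.past"],
    "n.v.fut.n.v.pres": ["n.v.fut", "n.v.pres"],
    "n.v.past.n.v.pres": ["n.v.past", "n.v.pres"],
}


def expand(tags):
    """Return the tag list with every ambiguous-stem tag spelled out."""
    out = []
    for tag in tags:
        for alternative in EXPANSIONS.get(tag, [tag]):
            if alternative not in out:  # avoid duplicates after expansion
                out.append(alternative)
    return out
```

Tags that are not in the table, such as [n.count], pass through unchanged.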
Background: In our understanding of Tibetan morphosyntax all verb stems are monosyllabic. Thus, if the rule-based tagger suggests tagging a word of two or more syllables as a verb stem, this must have been introduced via a mistake in the training data.
Rule: If a word has more than one syllable then delete all [v.xxx] tags from it.
Background: If verb stems always consist of a single syllable, then it follows automatically that verbal nouns must be disyllables, the first syllable of which is a verb stem, and the second syllable of which is the nominalization suffix that takes the forms -pa and -ba. Later documents such as the Mi la ras pai rnam thar have other verbal noun suffixes such as -mkhan, -sa, -tshul, and -bya.
Rule: If a word has more than two syllables remove the analysis [n.v.xxx].
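The two syllable-count rules above might be sketched as follows. This is an illustration only: it assumes that words are given in transliteration with syllables joined by hyphens (as in byed-pa) and that tags are held in a set; filter_by_length is a hypothetical name.

```python
def filter_by_length(word, tags):
    """Drop verb-stem and verbal-noun readings that the word is too long to bear."""
    # Assumption: syllables inside a word are joined by hyphens.
    syllables = len(word.split("-"))
    if syllables > 1:
        # Verb stems are monosyllabic, so no [v.xxx] tag can apply.
        tags = {t for t in tags if not t.startswith("v.")}
    if syllables > 2:
        # Verbal nouns are stem plus suffix, so no [n.v.xxx] tag can apply.
        tags = {t for t in tags if not t.startswith("n.v.")}
    return tags
```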
Background: We use the tag [dunno] for words to which we are not yet prepared to assign a part-of-speech tag. For the rule-based tagger to suggest [dunno] as an analysis would be equivalent to offering no analysis at all; the presence of [dunno] associated with some words would interfere with the correct performance of rules that make use of unambiguous contexts. Consequently, we remove [dunno] wherever another analysis is available.
Rule: Remove [dunno] if there are other tags.
Background: The syllable gras can be either a noun [n.count] 'number' or an alternate present of the verb bgraṅ 'count'. The ambiguity continues with mi gras, which could either be 'a number (of) people' or 'not counting'. However, if gras is followed by med-pa then it forms a small clause meaning 'numberless' and mi gras med-pa means 'numberless people'. Thus, it is possible to write a rule that disambiguates gras in this context.
Rule: Assign gras the interpretation [n.count] when it occurs directly before med-pa.
Background: The syllable ṅa can be a pronoun [p.pers] 'I', a noun [n.count] 'das Ich' (which appears to occur only in the phrase ṅa rgyal 'be proud'), or a letter of the alphabet [skt]. If ṅa is not followed by rgyal-(ba), then the interpretation [n.count] may be removed.
Rule: If ṅa is not followed by rgyal-(ba) then delete the interpretation [n.count] from ṅa.
Background: The syllable da is normally the temporal adverb 'now', but can have rarer tags such as [skt] when it occurs in a Sanskrit word. However, when followed by lta (with or without a final tsheg) the syllable da can securely be identified as [adv.temp]. 
Rule: Assign da the interpretation [adv.temp] when it occurs directly before lta.
Background: The syllable ches is either tagged as a past tense verb (e.g. ches te) or as an intensive adverb (ches maṅ du). (These are perhaps likely to be the same thing etymologically). As the adverb must modify a verb (or perhaps an adjective) the interpretation [adv.intense] can be precluded before words that lack these interpretations.
Rule: Remove the interpretation [adv.intense] from the syllable ches if the word following it has no [v.xxx], [n.v.xxx], or [adj] interpretation.
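A rule of this kind, which inspects the candidate tags of the following word, might look like the sketch below. It assumes that a sentence is a list of (form, tags) pairs with tags held in a set; the function names are hypothetical. Where ches is the last word and nothing follows, the sketch leaves it untouched, which is an assumption and not something the rule states.

```python
def has_predicate_reading(tags):
    """True if any tag is [adj], [v.xxx], or [n.v.xxx]."""
    return any(t == "adj" or t.startswith(("v.", "n.v.")) for t in tags)


def prune_ches(sentence):
    """Remove [adv.intense] from ches when the next word cannot be modified by it."""
    result = []
    for i, (form, tags) in enumerate(sentence):
        if form == "ches" and i + 1 < len(sentence):
            next_tags = sentence[i + 1][1]
            if not has_predicate_reading(next_tags):
                tags = tags - {"adv.intense"}
        result.append((form, tags))
    return result
```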
Background: A specific rule is necessary to treat śin-tu. We treat śin-tu as an infinitive construction, although śin is not otherwise attested as a verb, which is why it is not tagged like one. In our system tu and du are to be tagged as converbs after śin.
Rule: If du or tu follows śin [adv.intense] then [case.term] can be deleted as an option.
Background: A specific rule is necessary to treat rab-tu. We treat rab-tu as an infinitive construction, although rab is not otherwise attested as a verb, which is why it is not tagged like one. In our system tu and du are to be tagged as converbs after rab.
Rule: If du or tu follows rab then rab may be tagged as [adv.intense] and [case.term] can be deleted as an option from du/tu.
Background: A later rule removes [v.past] as an interpretation of a verb that is functioning as the subordinate verb in the indirect infinitive construction; that rule uses an immediately following matrix verb as a contextual trigger. The intransitive verb rtag 'last a while, be permanent' often occurs separated from its matrix verb in this construction, probably because it is functioning adverbially across the whole clause. Nonetheless, as we do analyze rtag tu as subordinate in the indirect infinitive, [v.past] can also be precluded from rtag tu when it is not immediately followed by a matrix verb.
Rule: In the phrase rtag tu specify rtag [v.fut.v.pres] tu [cv.term].
Background: The syllable rgya has several possible tags. However, in the combination rgya che r 'extensively, such that the size is big', it should always be tagged [n.count].  
Rule: Assign rgya the interpretation [n.count] when it occurs directly before che r.
Background: The syllable rgya can be a verb 'spread' as well as a related noun 'size' among other things. When followed by che or che-ba, this syllable is certainly the noun 'size'. 
Rule: In the pattern rgya che(-ba) delete from rgya any [v.xxx] tags it may have and the tag [adv.temp].
Background: The syllable sems can be a verb 'think' as well as a related noun 'mind'. When followed by ḥkhrugs this syllable is certainly the noun 'mind'.
Rule: In the pattern sems ḥkhrugs delete from sems any [v.xxx] tags it may have.
Background: The syllable byin can be a verb 'give' as well as a related noun 'glory'. When followed by che-ba 'great' this syllable is certainly the noun 'glory'. 
Rule: In the pattern byin che-ba delete from byin any [v.xxx] tags it may have.
Background: One passage in the Mdzaṅs blun reads འགུལ་ [v.invar] ཞིག་ [cv.impf] ལྡེག་ [v.pres], in which ཞིག་ is a mistake for ཞིང་. In order to prevent this mistake from making ཞིག་ [cv.impf] possible in many inappropriate places, we limit the interpretation to positions between two unambiguous verbs.
Rule: If ཞིག་ is not both preceded and followed by an unambiguous verb stem, then delete from it the tag [cv.impf].
Background: In one passage in the Bu ston བསམ་ གྱིས་ མི་ ཁྱབ་ བོ is misspelled བསམ་ གྱི་ མི་ ཁྱབ་ བོ. This mistaken passage causes problems because མི་ appears to come after a genitive and thus a later rule specifies it as a noun. The simplest way to deal with this mistaken passage (which may occur elsewhere since the spelling mistake is an easy one to make) is to specify the tagging for each word in the phrase, i.e. བསམ་ [n.count] གྱི་ [case.agn] མི་ [neg] ཁྱབ་ [v.fut.v.pres] བོ [cv.fin]. Nonetheless, we specify ཁྱབ་ as [v.fut][v.pres] rather than [v.fut.v.pres], because there are no verb tags for ambiguous stems at this point in the workflow.
Rule: In the phrase བསམ་ གྱི་ མི་ ཁྱབ་ བོ tag བསམ་ [n.count] གྱི་ [case.agn] མི་ [neg] ཁྱབ་ [v.fut][v.pres] བོ [cv.fin].
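Rules that fix the tagging of every word in a set phrase can be implemented as a sliding-window match over the word forms. The sketch below assumes a sentence is a list of (form, tags) pairs with tags held in a set, and it reads [v.fut][v.pres] as two alternative tags on the same word; both points are assumptions, as are the names PHRASE, FIXED, and tag_fixed_phrase.

```python
PHRASE = ["བསམ་", "གྱི་", "མི་", "ཁྱབ་", "བོ"]
FIXED = [{"n.count"}, {"case.agn"}, {"neg"}, {"v.fut", "v.pres"}, {"cv.fin"}]


def tag_fixed_phrase(sentence):
    """Overwrite the candidate tags of each word wherever PHRASE occurs."""
    forms = [form for form, _ in sentence]
    result = list(sentence)
    for i in range(len(forms) - len(PHRASE) + 1):
        if forms[i:i + len(PHRASE)] == PHRASE:
            for j, tags in enumerate(FIXED):
                result[i + j] = (forms[i + j], set(tags))
    return result
```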
Background: The syllable gtan can be a verb 'tighten' (a door) as well as a related noun 'order, system'. In combination with la and a form of the verb 'fall' (e.g. gtan la phab) a phrasal verb is formed meaning 'ascertain'. In this context gtan is always interpretable as a noun.
Rule: In the patterns gtan la dbab (pa), gtan la pheb (pa), gtan la phab (pa), and gtan la 'bebs (pa) delete from gtan any [v.xxx] tags it may have.
Background: The syllable re has many possible analyses. In the phrase re źig 'one time' (also spelled re cig and re śig) we tag it [n.count]. More rarely in this pattern it could also be an imperative or present used for the prohibitive, e.g. ṅa la mi ma re źig byas-pa s 'let me not be a man'.
Rule: In the patterns re źig, re cig, and re śig remove from re the tags [d.det], [p.indef], [num.card], [v.fut], [v.past].  (Alternatively, only allow [n.count], [v.imp] or [v.pres]).
Background: The syllable re has many possible analyses. In the phrase re źig 'one time' we tag it [n.count]; we do not tag it this way otherwise.
Rule: If re is not followed by źig, cig, or śig remove the tag [n.count]. 
Background: The syllable re has many possible analyses. In the phrase źo re źo do 'one or two ounces' we tag it [num.card]. We presume that this construction is generalizable to a pattern X re X do.
Rule: In the pattern X re X do, tag re as [num.card].
Background: The syllable re has many possible analyses. In the phrase źo re źo do 'one or two ounces' we tag it [num.card]. We presume that this construction is generalizable to a pattern X re X do.
Rule: If re occurs otherwise than in the pattern X re X do, remove the tag [num.card] from re.
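The pair of rules for X re X do can be sketched as one pass that either assigns or removes [num.card]. The sketch assumes a sentence is a list of (form, tags) pairs with tags held in a set, and it takes "X re X" to mean that the word forms on either side of re are identical; resolve_re_numeral is a hypothetical name.

```python
def resolve_re_numeral(sentence):
    """Tag re as [num.card] inside X re X do; remove [num.card] from re elsewhere."""
    result = list(sentence)
    for i, (form, tags) in enumerate(sentence):
        if form != "re":
            continue
        in_pattern = (
            i >= 1
            and i + 2 < len(sentence)
            and sentence[i + 1][0] == sentence[i - 1][0]  # the same X on both sides
            and sentence[i + 2][0] == "do"
        )
        result[i] = (form, {"num.card"} if in_pattern else tags - {"num.card"})
    return result
```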
Background: The syllable daṅ has several possible tags, including [case.ass], [cv.ass], and [v.invar] 'clear'. Thus, the sequence of syllables ma daṅ could mean 'and mother' [n.count] [case.ass] or 'not clear' [neg] [v.past]. In practice a particular passage will not be ambiguous because of the surrounding context. In particular, certain verbs regularly take the associative case in their rection. These verbs include mjal 'meet', phrad 'meet', ldan 'have', ḥdra 'be similar', bsre-ba 'mix with', mtshuṅs 'be similar', bsdos, and bcas 'have'. Immediately before these verbs, and their nominalized equivalents, the interpretation of daṅ as a verb can be precluded.
Rule: When daṅ occurs before phrad(-pa), mjal(-ba), ldan(-pa), ḥdra(-ba), bsre(-ba), mtshuṅs(-pa), bsdos(-pa) or bcas(-pa) remove from it all [v.xxx] tags.
Background: The syllable daṅ has several possible tags, including [case.ass], [cv.ass], and [v.invar] 'clear'. Thus, the sequence of syllables ma daṅ could mean 'and mother' [n.count] [case.ass] or 'not clear' [neg] [v.past]. In practice a particular passage will not be ambiguous because of the surrounding context. In particular, between two unambiguous nouns the associative case is essentially guaranteed. (This rule will not specify associative case per se, because this is done by a more general rule later on).
Rule: When daṅ occurs between two words that both have only nominal interpretations (i.e. any tags of the types n.xxx [including n.v.xxx], p.xxx, num.xxx, and d.xxx but no other tags), then delete from daṅ any [v.xxx] tags.
Background: The syllable phyin can either be the past tense verb of 'go', or can be a noun 'paramita', in which case it is normally followed by the number drug. In fact, we perhaps should have taken phyin-drug as a single word, but this would violate how we normally handle numbers. It is safe to assume that phyin is a noun if and only if it is followed by drug.
Rule: When phyin occurs before drug remove from it the tag [v.past].
Background: The syllable phyin can either be the past tense verb of 'go', or can be a noun 'paramita', in which case it is normally followed by the number drug. In fact, we perhaps should have taken phyin-drug as a single word, but this would violate how we normally handle numbers. It is safe to assume that phyin is a noun if and only if it is followed by drug.
Rule: If phyin occurs before something other than drug, remove the tag [n.count] from it.
Background: The syllable na can be many things, including the locative case marker, locative converb, and the verb 'be sick'. The locative converb will perforce appear after a verb, and often before a punctuation marker or a focus clitic. In the same context, the verb 'sick' would be unlikely. (It is hard to imagine how 'sick' could be preceded directly by a verb; if such an occasion does arise this rule can be modified). So, after a verb at the end of a clause, na can be precluded as a verb.
Rule: If na appears after a verb [v.xxx] and before punctuation [punc] or a focus clitic [cl.focus], then delete any [v.xxx] tags from na.
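A rule that looks at both neighbours of a word can be sketched as below. The sketch assumes a sentence is a list of (form, tags) pairs with tags held in a set. It treats "after a verb" as "after a word with at least one [v.xxx] tag", which is one possible reading of the rule and should be tightened if only unambiguous verbs are meant; prune_na is a hypothetical name.

```python
def prune_na(sentence):
    """Delete [v.xxx] tags from na between a verb and punctuation or a focus clitic."""
    result = list(sentence)
    for i in range(1, len(sentence) - 1):
        form, tags = sentence[i]
        if form != "na":
            continue
        prev_tags = sentence[i - 1][1]
        next_tags = sentence[i + 1][1]
        after_verb = any(t.startswith("v.") for t in prev_tags)
        before_boundary = bool(next_tags & {"punc", "cl.focus"})
        if after_verb and before_boundary:
            result[i] = (form, {t for t in tags if not t.startswith("v.")})
    return result
```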
Background: The syllable na has several possible tags, including [case.loc], [cv.loc], and [v.invar] 'sick'. After relator nouns na is very unlikely to mean 'sick'. 
Rule: When na occurs after ḥi tshe, ḥi dus, or ḥi naṅ remove from it all [v.xxx] tags.
Background: The syllable na has several possible tags, including [case.loc], [cv.loc], and [v.invar] 'sick'. Before the verb bźugs 'remain, stay', na is very unlikely to mean 'sick'. (This rule cannot be applied to ḥdug 'stay' because it also has an auxiliary function. It cannot be applied to gnas, because gnas has not yet been disambiguated among its verbal and nominal uses.)
Rule: When na occurs before bźugs remove from it all [v.xxx] tags.
Background: The syllable na has several possible tags, including [case.loc], [cv.loc], and [v.invar] 'sick'. In certain circumscribed situations that anticipate double case marking, the interpretation of na as a verb can be precluded. These include དེ་བ ས་ ན, གཅིག་ ཏུ་ ན, གཉིས་ སུ་ ན, and དཔེ ར་ ན.
Rule: In the combinations དེ་བ ས་ ན(་), གཅིག་ ཏུ་ ན(་), གཉིས་ སུ་ ན(་), and དཔེ ར་ ན(་) specify ན as [case.loc].
Background: By definition [n.rel] cannot be quantified or modified; consequently, if a word has both [n.rel] and [n.count], and is quantified or modified, then the reading [n.rel] can be removed.
Rule: If a word 'w' has both [n.count] and [n.rel] as tags and this word is followed by a word with an unambiguous analysis [adj] or [num.ord] then delete the analysis [n.rel] from 'w'.
Background: By definition a relator noun is either preceded directly by a noun or by a genitive case marker. Thus, if a word that is either a relator noun or a count noun is preceded by an associative case marker, then the word in question must be interpreted as a count noun and not as a relator noun.
Rule: If a word has both the tags [n.count] and [n.rel] and is preceded by daṅ, then delete the tag [n.rel] from the word in question.
Background: By definition a relator noun is either preceded directly by a noun or by a genitive case marker. Thus, if a word that is either a relator noun or a count noun is preceded by a semi-final converb, then the word in question must be interpreted as a count noun and not as a relator noun.
Rule: If a word that can be either [n.rel] or [n.count] is preceded by [cv.sem], then remove the tag [n.rel] from the word in question.
Background: The sequence skad has the possible tags [n.count] and [n.rel]. In the very frequent expression ḥdi skad ces, it should always be tagged as [n.rel].
Rule: In the phrase ḥdi skad ces tag skad as [n.rel].
Background: The sequence de has the possible tags [d.dem] and [cv.sem]. The sequence skad has the possible tags [n.count] and [n.rel]. In the very frequent expression de skad smras the sequence de is always [d.dem], the sequence skad is always [n.rel], and the sequence smras is always [v.past].
Rule: Specify that the sequence de skad smras is de [d.dem] skad [n.rel] smras [v.past].
Background: In general the syllable skad can be a noun 'word', a relator noun, or a verb 'say'. The sequence skad cig is thus in theory either an imperative 'speak!' or 'a word'. However, as a locution figée the pair skad cig means 'a moment'. Thus, in this context skad can be specified as a noun and cig as an indefinite marker.
Rule: Specify that the sequence skad cig is skad [n.count] cig [d.indef].
Background: In general the syllable skad can be a noun 'word, language', a relator noun, or a verb 'say'. The syntax of the sequence rgya-gar gyi skad du implies that skad is a relator noun, but in fact in this case it means 'language'. 
Rule: Specify that in the sequence rgya-gar gyi skad du the word skad is [n.count].
Background: The form lta can have several possible tags, including [n.rel] and [v.pres]. When lta appears in de lta r, ji lta r, or ḥdi lta r then it is unambiguously [n.rel]. In addition the <r(a)> ར་, which has the possible tags [n.count], [case.term], and [cv.term] can be specified as [case.term].
Rule: Assign lta the tag [n.rel] and assign <r(a)> ར་ the tag [case.term] in the contexts de lta r, ji lta r, and ḥdi lta r.
Background: The sequence skor has among its possible tags [n.rel] and [v.imp]. When skor follows a genitive, this word is unambiguously [n.rel]. The rule is stated generally since other words might also be ambiguous between [n.rel] and [v.imp].
Rule: If a word with the part-of-speech tags [n.rel] and [v.imp] (such as skor), follows after a sequence of an unambiguous noun ([n.count], [n.mass], [n.prop], [adj]) and a form of the genitive (kyi, gi, gyi, ḥi), then remove the tag [v.imp] from the word in question.
Background: The sequence skor has among its possible tags [n.rel] and [v.imp]. The genitive converb does not come after the imperative verb stem, and neither does the semifinal converb or the final converb; consequently, in the sequences skor kyi, skor te, and skor ro the syllable skor is unambiguously [n.rel]. In the event that there are other ambiguous [n.rel]/[v.imp] words the rule is stated generally.
Rule: If a word has only the two tags [n.rel] and [v.imp] and this word is followed by kyi, or te, or any one of the syllables go, to, do, bo, mo, ro, lo, so when followed by a śad, then delete the tag [v.imp].
Background: The sequence chos has among its possible tags [n.count] and [v.imp]. When chos follows a genitive (e.g. saṅs-rgyas kyi chos), this word is unambiguously [n.count]. The rule is stated generally since other words might also be ambiguous between [n.count] and [v.imp].
Rule: If a word with the part-of-speech tags [n.count] and [v.imp] (such as chos), follows after a sequence of an unambiguous noun ([n.count], [n.mass], [n.prop], [adj]) and a form of the genitive (kyi, gi, gyi, ḥi), then remove the tag [v.imp] from the word in question.
Background: The sequence chos has among its possible tags [n.count] and [v.imp]. The genitive converb does not come after the imperative verb stem, and neither does the semifinal converb or the final converb; consequently, in the sequences chos kyi, chos te, and chos so the syllable chos is unambiguously [n.count]. In the event that there are other ambiguous [n.count]/[v.imp] words the rule is stated generally.
Rule: If a word has only the two tags [n.count] and [v.imp] and this word is followed by kyi, or te, or any one of the syllables go, to, do, bo, mo, ro, lo, so when followed by a śad, then delete the tag [v.imp].
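Because this rule has the same shape as the corresponding rule for [n.rel]/[v.imp], both could share one function that takes the nominal tag as a parameter. The sketch below assumes a sentence is a list of (form, tags) pairs with tags held in a set, and that a śad appears as its own token with the form །; these are assumptions, and drop_imperative is a hypothetical name.

```python
FINAL_CONVERBS = {"go", "to", "do", "bo", "mo", "ro", "lo", "so"}
SHAD = "།"


def drop_imperative(sentence, noun_tag):
    """Delete [v.imp] from words tagged exactly {noun_tag, v.imp} before a converb."""
    result = list(sentence)
    for i in range(len(sentence) - 1):
        form, tags = sentence[i]
        if tags != {noun_tag, "v.imp"}:
            continue
        next_form = sentence[i + 1][0]
        # A final converb only counts when a śad follows it.
        final_before_shad = (
            next_form in FINAL_CONVERBS
            and i + 2 < len(sentence)
            and sentence[i + 2][0] == SHAD
        )
        if next_form in {"kyi", "te"} or final_before_shad:
            result[i] = (form, {noun_tag})
    return result
```

Calling it once with "n.rel" and once with "n.count" would cover both rules without duplicating the context test.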
Background: The sequence chos has among its possible tags [n.count] and [v.imp]. In the phrase chos ñan-(pa) the word chos is certainly a noun. 
Rule: In the sequence ཆོས་ ཉན(་པ(་)) delete from ཆོས་ the tag [v.imp]. 
Background: The sequence phyag has among its possible tags [n.count] and [v.fut]. In the phrase phyag ḥtshal(-ba) 'prostrate' the word phyag is certainly a noun.
Rule: In the phrase ཕྱག་ འཚལ་(བ(་)) remove from ཕྱག་ the tag [v.fut]. 
Background: The sequence ḥdra has among its possible tags [n.rel] and [v.xxx]. The verb ḥdra takes in its rection the associative case marker daṅ, and a relator noun comes after a noun, not after a case marker; hence in the sequence daṅ ḥdra the analysis of ḥdra as [n.rel] may be precluded.
Rule: In the sequence daṅ ḥdra remove from ḥdra the tag [n.rel].
Background: The word dpag can be a noun 'measure' or a future of the verb dpog 'to measure'. We take dpag in the phrase dpag tu med(-pa) as the future verb.
Rule: In the sequence dpag tu med(-pa) remove from dpag the tag [n.count] and from tu the tag [case.term].
Background: The syllable ḥphro has many possible tags, both nominal and verbal. In the pattern V ḥphro la it can be confidently assigned the tag [n.rel] and la can confidently be assigned the tag [case.all].
Rule: If a word with unambiguous [v.xxx] tags is followed by ḥphro la then tag ḥphro as [n.rel] and la as [case.all]. 
Background: The syllable ḥphro has many possible tags, both nominal and verbal. Outside of the pattern V ḥphro la it can confidently be precluded from the tag [n.rel].
Rule: If ḥphro does not follow a word with a possible [v.xxx] tag then delete from ḥphro the tag [n.rel].
Background: The syllable bar has many possible tags, a relator noun, a noun, and a demonstrative (probably better analyzed as a spatial adverb, but oh well). When it appears before the verb chad / chod it is part of a phrasal verb, so it can be analyzed as a noun.
Rule: In the patterns བར་ (མི་ /མ་) ཆད་(པ(་)) and བར་ (མི་ /མ་) ཆོད་(པ(་)) tag བར་ as [n.count].
Background: The syllable bar has many possible tags, a relator noun, a noun, and a demonstrative (probably better analyzed as a spatial adverb, but oh well). When it appears before ḥgaḥ it can be tagged as a noun [n.count].
Rule: In the pattern བར་ འགའ་ tag བར་ as [n.count].
Background: The syllable bar has many possible tags, a relator noun, a noun, and a demonstrative (probably better analyzed as a spatial adverb, but oh well). When it appears at the beginning of a clause and is followed by certain case markers then it must mean 'the middle' with a spatial sense, which we tag as a demonstrative. 
Rule: In the pattern ། བར་ (དུ་/ ན་/ ནས་) tag བར་ as [d.dem].
Background: The word yun riṅ has the two possible tags [n.count] and [adv.temp]. The tag [n.count] is used when yun riṅ participates in the sentence as the head of a case marked noun phrase, usually functioning adverbially in the terminative case, i.e. yun-riṅ-du, yun-riṅ źig, or yun-riṅ dus. When yun-riṅ is in the absolutive case, but is not the argument of the verb, i.e. the absolutive used adverbially (as in yun-riṅ mi gnas so) it is tagged [adv.temp].
Rule: When yun-riṅ is followed by du, źig, or dus, remove from it the tag [adv.temp].
Background: The word yun riṅ has the two possible tags [n.count] and [adv.temp]. The tag [n.count] is used when yun riṅ participates in the sentence as the head of a case marked noun phrase, usually functioning adverbially in the terminative case, i.e. yun-riṅ-du, yun-riṅ źig, or yun-riṅ dus. When yun-riṅ is in the absolutive case, but is not the argument of the verb, i.e. the absolutive used adverbially (as in yun-riṅ mi gnas so) it is tagged [adv.temp].
Rule: When yun-riṅ is followed by an unambiguous verb [v.xxx] or verbal noun [n.v.xxx], or one of these two preceded by mi or ma (i.e. yun-riṅ gnas, yun-riṅ gnas-pa, yun-riṅ ma gnas, yun-riṅ mi gnas-pa, etc.) then remove from it the tag [n.count].
Background: The transitive verb ḥjug, bcug, gźug, chug is used in a causative construction. If ḥjug appears in the pattern [v.xxx] [cv.term] ḥjug, then it can be specified as present.
Rule: In the pattern [v.xxx] [cv.term] ḥjug remove from ḥjug the tag [v.fut].
Background: The lexicographical tradition gives saṅs 'awaken, expel (of darkness)' as a verb without stem variation. However, the Kanjur shows a clear opposition between the non-past ḥtshaṅ rgya 'become a Buddha' and past saṅs rgyas. The morphology of this pattern is already quite clear, but it is further confirmed with patterns of negation: ḥtshaṅ mi rgya occurs 77 times in 25 different texts; *ḥtshaṅ ma rgya is unattested; saṅs ma rgyas occurs 122 times in 42 texts; saṅs mi rgyas occurs twice in one text. Putting aside the issue of how these verbs are used outside of this meaning, it is clear that saṅs rgyas is composed of two past stems.
Rule: In the pattern saṅs [v.xxx] rgyas [v.xxx] (or rgyas-pa [n.v.xxx]), remove [v.xxx] and [n.v.xxx] tags other than [v.past] (or [n.v.past]), provided the past tags are still there.
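The proviso "provided the past tags are still there" makes this a guarded deletion: the other verbal tags are removed only if a past reading survives. A minimal sketch, assuming a sentence is a list of (form, tags) pairs with tags held in a set; the function names are hypothetical.

```python
def keep_only_past(tags):
    """Keep past readings and drop other verbal ones, unless no past reading exists."""
    verbal = {t for t in tags if t.startswith(("v.", "n.v."))}
    past = verbal & {"v.past", "n.v.past"}
    if not past:
        return set(tags)  # nothing to fall back on, so leave the word alone
    return (set(tags) - verbal) | past


def resolve_sangs_rgyas(sentence):
    """Apply keep_only_past to both members of saṅs rgyas(-pa)."""
    result = list(sentence)
    for i in range(len(sentence) - 1):
        if sentence[i][0] == "saṅs" and sentence[i + 1][0] in {"rgyas", "rgyas-pa"}:
            for j in (i, i + 1):
                form, tags = sentence[j]
                result[j] = (form, keep_only_past(tags))
    return result
```

Non-verbal tags are left in place by this sketch, since the rule only speaks of removing [v.xxx] and [n.v.xxx] tags.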
Background: The lexicographical tradition gives saṅs 'awaken, expel (of darkness)' as a verb without stem variation. However, the Kanjur shows a clear opposition between the non-past ḥtshaṅ rgya 'become a Buddha' and past saṅs rgyas. The morphology of this pattern is already quite clear, but it is further confirmed with patterns of negation: ḥtshaṅ mi rgya occurs 77 times in 25 different texts; *ḥtshaṅ ma rgya is unattested; saṅs ma rgyas occurs 122 times in 42 texts; saṅs mi rgyas occurs twice in one text. Putting aside the issue of how these verbs are used outside of this meaning, it is clear that ḥtshaṅ rgya is composed of two non-past stems.
Rule: In the pattern ḥtshaṅ [v.xxx] rgya [v.xxx] remove the tag [v.past] from rgya.
Background: The syllable za can either be the present of the verb 'eat' or the invariant verb 'itch'. After the nouns zan 'food', zas 'food', or śa 'meat', the syllable za can be specified as the verb 'eat'.
Rule: In the pattern zan, zas, or śa (mi) za(-ba) specify za as [v.pres].
Background: The syllable źu can either be the present or future of the verb 'request' or the past of a verb 'melt'. After the nouns lan 'reply', chos 'dharma', etc. the syllable źu can be specified as the verb 'request'.
Rule: If any of the words དབང་ཆོས་, མཐུ་ཆེན་, དབང་མོ་ཆེ་, གྲོང་འཇུག་, དངོས་སྣང་, ཞེས་ or ཅེས་ appear before źu(-ba) delete [(n.)v.past] from źu(-ba).
Background: The syllable źu can either be the present or future of the verb 'request' or the past of a verb 'melt'. One often requests that someone do something, and in Tibetan this can be expressed by e.g. ḥgro-bar źu 'ask to go'. Thus, after nominalized verbs in the terminative case (always -r after -pa/-ba) the syllable źu can be specified as the verb 'request'.
Rule: In the pattern XXX|[n.v.xxx] ར་ ཞུ་(བ་) delete [(n.)v.past] from ཞུ་(བ་). (In which XXX ends with -པ or -བ without a final tsheg).
Background: According to the dictionaries sbyin is the present and future of a verb 'give' that has the past and imperative byin. However, in one place in our corpus the sequence ma sbyin occurs. Presumably this is simply a spelling mistake for ma byin. Tagging this sbyin as [v.past] leads this interpretation to proliferate in unwanted contexts. Consequently, this rule precludes the general interpretation of sbyin as [v.past].
Rule: If sbyin is not preceded by the word ma, then delete the tag [v.past] from sbyin.
Background: According to the dictionaries gzuṅ is the future of a verb 'take' that has the past bzuṅ. However, in one place in our corpus the sequence gzuṅ nas occurs. Presumably this is simply a spelling mistake for bzuṅ nas. Tagging this gzuṅ as [v.past] leads this interpretation to proliferate in unwanted contexts. Consequently, this rule precludes the general interpretation of gzuṅ as [v.past].
Rule: If gzuṅ is not followed by the word nas, then delete the tag [v.past] from gzuṅ.
Background: The word bcas can either be an invariant verb 'to have' or the past tense of a verb ḥchaḥ 'make'. In the phrase yi-dam bcas 'promise' it can be specified as past.
Rule: In the phrase yi-dam bcas specify bcas as [v.past].
Background: The syllable bco can be both a stem of the invariant verb 'make' and the cardinal number 'ten', which particularly occurs before lṅa 'five' and brgyad 'eight'. If this syllable occurs before lṅa 'five' or brgyad 'eight', it is very likely to also be a cardinal number.
Rule: If the syllable bco with the tag [num.card] and any [v.xxx] tag is directly followed by lṅa 'five' or brgyad 'eight' then delete from it any [v.xxx] tags.
Background: The syllable bcu can be both the future verb stem of the verb chu 'draw water' and the cardinal number 'ten'. If this syllable occurs before a cardinal number it is very likely to also be a cardinal number.
Rule: If a word has both the tags [num.card] and [v.fut] and is followed by an unambiguous cardinal number then delete from it the tag [v.fut].
Background: Properly the syllable lṅa has only the interpretation [num.card] 'five', but frequently the temporal adverb sṅa 'early' is misspelled as lṅa, leading to [adv.temp] as a lexical entry for lṅa. If the syllable lṅa occurs before or after an unambiguous number (i.e. after [num.card] and before either [num.card] or [num.ord]) then the possibility that it is a misspelling of sṅa may be precluded and the interpretation as [num.card] is secure.
Rule: If the syllable lṅa (with or without a following tsheg) occurs before or after an unambiguous number (i.e. after [num.card] and before either [num.card] or [num.ord]) then assign lṅa the tag [num.card]. 
Background: Some syllables occur both as nouns and in the formation of numerals (e.g. rtsa 'vein' and so 'tooth' versus sum-cu rtsa gsum 'thirty three' and sum-cu so lṅa 'thirty five'). Between two numbers such syllables require the interpretation [num.card]; in this context other interpretations can be excluded.
Rule: If any word has two possible part-of-speech tags, one of which is [num.card], and this word occurs between two words with the part-of-speech tag [num.card], then assign this word the tag [num.card].
Background: Some syllables occur both as nouns and in the formation of numerals (e.g. rtsa 'vein' and so 'tooth' versus sum-cu rtsa gsum 'thirty three' and sum-cu so lṅa 'thirty five'). After a cardinal number and before an ordinal number such syllables require the interpretation [num.card]; in this context other interpretations can be excluded.
Rule: If any word has two possible part-of-speech tags, one of which is [num.card], and this word occurs between a word with an unambiguous part-of-speech tag [num.card] and a word with an unambiguous part-of-speech tag [num.ord], then assign the word in question the tag [num.card].
Background: Some syllables occur both as nouns and in the formation of numerals (e.g. rtsa 'vein' and so 'tooth' versus sum-cu rtsa gsum 'thirty three' and sum-cu so lṅa 'thirty five'). If one such morpheme is not followed by a word that has an analysis as [num.card] or [num.ord], then the word in question is much more likely to be an ordinary noun than to be a syllable in the formation of a numeral.
Rule: If one of the syllables rtsa, so, ṅa, or don, having both [num.card] and [n.count] as possible tags, is followed by a word that lacks both [num.card] and [num.ord] as possible tags, then delete [num.card] from the word in question.
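Example: The three rules on numeral-forming syllables can be sketched together as follows. This is an illustration under stated assumptions, not the project's implementation: words are hypothetical dicts with a 'form' and a set of 'tags', and the neighbouring numbers are required to be unambiguous in both of the "between" rules.

```python
CONNECTORS = {"rtsa", "so", "ṅa", "don"}
NUMERAL_TAGS = {"num.card", "num.ord"}


def resolve_numeral_connectors(words):
    """Decide between [num.card] and other readings from the neighbours."""
    for i, word in enumerate(words):
        tags = word["tags"]
        if "num.card" not in tags or len(tags) < 2:
            continue  # nothing to disambiguate
        prev_tags = words[i - 1]["tags"] if i > 0 else None
        next_tags = words[i + 1]["tags"] if i + 1 < len(words) else None
        # Assumption: both neighbours must be unambiguous numbers.
        between_numbers = prev_tags == {"num.card"} and next_tags in (
            {"num.card"}, {"num.ord"})
        if len(tags) == 2 and between_numbers:
            word["tags"] = {"num.card"}
        elif (word["form"] in CONNECTORS and "n.count" in tags
              and next_tags is not None and not next_tags & NUMERAL_TAGS):
            # The following word cannot be a number: keep the noun reading.
            tags.discard("num.card")
    return words
```

In sum-cu rtsa gsum the syllable rtsa is reduced to [num.card], whereas rtsa before a word without numeral tags keeps only [n.count].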
Background: Proclausal adverbs tend to occur at the beginning of clauses (i.e. after a śad). In addition, we are choosing to tag yaṅ as [adv.proclausal] in the combinations yaṅ yaṅ, yaṅ daṅ yaṅ, yaṅ la yaṅ, yaṅ nas yaṅ, and yaṅ na. In these contexts the tag [cl.focus] can be removed from yaṅ. In all other contexts, the interpretation of yaṅ as [adv.proclausal] is quite unlikely.
Rule: If yaṅ occurs in the combination yaṅ na, then remove the analysis [cl.focus].
Background: Proclausal adverbs tend to occur at the beginning of clauses (i.e. after a śad). In addition, we are choosing to tag yaṅ as [adv.proclausal] in the combinations yaṅ yaṅ, yaṅ daṅ yaṅ, yaṅ la yaṅ, yaṅ nas yaṅ, and yaṅ na. In these contexts the tag [cl.focus] can be removed from yaṅ. In all other contexts, the interpretation of yaṅ as [adv.proclausal] is quite unlikely. (The interpretation of yaṅ as a verb 'be light' is also unlikely in these contexts.)
Rule: If yaṅ occurs in the combinations yaṅ yaṅ, yaṅ daṅ yaṅ, yaṅ la yaṅ, or yaṅ nas yaṅ, then remove the analysis [cl.focus] and any [v.xxx] tags.
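Example: A sketch of the two combination rules for yaṅ. We assume a hypothetical word representation (a dict with 'form' and 'tags'), and we assume that both instances of yaṅ in a reduplicated combination are affected, which the rule text does not state explicitly.

```python
LINKERS = {"daṅ", "la", "nas"}


def resolve_yang_combinations(words):
    """Strip [cl.focus] (and verb tags) from yaṅ in fixed combinations."""
    forms = [w["form"] for w in words]
    adverbial = set()  # indexes of yaṅ inside yaṅ (X) yaṅ
    for i, form in enumerate(forms):
        if form != "yaṅ":
            continue
        if forms[i + 1:i + 2] == ["yaṅ"]:
            adverbial.update({i, i + 1})
        window = forms[i + 1:i + 3]
        if len(window) == 2 and window[0] in LINKERS and window[1] == "yaṅ":
            adverbial.update({i, i + 2})
        if forms[i + 1:i + 2] == ["na"]:
            # yaṅ na: only [cl.focus] is removed.
            words[i]["tags"].discard("cl.focus")
    for i in adverbial:
        words[i]["tags"] = {t for t in words[i]["tags"]
                            if t != "cl.focus" and not t.startswith("v.")}
    return words
```

Collecting the indexes first and editing the tags afterwards keeps overlapping matches (e.g. three instances of yaṅ in a row) from interfering with each other.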
Background: When yaṅ occurs at the end of a clause it does not seem sensible to allow its analysis as a proclausal adverb.
Rule: If yaṅ still has both potential tags [cl.focus] and [adv.proclausal] and occurs in the sequence [v.xxx] (na) yaṅ །, then delete from yaṅ the tag [adv.proclausal]. (Note: v.xxx here means any word that has only tags that begin with v.).
Background: Proclausal adverbs tend to occur at the beginning of clauses (i.e. after a śad or a tsheg-less -g). In addition, we are choosing to tag yaṅ as [adv.proclausal] in the combinations yaṅ yaṅ, yaṅ daṅ yaṅ, yaṅ la yaṅ, yaṅ nas yaṅ, and yaṅ na. In these contexts the tag [cl.focus] can be removed from yaṅ. In all other contexts, the interpretation of yaṅ as [adv.proclausal] is quite unlikely.
Rule: If yaṅ occurs at the beginning of a clause, then remove the analysis [cl.focus].
Background: In śin tu yaṅ it is not conceivable that yaṅ refers back to a previous clause, and in this phrase yaṅ does not mean 'again'. Consequently, in this position yaṅ can be specified as [cl.focus]. 
Rule: If yaṅ occurs after śin tu, then remove the analysis [adv.proclausal].
Background: At the end of a clause or sentence it is not conceivable that yaṅ would be a proclausal adverb meaning 'again'. Consequently, in this position yaṅ can be specified as [cl.focus].
Rule: If yaṅ occurs before śad, then remove the analysis [adv.proclausal].
Background: The focus clitic yaṅ/kyaṅ takes the form kyaṅ after words that end in -g, -d, -b, and -s. Consequently, in this position yaṅ can be specified as [adv.proclausal]. 
Rule: If yaṅ occurs after a word that ends in -g, -d, -b or -s then remove the analysis [cl.focus].
Background: The focus clitic yaṅ by definition must follow something. Either it follows a noun phrase, or it follows a finite verb. Because considerable effort has been made to locate yaṅ [adv.proclausal], it is safe to assume that after noun phrases and verbs remaining instances of yaṅ are focus clitics. 
Rule: If yaṅ occurs after [adj], [d.xxx], [n.xxx], [num.xxx], [p.xxx], [n.v.xxx] or [v.xxx], then delete from it the tag [adv.proclausal].
Background: Several focus clitics may follow the conditional converb (v.xxx na ni, v.xxx na yaṅ, v.xxx na go). In this position the reading of yaṅ as meaning 'again' [adv.proclausal] or 'light, i.e. not heavy' [v.xxx] can be precluded.
Rule: If ཡང(་) occurs in the pattern XXX[v.xxx] ན་ ཡང(་), then delete from it the tag [adv.proclausal] and any [v.xxx] tags.
Background: Because proclausal adverbs are normally found at the beginning of sentences, and sentences normally end with a śad (or a -g not followed by a tsheg) most proclausal adverbs will occur after a śad (or a -g not followed by a tsheg). In Classical Tibetan o na is essentially always a proclausal adverb [adv.proclausal]. Theoretically however, the syllable o could be a demonstrative pronoun [d.dem]. Nonetheless, after a śad the interpretation of o as a demonstrative will be exceedingly rare, because proclausal adverbs are usually found in sentence initial position (i.e. after śad). Consequently it is prudent to interpret all instances of o na which occur after ། to be proclausal adverbs.
Rule: After a śad or tsheg-less -g interpret the first syllable of o na as [adv.proclausal].
Background: In Classical Tibetan o na is essentially always a proclausal adverb [adv.proclausal]. Theoretically however, the syllable o could be a demonstrative pronoun [d.dem] or a final converb [cv.fin]. When not followed by na or lags the syllable cannot be a proclausal adverb [adv.proclausal].
Rule: If the syllable o is not followed by na or lags then remove the tag [adv.proclausal] from o.
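Example: This rule only inspects the form of the following word, so a sketch is short. The dict-based word representation and the function name are assumptions made for illustration.

```python
def restrict_o(words):
    """Remove [adv.proclausal] from o unless na or lags follows."""
    for i, word in enumerate(words):
        if word["form"] != "o":
            continue
        # At the end of the input there is no following word at all.
        next_form = words[i + 1]["form"] if i + 1 < len(words) else None
        if next_form not in {"na", "lags"}:
            word["tags"].discard("adv.proclausal")
    return words
```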
Background: The syllable gal should always be tagged as [adv.proclausal] when it occurs before te. Some readers might wonder whether gal te is not best treated as a single word. However, the te here is the usual [cv.sem], so it is best to treat gal as an independent word. (The other proclausal adverbs (e.g. o na or de nas) refer semantically to the preceding clause. In contrast gal te anticipates a following na [cv.loc]. This semantic difference does not however warrant a new part-of-speech tag. There are computational disadvantages to adding new part-of-speech tags, and there are no analytic advantages offered by part-of-speech categories with only one member, since the lexical content of the word itself serves as an adequate means to locate the word and study its behavior.)  
Rule: Tag gal te as gal [adv.proclausal] te [cv.sem].
Background: The syllable la has many interpretations: the allative case, the allative converb, the stem of the proclausal adverb lar 'moreover', and the noun mountain pass. At the beginning of a sentence (i.e. after a śad or -g without a tsheg) proclausal adverbs are frequent, and a noun 'mountain pass' is possible. However, since they have to follow something, case markers and converbs are precluded in this position.
Rule: If a word la appears after ། (or -g without a tsheg), then delete [case.all] and [cv.all] from this la.
Background: The syllable la has many interpretations: the allative case, the allative converb, the stem of the proclausal adverb lar 'moreover', and the noun 'mountain pass'. Before śad the proclausal adverb can be precluded.
Rule: If a word la appears before །, then delete [adv.proclausal] from this la.
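Example: A sketch of the two positional rules for la. We assume that the śad is itself a token whose form is །, and for brevity we leave out the tsheg-less -g boundary, which would need access to the original orthography. Names and data layout are hypothetical.

```python
SHAD = "།"


def resolve_la(words):
    """Prune the readings of la next to a sentence boundary."""
    for i, word in enumerate(words):
        if word["form"] != "la":
            continue
        if i > 0 and words[i - 1]["form"] == SHAD:
            # Case markers and converbs must follow something.
            word["tags"] -= {"case.all", "cv.all"}
        if i + 1 < len(words) and words[i + 1]["form"] == SHAD:
            # A proclausal adverb does not close a clause.
            word["tags"].discard("adv.proclausal")
    return words
```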
Background: The syllable ḥon can be a proclausal adverb or a rare verb. At the beginning of a clause before kyaṅ the interpretation as a proclausal adverb is secure.
Rule: If a word ḥon occurs after a śad and before kyaṅ, then remove any [v.xxx] tags from ḥon.
Background: The coincidence of correct sandhi phenomena and the end of a sentence essentially guarantees the successful identification of the final converb.
Rule: If Co (e.g. lo) is preceded by a word that ends with -C (e.g. -l) and occurs before a །, źes, sñam, zer, źu(s), or gsuṅ(s) then assign tag [cv.fin] to Co.
Background: The allomorph -go of the final converb is not used before a śad; instead it occurs, equivalently, without a following tsheg. Consequently, this allomorph requires its own rule.
Rule: If go is preceded by a word that ends with -g and is not followed by a tsheg then assign tag [cv.fin] to go.
Background: The allomorph -o of the final converb occurs after verbs that end in open syllables. Because previous rules rely on the reduplication found in all other allomorphs of this morpheme, they will not locate the allomorph -o. This allomorph requires its own rule. Because it is difficult to specify 'ends with a vowel' when treating unicode Tibetan, we assume that all occurrences of -o occur before a śad, źes, sñam, zer, źu(s), or gsuṅ(s).
Rule: If o occurs before a །, źes, sñam, zer, źu, źus, or gsuṅs then assign tag [cv.fin] to o.
Background: Candidates for analysis as final converbs that fail to occur in the correct sandhi context can be confidently precluded from this analysis.
Rule: If a word with the shape Co has more than one potential tag (e.g. lo), one of which is [cv.fin], remove [cv.fin] from this word if the preceding word does not end with -C (e.g. -l). This rule should not apply to ḥo.
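Example: The sandhi test for the final converb can be sketched on transliterated forms, where "ends with -C" is a simple string comparison. This is an illustration only: the dict-based representation, the function name, and the choice to work on transliteration rather than Unicode Tibetan are our assumptions. The allomorphs o, ḥo and go have their own rules and are not covered here.

```python
BOUNDARY = {"།", "źes", "sñam", "zer", "źu", "źus", "gsuṅ", "gsuṅs"}


def resolve_final_converb(words):
    """Confirm or exclude [cv.fin] on words of the shape Co."""
    for i, word in enumerate(words):
        form, tags = word["form"], word["tags"]
        if "cv.fin" not in tags or not form.endswith("o") or form in {"o", "ḥo"}:
            continue
        onset = form[:-1]  # the C of Co, e.g. "l" in lo
        prev_form = words[i - 1]["form"] if i > 0 else ""
        next_form = words[i + 1]["form"] if i + 1 < len(words) else None
        if prev_form.endswith(onset):
            if next_form in BOUNDARY:
                word["tags"] = {"cv.fin"}  # correct sandhi at a clause end
        elif len(tags) > 1:
            tags.discard("cv.fin")  # wrong sandhi: not the final converb
    return words
```

A word in the correct sandhi context that is not followed by a boundary is left ambiguous for later rules.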
Background: The final converb never precedes a genitive (whether case marker or converb). If a word with a possible [cv.fin] analysis occurs before a genitive, the [cv.fin] interpretation can be excluded.
Rule: If a word has both [cv.fin] and [n.count] among its possible tags, and this word occurs before a word that has either [case.gen] or [cv.gen] or both as its possible tags, then remove [cv.fin] from the word in question. 
Background: The sequence lo has several interpretations including the final converb (after words that end in -l) and the noun 'year'. When lo appears after a numeral, it is guaranteed to be the noun 'year'.
Rule: If the word lo appears directly after a word that has only the one possible tag [numeral], then assign lo the tag [n.count].
Background: The same sandhi contexts that applied to the final converb also occur for the question converbs. Consequently, a very similar pair of rules can isolate both secure examples of the question converbs and secure examples of words that happen to coincide with the question converb (e.g. nam 'when').
Rule: If a word of the shape Cam is preceded by a word that ends with 'C' and occurs before a །, or źes, sñam, zer, źu(s), or gsuṅ(s) then assign tag [cv.ques] to the word Cam.
Background: The same sandhi contexts that applied to the final converb also occur for the question converbs. Consequently, a very similar pair of rules can isolate both secure examples of the question converbs and secure examples of words that happen to coincide with the question converb (e.g. nam 'when').
Rule: If a word with the shape Cam has more than one potential tag (e.g. nam), one of which is [cv.ques], remove [cv.ques] from this word if the preceding word does not end with -C (e.g. -n).
Background: The syllable de can be a demonstrative, a proclausal adverb, or a form of the semi-final converb. As a semifinal converb de is one of three phonologically determined allomorphs along with te and ste. The allomorph de of the semifinal converb occurs only after words that end with -d. Consequently, any instance of de that occurs in other sandhi contexts must be the demonstrative or the proclausal adverb and not the semifinal converb.
Rule: If de does not occur immediately after a word that ends in -d remove from it the interpretation [cv.sem].
Background: The previous rule (22) prohibited the interpretation of de as a semi-final converb in incorrect sandhi contexts, but it is difficult to find contexts in which to prohibit the interpretation of de as a demonstrative. Although the semi-final converb is frequent after verbs, any de after a verb might belong to the following clause as a demonstrative. However, if de stands immediately before a śad, then its interpretation as belonging to the following clause is unlikely. Consequently, a search for de after a verb stem and before śad, should yield the semi-final converb.
Rule: If a word with the hypothesized tags [d.dem] and [cv.sem] occurs after a word with an unambiguous verb tag [v.xxx], and before །, then delete the tag [d.dem] from this word.
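Example: The two rules for de can be sketched in sequence: first the sandhi filter, then the search for the semi-final converb before a śad. The dict-based representation, the ། token for śad, and the function name are assumptions for illustration.

```python
def resolve_de(words):
    """Apply the sandhi filter and the clause-final test to de."""
    for i, word in enumerate(words):
        if word["form"] != "de":
            continue
        tags = word["tags"]
        prev = words[i - 1] if i > 0 else None
        nxt = words[i + 1] if i + 1 < len(words) else None
        # The converb allomorph de only occurs after a final -d.
        if prev is None or not prev["form"].endswith("d"):
            tags.discard("cv.sem")
        # "Unambiguous verb": every remaining tag begins with "v.".
        verb_before = (prev is not None and bool(prev["tags"])
                       and all(t.startswith("v.") for t in prev["tags"]))
        if ({"d.dem", "cv.sem"} <= tags and verb_before
                and nxt is not None and nxt["form"] == "།"):
            tags.discard("d.dem")
    return words
```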
Background: The Tibetan syllable <r(a)> ར་ can be four things: the terminative case marker [case.term] after a noun phrase that ends in an open syllable (e.g. rgyal-po r 'to the king'), the terminative converb [cv.term] after a verb stem that ends in an open syllable (e.g. za r jug 'make someone eat'), the noun [n.count] 'goat', or the syllable ra in Sanskrit words [skt]. However, the word ra 'goat' will have a tsheg that precedes it, but a tsheg will not precede the terminative case marker or terminative converb. At the very beginning of a sentence the noun ra 'goat' will not have a tsheg preceding it, but instead will have a śad or a tsheg-less final ga preceding it. An additional stipulation must be included in this rule because in the combinations ga r 'to where' and dga r 'to be happy' the letter 'ra' occurs with a preceding tsheg-less ga, but is nonetheless not the noun 'goat'.
Rule: If <r(a)> ར་ which can still be [case.term] and [cv.term] is preceded by a śad or tsheg-less ག, then remove both interpretations from ར་, unless the preceding syllable is ག or དག.
Background: The Tibetan syllable <r(a)> ར་ can be four things: the terminative case marker [case.term] after a noun phrase that ends in an open syllable (e.g. rgyal-po r 'to the king'), the terminative converb [cv.term] after a verb stem that ends in an open syllable (e.g. za r jug 'make someone eat'), the noun [n.count] 'goat', or the syllable ra in Sanskrit words [skt]. However, the word ra 'goat' (and the syllable ra in Sanskrit words) will have a tsheg that precedes it, but a tsheg will not precede the terminative case marker or terminative converb. (This rule cannot be used after a word that ends in ga because ga r [case.term] is possible, but a sentence beginning with ra is also possible, which could occur, e.g. after ḥdug.)
Rule: If <r(a)> ར་ which can still be [case.term] or [cv.term] comes after a word that does not end with ་, then delete [n.xxx] from ར་, unless the preceding syllable is ག or དག.
Background: The Tibetan syllable <r(a)> ར་ can be four things: the terminative case marker [case.term] after a noun phrase that ends in an open syllable (e.g. rgyal-po r 'to the king'), the terminative converb [cv.term] after a verb stem that ends in an open syllable (e.g. za r jug 'make someone eat'), the noun [n.count] 'goat', or the syllable ra in Sanskrit words [skt]. However, the word ra 'goat' (and the syllable ra in Sanskrit words) will have a tsheg that precedes it, but a tsheg will not precede the terminative case marker or terminative converb. (This rule cannot be used after a word that ends in ga because ga r [case.term] is possible, but a sentence beginning with ra is also possible, which could occur, e.g. after ḥdug.)
Rule: If <r(a)> ར་ which can still be [case.term] or [cv.term] comes after a word that does not end with ་, then delete [skt] from ར་, unless the preceding syllable is ག or དག.
Background: The letter <s(a)> ས་ can be the noun sa 'earth', the relator noun 'place' (a-ma i sa r 'at mother's'), or the agentive case suffix -s after nouns that end with open syllables. In Tibetan orthography the case suffix ས་ is written together with the preceding syllable (e.g. rgyal-po s རྒྱལ་པོ ས་ 'king [case.agn] '). Consequently, sa ས་ 'earth' [n.noun] and 'place' [n.rel] can be differentiated from -s ས་ [case.agn] because sa ས་ 'earth' and 'place' [n.rel] are preceded by a word that ends in tsheg. At the very beginning of a sentence the noun sa will not have a tsheg preceding it, but instead will have a śad or a tsheg-less final ga preceding it. However, certain care must be taken because ḥgas 'by some', gñis ga s 'by two', and ma-skye-dga s 'by Ma-skye-dga'' are contexts in which the letter 'sa' can be the agentive case marker, even though it is preceded by a tsheg-less ga.
Rule: If <s(a)> ས་ (which is not preceded by the syllables ག, འག, or དག) is preceded by a word that ends in ་, or by a sentence boundary (། or tsheg-less ག), then delete [case.agn] as a possible analysis.
Background: The letter <s(a)> ས་ can be the noun sa 'earth', the relator noun 'place' (a-ma i sa r 'at mother's'), or the agentive case suffix -s after nouns that end with open syllables. In Tibetan orthography the case suffix ས་ is written together with the preceding syllable (e.g. rgyal-po s རྒྱལ་པོ ས་ 'king [case.agn] '). Consequently, sa ས་ 'earth' [n.noun] and 'place' [n.rel] can be differentiated from -s ས་ [case.agn] because sa ས་ 'earth' and 'place' [n.rel], and the Sanskrit syllable sa are preceded by a word that ends in tsheg.  
Rule: If <s(a)> ས་ which can still be [case.agn] comes after a word that does not end with ་, then delete [n.xxx] from ས་.
Background: The letter <s(a)> ས་ can be the noun sa 'earth', the relator noun 'place' (a-ma i sa r 'at mother's'), or the agentive case suffix -s after nouns that end with open syllables. In Tibetan orthography the case suffix ས་ is written together with the preceding syllable (e.g. rgyal-po s རྒྱལ་པོ ས་ 'king [case.agn] '). Consequently, sa ས་ 'earth' [n.noun] and 'place' [n.rel] can be differentiated from -s ས་ [case.agn] because sa ས་ 'earth' and 'place' [n.rel], and the Sanskrit syllable sa are preceded by a word that ends in tsheg.  
Rule: If <s(a)> ས་ which can still be [case.agn] comes after a word that does not end with ་, then delete [skt] from ས་.
Background: The Bu ston chos 'byung has a passage དེ འི་ གདོན་ནད་ དང་ ལུས་ ཏེ ། in which the sequence lus te would in the absence of the larger context be interpreted as lus [v.invar], but in this longer context is clearly lus [n.count]. Similarly, in the passage སྒྲ་ དང་ དོན་ ཏོ ། the system thinks don is a verb because of the following final converb, but the associative case makes clear it is a noun. So, this rule must attempt to isolate lus [n.count] before the semi-final converb, before the subsequent rule interprets ambiguous syllables as verbs before the semi-final converb. 
Rule: If lus (or don) is preceded by a sequence of an unambiguous noun [n.count] followed by daṅ, then remove any [v.xxx] tags from lus (or don).
Background: A syllable that has both nominal and verb tags is almost guaranteed to be a verb if it occurs before an unambiguous [cv.fin], [cv.sem], or [cv.impf]. The one time where interpreting such a word as a verb is probably not correct is if a genitive case marker immediately precedes it.
Rule: If a syllable other than lta has both nominal and verb tags, is not immediately preceded by a word with [case.gen] among its tags, and occurs before an unambiguous [cv.fin], [cv.sem], or [cv.impf], then remove any nominal tags from it.
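Example: A sketch of this rule under two assumptions that the text leaves open: "nominal tags" are taken to be all tags beginning with "n.", and "verb tags" all tags beginning with "v.". The word representation (dict with 'form' and 'tags') and the function name are likewise hypothetical.

```python
CONVERB_ONLY = ({"cv.fin"}, {"cv.sem"}, {"cv.impf"})


def prefer_verb_before_converb(words):
    """Drop nominal readings before an unambiguous converb."""
    for i in range(len(words) - 1):
        word, following = words[i], words[i + 1]
        tags = word["tags"]
        has_nominal = any(t.startswith("n.") for t in tags)
        has_verbal = any(t.startswith("v.") for t in tags)
        after_genitive = i > 0 and "case.gen" in words[i - 1]["tags"]
        if (word["form"] != "lta" and has_nominal and has_verbal
                and not after_genitive
                and following["tags"] in CONVERB_ONLY):
            word["tags"] = {t for t in tags if not t.startswith("n.")}
    return words
```

Because the genitive case marker and the genitive converb are only separated by later rules, the check treats any preceding word that still has [case.gen] among its tags as blocking the rule.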
Background: A syllable that has both nominal and verb tags is almost guaranteed to be a verb if it occurs before an [cv.impf]. The one time where interpreting such a word as a verb is probably not correct is if a genitive case marker immediately precedes it. Unfortunately, two of the syllables with the tag [cv.impf], namely źiṅ and śiṅ, also have nominal readings (respectively 'field' and 'tree'). Nonetheless, the nominal readings of these syllables may be precluded before a śad. Thus, in this environment the verbal reading of the word preceding źiṅ and śiṅ can be specified. (If the rule as formulated leads to false positives, it could be further specified with regard to the specific sandhi required by źiṅ and śiṅ.)
Rule: If a word with both [n.count] and [v.xxx] tags, which is not preceded by a syllable with [case.gen] (or [cv.gen]) among its tags, occurs before either of the syllables śiṅ or źiṅ which in turn precede ། (or other relevant punctuation marks), then remove the tag [n.count] from the word in question.
Background: The intensive adverbs śin-tu and rab-tu make no sense preceding a noun, so a word that can be interpreted as both a noun and a verb should be tagged as a verb if it is preceded by śin-tu or rab-tu. (This rule led to the intriguing false positive śin-tu sems ḥkhrugs 'very mentally agitated'. The explanation is that sems must be an incorporated noun. An earlier rule now specifies that in the phrase sems ḥkhrugs the noun sems is always a noun. A similar false positive is rgyal-po śin-tu byin che-ba r bźugs-pa r sprul to 'the king manifested such that his glory (byin) was very great'. If I had written this I would have said byin śin-tu che-ba. We have added a rule to also specify this passage. Another option would have been to treat byin che-ba as an adjective, since śin-tu can occur before an adjective and the presence of śin-tu is itself an argument that byin-che-ba is an adjective.)
Rule: If a word with both [n.count] and [v.xxx] tags is preceded by śin-tu or rab-tu, then remove the tag [n.count] from the word in question.
Background: The syllable gnas may either be the verb 'stay' or the noun 'place'. When occurring after a terminative case marker [case.term] at the end of a clause, the syllable can be specified as a verb, since it is governing a noun and occurs in the syntactic position of a verb. (This rule leads to a false positive in the case of ཕྱི ར་ གནས་, so a preceding ཕྱི must be precluded.)
Rule: If the word gnas, not preceded by phyi, having both [n.count] and [v.xxx] as possible tags, is preceded by a word which can only be [case.term] or [cv.term] then delete from this word tag [n.count].
Background: The syllable ḥgyur may either be the verb 'become' or the noun 'translation'. When occurring after a terminative case marker [case.term] at the end of a clause, the syllable can be specified as a verb, since it is governing a noun and occurs in the syntactic position of a verb. Negation and the focus clitic yaṅ must be permitted to intervene before the terminative suffix and ḥgyur.
Rule: If the word ḥgyur, having both [n.count] and [v.xxx] as possible tags, is preceded by a word which can only be [case.term] or [cv.term], possibly with an intervening mi or yaṅ, then delete from this word tag [n.count].
Background: The syllable thag may either be an auxiliary verb that normally appears negated (V ma thag) 'as soon as verb' or the noun 'rope'. When occurring negated after a verb, the nominal interpretation may be precluded.
Rule: In the pattern XXX[v.xxx] ma thag delete the tag [n.count] from thag.
Background: In the sentence འདི་ སྤྲོད་ ཅིག་ གསུང་ འདི་ གནང་བ་ ཡིན་ ཟེར ། a rule that located nouns at the head of noun phrases would mistakenly pinpoint gsuṅ as a noun. In fact, because it comes after an imperative (i.e. at the end of a discourse) the word gsuṅ is unambiguously a verbum dicendi.
Rule: If an unambiguous verb stem (i.e. a word with only [v.xxx] readings), with an imperative stem [v.imp] as a possible analysis, is followed by cig, śig, or źig and this word in turn is followed by gsuṅ or zer (with potential tags [v.xxx] and [n.count]), then delete from gsuṅ or zer the tag [n.count].
Background: In the sentence ད་ ཡུལ་མི་ རྣམས་ ཀྱིས་ ཁྱོད་ རང་ དང་ བུ་ ལག་མཐུ ས་ གསོད་པ ར་ འདུག་ གིས ། ནུས་པ་ རང་ ཅི་ ཡོད་ དམ་ མཐུ་ བྱུང་བ་ རང་ གིས་ ཆོག་ མོད་ ཟེར་ མང་པོ་ འབར་ དུ་ བྱུང་བ་ ལ ། a rule that located nouns at the head of noun phrases would mistakenly pinpoint zer as a noun. In fact, because it comes after an auxiliary verb (i.e. at the end of a discourse) the word zer is unambiguously a verbum dicendi. (Because auxiliary verbs are only located much later, we write this rule specifically with reference to mod, subject to revision based on future false positives.)
Rule: If an unambiguous verb stem (i.e. a word with only [v.xxx] readings) is followed by mod and this word in turn is followed by gsuṅ or zer (with potential tags [v.xxx] and [n.count]), then delete from gsuṅ/zer the tag [n.count].
Background: Some nouns happen to look like verbal forms. For example bza might be the future of za 'eat' or it might be a noun 'food'. The nominal reading is clear when the word heads a noun phrase, i.e. occurs before determiners, numbers, and adjectives (e.g. bza źim-po 'tasty food'). Care must however be taken with those words that can function pronominally, thus in བུ་ ཆུབ་ བཙས་ ཐམས་ཅད་ ཀྱང་ མགོ་ དེ་ ཕྱོགས་ སུ་ སྟོན་ ནོ ། the word thag is an auxiliary verb and is not to be construed with thams-cad. Our approach has been to try to add new rules before this one that specify the necessary interpretations. (Another false positive is ཁོང་ ཚོ་ སྐྲག་ ནས་ ཁྱོད་ ཀྱིས་ ལན་ ཁྱོད་ ཀྱིས་ ལན་ ཟེར་ གཅིག་ ལ་ གཅིག་ གྱོད་ འདལ་ ཞིང་ ལོག་ སོང་ ཟེར ། in which zer is specified as [n.count] because it is followed by gcig, even though in this case gcig la gcig is pronominal and not attributive. For this reason, we must specify that the word in question is not followed by gcig la gcig.) (The phrase phar khrel tshur khrel gave another false positive because tshu is [d.dem]. For this reason, we must specify that the word in question is not followed by tshu.)
Rule: If a word that has both [n.xxx] and [v.xxx] tags (and is not followed by gcig la gcig or tshu) is followed by a word with [d.xxx], [num.xxx] or [adj] tags delete all of the [v.xxx] tags.
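Example: A sketch of the noun-phrase head rule with its two exceptions. We assume that the following word qualifies as a modifier if it has at least one [d.xxx], [num.xxx] or [adj] tag (the rule does not say whether the tag must be unambiguous), and that words are hypothetical dicts with 'form' and 'tags'.

```python
def has_prefix(tags, prefix):
    """True if any tag in the set starts with the given prefix."""
    return any(t.startswith(prefix) for t in tags)


def prefer_noun_before_modifier(words):
    """Drop verb readings from a word that heads a noun phrase."""
    forms = [w["form"] for w in words]
    for i in range(len(words) - 1):
        tags = words[i]["tags"]
        if not (has_prefix(tags, "n.") and has_prefix(tags, "v.")):
            continue
        # Exceptions recorded as false positives in the background.
        if forms[i + 1] == "tshu" or forms[i + 1:i + 4] == ["gcig", "la", "gcig"]:
            continue
        next_tags = words[i + 1]["tags"]
        if ("adj" in next_tags or has_prefix(next_tags, "d.")
                or has_prefix(next_tags, "num.")):
            words[i]["tags"] = {t for t in tags if not t.startswith("v.")}
    return words
```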
Background: Some nouns happen to look like verbal forms. For example bza might be the future of za 'eat' or it might be a noun 'food'. The nominal reading is clear when the word heads a noun phrase, i.e. occurs before determiners, numbers, and adjectives (e.g. bza źim-po 'tasty food'). Because the indefinite marker źig, cig, śig, is homophonous with the imperative converb, and only a later rule distinguishes them, the previous rule is unable to use the indefinite determiner in its search for noun phrases, i.e. gnas śig is still ambiguous between 'a place' or 'reside!'. Nonetheless, because only the imperative and the present (or past) negated with ma are permitted before the imperative converb, nouns that resemble future, or unnegated presents and past, can be isolated. In the first step we preclude the future before the indefinite determiner.
Rule: If a word that has both [n.xxx] and [v.fut] tags is followed by źig, cig or śig, then delete [v.fut].
Background: Some nouns happen to look like verbal forms. For example bza might be the future of za 'eat' or it might be a noun 'food'. The nominal reading is clear when the word heads a noun phrase, i.e. occurs before determiners, numbers, and adjectives (e.g. bza źim-po 'tasty food'). Because the indefinite marker źig, cig, śig, is homophonous with the imperative converb, and only a later rule distinguishes them, the previous rule is unable to use the indefinite determiner in its search for noun phrases, i.e. gnas śig is still ambiguous between 'a place' or 'reside!'. Nonetheless, because only the imperative and the present (or past) negated with ma are permitted before the imperative converb, nouns that resemble future, or unnegated presents and past, can be isolated. In the second step we preclude the unnegated past and present before the indefinite determiner.
Rule: If a word that has both [n.xxx] and [v.pres] (or [v.past]) as tags is followed by źig, cig or śig, and is not preceded by ma, then delete the [v.pres] (or [v.past]) tag.
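Example: The two steps before źig, cig and śig can be sketched in one pass. The representation (dicts with 'form' and 'tags') and the function name are assumptions for illustration; any tag beginning with "n." is counted as [n.xxx].

```python
INDEFINITE = {"źig", "cig", "śig"}


def prefer_noun_before_indefinite(words):
    """Remove verb readings that cannot precede the imperative converb."""
    for i in range(len(words) - 1):
        tags = words[i]["tags"]
        if words[i + 1]["form"] not in INDEFINITE:
            continue
        if not any(t.startswith("n.") for t in tags):
            continue
        # Step 1: the future never precedes the imperative converb.
        tags.discard("v.fut")
        # Step 2: present and past are possible only when negated with ma.
        negated = i > 0 and words[i - 1]["form"] == "ma"
        if not negated:
            tags.difference_update({"v.pres", "v.past"})
    return words
```

An imperative reading such as gnas [v.imp] survives both steps, so gnas śig stays ambiguous between 'a place' and 'reside!' until the later rule decides it.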
Background: The genitive case connects two noun phrases; consequently, the constituent after a genitive is very likely to be a noun. For example, in the phrase dben-paḥi gnas śig 'a place which is isolated' the syllable gnas is a noun and not the verb 'reside'; similarly in the phrase bla-maḥi źal-nas 'from the lama's mouth', the syllable źal is the noun 'mouth' rather than a form of the verb 'to plaster'. A difficulty with using a preceding genitive case marker to specify that the following word is a noun is that it is only later rules that distinguish the genitive case marker and genitive converb. However, this obstacle can be avoided by specifying that the word preceding the genitive is an unambiguous nominal (like the verbal noun dben-pa 'isolated' and the noun bla-ma 'lama' are).
Rule: If a word has at least one hypothesized [v.xxx] tag and also has some [n.xxx] tags, and this word comes after a word that can be a genitive case marker (i.e. འི་, ཀྱི་, གི་, and གྱི་), which in turn follows a word with an unambiguous [adj], [n.xxx], [num.xxx], [d.xxx] or [p.xxx] tag, then delete any [v.xxx] tags off the word in question.
Background: Since daṅ as a case marker connects two nouns and as a converb it occurs at the end of a sentence, it is highly likely that something that can be a noun, when it follows daṅ, is in fact a noun, even if this word has verbal tags. For example, in the phrase daṅ rtags the word rtags will be a noun. Because gsuṅ and zer are nouns as well as verba dicendi they may appear after a verb in the imperative followed by the associative converb (e.g. ltos daṅ zer, śog daṅ gsuṅ); consequently this rule cannot be applied to these two words. The initial implementation of this rule led to the following false positive in which chas was identified as the noun 'piece of clothing' rather than the verb 'go': དེ་ནས་རྒྱལ་བུ་དེ་བློན་པོའི་བུ་ཡ་མཚན་ཅན་ལྔ་བརྒྱ་ལ་སོགས་པ་དམག་ཉིས་འབུམ་དང་ཆས་ནས།འཐབ་མོ་བྱས་པའི་ཚེ།. Consequently, an exception for chas must also be written in. This rule also led to a false positive in which bsre 'to mix' was identified as bsre 'mixture' because the verb bsre takes the case marker daṅ in its rection: བར་དོའི་སེམས་དང་བསྲེ་དགོས་པས།།.
Rule: If a word, other than gsuṅ, zer, bsre, or chas with both [n.count] and [v.xxx] as interpretations appears immediately after daṅ, then delete from this word all [v.xxx] tags.
Background: Because gsuṅ and zer are nouns as well as verba dicendi they may appear after a verb in the imperative followed by the associative converb (e.g. ltos daṅ zer, śog daṅ gsuṅ). In such a position the interpretation of gsuṅ and zer as nouns is not feasible, even though they are preceded by daṅ.
Rule: If gsuṅ or zer is preceded by the sequence of an unambiguous imperative verb [v.imp] and the word daṅ, then delete from gsuṅ or zer the tag [n.count].
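Example: The general rule for words after daṅ and the special rule for gsuṅ and zer interact, so a sketch that handles both may be useful. It assumes the hypothetical dict-based word representation used for illustration here ('form' plus a set of 'tags'); the function and constant names are ours.

```python
KEEP_AMBIGUOUS = {"bsre", "chas"}   # exceptions found as false positives
VERBA_DICENDI = {"gsuṅ", "zer"}


def resolve_after_dang(words):
    """Disambiguate noun/verb words that directly follow daṅ."""
    for i in range(1, len(words)):
        if words[i - 1]["form"] != "daṅ":
            continue
        word = words[i]
        tags = word["tags"]
        if word["form"] in VERBA_DICENDI:
            # [v.imp] daṅ gsuṅ/zer: the noun reading is not feasible.
            if i >= 2 and words[i - 2]["tags"] == {"v.imp"}:
                tags.discard("n.count")
        elif (word["form"] not in KEEP_AMBIGUOUS and "n.count" in tags
              and any(t.startswith("v.") for t in tags)):
            word["tags"] = {t for t in tags if not t.startswith("v.")}
    return words
```

Listing the lexical exceptions as named sets keeps them easy to extend when further false positives are found.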
Background: The syllable ñams can be either a noun 'experience' or a verb 'deteriorate'. In the combinations ñams su myoṅ 'experience' (lit. taste as experience) and ñams su len 'practice' (lit. take as experience) the syllable ñams can be confidently analysed as a noun.
Rule: If the syllable ñams has tags [n.count] and [v.xxx] and is followed by su myoṅ(-ba) or su len(-pa), then remove any [v.xxx] tags from ñams.
Background: The two nouns śiṅ 'wood' and źiṅ 'field' are ambiguous with the imperfective converb [cv.impf]. However, the genitive (whether case marker or homophonous converb) cannot follow an imperfective converb, so, if a word with only [case.gen] and [cv.gen] as its tags follows one of these words, they can be specified as nominal.
Rule: If the word śiṅ or źiṅ is followed by gi (or any word that has only [cv.gen] and [case.gen] as its tags) then delete from śiṅ/źiṅ the tag [cv.impf].
Background: The noun źiṅ 'field' is ambiguous with the imperfective converb [cv.impf]. A rule specifying that [cv.impf] is only possible after verbs led to two false positives in which [cv.impf] is also possible after an adverb. The two false positives are ཕྱི ར་ ཞིང་ and ཕྱི་ཕྱི ར་ ཞིང་.
Rule: In the sequences ཕྱི ར་ ཞིང་ and ཕྱི་ཕྱི ར་ ཞིང་ remove the tag [n.count] from ཞིང་.
Background: The nouns śiṅ 'wood' and źiṅ 'field' are ambiguous with the imperfective converb [cv.impf]. The imperfective converb must follow a verb, so if the word preceding śiṅ or źiṅ has no verbal tags [v.xxx] then the tag [cv.impf] can be deleted from śiṅ / źiṅ.
Rule: If śiṅ or źiṅ have the tags [cv.impf] and [n.count] and the word preceding śiṅ or źiṅ has no possible [v.xxx] tags, then delete [cv.impf] from śiṅ/źiṅ.
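Example: The rules that delete [cv.impf] from śiṅ and źiṅ (a following genitive, or no possible verb before) can be sketched together. The dict-based word representation and all names are assumptions for illustration. The sketch also treats a sentence-initial śiṅ/źiṅ as having no possible verb before it, which the rule text does not state.

```python
GENITIVE_ONLY = {"case.gen", "cv.gen"}


def resolve_shing(words):
    """Delete [cv.impf] from śiṅ/źiṅ where a converb is impossible."""
    for i, word in enumerate(words):
        tags = word["tags"]
        if word["form"] not in {"śiṅ", "źiṅ"} or "cv.impf" not in tags:
            continue
        prev_tags = words[i - 1]["tags"] if i > 0 else set()
        nxt = words[i + 1] if i + 1 < len(words) else None
        # A genitive cannot follow an imperfective converb.
        genitive_follows = nxt is not None and (
            nxt["form"] == "gi" or nxt["tags"] == GENITIVE_ONLY)
        # The imperfective converb must follow a verb.
        verb_possible = any(t.startswith("v.") for t in prev_tags)
        if genitive_follows or ("n.count" in tags and not verb_possible):
            tags.discard("cv.impf")
    return words
```

The sequences ཕྱི ར་ ཞིང་ and ཕྱི་ཕྱི ར་ ཞིང་ are unaffected as long as the preceding rule has already removed [n.count] from ཞིང་ there.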
Background: The syllable thag can be both a noun and a verb. Before forms of the verb 'cut' it will be a noun, and together with the verb mean 'decide'. 
Rule: If thag occurs before chod, gcod, bcad, gcad, chod-pa, gcod-pa, bcad-pa or gcad-pa, then delete any [v.xxx] tags from thag. 
Background: The syllable phan can be either a verb 'to benefit' or a noun 'benefit'. In combination with forms of the verb ḥdogs, btags, gdags, thogs we understand phan as the noun. To err on the side of caution the rule does not apply to the imperative of the verb thogs.
Rule: If phan occurs before ḥdogs, btags, gdags, or ḥdogs-pa, btags-pa, gdags-pa, then delete any [v.xxx] tags from phan.
Background: The syllable thugs can either be an honorific noun 'mind' or the much rarer imperative of the verb bdug 'fumigate'. The interpretation as the noun 'mind' is assured in certain common collocations.
Rule: If the syllable ཐུགས་ occurs before any of the following:  གཏུམ་(པ),  ཁྲོས་(པ),  སྐྲག་(པ),  བརྩེ་(བ), or  དགྱེས་(པ), then remove from ཐུགས་ the tag [v.imp].
Background: The syllable bya can either be a noun 'bird' or the future stem of the verb to do (byed, byas, bya, byos). Like all verbs 'to do', this verb can be used as verbum dicendi, in which case it is likely to be preceded by a quotative clitic and followed by a semi-final converb. When these syntactic conditions are met, it is possible to preclude the interpretation [n.count].
Rule: If the syllable བྱ་ occurs in the sequence [cl.quot] བྱ་ སྟེ, then remove the tag [n.count] from བྱ་.
Background: The sequence theṅs can either be a past tense verb form [v.past], or a noun 'time, Mal, fois'. The noun is often followed by an ordinal number (first time, second time, etc.), whereas the verb form is very unlikely to be followed by an ordinal number.
Rule: If the word theṅs, having the tags [v.past] and [n.count], is followed by a word with a hypothetical tag [num.ord], then delete from theṅs the tag [v.past].
Background: The sequence theg-pa can either be a verbal noun from the verb theg meaning 'lift, bear' or 'go', or a noun 'vehicle'. In the phrase theg-pa chen-po 'Mahāyāna' theg-pa can be confidently interpreted as a noun.
Rule: In the sequence theg-pa chen-po, remove from theg-pa any [n.v.xxx] tags.
Background: The word srid can be a verb 'be possible' or a noun 'the world, all that is possible'. In the phrase srid kyi rgyal-po 'king of the world', the word 'srid' is definitely a noun.
Rule: In the sequence སྲིད་ ཀྱི་ རྒྱལ་པོ(་) tag སྲིད་ as [n.count]. 
Background: Some forms, such as skad, can receive both relator noun [n.rel] (e.g. ḥdi skad ces) and verbal tags [v.invar] (e.g. skad do). Because the structure [case.gen] [n.rel] [case.xxx] is used to define relator nouns, the occurrence of a genitive to the left can be used to isolate secure relator nouns and deprecate verbal analyses.
Rule: If a word has [n.rel] and [v.xxx] as possible tags, and is preceded by something with the hypothesized tag [case.gen] then remove [v.xxx].
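A minimal sketch of this rule, assuming a hypothetical token model in which each token is a `(form, tags)` pair and tags are plain strings such as `"n.rel"` or `"v.invar"`. The function name is invented for the illustration.

```python
def prefer_relator_noun(tokens):
    """Drop [v.xxx] readings from a word that may be [n.rel] after a possible genitive."""
    out = [(form, set(tags)) for form, tags in tokens]
    for i in range(1, len(out)):
        form, tags = out[i]
        prev_tags = out[i - 1][1]
        has_verbal = any(t.startswith("v.") for t in tags)
        # The genitive only needs to be a hypothesized (possible) tag on the left.
        if "n.rel" in tags and has_verbal and "case.gen" in prev_tags:
            out[i] = (form, {t for t in tags if not t.startswith("v.")})
    return out
```

Non-verbal tags other than [n.rel] are left in place, since the rule only speaks of removing [v.xxx].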
Background: The syllable འདྲ can either be a relator noun 'like' or a verb 'be similar to'. After a verbal noun and before a śad, the reading as a relator noun can be excluded.
Rule: If འདྲ occurs between a word with the tag [n.v.xxx] and ། then remove the analysis [n.rel].
Background: Some nouns, particularly chos 'dharma' happen to resemble imperative verbs. In this case chos 'prepare!' (pres. chos). After the genitive case the nominal reading is likely and the imperative reading probably impossible.  
Rule: If a word that follows [case.gen] has both the tags [n.count] and [v.imp] then the tag [v.imp] can be deleted.  
Background: Some nouns, particularly chos 'dharma' happen to resemble imperative verbs. In this case chos 'prepare!' (pres. chos). At the beginning of a sentence (i.e. after a śad or a tsheg-less -g) the nominal reading is likely and the imperative reading is quite unlikely. However, the corpus does yield the example phug cig ces 'he said, "pierce!"' at the beginning of a clause. Consequently, some precaution must be taken. In particular, the rule should not be applied if the ambiguous word in question is followed by źig, cig, or śig. However, to block the rule in general before źig, cig, or śig would increase the ambiguity unnecessarily, since these three suffixes can also be interpreted as the indefinite determiner. This increased ambiguity can be avoided by only blocking the rule when these three suffixes in turn occur before elements that make it clear they are the imperative converb [cv.imp] rather than the indefinite determiner [d.indef]. A list of such elements has been generated by examining the occurrence of [cv.imp] in the corpus and includes:  ཅེས་(པ་), གསུང་(བ་), ཟེར་(བ་), གསུངས་, དང་, བྱས་(པ་), and ཞུས་པ་.
Rule: If a word that follows a śad or a tsheg-less -g has both the tags [n.count] and [v.imp] and is not itself followed by cig, śig, or źig, which are in turn followed by ཅེས་(པ་), གསུང་(བ་), ཟེར་(བ་), གསུངས་, དང་, བྱས་(པ་), or ཞུས་(པ་), then the tag [v.imp] can be deleted.  
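The two-step lookahead in this rule can be sketched as below. This is a simplified illustration under several assumptions: tokens are `(form, tags)` pairs, the trigger words are given in transliteration (ces, gsuṅ, zer, etc.) rather than in Tibetan script, and only a preceding śad is checked (the tsheg-less -g case is left out because it depends on how the input is segmented).

```python
IMP_CONVERB_FORMS = {"cig", "śig", "źig"}
# Transliterated stand-ins for the corpus-derived list in the Background.
AFTER_IMPERATIVE = {
    "ces", "ces-pa", "gsuṅ", "gsuṅ-ba", "zer", "zer-ba",
    "gsuṅs", "daṅ", "byas", "byas-pa", "źus", "źus-pa",
}


def drop_initial_imperative(tokens):
    """Delete [v.imp] from a clause-initial noun/imperative homograph unless blocked."""
    out = [(form, set(tags)) for form, tags in tokens]
    for i in range(1, len(out)):
        form, tags = out[i]
        if out[i - 1][0] != "།":
            continue  # not clause-initial (tsheg-less -g is not handled in this sketch)
        if not {"n.count", "v.imp"} <= tags:
            continue
        nxt = out[i + 1][0] if i + 1 < len(out) else None
        nxt2 = out[i + 2][0] if i + 2 < len(out) else None
        # Block only when cig/śig/źig is itself followed by a listed element.
        if nxt in IMP_CONVERB_FORMS and nxt2 in AFTER_IMPERATIVE:
            continue
        tags.discard("v.imp")
    return out
```

With this formulation, phug cig ces keeps its imperative reading, while the same word before cig followed by something outside the list loses [v.imp].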
Background: Some nouns, particularly chos 'dharma' happen to resemble imperative verbs. In this case chos 'prepare!' (pres. chos). The ambiguity continues when źig, cig, or śig follows, since these suffixes can be either imperative converbs [cv.imp] or indefinite determiners [d.indef]. However, if the word following źig, cig, or śig is one that typically appears after imperatives (such as a verbum dicendi) then the ambiguous word in question can be specified as an imperative. A list of such elements has been generated by examining the occurrence of [cv.imp] in the corpus and includes:  ཅེས་(པ་), གསུང་(བ་), ཟེར་(བ་), གསུངས་, བྱས་(པ་), and ཞུས་པ་. (The word དང་ does occur often after the imperative converbs but also occurs after the indefinite determiner, so we exclude it from this list.)
Rule: If a word has both the tags [n.count] and [v.imp] and is followed by cig, śig, or źig, which are in turn followed by ཅེས་(པ་), གསུང་(བ་), ཟེར་(བ་), གསུངས་, བྱས་(པ་), or ཞུས་(པ་), then the tag [n.count] can be deleted.  
Background: The verbs yod and med anticipate an immediately preceding noun in the absolutive case. Consequently, if a word that has both verbal and nominal interpretations occurs immediately before yod or med, then it can be interpreted as a noun with confidence.
Rule: If a word with both [n.count] and [v.xxx] as interpretations appears immediately before yod, med, or their nominalized equivalents yod-pa and med-pa, then delete from this word all [v.xxx] tags.
Background: The noun lam is the same in form as one allophone of the question converb. The genitive can follow a noun but cannot follow a question converb. Consequently, the presence of the genitive after lam can be used to preclude its interpretation as a question converb. The rule can be interpreted generally, since there may be other nouns that are homophonous with allophones of the question converb.
Rule: If a word with both [n.count] and [cv.ques] as interpretations appears immediately before gyi then delete from this word the tag [cv.ques].
Background: The syllable sogs 'etc.' derives from a verb 'to gather'; in some cases this verbal origin is still a possible interpretation (e.g. la sogs te), but when preceded directly by nominal material it must be analyzed as a determiner of some kind and not as a verb. Before the semi-final converb or imperfective converb the determiner reading can be precluded. (We do not apply this rule to śiṅ and źiṅ because it may incorrectly catch the nominal use of these words).
Rule: If the syllable sogs with possible [v.xxx] tags and [d.det] tag is followed by te, ste, or ciṅ, then delete the tag [d.det].
Background: The syllable sogs 'etc.' derives from a verb 'to gather'; in some cases this verbal origin is still a possible interpretation (e.g. la sogs te), but when preceded directly by nominal material it must be analyzed as a determiner of some kind and not as a verb. However, when followed by an imperfective converb (ciṅ, śiṅ, źiṅ) the verbal reading should take precedence. For śiṅ and źiṅ we further require the presence of following punctuation in order to minimize the possibility that these syllables are the nouns 'wood' and 'field' respectively.
Rule: If the syllable sogs (with [v.xxx] tags and the tag [d.det]) precedes any of the syllables śiṅ, źiṅ which in turn precedes །, then remove the tag [d.det] from sogs.
Background: The syllable sogs 'etc.' derives from a verb 'to gather'; in some cases this verbal origin is still a possible interpretation (e.g. la sogs te), but when preceded directly by nominal material it must be analyzed as a determiner of some kind and not as a verb. However, when it follows a verb stem (yin sogs) the verbal reading should take precedence.
Rule: If the syllable sogs (with [v.xxx] tags and the tag [d.det]) is preceded by a word that has only [v.xxx] tags, then remove the tag [d.det] from sogs.
Background: The syllable sogs 'etc.' derives from a verb 'to gather'; in some cases this verbal origin is still a possible interpretation (e.g. la sogs te), but when preceded directly by nominal material it must be analyzed as a determiner of some kind and not as a verb.
Rule: If the syllable sogs with possible [v.xxx] tags and [d.det] tag is not preceded by la, then delete any [v.xxx] tags.
Background: Hill (forthcoming) describes the project's treatment of maṅ in detail. In brief, maṅ as an independent word is either a verb or a determiner, but the determiner is always followed by dag.
Rule: If the syllable maṅ with possible [v.xxx] tags and [d.det] tag is not followed by dag, then delete the tag [d.det].
Background: When the verb dga 'rejoice' appears before the final converb we segment it as dga ḥo. In Unicode Tibetan it is not possible to distinguish this from dag ḥo. Consequently, the rule-based tagger will associate [d.plural] with this syllable. However, since dag ḥo is very unlikely to occur in a text, we can preclude this analysis.
Rule: In the sequence དག འོ ། remove the tag [d.plural] from དག. 
Background: The word dga can be a verb 'rejoice' and a noun 'joy'. In the sequence dgaḥ mgu the interpretation of dgaḥ as a noun can be precluded. 
Rule: In the sequence dgaḥ mgu remove the tag [n.count] from dgaḥ.
Background: As a perhaps unnecessary component of our part-of-speech protocols, when a morphological verbal noun refers to something real in the world we tag it as a noun (mkhas-pa 'wise guy', ston-pa 'teacher'). This convention leads to quite a few ambiguous situations for the rule-based POS-tagger. If a verbal noun exercises governance over constituents to its left, then it is a verbal noun and not a noun. It will only be possible to specify this rule according to the rection of individual verbs. At present the only instance that has come up in our corpus is ལམ་དུ་འགྲོ་བ་. If འགྲོ་བ་ is immediately preceded by a terminative case marker or terminative converb, then this word must be understood as a verbal noun and not as the word 'creature'. Nonetheless, the example ཅི འི་ སླད་ དུ་ འགྲོ་བ་ མང་པོ་ ཡོངས་ སུ་ བཏང་ སྟེ ། shows that for the purpose of this rule a terminative that follows a [n.rel] should not count.
Rule: If the word འགྲོ་བ་, having both [n.count] and [n.v.xxx] as possible tags, is preceded by a word which can only be [case.term] or [cv.term], and this word itself is not preceded by a word with the tag [n.rel], then delete from འགྲོ་བ་ the tag [n.count].
Background: As a perhaps unnecessary component of our part-of-speech protocols, when a morphological verbal noun refers to something real in the world we tag it as a noun (mkhas-pa 'wise guy', ston-pa 'teacher'). This convention leads to quite a few ambiguous situations for the rule-based POS-tagger. Things in the world are much more likely to have definite reference, and thereby permit a plural marker, than abstract things. So, the presence of plural markers can be used to isolate nouns that resemble verbal nouns. (Because -kun has a pronominal reading 'all of them' it can occur clause initially, so this rule should not apply to kun.)
Rule: If a word has both [n.count] and [n.v.xxx] as possible tags and this word is followed by a word with an unambiguous [d.plural] analysis (other than the word -kun) or the adjective maṅ-po, then delete from the word in question the [n.v.xxx] tags.
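The condition "unambiguous [d.plural]" can be illustrated with a short sketch. The `(form, tags)` token model and the function name are assumptions for the example; "unambiguous" is read here as "[d.plural] is the only candidate tag".

```python
def prefer_concrete_noun(tokens):
    """Delete [n.v.xxx] from a noun/verbal-noun homograph before a secure plural marker."""
    out = [(form, set(tags)) for form, tags in tokens]
    for i in range(len(out) - 1):
        form, tags = out[i]
        next_form, next_tags = out[i + 1]
        ambiguous = "n.count" in tags and any(t.startswith("n.v.") for t in tags)
        # kun is excluded because of its pronominal reading 'all of them'.
        secure_plural = next_tags == {"d.plural"} and next_form != "kun"
        if ambiguous and (secure_plural or next_form == "maṅ-po"):
            out[i] = (form, {t for t in tags if not t.startswith("n.v.")})
    return out
```

Because dag carries a verbal tag alongside [d.plural], the equality test on the tag set leaves words before dag untouched.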
Background: The syllable gtibs has two possible tags, either [n.count] or [v.past.v.pres] (at this phase of tagging [v.past] and [v.pres]). However, when followed by ḥog 'below', it is possible to preclude the verbal readings.
Rule: If the syllable gtibs is followed by ḥog, then delete any [v.xxx] tags from gtibs. 
Background: Normally the syllable ma in the interpretation [neg] must occur before a verb or verbal noun. An exception is the multi-word phrase skad cig ma gcig 'one moment', in which ma should be interpreted as [neg] although it is followed by the numeral gcig 'one'.
Rule: In the phrase skad cig ma gcig remove the interpretation [n.count] from ma.
Background: The phrase mi dgaḥ-ba can either mean 'not happy' with mi interpreted as negation, or it could mean 'happy person' with mi interpreted as [n.count]. The existing rules do a satisfactory job of differentiating these two interpretations in many cases, but in some cases the distinction is difficult to make without an understanding of the meaning of the passage. In such cases the system is incorrectly specifying dgaḥ-ba as [v.invar] rather than [v.fut.v.pres], because it is not sure that mi should be interpreted as negation. This rule manually specifies mi as [neg] in certain combinations attested in the corpus, in order to trigger the tagging of the verbal noun as [v.fut.v.pres] rather than [v.invar].
Rule: In the phrases sems mi dgaḥ-ba and sñiṅ mi dgaḥ-ba remove from mi the tag [n.count].
Background: The phrase mi dgaḥ-ba can either mean 'not happy' with mi interpreted as negation, or it could mean 'happy person' with mi interpreted as [n.count]. The existing rules do a satisfactory job of differentiating these two interpretations in many cases, but in some cases the distinction is difficult to make without an understanding of the meaning of the passage. In such cases the system is incorrectly specifying dgaḥ-ba as [v.invar] rather than [v.fut.v.pres], because it is not sure that mi should be interpreted as negation. This rule manually specifies mi as [neg] in certain combinations attested in the corpus, in order to trigger the tagging of the verbal noun as [v.fut.v.pres] rather than [v.invar].
Rule: In the phrase ḥgyod-ciṅ mi dgaḥ-ba remove from mi the tag [n.count].
Background: The phrase mi dgaḥ-ba can either mean 'not happy' with mi interpreted as negation, or it could mean 'happy person' with mi interpreted as [n.count]. The existing rules do a satisfactory job of differentiating these two interpretations in many cases, but in some cases the distinction is difficult to make without an understanding of the meaning of the passage. In such cases the system is incorrectly specifying dgaḥ-ba as [v.invar] rather than [v.fut.v.pres], because it is not sure that mi should be interpreted as negation. This rule manually specifies mi as [neg] in certain combinations attested in the corpus, in order to trigger the tagging of the verbal noun as [v.fut.v.pres] rather than [v.invar].
Rule: In the phrase śin-tu mi dgaḥ-ba remove from mi the tag [n.count].
Background: When the syllables mi or ma occur without a verb or verbal noun to their right, they cannot be negation. Conversely, if mi or ma occur followed by the end of a noun phrase, then they must be nouns. In many cases the presence of mi or ma within a noun phrase is signaled by the POS category of the following word.
Rule: If mi / ma, ambiguous between [neg] and [n.count], is followed by an unambiguous [adj], [d.xxx], [n.count], [n.mass], [num.xxx], or [p.xxx], or by ambiguous źig, or i then remove the [neg] tag. (The caveat 'unambiguous' automatically excludes dag which can be both a verb and a plural suffix. The rule is written to specify [n.count] and [n.mass] only, because negation is perfectly permissible before [n.v.xxx].)
Background: A genitive connects two nouns. Consequently, mi preceded by the genitive must either be a noun, or the first word of a noun phrase. In the former case mi can be tagged as a noun even if it precedes a present or future verb stem (e.g. rmo-pa i mi ḥgroḥo 'an ignorant person goes'). In the latter case, mi might still be negation (e.g. bskal-pa gras med-pa i mi dge-ba ḥi las 'non virtuous deeds of countless eons'). It is important to isolate examples of the first type, because they would otherwise be misanalysed as negation because of the following verb. In order to preclude the second type it suffices to specify that the word following mi is not a verbal noun. No rule yet attempts to distinguish the genitive case from the genitive converb. Thus, in order to preclude the possibility that the morpheme preceding mi is the genitive converb, it is necessary to add the stipulation that the word two before mi is not a verb stem. The generalization that the genitive connects two nouns has one exception: the verb rigs 'to be proper' governs the genitive case. The syllable mi between a genitive and rigs is likely to be a negation marker (e.g. rab tu byu-ba i mi rigs 'it is not proper to take ordination'). Thus, the rule uses a preceding genitive to locate instances of mi as a noun, but requires that the following word not be rigs. A parallel argument applies to ma.
Rule: If mi / ma could be [n.count], follows a probable genitive, does not precede rigs, and does not precede a [n.v.xxx], and the word before the probable genitive is not an unambiguous [v.xxx] tag, then mark mi / ma as a [n.count].
Background: In general if mi or ma precedes a verb, they are likely to be interpreted as negation. However, the negative verbal noun med-pa is already inherently negated, so if mi or ma precedes med-pa then they must be nouns rather than markers of negation. The corpus gives us a case of mi med-sa also, where mi can be specified as a noun for a similar reason. Similarly, because yod-pa is negated as med-pa (rather than *ma yod-pa or *mi yod-pa) mi or ma before yod-pa must also be understood as nouns. A similar logic also applies to min; this verb is inherently negated so mi min-pa must mean 'isn't a person'.
Rule: If mi / ma precedes med, med-pa, med-sa, yod, yod-pa, min, min-pa, or min-ba then assign the tag [n.count] to it.
Background: The associative case connects two nouns. Consequently, mi preceded by the associative must either be a noun, or the first word of a noun phrase. In the former case mi can be tagged as a noun even if it precedes a present or future verb stem (e.g. lha daṅ mi ḥgroḥo 'gods and people go'). In the latter case, mi might still be negation (e.g. dge ba daṅ mi dge baḥi las 'virtuous and non-virtuous deeds'). It is important to isolate examples of the first type, because they would otherwise be misanalysed as negation because of the following verb. In order to preclude the second type it suffices to specify that the word following mi is not a verbal noun. An exception to the rule is made for the small number of verbs, such as ldan or mthun, which select for a noun phrase marked with associative case.
Rule: Tag mi or ma as a noun if it is conjoined by the associative case marker daṅ with an unambiguous noun (but not a verbal noun), unless mi or ma is followed by a verb (such as ldan, mthun, bstun, phrad, mjal, ḥdra, or bcas) which selects for a noun phrase marked with associative case.
Background: If mi is followed by either present or future verbs, then it is probably [neg]. We must ignore the verb gaṅ since it can also be an interrogative pronoun (i.e. mi gaṅ can mean both 'not full' and 'which person'), the verb dag 'pure' because it can also be a plural marker, the verb stoṅ 'be empty' because it can also be a number 'thousand', and the verb sum because it can also be a number 'three'. The verbs sogs 'etc.' and maṅ 'be many' and the verbal nouns bgres-pa 'old', ṅan-pa 'evil', phrad-tshad 'whichever one meets', rid-pa 'emaciated', ltogs-pa 'hungry', dgaḥ-ba 'happy' and yod-pa 'exist', are also excluded; their semantics dictates that they are more likely to occur after the noun mi 'person' than they are to be negated. (In the case of yod, it would be negated as med and not as mi yod).
Rule: If mi which is ambiguous between [neg] and [n.count] is followed by a word (other than gaṅ, dag, re, stoṅ, sum, sogs, maṅ, lta, su, daṅ, dgu, bcu or rid-pa, bgres-pa, ṅan-pa, phrad-tshad, ltogs-pa, dgaḥ-ba, yod-pa) with the hypothesized tags [v.pres], [v.fut], [n.v.pres] or [n.v.fut] and could be [neg], then assign tag [neg] to the word mi.
Background: Although ma most characteristically negates the past, in the prohibitive construction it negates the present. This fact allows certain examples of ma to be securely analyzed as the negation prefix rather than the noun 'mother'.
Rule: If ma is followed by an unambiguous present verb stem, which in turn is followed by a possible imperative converb (i.e. cig, źig, śig), then assign [neg] to ma, and remove [d.indef] from cig, źig, śig.
Background: If ma is followed by past tense verbs or yin, then it is probably [neg]. The word 'mother' can occur in these positions, but its occurrence without any explicit nominal marking is rare. It must be kept in mind nonetheless that this rule will yield some false positives. The first false positive we have run across is ma śi 'mother died', so we specifically exclude śi from this rule. A second false positive that the corpus gave was ma lta r 'like a mother', in which the lta was permitted as a verb and consequently the ma interpreted as negation. So, we specifically exclude lta from this rule. The third false positive that the corpus gave was ma yaṅ 'even mother'. After the verb yaṅ 'be light, i.e. not heavy' entered the lexicon this incorrectly appeared to the computer to mean 'not light'. So, we specifically exclude yaṅ from this rule. A fourth false positive that the corpus gave was ma bas 'more than mother'. After the verb bas 'be used up' entered the lexicon this was interpreted as 'not used up'. So, we specifically exclude bas from this rule.
Rule: If ma which is ambiguous between [neg] and [n.count] is followed by a word with the hypothesized tags [v.cop], [v.pres], [v.past], [n.v.cop], [n.v.pres], or [n.v.past], and this word is not śi, lta, yaṅ, or bas then assign the interpretation [neg] to the word ma. (An earlier version of this rule deleted the [n.count] interpretation, but when ma [skt] entered the lexicon this rule left the interpretation [skt] on ma, interfering with further rules.)
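The difference between assigning [neg] and merely deleting [n.count] (the point of the parenthetical note in the rule) can be sketched as follows. The `(form, tags)` token model and function name are assumptions for illustration only.

```python
EXCLUDED_FORMS = {"śi", "lta", "yaṅ", "bas"}
TRIGGER_TAGS = {"v.cop", "v.pres", "v.past", "n.v.cop", "n.v.pres", "n.v.past"}


def assign_neg_to_ma(tokens):
    """Make ma unambiguously [neg] before a possible copula, present, or past form."""
    out = [(form, set(tags)) for form, tags in tokens]
    for i in range(len(out) - 1):
        form, tags = out[i]
        next_form, next_tags = out[i + 1]
        if (
            form == "ma"
            and {"neg", "n.count"} <= tags
            and next_form not in EXCLUDED_FORMS
            and next_tags & TRIGGER_TAGS
        ):
            # Assigning replaces the whole tag set, so a stray [skt] reading
            # does not survive as it would if only [n.count] were deleted.
            out[i] = (form, {"neg"})
    return out
```

Replacing the tag set outright is what keeps later rules from having to deal with leftover readings such as [skt].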
Background: In the sequence ma śi the syllable ma is ambiguous; the phrase either means 'didn't die' or 'mother died'. However, if this sequence means 'didn't die' then there is likely to be an immediately preceding noun phrase that specifies who it is who didn't die. On the other hand, if it means 'mother died' there will not be a preceding noun phrase. The presence of a preceding noun phrase can thus be used to isolate cases in which ma is securely analyzable as negation.
Rule: If a noun phrase (i.e. a tag [adj], [d.xxx], [n.xxx], [num.xxx], or [p.xxx]), possibly followed by a focus clitic (i.e. [cl.focus]), precedes the sequence ma śi then remove the tag [n.count] from ma.
Background: The negation markers ma and mi normally occur only before verbs. If case marking morphology occurs directly after ma and mi then it is possible to preclude the interpretation of these words as negation markers.
Rule: If ma or mi occur directly before a word that has [case.xxx] as one of its tags and has no [v.xxx] tags, then delete [neg] from ma or mi.
Background: There are syllables that are interpretable both as normal nouns and as morphological affixes (e.g. nas 'barley', źiṅ 'field', las 'deed', śig 'louse' versus nas elative case marker and elative converb, źiṅ imperfective converb, las ablative case marker and ablative converb, śig imperative converb and indefinite determiner). Because case markers and converbs must follow nouns and verbs respectively, at the left edge of noun phrases (i.e. after a śad, the genitive case or the associative case) only the noun interpretation is possible (e.g. ། nas dkar mo .. 'white barley', ། źiṅ gi 'of the field', a-ma gaṅ dod-pa i las 'whatever deed mother wishes'). Two additional stipulations must be included in this rule because in combinations like ga r 'to where' the letter 'ra' occurs with a preceding tsheg-less ga, but is nonetheless not the noun 'goat', and similarly in a combination like gñi ga s 'by both' the letter 'sa' occurs with a preceding tsheg-less ga, but it is nonetheless not 'earth'.
Rule: If any word (other than r<a> or s<a>) has at least two possible part-of-speech tags, one of them [n.count] and one or more that are either [cv.xxx] or [case.xxx], and this word appears directly after ། (or a tsheg-less -g), [case.gen] or [དང་], then remove any [cv.xxx] and [case.xxx] tags from this word. (The associative case must be searched for by its form དང་ rather than its POS, since དང་ also has the POS options [cv.ass] and [v.invar]).
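A rough sketch of this left-edge rule follows. It assumes a hypothetical `(form, tags)` token model, writes the excluded letters as the bare forms `r` and `s`, and omits the tsheg-less -g condition, which depends on segmentation details not shown here. The tag names `case.abl` and `cv.abl` used in the usage note are likewise illustrative.

```python
AFFIX_PREFIXES = ("cv.", "case.")
EXCLUDED_FORMS = {"r", "s"}  # assumed spelling of r<a> and s<a> in the token stream


def strip_affix_readings(tokens):
    """At the left edge of a noun phrase keep only non-affix readings of a noun."""
    out = [(form, set(tags)) for form, tags in tokens]
    for i in range(1, len(out)):
        form, tags = out[i]
        if form in EXCLUDED_FORMS:
            continue
        prev_form, prev_tags = out[i - 1]
        # The associative is matched by form, the genitive by possible tag.
        at_left_edge = prev_form in {"།", "དང་"} or "case.gen" in prev_tags
        has_affix = any(t.startswith(AFFIX_PREFIXES) for t in tags)
        if at_left_edge and "n.count" in tags and has_affix:
            out[i] = (form, {t for t in tags if not t.startswith(AFFIX_PREFIXES)})
    return out
```

For example, las with the candidate tags `n.count`, `case.abl`, and `cv.abl` would keep only `n.count` after a word that may be [case.gen], and would keep all three after an ordinary noun.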
Background: There are syllables that are interpretable both as normal nouns and as morphological affixes (e.g. nas 'barley', źiṅ 'field', las 'deed', śig 'louse' versus nas elative case marker and elative converb, źiṅ imperfective converb, las ablative case marker and ablative converb, śig imperative converb and indefinite determiner). Double case markers do occur in Tibetan, but the corpus suggests that after a genitive the interpretation of las as an ablative is very unlikely. Thus, it is possible to specify that after the genitive las is the noun 'deed'.
Rule: When the word las follows any one of the words gi, gyi, kyi, or ḥi then tag las as [n.count].
Background: There are syllables that are interpretable both as normal nouns and as morphological affixes (e.g. nas 'barley', źiṅ 'field', las 'deed', śig 'louse' versus nas elative case marker and elative converb, źiṅ imperfective converb, las ablative case marker and ablative converb, śig imperative converb and indefinite determiner). In the relatively common phrase rgya-mtsho las, it is reasonable to assume that las is always the case marker and never the noun 'deed'.
Rule: In the sequence rgya-mtsho las remove from las the tag [n.count].
Background: There are syllables that are interpretable both as normal nouns and as morphological affixes (e.g. nas 'barley', źiṅ 'field', las 'deed', śig 'louse' versus nas elative case marker and elative converb, źiṅ imperfective converb, las ablative case marker and ablative converb, śig imperative converb and indefinite determiner). There is a reduplicated verb construction (e.g. soṅ soṅ-ba las, mchi mchi-ba las) in which las [n.count] 'deed' can be precluded.
Rule: In the patterns [v.xxx] [n.v.xxx] las, in which the verb and verbal noun are the same (e.g. soṅ soṅ-ba las or mchi mchi-ba las) remove from las the tag [n.count].
Background: The syllable źiṅ is either a noun 'field' or an imperfective converb. After a verb stem and before a śad or another verb stem, the interpretation of źiṅ as a noun can be excluded.
Rule: If źiṅ occurs after [v.xxx] and before either ། or [v.xxx], then delete [n.count] from źiṅ.
Background: The syllable śiṅ is either a noun 'wood' or an imperfective converb. After a verb stem that ends with -s and before a śad or another verb stem, the interpretation of śiṅ as a noun can be excluded.
Rule: If śiṅ occurs after a [v.xxx] that itself ends with -s, and śiṅ occurs before either ། or [v.xxx], then delete [n.count] from śiṅ.
Background: In Middle Tibetan and Modern Tibetan it is possible to spell the continuative converb in a way that is identical with the genitive. However, the continuative converb has a very specific syntactic context, coming between a verb stem and either a verb stem or verbal noun (śes kyi ḥdug, ḥgro gi yod-pa red, etc.).
Rule: If kyi (gyi, or gi) fails to occur in the pattern [v.xxx] kyi [(n.)v.xxx] then delete from it the tag [cv.cont].
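The pattern check can be sketched as below, again under the assumption of a hypothetical `(form, tags)` token model. "Occurs in the pattern" is read here as: some tag of the left neighbour is [v.xxx] and some tag of the right neighbour is [v.xxx] or [n.v.xxx].

```python
GENITIVE_LIKE_FORMS = {"kyi", "gyi", "gi"}


def prune_continuative(tokens):
    """Delete [cv.cont] from kyi/gyi/gi outside the frame [v.xxx] _ [(n.)v.xxx]."""
    out = [(form, set(tags)) for form, tags in tokens]
    for i, (form, tags) in enumerate(out):
        if form not in GENITIVE_LIKE_FORMS or "cv.cont" not in tags:
            continue
        left_ok = i > 0 and any(t.startswith("v.") for t in out[i - 1][1])
        right_ok = i + 1 < len(out) and any(
            t.startswith(("v.", "n.v.")) for t in out[i + 1][1]
        )
        if not (left_ok and right_ok):
            tags.discard("cv.cont")
    return out
```

A token at either edge of the input has no neighbour on one side, so it can never satisfy the frame and loses [cv.cont].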
Background: The word yi can be a noun 'mind', or, especially in poetry, the equivalent of ḥi, which is the genitive case marker (after nouns) and genitive converb (after verbs). The grammatical suffix -yi can only occur in those sandhi contexts where ḥi occurs. These sandhi contexts are difficult to specify with rules, so reference is made to the corpus. In the absence of the appropriate sandhi context the tag [case.gen] can be stripped from yi.
Rule: If the syllable yi occurs after a syllable that is not one of those in the following list, then delete the tag [case.gen] from yi: ཀ་, ཀླུ་, ཁ་, ཁྱི་, དགྲ་, འགྲོ་, ང་, ངོ་, ལྔ་, ལྕེ་, ཆུ་, ཇ་, རྗེ་, དེ་, འདི་, མདོ་, པ་, པོ་, བ་, བུ་, བོ་, བློ་, མ་, མི་, མོ་, ཙོ་, རྩ་, རྩེ་, ཟ་, ལ་, ས་, ལྷ་, ལྷོ་, ངྷ་, མཐུ་, ཚེ་, ཕ་, བཞི་, བརྒྱ་, གཡུ་, མེ་, རྒྱ་, རྟ་, བཀའ་, ཆེ་, འབྲུ་, དབུ་, སྒྲ་, སྐུ་, ཞུ་, བྱ་, ཕྱི་, སྤྱི་, སྔོ་, བྲུ་, བཟོ་, རོ་, ཏ་, ཧ་, ཟྭ་, ཉ་, རི་, འཇའ་, སུ་, མདའ་, སྦྲ་, རྡོ་.
Background: The word yi can be a noun 'mind', or, especially in poetry, the equivalent of ḥi, which is the genitive case marker (after nouns) and genitive converb (after verbs). The grammatical suffix -yi can only occur in those sandhi contexts where ḥi occurs. These sandhi contexts are difficult to specify with rules, so reference is made to the corpus. In the absence of the appropriate sandhi context the tag [cv.gen] can be stripped from yi.
Rule: If the syllable yi occurs after a syllable that is not one of those in the following list, then delete the tag [cv.gen] from yi:  
Background: The word ru can be a noun 'horn', or, especially in poetry, the equivalent of -r, which is the terminative case marker (after nouns) and terminative converb (after verbs). The grammatical suffix -ru can only occur in those sandhi contexts where -r occurs. These sandhi contexts are difficult to specify with rules, so reference is made to an external lexicon, which brings together all syllables in the corpus which occur immediately before (i.e. without an intervening tsheg) ར|xxx.term, ས|xxx.agn, or འི|xxx.gen. In the absence of the appropriate sandhi context the tags [case.term] and [cv.term] can be stripped from ru. (Note that in principle ru should only occur in compounds, so there should be no need for this rule, but better to be safe than sorry).
Rule: If the syllable ru occurs after a syllable that is missing from the 'open syllable' external lexicon, then delete the tags [case.term] and [cv.term] from ru.
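A lexicon-driven rule of this kind amounts to a set-membership test. The sketch below assumes a hypothetical `(form, tags)` token model and takes the 'open syllable' lexicon as a plain Python set supplied by the caller; how the real lexicon is stored or loaded is not specified here.

```python
def prune_terminative_ru(tokens, open_syllables):
    """Delete [case.term] and [cv.term] from ru unless the preceding syllable is listed."""
    out = [(form, set(tags)) for form, tags in tokens]
    for i, (form, tags) in enumerate(out):
        if form != "ru":
            continue
        prev_form = out[i - 1][0] if i > 0 else None
        if prev_form not in open_syllables:
            tags.difference_update({"case.term", "cv.term"})
    return out
```

Passing the lexicon as an argument keeps the rule independent of the corpus snapshot from which the lexicon was extracted.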
Background: The word nas can be a noun 'barley', or the elative case marker / converb. At the beginning of a clause de nas will usually be de [adv.proclausal] nas [case.ela], but is potentially other things (de nas 'from there', de nas 'he ... barley'). However, because 'barley' would head a noun phrase, and thus normally have trailing noun phrase elements after it, if clause-initial de nas is followed by a noun ([n.count] or [n.prop]), then the interpretation of nas as barley can be excluded.
Rule: When de nas occurs after a śad (or tsheg-less -g) and before a word with an unambiguous [n.count] or [n.prop] tag, then delete from nas the tag [n.mass].
Background: The word nas can be a noun 'barley', or the elative case marker / converb. After a verb stem nas is very likely to be the elative converb. This likelihood goes up even more if nas is followed by a count noun [n.count], a proper noun [n.prop], or a personal pronoun [p.pers]. The reason for this is that nas 'barley' must head a noun phrase and is therefore likely to be followed by trailing noun phrase elements; in the absence of such elements nas is less likely to be a noun and consequently more likely to be the elative converb.
Rule: In the pattern [v.xxx] nas [n.count]/[n.prop]/[p.pers]/[adv.temp] remove [n.mass] from nas.
Background: The word nas can be a noun 'barley', or the elative case marker / converb. After a verb stem nas is very likely to be the elative converb. This likelihood goes up even more if nas is followed by yaṅ [adv.proclausal]. The reason for this is that yaṅ 'again' frequently begins a clause and is therefore likely to come immediately after a clause-initial noun.
Rule: In the pattern [v.xxx] nas yaṅ remove [n.mass] from nas.
Background: The word nas can be a noun 'barley', or the elative case marker / converb. After a relator noun such as sgo or źal-sṅa, in particular when the relator noun is preceded by a possible genitive, nas is certainly not the noun 'barley'.
Rule: If a word with the possible tag [case.gen] is followed by sgo or źal-sṅa, which in turn is followed by nas, then remove [n.mass] from nas.
Background: The syllable raṅ can have many meanings, one of them is a verb 'rejoice'. In this meaning raṅ will always follow the word yi 'mind' or yid 'mind'. 
Rule: If raṅ is not preceded by the syllables yi or yid, then delete from raṅ any [v.xxx] tags. 
Background: The syllable raṅ can have many meanings. In the long form pronouns ṅa-raṅ, ṅed-raṅ, kho-raṅ, khoṅ-raṅ, khyod-raṅ and khyed-raṅ it is always analyzed as [p.refl].
Rule: If raṅ is preceded by the syllables ṅa, ṅed, kho, khoṅ, khyod or khyed, then assign raṅ the tag [p.refl].
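The two raṅ rules above (verbal readings only after yi or yid; [p.refl] after the listed pronouns) could be sketched together as follows. The (form, tags) token layout and the function name are assumptions made for illustration, not the project's actual data structures.

```python
PRONOUN_STEMS = {"ṅa", "ṅed", "kho", "khoṅ", "khyod", "khyed"}

def disambiguate_rang(tokens):
    """Apply the two raṅ rules; tokens are (form, tags) pairs."""
    out = [(form, set(tags)) for form, tags in tokens]
    for i, (form, tags) in enumerate(out):
        if form != "raṅ":
            continue
        prev = out[i - 1][0] if i > 0 else None
        # Verbal readings ('rejoice') survive only after yi / yid 'mind'.
        if prev not in ("yi", "yid"):
            tags.difference_update({t for t in tags if t.startswith("v.")})
        # After a long-form pronoun stem, raṅ is assigned [p.refl] outright.
        if prev in PRONOUN_STEMS:
            tags.clear()
            tags.add("p.refl")
    return out
```

Note that the first rule only deletes tags while the second assigns one, so the second overrides whatever the first leaves behind.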
Background: The syllable raṅ can have many meanings. One use is to mean 'quite, rather, even', which we tag as [d.det]. This function is difficult to isolate, but one environment that is clear is after an adjective and before źig. GIVE AN EXAMPLE
Rule: If raṅ is preceded by an unambiguous adjective and is followed by źig, then tag raṅ as [d.det].
Background: The syllable raṅ can have many meanings. One use is to mean 'quite, rather, even', which we tag as [d.det]. This function is difficult to isolate, but one environment that is clear is after an adjective (which is not the head of a noun phrase) and before mi ḥdug. GIVE AN EXAMPLE
Rule: In the pattern xxx [n.count] yyy [adj] raṅ mi ḥdug, tag raṅ as [d.det].
Background: The syllable de can either be a demonstrative pronoun (e.g. rgyal-po [n.count] de [d.dem]) or it can be a semi-final converb (e.g. gcod [v.pres] de [cv.sem]). Rule 22 has already located many unambiguous examples of de [d.dem] because they fail to meet the sandhi context that one expects for de [cv.sem]. The current goal is to isolate secure examples of de [d.dem] that happen to appear in the correct sandhi context and so have avoided the action of rule 22. It would be tempting to suggest that de [cv.sem] only comes after verbs, but this is incorrect. Although it is most frequent after verbs, the semi-final converb can follow almost any constituent. What can be said is that most de [d.dem] cannot occur after a verb stem, that most instances of the syllable de at the end of a noun phrase will be [d.dem], and that, because the semi-final converb ends a clause, there is a tendency for it to appear before a śad. These tendencies can be combined to isolate very likely instances of de [d.dem], namely those cases of de that occur at the end of a noun phrase (and thus not after a verb stem) and which are not followed by śad. This rule as previously formulated gave rise to three false positives: འདི་ མཆོག་ཉིད་ དེ་ དེ་ ཡི་ རྒྱུ་ ནི་ འདི་ ཐོས་ ཡིན །, བྱང་ཁོག་ གམ་ འདུ་བ་ དང་ སྟོད་ དེ་ མགོ་པོ་ དང་ མཚོན་ གྱིས་ རྨས་པ་ དང་ །, and རབ་ ཏུ་ བྱེད་པ་ སྡེ་ བརྒྱད་ དེ་ དག་ ལ་ འདོད་ དོ །. Because two of these contain the sequence de de, it merits excluding this context from the application of the rule.
Rule: If de [d.dem]/[cv.sem] is preceded by a word with an unambiguous tag [adj], [d.xxx], [n.xxx], or [p.xxx], and is not followed by a śad or the word de then delete [cv.sem].
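As a rough illustration, the rule can be written as a single pass over (form, tags) tokens. The token layout and names are hypothetical, "unambiguous" is read as "exactly one remaining tag", and [n.xxx] is matched by the bare prefix "n." (so it would also match [n.v.xxx]; whether that is intended is an open assumption here).

```python
SHAD = "།"
NP_PREFIXES = ("d.", "n.", "p.")

def is_np_tag(tag):
    """True for [adj], [d.xxx], [n.xxx], and [p.xxx]."""
    return tag == "adj" or tag.startswith(NP_PREFIXES)

def prune_de_converb(tokens):
    """Delete [cv.sem] from de at the end of a noun phrase."""
    out = [(form, set(tags)) for form, tags in tokens]
    for i in range(1, len(out)):
        form, tags = out[i]
        if form != "de" or not {"d.dem", "cv.sem"} <= tags:
            continue
        prev_tags = out[i - 1][1]
        # The preceding word must carry exactly one tag, and a nominal one.
        if len(prev_tags) != 1 or not is_np_tag(next(iter(prev_tags))):
            continue
        following = out[i + 1][0] if i + 1 < len(out) else None
        if following in (SHAD, "de"):
            continue  # excluded contexts: before śad and before another de
        tags.discard("cv.sem")
    return out
```

The de de exclusion handles the first of the two adjacent tokens; the second is skipped anyway because its predecessor is still ambiguous.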
Background:
Rule: If དེ is followed by ས or ར which can be case, then make ས or ར a case and remove [cv.sem] from དེ.
Background: The following somewhat unanticipated sentence shows that in rare cases it is possible for a verb stem that ends with -d to be followed immediately by de [d.dem]: ལྷ་སྦྱིན་ གྱིས་ དེ འི་ ཚེ་ ན་ ཡང་ ང་ བསད་ དེ་ དག་ ལྟ ར་ གྱི་ བར་ དུ་ ང་ ལ་ སྡིག་པ འི་ སེམས་ ཀྱིས་ གསོད་པ ར་ སེམས་ སོ །. We know that this de is [d.dem] because it is followed by dag. Allowance should be made for such cases, before de is specified as [cv.sem] after verbs in the correct sandhi context.
Rule: If དེ་ is followed by དག་, then remove from དེ the tag [cv.sem].
Background:
Rule: If de [d.dem]/[cv.sem] is preceded by a word with an unambiguous tag [v.xxx] then delete [d.dem].
Background: In the Mi la ras pa rnam thar dag occurs as an alternative reading for lta in contexts where we tag lta as [cl.focus], for example སྐལ་ལྡན་གསུང་བ་ཡིན་ན་དག་ང་འཚེང་སྟེ།. Because of such cases dag [cl.focus] is added to the lexicon, and without a rule specifying otherwise, all cases of dag would have the interpretation [cl.focus] available to them in the pre-tagger. One context in which [cl.focus] is not imaginable as a tag for dag is immediately after demonstratives (such as de). Thus, it is possible for this rule to preclude [cl.focus] as an interpretation of dag after demonstratives.
Rule: If dag follows a word with an unambiguous interpretation as [d.dem], then delete from dag the interpretation [cl.focus]. (This rule must be run after rules that distinguish de [d.dem] from de [cv.sem]).
Background: In the Mi la ras pa rnam thar dag occurs as an alternative reading for lta in contexts where we tag lta as [cl.focus], for example སྐལ་ལྡན་གསུང་བ་ཡིན་ན་དག་ང་འཚེང་སྟེ།. Because of such cases dag [cl.focus] is added to the lexicon, and without a rule specifying otherwise, all cases of dag would have the interpretation [cl.focus] available to them in the pre-tagger. One context in which [cl.focus] is not imaginable as a tag for dag is immediately before -gis. Thus, it is possible for this rule to preclude [cl.focus] as an interpretation of dag before -gis.
Rule: If dag precedes -gis then delete from dag the interpretation [cl.focus].
Background: The syllable dag has a number of possible tags, including [d.plural] and [v.xxx] 'be pure'. In the corpus, dag as a verb 'be pure' occurs before the verbal suffix ciṅ. This is a position in which the verbal reading can be specified.
Rule: If dag precedes ciṅ remove from it the tag [d.plural]. 
Background: The syllable dag has a number of possible tags, including [d.plural] and [v.xxx] 'be pure'. In phrases such as de dag thams-cad and ḥdi dag kun it is clear that dag is not a verb. 
Rule: If dag follows ḥdi or de and precedes kun or thams-cad (i.e. de dag kun, ḥdi dag kun, de dag thams-cad, and ḥdi dag thams-cad) then remove from it any [v.xxx] tags. 
Background: The syllable dag has a number of possible tags, including [d.plural] and [v.xxx] 'be pure'. In phrases such as de dag gi dpe-cha and de dag daṅ de-bźin-gśegs-pa it is clear that dag is not a verb. 
Rule: If dag follows ḥdi or de and precedes a gi (or daṅ) which is itself before an unambiguous [n.count] (i.e. de dag gi XXX|[n.count], ḥdi dag gi XXX|[n.count], de dag daṅ XXX|[n.count], and ḥdi dag daṅ XXX|[n.count]) then remove from dag any [v.xxx] tags. 
Background: Some syllables are interpretable both as verb stems and as nouns. For example, the syllable sems can either be the noun 'mind' or the present stem of the verb 'to think'. Because Tibetan has verb final syntax, it is very unlikely that the verb interpretation will occur in clause initial position. Consequently, clause initial sems can be specified as a noun. The rule applies generally to words that are ambiguous between verbal and nominal interpretations. However, because verbs are frequent (if not required) before [cv.impf], [cv.cont], [cv.sem], and other verbs, the rule is not applied before syllables with these interpretations. Furthermore, a verbal reading is also likely if the preceding clause ends with the imperfective converb. (This rule must be run only after those rules that attempt to distinguish [cv.impf], [cv.sem]). (This rule should not apply to zer or dran, because as verba dicendi they are likely to occur at the apparent beginning of a clause. The reading 'having said' is more likely than 'from the ray'.) (There are a number of false positives related to verse, e.g. སྤྱན་རས་གཟིགས་ཀྱིས་ལུང་།།བསྟན་འགྲོ་མགོན་ཀླུ་སྒྲུབ་ཐུགས་ཀྱི་སྲས།།ཤ་ཝ་རི་པའི་ཞབས་ལ་གཏུགས། in which bstan is a verb.)
Rule: If a word that has two possible tags [n.count] and [v.xxx] is preceded by śad (or tsheg-less ga), which in turn is not preceded by a word that is tagged [cv.impf], and the word in question is not zer or dran, and is not the word bstan preceded by ལུང་།།, and is not followed by a word that has any potential tags [cv.impf], [cv.cont], [cv.ela], [cv.sem], [cv.term], [v.xxx], [n.v.xxx], [neg], [cl.focus], or [adv.proclausal] then delete all [v.xxx] tags from the word in question.
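Because this rule stacks several conditions, a sketch may help in reading it. The version below is a simplification under stated assumptions: tokens are (form, tags) pairs, the names are hypothetical, and neither the tsheg-less ga context nor the bstan exception is modeled.

```python
SHAD = "།"
EXEMPT_FORMS = {"zer", "dran"}
BLOCKING_TAGS = {"cv.impf", "cv.cont", "cv.ela", "cv.sem", "cv.term",
                 "neg", "cl.focus", "adv.proclausal"}

def blocks_rule(tag):
    """True if a following word with this possible tag bars the rule."""
    return tag in BLOCKING_TAGS or tag.startswith(("v.", "n.v."))

def prune_clause_initial_verbs(tokens):
    """Delete [v.xxx] from clause-initial noun/verb ambiguous words."""
    out = [(form, set(tags)) for form, tags in tokens]
    for i in range(1, len(out)):
        form, tags = out[i]
        verb_tags = {t for t in tags if t.startswith("v.")}
        if "n.count" not in tags or not verb_tags or form in EXEMPT_FORMS:
            continue
        if out[i - 1][0] != SHAD:
            continue  # not clause initial
        if i >= 2 and "cv.impf" in out[i - 2][1]:
            continue  # previous clause ends in the imperfective converb
        if i + 1 < len(out) and any(blocks_rule(t) for t in out[i + 1][1]):
            continue  # a verb-friendly element follows
        tags.difference_update(verb_tags)
    return out
```

The check on the following word uses any possible tag, not only unambiguous ones, which keeps the rule conservative.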
Background: Some syllables are interpretable both as pronouns and as converbs. For example, the syllable su can either be the pronoun 'who' or the case and converbial suffix. Because case and converbial suffixes must come after something, the case/converbial interpretation is impossible in initial position. Consequently, clause initial su can be specified as a pronoun. The rule applies generally to words that are ambiguous between converb and pronoun interpretations.
Rule: If a word that has two possible tags [cv.xxx] and [p.xxx] is preceded by śad (or tsheg-less ga), then delete all [cv.xxx] tags from the word in question.
Background: The syllable cig and its sandhi alternates źig and śig can either be an indefinite determiner (e.g. lam cig 'a path') or it can be a converb that marks the imperative (e.g. khyed gñis kyis kho-bo sod cig 'you two kill me!'). The imperative converb can only come after an imperative verb stem or a negated present verb stem in its prohibitive use (e.g. grogs-po bdag ma gsod cig 'O friends, do not kill me!'), so in these contexts the interpretation as an indefinite determiner can be excluded. Conversely, outside of these two contexts the interpretation as an imperative converb can be excluded.
Rule: If any word has the two possible part-of-speech tags [cv.imp] and [d.indef], then delete the tag [d.indef] if the preceding word only has the tag [v.imp], or the preceding two words are ma and a possible [v.pres].
Background:
Rule: If any word has the two possible part-of-speech tags [cv.imp] and [d.indef], and the preceding word has none of the tags [v.imp], [v.past], or [v.pres], then delete the tag [cv.imp] from the word in question.
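The two complementary cig / źig / śig rules above might be sketched as one function: the first branch keeps the converb reading in the imperative and prohibitive contexts, the second removes it where no suitable verb stem precedes. The (form, tags) token layout and the function name are illustrative assumptions.

```python
VERB_STEMS = {"v.imp", "v.past", "v.pres"}

def disambiguate_cig(tokens):
    """Choose between [cv.imp] and [d.indef] from the left context."""
    out = [(form, set(tags)) for form, tags in tokens]
    for i, (form, tags) in enumerate(out):
        if not {"cv.imp", "d.indef"} <= tags:
            continue
        prev_tags = out[i - 1][1] if i >= 1 else set()
        prev2_form = out[i - 2][0] if i >= 2 else None
        if prev_tags == {"v.imp"} or (prev2_form == "ma" and "v.pres" in prev_tags):
            # Imperative or prohibitive context: not a determiner.
            tags.discard("d.indef")
        elif not prev_tags & VERB_STEMS:
            # No candidate verb stem to the left: not a converb.
            tags.discard("cv.imp")
    return out
```

A preceding word that is ambiguous between [v.imp] and another stem, without ma before it, falls through both branches and stays ambiguous for later rules.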
Background: The syllable cig and its sandhi alternates źig and śig can either be an indefinite determiner (e.g. lam cig 'a path') or it can be a converb that marks the imperative (e.g. khyed gñis kyis kho-bo sod cig 'you two kill me!'). The imperative converb can only come after an imperative verb stem or a negated present (or rarely past) verb stem in its prohibitive use (e.g. grogs-po bdag ma gsod cig 'O friends, do not kill me!'), so in these contexts the interpretation as an indefinite determiner can be excluded. Conversely, outside of these two contexts the interpretation as an imperative converb can be excluded. Sometimes the imperative stem is ambiguous with another verb stem. At the end of a sentence, -cig after a verb that can be the imperative is definitely [cv.imp], and the verb is definitely imperative [v.imp] (or [v.pres], and rarely [v.past], if preceded by ma).
Rule: If a word has the two possible part-of-speech tags [cv.imp] and [d.indef], then delete the tag [d.indef] if the preceding word has the tags [v.pres] or [v.past] among its tags, if the word before that word is ma, and if the following element is a quotative clitic [cl.quot] or a piece of punctuation, i.e. ma [v.pres]/[v.past] [cv.imp]/[d.indef] [cl.quot]/[punc].
Background: The syllable cig and its sandhi alternates źig and śig can either be an indefinite determiner (e.g. lam cig 'a path') or it can be a converb that marks the imperative (e.g. khyed gñis kyis kho-bo sod cig 'you two kill me!'). The imperative converb can only come after an imperative verb stem or a negated present verb stem in its prohibitive use (e.g. grogs-po bdag ma gsod cig 'O friends, do not kill me!'), so in these contexts the interpretation as an indefinite determiner can be excluded. Conversely, outside of these two contexts the interpretation as an imperative converb can be excluded. Sometimes the imperative stem is ambiguous with another verb stem. At the end of a sentence, -cig after a verb that can be the imperative is definitely [cv.imp], and the verb is definitely imperative [v.imp] (or [v.pres] if preceded by ma).
Rule: If a word has the two possible part-of-speech tags [cv.imp] and [d.indef], then delete the tag [d.indef] if the preceding word has the tag [v.imp] among its tags, and if the following element is a quotative clitic [cl.quot] or punctuation marker.
Background: Certain verbs (ḥbebs, phap, bab, babs 'fall', thug 'be at the point of', brten 'rely on', phul, ḥbul 'offer', ḥjug 'enter') typically require la as part of their rection. This la will be interpreted as a case marker after nouns and a converb after verbs, but it will never be interpretable as the noun 'mountain pass' [n.count] or the metalinguistic mention of the letter la [skt].
Rule: If the syllable la precedes ḥbebs, phap, bab, babs, thug, brten, phul, ḥbul, or ḥjug or their nominalized equivalents (i.e. followed by pa, ba, tshul, rgyu, mkhan, mi, or bya) then remove from la the interpretations [n.count] and [skt].
Background: The syllable la has many interpretations: the allative case, the allative converb, the stem of the proclausal adverb lar 'moreover', and the noun 'mountain pass'. At the end of a clause (i.e. after a verb or verbal noun but before a śad) the noun 'mountain pass' can be precluded.
Rule: If a word la appears after [v.xxx] or [n.v.xxx] and before །, then delete [n.count] from this la.
Background: The syllable nas has many interpretations: the elative case, the elative converb, and the mass noun 'barley'. At the end of a clause (i.e. after a verb but before a śad) the noun 'barley' can be precluded.
Rule: If a word nas appears after [v.xxx] and before །, then delete [n.mass] from this nas.
Background: The syllable nas has many interpretations: the elative case, the elative converb, and the mass noun 'barley'. In the phrase yaṅ nas yaṅ we take it as [case.ela].
Rule: In the phrase ཡང་ ནས་ ཡང་ tag ནས་ as [case.ela].
Background: The orthographic form gyis can either be an allomorph of the agentive case (or agentive converb) or it can be the imperative of the verb 'to do' (bgyid). At least in classical Tibetan the agentive case (and converbial) marker only occurs after words that end in -n, -r, -l, or -m. The interpretation as [v.imp] will be particularly common after other imperative verbs. These two facts can be used in combination to disambiguate the part-of-speech category of the word that occurs immediately before gyis, and to preclude the analysis of gyis as [case.agn]/[cv.agn]. (This rule led to a false positive བསྟོད་པ་ བྱེད་པ་ དག་ གྱིས་ [v.imp] ཀྱང་ ། །. Although this is a spelling error, we should preclude it. The best way is probably to use the fact that ཀྱང་ does not follow the imperative.)
Rule: If the word before gyis has a possible interpretation as [v.imp] and does not end in -n, -r, -l, or -m, and the word after gyis is not dogs (since dogs 'fear' regularly takes gyis the agentive converb) or kyaṅ (because kyaṅ cannot follow the imperative), then remove from the word before gyis all [v.xxx] tags except for [v.imp], and delete [cv.agn] and [case.agn] from gyis.
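This rule edits two tokens at once (gyis and the word before it), which a short sketch makes concrete. It assumes (form, tags) tokens with transliterated forms, so the sandhi test is a plain check on the final letter; the function name and sample forms are illustrative.

```python
SANDHI_FINALS = ("n", "r", "l", "m")  # finals after which agentive gyis occurs
EXCLUDED_FOLLOWERS = {"dogs", "kyaṅ"}

def disambiguate_gyis(tokens):
    """Resolve 'imperative + gyis [v.imp]' where agentive gyis is impossible."""
    out = [(form, set(tags)) for form, tags in tokens]
    for i in range(1, len(out)):
        if out[i][0] != "gyis":
            continue
        prev_form, prev_tags = out[i - 1]
        following = out[i + 1][0] if i + 1 < len(out) else None
        if "v.imp" not in prev_tags or prev_form.endswith(SANDHI_FINALS):
            continue
        if following in EXCLUDED_FOLLOWERS:
            continue
        # Keep only the imperative among the verbal readings of the previous word.
        prev_tags.difference_update(
            {t for t in prev_tags if t.startswith("v.") and t != "v.imp"}
        )
        out[i][1].difference_update({"cv.agn", "case.agn"})
    return out
```

Non-verbal tags on the preceding word are deliberately left alone, since the rule as stated only removes [v.xxx] tags.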
Background: The syllable gyis is both the imperative of bgyid and a [case.agn]/[cv.agn]. At the end of a sentence before the imperative converb śig the interpretation as an imperative verb is secure.
Rule: In the sequence gyis śig followed by zer, gsuṅ, gsuṅs, ces, or punctuation such as śad, assign [v.imp] to gyis.
Background: The syllable gyis is both the imperative of bgyid and a [case.agn]/[cv.agn]. It will be rather common for the imperative interpretation of gyis to occur before la [cv.all], because la coordinates imperatives. However, it might always be imaginable that gyis la is double case marking or the noun 'mountain pass' after the instrumental. So, we further specify that there is an imperative after the la.
Rule: In the sequence gyis la followed by a verb with an unambiguous [v.imp] tag, or an ambiguous verb with [v.imp] among its possible tags, if this verb is further followed by cig, źig or śig, then tag gyis as an imperative.
Background: The lexicon has gyi as an imperative verb, but this word is far more common as a genitive. In the correct sandhi context (after -n, -m, -r, and -l) and the correct syntactic context (between two noun phrases) the interpretation as a genitive is guaranteed.
Rule: If the syllable gyi occurs after a word with the tags [n.count], [n.mass] or [num.card], and that word ends with -n, -m, -r, or -l, and gyi occurs before a word with the tag [n.count] or [n.mass], then remove from gyi the tag [v.imp].
Background: The syllable su is ambiguous between the interrogative pronoun 'who' and the case/converbial morpheme that appears after words that end in -s. This sandhi allows those cases in which su is not preceded by a word that ends in -s to be specified as the interrogative pronoun.
Rule: If the syllable su does not follow a word that ends in -s, then delete from su the tags [cv.term] and [case.term].
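A sandhi-based rule like this one reduces to a check on the final letter of the preceding word. The sketch below assumes transliterated (form, tags) tokens; the function name and sample forms are hypothetical.

```python
def prune_su_suffix(tokens):
    """Delete the terminative readings of su when sandhi rules them out."""
    out = [(form, set(tags)) for form, tags in tokens]
    for i, (form, tags) in enumerate(out):
        if form != "su":
            continue
        # A su with nothing before it cannot be a suffix either.
        prev_form = out[i - 1][0] if i > 0 else ""
        if not prev_form.endswith("s"):
            tags.difference_update({"cv.term", "case.term"})
    return out
```

The rule only works in one direction: after a word in -s, su stays ambiguous and is left for the more specific rules.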
Background: The syllable su is very common and has several interpretations. We are on the lookout for unambiguous contexts to disambiguate the various options. One context in which su can be securely analyzed as [cv.term] is after a verb stem that ends in -s and before the verb gsol, in the meaning 'request to verb'. 
Rule: If the syllable su follows a word that ends with -s, and which only has verb stem analyses [v.xxx], and the syllable following su is gsol, then tag su as [cv.term].
Background: Certain verbs take the terminative case (which appears as su after nouns that end in -s) as a normal part of their rection. These verbs include motion verbs (so, os, gśegs, phyin, byon, gro), 'arise as' (byuṅ, ḥbyuṅ), and 'change into' (gyur, ḥgyur). If the syllable su occurs after a noun that ends in -s and before one of these verbs, then this su is certainly the case marker [case.term].
Rule: If su occurs after a word with an unambiguous interpretation [n.count], which ends with -s, and this su occurs before so, os, gśegs, phyin, byon, gro, byuṅ, ḥbyuṅ, gyur, or ḥgyur (with the tag [v.xxx]) or the nominalized equivalent of any of these ending in -pa or -ba (with the tag [n.v.xxx]), then assign [case.term] to su.
Background: The syllable su is very common and has several interpretations. We are on the lookout for unambiguous contexts to disambiguate the various options. One context in which su can be securely analyzed as [p.interrog] is before źig. In this context it is also possible to specify źig as [d.indef].
Rule: If the syllable su with a possible tag [p.interrog] is followed by źig with a possible tag [d.indef], then select the tag [p.interrog] for su and [d.indef] for źig.
Background: In general the clitic tsam only appears after nominals. However, it appears that in some cases it may come after a verb stem and immediately before na 'if', which in this meaning we normally tag as [cv.loc], gźugs tsam na 'when sitting'. A following rule would specify na as a case marker after tsam because tsam has a tag [d.xxx]. We do not wish to bar this rule, as it would add ambiguity and often will be correct, so instead we specify that in the pattern [v.xxx] tsam na the word na must be tagged [cv.loc].
Rule: In the pattern [v.xxx] tsam na, when the word with verbal tags has no nominal tags, delete from na the tag [case.loc].
Background: When the element to the left of a syllable that can be either a case marker or converb is unambiguously part of a noun phrase, interpretation of the syllable as a converb can be excluded. This rule must be implemented in three stages. In the first stage, converbial interpretations are excluded after elements of noun phrases in general. However, because de [d.dem] and cig, źig, śig [d.indef] are not yet distinguished from the homophonous de [cv.sem] and cig, źig, śig [cv.imp], it is not possible to locate case markers after them using a search for the tags [d.dem] and [cv.sem]. Instead, a second stage of the rule takes aim at the phonological material of these morphemes, paying no attention to their interpretation. This strategy is safe, because combinations such as de la or cig gi are securely interpretable respectively as the demonstrative in the allative case and an indefinite marker in the genitive case. We must introduce one caveat: in a passage like ད་ རེ་ ཅིག་ བུག་སྒོ་ ཤིག་ ལ་ ... གྱི་ གླེང་མོ་ བྱེད་པ་ ལ་ ཤོག་ ཅིག་ the śig is the imperative of the verb jig 'destroy' and la is the allative converb coordinating two imperatives. Because of such a case, this rule cannot apply to śig; instead it must be handled by a more specific rule. This rule also does not apply to verbal nouns, because we appear to allow the locative converb in the meaning 'when' after verbal nouns. 
Rule: If homophonous [case.xxx]/[cv.xxx] is preceded by the ambiguous word de, cig, or źig, or by a word with an unambiguous tag [adj], [d.xxx], [n.count], [n.mass], [n.prop], [n.rel], [n.invar], [num.xxx], [adv.proclausal], or [p.xxx], then delete the [cv] tag.
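The two triggers of this rule (the surface forms de, cig, źig versus an unambiguous noun-phrase tag) can be separated into a small predicate. This is a sketch under assumptions: tokens are (form, tags) pairs, names are hypothetical, and "unambiguous" means exactly one remaining tag.

```python
FORM_TRIGGERS = {"de", "cig", "źig"}  # matched on form alone; śig is excluded
NP_TAGS = {"adj", "n.count", "n.mass", "n.prop", "n.rel", "n.invar",
           "adv.proclausal"}
NP_PREFIXES = ("d.", "num.", "p.")

def closes_noun_phrase(form, tags):
    """True if the rule treats this word as the right edge of a noun phrase."""
    if form in FORM_TRIGGERS:
        return True
    if len(tags) != 1:
        return False
    (tag,) = tags
    return tag in NP_TAGS or tag.startswith(NP_PREFIXES)

def prune_converbs_after_np(tokens):
    """Delete [cv.xxx] from case/converb homophones after noun-phrase material."""
    out = [(form, set(tags)) for form, tags in tokens]
    for i in range(1, len(out)):
        tags = out[i][1]
        converbs = {t for t in tags if t.startswith("cv.")}
        has_case = any(t.startswith("case.") for t in tags)
        if converbs and has_case and closes_noun_phrase(*out[i - 1]):
            tags.difference_update(converbs)
    return out
```

Since de and cig themselves carry a [cv.xxx] tag but no [case.xxx] tag, they are never pruned by this function, only used as context.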
Background: In a passage like ད་ རེ་ ཅིག་ བུག་སྒོ་ ཤིག་ ལ་ ... གྱི་ གླེང་མོ་ བྱེད་པ་ ལ་ ཤོག་ ཅིག་ the śig is the imperative of the verb jig 'destroy' and la is the allative converb coordinating two imperatives. Because of such a case, we must permit la to remain ambiguous after śig. The associative converb may also occur after imperatives. All other ambiguous case/converbs can be specified as case markers after śig.
Rule: If homophonous [case.xxx]/[cv.xxx] (other than la or daṅ) is preceded by śig, then delete the [cv] tag.
Background: When the element to the left of a syllable that can be either a case marker or converb is unambiguously part of a noun phrase, interpretation of the syllable as a converb can be excluded. This rule must be implemented in three stages. In the first stage, converbial interpretations are excluded after elements of noun phrases in general. However, because de [d.dem] and cig, źig, śig [d.indef] are not yet distinguished from the homophonous de [cv.sem] and cig, źig, śig [cv.imp], it is not possible to locate case markers after them using a search for the tags [d.dem] and [cv.sem]. Instead, a second stage of the rule takes aim at the phonological material of these morphemes, paying no attention to their interpretation. This strategy is safe, because combinations such as de la or cig gi are securely interpretable respectively as the demonstrative in the allative case and an indefinite marker in the genitive case. Because we permit [cv.loc] after verbal nouns, the most general form of this rule must allow converbs after verbal nouns. Consequently, a second rule narrows in specifically on verbal nouns followed by converbs other than [cv.loc]. 
Rule: If homophonous [case.xxx]/[cv.xxx], which is not na [case.loc]/[cv.loc], is preceded by a word with an unambiguous tag [n.v.xxx], then delete the [cv] tag.
Background: Some syllables are interpretable both as verb stems and as nouns. For example, the syllable sems can either be the noun 'mind' or the present stem of the verb 'to think'. Because the genitive links two nouns, after an unambiguous genitive sems can be specified as a noun. The rule applies generally to words that are ambiguous between verbal and nominal interpretations. However, because verbs are frequent (if not required) before [cv.impf], [cv.cont] and [cv.sem], the rule is not applied before syllables with these interpretations. Also, the verb rigs regularly takes the genitive, so the rule cannot be applied to rigs.
Rule: If a word (other than rigs) that has two possible tags [n.count] and [v.xxx] is preceded by a word with an unambiguous [case.gen] tag, and the word in question is not followed by a word that has any potential tags [cv.impf], [cv.cont] or [cv.sem], then delete all [v.xxx] tags from the word in question.
Background: The preceding rule identifies a few further nouns as unambiguous. These identifications permit a few further case/converb markers to be specified as case markers.
Rule: If homophonous [case.xxx]/[cv.xxx] is preceded by [n.count], [n.mass], [n.rel], [num.xxx], or [p.xxx], then delete the [cv] tag.
Background: In our corpus there are cases when the genitive gi is incorrectly written as gis. Thus, the system is aware that [case.gen] is a possible tag for gis. Although it is good for the system to be aware of this possibility, there is a danger of over applying it. In particular, because the genitive connects two noun phrases, if gis is not followed directly by a noun phrase, then it cannot be a misspelling for the genitive.
Rule: If the syllable gis is not directly followed by a word with a possible [n.xxx] tag, then remove from gis the tag [case.gen].
Background: Now we turn from isolating secure instances of [case.xxx], to isolating secure instances of [cv.xxx]. After unambiguous verb stems, morphemes that are ambiguously case markers or converbs (other than the genitive) can be specified as converbs.
Rule: If a word with the hypothesized tags [case.xxx] ~ [cv.xxx] directly follows a word that is only tagged with [v.xxx] then the tag [case.xxx] can be removed, n.b. except that we do not automatically remove [case.gen], because it is permitted after verb stems.
Background: Our analysis only permits monosyllabic verb stems, and only allows converbs after verb stems (except the semi-final and final converbs, which can occur after anything). Consequently, the interpretation of a syllable as a converb can be precluded if it follows a word that is not a monosyllable. (A non-monosyllable could be defined as containing more than one tsheg). (This rule is useful in being able to apply to words of unknown POS category). (An exception must be written in for na, because we tag it as a converb after verbal nouns when it means 'if' or 'when' rather than 'in'.)
Rule: If a word with the hypothesized tags [case.xxx] ~ [cv.xxx] other than na directly follows a word that is more than one syllable long, remove the [cv.xxx] tags.
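The tsheg-counting definition suggested above lends itself to a short sketch. It assumes tokens are (form, tags) pairs with forms in Tibetan script and a trailing tsheg on each syllable; the function names are hypothetical, and words written without their final tsheg would need extra handling that is not shown.

```python
TSHEG = "\u0f0b"  # Tibetan syllable delimiter (tsheg)
NA = "\u0f53"     # the letter na, exempt from the rule

def is_polysyllabic(form):
    """A word counts as a non-monosyllable if it contains more than one tsheg."""
    return form.count(TSHEG) > 1

def prune_converbs_after_polysyllable(tokens):
    """Delete [cv.xxx] from case/converb homophones after polysyllabic words."""
    out = [(form, set(tags)) for form, tags in tokens]
    for i in range(1, len(out)):
        form, tags = out[i]
        if form.rstrip(TSHEG) == NA:
            continue  # na may be a converb after verbal nouns
        converbs = {t for t in tags if t.startswith("cv.")}
        has_case = any(t.startswith("case.") for t in tags)
        if converbs and has_case and is_polysyllabic(out[i - 1][0]):
            tags.difference_update(converbs)
    return out
```

Because the test looks only at the written form of the preceding word, it also applies when that word's part of speech is unknown, which is the advantage noted in the background.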
Background: Our analysis only permits monosyllabic verb stems, and only allows converbs after verb stems (except the semi-final and final converbs, which can occur after anything). Consequently, the interpretation of a syllable as a converb can be precluded if it follows a word that is not a monosyllable. (A non-monosyllable could be defined as containing more than one tsheg). 
Rule: If a word with the hypothesized tags [d.indef] ~ [cv.imp] follows a word that is more than one syllable long, remove the [cv.imp] tag.
Background: As a perhaps unnecessary component of our part-of-speech protocols, when a morphological verbal noun refers to something real in the world we tag it as a noun (mkhas-pa 'wise guy', ston-pa 'teacher'). This convention leads to quite a few ambiguous situations for the rule based POS-tagger. If such a word is preceded by a terminative converb, then it must be a verbal noun. (A previous rule handled the case of ḥgro-ba, but was not able to handle a case like rgya che r ḥgrel-ba, because it is only now that [case.term] and [cv.term] have been generally distinguished). 
Rule: If a word has both [n.count] and [n.v.xxx] as possible tags and this word follows a word with the unambiguous tag [case.term] then delete from this word the tag [n.count].
Background: Certain verbs take la [case.all] as a normal part of their rection. These verbs include 'look at' (lta, bltas, blta, ltos), 'see' (gzigs), 'offer' (mchod), 'request' (gsol), 'give' (sbyin, byin), and 'benefit' (phan). If the syllable la occurs after a noun phrase and before one of these verbs, then this la is certainly not the noun 'mountain pass'.
Rule: If la follows [p.pers], [n.xxx], [d.xxx], or [num.xxx] and precedes one of the verb forms bltas, blta, ltos, gzigs, gsol, mchod, sbyin or byin, sñoms, ḥdud, ḥphur, ḥgro, byon, gnaṅ, reg, phan, or any of these verbs followed by the nominalizing -pa/-ba suffix (we leave out lta since it has several other POS categories), possibly with an intervening mi [neg] or ma [neg], then delete the tag [n.count] from la.
Background: In the preceding rules the unambiguous right edges of noun phrases and unambiguous verb stems to the left of the [case.xxx]/[cv.xxx] permitted disambiguation. An alternative approach is to look to the right of the [case.xxx]/[cv.xxx]. If to the right of an ambiguous [case.xxx]/[cv.xxx] is a verb which requires that particular case in its rection, then the sequence can be assigned the tag [case.xxx]. So far we have only one rule of this type. Etymologically the phrase la sogs-pa 'etc.' is a case marker followed by a verbal noun 'gathered at'. This analysis is clear in the Old Tibetan spelling la stsogs pa. In general our tack is to err in favor of etymologically faithful analyses, in the absence of compelling evidence to the contrary. Consequently, the la in the phrase la sogs-pa can be specified as a case marker.
Rule: If la is followed by sogs-pa then assign [case.all] to la (i.e. remove other possible tags, [cv.all] and [n.count]).
Background: Certain verbs take daṅ [case.ass] as a normal part of their rection. These verbs include 'equipped with' (bcas) and 'accord with' (mthun).  If the syllable daṅ occurs before one of these verbs, then this daṅ is certainly the case marker [case.ass]. Thus, for example, in the phrase ḥkhor daṅ bcas-pa 'together with his retinue', because of bcas-pa the daṅ may be specified as [case.ass] as opposed to [cv.ass].
Rule: If daṅ occurs before bcas [v.xxx], mthun [v.xxx], ldan [v.xxx], mtshuns, sbyar or their nominalized equivalents then delete [cv.ass] from daṅ.
Background: Certain verbs take daṅ [case.ass] as a normal part of their rection. These verbs include 'equipped with' (bcas) and 'accord with' (mthun).  If the syllable daṅ occurs before one of these verbs, then this daṅ is certainly the case marker [case.ass]. Thus, for example, in the phrase ḥkhor daṅ bcas-pa 'together with his retinue', because of bcas-pa the daṅ may be specified as [case.ass] as opposed to [cv.ass].
Rule: If daṅ occurs before bcas [v.xxx], mthun [v.xxx], ldan [v.xxx], mtshuns or their nominalized equivalents then delete [cv.ass] from daṅ.
Background: The previous rule identifies daṅ [case.ass] when it occurs as a normal part of the rection of certain verbs. If daṅ has been specified as [case.ass], then the word before daṅ must be some type of nominal. Consequently, if such a word has both verbal and nominal tags, the verbal tags can be excluded in this context. Thus, for example, in the phrase ḥkhor daṅ bcas-pa 'together with his retinue', the previous rule specified daṅ as [case.ass] because of bcas-pa, and this rule specifies ḥkhor as [n.count] rather than [v.xxx], because of daṅ [case.ass].
Rule: If a word has both [v.xxx] and [n.xxx] tags, and the word is followed by a daṅ that only has the interpretation [case.ass], then delete the [v.xxx] tags from this word.
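The dependency between the rection rule for daṅ and this follow-up rule can be sketched as two ordered passes. The (form, tags) token layout, the function name, and the way nominalized forms are spelled (stem plus -pa or -ba) are assumptions for illustration; the tag check on the governing verb is omitted for brevity.

```python
STEMS = ("bcas", "mthun", "ldan", "mtshuns", "sbyar")
# Verb stems plus their assumed nominalized spellings, e.g. bcas-pa.
GOVERNING_FORMS = {s + suffix for s in STEMS for suffix in ("", "-pa", "-ba")}

def disambiguate_dang(tokens):
    """Pass 1 fixes daṅ before governing verbs; pass 2 fixes the word before daṅ."""
    out = [(form, set(tags)) for form, tags in tokens]
    # Pass 1: daṅ before a verb that governs the associative is not a converb.
    for i in range(len(out) - 1):
        if out[i][0] == "daṅ" and out[i + 1][0] in GOVERNING_FORMS:
            out[i][1].discard("cv.ass")
    # Pass 2: a noun/verb ambiguous word before unambiguous daṅ [case.ass] is a noun.
    for i in range(len(out) - 1):
        tags = out[i][1]
        if out[i + 1][0] == "daṅ" and out[i + 1][1] == {"case.ass"}:
            verbal = {t for t in tags if t.startswith("v.")}
            if verbal and any(t.startswith("n.") for t in tags):
                tags.difference_update(verbal)
    return out

phrase = [
    ("ḥkhor", {"n.count", "v.pres"}),
    ("daṅ", {"case.ass", "cv.ass"}),
    ("bcas-pa", {"n.v.past"}),
]
resolved = disambiguate_dang(phrase)
```

Running the passes in the opposite order would leave ḥkhor ambiguous in this example, which is why rule ordering matters here.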
Background: The syllable daṅ can be both the associative case marker and the associative converb. Although the quotative clitic [cl.quot] may be analyzed as a verb, because daṅ always means 'and' after it, daṅ is best tagged as [case.ass] here. 
Rule: If daṅ occurs after a word that has only the tag [cl.quot] then tag daṅ as [case.ass].
Background: Especially in poetry Tibetan allows for the nominalization of verbs without a suffix. It would be imprudent for us to introduce a new tag for the verbs in such cases, so we leave them as finite verbs. However, such cases suggest the interpretation of daṅ as [cv.ass], but since the forms are nominalized daṅ should be tagged as [case.ass]. To allow [case.ass] after all finite verbs is too high a price to pay. Consequently, despite its unattractiveness as a solution we simply specify daṅ as [case.ass] in those particular passages that occur in our corpus. Perhaps in the future, when sufficient examples of this type are gathered, it will be possible to replace these passages with more abstract formulations.
Rule: In the pattern ང་ ཆེ་ དང་ ང་ མཁས་, tag དང་ as [case.ass].
Background: Especially in poetry Tibetan allows for the nominalization of verbs without a suffix. It would be imprudent for us to introduce a new tag for the verbs in such cases, so we leave them as finite verbs. However, such cases suggest the interpretation of daṅ as [cv.ass], but since the forms are nominalized daṅ should be tagged as [case.ass]. To allow [case.ass] after all finite verbs is too high a price to pay. Consequently, despite its unattractiveness as a solution we simply specify daṅ as [case.ass] in those particular passages that occur in our corpus. Perhaps in the future, when sufficient examples of this type are gathered, it will be possible to replace these passages with more abstract formulations.
Rule: In the pattern གཟུང་ དང་ འཛིན་པ, tag དང་ as [case.ass].
Background: Certain verbs take las [case.abl] as a normal part of their rection. These verbs include 'fall from' (babs), 'arise from' (byuṅ, ḥbyuṅ), 'arise, stand' (laṅs), 'be free from' (thar), 'emit from' (ḥphros), and 'pass from' (ḥdas). If the syllable las occurs before one of these verbs, then this las is certainly the case marker [case.abl]. Thus, for example, in the phrase mya-ṅan las ḥdas-pa 'pass from sorrow', because of ḥdas-pa the las may be specified as [case.abl].
Rule: If las occurs after a word with an unambiguous interpretation [n.count] and before babs [v.xxx], byuṅ [v.xxx], ḥbyuṅ [v.xxx], laṅs [v.xxx], thar [v.xxx], ḥphros [v.xxx], ḥdas [v.xxx], or babs-pa [n.v.xxx], byuṅ-ba [n.v.xxx], ḥbyuṅ-ba [n.v.xxx], laṅs-pa [n.v.xxx], thar-ba [n.v.xxx], ḥphros-pa [n.v.xxx], ḥdas-pa [n.v.xxx], then assign [case.abl] to las. (We say 'assign [case.abl]' rather than 'delete [cv.abl]' because the point is to preclude las [n.count], [v.imp], and [v.invar].)
Background: The syllable kha is ambiguous between a noun 'mouth' and a relator noun 'on the point of'. In general the pattern [case.gen] kha r should pick out the relator noun, but it is of course possible to say 'in the mouth of'. Nonetheless, the nouns with physical mouths that are discussed in literature form a rather restricted set, so we write in exceptions based on the occurrences in our corpus.
Rule: In the combinations chu-srin gyi kha r and za ḥdod paḥi kha r remove the tag [n.rel] from kha.
Background: The syllable kha is ambiguous between a noun 'mouth' and a relator noun 'on the point of'. In general the pattern [case.gen] kha r should pick out the relator noun, but it is of course possible to say 'in the mouth of'. Nonetheless, the nouns with physical mouths that are discussed in literature form a rather restricted set, so we write in exceptions based on the occurrences in our corpus.
Rule: In the combination kha s blaṅs-(pa) remove the tag [n.rel] from kha.
Background: The syllable kha is ambiguous between a noun 'mouth' and a relator noun 'on the point of'. In general the pattern [case.gen] kha la should pick out the relator noun, but it is of course possible to say 'in the mouth of'. Nonetheless, the nouns with physical mouths that are discussed in literature form a rather restricted set, so we write in exceptions based on the occurrences in our corpus.
Rule: In the combination kha la ñan-(pa) remove the tag [n.rel] from kha.
Background: The syllable kha is ambiguous between a noun 'mouth' and a relator noun 'on the point of'. In general the pattern [case.gen] kha la should pick out the relator noun, but it is of course possible to say 'in the mouth of'. Nonetheless, the nouns with physical mouths that are discussed in literature form a rather restricted set, so we write in exceptions based on the occurrences in our corpus.
Rule: In the combination kha la gtad-(pa) remove the tag [n.rel] from kha.
Background: The syllable kha can be the noun 'mouth', a relator noun, or an indefinite pronoun 'someone'. The indefinite pronoun will always occur before either cig or śas. At the beginning of the sentence as an agent kha cig is unambiguously 'someone'. 
Rule: In the sequence ། ཁ་ ཅིག་ གིས་ tag ཁ་ as [p.indef].
Background: The syllable kha can be the noun 'mouth', a relator noun, or an indefinite pronoun 'someone'. The indefinite pronoun will always occur before either cig or śas. Consequently, the interpretation of kha as an indefinite pronoun can be precluded in other contexts.
Rule: If the syllable kha is not followed by cig or śas, then remove from kha the tag [p.indef].
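Illustration (not part of the rule set): a minimal Python sketch of the preceding rule. The token model, a dict with a transliterated "form" and a set of candidate "tags", and the function name are assumptions made for this example only, and the tag sets in the sample data are invented.

```python
# Hypothetical token model: {"form": str, "tags": set of candidate tags}.
def prune_kha_indef(tokens):
    """Drop [p.indef] from kha unless the next word is cig or śas."""
    for i, tok in enumerate(tokens):
        if tok["form"] != "kha":
            continue
        next_form = tokens[i + 1]["form"] if i + 1 < len(tokens) else None
        if next_form not in ("cig", "śas"):
            tok["tags"].discard("p.indef")
    return tokens

# Invented sample data: kha before la loses [p.indef], kha before cig keeps it.
kha_la = prune_kha_indef([
    {"form": "kha", "tags": {"n.count", "n.rel", "p.indef"}},
    {"form": "la", "tags": {"case.all"}},
])
kha_cig = prune_kha_indef([
    {"form": "kha", "tags": {"n.count", "n.rel", "p.indef"}},
    {"form": "cig", "tags": {"d.indef"}},
])
```

The sketch only deletes a reading and never assigns one, which mirrors the wording of the rule.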
Background: The syllable kha can occur as an alternate spelling of ka, the determiner seen in de ka 'that very' and gsum ka 'threesome'. This interpretation of kha is not possible in clause-initial position.
Rule: If the syllable kha appears after śad or a tsheg-less ga, then remove from kha the interpretation [d.det].
Background: The syllable kho occurs in our corpus in place of kho-na [d.emph]. This is strange usage, but it is attested, so we must live with it. It has the unfortunate consequence of adding [d.emph] as a possible interpretation to any kho, although almost always this syllable should be tagged [p.pers]. Because kho-na (as [d.emph] rather than the Old Tibetan third person pronoun) only occurs at the end of noun phrases, it cannot occur at the beginning of a clause. This context allows us to preclude [d.emph] as the interpretation of kho in clause initial position.
Rule: Immediately following a śad or a tsheg-less ga remove from kho the analysis [d.emph].
Background: The orthographic form tshe is either the noun 'life' or the relator noun 'when'. After the personal pronoun bdag in the genitive (bdag gi tshe) the word tshe is certainly the noun 'life'.
Rule: In the pattern བདག་ གི་ ཚེ་ assign the tag [n.count] to ཚེ་.
Background: The orthographic form tshe is either the noun 'life' or the relator noun 'when'. After a verbal noun in the genitive case tshe is certainly the relator noun.
Rule: In the pattern [n.v.xxx] [case.gen] ཚེ་ assign the tag [n.rel] to ཚེ་.
Background: The orthographic form tshe is either the noun 'life' or the relator noun 'when'. After a verb stem tshe is certainly the relator noun.
Rule: In the pattern [v.xxx]  ཚེ་ assign the tag [n.rel] to ཚེ་. (Note: a tsheg must precede the tshe to preclude the possibility that the verb and the tshe belong to different sentences).
Background: The orthographic form tshe is either the noun 'life' or the relator noun 'when'. Before demonstratives [d.dem] tshe is almost certainly the noun 'life'. The example of this word leads to the speculation that other words that have both [n.rel] and [n.count] tags will be [n.count] before [d.dem]. There are, however, some necessary caveats. 1. In a phrase like de bźin ḥdi skad du, the pattern [n.rel] [d.dem] occurs, but this is only permissible because the [d.dem] is itself followed by a [n.rel]. 2. Phrases such as de ḥi tshe de ḥi dus na 'at that very time' and de ḥi tshe de ḥi chung-ma 'at that time that girl' show that the rule should not be applied if tshe is preceded by de ḥi.
Rule: If a word 'w' has the two possible tags [n.count] and [n.rel] and this word is followed by [d.dem], then delete the tag [n.rel] from 'w', unless the [d.dem] is followed by [n.rel] or 'w' is preceded by de ḥi.
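Illustration (not part of the rule set): a minimal Python sketch of the preceding rule and its two exceptions. The token model (a dict with "form" and "tags") and the function name are assumptions for this example. The sketch also assumes that "followed by [d.dem]" means [d.dem] is among the possible tags of the next word, which the rule does not state explicitly.

```python
# Hypothetical token model: {"form": str, "tags": set of candidate tags}.
def prune_rel_before_dem(tokens):
    """Delete [n.rel] from an [n.count]/[n.rel] word that stands before [d.dem]."""
    for i, tok in enumerate(tokens):
        if not {"n.count", "n.rel"} <= tok["tags"]:
            continue
        if i + 1 >= len(tokens) or "d.dem" not in tokens[i + 1]["tags"]:
            continue
        # Exception 1: the demonstrative is itself followed by a relator noun.
        dem_then_rel = i + 2 < len(tokens) and "n.rel" in tokens[i + 2]["tags"]
        # Exception 2: the word is preceded by de ḥi.
        after_de_hi = (i >= 2 and tokens[i - 2]["form"] == "de"
                       and tokens[i - 1]["form"] == "ḥi")
        if not (dem_then_rel or after_de_hi):
            tok["tags"].discard("n.rel")
    return tokens

plain = prune_rel_before_dem([
    {"form": "tshe", "tags": {"n.count", "n.rel"}},
    {"form": "ḥdi", "tags": {"d.dem"}},
])
guarded = prune_rel_before_dem([
    {"form": "de", "tags": {"d.dem"}},
    {"form": "ḥi", "tags": {"case.gen"}},
    {"form": "tshe", "tags": {"n.count", "n.rel"}},
    {"form": "de", "tags": {"d.dem"}},
    {"form": "ḥi", "tags": {"case.gen"}},
    {"form": "dus", "tags": {"n.count", "n.rel"}},
    {"form": "na", "tags": {"case.loc"}},
])
```

In the second sample, de ḥi tshe de ḥi dus na, the tshe keeps both readings because it is preceded by de ḥi.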
Background: The orthographic form tshe is either the noun 'life' or the relator noun 'when'. Before the verb ḥphos 'transfer' the word tshe is certainly the noun 'life' as part of the phrase tshe ḥphos 'die'. 
Rule: In the pattern ཚེ་ འཕོས་(པ(་)) assign the tag [n.count] to ཚེ་. 
Background: The orthographic form tshe is either the noun 'life' or the relator noun 'when'. In the phrase tshe daṅ ldan-pa 'venerable' the word tshe is unambiguously 'life'. 
Rule: In the pattern ཚེ་ དང་ ལྡན་པ(་) assign the tag [n.count] to ཚེ་.
Background: The orthographic form dus is either the noun 'time' or the relator noun 'when'. In the phrase dus gsum 'the three times' the word dus is unambiguously 'time'.
Rule: In the pattern དུས་ གསུམ་ assign the tag [n.count] to དུས་.
Background: Garrett et al. (forthcoming) define a relator noun as having “a genitive before it and a spatial case (allative, locative, terminative) after it” (CITATION). The tagger may consequently use the same syntactic frame to confidently isolate relator nouns. However, care must be taken in a few cases. In particular, naṅ can be 'home' [n.count]; in the Mi la ras paḥi rnam thar there occurs the phrase rgan-mo gcig gi naṅ du gnas 'I stayed in the house of one old woman'. If tagged as [n.rel], the passage would mean 'I stayed inside of one old woman'. To back off the rule from all cases of naṅ would mean that the rule loses much of its value. Instead, using a 'case law' approach, exceptions should be written into the rule until a coherent category of exceptions emerges. Based on a false positive in the corpus (ཏམ་བུ་རི འི་ ཁོང་རྒྱུད་ ཀྱི་ སྐད་ ལ་), do not apply this rule to the word skad. (In fact, it seems likely that skad should not be tagged as a relator noun, but it is too late to second guess this idea for ḥdi skad ces.)
Rule: If a word has two possible tags [n.count] and [n.rel] and it occurs after a possible [case.gen] and before a [case.term], [case.loc], or [case.all], then delete the tag [n.count]. Based on a false positive in the corpus (རྒན་མོ་ གཅིག་ གི་ ནང་ དུ་ གནས་), do not apply this rule to the word ནང་ when it occurs after རྒན་མོ་ གཅིག་ གི་ (or རྒན་མོ འི་) or before དུ་ གནས་. Based on a false positive in the corpus (ཏམ་བུ་རི འི་ ཁོང་རྒྱུད་ ཀྱི་ སྐད་ ལ་), do not apply this rule to the word skad.
Background: The sequence yoṅs su is employed in neologisms to calque the Sanskrit verb prefix pari-. In these fixed locutions the interpretation can be specified as [d.plural].
Rule: When the syllable yoṅs occurs in the sequences ཡོངས་ སུ་ (མ་) དག་(པ་), ཡོངས་ སུ་ ཤེས་(པ་), ཡོངས་ སུ་ རྫོགས་(པ་), ཡོངས་ སུ་ བྱང་(བ་), ཡོངས་ སུ་ བསྡུས་(པ་), ཡོངས་ སུ་ གྲགས་(པ་), ཡོངས་ སུ་ ཚོལ་(བ་), ཡོངས་ སུ་ སྦྱོང་(བ་), ཡོངས་ སུ་ བརྗོད་(པ་), ཡོངས་ སུ་ འཛིན་(པ་), ཡོངས་ སུ་ བཟུང་, ཡོངས་ སུ་ གདུང་(བ་), ཡོངས་ སུ་ (མི་) ཉམས་(པ་), ཡོངས་ སུ་ (མི་) གཏོང་(བ་) tag it as [d.plural].  
Background: The sequence rjes su is employed in neologisms to calque the Sanskrit verb prefix anu-. Although in general it makes little sense for a relator noun to occur clause initially, since we regard the Sanskritic use of rjes su as [n.rel], in these fixed locutions the interpretation can be specified regardless of the placement of the sequence in a sentence.
Rule: When the syllable rjes occurs in the sequences rjes su yi raṅ (ba), rjes su yi raṅs, rjes su ḥdzin (pa), rjes su mthun (pa), rjes su bstun (pa), rjes su chags (pa), or rjes su rtogs (pa), then tag rjes as [n.rel].
Background: The sequence khoṅ du is employed in the phrasal verb khoṅ du chud 'understand'. Although in general it makes little sense for a relator noun to occur clause initially, since we regard this use of khoṅ du as [n.rel], in this fixed locution the interpretation can be specified regardless of the placement of the sequence in a sentence.
Rule: When the syllables khoṅ du occur in the sequence khoṅ du chud (pa), then tag khoṅ as [n.rel].
Background: Relator nouns relate a constituent on the right to a constituent on the left. Consequently, if there is no constituent to the left of a word it is unlikely that this word is a relator noun.
Rule: If a word has two possible tags [n.count] and [n.rel] and it occurs after a śad (or a -g not followed by a tsheg), then delete the tag [n.rel].
Background: The syllable raṅ is analyzable both as a reflexive pronoun (ṅed raṅ 'we ourselves', khyed raṅ 'you yourselves', a-ma na-re raṅ gi nor la 'Mother said “for one's own wealth...”') and as a determiner (źe-sdaṅ chen-po raṅ cig 'a very great antipathy'). After a personal pronoun the determiner use can be excluded.
Rule: If the syllable raṅ occurs after a word with the tag [p.pers], then delete from raṅ the analysis [d.det].
Background: It is very common in Tibetan texts that the first time a protagonist is introduced by name, his name will appear before źes bya-ba; this fact allows words of unknown meaning to be interpreted as names in this context. (The tag [n.invar] will have been assigned to any unknown words as a default, so the rule applies to words tagged [n.invar].)
Rule: If a word with only the tag [n.invar] immediately precedes źes [cl.quot] bya-ba [n.v.fut] then assign this word the tag [n.prop].
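Illustration (not part of the rule set): a minimal Python sketch of the preceding rule. The token model (a dict with "form" and "tags"), the function name, and the placeholder form NAME are assumptions for this example. The sketch treats [cl.quot] and [n.v.fut] as possible tags of źes and bya-ba; if the rule is meant to require unambiguous tags, the two membership tests would become equality tests.

```python
# Hypothetical token model: {"form": str, "tags": set of candidate tags}.
def tag_names_before_zhes_bya_ba(tokens):
    """Retag an [n.invar]-only word as [n.prop] when it precedes źes bya-ba."""
    for i in range(len(tokens) - 2):
        word, quot, verb = tokens[i], tokens[i + 1], tokens[i + 2]
        if (word["tags"] == {"n.invar"}
                and quot["form"] == "źes" and "cl.quot" in quot["tags"]
                and verb["form"] == "bya-ba" and "n.v.fut" in verb["tags"]):
            word["tags"] = {"n.prop"}
    return tokens

# NAME stands in for any out-of-vocabulary word.
named = tag_names_before_zhes_bya_ba([
    {"form": "NAME", "tags": {"n.invar"}},
    {"form": "źes", "tags": {"cl.quot"}},
    {"form": "bya-ba", "tags": {"n.v.fut"}},
])
known = tag_names_before_zhes_bya_ba([
    {"form": "NAME", "tags": {"n.invar", "n.count"}},
    {"form": "źes", "tags": {"cl.quot"}},
    {"form": "bya-ba", "tags": {"n.v.fut"}},
])
```

A word that carries any tag besides [n.invar] is left alone, since the rule applies only to words with [n.invar] as their sole tag.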
Background: When a character is given a name in a Tibetan text this will normally be introduced with the pattern miṅ XXX [case.term] btags / 'give the name XXX'. In the honorific register mtshan will be used instead of miṅ in this pattern. (The tag [n.invar] will have been assigned to any unknown words as a default, so the rule applies to words tagged [n.invar].)
Rule: If a word with only the tag [n.invar] appears in the following pattern མིང་/ མཚན་ X [case.term] བཏགས ། then assign this word the tag [n.prop].
Background: An unknown word appearing before the phrase źes su grags 'famed as' is probably a name, or at least a nickname. (The tag [n.invar] will have been assigned to any unknown words as a default, so the rule applies to words tagged [n.invar].) 
Rule: If a word with only the tag [n.invar] appears before ཞེས་སུ་གྲགས་, then assign this word the tag [n.prop].
Background: In Canonical Tibetan a limited number of verbs occur as auxiliary verbs (nus 'be able', dgos 'need', śes 'know', ran 'be time for', srid 'be possible'). These auxiliaries come directly after the main verb of a clause, except for the possible interposition of a negation marker. This distribution allows these auxiliaries to be easily identified. It is important to isolate auxiliaries before running the tense disambiguation rules, because otherwise auxiliaries would have to be written in as exceptions to some of these rules.
Rule: If a word with the possible analysis [v.aux] or [n.v.aux] either (1) follows a word that only has verb stem analyses, i.e. [v.xxx], or (2) follows a sequence of such a word and a negation prefix, i.e. [v.xxx] [neg], then retain [v.aux] or [n.v.aux] as the only possible [v.xxx] or [n.v.xxx] analysis for the word.
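Illustration (not part of the rule set): a minimal Python sketch of the preceding rule. The token model (a dict with "form" and "tags") and the function names are assumptions for this example, and the tag sets in the sample data are invented rather than taken from the lexicon. The sketch keeps any non-verbal readings of the auxiliary, on the assumption that the rule only restricts the [v.xxx] and [n.v.xxx] analyses.

```python
# Hypothetical token model: {"form": str, "tags": set of candidate tags}.
def is_verbal(tag):
    return tag.startswith("v.") or tag.startswith("n.v.")

def keep_aux_after_verb(tokens):
    """After a verb-only word (plus optional negation), keep only the aux reading."""
    for i, tok in enumerate(tokens):
        aux = tok["tags"] & {"v.aux", "n.v.aux"}
        if not aux or i == 0:
            continue
        j = i - 1
        if j > 0 and tokens[j]["tags"] == {"neg"}:
            j -= 1  # skip one interposed negation prefix
        prev_tags = tokens[j]["tags"]
        if prev_tags and all(t.startswith("v.") for t in prev_tags):
            tok["tags"] = {t for t in tok["tags"] if not is_verbal(t)} | aux
    return tokens

direct = keep_aux_after_verb([
    {"form": "byed", "tags": {"v.pres"}},
    {"form": "nus", "tags": {"v.aux", "v.past", "n.count"}},
])
negated = keep_aux_after_verb([
    {"form": "byed", "tags": {"v.pres"}},
    {"form": "mi", "tags": {"neg"}},
    {"form": "nus", "tags": {"v.aux", "v.past"}},
])
after_noun = keep_aux_after_verb([
    {"form": "mi", "tags": {"n.count"}},
    {"form": "nus", "tags": {"v.aux", "v.past"}},
])
```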
Background: In Canonical Tibetan a limited number of verbs occur as auxiliary verbs (nus 'be able', dgos 'need', śes 'know', ran 'be time for', srid 'be possible'). These auxiliaries come directly after the main verb of a clause, except for the possible interposition of a negation marker. This distribution allows these auxiliaries to be easily identified. It is important to isolate auxiliaries before running the tense disambiguation rules, because otherwise auxiliaries would have to be written in as exceptions to some of these rules.
Rule: If a word with the possible analysis [v.aux] or [n.v.aux] follows a word that has no verb stem analyses [v.xxx] and no interpretation as negation [neg] or as a focus clitic [cl.focus], then delete [v.aux] or [n.v.aux] from the word in question.
Background: If an ambiguous imperative verb stem occurs before an ambiguous imperative converb (e.g. gśegs śig 'go!'), then the analysis as an imperative verb stem and an imperative converb is secure. Two possible exceptions occur. 1. The imperative converb follows a negated present in the prohibitive (e.g. ma gśegs śig). Consequently, the rule must stipulate that a ma does not precede the ambiguous verb stem. 2. If the imperative verb stem can also be a noun, since the imperative converb can also be an indefinite determiner the phrase is ambiguous (e.g. gnas śig 'stay!' or 'a place'). However, this exception need not be a cause for concern so long as the rule only removes hypothesized tenses other than the imperative, rather than stipulating interpretation as the imperative.
Rule: If a word with the hypothesized part-of-speech-tag [v.imp] is followed by cig, źig, or śig (and is not preceded by ma) then delete all other hypothesized [v.xxx] tags.
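Illustration (not part of the rule set): a minimal Python sketch of the preceding rule. The token model (a dict with "form" and "tags") and the function name are assumptions for this example, and the tag sets in the sample data are invented. As the Background requires, the sketch only removes competing [v.xxx] tags and leaves nominal readings in place.

```python
# Hypothetical token model: {"form": str, "tags": set of candidate tags}.
IMPERATIVE_CONVERBS = {"cig", "źig", "śig"}

def prefer_imperative(tokens):
    """Before cig/źig/śig and not after ma, drop every [v.xxx] tag except [v.imp]."""
    for i, tok in enumerate(tokens):
        if "v.imp" not in tok["tags"]:
            continue
        followed = i + 1 < len(tokens) and tokens[i + 1]["form"] in IMPERATIVE_CONVERBS
        after_ma = i > 0 and tokens[i - 1]["form"] == "ma"
        if followed and not after_ma:
            tok["tags"] = {t for t in tok["tags"]
                           if not t.startswith("v.") or t == "v.imp"}
    return tokens

go = prefer_imperative([
    {"form": "gśegs", "tags": {"v.fut", "v.past", "v.pres", "v.imp"}},
    {"form": "śig", "tags": {"cv.imp", "d.indef"}},
])
stay = prefer_imperative([
    {"form": "gnas", "tags": {"v.imp", "v.pres", "n.count"}},
    {"form": "śig", "tags": {"cv.imp", "d.indef"}},
])
prohibitive = prefer_imperative([
    {"form": "ma", "tags": {"neg"}},
    {"form": "gśegs", "tags": {"v.fut", "v.past", "v.pres", "v.imp"}},
    {"form": "śig", "tags": {"cv.imp", "d.indef"}},
])
```

For gnas śig the result keeps both [v.imp] and [n.count], so the 'stay!' versus 'a place' ambiguity is left for a later rule or the annotator.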
Background: The preceding rule used the imperative converb (źig, śig, cig) to isolate ambiguous verb forms as imperatives. Because, previous to the application of that rule, these verb stems were ambiguous, no previous rule has specified that źig is an imperative converb. Consequently, this rule must specify that after an unambiguous [v.imp] the syllable źig be specified as [cv.imp] and not [d.indef].
Rule: If a word with an unambiguous tag [v.imp] is followed by źig then delete the tag [d.indef] from źig.
Background: The imperative converb follows a negated present in the prohibitive (e.g. ma gśegs śig). Consequently, an ambiguous verb stem can be stipulated as present in this circumstance.  
Rule: If a word tagged with a hypothesized part-of-speech-tag [v.pres] is followed by cig, źig, or śig and is preceded by ma then delete all other hypothesized part-of-speech-tags. (Rule 39b has already stipulated that cig, źig, and śig are tagged as [cv.imp] in this context.)
Background: The imperative is generally not permitted before converbs, or other non-finite contexts (such as before kyaṅ). It is likely that further training data will prompt the inclusion of further contexts in which the imperative is impossible.
Rule: If a word has more than one [v.xxx] tag, including [v.imp], and the following word either has the form na, kyaṅ, yaṅ, nas, kyi, or has any of the tags [cv.cont], [cv.ela], [cv.fin], [cv.impf], [cv.loc], [cv.ques], [cv.sem], or [cv.term], then remove the tag [v.imp] from the word in question.
Background: The imperative is generally not permitted before converbs. Some tokens, such as zur and ris, have readings both as [v.imp] and as [n.count]. The occurrence of a word with [cv.term]/[case.term] as possible POS categories after these words precludes their interpretation as imperatives.
Rule: If a word has a tag [v.imp] among other possible tags and the following word has either the tag [case.term] or [cv.term] among its tags, then remove the tag [v.imp] from the word in question.
Background: The syllable źig has the three possible tags [v.past], [v.imp], and [d.indef]. The preceding rule would normally have precluded the interpretation as an imperative before certain grammatical markers, e.g. in the series mi źig tu; however, an earlier rule already specified that -tu in this context be interpreted as a case marker rather than a converbial marker, so the context specified for the preceding rule is not triggered. Consequently, the context that can be used to preclude the interpretation of źig as [v.imp] is [n.count] źig [case.xxx]. En passant, this context also precludes the interpretation of źig as [v.past].
Rule: In the pattern [n.count] źig [case.xxx], preclude from źig the analyses [v.imp] and [v.past].
Background: The imperative is a finite form, consequently the imperative (almost by definition) cannot occur before a modal auxiliary verb (nus, dgos etc.). Thus, ambiguous verb stems before auxiliary verbs can be specified as not imperative.
Rule: In the patterns [v1.xxx] (neg) [v2.xxx] (in which the third word is one of the words དགོས་, ནུས་, མོད་, དགོས, འདོད་, ཤེས་, སྲིད་, རན་, གྲགས་, ཐང་, ཕོད་, or དཀའ་) delete [v1.imp], if [v1.imp] is one of the [v1.xxx] tags.
Background: In our corpus unambiguous imperatives occur before certain verbs and verbal nouns (esp. those that are verba dicendi such as byed, byas, sgo-ba, zer-ba, etc.), but unambiguous imperatives do not occur before certain other verb forms, particularly byuṅ, soṅ, ḥdug, and gyur. It is frequently understood that a function of these words is to explicitly mark the tense of a preceding invariant verb as past, so it is perhaps not surprising that they do not co-occur with unambiguous imperatives. The presence of these words permits a preceding verb to be specified as not imperative. (The false positive ད་ཁྱོད་རང་བྲག་དཀར་ཏ་སོར་སོང་ལ་ཨེ་འདུག་ལྟོས་ཤོགའདུག་ན་རང་རེ་གཉིས་ཀ་ཕྲད་དུ་འགྲོ་ཟེར་བས་བདེན་ནམ་སྙམ། shows that an exception must be made for verbs that end in a tsheg-less -g.)
Rule: If a word has multiple verb tags, including [v.imp], and ends with a tsheg, and this word is directly followed by byuṅ, soṅ, ḥdug, or gyur or their nominalized equivalents, then delete from this word the tag [v.imp].
Background: The verb soṅ is either a past tense [v.past] or an imperative [v.imp]. In Middle and Modern Tibetan the past tense soṅ serves as an (evidential) auxiliary after past tense verbs. It is beyond the scope of our project to tag the auxiliary use of soṅ (as distinct from its use as a main verb in the past). However, isolating the auxiliary function of soṅ permits one to preclude its interpretation as an imperative.
Rule: If the word soṅ with tags [v.past] and [v.imp] directly follows a verb that has [v.past] among its possible tags, then delete from soṅ the tag [v.imp].
Background: The relator nouns tshe and dus meaning 'when' may immediately follow a verb stem, but do not follow the imperative ('when do it!' is not intelligible). Thus, ambiguous verb stems before tshe [n.rel] and dus [n.rel] may be specified as not [v.imp].
Rule: In the pattern [v.xxx] (tshe [n.rel] / dus [n.rel]) delete [v.imp], if [v.imp] is one of the [v.xxx] tags.
Background: At the end of a sentence (i.e. before a śad) all four stems are possible; this is a context in which precluding the imperative is normally difficult. However, if the final verb stem has been specifically coordinated with a non-imperative, then the reading of the final stem as imperative is not possible. Consequently, the imperative can be precluded after the elative converb (-nas).
Rule: In the pattern [v1.xxx] nas [v2.xxx] [punc], if [v.imp] is one of the [v2.xxx] tags, delete it.
Background: At the end of a sentence (i.e. before a śad) all four stems are possible; this is a context in which precluding the imperative is normally difficult. However, if the final verb stem has been specifically coordinated with a non-imperative, then the reading of the final stem as imperative is not possible. Consequently, the imperative can be precluded after the semi-final converb (-te, -de, -ste).
Rule: In the pattern [v1.xxx] [cv.sem] [v2.xxx] [punc], if [v.imp] is one of the [v2.xxx] tags, delete it.
Background: Words that do not distinguish the imperative morphologically often use śog to make the imperative clear. There are two ways to analyze this. One is that these words do not have an imperative; but since the dictionaries give the imperative of these verbs, we interpret such verbs in this context as exhibiting the imperative stem.
Rule: If a word with [v.xxx] tags including [v.imp] among them occurs immediately before the word śog, with the unambiguous tag [v.imp], then tag the word in question as [v.imp].
Background: The future tense verb stem does not occur before the elative converb -nas. Consequently, if an ambiguous verb stem occurs before the elative converb, the interpretation of the verb in question as a future can be precluded.  
Rule: If a word has more than one [v.xxx] tag, including [v.fut], and the following word is nas, remove the tag [v.fut] from the word in question. 
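Illustration (not part of the rule set): a minimal Python sketch of the preceding rule. The token model (a dict with "form" and "tags") and the function name are assumptions for this example.

```python
# Hypothetical token model: {"form": str, "tags": set of candidate tags}.
def drop_future_before_nas(tokens):
    """Remove [v.fut] from a still-ambiguous verb stem that stands before nas."""
    for i, tok in enumerate(tokens[:-1]):
        verb_tags = {t for t in tok["tags"] if t.startswith("v.")}
        if (len(verb_tags) > 1 and "v.fut" in verb_tags
                and tokens[i + 1]["form"] == "nas"):
            tok["tags"].discard("v.fut")
    return tokens

ambiguous = drop_future_before_nas([
    {"form": "gśegs", "tags": {"v.fut", "v.past", "v.pres"}},
    {"form": "nas", "tags": {"cv.ela"}},
])
only_future = drop_future_before_nas([
    {"form": "gśegs", "tags": {"v.fut"}},
    {"form": "nas", "tags": {"cv.ela"}},
])
```

The requirement of more than one [v.xxx] tag means that a word whose only verbal analysis is [v.fut] is left untouched rather than stripped of all verb tags.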
Background: As discussed in Garrett et al. (forthcoming c) matrix verbs appear never to subcategorize for the past stem in the indirect infinitive construction (e.g. ltar ḥgro 'go to see'). Consequently, the past stem can be precluded in this context. (Inevitably there are occasional exceptions such as yeṅs ma gtam du byas so / in the Mdzaṅs blun; because of cases of this type, it is necessary not to run the rule following ma.)
Rule: If a word has more than one [v.xxx] tag, including [v.past], the preceding word is not ma, and the following two words are (1) a word with a possible [cv.term] tag, and (2) a word with a possible [v.xxx] or [n.v.xxx] tag, remove the tag [v.past] from the word in question.
Background: Negation with ma occurs with the past, and with copulas and auxiliary verbs, which our system does not distinguish for tense. (It can also occur with the present in its prohibitive function, which was dealt with above in rule 54.) Therefore, where possible, it is safe to assume that only these stem forms, or their nominalized equivalents, can follow negation with ma.
Rule: If a word tagged with a hypothesized part-of-speech-tag [v.aux] ([n.v.aux]), [v.cop] ([n.v.cop]), or [v.past] ([n.v.past]) is preceded by ma [neg], then delete all other hypothesized tags.
Background: The prohibitive construction is normally formed by negating a present stem with ma. Negation of the past stem also occurs rarely in the prohibitive construction, but the future and imperative do not occur. Thus, although the dag yig gsar bsgrigs, generally a reliable dictionary, takes the minority view of allowing sdod as a future of 'sit', in the following context sdod must be analyzed as present: ང འི་ དབང་གྲལ་ དུ་ མ་ སྡོད་ གསུང་ 'He said, "Do not sit in my empowerment queue!"'.
Rule: If a verb following ma [neg] does not have the hypothesized part-of-speech-tag [v.past], but does have the hypothesized tag [v.pres] and also [v.fut] or [v.imp], and this verb is followed by a śad, zer, gsung, gsungs, or [cl.quot], then delete [v.fut] or [v.imp] as appropriate from this word (i.e. leaving [v.pres] as the only [v.xxx] tag).
Background: Negation with mi precludes the past, but is possible with both the present and future. (Because ma occurs before the present (rule 35) and yin (rule 39), it is not possible to stipulate that a verb stem following ma is necessarily a past.)
Rule: After mi [neg], keep only [v.aux], [v.fut], [v.pres], [n.v.aux], [n.v.fut], and [n.v.pres].
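Illustration (not part of the rule set): a minimal Python sketch of the preceding rule. The token model (a dict with "form" and "tags") and the function name are assumptions for this example. Two further assumptions are not stated in the rule: mi must carry [neg] as its only tag, and the filter is skipped if it would leave the following word with no tags at all.

```python
# Hypothetical token model: {"form": str, "tags": set of candidate tags}.
ALLOWED_AFTER_MI = {"v.aux", "v.fut", "v.pres", "n.v.aux", "n.v.fut", "n.v.pres"}

def filter_after_mi(tokens):
    """After mi [neg], keep only the analyses that the rule lists."""
    for i in range(1, len(tokens)):
        prev = tokens[i - 1]
        if prev["form"] == "mi" and prev["tags"] == {"neg"}:
            kept = tokens[i]["tags"] & ALLOWED_AFTER_MI
            if kept:  # assumption: never leave a word without any tag
                tokens[i]["tags"] = kept
    return tokens

negated = filter_after_mi([
    {"form": "mi", "tags": {"neg"}},
    {"form": "gśegs", "tags": {"v.fut", "v.past", "v.pres"}},
])
ambiguous_mi = filter_after_mi([
    {"form": "mi", "tags": {"neg", "n.count"}},
    {"form": "gśegs", "tags": {"v.fut", "v.past", "v.pres"}},
])
```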
Background: A final da-drag is typical of past verbs with roots that end in -n, -r, -l (e.g. pres. sbyin, past byind 'give'); the da-drag can however also occur as the final of presents (e.g. pres. seld, past bsald 'cleanse, remove'). The presence of a da-drag has ramifications on the sandhi determined allomorphs of the following word in a number of cases. Specifically, after a da-drag one sees kyaṅ [cl.focus], ciṅ [cv.impf], to [cv.fin], tu [cv.term], and tam [cv.ques] rather than other allomorphs of these morphemes, such as yaṅ, źiṅ, no etc., du, and nam etc. Not all past forms that could have a da-drag do have a da-drag. For example, the verb 'give' (sbyin, byin, sbyin, byin) appears with the final converb as byin no and not byin to. Consequently, when there is no morphological ambiguity among present, past, and future, it would be inappropriate to insist on, or indeed expect, a da-drag. The temptation looms to only make use of da-drag information when a stem is ambiguous between past and future, but such a specification of the rule also has disadvantages. The rules of sections 10.1.3 and 10.1.4 will have already deleted many analyses from verbs that are in principle ambiguous between past and future. This action would delete the trigger for a rule that requires ambiguity between the past and future. The solution we have achieved is to insist only that a verb stem is somehow still ambiguous. It would be senseless to delete [v.past] from byin if it is the only remaining verbal analysis.
Rule: If a word has more than one [v.xxx] tag, including [v.fut], and the word ends in -l, -n, or -r and is followed by the word kyaṅ, ciṅ, to, tu, or tam, then delete [v.fut].
Background: After a da-drag the final converb takes the form to. Consequently, the forms of the final converb no, ro, and lo can be taken as evidence for the absence of a da-drag, which in turn provides evidence against the interpretation of the verb in question as a past.
Rule: If a word has more than one [v.xxx] tag, including [v.past], and the word ends in -l, -n, or -r and is followed by the word no, ro, or lo, then delete [v.past].
Background: After a da-drag the imperfective converb takes the form ciṅ. Consequently, the form źiṅ of the imperfective converb can be taken as evidence for the absence of a da-drag, which in turn provides evidence against the interpretation of the verb in question as a past.
Rule: If a word has more than one [v.xxx] tag, including [v.past], and the word ends in -l, -n, or -r and is followed by the word źiṅ, then delete [v.past].
Background: After a da-drag the question converb takes the form tam. Consequently, the forms of the question converb nam, ram, and lam can be taken as evidence for the absence of a da-drag, which in turn provides evidence against the interpretation of the verb in question as a past.
Rule: If a word has more than one [v.xxx] tag, including [v.past], and the word ends in -l, -n, or -r and is followed by lam [cv.ques] nam [cv.ques] or ram [cv.ques], then remove the tag [v.past].
Background: After a da-drag the terminative converb takes the form tu. Consequently, the form du of the terminative converb can be taken as evidence for the absence of a da-drag, which in turn provides evidence against the interpretation of the verb in question as a past.
Rule: If a word has more than one [v.xxx] tag, including [v.past], and this word ends in -l, -n, or -r and is followed by du then remove the tag [v.past].
Background: After a da-drag the focus clitic takes the form kyaṅ. Consequently, the form yaṅ of the focus clitic can be taken as evidence for the absence of a da-drag, which in turn provides evidence against the interpretation of the verb in question as a past.
Rule: If a word has more than one [v.xxx] tag, including [v.past], and this word ends in -l, -n, or -r and is followed by yaṅ then remove the tag [v.past].
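Illustration (not part of the rule set): a minimal Python sketch that folds the five preceding "absence of a da-drag" rules (before no/ro/lo, źiṅ, nam/ram/lam, du, and yaṅ) into one pass. The token model (a dict with "form" and "tags"), the function name, and the merging of the rules into one function are assumptions for this example. The tag sets given to gsol and nam in the sample data are invented to exercise the code and make no claim about the lexicon.

```python
# Hypothetical token model: {"form": str, "tags": set of candidate tags}.
NO_DA_DRAG_FORMS = {"no", "ro", "lo", "źiṅ", "du", "yaṅ"}
QUESTION_FORMS = {"nam", "ram", "lam"}  # these additionally need the tag [cv.ques]

def drop_past_without_da_drag(tokens):
    """Remove [v.past] from an ambiguous -l/-n/-r stem before a non-da-drag allomorph."""
    for i, tok in enumerate(tokens[:-1]):
        verb_tags = {t for t in tok["tags"] if t.startswith("v.")}
        if len(verb_tags) < 2 or "v.past" not in verb_tags:
            continue
        if not tok["form"].endswith(("l", "n", "r")):
            continue
        nxt = tokens[i + 1]
        if nxt["form"] in NO_DA_DRAG_FORMS or (
                nxt["form"] in QUESTION_FORMS and "cv.ques" in nxt["tags"]):
            tok["tags"].discard("v.past")
    return tokens

ambiguous = drop_past_without_da_drag([
    {"form": "gsol", "tags": {"v.past", "v.pres"}},
    {"form": "lo", "tags": {"cv.fin"}},
])
only_past = drop_past_without_da_drag([
    {"form": "byin", "tags": {"v.past"}},
    {"form": "no", "tags": {"cv.fin"}},
])
not_question = drop_past_without_da_drag([
    {"form": "gsol", "tags": {"v.past", "v.pres"}},
    {"form": "nam", "tags": {"n.count"}},
])
```

The byin no sample shows the safeguard described in the Background: [v.past] is never deleted when it is the only remaining verbal analysis.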
Background: Inside of verbal nouns the da-drag can also be detected. The nominalization suffix takes the form -ba after -r, -n, or -l, but is -pa after the da-drag, i.e. implying that the verb stem is [n.v.past]. (Because -pa and -ba are similar looking and frequently confused, this rule may seem to risk introducing errors. However, we think it is best to disambiguate verb stems wherever it is possible to do so. Disambiguating these stems permits the behavior of -pa versus -ba to be more easily explored by future researchers; reason enough to add the rule.)
Rule: If a word has more than one [n.v.xxx] tag, including [n.v.fut], and the verb stem ends in -l, -n, or -r and is nominalized with -pa, then remove the tag [n.v.fut] from this word.  
Background: Inside of verbal nouns the da-drag can also be detected. The nominalization suffix takes the form -ba after -r, -n, or -l, but is -pa after the da-drag, i.e. implying that the verb stem is [n.v.past]. (Because -pa and -ba are similar looking and frequently confused, this rule may seem to risk introducing errors. However, we think it is best to disambiguate verb stems wherever it is possible to do so. Disambiguating these stems permits the behavior of -pa versus -ba to be more easily explored by future researchers; reason enough to add the rule.) 
Rule: If a word has more than one [n.v.xxx] tag, including [n.v.past], and the verb stem ends in -l, -n, or -r and is nominalized with -ba, then remove the tag [n.v.past] from this word.
Background: In many syntactic contexts the rule-based tagger will be unable to specify the choice of verb stem unambiguously. For example, gśegs so could be present, past, or future. Because such contexts are systematically ambiguous it would not be useful to present the human user with a choice between these three tags (i.e. [v.fut] ~ [v.past] ~ [v.pres]). Consequently, we create a tag [v.invar] to explicitly mark such instances as undecidable. Identical considerations apply for the respective verbal nouns.
Rule: Replace [v.fut] ~ [v.past] ~ [v.pres] with [v.invar] and replace [n.v.fut] ~ [n.v.past] ~ [n.v.pres] with [n.v.invar].
Background: In many syntactic contexts the rule-based tagger will be unable to specify the choice of verb stem unambiguously. For example, bskon te could be future or past. Because such contexts are systematically ambiguous it would not be useful to present the human user with a choice between these two tags (i.e. [v.fut] ~ [v.past]). Consequently, we create a tag [v.fut.v.past] to explicitly mark such instances as undecidable. Identical considerations apply for the respective verbal nouns.
Rule: Replace [v.fut] ~ [v.past] with [v.fut.v.past] and replace [n.v.fut] ~ [n.v.past] with [n.v.fut.n.v.past].
Background: In many syntactic contexts the rule-based tagger will be unable to specify the choice of verb stem unambiguously. For example, mi gśegs could be future or present. Because such contexts are systematically ambiguous it would not be useful to present the human user with a choice between these two tags (i.e. [v.fut] ~ [v.pres]). Consequently, we create a tag [v.fut.v.pres] to explicitly mark such instances as undecidable. Identical considerations apply for the respective verbal nouns.
Rule: Replace [v.fut] ~ [v.pres] with [v.fut.v.pres] and replace [n.v.fut] ~ [n.v.pres] with [n.v.fut.n.v.pres].
Background: In many syntactic contexts the rule-based tagger will be unable to specify the choice of verb stem unambiguously. For example, gśegs nas could be past or present. Because such contexts are systematically ambiguous it would not be useful to present the human user with a choice between these two tags (i.e. [v.past] ~ [v.pres]). Consequently, we create a tag [v.past.v.pres] to explicitly mark such instances as undecidable. Identical considerations apply for the respective verbal nouns.
Rule: Replace [v.past] ~ [v.pres] with [v.past.v.pres] and replace [n.v.past] ~ [n.v.pres] with [n.v.past.n.v.pres].
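Illustration: The four collapsing rules above can be sketched as a single table-driven function. This is only a minimal sketch: the names COLLAPSE and collapse_stems are hypothetical, and it assumes that a word's hypothesized tags are held in a set and that tags outside the matched stem set are left untouched.

```python
# Hypothetical helper; only the tag names come from the rules above.
# Each entry maps a complete set of stem tags to its combined tag.
# The three-way sets come first so that they win over the pairs.
COLLAPSE = [
    ({"v.fut", "v.past", "v.pres"}, "v.invar"),
    ({"n.v.fut", "n.v.past", "n.v.pres"}, "n.v.invar"),
    ({"v.fut", "v.past"}, "v.fut.v.past"),
    ({"n.v.fut", "n.v.past"}, "n.v.fut.n.v.past"),
    ({"v.fut", "v.pres"}, "v.fut.v.pres"),
    ({"n.v.fut", "n.v.pres"}, "n.v.fut.n.v.pres"),
    ({"v.past", "v.pres"}, "v.past.v.pres"),
    ({"n.v.past", "n.v.pres"}, "n.v.past.n.v.pres"),
]


def collapse_stems(tags):
    """Return a copy of `tags` with each fully present stem set merged."""
    result = set(tags)
    for members, combined in COLLAPSE:
        if members <= result:
            result -= members
            result.add(combined)
    return result
```

Checking the three-way sets before the pairs matters: otherwise [v.fut] ~ [v.past] ~ [v.pres] would be reduced to [v.fut.v.past] ~ [v.pres] instead of [v.invar].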
Background: The syllable źu is either the present/future of 'ask', or the past of 'to melt'.  
Rule: Replace źu [v.invar] with źu [v.fut.v.pres] ~ [v.past].
Background: The syllable bri is either the future/past of 'decrease', or the future of 'write'.  
Rule: Replace bri [v.invar] with bri [v.fut] ~ [v.fut.v.past].
Background: The syllable za is either the present of 'eat' or the invariant verb 'itch'. The syllable skya is either the present of 'carry' or the invariant verb 'be gray'. The syllable skyoṅ is either the present of 'guard' or the invariant verb 'go along a road'. 
Rule: Replace za [v.invar] with za [v.invar] ~ [v.pres]. Replace skya [v.invar] with skya [v.invar] ~ [v.pres]. Replace skyoṅ [v.invar] with skyoṅ [v.invar] ~ [v.pres].
Background: The syllable gśags is either an invariant verb 'to tighten' or the past of a verb gśog 'to split'. The syllable bor is either an invariant verb 'to wane, be lost' or the past of a verb ḥbor 'discard'. The syllable mchis is either an invariant verb 'to be' or the past of a verb mchi 'to go'. The syllable ches is either an invariant verb 'believe' (yid ches) or the past of a verb che 'be large'. The syllable phas can either be the past or imperative of the verb phaṅ, phaṅs, ḥphaṅ, phaṅs 'save, economize' or it can be an invariant verb phaṅs 'long for, feel loss'. The syllable gtams can be either the past of the verb gtam, gtams 'say' or the invariant verb gtams 'fill'. The syllable phyis can either be the past of 'wipe' or the invariant verb 'gestate'. The syllable bcas can either be the invariant verb 'to have' or the past of a verb ḥcha 'make'.
Rule: Replace gśags [v.invar] with gśags [v.invar] ~ [v.past]. Replace bor [v.invar] with bor [v.invar] ~ [v.past]. Replace mchis [v.invar] with mchis [v.invar] ~ [v.past]. Replace ches [v.invar] with ches [v.invar] ~ [v.past]. Replace phas [v.invar] with phas [v.invar] ~ [v.past]. Replace gtams [v.invar] with gtams [v.invar] ~ [v.past]. Replace phyis [v.invar]  with phyis [v.invar] ~ [v.past]. Replace bcas [v.invar] with bcas [v.invar] ~ [v.past].
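Illustration: The word-specific replacements of [v.invar] in the preceding rules amount to a lookup table keyed by the syllable. The sketch below is not the project's implementation; INVAR_OVERRIDES and expand_invar are hypothetical names, and it assumes that a word's tags are held in a set and that tags other than [v.invar] are kept as they are.

```python
# Hypothetical lookup table built from the rules above:
# syllable -> tags that replace [v.invar] for that syllable.
INVAR_OVERRIDES = {
    "źu": {"v.fut.v.pres", "v.past"},
    "bri": {"v.fut", "v.fut.v.past"},
}
for syllable in ("za", "skya", "skyoṅ"):
    INVAR_OVERRIDES[syllable] = {"v.invar", "v.pres"}
for syllable in ("gśags", "bor", "mchis", "ches",
                 "phas", "gtams", "phyis", "bcas"):
    INVAR_OVERRIDES[syllable] = {"v.invar", "v.past"}


def expand_invar(word, tags):
    """Replace [v.invar] on a listed syllable with its specific readings."""
    tags = set(tags)
    if "v.invar" in tags and word in INVAR_OVERRIDES:
        tags.discard("v.invar")
        tags |= INVAR_OVERRIDES[word]
    return tags
```

A table keeps each new exceptional syllable to a one-line change, at the cost of hiding the linguistic justification, which would still need to live in the Background text.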
Background: The syllable bcas can either be the invariant verb 'to have' or the past of a verb ḥcha 'make'. After daṅ it can be specified as [v.invar], i.e. as not specifically [v.past].  
Rule: If bcas, associated with both the tags [v.past] and [v.invar], occurs in the phrase daṅ bcas, then specify bcas as not specifically [v.past], i.e. as [v.invar] or a non-verbal tag.
Background: The syllable byor is either an invariant verb 'come', or the present (and possibly future) of a verb byor 'adhere'. The syllable yo is either the present/future of 'go/come' or an invariant verb 'be able, be sufficient'.
Rule: Replace byor [v.invar] with byor [v.invar] ~ [v.fut.v.pres]. Replace yo [v.invar] with yo [v.invar] ~ [v.fut.v.pres].
Background: After mi [neg] a verb should not be interpreted as past. However, in some cases the rule based tagger is unable to determine whether mi is [neg] or [n.count]. In such cases the interpretation of the following verb as [v.invar] should not be forced, even though the past cannot be precluded, because the correction of the human tagger to [v.fut.v.pres] should be permitted.
Rule: Replace mi|[neg][n.count] XXX|[v.invar] with mi|[neg][n.count] XXX|[v.fut.v.pres] ~ [v.invar].
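Illustration: A minimal sketch of this contextual rule, assuming a sentence is a list of (word, tag set) pairs. The function name widen_after_ambiguous_mi is hypothetical, and the sketch assumes that [v.invar] need only be among the tags of the following word.

```python
def widen_after_ambiguous_mi(tokens):
    """tokens: list of (word, set of tags). Returns a modified copy."""
    out = [(word, set(tags)) for word, tags in tokens]
    for i in range(1, len(out)):
        prev_word, prev_tags = out[i - 1]
        _, tags = out[i]
        # mi must still be ambiguous between [neg] and [n.count].
        if (prev_word == "mi"
                and {"neg", "n.count"} <= prev_tags
                and "v.invar" in tags):
            tags.add("v.fut.v.pres")  # keep [v.invar], add the alternative
    return out
```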
Background: The syllable bźag is either the past of jog 'leave, put aside' or is the present or past of bźag 'split, tear' (intrans.) (cf. rule 83). The syllable gśags is either an invariant verb 'to tighten' or the past of a verb gśog 'to split'. Because syntactic disambiguation will have already specified some contexts as [v.past.v.pres] (e.g. gśags nas) we must disambiguate [v.past.v.pres] as well as [v.invar], which was handled in rule 73. The syllable bor is either an invariant verb 'to wane, be lost' or the past of a verb ḥbor 'discard'. Because syntactic disambiguation will have already specified some contexts as [v.past.v.pres] (e.g. bor nas) we must disambiguate [v.past.v.pres] as well as [v.invar], which was handled in rule 73. The syllable mchis is either an invariant verb 'to be' or the past of a verb mchi 'to go'. Because syntactic disambiguation will have already specified some contexts as [v.past.v.pres] (e.g. mchis nas) we must disambiguate [v.past.v.pres] as well as [v.invar], which was handled in rule 73.
Rule: Replace bźag [v.past.v.pres] with bźag [v.past] ~ [v.past.v.pres]. Replace gśags [v.past.v.pres] with gśags [v.past] ~ [v.past.v.pres]. Replace bor [v.past.v.pres] with bor [v.past] ~ [v.past.v.pres]. Replace mchis [v.past.v.pres] with mchis [v.past] ~ [v.past.v.pres].
Background: The syllable skyoṅ is either the present of 'guard' or the invariant verb 'go along a road'. Before the converb -nas the rule tagger will analyze this syllable as [v.past.v.pres]. This rule allows skyoṅ [v.pres] in this position.
Rule: Replace skyoṅ [v.past.v.pres] with skyoṅ [v.pres] ~ [v.past.v.pres].
Background: The syllable byon is either the past of the verb 'arrive' or ambiguous past and future of a verb 'be capable'. The computer by creating ambiguous tags will always present byon as [v.fut.v.past], but the human user must be presented with both possible interpretations.
Rule: Replace byon [v.fut.v.past] with byon [v.fut.v.past] ~ [v.past].
Background: The syllable jug is the present of the transitive verb 'insert', but is also both present and future of the intransitive verb 'enter'. The syllable za is either the present of 'eat' or the invariant verb 'itch'. Because syntactic disambiguation will have already specified some contexts as [v.fut.v.pres] (e.g. mi za) we must disambiguate [v.fut.v.pres] as well as [v.invar], which was handled in rule 78. The syllable jo is either [v.pres] of 'milk' or the [v.fut.v.pres] of 'fulfill'.
Rule: Replace jug [v.fut.v.pres] with jug [v.fut.v.pres] ~ [v.pres]. Replace za [v.fut.v.pres] with za [v.fut.v.pres] ~ [v.pres]. Replace jo [v.fut.v.pres] with jo [v.fut.v.pres] ~ [v.pres].
Background: The syllable gźig is the future of jig 'destroy' and the present and future of the verb gźig 'make minute inquiry'.
Rule: Replace gźig [v.fut.v.pres] with gźig [v.fut] ~ [v.fut.v.pres].
Background: The syllable źu is either the present/future of 'ask', or the past of 'to melt'.
Rule: Replace źu-ba [n.v.invar] with źu-ba [n.v.fut.n.v.pres] ~ [n.v.past].
Background: The verbal noun za-ba is either the present of the verb 'eat' or the invariant verb 'itch'. The verbal noun skya-ba is either the present of the verb 'carry' or the invariant verb 'be gray'. The verbal noun jo-ba is either [v.pres] of 'milk' or the [v.invar] of 'fulfill'.
Rule: Replace za-ba [n.v.invar] with [n.v.invar] ~ [n.v.pres]. Replace skya-ba [n.v.invar] with [n.v.invar] ~ [n.v.pres]. Replace jo-ba [n.v.invar] with [n.v.invar] ~ [n.v.pres].
Background: The syllable btsog is either an invariant verb 'to be dirty' or the future of a verb 'smash up'.
Rule: Replace btsog [n.v.invar] with btsog [n.v.fut] ~ [n.v.invar].
Background: According to the dictionaries the syllable riṅs is apparently an invariant verb 'to hurry', but is also the past of the verb ḥdri 'be distant' seen frequently in the phrase glo-ba ḥdri 'be loyal'. The dictionaries agree that there is an invariant verb gtogs 'to be included in' and a verb pres. gtog, past gtogs, fut. gtog 'snap'. Thus, the form gtogs is itself ambiguous. The orthographic form mchis-pa is either a nominalized form of the invariant verb mchis 'to be', or the past tense of the verb mchi 'to go'. The orthographic form ches-pa is either a nominalized form of the invariant verb 'believe', or the past tense of the verb che 'be big'.
Rule: Replace riṅs-pa [n.v.invar] with riṅs-pa [n.v.invar] ~ [n.v.past]. Replace gtogs-pa [n.v.invar] with gtogs-pa [n.v.invar] ~ [n.v.past]. Replace mchis-pa [n.v.invar] with mchis-pa [n.v.invar] ~ [n.v.past]. Replace ches-pa [n.v.invar] with ches-pa [n.v.invar] ~ [n.v.past].
Background: The syllable ches is either an invariant verb 'believe' (yid ches) or the past of a verb che 'be large'.  However, the verb 'believe' always comes after yid 'mind'. So the interpretation [v.past] can be removed from ches when it appears after yid 'mind'.
Rule: When ches [v.invar] ~ [v.past] appears after yid [n.count], then delete [v.past] from ches. Similarly, when ches-pa [n.v.invar] ~ [n.v.past] appears after yid [n.count], then delete [n.v.past] from ches-pa.
Background: The syllable jug is the present of the transitive verb 'insert', but is also both present and future of the intransitive verb 'enter'. The syllable za is either the present of 'eat' or the invariant verb 'itch'. Because syntactic disambiguation will have already specified some contexts as [v.fut.v.pres] (e.g. mi za) we must disambiguate [v.fut.v.pres] as well as [v.invar], which was handled in rule 78. The syllable jo is either [v.pres] of 'milk' or the [v.invar] of 'fulfill'; it will already have been specified in some contexts as [v.fut.v.pres] (e.g. mi jo), so we must disambiguate [v.fut.v.pres] as well as [v.invar].
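Illustration: A minimal sketch of the yid ches rule, covering both the verb and the verbal noun. The names YID_CHES_TARGETS and resolve_yid_ches are hypothetical, and the sketch assumes that it suffices for yid to have [n.count] among its tags.

```python
# Hypothetical table: word -> (invariant tag to keep, past tag to delete).
YID_CHES_TARGETS = {
    "ches": ("v.invar", "v.past"),
    "ches-pa": ("n.v.invar", "n.v.past"),
}


def resolve_yid_ches(tokens):
    """tokens: list of (word, set of tags). Returns a modified copy."""
    out = [(word, set(tags)) for word, tags in tokens]
    for i in range(1, len(out)):
        word, tags = out[i]
        prev_word, prev_tags = out[i - 1]
        if word not in YID_CHES_TARGETS:
            continue
        invar, past = YID_CHES_TARGETS[word]
        # Only act when both readings are present and yid precedes.
        if (prev_word == "yid" and "n.count" in prev_tags
                and invar in tags and past in tags):
            tags.discard(past)
    return out
```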
Rule: Replace jug-pa [n.v.fut.n.v.pres] with jug-pa [n.v.fut.n.v.pres] ~ [n.v.pres]. Replace za-ba [n.v.fut.n.v.pres] with za-ba [n.v.fut.n.v.pres] ~ [n.v.pres]. Replace jo-ba [n.v.fut.n.v.pres] with jo-ba [n.v.fut.n.v.pres] ~ [n.v.pres].
Background: The syllable rtog is either the present of rtog, brtags, brtag, rtogs 'examine' or is an ambiguous present or alternate past of rtog(s) 'perceive'.
Rule: Replace rtog-pa [n.v.past.n.v.pres] with rtog-pa [n.v.past.n.v.pres] ~ [n.v.pres].
Background: The syllable rtog is either the present of rtog, brtags, brtag, rtogs 'examine' or is an ambiguous present or alternate past of rtog(s) 'perceive'. In the phrases so-so r rtog-paḥi ye-śes and so-so r rtog-paḥi śes-rab, the word rtog-pa can be unambiguously tagged as [n.v.past.n.v.pres].
Rule: In the phrases སོ་སོ ར་ རྟོག་པ འི་ ཤེས་རབ་ and སོ་སོ ར་ རྟོག་པ འི་ ཡེ་ཤེས་ tag རྟོག་པ as [n.v.past.n.v.pres].
Background: The syllable bźag is either the past of jog 'leave, put aside' or is the present or past of bźag 'split, tear' (intrans.) (cf. 73). The syllable ḥbogs is either the present of ḥbogs, phog, dbog, phog 'bestow, impart' or is the past of the verb ḥbog, ḥbogs, ḥbog, ḥbogs 'cross, exceed'.
Rule: Replace bźag-pa [n.v.past.n.v.pres] with bźag-pa [n.v.past] ~ [n.v.pres]. Replace ḥbogs-pa [n.v.past.n.v.pres] with ḥbogs-pa [n.v.past] ~ [n.v.pres].
Background: After mi [neg] a verbal noun should not be interpreted as past. However, in some cases the rule based tagger is unable to determine whether mi is [neg] or [n.count]. In such cases the interpretation of the following verb as [n.v.invar] should not be forced, even though the past cannot be precluded, because the correction of the human tagger to [n.v.fut.n.v.pres] should be permitted.
Rule: Replace mi|[neg][n.count] XXX|[n.v.invar] with mi|[neg][n.count] XXX|[n.v.fut.n.v.pres] ~ [n.v.invar].
Background: The syllable la has many interpretations: the allative case, the allative converb, the stem of the proclausal adverb lar 'moreover', and the noun 'mountain pass'. Between two imperative verbs only the allative converb is possible; the noun 'mountain pass' can be precluded in this context (rule 45 already excluded the interpretation of la as a case marker in this context).
Rule: If the syllable la occurs after one [v.imp] and before another [v.imp] then delete [n.count] from the syllable la.
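Illustration: A minimal sketch of this rule, assuming a sentence is a list of (word, tag set) pairs. The function name drop_noun_la is hypothetical. The sketch conservatively assumes that both neighbouring verbs are unambiguously [v.imp]; the rule itself does not say whether an ambiguous imperative should also count.

```python
def drop_noun_la(tokens):
    """tokens: list of (word, set of tags). Returns a modified copy."""
    out = [(word, set(tags)) for word, tags in tokens]
    for i in range(1, len(out) - 1):
        word, tags = out[i]
        if (word == "la"
                and out[i - 1][1] == {"v.imp"}    # assumed unambiguous
                and out[i + 1][1] == {"v.imp"}):  # assumed unambiguous
            tags.discard("n.count")
    return out
```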
Background: Earlier rules work to isolate numbers from those morphemes with which they are sometimes homophonous (cf. rule 13), but these rules were written conservatively, requiring an unambiguous number in the immediate context. If three or more morphemes occur in a row, each of which has an analysis as a number, the string of morphemes should together be taken as a number. (We tag the word re as [num.card] only in particular structures such as źo re źo do; consequently, the word re must be excluded from this rule.)
Rule: If three or more morphemes occur in a row each of which has an analysis [num.card], then tag all of them with [num.card]. (This rule should not apply to the word re).
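Illustration: A minimal sketch of the run-detection step, assuming a sentence is a list of (word, tag set) pairs. The function name tag_number_runs is hypothetical, and treating re as something that interrupts a run (rather than being skipped over) is an assumption.

```python
def tag_number_runs(tokens, min_run=3):
    """tokens: list of (word, set of tags). Returns a modified copy."""
    out = [(word, set(tags)) for word, tags in tokens]
    i = 0
    while i < len(out):
        # Extend j over consecutive morphemes that could be numbers.
        j = i
        while (j < len(out) and out[j][0] != "re"
               and "num.card" in out[j][1]):
            j += 1
        if j - i >= min_run:
            for k in range(i, j):
                out[k] = (out[k][0], {"num.card"})
        i = max(j, i + 1)  # always advance by at least one token
    return out
```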
Background: As a word on its own gcig-pa means 'alone, sole' and we tag it as [adj]. (The ordinal number formed from the cardinal gcig 'one' is daṅ-po 'first'). However, the ordinals formed from cardinals above ten may end in gcig-pa, for example ñi-śu rtsa gcig-pa 'the 21st'. Thus, any given instance of gcig-pa may have either the tag [adj] or the tag [num.ord], but these two interpretations are very easy to distinguish syntactically.
Rule: If the word gcig-pa (with the hypothesized tags [adj] and [num.ord]) is immediately preceded by a word that is unambiguously tagged [num.card] (or a sequence of two words, the first of which is unambiguously tagged [num.card] and the second of which has [num.card] among its hypothesized tags), then delete the tag [adj] from gcig-pa.
Background: As a word on its own gcig-pa means 'alone, sole' and we tag it as [adj]. (The ordinal number formed from the cardinal gcig 'one' is daṅ-po 'first'). However, the ordinals formed from cardinals above ten may end in gcig-pa, for example ñi-śu rtsa gcig-pa 'the 21st'. Thus, any given instance of gcig-pa may have either the tag [adj] or the tag [num.ord], but these two interpretations are very easy to distinguish syntactically.
Rule: If the word gcig-pa (with the hypothesized tags [adj] and [num.ord]) is not immediately preceded by a word that is tagged [num.card], then delete the tag [num.ord] from gcig-pa.
Background: As a word on its own gcig-pa means 'alone, sole' and we tag it as [adj]. (The ordinal number formed from the cardinal gcig 'one' is daṅ-po 'first'). However, the ordinals formed from cardinals above ten may end in gcig-pa, for example ñi-śu rtsa gcig-pa 'the 21st'. Thus, any given instance of gcig-pa may have either the tag [adj] or the tag [num.ord], but these two interpretations are very easy to distinguish syntactically. However, specifically in the combination bcu gcig-pa '11th', because bcu itself can be ambiguous between a number and a verb form, the preceding two steps of this rule will leave gcig-pa ambiguous. Consequently, a sub-rule is required that targets bcu gcig-pa specifically.
Rule: The sequence of words bcu gcig-pa should be tagged bcu [num.card] gcig-pa [num.ord].
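Illustration: The three steps of the gcig-pa rule can be sketched together as follows. The function name resolve_gcig_pa is hypothetical. The sketch assumes that "tagged [num.card]" in the second step means that [num.card] is among the hypothesized tags, and that the bcu gcig-pa step applies regardless of the tags currently on either word.

```python
def resolve_gcig_pa(tokens):
    """tokens: list of (word, set of tags). Returns a modified copy."""
    out = [(word, set(tags)) for word, tags in tokens]
    for i, (word, tags) in enumerate(out):
        if word != "gcig-pa":
            continue
        prev_word, prev_tags = out[i - 1] if i >= 1 else (None, set())
        prev2_tags = out[i - 2][1] if i >= 2 else set()
        if {"adj", "num.ord"} <= tags:
            after_number = prev_tags == {"num.card"} or (
                prev2_tags == {"num.card"} and "num.card" in prev_tags)
            if after_number:
                tags.discard("adj")        # step 1: ordinal reading
            elif "num.card" not in prev_tags:
                tags.discard("num.ord")    # step 2: 'alone, sole'
        if prev_word == "bcu":             # step 3: bcu gcig-pa '11th'
            out[i - 1] = ("bcu", {"num.card"})
            out[i] = (word, {"num.ord"})
    return out
```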
Background: The word rin-po-che has several different functions. It can be an adjective in combinations such as bla-ma rin-po-che or nor-bu rin-po-che, but it can also be a mass noun in phrases such as rin-po-che sna bdun. In general it may seem unnecessary to clearly distinguish between mass nouns and adjectives because all mass nouns can function as adjectives (bum-pa dngul 'a silver pot'). There are two reasons, however, to want to distinguish rin-po-che (and potentially other words) that are both mass nouns and adjectives. First, the adjective can be followed by a numeral but a mass noun cannot. Second, the semantics of the adjective is different from that of the mass noun, i.e. whereas bum-pa dngul means 'a pot made out of silver', bla-ma rin-po-che does not mean 'a lama made out of preciousness'. So, it is beneficial to make a rule that precludes the mass noun reading following an unambiguous count noun.
Rule: If a word has the potential tags [adj] and [n.mass] and this word follows an unambiguous [n.count], [n.mass] or [n.prop], which in turn is preceded by a punctuation marker, then delete [n.mass] from the word in question.
Background: The word rin-po-che has several different functions. It can be an adjective in combinations such as bla-ma rin-po-che or nor-bu rin-po-che, but it can also be a mass noun in phrases such as rin-po-che sna bdun. In general it may seem unnecessary to clearly distinguish between mass nouns and adjectives because all mass nouns can function as adjectives (bum-pa dngul 'a silver pot'). There are two reasons, however, to want to distinguish rin-po-che (and potentially other words) that are both mass nouns and adjectives. First, the adjective can be followed by a numeral but a mass noun cannot. Second, the semantics of the adjective is different from that of the mass noun, i.e. whereas bum-pa dngul means 'a pot made out of silver', bla-ma rin-po-che does not mean 'a lama made out of preciousness'. So, it is beneficial to make a rule that precludes the adjective reading before sna bdun.
Rule: If the word rin-po-che (with the potential tags [adj] and [n.mass]) is followed by sna, which in turn is followed by a word with the tag [num.card], then delete the tag [adj] from rin-po-che.
Background: A defining difference between mass nouns and count nouns is that mass nouns cannot be directly counted. Consequently if an ambiguous count/mass noun is directly followed by a number, the plural marker rnams, the plural marker tsho, the adjective chen-po 'big' or the indefinite marker, then the interpretation of it as a mass noun can be precluded. (This rule was leading to the false positive rin-po-che char bzhin 'jewels like rain' because char can also be used in the formation of fractions.)
Rule: If a word has the tags [n.mass] and [n.count], and is followed by [num.xxx] (other than char), [d.indef], chen-po, tsho, sna-tshogs, or rnams, possibly with an intervening [adj] or [d.dem], then remove the tag [n.mass].
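Illustration: A minimal sketch of this rule, assuming a sentence is a list of (word, tag set) pairs. The names COUNT_SIGNAL_WORDS, signals_count and prefer_count are hypothetical. The sketch assumes that a tag need only be among the hypothesized tags of the following word, and that at most one [adj] or [d.dem] may intervene.

```python
COUNT_SIGNAL_WORDS = {"chen-po", "tsho", "sna-tshogs", "rnams"}


def signals_count(word, tags):
    """True if this word shows that the preceding noun is countable."""
    if word in COUNT_SIGNAL_WORDS or "d.indef" in tags:
        return True
    # Any [num.xxx] tag counts, except on char (the known false positive).
    return word != "char" and any(t.startswith("num.") for t in tags)


def prefer_count(tokens):
    """tokens: list of (word, set of tags). Returns a modified copy."""
    out = [(word, set(tags)) for word, tags in tokens]
    for i, (word, tags) in enumerate(out):
        if not {"n.mass", "n.count"} <= tags:
            continue
        j = i + 1
        # Skip one intervening [adj] or [d.dem] that is not itself a signal.
        if (j < len(out) and out[j][1] & {"adj", "d.dem"}
                and not signals_count(*out[j])):
            j += 1
        if j < len(out) and signals_count(*out[j]):
            tags.discard("n.mass")
    return out
```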
Background: The word chu can mean both 'water' (a mass noun) and 'river' (a count noun). In the phrase chu brgal 'cross a river', one can specify the count noun meaning.
Rule: If the word chu has the tags [n.mass] and [n.count] and is followed by brgal(-ba), then remove the tag [n.mass].
Background: The word nor-bu can be a mass or count noun. In the phrase yid bźin nor-bu 'wish-fulfilling jewel' it can be tagged as [n.count].
Rule: In the phrase yid bźin nor-bu tag nor-bu as [n.count].
Background: In general a mass noun is assumed to have the same syntax as a count noun, except that a mass noun can be followed directly by a count noun whereas two count nouns should not normally follow one another (unless they are in apposition). Consequently, in the absence of a reason to do otherwise, a noun that can be either a count noun or a mass noun should be interpreted as a mass noun. Because zaṅs means 'a copper pot' in addition to 'copper', sa means 'the ground' in addition to 'earth', and gos means 'a garment' as well as 'cloth', this rule cannot apply to these three syllables. Furthermore, although in one place yi-ge 'letter' appears to be used in a mass noun construction (i.e. yi-ge ḥbru 'speck of a letter'), it would not seem appropriate to assume that yi-ge is always a mass noun as a default. (A false positive ser-ba re 'each hail stone' means that we have to stipulate that the following word is not re. The reason not to specify that ser-ba should be [n.count] is that re has other functions and might come after a mass noun.)
Rule: If a word other than yi-ge, sa, gos or zaṅs, which is not followed by re or dag, has the tags [n.mass] and [n.count], then delete the tag [n.count].
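Illustration: A minimal sketch of the default-to-mass rule, assuming a sentence is a list of (word, tag set) pairs and that the earlier rules that select the count reading have already run. The names EXEMPT_WORDS, BLOCKING_NEXT and default_to_mass are hypothetical.

```python
EXEMPT_WORDS = {"yi-ge", "sa", "gos", "zaṅs"}  # never defaulted to mass
BLOCKING_NEXT = {"re", "dag"}                  # a following re or dag blocks the rule


def default_to_mass(tokens):
    """tokens: list of (word, set of tags). Returns a modified copy."""
    out = [(word, set(tags)) for word, tags in tokens]
    for i, (word, tags) in enumerate(out):
        next_word = out[i + 1][0] if i + 1 < len(out) else None
        if (word not in EXEMPT_WORDS
                and next_word not in BLOCKING_NEXT
                and {"n.mass", "n.count"} <= tags):
            tags.discard("n.count")
    return out
```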
Background: The syllable sa means 'the ground' in addition to 'earth'. When it means 'the ground' it is often followed by the allative case marker la. Furthermore, la (as opposed to na) envisions that the surface of something is left intact, or that it is seen as a whole. Consequently, when sa is followed by la, the analysis of it as a mass noun can be removed.
Rule: If sa is followed by la, then remove [n.mass] from sa.
Background: The phrase mi dgaḥ-ba has two allowed taggings, viz. mi [n.count] dgaḥ-ba [n.v.invar] 'a happy man' and mi [neg] dgaḥ-ba [n.v.fut.n.v.pres] 'unhappy'. By allowing both of these possibilities the rule-based tagger also allows the taggings mi [n.count] dgaḥ-ba [n.v.fut.n.v.pres] and mi [neg] dgaḥ-ba [n.v.invar], which are impossible and mean nothing. It would consequently be good to find these forbidden patterns and force an annotator to amend them to one of the two permitted patterns. Based on the intuition that an annotator is more likely to correctly distinguish mi [neg] and mi [n.count] than dgaḥ-ba [n.v.invar] and dgaḥ-ba [n.v.fut.n.v.pres], the rule is structured to recommend bringing the verbal noun tag into line with what is implied by the tagging of mi. Because mi has two possible tags, the rule has two parts.
Rule: If an annotator has tagged mi [n.count] dgaḥ-ba [n.v.fut.n.v.pres], change this to mi [n.count] dgaḥ-ba [n.v.invar].
Background: The phrase mi dgaḥ-ba has two allowed taggings, viz. mi [n.count] dgaḥ-ba [n.v.invar] 'a happy man' and mi [neg] dgaḥ-ba [n.v.fut.n.v.pres] 'unhappy'. By allowing both of these possibilities the rule-based tagger also allows the taggings mi [n.count] dgaḥ-ba [n.v.fut.n.v.pres] and mi [neg] dgaḥ-ba [n.v.invar], which are impossible and mean nothing. It would consequently be good to find these forbidden patterns and force an annotator to amend them to one of the two permitted patterns. Based on the intuition that an annotator is more likely to correctly distinguish mi [neg] and mi [n.count] than dgaḥ-ba [n.v.invar] and dgaḥ-ba [n.v.fut.n.v.pres], the rule is structured to recommend bringing the verbal noun tag into line with what is implied by the tagging of mi. Because mi has two possible tags, the rule has two parts.
Rule: If an annotator has tagged mi [neg] dgaḥ-ba [n.v.invar], change this to mi [neg] dgaḥ-ba [n.v.fut.n.v.pres].
Background: The phrase de ḥi tshe has two allowed taggings, viz. de [adv.proclausal] ḥi [case.gen] tshe [n.rel] 'at that time' and de [d.dem] ḥi [case.gen] tshe [n.count] 'his life'. By allowing both of these possibilities the rule-based tagger also allows the taggings de [d.dem] ḥi [case.gen] tshe [n.rel] and de [adv.proclausal] ḥi [case.gen] tshe [n.count], which are impossible and mean nothing. It would consequently be good to find these forbidden patterns and force an annotator to amend them to one of the two permitted patterns. Based on the intuition that an annotator is more likely to correctly distinguish tshe [n.rel] 'when' and tshe [n.count] 'life' than de [adv.proclausal] and de [d.dem], the rule is structured to recommend bringing the tag of de into line with what is implied by the tagging of tshe. Because tshe has two possible tags, the rule has two parts.
Rule: If an annotator has tagged de [d.dem] ḥi [case.gen] tshe [n.rel], change this to de [adv.proclausal] ḥi [case.gen] tshe [n.rel].
Background: The phrase de ḥi tshe has two allowed taggings, viz. de [adv.proclausal] ḥi [case.gen] tshe [n.rel] 'at that time' and de [d.dem] ḥi [case.gen] tshe [n.count] 'his life'. By allowing both of these possibilities the rule-based tagger also allows the taggings de [d.dem] ḥi [case.gen] tshe [n.rel] and de [adv.proclausal] ḥi [case.gen] tshe [n.count], which are impossible and mean nothing. It would consequently be good to find these forbidden patterns and force an annotator to amend them to one of the two permitted patterns. Based on the intuition that an annotator is more likely to correctly distinguish tshe [n.rel] 'when' and tshe [n.count] 'life' than de [adv.proclausal] and de [d.dem], the rule is structured to recommend bringing the tag of de into line with what is implied by the tagging of tshe. Because tshe has two possible tags, the rule has two parts.
Rule: If an annotator has tagged de [adv.proclausal] ḥi [case.gen] tshe [n.count], change this to de [d.dem] ḥi [case.gen] tshe [n.count].
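Illustration: The four corrections for mi dgaḥ-ba and de ḥi tshe share one shape: a fixed word sequence, a forbidden tag sequence, and the permitted tag sequence that replaces it. A minimal sketch, assuming an annotated sentence is a list of (word, single tag) pairs; the names FIXES and repair_forbidden are hypothetical.

```python
HI = "ḥi"
DGAH_BA = "dgaḥ-ba"

# (word sequence, forbidden tags, corrected tags), taken from the rules above.
FIXES = [
    (("mi", DGAH_BA), ("n.count", "n.v.fut.n.v.pres"),
     ("n.count", "n.v.invar")),
    (("mi", DGAH_BA), ("neg", "n.v.invar"),
     ("neg", "n.v.fut.n.v.pres")),
    (("de", HI, "tshe"), ("d.dem", "case.gen", "n.rel"),
     ("adv.proclausal", "case.gen", "n.rel")),
    (("de", HI, "tshe"), ("adv.proclausal", "case.gen", "n.count"),
     ("d.dem", "case.gen", "n.count")),
]


def repair_forbidden(tagged):
    """tagged: list of (word, tag). Returns a corrected copy."""
    out = list(tagged)
    for words, bad, good in FIXES:
        n = len(words)
        for i in range(len(out) - n + 1):
            window = out[i:i + n]
            if (tuple(w for w, _ in window) == words
                    and tuple(t for _, t in window) == bad):
                out[i:i + n] = list(zip(words, good))
    return out
```

In each entry the tag that the annotator is assumed to get right (the tag of mi, or the tag of tshe) is identical in the forbidden and corrected sequences; only the other tag changes.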
Background: The preceding rules should always leave a word with at least one remaining tag. Assigning the [null] tag to any word without a tag will trigger a rule suggestion and thereby draw our attention to words without tags. We can then attempt to locate and correct the offending rule. Note that this rule should run after all other rules.
Rule: If a word has no tag, then assign to it the [null] tag.
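Illustration: A minimal sketch of this final safety net, assuming a sentence is a list of (word, tag set) pairs. The function name assign_null is hypothetical; returning the affected words alongside the result is an added convenience for tracing the offending rule, not something the rule itself requires.

```python
def assign_null(tokens):
    """Give every tagless word the [null] tag.

    tokens: list of (word, set of tags).
    Returns (modified copy, list of words that had no tag).
    """
    out = []
    untagged = []
    for word, tags in tokens:
        if tags:
            out.append((word, set(tags)))
        else:
            out.append((word, {"null"}))
            untagged.append(word)
    return out, untagged
```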