Persian Noun Phrase

Karine Megerdoomian
Computing Research Laboratory

Introduction

The highly ambiguous structure of the Persian Noun Phrase (NP) causes immense difficulties for automatic parsing of written text. Numerous factors contribute to the ambiguity of the Persian NP structure. Certain vowels, known as short vowels, are not written, which produces additional lexical ambiguities. There are very few overt morphemes in the language to mark boundaries of Noun Phrases: With the exception of the specific object marker, the language lacks Case morphology. There are often no particles in written text linking the constituents of a Noun Phrase, such as "of" in English, since these particles are pronounced as short vowels and are therefore not transcribed in written form. Furthermore, since the basic word order in Persian is Subject-Object-Verb, the lack of overt morphology for marking boundaries makes it very difficult to determine where the Subject ends and the Object begins. All of these factors, coupled with a relatively free word order and the optionality of the subject, combine to make the Persian Noun Phrase extremely ambiguous for an analysis of written text.

This report introduces the constituents forming a Noun Phrase in Persian, as well as a description of its structure. It also shows how the lexical and morphological information present in the written text could be used in determining the boundaries of the NP. To describe the NP rules in the Shiraz project, a unification-based syntactic grammar was used. This grammar, known as Bolero, operates on typed feature structures. Relative Clauses are discussed briefly in the last section.

Noun Phrase Structure

We distinguish between the simple and the complex Noun Phrase. A complex NP is the noun phrase structure that is formed from several simple NPs.

Constituents

The main constituents of a Noun Phrase in Persian are introduced below. These parts of speech will be used in later sections when discussing the structure of the NP. Note that there is no definite article in Persian, only an indefinite, which appears as an affix attached to the noun or adjective.

Noun: The Noun is the head of the Noun Phrase.
Adjective: Adjectives modify the noun. There is no agreement on adjectives and they can be repeated. Superlative adjectives behave differently from the other adjectivals: they occupy the position preceding the noun.
Adverb: Adverbs may appear in the Adjectival Phrase (AP) preceding the adjective; they cannot be used without the latter in a NP.
Pronoun: Pronouns include personal as well as quantifying pronouns (QPronouns) such as everyone and someone. Pronouns usually appear in the position of the possessor, and they can form a whole NP on their own. Note that personal pronouns can appear either as separate lexical elements or as morphemes on the noun or adjective.
Proper Noun: Proper Nouns usually form an NP on their own. They usually occupy the position of the possessor noun.
Determiner: The Determiner precedes the head noun. The determiners are in (this), An (that), har (each/any).
Numeral: These consist of cardinal or ordinal numbers. The indefinite article yek (a/an, one), which appears before the head noun, is treated as a Numeral. There are three different ordinal types in Persian and they all behave differently from the cardinal numbers, displaying properties similar to the superlative adjective.
Unit: The units are used in numeral constructions and usually follow the Numeral itself. These are lexical words such as hezAr (thousand) or milyon (million).
Classifier: Classifiers usually follow the numeral and classify the head noun by indicating its kind or type. A numeral can appear without a classifier but a classifier can never appear without a numeral.
Quantifier: Quantifiers include words meaning some (e.g., ba'zi), every (e.g., hame), and any/no (e.g., hich).
Title: Titles are forms of address preceding proper names.
Infinitival: The infinitival is used as a nominal within an NP. The infinitival, by virtue of being a verbal category, can appear with complements. The whole verbal structure is then nominalized and used in the NP.
Conjunction: Conjunction can relate parts of the AP or NP.
Article: There is no definite article in Persian, only an indefinite. This article appears only as an affix attached to the noun or adjective.

Simple Noun Phrase

Simple NPs can also be divided into three distinct groups. The structure of the noun phrase with a nominal head (referred to here as standard NP) differs from that of the infinitival constructions, hence we will describe each NP type separately. Furthermore, the elements that can act as the possessors, such as proper names and pronouns, are listed as separate noun phrases.

1. Standard NP

The head Noun is followed by the modifiers, which usually consist of an Adjectival Phrase (AP) construction. There could be several modifiers in a Noun Phrase. The elements preceding the head noun are the determiner, the numeral constructions and the quantifiers. Although adjectives always follow the noun, the superlative adjective can only appear before the head. Numerals, quantifiers and superlative adjectives are in complementary distribution; if one of these elements is present, the others cannot appear within the NP. Since complementary distribution usually indicates that the lexical elements occupy the same position, the numeral, quantifier and superlative constructions are all placed under the specifier category.

The relative ordering of the constituents of the simple NP is as follows:

NP = determiner specifier head modifier

where the head is a Noun and the parts of speech or phrases that can appear in each of the other categories are as shown below. Parentheses indicate optionality, and square brackets specify a subtype. Note that all the constituents, with the exception of the head noun, are optional.

determiner:

Determiner

specifier:

Numeral (Unit) (Classifier)
Numeral [Ordinal]
Adjective [Superlative]
Quantifier

modifier:

(Adverb) Adjective (Note: Modifiers may be recursive)

The example below represents a simple Noun Phrase where CL stands for Classifier and EZ for the ezafe morpheme.

in do tA ketAb-e kohne

this two CL book-EZ old
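
The ordering above can be read as a pattern over part-of-speech tags. The following Python sketch is an illustration only and is not part of the Shiraz grammar: the tag names, the one-letter codes and the function name is_simple_np are assumptions introduced here, and ordinal numerals are assumed to be tagged simply as Numeral.

```python
import re

# Hypothetical one-letter codes, one per part of speech, so that the
# template "determiner specifier head modifier" can be written as a regex.
TAG_CODES = {
    "Determiner": "D",
    "Numeral": "N",
    "Unit": "U",
    "Classifier": "C",
    "Quantifier": "Q",
    "Superlative": "S",
    "Noun": "H",
    "Adverb": "R",
    "Adjective": "A",
}

# determiner? specifier? head modifier*
#   specifier = Numeral (Unit) (Classifier) | Quantifier | Superlative
#   modifier  = (Adverb) Adjective, possibly repeated
SIMPLE_NP = re.compile(r"D?(?:NU?C?|Q|S)?H(?:R?A)*")


def is_simple_np(tags):
    """Return True if the tag sequence fits the simple standard NP template.

    Raises KeyError for a tag that is not listed in TAG_CODES.
    """
    encoded = "".join(TAG_CODES[tag] for tag in tags)
    return SIMPLE_NP.fullmatch(encoded) is not None


# Tags for "in do tA ketAb-e kohne" (this two CL book-EZ old).
example = ["Determiner", "Numeral", "Classifier", "Noun", "Adjective"]
print(is_simple_np(example))
```

Because the specifier alternatives sit in a single optional group, a sequence such as Numeral Quantifier Noun is rejected, which mirrors the complementary distribution of numerals, quantifiers and superlatives described above.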

2. Infinitival Noun Phrase

The head of the noun phrase can be an infinitival verb. These NPs are very similar to the gerundive constructions in English. Note that the infinitive head can be either a simple verb or a light verb, and it can appear in a predicate construction or with an adverbial.

The NP in (1) is an example of a predicative construction formed with the verb budan "to be". The predicate element zan "woman" is placed before the verb. The objects of the verb, however, become arguments of a genitive (or possessive) construction as shown in the example in (2). The infinitival verb koshtan "to kill" carries an ezafe morpheme linking it to the object of the verb, shir "lion". Note that a similar relation is present in the English translation for this example, in which the object of the gerundive (i.e., "lion") is linked to the verb by an "of" construction.

(1) zan budan-ash
woman be-her
`her being a woman'
(2) koshtan-e shir
kill-EZ lion
`the killing of a lion'

In our analysis, the NP structure in (1) is analyzed as an infinitival NP in the simple NP rules. The structure in (2), however, is treated as a complex NP, formed from the concatenation of two distinct simple NPs. Complex NPs are discussed in the following section.

The structure of the Infinitival Noun Phrase constituents can be described as either

NP = predicate head or NP = adjunct head

3. Possessor Noun Phrase

Possessive pronouns and proper names follow the head noun in Persian, as exemplified below:

ketAb-e man

book-EZ 1sg-pronoun 1

ketAb-e dAryush

book-EZ Dariush

Pronouns and proper names mark the boundary of the Noun Phrase (i.e., no other NP element can appear to the right of pronouns and proper names), hence they are often included as the last element in the Persian NP. Since these lexical elements can also be heads of their own NP (i.e., NPs can consist of simply a pronoun or a proper name), we will treat them as separate noun phrase constructions 2. The next section will explain how these phrases are used to form bigger NPs. The possessor NP can consist of any of the following lexical elements:

NP = personal pronoun
NP = quantifier pronoun
NP = (title) proper name

Complex Noun Phrase

The complex noun phrase is the equivalent of the genitive or possessive constructions in English, such as "Mao's red book", "her mother's hat" or "the syntax of noun phrases". In English, the link between the two nouns is marked by `s (e.g., Mao's) or the preposition of. In the case of pronouns, the latter appear in their genitive form (e.g., her). Other languages, such as Turkish or Armenian, use Case to indicate the link between noun phrases.

The element joining the Persian noun phrase constituents to each other is the ezafe suffix. The ezafe, however, is usually pronounced as the short vowel /e/ and is therefore not marked in written text. The result, in Persian written text, is a series of consecutive nouns without any overt links or boundaries as shown in the example (1) transcribed as it appears in Persian text (i.e., without short vowels). The actual pronunciation for this example is given in (2); the ezafe morpheme is represented by the -e following the first three nouns, linking each one to the following constituent. Note that the last constituent in the NP does not carry the ezafe suffix, thus marking the end boundary of the noun phrase.

(1) ktab dvst pdr daryvsh
book friend father Dariush
`Dariush's father's friend's book'

(2) ketAb-e dust-e pedar-e dAryush

In this example, each noun forms a simple NP, and these simple NPs then join together to form the complex NP given in (1). The lack of Case and agreement, as well as overt linking morphemes, coupled with a verb-final word order, can make the computational parsing of Persian NPs extremely ambiguous. In the next section, we will present possible boundary markers or joining elements in Persian that can help resolve some of the parsing ambiguities.

Noun Phrase Boundaries

Lexical Categories

The constituent ordering for the simple standard NP, discussed above, already points to some of the beginning boundaries of the Persian NP. Hence, the determiner, if present, is the first element of the noun phrase. If there is no determiner, the specifier is the first element. As already mentioned, the possessor elements constitute the boundaries of the complex noun phrase as well. In other words, if a simple nominal NP is followed by a possessor NP structure, the two NPs can join to form a bigger NP, but no other NP element can join to the right of this newly formed complex NP 3.

Consider the sentence in (1) below with its corresponding noun phrase boundaries as shown on the gloss in (2).

(1) bh gfth ayn xbrgzary vzyr xarJh Ayndh kshvr banv Albrayt ast
according to this news agency minister foreign future country lady Albright is
`According to this news agency the future foreign minister of the country is Lady Albright.'
(2) [according to this news agency]NP [ minister foreign future country]NP [ lady Albright]NP is

In this example, there are eight NP constituents between the preposition according to and the final verb. The NP boundary can, in principle, fall after any of the nouns in this sentence, which leads to a very high parsing ambiguity. Now compare the sentence below containing proper nouns:

bh gfth xbrgzary fransh vzyr xarJh Ayndh Amryka banv Albrayt ast
according to news agency France minister foreign future United States lady Albright is
`According to France's news agency the future foreign minister of the United States is Lady Albright.'
[according to news agency France]NP [ minister foreign future U.S.]NP [ lady Albright]NP is

In this case, the proper names can be used to detect the final boundaries of the noun phrases, so analyses joining Amryka (US) and banv (Lady), or linking fransh (France) and vzyr (minister), will not be formed.
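
The effect of proper names on the example above can be sketched in a few lines of Python. This is only an illustration of the idea, not the method used in the Shiraz grammar: the function name chunk_at_proper_nouns and the tag labels (including the tags assigned to each word) are assumptions made here, and the sketch ignores the rare case, noted in footnote 3, where a proper name is followed by further NP material.

```python
def chunk_at_proper_nouns(tagged_words):
    """Cut a tagged word sequence after every proper noun.

    tagged_words is a list of (word, tag) pairs. No NP is assumed to cross
    a cut, but further NP boundaries may still fall inside a chunk.
    """
    chunks = []
    current = []
    for word, tag in tagged_words:
        current.append(word)
        if tag == "ProperNoun":
            # A proper name closes the noun phrase it belongs to.
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks


# The NP material between the preposition and the final verb of the
# second example sentence, with hypothetical tags.
sentence = [
    ("xbrgzary", "Noun"), ("fransh", "ProperNoun"), ("vzyr", "Noun"),
    ("xarJh", "Adjective"), ("Ayndh", "Adjective"), ("Amryka", "ProperNoun"),
    ("banv", "Title"), ("Albrayt", "ProperNoun"),
]
for chunk in chunk_at_proper_nouns(sentence):
    print(" ".join(chunk))
```

For this sentence the three chunks coincide with the three bracketed NPs in the gloss, since each NP happens to end in a proper name.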

Indefinite Article / Enclitic Particle (IE)

Although the indefinite article and the enclitic particle have different syntactic functions, they have the same surface representation and cannot be differentiated in morphology. This morpheme can appear on a noun or on an adjective.

nkth~ay

point-IE

nkth~ay mhm

point-IE important

nkth mhmy

point important-IE

If there is more than one adjective, the IE will appear on the head adjective (the last one).

nkth Jalb v mhmy

point interesting and important-IE

In all of these instances, the presence of the IE marks the NP boundary, in the sense that no other NP element can follow the noun or the noun-AP combination in the examples above. This is exemplified below:

mrdmy Azadh rvabT aqtSady ayran ra mhm mydannd kh ...

people-IE noble relations economic Iran OBJ important know-3pl that...

Since the noun mrdm appears with the enclitic affix y, the simple NP consisting of mrdmy Azadh is not allowed to join to the NP to the right.

Personal Pronoun Clitics

Instead of appearing as a lexical pronoun, the possessive pronoun may be cliticized onto the rightmost constituent of the simple NP as shown in the two examples below. When the clitic is present, it marks the end boundary and the simple NP cannot join to the following nominal element to form a complex NP.

ktabsh

book-Clit

ktab khnh~ash

book old-Clit

The sentence below shows how the clitic is used to denote the end boundary of the Noun Phrase, thus not allowing it to join to the following element.

hmsayganman ktab khnh shma ra brdashtnd

neighbor-Plur-Clit(1pl) book old you(2pl) OBJ took-3pl

Ezafe Morpheme

The ezafe morpheme does not mark the end boundary of a Noun Phrase but rather the lack thereof, since the ezafe is used to join the head of a NP to the constituents following it. As already mentioned, the ezafe is rarely written in Persian text since it is a short vowel. When it appears after a vowel, however, it has the surface form y. In these cases, the ezafe can be used to indicate that the simple NP should be joined to the following nominal phrase.

In the sentence below, the adjective zyba (beautiful) appears with an overt ezafe morpheme, which indicates that the simple NP zn zybay (beautiful woman/wife + EZ) should be joined to the following NP (Dariush), thus forming the complex noun phrase zn zybay daryvsh. In other words, the NP boundary cannot be at this location.

zn zybay daryvsh vard shod

wife beautiful -EZ Dariush entered

In the second example below, on the other hand, the adjective zyba (beautiful) does not carry the ezafe suffix. Note that since this word ends in a vowel, if the ezafe were present, it would have appeared in its overt form y; hence we can conclude with certainty that the ezafe is absent. The absence of the ezafe indicates that a boundary should be set following the adjective, thus forming two separate noun phrases as shown.

zn zyba daryvsh ra shnaxt

woman beautiful Dariush OBJ recognized

Unfortunately, in most cases, the ezafe is an unwritten short vowel and cannot provide any information as to the boundaries of the NP. In our grammar, we treat such cases as having an Undefined ezafe affix. When the ezafe is Undefined, both analyses are considered, one with an NP boundary at that position and one in which the constituents form a single NP, which results in ambiguous parsing.
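
The three-way distinction just described (ezafe present, absent, or Undefined) can be sketched as a small Python function. This is a simplified illustration rather than the morphological analyzer used in the project: it assumes that the stem of the word is already known from a lexicon, that only the transliteration letters a and A count as vowel-final, and that a final y on such a stem is the ezafe and not some other suffix. The names ezafe_status and VOWEL_FINAL are hypothetical.

```python
# Assumption: only these transliteration letters mark a vowel-final stem.
VOWEL_FINAL = ("a", "A")


def ezafe_status(stem, surface):
    """Return "True", "False" or "Undefined" for the ezafe feature.

    stem is the lexicon form of the word and surface is the form found in
    the text. After a vowel the ezafe is written as y, so its presence or
    absence can be read off the surface form; after a consonant it is an
    unwritten short vowel and nothing can be concluded.
    """
    if stem.endswith(VOWEL_FINAL):
        return "True" if surface == stem + "y" else "False"
    return "Undefined"


print(ezafe_status("zyba", "zybay"))  # overt ezafe: join to the next NP
print(ezafe_status("zyba", "zyba"))   # no ezafe: boundary after the adjective
print(ezafe_status("ktab", "ktab"))   # consonant-final: cannot be determined
```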

Combination of Boundary-Marking Features

The following table presents the co-occurrence possibilities for the ezafe, indefinite/enclitic and the pronominal clitic morphemes. The combination of these features is used in certain rules in the syntactic parser.

	combination 1	combination 2	combination 3	combination 4	combination 5
ezafe	True	False	False	False	Undefined
clitic	False	True	False	False	False
indefinite/enclitic	False	False	True	False	False

Bolero Grammar

This section introduces a few sample NP rules from the Bolero syntactic grammar. These rules demonstrate how the information from the structure and the boundary-marking elements of the NP are incorporated within the grammar.

The presence of a boundary marker, such as the indefinite/enclitic morpheme, is recorded in a feature called boundary on the NP feature structures. When a boundary marker (e.g., clitic or IE) is encountered, the value of this feature is set to True. The True value indicates that the NP has reached a boundary and cannot join to the following constituent to form a bigger noun phrase. If no boundary-marking morpheme is found on the NP constituents, the value is set to False. In such cases, the NP is free to join to the next element. In certain cases, as when the presence or absence of an ezafe morpheme cannot be determined, the boundary is set to "Undefined", in which case the NP may or may not join to the constituent following it.
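
As a rough illustration of how the three morphological features could be mapped to a boundary value, consider the Python sketch below. It is not taken from the Bolero grammar. The names Tri and boundary_value are hypothetical, and the treatment of the case where the ezafe is known to be absent and no other marker is present (combination 4 in the table above) is an assumption based on the zn zyba example, where the missing ezafe sets a boundary.

```python
from enum import Enum


class Tri(Enum):
    """Three-valued feature: True, False or Undefined."""
    TRUE = "True"
    FALSE = "False"
    UNDEFINED = "Undefined"


def boundary_value(ezafe, clitic, indefinite_enclitic):
    """Map the morphological features of an NP to its boundary value.

    ezafe is a Tri value; clitic and indefinite_enclitic are booleans.
    """
    if clitic or indefinite_enclitic:
        # A pronominal clitic or an IE morpheme closes the NP.
        return Tri.TRUE
    if ezafe is Tri.TRUE:
        # An overt ezafe links the NP to what follows: no boundary.
        return Tri.FALSE
    if ezafe is Tri.UNDEFINED:
        # Unwritten ezafe: the NP may or may not join to the next element.
        return Tri.UNDEFINED
    # Assumption: a word known to lack the ezafe ends its NP.
    return Tri.TRUE


# The five feature combinations from the table, in order.
combinations = [
    (Tri.TRUE, False, False),
    (Tri.FALSE, True, False),
    (Tri.FALSE, False, True),
    (Tri.FALSE, False, False),
    (Tri.UNDEFINED, False, False),
]
for ezafe, clitic, ie in combinations:
    print(ezafe.value, clitic, ie, "->", boundary_value(ezafe, clitic, ie).value)
```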

Consider the rule NounBarIndefinite given below. This rule contains a left-hand side (lhs) and a right-hand side (rhs) as in rewrite rules.


NounBarIndefinite = Rule[
	lhs: NounBar[
	  head: #head,
	  boundary: True],
	rhs: <:
	  #head = Entry[form.morph: [
	    lex.pos: Noun,
	    infl.indefiniteEnclitic: True]]
	:>
];

The right-hand side of this rule is satisfied if an entry with a Noun POS is recognized, which also carries an indefinite/enclitic morpheme. As can be seen in the left-hand side of this rule, this nominal element is tagged as the head of the N' and the value of the boundary feature is set to True. The boundary value is transferred up when the higher NP level is formed as shown below in the NPo rule.

The NPo is the feature structure forming a standard simple NP. It contains all of the constituents that could constitute the standard noun phrase. Each constituent on the rhs is linked by a variable (marked by the pound sign #) to the elements in the feature structure in the lhs of the rule. As mentioned, the boundary value that was set for the NounBar (N') is also transferred up to the NPo structure.


// NPo --> Det? Spec? N' 			where N' --> Noun Adj?
NPo = Rule[
	lhs: NounPhraseZero[
   	  determiner: #det,
	  specifier: #spec,
	  head: #head,
	  modifier: #mod,
	  boundary: #bnd],
	rhs: <:
	  "optional"	#det= Entry[form.morph.lex.pos: Determiner]
	  "optional" Specifier[specType: #spec = Top] 
	  NounBar[
	     head: #head = Top,
	     modifier: #mod = Top,
	     boundary: #bnd = Top]
	:>
];

The complex noun phrase, which consists of two or more simple NPs, is formed using the recursive rule called complexNP. The right-hand side of this rule looks for an NPo structure followed by a Noun Phrase feature structure. This construction could be exemplified with the noun phrase zn zybay daryvsh (woman beautiful-EZ Dariush), in which the simple NP (or NPo) "woman beautiful-EZ" and the proper name NP "Dariush" join to form a bigger NP. What should be noted is that the right-hand side of this rule is satisfied only if the boundary value is set to False or to Undefined. Hence, if the boundary value is True, such as when an IE morpheme is encountered, the complex noun phrase will not be formed.


complexNP = Rule[
	lhs: NounPhrase[
	  head: #np1,
	  possessor: #np2],
	rhs: <:
	  #np1= NounPhraseZero[
	  	  boundary: FalseOrUndefined]
	  #np2= NounPhrase 
	:>
];
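
The constraint expressed by complexNP can be mimicked outside Bolero to see how the boundary feature prunes analyses. The Python sketch below is only an approximation written for illustration: it works on flat groupings of adjacent simple NPs rather than on feature structures, and the names JOINABLE and groupings, as well as the boundary values given to the sample NPs, are assumptions made here.

```python
from itertools import product

# Boundary values under which an NPo may join to the NP on its right.
JOINABLE = {"False", "Undefined"}


def groupings(simple_nps):
    """Yield every grouping of adjacent simple NPs allowed by the boundaries.

    simple_nps is a list of (text, boundary) pairs, where boundary is one of
    "True", "False" or "Undefined". A join across a gap is an option only if
    the NP on the left has a joinable boundary value.
    """
    if not simple_nps:
        return
    options = [
        (False, True) if boundary in JOINABLE else (False,)
        for _, boundary in simple_nps[:-1]
    ]
    for joins in product(*options):
        groups = [[simple_nps[0][0]]]
        for (text, _), join in zip(simple_nps[1:], joins):
            if join:
                groups[-1].append(text)
            else:
                groups.append([text])
        yield groups


# The IE on mrdmy blocks any join between the first NP and the second.
nps = [("mrdmy Azadh", "True"), ("rvabT aqtSady", "Undefined"), ("ayran", "True")]
for grouping in groupings(nps):
    print(grouping)
```

With three simple NPs and no constraints there would be four groupings; the True boundary on the first NP leaves two.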

Relative Clause

Relative Clauses are used to give further information about a nominal element, such as in the English sentence "The man, whom I met yesterday, has had an accident.", where "whom I met yesterday" represents a relative clause providing further information about "the man". In Persian, relative clauses are usually introduced by the relativizer kh [ke] (that), which is used regardless of the animacy, gender or function of the head noun. In nonrestrictive relative clauses, the head noun often carries the Enclitic morpheme which links it to the following relative clause. In these instances, the head noun is usually interpreted as a definite.

The relative clause construction is similar to English: The head noun is followed by the relativizer (kh in Persian), which is then followed by the clause that relates to the head noun, as shown below:

head noun [`kh' [ Clause] ] ...
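
For the simplest case, in which the relative clause directly follows its head noun, the pattern above amounts to splitting a token sequence at the relativizer. The Python sketch below illustrates only this adjacent case and makes several assumptions: the function name split_relative_clause is hypothetical, the input is a single transliterated NP that has already been tokenized, and every occurrence of kh is taken to be the relativizer, which will not hold in running text where kh also introduces other kinds of clauses.

```python
RELATIVIZER = "kh"


def split_relative_clause(tokens):
    """Split an NP at the first relativizer.

    Returns a (head, clause) pair of token lists, or None if there is no
    relativizer or nothing precedes it.
    """
    if RELATIVIZER not in tokens:
        return None
    position = tokens.index(RELATIVIZER)
    if position == 0:
        return None
    return tokens[:position], tokens[position + 1:]


head, clause = split_relative_clause("zn-y kh ktab myxvand".split())
print(head, clause)
```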

In certain cases, the relative clause can be separated from the head noun by the verb of the sentence. In addition, several relative clauses could follow a head noun. As mentioned above, the relativizer kh does not vary depending on animacy or function of the head noun; in other words, relative pronouns such as "who", "which", "whom" do not exist in Persian. It is also not possible to precede the relativizer by a preposition as in the English examples "to whom", "in which".

If the head noun is the subject or direct object of the relative clause, it is often left as a gap as shown in the examples (1) and (2) below, respectively. Note that the subject in the clause (tv "you") is optional, since Persian is a pro-drop (i.e., optional subject) language.

(1) zn-y kh ktab myxvand [zani ke ketAb mikhAnad]
woman-Encl that book read(pres/3sg)
`The woman who reads books'
(2) zn-y kh tv myshnasy [zani ke to mishenAsi]
woman-Encl that you know(pres/2sg)
`The woman that you know'

In certain instances, however, even if the head noun is the subject or direct object of the relative clause, it may be replaced by a pronoun in the clause it originated from. In the following example, the head noun plak kvchk (small plaque) is the subject of the relative clause; it is substituted by the resumptive pronoun An (it). The use of the resumptive pronoun usually occurs when the head noun is separated from the relative clause by an intervening verb. In this example, the verb py brdh and (have found) precedes the relative clause.

danshmndan bh plak kvchk-y dr mQz py brdh and kh An nyz taknvn nashnaxth mandh bvd.

scientists to plaque small-Encl in brain found that it also until now unknown had remained

When the head noun is the indirect object or is extracted from a Prepositional Phrase adjunct in the clause, a resumptive pronoun is used. In other words, the position from which the head noun originates is substituted by a pronoun that agrees with the head noun. This is exemplified in the three NP cases below:

(1) ayn bchh -ha kh az Anha Adrs myprsydy [in bachehA ke az AnhA Adres miporsidi]
this kid -Plur that from them address ask(imp/2sg)
`These kids from whom you asked for the address'
(2) shhr-y kh dr An tZahrat shdh bvd [shahri ke dar An tazAhorat shode bud]
city-Encl that in it demonstrations become(pluperf/3sg)
`The city in which demonstrations took place'
(3) zn-y kh bray-sh ktab xrydy [zani ke barAyash ketAb kharidi]
woman-Encl that for-Clitic(3sg) book buy(past/2sg)
`The woman for whom you bought a book' or
`The woman that you bought a book for'

In the example (1) above, the head noun ayn bchh-ha (these kids) is the indirect object of the clause; it is extracted from the PP complement of the verb "ask". As the example shows, the preposition az (from) is left behind in the relative clause, and the head noun is replaced by a pronoun Anha (they/them). A similar example is given in (2) with an inanimate head noun. In (3), the head noun zn (woman) is also extracted from the PP complement of the clausal verb. In this instance, however, the head noun is replaced by a clitic pronoun -sh (him/her), which appears attached on the preposition bray (for). The word for word gloss for this example is then woman that for-her book (you) bought.

If the head noun of the relative clause is the object of the main sentence, then it may appear with the object marker ra, as shown in the following sentence. Note that the head noun receives an object marker, even if it is the subject of the relative clause.

flsTynyan-y ra kh dr xyabanha tZahrat mykrdnd kshtnd

Palestinians-Encl OBJ that in streets demonstration do(imp/3pl) kill(past/3pl)

The examples below show a head noun separated from the relative clause by an intervening verb and an intervening adverb, respectively.

nxstyn kar-y ast kh dr ayn kshvr bh nfe fransvyha anJam shdh ast

first work-Encl is that in this country to benefit french-Plur perform(passive/perf/3sg)

bradr av nyz kh bshdt eSbany shdh bvd nqshh qtl Jan ra kshyd

brother his also that intensely angry had become plan murder John OBJ pulled

Conclusion

This report describes the structure of the Noun Phrase in Persian and explains how certain morphological and syntactic features could be helpful in determining boundaries of Noun Phrases in Persian. These constraints, when incorporated within the syntactic grammar, can reduce the number of parses produced during analysis. The way in which these boundary markers were incorporated within the Bolero grammar, used in specifying the syntactic structure of Persian, is also discussed. The final section covers the relative clause constructions in Persian.

References

Bateni, M. (1995). Towsif-e Sakhteman-e Dastury-e Zaban-e Farsi [Description of the Linguistic Structure of Persian Language]. Amir Kabir Publishers, Tehran, Iran.
Lazard, G. (1992). A Grammar of Contemporary Persian. Mazda Publishers.
Mahootian, Sh. (1997). Persian. Routledge, New York, NY.

Footnotes

1. Since there is no Case in Persian, the surface form of the pronoun is always the same whether it is used in a subject, object or possessive context.

2. There are no capital letters in Persian, hence proper names are not easily differentiated from nouns.

3. There are cases as in (i), where the proper name can be modified and even joined to the right by another possessor (here, a pronoun) but such instances seldom occur in written text.

(i) shirAz-e zibA-ye mA
Shiraz-EZ beautiful-EZ our
`our beautiful Shiraz'
