CAMEL POS Schema and Guidelines
1. Motivation and Goals
CAMEL POS is inspired by the ARZATB tagset and guidelines (Maamouri et al., 2012), which was based on the PATB guidelines (Maamouri et al., 2009). CAMEL POS is designed as a single tagset for both MSA and the dialects with the following goals in mind:
- Facilitating research on adaptation between MSA and the dialects, and among the dialects.
- Supporting backward compatibility with previously annotated resources.
- Enforcing a functional morphology analysis that is deeper and more compatible with Arabic morphosyntactic rules than form-based analysis (Alkuhlani and Habash, 2011).
2. Overview
The CAMEL POS tags and features are the union of those in MSA and the dialects. Features are used only where they apply: case and state, for example, are used far more often in MSA, whereas the dialects tend to have many more clitics than MSA, including clitics that do not exist in MSA.
One main property of the CAMEL POS tagset that sets it apart from ARZATB is that the gender and number features of nominals are annotated functionally (i.e. semantically) (Alkuhlani and Habash, 2011; Smrž, 2007). This decision allows us to assign the features to the baseword without needing to specify the affixes that mark gender and number on the surface form. This is not the case in ARZATB, where broken (irregular) plural nouns are tagged as singular because they do not use the sound plural affixes.
Another property is that in the CAMEL POS tagset we omit case and state features for nominals, and voice and mood for verbs when annotating dialectal Arabic as the dialects have lost them completely, except for some high frequency fossilized MSA forms, such as طبعاً /t. a b 3 a n/1 ‘of course’ which retains an indefinite ending.
The main part of the word, that is the baseword, is tagged in the format `POS.features`, where `POS` is the core POS tag and `features` is the feature combination that goes with the POS tag; a `.` separates the `POS` from the feature combination. Proclitics, however, get only a `POS` tag since they have no features. Pronominal enclitics, on the other hand, get the same tag format as the baseword (i.e. `PRON.features`).
3. Schema details
3.1 Core POS
The following tables list the `POS` tags used in this schema alongside the corresponding tags used in ARZATB. The tagset is divided into three categories according to the D3 tokenization scheme (Habash et al., 2010): proclitics (14 tags), enclitics (2 tags) and basewords (39 tags). Together with the features, the CAMEL POS tagset maps to ARZATB and retains backward compatibility. It also offers an intuitive Arabic scheme that is suitable for annotation.
Proclitics tags
CAMEL POS Arabic | CAMEL POS | ATB POS |
---|---|---|
أداة_تعريف | PART_DET | DET |
حرف_عطف | CONJ | CONJ |
حرف_جر | PREP | PREP |
أداة_نفي | PART_NEG | NEG_PART |
أداة_استقبال | PART_FUT | FUT_PART |
أداة_مضارعة | PART_PROG | PROG_PART |
أداة_ربط | CONJ_SUB | SUB_CONJ |
ضمير_إشارة | PRON_DEM | DEM_PRON |
ضمير_استفهام | PRON_INTERROG | INTERROG_PRON |
أداة | PART | PART |
حرف_ربط | PART_CONNECT | CONNEC_PART |
أداة_توكيد | PART_EMPHATIC | EMPHATIC_PART |
جواب_شرط | PART_RC | RC_PART |
أداة_نداء | PART_VOC | VOC_PART |
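The backward-compatible mapping in the proclitics table above can be held in a simple lookup. The following is a minimal sketch, not part of any official tool: the dictionary is copied from the table, while the names `CAMEL_TO_ATB_PROCLITIC` and `proclitic_to_atb` are hypothetical.

```python
# CAMEL POS -> ATB POS pairs, copied from the proclitics table above.
CAMEL_TO_ATB_PROCLITIC = {
    "PART_DET": "DET",
    "CONJ": "CONJ",
    "PREP": "PREP",
    "PART_NEG": "NEG_PART",
    "PART_FUT": "FUT_PART",
    "PART_PROG": "PROG_PART",
    "CONJ_SUB": "SUB_CONJ",
    "PRON_DEM": "DEM_PRON",
    "PRON_INTERROG": "INTERROG_PRON",
    "PART": "PART",
    "PART_CONNECT": "CONNEC_PART",
    "PART_EMPHATIC": "EMPHATIC_PART",
    "PART_RC": "RC_PART",
    "PART_VOC": "VOC_PART",
}


def proclitic_to_atb(camel_tag: str) -> str:
    """Return the ATB tag for a CAMEL POS proclitic tag (hypothetical helper)."""
    try:
        return CAMEL_TO_ATB_PROCLITIC[camel_tag]
    except KeyError:
        # Anything outside the table is rejected rather than guessed.
        raise ValueError(f"not a known proclitic tag: {camel_tag}") from None


print(proclitic_to_atb("PART_FUT"))
```

Baseword tags are left out of this sketch because some of them, such as `VERB`, map to more than one ATB tag.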
Enclitics tags
CAMEL POS Arabic | CAMEL POS | ATB POS |
---|---|---|
أداة_نفي | PART_NEG | NEG_PART |
ضمير | PRON | *SUFF_DO:[PGN] |
ضمير | PRON | POSS_PRON_[PGN] |
ضمير | PRON | PRON_[PGN] |
Baseword tags
CAMEL POS Arabic | CAMEL POS | ATB POS |
---|---|---|
اسم | NOUN | NOUN |
اسم_عدد | NOUN_NUM | NOUN_NUM |
اسم_علم | NOUN_PROP | NOUN_PROP |
اسم_كم | NOUN_QUANT | NOUN_QUANT |
صفة | ADJ | ADJ |
صفة_عدد | ADJ_NUM | ADJ_NUM |
صفة_مقارنة | ADJ_COMP | ADJ_COMP |
ظرف | ADV | ADV |
ظرف_استفهام | ADV_INTERROG | INTERROG_ADV |
ظرف_موصول | ADV_REL | REL_ADV |
فعل | VERB | IV/PV/CV |
شبه_فعل | VERB_PSEUDO | PSEUDO_VERB |
اسم_فعل | VERB_NOM | VERB |
ضمير | PRON | PRON_[PGN] |
ضمير_إشارة | PRON_DEM | DEM_PRON_[GN] |
ضمير_استفهام | PRON_INTERROG | INTERROG_PRON |
ضمير_تعجب | PRON_EXCLAM | EXCLAM_PRON |
ضمير_موصول | PRON_REL | REL_PRON |
أداة | PART | PART |
أداة_تعريف | PART_DET | DET |
أداة_نفي | PART_NEG | NEG_PART |
أداة_استقبال | PART_FUT | FUT_PART |
أداة_مضارعة | PART_PROG | PROG_PART |
أداة_فعل | PART_VERB | VERB_PART |
أداة_نداء | PART_VOC | VOC_PART |
أداة_استفهام | PART_INTERROG | INTERROG_PART |
أداة_استثناء | PART_RESTRICT | RESTRIC_PART |
أداة_تفصيل | PART_FOCUS | FOCUS_PART |
أداة_توكيد | PART_EMPHATIC | EMPHATIC_PART |
جواب_شرط | PART_RC | RC_PART |
أداة_ربط | CONJ_SUB | SUB_CONJ |
حرف_جر | PREP | PREP |
حرف_عطف | CONJ | CONJ |
أداة_ربط | PART_CONNECT | CONNEC_PART |
رقم | DIGIT | NOUN_NUM |
اختصار | ABBREV | ABBREV |
تعجب | INTERJ | INTERJ |
أجنبي | FOREIGN | FOREIGN |
علامة_ترقيم | PUNC | PUNC |
3.2 Features
CAMEL POS provides a full array of features:
- Aspect with the values Perfective, Imperfective and Command.
- Person with the values 1st, 2nd, 3rd.
- Gender with values Masculine and Feminine.
- Number with values Singular, Dual and Plural.
- State with values Definite, Indefinite and Construct.
- Case with values Nominative, Genitive and Accusative.
- Voice with values Active and Passive.
- Mood with values Subjunctive, Indicative and Jussive.
Not all the features mentioned are necessarily relevant to the dialects. In the full POS tag, the specified values of the different features appear in the following order:
<POS>.<A><P><G><N>.<S><C><V><M>
For a subset of the `POS` tags in the baseword category, each tag is paired with a limited number of possible feature combinations. Below is the list of `POS` tags that take features, with their possible ordered combinations.
Note that the combinations below are for the dialects. In MSA, nominals additionally take Case and State, and verbs additionally take Voice and Mood.
- `NOUN`, `NOUN_*`, `ADJ`, `ADJ_*`: All nominals take the combination of Gender and Number. For example, جالس /y aa l i s/ ‘sitting’ is tagged `ADJ.MS`. In the occasional uses of State, such as طبعاً /t. a b 3 a n/ ‘of course’, the tag would be `NOUN.MS.I`.
- `VERB`: All verbs take the combination of Aspect, Person, Gender and Number. For example, يقطع /y i g t. a 3/ ‘cut’ is tagged as `VERB.I3MS`.
- `PRON`: All pronouns take the combination of Person, Gender and Number. For example, انتي /2 i n t y/ ‘you [fs]’ is tagged as `PRON.2FS`.
- `PRON_DEM`: All demonstrative pronouns take the combination of Gender and Number. For example, هاذا /h aa dh a/ ‘this’ is tagged as `PRON_DEM.MS`.
In cases where a feature is not present, such as gender in first-person verb inflections, the gender feature is simply dropped and does not require a placeholder, since the possible feature values are ordered and unique. For example, the imperfective 1st person verb أقول /2 a g uu l/ ‘I say’ is tagged as `VERB.I1S`.
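The tag layout described above can be illustrated with a small parser. This is a minimal sketch under stated assumptions, not an official implementation: the names `CamelTag` and `parse_tag` are hypothetical, the third dot-separated group (state, case, voice, mood) is kept as an unparsed string, and the letter `U` is accepted as a gender value only because it appears in example tags such as `PRON_DEM.UP`. Because the letter `P` can stand for either Perfective or Plural, the sketch reads an aspect letter only for `VERB` tags.

```python
from dataclasses import dataclass
from typing import Optional

# Feature letters, following the feature list in section 3.2.
ASPECT = "PIC"  # Perfective, Imperfective, Command
PERSON = "123"
GENDER = "MFU"  # assumption: U (seen in PRON_DEM.UP) means unspecified
NUMBER = "SDP"  # Singular, Dual, Plural


@dataclass
class CamelTag:
    pos: str
    aspect: Optional[str] = None
    person: Optional[str] = None
    gender: Optional[str] = None
    number: Optional[str] = None
    scvm: Optional[str] = None  # state/case/voice/mood group, left unparsed


def parse_tag(tag: str) -> CamelTag:
    """Split a tag of the form POS.APGN.SCVM into its parts."""
    parts = tag.split(".")
    if len(parts) > 3:
        raise ValueError(f"too many dot-separated groups in {tag!r}")
    parsed = CamelTag(pos=parts[0])
    if len(parts) >= 2:
        rest = parts[1]
        slots = [("person", PERSON), ("gender", GENDER), ("number", NUMBER)]
        if parsed.pos == "VERB":
            # Only verbs carry aspect; this also keeps P unambiguous.
            slots.insert(0, ("aspect", ASPECT))
        # Values are ordered, so a missing feature is skipped without a placeholder.
        for name, letters in slots:
            if rest and rest[0] in letters:
                setattr(parsed, name, rest[0])
                rest = rest[1:]
        if rest:
            raise ValueError(f"unexpected feature letters {rest!r} in {tag!r}")
    if len(parts) == 3:
        parsed.scvm = parts[2]
    return parsed


print(parse_tag("VERB.I1S"))
```

For `VERB.I1S` the gender slot stays empty, which mirrors the rule that a missing feature is dropped rather than marked with a placeholder.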
4. Detailed annotation guidelines
Morphological annotation using CAMEL POS assumes that each word is tokenized using the D3 tokenization scheme. Therefore, each token gets at least a `POS` tag according to the tables above.
In a large-scale full morphological annotation task, additional annotations are usually provided, such as lemmatization, English gloss, and dialect identification.
In this section we provide detailed guidelines in the context of a comprehensive annotation task.
4.1 Tokenization
The tokenization scheme recommended when annotating using CAMEL POS is D3.
- D3 Tokenization
- D3 tokenizes all clitics: question particle, conjunctions, particles, prepositions, articles, and pronominal enclitics.
Notes:
- In the tokenization task, all tokens must be orthographically normalized; that is, all morphophonemic and orthographic rewrite rules must be undone. For example, the word مكتبتها should be tokenized as مكتبة +ها, NOT مكتبت +ها.
- Remember that clitics are optional to word formation and they include particles and pronouns.
Tokenization Phenomenon | Word Form | Tokenization | English Gloss | Dialect |
---|---|---|---|---|
Definite Article | للمكتب | ل+ ال+ مكتب | for the office | MSA,GLF,EGY |
Ta Marbuta | مكتبتنا | مكتبة +نا | our library | MSA,GLF,EGY |
Ta Marbuta | كاتباها | كاتبة +ها | she wrote it (deverbal) | GLF,EGY |
Alif Maqsura | حكاها | حكى +ها | he recounted it | MSA,GLF,EGY |
Hamza Form | بهاؤه،بهائه،بهاءه | بهاء +ه | his glory | MSA,GLF,EGY |
Waw-of-Plurality | قالوها | قالوا +ها | They said it | MSA,GLF,EGY |
Various clitics | وستجننني | و+ س+ تجنن +ني | and she will drive me crazy | MSA |
Various clitics | وهتجننني | و+ ه+ تجنن +ني | and she will drive me crazy | EGY |
Various clitics | وبتجننني | و+ ب+ تجنن +ني | and she will drive me crazy | GLF |
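In the Tokenization column above, proclitics carry a trailing `+` and enclitics a leading `+`. The sketch below labels the tokens of one D3-tokenized word on that basis; it assumes the `+` convention shown in the table, and the function name `classify_d3_tokens` is hypothetical.

```python
from typing import List, Tuple


def classify_d3_tokens(tokenized: str) -> List[Tuple[str, str]]:
    """Label each token of one D3-tokenized word by its '+' marker."""
    labeled = []
    for token in tokenized.split():
        if token.endswith("+"):
            # Trailing '+' marks a proclitic, e.g. the conjunction in the table.
            labeled.append((token[:-1], "proclitic"))
        elif token.startswith("+"):
            # Leading '+' marks an enclitic, e.g. a pronominal suffix.
            labeled.append((token[1:], "enclitic"))
        else:
            labeled.append((token, "baseword"))
    basewords = [t for t, kind in labeled if kind == "baseword"]
    if len(basewords) != 1:
        raise ValueError("expected exactly one baseword per word")
    return labeled


for token, kind in classify_d3_tokens("و+ س+ تجنن +ني"):
    print(kind, token)
```

Each proclitic then receives a bare `POS` tag, while the baseword and any pronominal enclitic receive `POS.features` tags.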
4.1.1 Clitics
Clitics are syntactically independent morphemes that are orthographically attached to the baseword. They can belong to a number of parts of speech.
Notes:
- Clitics may interact with the spelling of the baseword. See the notes above on Tokenization and the CODA general rules.
- Although writers, mostly in dialectal Arabic, tend to attach what is considered an indirect object clitic to the baseword (a verb or an active-participle adjective), the CODA convention requires it to be written separately. For example, اجيبلك should be اجيب لك, and جايبلها should be جايب لها. For the list of clitics, please refer to the CODA seed lexicon.
4.2 CAMEL POS tagset
4.2.1 Features
Features refer to specific morphosyntactic aspects of the word that are abstracted away in the lemma form. For example, the word أميرات 'princesses' has the lemma أمير 'prince' with the features gender: feminine and number: plural.
Notes
- Features may not necessarily match the form of the word: e.g. حامل 'pregnant' is gender: feminine even though it has no 'Ta marbuta'2 ending; and خليفة 'Khalifa (name); caliph' is gender: masculine and number: singular even though it ends with 'Ta marbuta'.
- Some words are plural in meaning but morphosyntactically singular (collective nouns). For example, شجر 'trees' is singular because we say شجر طويل 'tall trees', just as we say رجل طويل 'a tall man'.
- The assignment of the features is in context (sentence and document) and depends on the morphosyntactic agreement at all times.
- For specific examples and cases, refer to the notes section of the different parts-of-speech.
- Features include gender, number, person, and aspect. Each feature has a number of possible values, see the full description of the features.
- Features are represented in combinations in our system. The examples in the section are some feature-value pairs.
Features | الخصائص | قسم الكلام | POS | Description الوصف |
---|---|---|---|---|
.P3MS | ماضي.هو | فعل | VERB | Aspect:(P);Person:(3);Gender:(M);Number:(S) الزمن:ماضي؛الضمير:مفرد مذكر غائب |
.P1P | ماضي.نحن | فعل | VERB | Aspect:(P);Person:(1);Gender:unspecified;Number:(P) الزمن:ماضي؛الضمير:جمع متكلم |
.MS | هو | اسم | NOUN | Gender:(M);Number:(S) الجنس:مذكر؛العدد:مفرد |
4.2.2 NOUN - اسم
- Common Nouns
- Common nouns refer to entities and concepts that have a more general reference than proper nouns. Common nouns inflect, through prefixes and suffixes, for gender and number.
Notes:
- Some nouns, such as prepositional nouns (عند، بين، أمام ... etc.), don't necessarily have clear features. To assign features in those cases, apply a syntactic test in a nonsensical semantic context3. For example, the word أمام can be placed in a nonsensical construction such as الأمام الأول والأمام الثاني للبيت. The morphosyntactic agreement in this construction gives أمام the features gender: M and number: S.
- Some nouns are gender-ambiguous: طريق, for example, can be masculine or feminine depending on the usage (طريق طويل and طريق طويلة). To assign the gender feature, resolve the ambiguity using the context; if that is impossible, assign the form-based gender.
- Common nouns include derived nouns such as دباديب and non-derived nouns such as ام.
- Titles are also annotated as common nouns.
- Common nouns also include a set of borrowed nouns.
- In the context of dialectal text annotation, only nouns that appear to have a case ending such as غصبٍ will have state and case feature annotated. The 'case' feature in this situation is not the real case but rather a remnant from the MSA.
Examples
Below are examples of inflection tables based on basic paradigms, which means that the inflections are for the complete set of possible morphological features except for clitic features.
Lemmas with full paradigms
MSA: مَلِك 'King'
Word Form | Tag | العلامة |
---|---|---|
مَلِك | NOUN.MS.IU | اسم.نكرة.هو |
مَلِكٌ | NOUN.MS.IN | اسم.نكرة.هو.مرفوع |
مَلِكٍ | NOUN.MS.IG | اسم.نكرة.هو.مجرور |
مَلِكاً | NOUN.MS.IA | اسم.نكرة.هو.منصوب |
مَلِكُ | NOUN.MS.DN | اسم.معرفة.هو.مرفوع |
مَلِكُ | NOUN.MS.CN | اسم.مضاف.هو.مرفوع |
مَلِكِ | NOUN.MS.CG | اسم.مضاف.هو.مجرور |
مَلِكَ | NOUN.MS.CA | اسم.مضاف.هو.منصوب |
مَلِكَة4 | NOUN.FS.IU | اسم.نكرة.هي |
مَلِكَةٌ | NOUN.FS.IN | اسم.نكرة.هي.مرفوع |
مَلِكَةٍ | NOUN.FS.IG | اسم.نكرة.هي.مجرور |
مَلِكَةً | NOUN.FS.IA | اسم.نكرة.هي.منصوب |
مَلِكَةُ | NOUN.FS.DN | اسم.معرفة.هي.مرفوع |
مَلِكَةُ | NOUN.FS.CN | اسم.مضاف.هي.مرفوع |
مَلِكَةِ | NOUN.FS.CG | اسم.مضاف.هي.مجرور |
مَلِكَةَ | NOUN.FS.CA | اسم.مضاف.هي.منصوب |
مَلِكانِ | NOUN.MD.IN | اسم.نكرة.هما♂.مرفوع |
مَلِكَيْنِ | NOUN.MD.IG | اسم.نكرة.هما♂.مجرور |
مَلِكَيْنِ | NOUN.MD.IA | اسم.نكرة.هما♂.منصوب |
مَلِكانِ | NOUN.MD.DN | اسم.معرفة.هما♂.مرفوع |
مَلِكا | NOUN.MD.CN | اسم.مضاف.هما♂.مرفوع |
مَلِكَيْ | NOUN.MD.CG | اسم.مضاف.هما♂.مجرور |
مَلِكَيْ | NOUN.MD.CA | اسم.مضاف.هما♂.منصوب |
مَلِكَتانِ | NOUN.FD.IN | اسم.نكرة.هما♀.مرفوع |
مَلِكَتَيْنِ | NOUN.FD.IG | اسم.نكرة.هما♀.مجرور |
مَلِكَتَيْنِ | NOUN.FD.IA | اسم.نكرة.هما♀.منصوب |
مَلِكَتانِ | NOUN.FD.DN | اسم.معرفة.هما♀.مرفوع |
مَلِكَتا | NOUN.FD.CN | اسم.مضاف.هما♀.مرفوع |
مَلِكَتَيْ | NOUN.FD.CG | اسم.مضاف.هما♀.مجرور |
مَلِكَتَيْ | NOUN.FD.CA | اسم.مضاف.هما♀.منصوب |
مُلُوك | NOUN.MP.IU | اسم.نكرة.هم♂ |
مُلُوكٌ | NOUN.MP.IN | اسم.نكرة.هم♂.مرفوع |
مُلُوكٍ | NOUN.MP.IG | اسم.نكرة.هم♂.مجرور |
مُلُوكاً | NOUN.MP.IA | اسم.نكرة.هم♂.منصوب |
مُلُوكُ | NOUN.MP.DN | اسم.معرفة.هم♂.مرفوع |
مُلُوكُ | NOUN.MP.CN | اسم.مضاف.هم♂.مرفوع |
مُلُوكِ | NOUN.MP.CG | اسم.مضاف.هم♂.مجرور |
مُلُوكَ | NOUN.MP.CA | اسم.مضاف.هم♂.منصوب |
أَمْلاك | NOUN.MP.IU | اسم.نكرة.هم♂ |
أَمْلاكٌ | NOUN.MP.IN | اسم.نكرة.هم♂.مرفوع |
أَمْلاكٍ | NOUN.MP.IG | اسم.نكرة.هم♂.مجرور |
أَمْلاكاً | NOUN.MP.IA | اسم.نكرة.هم♂.منصوب |
أَمْلاكُ | NOUN.MP.DN | اسم.معرفة.هم♂.مرفوع |
أَمْلاكُ | NOUN.MP.CN | اسم.مضاف.هم♂.مرفوع |
أَمْلاكِ | NOUN.MP.CG | اسم.مضاف.هم♂.مجرور |
أَمْلاكَ | NOUN.MP.CA | اسم.مضاف.هم♂.منصوب |
مَلِكات | NOUN.FP.IU | اسم.نكرة.هن |
مَلِكاتٌ | NOUN.FP.IN | اسم.نكرة.هن.مرفوع |
مَلِكاتٍ | NOUN.FP.IG | اسم.نكرة.هن.مجرور |
مَلِكاتٍ | NOUN.FP.IA | اسم.نكرة.هن.منصوب |
مَلِكاتُ | NOUN.FP.DN | اسم.معرفة.هن.مرفوع |
مَلِكاتُ | NOUN.FP.CN | اسم.مضاف.هن.مرفوع |
مَلِكاتِ | NOUN.FP.CG | اسم.مضاف.هن.مجرور |
مَلِكاتِ | NOUN.FP.CA | اسم.مضاف.هن.منصوب |
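The Arabic tag in the third column of the table above appears to follow a regular pattern: the POS name, then the state, then a pronoun standing for gender and number, then the case. The sketch below reproduces that pattern for `NOUN` tags only. The lookup tables are read off the rows above; the function name `arabic_noun_tag` is hypothetical, and treating the case letter `U` as "no case shown" is an assumption based on the `IU` rows.

```python
# Lookup tables read off the NOUN paradigm table above.
GENDER_NUMBER_AR = {
    "MS": "هو", "FS": "هي",
    "MD": "هما♂", "FD": "هما♀",
    "MP": "هم♂", "FP": "هن",
}
STATE_AR = {"I": "نكرة", "D": "معرفة", "C": "مضاف"}
CASE_AR = {"N": "مرفوع", "G": "مجرور", "A": "منصوب"}


def arabic_noun_tag(tag: str) -> str:
    """Build the Arabic form of a NOUN tag such as NOUN.MS.IN or NOUN.MS."""
    parts = tag.split(".")
    if parts[0] != "NOUN" or len(parts) not in (2, 3):
        raise ValueError(f"this sketch only handles NOUN tags, got {tag!r}")
    pieces = ["اسم"]
    case = None
    if len(parts) == 3:
        state_case = parts[2]
        pieces.append(STATE_AR[state_case[0]])
        case = state_case[1] if len(state_case) > 1 else None
    pieces.append(GENDER_NUMBER_AR[parts[1]])
    # Assumption: 'U' means the case is unspecified, so nothing is appended.
    if case in CASE_AR:
        pieces.append(CASE_AR[case])
    return ".".join(pieces)


for example in ("NOUN.MS.IU", "NOUN.FD.CG", "NOUN.MP"):
    print(example, arabic_noun_tag(example))
```

Dialectal tags without a state and case group, such as `NOUN.MP`, reduce to the POS name plus the gender and number pronoun.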
GLF: شَيخ 'Sheikh;chieftain'
Word Form | Tag | العلامة |
---|---|---|
شَيخ | NOUN.MS | اسم.هو |
شَيخَة | NOUN.FS | اسم.هي |
شَيخَين | NOUN.MD | اسم.هما♂ |
شَيختَين | NOUN.FD | اسم.هما♀ |
شْيُوخ | NOUN.MP | اسم.هم♂ |
شَيخَات | NOUN.FP | اسم.هن |
EGY: أُستَاذ 'professor;teacher'
Word Form | Tag | العلامة |
---|---|---|
أُستَاذ | NOUN.MS | اسم.هو |
أُستَاذَة | NOUN.FS | اسم.هي |
أُستَاذَين | NOUN.MD | اسم.هما♂ |
أُستَاذتَين | NOUN.FD | اسم.هما♀ |
أَساتذَة | NOUN.MP | اسم.هم♂ |
أُستَاذَات | NOUN.FP | اسم.هن |
Lemmas with partial paradigms
MSA: حُبّ 'affection;love'
Word Form | Tag | العلامة |
---|---|---|
حُبّ | NOUN.MS.IU | اسم.نكرة.هو |
حُبٌّ | NOUN.MS.IN | اسم.نكرة.هو.مرفوع |
حُبٍّ | NOUN.MS.IG | اسم.نكرة.هو.مجرور |
حُبّاً | NOUN.MS.IA | اسم.نكرة.هو.منصوب |
حُبُّ | NOUN.MS.DN | اسم.معرفة.هو.مرفوع |
حُبُّ | NOUN.MS.CN | اسم.مضاف.هو.مرفوع |
حُبِّ | NOUN.MS.CG | اسم.مضاف.هو.مجرور |
حُبَّ | NOUN.MS.CA | اسم.مضاف.هو.منصوب |
GLF: سَيَّارَة 'car'
Word Form | Tag | العلامة |
---|---|---|
سَيَّارَة | NOUN.FS | اسم.هي |
سَيَّارتَين | NOUN.FD | اسم.هما♀ |
سَيَّارَات | NOUN.FP | اسم.هن |
سْيَايِير | NOUN.FP | اسم.هن |
EGY: وِشّ 'face'
Word Form | Tag | العلامة |
---|---|---|
وِشّ | NOUN.MS | اسم.هو |
وِشَّين | NOUN.MD | اسم.هما♂ |
وُشُوش | NOUN.MP | اسم.هم♂ |
Examples of various words that are common in different Arabic varieties
Word Form | Tag | العلامة | English Gloss | Dialect |
---|---|---|---|---|
حرمات | NOUN.FP | اسم.هن | women | GLF |
حريم | NOUN.FP | اسم.هن | women | GLF |
نسوان | NOUN.FP | اسم.هن | women | GLF,EGY |
خالوه | NOUN.FS | اسم.هي | aunt! (maternal) | GLF |
عموه | NOUN.FS | اسم.هي | aunt! (paternal) | GLF |
حجرة | NOUN.FS | اسم.هي | room | GLF |
ميز | NOUN.FS | اسم.هي | table | GLF |
بكرة | NOUN.FS | اسم.هي | tomorrow | GLF,EGY |
شيشة | NOUN.FS | اسم.هي | waterpipe | GLF,EGY |
حرمة | NOUN.FS | اسم.هي | woman | GLF |
مرة | NOUN.FS | اسم.هي | woman | GLF |
بزران | NOUN.MP | اسم.هم | children | GLF |
عيال | NOUN.MP | اسم.هم | children | GLF,EGY |
رجّال | NOUN.MP | اسم.هم | men | GLF |
رجاجيل | NOUN.MP | اسم.هم | men | GLF |
حق | NOUN.MS | اسم.هو | for the benefit of | GLF |
خلاص | NOUN.MS | اسم.هو | enough | GLF |
سكين | NOUN.MS | اسم.هو | knife | GLF |
سكين | NOUN.FS | اسم.هي | knife | GLF |
مكتوب | NOUN.MS | اسم.هو | letter | GLF,EGY |
مثل | NOUN.MS | اسم.هو | like | GLF |
حلق | NOUN.MS | اسم.هو | mouth | GLF |
خشم | NOUN.MS | اسم.هو | nose | GLF |
مال | NOUN.MS | اسم.هو | of | GLF |
حقّ | NOUN.MS | اسم.هو | of, belongs to | GLF |
برع | NOUN.MS | اسم.هو | outside, outside of | GLF |
حد | NOUN.MS | اسم.هو | somebody, someone | GLF,EGY |
باكر | NOUN.MS | اسم.هو | tomorrow | GLF |
امس | NOUN.MS | اسم.هو | yesterday | GLF |
كذي | NOUN.MS | اسم.هو | like this, as this | GLF |
4.2.3 NOUN_PROP - اسم_علم
- Proper Nouns
- Proper nouns are nouns that have a unique referential meaning in context that is mutually exclusive with other entities.
Notes:
- Proper nouns refer to names of people, geographical entities, months, and acronyms.
- Proper nouns with more than one part such as محمد علي should have both words annotated as proper nouns.
- Names such as عبد الله and علاء الدين should be split, both words annotated as proper nouns.
- Titles of newspapers, magazines, and news agencies, sports teams are annotated as proper nouns, as well as names of political parties, etc.
- Proper nouns might exhibit a different kind of ambiguity, where the word as a `NOUN` has features that fail the morphosyntactic agreement when it is used as a proper noun. For example, the proper noun احلام as a female given name behaves as an `FS`, hence it is given the tag `NOUN_PROP.FS`. The same applies to other proper nouns such as the newspaper name الاهرام; see the examples below.
- Proper nouns can be confused with common nouns. A case in point is جَنُوب إِفرِيقيا: both parts are considered proper nouns when it refers to the country, South Africa.
- The lemma of a proper noun does not include Al but it includes the 'Ta Marbuta'. The proper noun القاهرة has the lemma قاهرة.
- Just like common nouns, proper nouns can inflect for a subset of morphological features, usually number, in certain cases.
Examples
Below are examples of different proper nouns and their respective tags. Those examples are valid in most Arabic varieties. Note that in MSA, the tag will include the state and case feature depending on the context surrounding the word.
Word Form | POS.Features | قسم الكلام.الخصائص | English Gloss | Comments / Examples |
---|---|---|---|---|
خليفة | NOUN_PROP.MS | اسم علم.هو | Khalifa | name of a person |
هند | NOUN_PROP.FS | اسم علم.هي | Hind | name of a person |
عبد الله | NOUN_PROP.MS | اسم علم.هو | Abdullah | each word gets the same POS tag |
امريكا | NOUN_PROP.FS | اسم علم.هي | America | geographical entity |
ناتو | NOUN_PROP.MS | اسم علم.هو | NATO | acronym |
امشير | NOUN_PROP.MS | اسم علم.هو | Meshir | month (coptic calendar) |
الاهرام | NOUN_PROP.FS | اسم علم.هي | Al Ahram | newspaper نشرت الاهرام التقرير النهائي |
الاهرام | NOUN_PROP.MS | اسم علم.هو | Al Ahram | newspaper تلقى الاهرام اتصالا هاتفيا |
الاخوان | NOUN_PROP.FS | اسم علم.هي | The Muslim Brotherhood | political party تلقت الإخوان تمويلاً |
4.2.4 NOUN_QUANT - اسم_كم
- Noun quantifiers
- Noun quantifiers express either quantity or approximation.
Examples
Below are examples of common noun quantifiers from different Arabic varieties. Note that they can inflect for other morphological features in the same way as nouns.
MSA examples
The examples below are in their indefinite nominative form.
Word Form | POS.Features | قسم الكلام.الخصائص | English Gloss |
---|---|---|---|
عُشُرٌ | NOUN_QUANT.MS.IN | نكرة.هو.مرفوع | (one)_tenth |
تُسُعٌ | NOUN_QUANT.MS.IN | نكرة.هو.مرفوع | (one)_ninth |
ثُمُنٌ | NOUN_QUANT.MS.IN | نكرة.هو.مرفوع | (one)_eighth |
سُبُعٌ | NOUN_QUANT.MS.IN | نكرة.هو.مرفوع | (one)_seventh |
سُدُسٌ | NOUN_QUANT.MS.IN | نكرة.هو.مرفوع | (one)_sixth |
خُمُسٌ | NOUN_QUANT.MS.IN | نكرة.هو.مرفوع | (one)_fifth |
رُبُعٌ | NOUN_QUANT.MS.IN | نكرة.هو.مرفوع | quarter,(one)_fourth |
ثُلُثٌ | NOUN_QUANT.MS.IN | نكرة.هو.مرفوع | (one)_third |
نِصْفٌ | NOUN_QUANT.MS.IN | نكرة.هو.مرفوع | half |
شَطْرٌ | NOUN_QUANT.MS.IN | نكرة.هو.مرفوع | part,portion,division,section |
بِضْعَةٌ | NOUN_QUANT.FS.IN | نكرة.هي.مرفوع | some,a_few,several |
بِضْعٌ | NOUN_QUANT.MS.IN | نكرة.هو.مرفوع | some,a_few,several |
بَعْضٌ | NOUN_QUANT.MS.IN | نكرة.هو.مرفوع | some,a_few,little,part |
مُعْظَمٌ | NOUN_QUANT.MS.IN | نكرة.هو.مرفوع | most,majority |
أَغْلَبُ | NOUN_QUANT.MS.IN | نكرة.هو.مرفوع | most_(of),the_majority_(of),the_greater_portion_of_part_(of) |
غالِبِيَّةٌ | NOUN_QUANT.FS.IN | نكرة.هي.مرفوع | majority,greater_part_of_portion |
جُلٌّ | NOUN_QUANT.MS.IN | نكرة.هو.مرفوع | most_(of),the_majority_(of),bulk,major_part |
حَوالَي | NOUN_QUANT.MS.IU | نكرة.هو | about,approximately,around,roughly,nearly,almost |
زُهاءَ | NOUN_QUANT.MS.IU | نكرة.هو | about,approximately,around,roughly,nearly,almost |
قُرابَةٌ | NOUN_QUANT.FS.IN | نكرة.هي.مرفوع | approximately,almost,just,about,nearly,roughly |
قَيْدٌ | NOUN_QUANT.MS.IN | نكرة.هو.مرفوع | around |
قَيْسٌ | NOUN_QUANT.MS.IN | نكرة.هو.مرفوع | measure,gauge,quantify |
أَكْثَرُ | NOUN_QUANT.MS.IN | نكرة.هو.مرفوع | more,most,majority |
جَمِيعٌ | NOUN_QUANT.MS.IN | نكرة.هو.مرفوع | all,entire,every,whole |
كُلٌّ | NOUN_QUANT.MS.IN | نكرة.هو.مرفوع | all,entire,every,whole |
كامِلٌ | NOUN_QUANT.MS.IN | نكرة.هو.مرفوع | entire |
كافَّةٌ | NOUN_QUANT.FS.IN | نكرة.هي.مرفوع | all |
ضِعْفٌ | NOUN_QUANT.MS.IN | نكرة.هو.مرفوع | double;multiple |
كِلا | NOUN_QUANT.MD.IU | نكرة.هما♂ | both_of |
كِلْتا | NOUN_QUANT.FD.IU | نكرة.هما♀ | both_of |
GLF examples
The examples are in their lemma form. They are extracted from the Annotated Gumar Corpus.
Word Form | POS.Features | قسم الكلام.الخصائص | English Gloss |
---|---|---|---|
شُوَيّ | NOUN_QUANT.MS | اسم_كم.هو | few;a_little_bit |
اَيّ | NOUN_QUANT.MS | اسم_كم.هو | any |
حَبَّة | NOUN_QUANT.FS | اسم_كم.هي | some |
بَعض | NOUN_QUANT.MS | اسم_كم.هو | some;part |
كَم | NOUN_QUANT.MS | اسم_كم.هو | some;several |
كِذا | NOUN_QUANT.MS | اسم_كم.هو | many |
كِلّ | NOUN_QUANT.MS | اسم_كم.هو | all;whole;entire;every |
نِصف | NOUN_QUANT.MS | اسم_كم.هو | half |
نُصّ | NOUN_QUANT.MS | اسم_كم.هو | half |
رُبع | NOUN_QUANT.MS | اسم_كم.هو | quarter |
Examples for other dialects will be added soon.
4.2.5 NOUN_NUM - اسم_عدد
- Cardinal numbers
- Cardinal numbers quantify rather than rank. They answer the question “How many?”
Notes:
- Compound numerals (from 21 upward) that are coordinated with the conjunction wa- follow the POS tag of the first element. If the first element is a cardinal number, the second element should be annotated as one too.
- Cardinal numbers can occur in pre-nominal and post-nominal positions without any agreement with the noun they occur with. They are invariable.
- MSA has different agreement rules from the dialects.
Examples
Below is the full list of the basic numbers: units, tens, and hundreds.
GLF examples
Word Form | POS.Features | قسم_الكلام.الخصائص | English Gloss |
---|---|---|---|
صفر | NOUN_NUM.MS | اسم_عدد.هو | 0, zero |
واحد | NOUN_NUM.MS | اسم_عدد.هو | 1, one |
عشر | NOUN_NUM.MP | اسم_عدد.هم | 10, ten |
عشرة | NOUN_NUM.FP | اسم_عدد.هن | 10, ten |
امية | NOUN_NUM.FP | اسم_عدد.هن | 100, one hundred |
مية | NOUN_NUM.FP | اسم_عدد.هن | 100, one hundred |
الف | NOUN_NUM.MP | اسم_عدد.هم | 1000, one thousand |
آلاف | NOUN_NUM.MP | اسم_عدد.هم | 1000, thousands |
احدعش | NOUN_NUM.MP | اسم_عدد.هم | 11, eleven |
اثنعش | NOUN_NUM.MP | اسم_عدد.هم | 12, twelve |
ثلتعش | NOUN_NUM.MP | اسم_عدد.هم | 13, thirteen |
اربعتعش | NOUN_NUM.MP | اسم_عدد.هم | 14, fourteen |
خمستعش | NOUN_NUM.MP | اسم_عدد.هم | 15, fifteen |
ستعش | NOUN_NUM.MP | اسم_عدد.هم | 16, sixteen |
سبعتعش | NOUN_NUM.MP | اسم_عدد.هم | 17, seventeen |
ثمنتعش | NOUN_NUM.MP | اسم_عدد.هم | 18, eighteen |
تسعتعش | NOUN_NUM.MP | اسم_عدد.هم | 19, nineteen |
اثنين | NOUN_NUM.MD | اسم_عدد.هما♂ | 2, two |
عشرين | NOUN_NUM.MP | اسم_عدد.هم | 20, twenty |
ميتين | NOUN_NUM.FP | اسم_عدد.هن | 200, two hundred |
ثلاث | NOUN_NUM.MP | اسم_عدد.هم | 3, three |
ثلاثة | NOUN_NUM.FP | اسم_عدد.هن | 3, three |
ثلاثين | NOUN_NUM.MP | اسم_عدد.هم | 30, thirty |
ثلاثمية | NOUN_NUM.FP | اسم_عدد.هن | 300, three hundred |
اربع | NOUN_NUM.MP | اسم_عدد.هم | 4, four |
اربعة | NOUN_NUM.FP | اسم_عدد.هن | 4, four |
اربعين | NOUN_NUM.MP | اسم_عدد.هم | 40, forty |
اربعمية | NOUN_NUM.FP | اسم_عدد.هن | 400, four hundred |
خمس | NOUN_NUM.MP | اسم_عدد.هم | 5, five |
خمسة | NOUN_NUM.FP | اسم_عدد.هن | 5, five |
خمسين | NOUN_NUM.MP | اسم_عدد.هم | 50, fifty |
خمسمية | NOUN_NUM.FP | اسم_عدد.هن | 500, five hundred |
ست | NOUN_NUM.MP | اسم_عدد.هم | 6, six |
ستة | NOUN_NUM.FP | اسم_عدد.هن | 6, six |
ستين | NOUN_NUM.MP | اسم_عدد.هم | 60, sixty |
ستمية | NOUN_NUM.FP | اسم_عدد.هن | 600, six hundred |
سبع | NOUN_NUM.MP | اسم_عدد.هم | 7, seven |
سبعة | NOUN_NUM.FP | اسم_عدد.هن | 7, seven |
سبعين | NOUN_NUM.MP | اسم_عدد.هم | 70, seventy |
سبعمية | NOUN_NUM.FP | اسم_عدد.هن | 700, seven hundred |
ثمان | NOUN_NUM.MP | اسم_عدد.هم | 8, eight |
ثمانية | NOUN_NUM.FP | اسم_عدد.هن | 8, eight |
ثمانين | NOUN_NUM.MP | اسم_عدد.هم | 80, eighty |
ثمانمية | NOUN_NUM.FP | اسم_عدد.هن | 800, eight hundred |
تسع | NOUN_NUM.MP | اسم_عدد.هم | 9, nine |
تسعة | NOUN_NUM.FP | اسم_عدد.هن | 9, nine |
تسعين | NOUN_NUM.MP | اسم_عدد.هم | 90, ninety |
تسعمية | NOUN_NUM.FP | اسم_عدد.هن | 900, nine hundred |
Examples for MSA and other dialects will be added soon.
4.2.6 PRON - ضمير
- Bound pronouns - الضمائر المتصلة
- Bound pronouns are morphemes that cannot occur independently of another morpheme. They are related to other words called their hosts.
Notes:
- These pronouns bind to verbs to mark a direct object, to nouns to mark possession, and to prepositions.
Examples
Below is a list of the bound pronouns in different Arabic varieties. Note that pronunciation differs between the varieties; orthographically, however, the pronouns are spelled the same way without diacritics.
Word Form | POS.Features | قسم الكلام.الخصائص | English Gloss | Dialect |
---|---|---|---|---|
+نا | PRON.1P | ضمير.نحن | our, us | MSA,GLF,EGY |
+ني | PRON.1S | ضمير.انا | me | MSA,GLF,EGY |
+ي | PRON.1S | ضمير.انا | me, my | MSA,GLF,EGY |
+كن | PRON.2FP | ضمير.انتن | you, your | MSA,GLF |
+ج | PRON.2FS | ضمير.انت♀ | you, your | GLF |
+ك | PRON.2FS | ضمير.انت♀ | you, your | MSA, EGY |
+كم | PRON.2MP | ضمير.انتم | you, your | MSA,GLF |
+كم | PRON.2P | ضمير.انتم⚥ | you, your | GLF,EGY |
+ك | PRON.2MS | ضمير.انت♂ | you, your | MSA,GLF,EGY |
+هن | PRON.3FP | ضمير.هن | their, them | MSA,GLF |
+ها | PRON.3FS | ضمير.هي | her, it, its | MSA,GLF,EGY |
+هم | PRON.3MP | ضمير.هم♂ | their, them | MSA,GLF |
+هم | PRON.3P | ضمير.هم⚥ | their, them | GLF,EGY |
+ه | PRON.3MS | ضمير.هو | him, his, it, its | MSA,GLF,EGY |
- Unbound pronouns - الضمائر المنفصلة
- Unbound pronouns are free morphemes that occur as separate words.
Examples
Below is a list of the unbound pronouns in different Arabic varieties. Note that pronunciation differs between the varieties; orthographically, however, the pronouns are spelled the same way without diacritics.
Word Form | POS.Features | قسم الكلام.الخصائص | English Gloss | Dialect |
---|---|---|---|---|
احن | PRON.1P | ضمير.نحن | we | GLF |
احنا | PRON.1P | ضمير.نحن | we | GLF,EGY |
حنّا | PRON.1P | ضمير.نحن | we | GLF |
نحن | PRON.1P | ضمير.نحن | we | MSA,GLF |
انا | PRON.1S | ضمير.انا | I | MSA,GLF,EGY |
انتي | PRON.2FS | ضمير.انت♀ | you | EGY, GLF |
انت | PRON.2MS | ضمير.انت♂ | you | MSA,GLF,EGY |
انتو | PRON.2P | ضمير.انتم⚥ | you | GLF,EGY |
هي | PRON.3FS | ضمير.هي | she is | MSA,GLF,EGY |
هو | PRON.3MS | ضمير.هو | he, it | MSA,GLF,EGY |
هم | PRON.3P | ضمير.هم⚥ | they | GLF,EGY |
4.2.7 PRON_DEM - ضمير_إشارة
- Demonstrative pronouns
- Demonstrative pronouns are pronouns used for proximal or distal references.
Notes:
- Demonstrative pronouns can be basewords and/or proclitics; some of them take no features (see the examples below).
- The relationship between proximity and distance does not seem to exist in Egyptian Arabic.
Examples
Below is a list of the demonstrative pronouns in different Arabic varieties. Note that pronunciation differs between the varieties; orthographically, however, the pronouns are spelled the same way without diacritics.
Word Form | POS.Features | قسم الكلام.الخصائص | English Gloss | Dialect Examples |
---|---|---|---|---|
ا+ | PRON_DEM | ضمير_إشارة | that;this | EGY أهو |
ه+ | PRON_DEM | ضمير_إشارة | that;this | GLF هالبيت |
ذا | PRON_DEM.MS | ضمير_إشارة.هو | that;this | MSA,GLF |
ذاك | PRON_DEM.MS | ضمير_إشارة.هو | that;this | GLF |
ذيك | PRON_DEM.FS | ضمير_إشارة.هي | that;this | GLF |
ها | PRON_DEM.MS | ضمير_إشارة.هو | that;this | MSA,GLF |
هاذي | PRON_DEM.FS | ضمير_إشارة.هي | that;this | GLF |
هاذن | PRON_DEM.FP | ضمير_إشارة.هن | these;those | GLF |
هاك | PRON_DEM.MS | ضمير_إشارة.هو | that;this | GLF |
هاكي | PRON_DEM.FS | ضمير_إشارة.هي | that;this | GLF |
هاي | PRON_DEM.FS | ضمير_إشارة.هي | that;this | GLF |
هاييل | PRON_DEM.UP | ضمير_إشارة.هم⚥ | these;those | GLF |
هذا | PRON_DEM.MS | ضمير_إشارة.هو | that;this | MSA,GLF |
هذاك | PRON_DEM.MS | ضمير_إشارة.هو | that;this | GLF |
هذايك | PRON_DEM.FS | ضمير_إشارة.هي | that;this | GLF |
هذوه | PRON_DEM.MS | ضمير_إشارة.هو | that;this | GLF |
ده | PRON_DEM.MS | ضمير_إشارة.هو | that;this | EGY |
دي | PRON_DEM.FS | ضمير_إشارة.هي | that;this | EGY |
دول | PRON_DEM.UP | ضمير_إشارة.هم⚥ | these;those | EGY |
4.2.8 PRON_INTERROG - ضمير_استفهام
- Interrogative Pronouns
- Interrogative pronouns are independent words that are used to form direct questions.
Notes:
- Interrogative pronouns don't take features.
Examples
Below is a list of the interrogative pronouns in different Arabic varieties. Note that pronunciation differs between the varieties; orthographically, however, the pronouns are spelled the same way without diacritics.
Word Form | POS.Features | قسم الكلام.الخصائص | English Gloss | Dialect |
---|---|---|---|---|
شو | PRON_INTERROG | ضمير_استفهام | what | GLF, LEV |
ايش | PRON_INTERROG | ضمير_استفهام | what | GLF |
ايه | PRON_INTERROG | ضمير_استفهام | what | EGY |
اي | PRON_INTERROG | ضمير_استفهام | which;who | MSA,GLF,EGY |
كم | PRON_INTERROG | ضمير_استفهام | how_much;how_many | MSA,GLF,EGY |
كمن | PRON_INTERROG | ضمير_استفهام | how_much;how_many | GLF |
كيف | PRON_INTERROG | ضمير_استفهام | how | MSA,GLF,EGY |
ما | PRON_INTERROG | ضمير_استفهام | which;what | MSA |
ماذا | PRON_INTERROG | ضمير_استفهام | which;what | MSA |
من | PRON_INTERROG | ضمير_استفهام | whom;who | MSA,GLF |
منو | PRON_INTERROG | ضمير_استفهام | whom;who | GLF |
مين | PRON_INTERROG | ضمير_استفهام | whom;who | GLF,EGY |
4.2.9 PRON_REL - ضمير_موصول
- Relative pronouns
- Relative pronouns introduce relative clauses.
Notes:
- Most dialectal relative pronouns don't take any features.
- MSA relative pronouns can take gender and number features.
Examples
Below is a list of the relative pronouns in different Arabic varieties. Note that pronunciation differs between the varieties; orthographically, however, the pronouns are spelled the same way without diacritics.
Word Form | POS.Features | قسم الكلام.الخصائص | English Gloss | Dialect |
---|---|---|---|---|
الذي | PRON_REL.MS | ضمير_موصول.هو | which;that;whom;who | MSA |
التي | PRON_REL.FS | ضمير_موصول.هي | which;that;whom;who | MSA |
الذين | PRON_REL.MP | ضمير_موصول.هم | which;that;whom;who | MSA |
اللواتي | PRON_REL.FP | ضمير_موصول.هن | which;that;whom;who | MSA |
اللي | PRON_REL | ضمير_موصول | which;that;who | GLF,EGY, LEV |
ما | PRON_REL | ضمير_موصول | which;that;what | MSA,GLF,EGY, LEV |
من | PRON_REL | ضمير_موصول | whom;who | MSA,GLF |
4.2.10 PRON_EXCLAM - ضمير_تعجب¶
- Exclamative Pronouns
- Exclamative Pronouns introduce exclamative structures.
Notes:
- Exclamative Pronouns don't take any features.
Examples
Below is a list of the exclamative pronouns in different Arabic varieties. Note that pronunciation differs between the varieties; orthographically, however, the words are spelled the same way when written without diacritics.
Word Form | POS.Features | قسم الكلام.الخصائص | English Gloss | Dialect |
---|---|---|---|---|
ما | PRON_EXCLAM | ضمير_تعجب | how;what | MSA,GLF,EGY |
وش | PRON_EXCLAM | ضمير_تعجب | how;what | GLF |
ايه | PRON_EXCLAM | ضمير_تعجب | how;what | EGY |
4.2.11 ADJ - صفة¶
- Adjectives
- Adjectives are nominals that describe or clarify a noun
Notes:
- Adjectives must inflect for gender and number according to the agreement rule with nouns.
- The agreement rule states that adjectives must agree in gender and number with the noun they modify, except for plural irrational (غير عاقل) nouns, which always take feminine singular adjectives. For example, the word مهم agrees with the noun in امرأة مهمة and نساء مهمات, and also in كتاب مهم, but not in كتب مهمة: because كتاب is an irrational noun, its plural كتب takes a feminine singular adjective.
- In the case where there is no noun in the sentence, the word will be tagged as an adjective if a specific noun can be recovered directly from the context. Otherwise, the word will be tagged as a noun. For example, the word عرب 'Arabs' could be a noun or an adjective. عرب as in جاء الرجال العرب 'The Arab men came' is tagged as an adjective, whereas in معجم لسان العرب 'Lisan Al Arab dictionary' عرب is tagged as a noun.
- Although adjectives must inflect in agreement with the noun, the features of the adjective are annotated independently. In the example كتب مهمة, the adjective مهمة is annotated as feminine singular and NOT as masculine plural.
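The agreement rule in the notes above can be sketched in code. This is a minimal illustration and not part of any CAMEL tooling: the `Features` type, the `rational` flag, and the function name are hypothetical, and the single-letter codes simply reuse the gender and number letters that appear in the tags of this document.

```python
from typing import NamedTuple


class Features(NamedTuple):
    """Functional gender and number, using the letter codes of the tagset."""
    gender: str  # "M" or "F"
    number: str  # "S", "D", or "P"


def expected_adjective_features(noun: Features, rational: bool) -> Features:
    """Return the features an adjective modifying `noun` is expected to carry.

    `rational` is a hypothetical flag saying whether the noun is عاقل.
    """
    # Plural irrational nouns always take feminine singular adjectives.
    if noun.number == "P" and not rational:
        return Features("F", "S")
    # Otherwise the adjective agrees with the noun in gender and number.
    return noun


# نساء مهمات: rational feminine plural noun, adjective agrees (FP).
print(expected_adjective_features(Features("F", "P"), rational=True))
# كتب مهمة: irrational masculine plural noun, adjective is FS.
print(expected_adjective_features(Features("M", "P"), rational=False))
```

The second call yields feminine singular, which matches how مهمة is annotated in كتب مهمة: the adjective carries its own features, independent of those of the noun.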
Examples
Below are examples of inflection tables based on basic paradigms, meaning that the inflections cover the complete set of possible morphological features except for clitic features. Adjectives, just like nouns, can have full or partial paradigms.
MSA: أَمِين 'faithful;loyal'
Word Form | POS.Features | قسم الكلام.الخصائص |
---|---|---|
أَمِين | ADJ.MS.IU | صفة.نكرة.هو |
أَمِينٌ | ADJ.MS.IN | صفة.نكرة.هو.مرفوع |
أَمِينٍ | ADJ.MS.IG | صفة.نكرة.هو.مجرور |
أَمِيناً | ADJ.MS.IA | صفة.نكرة.هو.منصوب |
أَمِينُ | ADJ.MS.DN | صفة.معرفة.هو.مرفوع |
أَمِينُ | ADJ.MS.CN | صفة.مضاف.هو.مرفوع |
أَمِينِ | ADJ.MS.CG | صفة.مضاف.هو.مجرور |
أَمِينَ | ADJ.MS.CA | صفة.مضاف.هو.منصوب |
أَمِينَة | ADJ.FS.IU | صفة.نكرة.هي |
أَمِينَةٌ | ADJ.FS.IN | صفة.نكرة.هي.مرفوع |
أَمِينَةٍ | ADJ.FS.IG | صفة.نكرة.هي.مجرور |
أَمِينَةً | ADJ.FS.IA | صفة.نكرة.هي.منصوب |
أَمِينَةُ | ADJ.FS.DN | صفة.معرفة.هي.مرفوع |
أَمِينَةُ | ADJ.FS.CN | صفة.مضاف.هي.مرفوع |
أَمِينَةِ | ADJ.FS.CG | صفة.مضاف.هي.مجرور |
أَمِينَةَ | ADJ.FS.CA | صفة.مضاف.هي.منصوب |
أَمِينانِ | ADJ.MD.IN | صفة.نكرة.هما♂.مرفوع |
أَمِينَيْنِ | ADJ.MD.IG | صفة.نكرة.هما♂.مجرور |
أَمِينَيْنِ | ADJ.MD.IA | صفة.نكرة.هما♂.منصوب |
أَمِينانِ | ADJ.MD.DN | صفة.معرفة.هما♂.مرفوع |
أَمِينا | ADJ.MD.CN | صفة.مضاف.هما♂.مرفوع |
أَمِينَيْ | ADJ.MD.CG | صفة.مضاف.هما♂.مجرور |
أَمِينَيْ | ADJ.MD.CA | صفة.مضاف.هما♂.منصوب |
أَمِينَتانِ | ADJ.FD.IN | صفة.نكرة.هما♀.مرفوع |
أَمِينَتَيْنِ | ADJ.FD.IG | صفة.نكرة.هما♀.مجرور |
أَمِينَتَيْنِ | ADJ.FD.IA | صفة.نكرة.هما♀.منصوب |
أَمِينَتانِ | ADJ.FD.DN | صفة.معرفة.هما♀.مرفوع |
أَمِينَتا | ADJ.FD.CN | صفة.مضاف.هما♀.مرفوع |
أَمِينَتَيْ | ADJ.FD.CG | صفة.مضاف.هما♀.مجرور |
أَمِينَتَيْ | ADJ.FD.CA | صفة.مضاف.هما♀.منصوب |
أُمَناء | ADJ.MP.IU | صفة.نكرة.هم♂ |
أُمَناءُ | ADJ.MP.IN | صفة.نكرة.هم♂.مرفوع |
أُمَناءَ | ADJ.MP.IG | صفة.نكرة.هم♂.مجرور |
أُمَناءَ | ADJ.MP.IA | صفة.نكرة.هم♂.منصوب |
أُمَناءُ | ADJ.MP.DN | صفة.معرفة.هم♂.مرفوع |
أُمَناءُ | ADJ.MP.CN | صفة.مضاف.هم♂.مرفوع |
أُمَناءِ | ADJ.MP.CG | صفة.مضاف.هم♂.مجرور |
أُمَناءَ | ADJ.MP.CA | صفة.مضاف.هم♂.منصوب |
أَمِينات | ADJ.FP.IU | صفة.نكرة.هن |
أَمِيناتٌ | ADJ.FP.IN | صفة.نكرة.هن.مرفوع |
أَمِيناتٍ | ADJ.FP.IG | صفة.نكرة.هن.مجرور |
أَمِيناتٍ | ADJ.FP.IA | صفة.نكرة.هن.منصوب |
أَمِيناتُ | ADJ.FP.DN | صفة.معرفة.هن.مرفوع |
أَمِيناتُ | ADJ.FP.CN | صفة.مضاف.هن.مرفوع |
أَمِيناتِ | ADJ.FP.CG | صفة.مضاف.هن.مجرور |
أَمِيناتِ | ADJ.FP.CA | صفة.مضاف.هن.منصوب |
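The tag strings in the table above follow a regular `POS.gender+number.state+case` pattern, which can be decoded mechanically. The sketch below is an assumption-laden illustration rather than an official parser: the function name and the English labels are hypothetical, the state and case letters are read off the Arabic column (نكرة, معرفة, مضاف and مرفوع, مجرور, منصوب), and `U` is interpreted as "not specified", as suggested by rows such as `ADJ.MS.IU` and `ADJ.UP`.

```python
GENDERS = {"M": "masculine", "F": "feminine", "U": "unspecified"}
NUMBERS = {"S": "singular", "D": "dual", "P": "plural"}
STATES = {"I": "indefinite", "D": "definite", "C": "construct"}
CASES = {"N": "nominative", "G": "genitive", "A": "accusative", "U": "unspecified"}


def parse_nominal_tag(tag: str) -> dict:
    """Split a nominal tag such as 'ADJ.MS.IN' into labelled features.

    Features that are absent from the tag (as in dialectal 'ADJ.MS' or the
    feature-less 'PRON_REL') are returned as None.
    """
    pos, *features = tag.split(".")
    parsed = {"pos": pos, "gender": None, "number": None,
              "state": None, "case": None}
    if len(features) > 2 or any(len(f) != 2 for f in features):
        raise ValueError(f"unexpected feature layout in tag: {tag!r}")
    if features:
        # First feature group: gender letter followed by number letter.
        parsed["gender"] = GENDERS[features[0][0]]
        parsed["number"] = NUMBERS[features[0][1]]
    if len(features) == 2:
        # Second feature group: state letter followed by case letter.
        parsed["state"] = STATES[features[1][0]]
        parsed["case"] = CASES[features[1][1]]
    return parsed


print(parse_nominal_tag("ADJ.MS.IN"))
print(parse_nominal_tag("ADJ.UP"))
```

Keeping absent features as `None` mirrors the convention that case and state are omitted for dialectal Arabic instead of being filled with a default value.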
GLF: مِتكَشِّخ 'looking elegant'
Word Form | POS.Features | قسم الكلام.الخصائص |
---|---|---|
متكشخ | ADJ.MS | صفة.هو |
متكشخة | ADJ.FS | صفة.هي |
متكشخين | ADJ.MP | صفة.هم |
متكشخات | ADJ.FP | صفة.هن |
EGY: جَدَع 'strong;macho'
Word Form | POS.Features | قسم الكلام.الخصائص |
---|---|---|
جدع | ADJ.MS | صفة.هو |
جدعة | ADJ.FS | صفة.هي |
جدعان | ADJ.UP | صفة.هم⚥ |
4.2.12 ADJ_NUM - صفة_عدد¶
- Ordinal numbers
- Ordinal numbers are used for ranking.
Examples
Word Form | POS.Features | قسم الكلام.الخصائص | English Gloss | Dialect |
---|---|---|---|---|
الاوائل | ADJ_NUM.MP | صفة_عدد.هم | the first | GLF,EGY |
اول | ADJ_NUM.MS | صفة_عدد.هو | first | GLF,EGY |
الثانية | ADJ_NUM.FS | صفة_عدد.هي | second | GLF,EGY |
Examples for MSA and other dialects will be added soon
4.2.13 ADJ_COMP - صفة_مقارنة¶
- Comparative Adjectives
- A comparative adjective is a form derived from verbs according to their inflectional category.
Notes:
- No morphological distinction is made between the comparative and the superlative meanings. The distinction is made based on the use of idafa (construct) with the superlative.
Examples
Word Form | POS.Features | قسم الكلام.الخصائص | English Gloss | Dialect |
---|---|---|---|---|
ابرك | ADJ_COMP.MS | صفة_مقارنة.هو | better;best | GLF |
اقل | ADJ_COMP.MS | صفة_مقارنة.هو | less;least | GLF |
احسن | ADJ_COMP.MS | صفة_مقارنة.هو | better;best | GLF |
Examples for MSA and other dialects will be added soon
4.2.14 VERB - فعل¶
- Verbs
- A verb is a word that describes an action, state, or occurrence and forms the main part of the predicate of a sentence.
Examples
Below are examples of inflection tables based on basic paradigms, which means that the inflections are for the complete set of possible morphological features except for clitic features.
GLF: سار 'go;walk'
Word Form | POS.FEATURES | قسم_الكلام.الخصائص |
---|---|---|
سرت | VERB.P1S | فعل.ماضي.انا |
سرنا | VERB.P1P | فعل.ماضي.نحن |
سرت | VERB.P2MS | فعل.ماضي.انت♂ |
سرتي | VERB.P2FS | فعل.ماضي.انت♀ |
سرتوا | VERB.P2P | فعل.ماضي.انتم⚥ |
سرتوا | VERB.P2MP | فعل.ماضي.انتم♂ |
سرتن | VERB.P2FP | فعل.ماضي.انتن |
سار | VERB.P3MS | فعل.ماضي.هو |
سارت | VERB.P3FS | فعل.ماضي.هي |
ساروا | VERB.P3P | فعل.ماضي.هم⚥ |
ساروا | VERB.P3MP | فعل.ماضي.هم♂ |
سارن | VERB.P3FP | فعل.ماضي.هن |
اسير | VERB.I1S | فعل.مضارع.انا |
نسير | VERB.I1P | فعل.مضارع.نحن |
تسير | VERB.I2MS | فعل.مضارع.انت♂ |
تسيرين | VERB.I2FS | فعل.مضارع.انت♀ |
تسيرون | VERB.I2P | فعل.مضارع.انتم⚥ |
تسيرون | VERB.I2MP | فعل.مضارع.انتم♂ |
تسيرن | VERB.I2FP | فعل.مضارع.انتن |
يسير | VERB.I3MS | فعل.مضارع.هو |
تسير | VERB.I3FS | فعل.مضارع.هي |
يسيرون | VERB.I3P | فعل.مضارع.هم⚥ |
يسيرون | VERB.I3MP | فعل.مضارع.هم♂ |
يسيرن | VERB.I3FP | فعل.مضارع.هن |
سير | VERB.C2MS | فعل.أمر.أنت♂ |
سيري | VERB.C2FS | فعل.أمر.أنت♀ |
سيروا | VERB.C2P | فعل.أمر.انتم⚥ |
سيروا | VERB.C2MP | فعل.أمر.انتم♂ |
سيرن | VERB.C2FP | فعل.أمر.انتن |
EGY: مِشِي 'walk;proceed'
Word Form | POS.FEATURES | قسم_الكلام.الخصائص |
---|---|---|
مشيت | VERB.P1S | فعل.ماضي.انا |
مشينا | VERB.P1P | فعل.ماضي.نحن |
مشيت | VERB.P2MS | فعل.ماضي.انت♂ |
مشيتي | VERB.P2FS | فعل.ماضي.انت♀ |
مشيتوا | VERB.P2P | فعل.ماضي.انتم⚥ |
مشي | VERB.P3MS | فعل.ماضي.هو |
مشيت | VERB.P3FS | فعل.ماضي.هي |
مشيوا | VERB.P3P | فعل.ماضي.هم⚥ |
امشي | VERB.I1S | فعل.مضارع.انا |
نمشي | VERB.I1P | فعل.مضارع.نحن |
تمشي | VERB.I2MS | فعل.مضارع.انت♂ |
تمشي | VERB.I2FS | فعل.مضارع.انت♀ |
تمشوا | VERB.I2P | فعل.مضارع.انتم⚥ |
يمشي | VERB.I3MS | فعل.مضارع.هو |
تمشي | VERB.I3FS | فعل.مضارع.هي |
يمشوا | VERB.I3P | فعل.مضارع.هم⚥ |
امشي | VERB.C2MS | فعل.أمر.أنت♂ |
امشي | VERB.C2FS | فعل.أمر.أنت♀ |
امشوا | VERB.C2P | فعل.أمر.انتم⚥ |
Examples for MSA will be added soon
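The verb tags in the GLF and EGY tables above pack aspect, person, optional gender, and number into a single feature string (for example `P3MS` or `I2P`). A minimal sketch of decoding them follows. The function name and the English labels are hypothetical, the aspect letters are read off the Arabic column (ماضي, مضارع, أمر), and only the singular and plural codes shown in the tables are handled.

```python
import re

ASPECTS = {"P": "perfective", "I": "imperfective", "C": "command"}
GENDERS = {"M": "masculine", "F": "feminine", "": "unspecified"}
NUMBERS = {"S": "singular", "P": "plural"}

# aspect letter, person digit, optional gender letter, number letter
VERB_TAG = re.compile(r"VERB\.([PIC])([123])([MF]?)([SP])")


def parse_verb_tag(tag: str) -> dict:
    """Decode a verb tag such as 'VERB.P3MS' into labelled features."""
    match = VERB_TAG.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a recognised verb tag: {tag!r}")
    aspect, person, gender, number = match.groups()
    return {
        "aspect": ASPECTS[aspect],
        "person": int(person),
        # A missing gender letter (as in P1S or I3P) is kept as unspecified.
        "gender": GENDERS[gender],
        "number": NUMBERS[number],
    }


print(parse_verb_tag("VERB.P3MS"))
print(parse_verb_tag("VERB.I2P"))
```

Tags without a gender letter, such as `VERB.P3P`, correspond to the rows glossed with ⚥ in the tables.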
4.2.15 VERB_PSEUDO - شبه_فعل¶
- Pseudo Verbs
- Pseudo verbs are words that have the same syntactic behavior as verbs in that they take a subject and a predicate, or a sentential complement.
Notes:
- Pseudo verbs don't take any features.
Examples
Word Form | POS.Features | قسم الكلام.الخصائص | English Gloss | Dialect | Example |
---|---|---|---|---|---|
ريت | VERB_PSEUDO | شبه_فعل | if only;wish | GLF,EGY | يا ريتْني ما جيت |
ترى | VERB_PSEUDO | شبه_فعل | by the way;for your information | GLF | ترى الرحلة طويلة لإيطاليا |
تو | VERB_PSEUDO | شبه_فعل | just now;at the moment | GLF | كانت توها داشة الفيلا |
Examples for MSA and other dialects will be added soon
4.2.16 VERB_NOM - اسم_فعل¶
- Non-Inflectional verbs, also called Frozen Verbs
- These are frozen expressions that behave like verbs syntactically but not morphologically. From a morphological point of view they are not inflectional, meaning that they do not inflect for all their tenses, sometimes none, and they do not have gender/number agreement. Syntactically, they subcategorize for arguments in the form of prepositional phrases and direct objects.
Examples
Word Form | POS.Features | قسم الكلام.الخصائص | English Gloss | Dialect |
---|---|---|---|---|
(ما) عدا | VERB_NOM | اسم_فعل | except | GLF,EGY |
آمين | VERB_NOM | اسم_فعل | Amen | GLF,EGY |
اف | VERB_NOM | اسم_فعل | ugh | GLF,EGY |
اخص | VERB_NOM | اسم_فعل | Shame on you! | EGY |
آه | VERB_NOM | اسم_فعل | Ah! | GLF,EGY |
اوه | VERB_NOM | اسم_فعل | Ah! | GLF,EGY |
ما | VERB_NOM | اسم_فعل | not | GLF,EGY |
حاشا | VERB_NOM | اسم_فعل | except | GLF,EGY |
يالله | VERB_NOM | اسم_فعل | hurry up;come on | GLF,EGY |
ايا | VERB_NOM | اسم_فعل | watch out | GLF,EGY |
Examples for MSA will be added soon
4.2.17 ADV - ظرف¶
- Adverbs
- Adverbs are invariable and terminal words that give information about the time, location, manner, cause, purpose, or any other adverbial function modifying the verb or sentence.
Notes:
- A word is invariable as it does not participate in an idafa construction. A word is terminal when nothing modifies it.
- Some adverbs take pronominal clitics; in such cases, the pronouns are cliticized normally, for example يا دوب +ك. Also note that adverbs with initial يا are considered to be two separate words.
- Adverbs don't take features.
Examples
Word Form | POS.Features | قسم_الكلام.الخصائص | English Gloss | Dialect |
---|---|---|---|---|
لسة | ADV | ظرف | still | EGY |
يا دوب | ADV | ظرف | It’s high time; a little;yet;still;just | EGY |
دلوقت | ADV | ظرف | Now;at this moment;at this time;at present | EGY |
هنا | ADV | ظرف | here | GLF,EGY |
هناك | ADV | ظرف | there | GLF,EGY |
كمان | ADV | ظرف | also;too | GLF,EGY |
برضك | ADV | ظرف | also;too;nevertheless;even so;all the same. intensifier « really, surely » | EGY |
برضو | ADV | ظرف | also;too;nevertheless;even so;all the same. intensifier « really, surely » | EGY |
بعدين | ADV | ظرف | later;next | GLF,EGY |
امال | ADV | ظرف | hence;then;so | EGY |
بس | ADV | ظرف | only;enough | GLF,EGY |
بقى | ADV | ظرف | so;then | EGY |
بعد | ADV | ظرف | also;still | GLF |
هني | ADV | ظرف | here | GLF |
سيدا | ADV | ظرف | straight ahead | GLF |
Examples for MSA will be added soon
4.2.18 ADV_INTERROG - ظرف_استفهام¶
- Interrogative Adverbs
- Interrogative adverbs are invariable words that introduce questions that give specific information about time, location, manner, or purpose.
Notes:
- Interrogative Adverbs don't take any features
Examples
Word Form | POS.Features | قسم الكلام.الخصائص | English Gloss | Dialect |
---|---|---|---|---|
شلون | ADV_INTERROG | ظرف_استفهام | how | GLF |
كيف | ADV_INTERROG | ظرف_استفهام | how | GLF |
وشي | ADV_INTERROG | ظرف_استفهام | how | GLF |
متى | ADV_INTERROG | ظرف_استفهام | when | GLF |
وين | ADV_INTERROG | ظرف_استفهام | where | GLF |
ليش | ADV_INTERROG | ظرف_استفهام | why | GLF |
Examples for MSA and other dialects will be added soon
4.2.19 ADV_REL - ظرف_موصول¶
- Relative Adverbs
- Relative adverbs are invariable words that introduce adverbial relative clauses that give specific information about time, location, manner, or purpose.
Notes:
- Relative Adverbs don't take any features
Examples
Word Form | POS.Features | قسم الكلام.الخصائص | English Gloss | Dialect | Example |
---|---|---|---|---|---|
وين | ADV_REL | ظرف_موصول | where | GLF | لازم تخبرني وين سرت |
فين | ADV_REL | ظرف_موصول | where | EGY | لازم تقول لي رحت فين |
أين | ADV_REL | ظرف_موصول | where | MSA | أخبرني إلى أين ذهبت |
4.2.20 PREP - حرف_جر¶
- Prepositions
- The term preposition is used to represent the closed class of items which have traditionally been identified as prepositions in Arabic.
Notes:
- Prepositions don't take any features.
Examples
Word Form | POS.Features | قسم الكلام.الخصائص | English Gloss | Dialect |
---|---|---|---|---|
في | PREP | حرف_جر | in, at | MSA,GLF,EGY |
مع | PREP | حرف_جر | with | MSA,GLF,EGY |
ويّا | PREP | حرف_جر | with | MSA,GLF,EGY |
ب+ | PREP | حرف_جر | with | MSA,GLF,EGY |
و+ | PREP | حرف_جر | by | MSA,GLF,EGY |
4.2.21 INTERJ - تعجب¶
- Interjections
- Interjections are words or phrases (response particles) that express the speaker’s reaction to a particular proposition or sentence.
Notes:
- Interjections don't take any features.
Examples
Word Form | POS.Features | قسم الكلام.الخصائص | English Gloss | Dialect |
---|---|---|---|---|
بس | INTERJ | تعجب | enough | GLF,EGY |
مرحبا | INTERJ | تعجب | hello | GLF,EGY |
الو | INTERJ | تعجب | hello (on phone) | GLF,EGY |
يالله | INTERJ | تعجب | hurry up;come on! | GLF,EGY |
لأ | INTERJ | تعجب | no | GLF,EGY |
انزين | INTERJ | تعجب | OK | GLF |
اوكيه | INTERJ | تعجب | OK | GLF,EGY |
حشى | INTERJ | تعجب | | GLF |
Examples for MSA will be added soon
4.2.22 CONJ - حرف_عطف¶
- Coordinating Conjunctions
- Conjunctions are used to coordinate and link independent constituents with each other.
Notes:
- Conjunctions don't take any features.
Examples
Word Form | POS.Features | قسم الكلام.الخصائص | English Gloss | Dialect |
---|---|---|---|---|
بس | CONJ | حرف_عطف | But | GLF,EGY |
و+ | CONJ | حرف_عطف | and | MSA,GLF,EGY |
ف+ | CONJ | حرف_عطف | and, then | MSA,GLF,EGY |
ولا | CONJ | حرف_عطف | or (in questions) | GLF,EGY |
4.2.23 CONJ_SUB - أداة_ربط¶
- Subordinating Conjunctions
- A subordinating conjunction marks a clause as dependent on another, independent clause, which is called the main clause.
Notes:
- Subordinating conjunctions don't take any features
Examples
Word Form | POS.Features | قسم الكلام.الخصائص | English Gloss | Dialect |
---|---|---|---|---|
و+ | CONJ_SUB | أداة_ربط | while | MSA,GLF,EGY |
More examples will be added soon
4.2.24 PART_VOC - حرف_نداء¶
- Vocative Particles
Notes:
- Vocative Particles don't take any features
Examples
Word Form | POS.Features | قسم الكلام.الخصائص | English Gloss | Dialect |
---|---|---|---|---|
يا | PART_VOC | حرف_نداء | O!;hey! | MSA,GLF,EGY |
More examples will be added soon
4.2.25 PART_RESTRICT - اداة_استثناء¶
- Restrictive Particles
- A restrictive particle is used in a negative construction marking a restriction.
Notes:
- Restrictive Particles don't take any features
Examples
Word Form | POS.Features | قسم الكلام.الخصائص | English Gloss | Comments/Examples/Dialect |
---|---|---|---|---|
الا | PART_RESTRICT | اداة_استثناء | except for;only | MSA,GLF,EGY |
More examples will be added soon
4.2.26 PART_NEG - اداة_نفي¶
- Negative Particles
- Negative particles negate what comes after them.
Notes:
- Negative particles don't take any features
Examples
Word Form | POS.Features | قسم الكلام.الخصائص | English Gloss | Dialect |
---|---|---|---|---|
م+ | PART_NEG | اداة_نفي | not | GLF,EGY |
ما | PART_NEG | اداة_نفي | not | MSA,GLF,EGY |
مب | PART_NEG | اداة_نفي | not | GLF |
هب | PART_NEG | اداة_نفي | not | GLF |
لا | PART_NEG | اداة_نفي | not;neither;nor | MSA,GLF,EGY |
More examples will be added soon
4.2.27 PART_DET - اداة_تعريف¶
- Determiner Particles
- A clitic that attaches to nominals.
Notes:
- Determiner particles don't take any features
Examples
Word Form | POS.Features | قسم الكلام.الخصائص | English Gloss | Comments/Examples/Dialect |
---|---|---|---|---|
ال+ | PART_DET | اداة_تعريف | the | MSA,GLF,EGY |
4.2.28 PART_INTERROG - اداة_استفهام¶
- Interrogative Particles
- Interrogative particles introduce questions.
Notes:
- Interrogative particles don't take any features
4.2.29 PART_FUT - اداة_استقبال¶
- Future Particles
- Future particles mark the future when attached to imperfective verbs.
Notes:
- Future particles don't take any features
Examples
Word Form | POS.Features | قسم الكلام.الخصائص | English Gloss | Dialect |
---|---|---|---|---|
ب+ | PART_FUT | اداة_استقبال | will | GLF |
ح+ | PART_FUT | اداة_استقبال | will | EGY |
ه+ | PART_FUT | اداة_استقبال | will | EGY |
س+ | PART_FUT | اداة_استقبال | will | MSA |
سوف | PART_FUT | اداة_استقبال | will | MSA |
رح | PART_FUT | اداة_استقبال | will | GLF |
4.2.30 PART_FOCUS - اداة_تفصيل¶
- Focus Particles
- Focus particles highlight the topic of the sentence or add emphasis.
Notes:
- Focus particles don't take any features.
Examples
Word Form | POS.Features | قسم الكلام.الخصائص | English Gloss | Dialect |
---|---|---|---|---|
اما | PART_FOCUS | اداة_تفصيل | as for | MSA,GLF,EGY |
More examples will be added soon
4.2.31 PART_EMPHATIC - اداة_توكيد¶
- Emphatic Particles
- Emphatic particles add emphasis.
Notes:
- Emphatic particles don't take any features.
Examples
Word Form | POS.Features | قسم الكلام.الخصائص | English Gloss | Comments/Examples/Dialect |
---|---|---|---|---|
ل+ | PART_EMPHATIC | اداة_توكيد | that | MSA,GLF,EGY |
More examples will be added soon
4.2.32 PART_RC - جواب_شرط¶
- Response Conditional Particles
- Response conditional particles are used in conditional sentences introducing the apodosis sentence/main clause.
Notes:
- Response conditional particles don't take any features.
Examples
Word Form | POS.Features | قسم الكلام.الخصائص | English Gloss | Comments/Examples/Dialect |
---|---|---|---|---|
ف+ | PART_RC | جواب_شرط | so then | MSA,GLF,EGY |
More examples will be added soon
4.2.33 PART - حرف¶
- Particles
- Particles do not assign case and they can be omitted without affecting or altering meaning and/or structure.
Notes:
- Particles don't take any features.
Examples
Word Form | POS.Features | قسم الكلام.الخصائص | English Gloss | Dialect | Example |
---|---|---|---|---|---|
و+ | PART | حرف | and | GLF,EGY | |
جان | PART | حرف | if, and so | GLF | لولاها جان ما كانت الكل بالكل |
More examples will be added soon
4.2.34 PART_PROG - حرف_مضارعة¶
- Progressive Particle
- Progressive particles denote that the action of the verb is in progress.
Notes:
- Progressive particles attach to imperfective verbs only.
- Progressive particles don't take any features.
Example
Word Form | POS.Features | قسم الكلام.الخصائص | English Gloss | Dialect | Example |
---|---|---|---|---|---|
ب+ | PART_PROG | حرف_مضارعة | | EGY | |
More examples will be added soon
4.2.35 PART_CONNECT - حرف_ربط¶
- Connective Particles
- Connective particles connect two clauses. They are most commonly used to introduce a comment clause after a clause starting with اما.
- Connective particles don't take any features
Examples
Word Form | POS.Features | قسم الكلام.الخصائص | English Gloss | Dialect |
---|---|---|---|---|
ف+ | PART_CONNECT | حرف_ربط | {Discourse connective} | GLF,EGY |
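Several forms in the tables of Section 4.2 appear under more than one POS tag, so an annotator or tool has to choose among candidates based on context. The sketch below collects a few such entries from the tables above into a lookup from word form to candidate tags. It is only an illustration: the `ENTRIES` list is a hand-picked subset, and the names `ENTRIES`, `build_candidates`, and `CANDIDATES` are hypothetical.

```python
from collections import defaultdict

# (word form, POS tag) pairs copied from the example tables in Section 4.2.
ENTRIES = [
    ("و+", "PREP"), ("و+", "CONJ"), ("و+", "CONJ_SUB"), ("و+", "PART"),
    ("ف+", "CONJ"), ("ف+", "PART_RC"), ("ف+", "PART_CONNECT"),
    ("ب+", "PREP"), ("ب+", "PART_FUT"), ("ب+", "PART_PROG"),
    ("بس", "ADV"), ("بس", "INTERJ"), ("بس", "CONJ"),
]


def build_candidates(entries):
    """Group tags by word form, keeping first-seen order without duplicates."""
    candidates = defaultdict(list)
    for form, tag in entries:
        if tag not in candidates[form]:
            candidates[form].append(tag)
    return dict(candidates)


CANDIDATES = build_candidates(ENTRIES)

for form, tags in CANDIDATES.items():
    print(form, tags)
```

Such a table only enumerates the options; selecting the right tag still depends on the definitions given for each POS above.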
4.3 Additional Annotation Tasks¶
4.3.1 Lemmatization¶
The lemma is the citation form of the word. Across all our guidelines, we follow the lemma specification in Graff et al. (2009), where:
- The lemma of all nominals is the masculine singular form of the word or the feminine singular form if no masculine form exists.
- The lemma of a verb is the perfective 3rd person masculine singular form.
- For all others (e.g. particles, adverbs, etc.) the lemma is the same form as the baseword.
Notes:
- For some nominal cases such as اسم الوحدة ‘collective plurals’ which are also uncountable nouns, the lemma is the same as the noun. See examples below.
- The diacritization of the lemma includes adding all the short vowel diacritics except for the sukun ‘absence of a vowel’.
- Cases to look out for are the /oo/ and /ee/ long vowels. The vowel /oo/ is marked as ـَو, /ee/ is marked either as ـَي or ـِا.
- In the case of long vowels /aa/, /uu/, and /ii/ a short vowel marker of the same kind precedes the long vowel (i.e. ـَا, ـُو, and ـِي).
Word Form | Lemma | POS | English | Dialect | Comments |
---|---|---|---|---|---|
كتب | كِتاب | NOUN.MP | books | GLF,EGY | |
كتبوا | كَتَب | VERB.P3MP | They wrote | GLF,EGY | |
تفاح | تُفَّاح | NOUN.MS | apples | GLF,EGY | collective plurals |
تفاحة | تُفَّاحَة | NOUN.FS | apple | GLF,EGY | |
تفاحات | تُفَّاحَة | NOUN.FP | apples | GLF,EGY | |
تمر | تَمر | NOUN.MS | dates (fruit) | GLF,EGY | collective plurals |
تمور | تَمر | NOUN.MP | dates (fruit) | GLF,EGY | |
تمرة | تَمرَة | NOUN.FS | date (fruit) | GLF,EGY | |
تمرات | تَمرَة | NOUN.FP | dates (fruit) | GLF,EGY | |
ناس | نَاس | NOUN.MP | people, humans | GLF,EGY | collective plurals |
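The diacritization note above says that lemmas carry all short vowel diacritics except the sukun. A minimal sketch of enforcing that part of the convention follows. The function names are hypothetical, and the sketch deliberately does not attempt to check the long-vowel conventions. The example string is written with Unicode escapes so that the sukun (U+0652) is explicit.

```python
SUKUN = "\u0652"  # ARABIC SUKUN


def has_sukun(lemma: str) -> bool:
    """Report whether a diacritized lemma still contains a sukun."""
    return SUKUN in lemma


def strip_sukun(lemma: str) -> str:
    """Remove every sukun while leaving all other diacritics untouched."""
    return lemma.replace(SUKUN, "")


# The word for 'dates' written with an explicit sukun on the meem:
# teh, fatha, meem, sukun, reh.
with_sukun = "\u062A\u064E\u0645\u0652\u0631"
lemma = strip_sukun(with_sukun)

print(has_sukun(with_sukun), has_sukun(lemma))
print(lemma)
```

After stripping, the string is teh, fatha, meem, reh, which is the lemma form تَمر listed in the table.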
4.3.2 Gloss¶
The English gloss refers to the semantic translation of the Arabic lemma.
Notes:
- For nominals the gloss is the singular form of the word.
- For verbs the gloss is the infinitive form.
Lemma | Word Form | POS | English Gloss | Comments/Examples/Dialect |
---|---|---|---|---|
كِتاب | كتب | NOUN.MP | book | GLF,EGY |
كَتَب | كتبوا | VERB.P3MP | write | |
4.3.3 Dialect Identification¶
Dialect identification (DID) is the task of tagging a certain context with a given dialect tag.
The dialect tag is decided based on the context of the sentence and/or the document. Because dialects may share words with each other or with MSA, the dialect is inferred from the sentence structure and word order characteristic of that specific dialect.
Although all words in a sentence may receive the same dialect tag, two different dialectal structures can occur in the same sentence; hence, we tag per word.
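Per-word dialect tagging can be represented as a sequence of (word, dialect tag) pairs. The sketch below is purely illustrative: the type alias, the function names, and the two-word sequence are hypothetical. The sequence is not a real annotated sentence; it just pairs two words with the dialect listed for them in the ADV table.

```python
from typing import List, Set, Tuple

TaggedWord = Tuple[str, str]  # (word form, dialect tag)


def dialects_in(sentence: List[TaggedWord]) -> Set[str]:
    """Collect the distinct dialect tags used in a sentence."""
    return {dialect for _, dialect in sentence}


def is_mixed(sentence: List[TaggedWord]) -> bool:
    """True when the words of a sentence carry more than one dialect tag."""
    return len(dialects_in(sentence)) > 1


# Illustrative sequence only, not a real annotated sentence.
example = [("دلوقت", "EGY"), ("هني", "GLF")]

print(dialects_in(example))
print(is_mixed(example))
```

Storing the tag on each word, instead of once per sentence, is what allows mixed sentences like this to be represented at all.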
5. Acknowledgments¶
- This work was funded by a Research Enhancement Fund from New York University Abu Dhabi
- Portions of the Egyptian Arabic Guidelines are based on the LDC's Egyptian Arabic Morphological Guidelines (Maamouri et al., 2013)
- Portions of the Gulf entries are from the textbook: Ramsah, An Introduction to learning Emirati Dialect and Culture (Nasser Isleem and Ayesha Al Hashemi, 2015)
1. Refer to the phonology guidelines for the complete CAPHI reference.
2. The 'Ta Marbuta' suffix is usually used to mark the feminine gender.
3. The feminine singular form مَلِكَة translates to 'Queen' in English.