CAMEL POS Schema and Guidelines

1. Motivation and Goals

CAMEL POS is inspired by the ARZATB tagset and guidelines (Maamouri et al., 2012), which were based on the PATB guidelines (Maamouri et al., 2009). CAMEL POS is designed as a single tagset for both MSA and the dialects with the following goals in mind:

  • Facilitating research on adaptation between MSA and the dialects, and among the dialects.
  • Supporting backward compatibility with previously annotated resources.
  • Enforcing a functional morphology analysis that is deeper and more compatible with Arabic morphosyntactic rules than form-based analysis (Alkuhlani and Habash, 2011).

2. Overview

The CAMEL POS tags and features are the union of those needed for MSA and the dialects. Features are used only where they apply: for example, case and state features are used more often in MSA, whereas the dialects tend to have many more clitics than MSA, including non-MSA ones.

One main property of the CAMEL POS tagset that sets it apart from ARZATB is that the morphological features of both gender and number of nominals are annotated functionally (i.e. semantically) (Alkuhlani and Habash, 2011; Smrž, 2007). This decision allows us to assign the features to the baseword without the need to specify the surface form affixes that mark the surface form gender and number. This is not the case in ARZATB, where broken (irregular) plural nouns are tagged as singular because they do not use the sound plural affixes.

Another property is that in the CAMEL POS tagset we omit case and state features for nominals, and voice and mood for verbs, when annotating dialectal Arabic, as the dialects have lost them completely, except for some high-frequency fossilized MSA forms, such as طبعاً /t. a b 3 a n/ ‘of course’, which retains an indefinite ending.

The main part of the word, the baseword, is tagged in the format POS.features, where POS is the core POS tag and features is the feature combination that goes with that POS tag; a period (.) separates the POS from the feature combination. Proclitics, however, get only a POS tag since they have no features. Pronominal enclitics, on the other hand, get the same tag format as the baseword (i.e., PRON.features).

3. Schema details

3.1 Core POS

The following tables list the POS tags used in this schema alongside the corresponding tags used in ARZATB. The tagset is divided into three categories according to the D3 tokenization scheme (Habash et al., 2010): proclitics (14 tags), enclitics (2 tags) and basewords (39 tags). Together with the features, the CAMEL POS tagset maps to ARZATB and retains backward compatibility. It also offers an intuitive Arabic version of the tags that is suitable for annotation.

Proclitics tags

CAMEL POS Arabic CAMEL POS ATB POS
أداة_تعريف PART_DET DET
حرف_عطف CONJ CONJ
حرف_جر PREP PREP
أداة_نفي PART_NEG NEG_PART
أداة_استقبال PART_FUT FUT_PART
أداة_مضارعة PART_PROG PROG_PART
أداة_ربط CONJ_SUB SUB_CONJ
ضمير_إشارة PRON_DEM DEM_PRON
ضمير_استفهام PRON_INTERROG INTERROG_PRON
أداة PART PART
حرف_ربط PART_CONNECT CONNEC_PART
أداة_توكيد PART_EMPHATIC EMPHATIC_PART
جواب_شرط PART_RC RC_PART
أداة_نداء PART_VOC VOC_PART

Enclitics tags

CAMEL POS Arabic CAMEL POS ATB POS
أداة_نفي PART_NEG NEG_PART
ضمير PRON *SUFF_DO:[PGN]
ضمير PRON POSS_PRON_[PGN]
ضمير PRON PRON_[PGN]

Baseword tags

CAMEL POS Arabic CAMEL POS ATB POS
اسم NOUN NOUN
اسم_عدد NOUN_NUM NOUN_NUM
اسم_علم NOUN_PROP NOUN_PROP
اسم_كم NOUN_QUANT NOUN_QUANT
صفة ADJ ADJ
صفة_عدد ADJ_NUM ADJ_NUM
صفة_مقارنة ADJ_COMP ADJ_COMP
ظرف ADV ADV
ظرف_استفهام ADV_INTERROG INTERROG_ADV
ظرف_موصول ADV_REL REL_ADV
فعل VERB IV/PV/CV
شبه_فعل VERB_PSEUDO PSEUDO_VERB
اسم_فعل VERB_NOM VERB
ضمير PRON PRON_[PGN]
ضمير_إشارة PRON_DEM DEM_PRON_[GN]
ضمير_استفهام PRON_INTERROG INTERROG_PRON
ضمير_تعجب PRON_EXCLAM EXCLAM_PRON
ضمير_موصول PRON_REL REL_PRON
أداة PART PART
أداة_تعريف PART_DET DET
أداة_نفي PART_NEG NEG_PART
أداة_استقبال PART_FUT FUT_PART
أداة_مضارعة PART_PROG PROG_PART
أداة_فعل PART_VERB VERB_PART
أداة_نداء PART_VOC VOC_PART
أداة_استفهام PART_INTERROG INTERROG_PART
أداة_استثناء PART_RESTRICT RESTRIC_PART
أداة_تفصيل PART_FOCUS FOCUS_PART
أداة_توكيد PART_EMPHATIC EMPHATIC_PART
جواب_شرط PART_RC RC_PART
أداة_ربط CONJ_SUB SUB_CONJ
حرف_جر PREP PREP
حرف_عطف CONJ CONJ
أداة_ربط PART_CONNECT CONNEC_PART
رقم DIGIT NOUN_NUM
اختصار ABBREV ABBREV
تعجب INTERJ INTERJ
أجنبي FORIEGN FOREIGN
علامة_ترقيم PUNC PUNC

3.2 Features

CAMEL POS provides a full array of features:

  • Aspect with the values Perfective, Imperfective and Command.
  • Person with the values 1st, 2nd, 3rd.
  • Gender with values Masculine and Feminine.
  • Number with values Singular, Dual and Plural.
  • State with values Definite, Indefinite and Construct.
  • Case with values Nominative, Genitive and Accusative.
  • Voice with values Active and Passive.
  • Mood with values Subjunctive, Indicative and Jussive.

Not all the features mentioned are necessarily relevant to the dialects. In the full POS tag, the specified values of the different features will appear in the following order:
<POS>.<A><P><G><N>.<S><C><V><M>

For a subset of POS tags in the baseword category, each tag has a limited number of possible feature combinations that is paired with it. Below is the list of the POS tags that take features and their possible ordered combination.

Note that the combinations below are for the dialects. In MSA, nominals additionally take Case and State, and verbs additionally take Voice and Mood.

  • NOUN, NOUN_*, ADJ, ADJ_* All nominals take the combination of Gender and Number. For example, جالس /y aa l i s/ ‘sitting’ is tagged ADJ.MS. In the occasional uses of State, such as طبعاً /t. a b 3 a n/ ‘of course’, the tag would be NOUN.MS.I.

  • VERB All verbs take the combination of Aspect, Person, Gender and Number. For example يقطع /y i g t. a 3/ ‘cut’ is tagged as VERB.I3MS

  • PRON All pronouns take the combination of Person, Gender and Number. For example انتي /2 i n t y/ ‘you [fs]’ is tagged as PRON.2FS

  • PRON_DEM All demonstrative pronouns take the combination of Gender and Number. For example هاذا /h aa dh a/ ‘this’ is tagged as PRON_DEM.MS

In cases where a feature is not present, such as gender in verbs of first person inflections, the gender feature is simply dropped and does not require a placeholder since the possible feature values are ordered and unique. For example the imperfective 1st person verb أقول /2 a g uu l/ ‘I say’ will be tagged as VERB.I1S
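The ordered, placeholder-free layout described above can be illustrated with a small parser. This is only a sketch under stated assumptions: the function name `parse_tag` is hypothetical, the value `U` (unspecified) is included because tags such as NOUN.MS.IU and PRON_DEM.UP appear later in these guidelines, and the Voice and Mood slots are left out because no single-letter codes for them are shown here.

```python
import re

# Feature inventories from Section 3.2.
# "U" (unspecified) is an assumption based on tags such as NOUN.MS.IU.
ASPECT = {"P": "perfective", "I": "imperfective", "C": "command"}
PERSON = {"1": "1st", "2": "2nd", "3": "3rd"}
GENDER = {"M": "masculine", "F": "feminine", "U": "unspecified"}
NUMBER = {"S": "singular", "D": "dual", "P": "plural"}
STATE = {"D": "definite", "I": "indefinite", "C": "construct"}
CASE = {"N": "nominative", "G": "genitive", "A": "accusative", "U": "unspecified"}

# Every slot is optional, but the order is fixed, so no placeholder is needed.
APGN = re.compile(r"([PIC])?([123])?([MFU])?([SDP])?")
STATE_CASE = re.compile(r"([DIC])?([NGAU])?")


def _decode(segment, pattern, names, tables):
    """Decode one dot-separated feature segment into {feature_name: value}."""
    match = pattern.fullmatch(segment)
    if match is None:
        raise ValueError(f"cannot parse feature segment: {segment!r}")
    decoded = {}
    for name, table, code in zip(names, tables, match.groups()):
        if code is not None:
            decoded[name] = table[code]
    return decoded


def parse_tag(tag):
    """Split a tag such as 'VERB.I3MS' or 'NOUN.MS.IN' into POS and features."""
    parts = tag.split(".")
    if len(parts) > 3:
        raise ValueError(f"too many segments in tag: {tag!r}")
    result = {"pos": parts[0]}
    if len(parts) > 1:
        result.update(_decode(parts[1], APGN,
                              ("aspect", "person", "gender", "number"),
                              (ASPECT, PERSON, GENDER, NUMBER)))
    if len(parts) > 2:
        # Nominal State and Case only; verbal Voice and Mood are not handled.
        result.update(_decode(parts[2], STATE_CASE,
                              ("state", "case"), (STATE, CASE)))
    return result
```

For instance, `parse_tag("VERB.I1S")` yields aspect, person and number but no gender key, mirroring the dropped gender feature in the first person example above.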

4. Detailed annotation guidelines

Morphological annotation using CAMEL POS assumes that each word is tokenized using the D3 tokenization scheme. Therefore, each token gets at least a POS tag according to the tables above. In a large-scale full morphological annotation task, additional annotations are usually provided, such as lemmatization, English gloss, and dialect identification. In this section we provide detailed guidelines in the context of a comprehensive annotation task.


4.1 Tokenization

The tokenization scheme recommended when annotating using CAMEL POS is D3.

D3 Tokenization
D3 tokenizes all clitics: question particle, conjunctions, particles, prepositions, articles, and pronominal enclitics.

Notes:

  • In the tokenization task, all tokens must be orthographically normalized, that is, all morphophonemic and orthographic rewrite rules must be undone. For example, the word مكتبتها should be tokenized as مكتبة +ها, NOT مكتبت +ها.
  • Remember that clitics are optional to word formation and they include particles and pronouns.
Tokenization Phenomenon Word Form Tokenization English Gloss Dialect
Definite Article للمكتب ل+ ال+ مكتب for the office MSA,GLF,EGY
Ta Marbuta مكتبتنا مكتبة +نا our library MSA,GLF,EGY
Ta Marbuta كاتباها كاتبة +ها she wrote it (deverbal) GLF,EGY
Alif Maqsura حكاها حكى +ها he recounted it MSA,GLF,EGY
Hamza Form بهاؤه،بهائه،بهاءه بهاء +ه his glory MSA,GLF,EGY
Waw-of-Plurality قالوها قالوا +ها They said it MSA,GLF,EGY
Various clitics وستجننني و+ س+ تجنن +ني and she will drive me crazy MSA
Various clitics وهتجننني و+ ه+ تجنن +ني and she will drive me crazy EGY
Various clitics وبتجننني و+ ب+ تجنن +ني and she will drive me crazy GLF
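The position of the + marker in the tokenized forms above indicates the role of each token: a trailing + marks a proclitic and a leading + marks an enclitic. The following sketch reads a D3-tokenized string under that assumption; the function name `classify_d3_tokens` and the role labels are hypothetical and not part of the guidelines.

```python
def classify_d3_tokens(tokenized):
    """Label each whitespace-separated D3 token by the position of its + marker."""
    labeled = []
    for token in tokenized.split():
        if token.endswith("+"):
            role = "proclitic"   # e.g. a conjunction or preposition written with a trailing +
        elif token.startswith("+"):
            role = "enclitic"    # e.g. a pronominal enclitic written with a leading +
        else:
            role = "baseword"
        labeled.append((token, role))
    return labeled


def has_single_baseword(tokenized):
    """Check that a tokenized word contains exactly one baseword."""
    roles = [role for _, role in classify_d3_tokens(tokenized)]
    return roles.count("baseword") == 1
```

Proclitics then receive a bare POS tag, while the baseword and any pronominal enclitic receive a POS.features tag.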

4.1.1 Clitics

Clitics are syntactically independent morphemes that are orthographically attached to the baseword. They can belong to a number of parts of speech.

Notes:

  • Clitics may interact with the spelling of the baseword. See the notes above on Tokenization and the CODA general rules.
  • Although writers, mostly in dialectal Arabic, tend to attach what is considered an indirect object clitic to the baseword (verbs, and adjectives that are active participles), in the CODA convention they should be written separately. For example, اجيبلك should be اجيب لك, and جايبلها should be جايب لها. For the list of clitics, please refer to the CODA seed lexicon.

4.2 CAMEL POS tagset

4.2.1 Features

Features refer to specific morphosyntactic aspects of the word that are abstracted away in the lemma form. For example, the word أميرات 'princesses' has the lemma أمير 'prince' with the features gender: feminine and number: plural.

Notes

  • Features may not necessarily match the form of the word: e.g. حامل 'pregnant' is gender: feminine even though it has no 'Ta marbuta' ending; and خليفة 'Khalifa (name); caliph' is gender: masculine and number: singular even though it ends with 'Ta marbuta'.
  • Some words are plural in meaning but morphosyntactically singular (collective plurals). For example, شجر 'trees' is singular because we say شجر طويل 'tall trees' similar to رجل طويل 'a tall man'.
  • The assignment of the features is in context (sentence and document) and depends on the morphosyntactic agreement at all times.
  • For specific examples and cases, refer to the notes section of the different parts-of-speech.
  • Features include gender, number, person, and aspect. Each feature has a number of possible values, see the full description of the features.
  • Features are represented in combinations in our system. The examples in the section are some feature-value pairs.
Features الخصائص POS قسم الكلام Description الوصف
.P3MS ماضي.هو فعل VERB Aspect:(P);Person:(3);Gender:(M);Number:(S) الزمن:ماضي؛الضمير:مفرد مذكر غائب
.P1P ماضي.نحن فعل VERB Aspect:(P);Person:(1);Gender:unspecified;Number:(P) الزمن:ماضي؛الضمير:جمع متكلم
.MS هو اسم NOUN Gender:(M);Number:(S) الجنس:مذكر؛العدد:مفرد

4.2.2 NOUN - اسم

Common Nouns
Common nouns refer to entities and concepts that have a more general reference than proper nouns. Common nouns inflect for prefixes and suffixes of person, gender, number.

Notes:

  • Some nouns, such as prepositional nouns (عند، بين، أمام ... etc), don't necessarily have clear features. To assign features in those cases, use a syntactic test in a nonsensical semantic context. For example, the word أمام can be placed in a nonsensical construction such as الأمام الأول والأمام الثاني للبيت. According to the morphosyntactic agreement, the features of أمام are gender: M and number: S.
  • Some nouns are gender-ambiguous, such as طريق, which can be either masculine or feminine depending on the usage (طريق طويل and طريق طويلة). To assign the gender feature, resolve the ambiguity using the context; if that is impossible, assign the form-based gender.
  • Common nouns include derived nouns such as دباديب and non-derived nouns such as ام.
  • Titles are also annotated as common nouns.
  • Common nouns also include a set of borrowed nouns.
  • In the context of dialectal text annotation, only nouns that appear to have a case ending, such as غصبٍ, will have state and case features annotated. The 'case' feature in this situation is not a real case but rather a remnant from MSA.

Examples

Below are examples of inflection tables based on basic paradigms, which means that the inflections are for the complete set of possible morphological features except for clitic features.

Lemmas with full paradigms

MSA: مَلِك 'King'
Word Form Tag العلامة
مَلِك NOUN.MS.IU اسم.نكرة.هو
مَلِكٌ NOUN.MS.IN اسم.نكرة.هو.مرفوع
مَلِكٍ NOUN.MS.IG اسم.نكرة.هو.مجرور
مَلِكاً NOUN.MS.IA اسم.نكرة.هو.منصوب
مَلِكُ NOUN.MS.DN اسم.معرفة.هو.مرفوع
مَلِكُ NOUN.MS.CN اسم.مضاف.هو.مرفوع
مَلِكِ NOUN.MS.CG اسم.مضاف.هو.مجرور
مَلِكَ NOUN.MS.CA اسم.مضاف.هو.منصوب
مَلِكَة NOUN.FS.IU اسم.نكرة.هي
مَلِكَةٌ NOUN.FS.IN اسم.نكرة.هي.مرفوع
مَلِكَةٍ NOUN.FS.IG اسم.نكرة.هي.مجرور
مَلِكَةً NOUN.FS.IA اسم.نكرة.هي.منصوب
مَلِكَةُ NOUN.FS.DN اسم.معرفة.هي.مرفوع
مَلِكَةُ NOUN.FS.CN اسم.مضاف.هي.مرفوع
مَلِكَةِ NOUN.FS.CG اسم.مضاف.هي.مجرور
مَلِكَةَ NOUN.FS.CA اسم.مضاف.هي.منصوب
مَلِكانِ NOUN.MD.IN اسم.نكرة.هما♂.مرفوع
مَلِكَيْنِ NOUN.MD.IG اسم.نكرة.هما♂.مجرور
مَلِكَيْنِ NOUN.MD.IA اسم.نكرة.هما♂.منصوب
مَلِكانِ NOUN.MD.DN اسم.معرفة.هما♂.مرفوع
مَلِكا NOUN.MD.CN اسم.مضاف.هما♂.مرفوع
مَلِكَيْ NOUN.MD.CG اسم.مضاف.هما♂.مجرور
مَلِكَيْ NOUN.MD.CA اسم.مضاف.هما♂.منصوب
مَلِكَتانِ NOUN.FD.IN اسم.نكرة.هما♀.مرفوع
مَلِكَتَيْنِ NOUN.FD.IG اسم.نكرة.هما♀.مجرور
مَلِكَتَيْنِ NOUN.FD.IA اسم.نكرة.هما♀.منصوب
مَلِكَتانِ NOUN.FD.DN اسم.معرفة.هما♀.مرفوع
مَلِكَتا NOUN.FD.CN اسم.مضاف.هما♀.مرفوع
مَلِكَتَيْ NOUN.FD.CG اسم.مضاف.هما♀.مجرور
مَلِكَتَيْ NOUN.FD.CA اسم.مضاف.هما♀.منصوب
مُلُوك NOUN.MP.IU اسم.نكرة.هم♂
مُلُوكٌ NOUN.MP.IN اسم.نكرة.هم♂.مرفوع
مُلُوكٍ NOUN.MP.IG اسم.نكرة.هم♂.مجرور
مُلُوكاً NOUN.MP.IA اسم.نكرة.هم♂.منصوب
مُلُوكُ NOUN.MP.DN اسم.معرفة.هم♂.مرفوع
مُلُوكُ NOUN.MP.CN اسم.مضاف.هم♂.مرفوع
مُلُوكِ NOUN.MP.CG اسم.مضاف.هم♂.مجرور
مُلُوكَ NOUN.MP.CA اسم.مضاف.هم♂.منصوب
أَمْلاك NOUN.MP.IU اسم.نكرة.هم♂
أَمْلاكٌ NOUN.MP.IN اسم.نكرة.هم♂.مرفوع
أَمْلاكٍ NOUN.MP.IG اسم.نكرة.هم♂.مجرور
أَمْلاكاً NOUN.MP.IA اسم.نكرة.هم♂.منصوب
أَمْلاكُ NOUN.MP.DN اسم.معرفة.هم♂.مرفوع
أَمْلاكُ NOUN.MP.CN اسم.مضاف.هم♂.مرفوع
أَمْلاكِ NOUN.MP.CG اسم.مضاف.هم♂.مجرور
أَمْلاكَ NOUN.MP.CA اسم.مضاف.هم♂.منصوب
مَلِكات NOUN.FP.IU اسم.نكرة.هن
مَلِكاتٌ NOUN.FP.IN اسم.نكرة.هن.مرفوع
مَلِكاتٍ NOUN.FP.IG اسم.نكرة.هن.مجرور
مَلِكاتٍ NOUN.FP.IA اسم.نكرة.هن.منصوب
مَلِكاتُ NOUN.FP.DN اسم.معرفة.هن.مرفوع
مَلِكاتُ NOUN.FP.CN اسم.مضاف.هن.مرفوع
مَلِكاتِ NOUN.FP.CG اسم.مضاف.هن.مجرور
مَلِكاتِ NOUN.FP.CA اسم.مضاف.هن.منصوب
GLF: شَيخ 'Sheikh;chieftain'
Word Form Tag العلامة
شَيخ NOUN.MS اسم.هو
شَيخَة NOUN.FS اسم.هي
شَيخَين NOUN.MD اسم.هما♂
شَيختَين NOUN.FD اسم.هما♀
شْيُوخ NOUN.MP اسم.هم♂
شَيخَات NOUN.FP اسم.هن
EGY: أُستَاذ 'professor;teacher'
Word Form Tag العلامة
أُستَاذ NOUN.MS اسم.هو
أُستَاذَة NOUN.FS اسم.هي
أُستَاذَين NOUN.MD اسم.هما♂
أُستَاذتَين NOUN.FD اسم.هما♀
أَساتذَة NOUN.MP اسم.هم♂
أُستَاذَات NOUN.FP اسم.هن


Lemmas with partial paradigms

MSA: حُبّ 'affection;love'
Word Form Tag العلامة
حُبّ NOUN.MS.IU اسم.نكرة.هو
حُبٌّ NOUN.MS.IN اسم.نكرة.هو.مرفوع
حُبٍّ NOUN.MS.IG اسم.نكرة.هو.مجرور
حُبّاً NOUN.MS.IA اسم.نكرة.هو.منصوب
حُبُّ NOUN.MS.DN اسم.معرفة.هو.مرفوع
حُبُّ NOUN.MS.CN اسم.مضاف.هو.مرفوع
حُبِّ NOUN.MS.CG اسم.مضاف.هو.مجرور
حُبَّ NOUN.MS.CA اسم.مضاف.هو.منصوب
GLF: سَيَّارَة 'car'
Word Form Tag العلامة
سَيَّارَة NOUN.FS اسم.هي
سَيَّارتَين NOUN.FD اسم.هما♀
سَيَّارَات NOUN.FP اسم.هن
سْيَايِير NOUN.FP اسم.هن
EGY: وِشّ 'face'
Word Form Tag العلامة
وِشّ NOUN.MS اسم.هو
وِشَّين NOUN.MD اسم.هما♂
وُشُوش NOUN.MP اسم.هم♂
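The Tag and العلامة columns in the tables above follow a fixed correspondence: the Arabic tag gives the POS, then the state (if any), then a pronoun standing for gender and number, then the case (if specified). The sketch below reproduces that pattern for the nominal tags shown so far. The function name `to_arabic_tag` is hypothetical, the mapping tables are read off the example rows rather than taken from an official specification, and `U` is assumed to mean an unspecified case that is simply not printed.

```python
# Mappings read off the paradigm tables (an assumption, not an official list).
POS_AR = {"NOUN": "اسم", "ADJ": "صفة"}
GN_AR = {"MS": "هو", "FS": "هي", "MD": "هما♂", "FD": "هما♀", "MP": "هم♂", "FP": "هن"}
STATE_AR = {"I": "نكرة", "D": "معرفة", "C": "مضاف"}
CASE_AR = {"N": "مرفوع", "G": "مجرور", "A": "منصوب", "U": ""}


def to_arabic_tag(tag):
    """Render a nominal tag such as 'NOUN.MS.IN' in the Arabic tag format."""
    parts = tag.split(".")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected POS.GN or POS.GN.SC, got {tag!r}")
    pieces = [POS_AR[parts[0]]]
    if len(parts) == 3:
        state = parts[2][0]
        case = parts[2][1:] or "U"      # a missing case letter is treated as unspecified
        pieces.append(STATE_AR[state])  # state precedes the pronoun in the Arabic tag
        pieces.append(GN_AR[parts[1]])
        if CASE_AR[case]:
            pieces.append(CASE_AR[case])
    else:
        pieces.append(GN_AR[parts[1]])
    return ".".join(pieces)
```

Dialectal tags without state and case, such as NOUN.FS, reduce to the POS followed by the pronoun.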


Examples of various words that are common in different Arabic varieties

Word Form Tag العلامة English Gloss Dialect
حرمات NOUN.FP اسم.هن women GLF
حريم NOUN.FP اسم.هن women GLF
نسوان NOUN.FP اسم.هن women GLF,EGY
خالوه NOUN.FS اسم.هي aunt! (maternal) GLF
عموه NOUN.FS اسم.هي aunt! (paternal) GLF
حجرة NOUN.FS اسم.هي room GLF
ميز NOUN.FS اسم.هي table GLF
بكرة NOUN.FS اسم.هي tomorrow GLF,EGY
شيشة NOUN.FS اسم.هي waterpipe GLF,EGY
حرمة NOUN.FS اسم.هي woman GLF
مرة NOUN.FS اسم.هي woman GLF
بزران NOUN.MP اسم.هم children GLF
عيال NOUN.MP اسم.هم children GLF,EGY
رجّال NOUN.MP اسم.هم men GLF
رجاجيل NOUN.MP اسم.هم men GLF
حق NOUN.MS اسم.هو for the benefit of GLF
خلاص NOUN.MS اسم.هو enough GLF
سكين NOUN.MS اسم.هو knife GLF
سكين NOUN.FS اسم.هي knife GLF
مكتوب NOUN.MS اسم.هو letter GLF,EGY
مثل NOUN.MS اسم.هو like GLF
حلق NOUN.MS اسم.هو mouth GLF
خشم NOUN.MS اسم.هو nose GLF
مال NOUN.MS اسم.هو of GLF
حقّ NOUN.MS اسم.هو of, belongs to GLF
برع NOUN.MS اسم.هو outside, outside of GLF
حد NOUN.MS اسم.هو somebody, someone GLF,EGY
باكر NOUN.MS اسم.هو tomorrow GLF
امس NOUN.MS اسم.هو yesterday GLF
كذي NOUN.MS اسم.هو like this, as this GLF


4.2.3 NOUN_PROP - اسم_علم

Proper Nouns
Proper nouns are nouns that have a unique referential meaning in context that is mutually exclusive with other entities.

Notes:

  • Proper nouns refer to names of people, geographical entities, months, and acronyms.
  • Proper nouns with more than one part such as محمد علي should have both words annotated as proper nouns.
  • Names such as عبد الله and علاء الدين should be split, both words annotated as proper nouns.
  • Titles of newspapers, magazines, and news agencies, sports teams are annotated as proper nouns, as well as names of political parties, etc.
  • Proper nouns might exhibit a different kind of ambiguity, where the word as a NOUN has features that fail the morphosyntactic agreement when it is used as a proper noun. For example, the proper noun احلام as a female given name behaves as FS and hence is given the tag NOUN_PROP.FS. The same applies to other proper nouns such as the newspaper name الاهرام; see the examples below.
  • Proper nouns can be confused with common nouns. A case in point is جَنُوب إِفرِيقيا: both parts are considered proper nouns when it refers to the country, South Africa.
  • The lemma of a proper noun does not include Al but it includes the 'Ta Marbuta'. The proper noun القاهرة has the lemma قاهرة.
  • Just like nouns, proper nouns can inflect for a subset of morphological features, usually number in certain cases.

Examples

Below are examples of different proper nouns and their respective tags. Those examples are valid in most Arabic varieties. Note that in MSA, the tag will include the state and case feature depending on the context surrounding the word.

Word Form POS.Features قسم الكلام.الخصائص English Gloss Comments Examples
خليفة NOUN_PROP.MS اسم علم.هو Khalifa name of a person
هند NOUN_PROP.FS اسم علم.هي Hind name of a person
عبد الله NOUN_PROP.MS اسم علم.هو Abdullah each word gets the same POS tag
امريكا NOUN_PROP.FS اسم علم.هي America geographical entity
ناتو NOUN_PROP.MS اسم علم.هو NATO acronym
امشير NOUN_PROP.MS اسم علم.هو Meshir month (coptic calendar)
الاهرام NOUN_PROP.FS اسم علم.هي Al Ahram newspaper نشرت الاهرام التقرير النهائي
الاهرام NOUN_PROP.MS اسم علم.هو Al Ahram newspaper تلقى الاهرام اتصالا هاتفيا
الاخوان NOUN_PROP.FS اسم علم.هي The Muslim Brotherhood political party تلقت الإخوان تمويلاً

4.2.4 NOUN_QUANT - اسم_كم

Noun quantifiers
Noun quantifiers express either quantity or approximation.

Examples

Below are examples of common noun quantifiers from different Arabic varieties. Note that they can inflect for other morphological features in the same way as nouns.

MSA examples

The examples below are in their indefinite nominative form.

Word Form POS.Features قسم الكلام.الخصائص English Gloss
عُشُرٌ NOUN_QUANT.MS.IN نكرة.هو.مرفوع (one)_tenth
تُسُعٌ NOUN_QUANT.MS.IN نكرة.هو.مرفوع (one)_ninth
ثُمُنٌ NOUN_QUANT.MS.IN نكرة.هو.مرفوع (one)_eighth
سُبُعٌ NOUN_QUANT.MS.IN نكرة.هو.مرفوع (one)_seventh
سُدُسٌ NOUN_QUANT.MS.IN نكرة.هو.مرفوع (one)_sixth
خُمُسٌ NOUN_QUANT.MS.IN نكرة.هو.مرفوع (one)_fifth
رُبُعٌ NOUN_QUANT.MS.IN نكرة.هو.مرفوع quarter,(one)_fourth
ثُلُثٌ NOUN_QUANT.MS.IN نكرة.هو.مرفوع (one)_third
نِصْفٌ NOUN_QUANT.MS.IN نكرة.هو.مرفوع half
شَطْرٌ NOUN_QUANT.MS.IN نكرة.هو.مرفوع part,portion,division,section
بِضْعَةٌ NOUN_QUANT.FS.IN نكرة.هي.مرفوع some,a_few,several
بِضْعٌ NOUN_QUANT.MS.IN نكرة.هو.مرفوع some,a_few,several
بَعْضٌ NOUN_QUANT.MS.IN نكرة.هو.مرفوع some,a_few,little,part
مُعْظَمٌ NOUN_QUANT.MS.IN نكرة.هو.مرفوع most,majority
أَغْلَبُ NOUN_QUANT.MS.IN نكرة.هو.مرفوع most_(of),the_majority_(of),the_greater_portion_of_part_(of)
غالِبِيَّةٌ NOUN_QUANT.FS.IN نكرة.هي.مرفوع majority,greater_part_of_portion
جُلٌّ NOUN_QUANT.MS.IN نكرة.هو.مرفوع most_(of),the_majority_(of),bulk,major_part
حَوالَي NOUN_QUANT.MS.IU نكرة.هو about,approximately,around,roughly,nearly,almost
زُهاءَ NOUN_QUANT.MS.IU نكرة.هو about,approximately,around,roughly,nearly,almost
قُرابَةٌ NOUN_QUANT.FS.IN نكرة.هي.مرفوع approximately,almost,just,about,nearly,roughly
قَيْدٌ NOUN_QUANT.MS.IN نكرة.هو.مرفوع around
قَيْسٌ NOUN_QUANT.MS.IN نكرة.هو.مرفوع measure,gauge,quantify
أَكْثَرُ NOUN_QUANT.MS.IN نكرة.هو.مرفوع more,most,majority
جَمِيعٌ NOUN_QUANT.MS.IN نكرة.هو.مرفوع all,entire,every,whole
كُلٌّ NOUN_QUANT.MS.IN نكرة.هو.مرفوع all,entire,every,whole
كامِلٌ NOUN_QUANT.MS.IN نكرة.هو.مرفوع entire
كافَّةٌ NOUN_QUANT.FS.IN نكرة.هي.مرفوع all
ضِعْفٌ NOUN_QUANT.MS.IN نكرة.هو.مرفوع double;multiple
كِلا NOUN_QUANT.MD.IU نكرة.هما♂ both_of
كِلْتا NOUN_QUANT.FD.IU نكرة.هما♀ both_of
GLF examples

The examples are in their lemma form. These examples are extracted from the Annotated Gumar Corpus.

Word Form POS.Features قسم الكلام.الخصائص English Gloss
شُوَيّ NOUN_QUANT.MS اسم_كم.هو few;a_little_bit
اَيّ NOUN_QUANT.MS اسم_كم.هو any
حَبَّة NOUN_QUANT.FS اسم_كم.هي some
بَعض NOUN_QUANT.MS اسم_كم.هو some;part
كَم NOUN_QUANT.MS اسم_كم.هو some;several
كِذا NOUN_QUANT.MS اسم_كم.هو many
كِلّ NOUN_QUANT.MS اسم_كم.هو all;whole;entire;every
نِصف NOUN_QUANT.MS اسم_كم.هو half
نُصّ NOUN_QUANT.MS اسم_كم.هو half
رُبع NOUN_QUANT.MS اسم_كم.هو quarter


Examples for other dialects will be added soon.

4.2.5 NOUN_NUM - اسم_عدد

Cardinal numbers
Cardinal numbers quantify rather than rank. They answer the question “How many?”

Notes:

  • Compound numerals (21 and above) that are coordinated with the conjunction wa- follow the POS tag of the first element: if the first element is a cardinal number, the second element should be annotated as a cardinal number as well.
  • Cardinal numbers can occur in pre-nominal and post-nominal positions without any agreement with the noun they occur with. They are invariable.
  • MSA has different agreement rules from the dialects.

Examples

Below is the full list of the basic numbers in units, tens, and hundreds.

GLF examples
Word Form POS.Features قسم_الكلام.الخصائص English Gloss
صفر NOUN_NUM.MS اسم_عدد.هو 0, zero
واحد NOUN_NUM.MS اسم_عدد.هو 1, one
عشر NOUN_NUM.MP اسم_عدد.هم 10, ten
عشرة NOUN_NUM.FP اسم_عدد.هن 10, ten
امية NOUN_NUM.FP اسم_عدد.هن 100, one hundred
مية NOUN_NUM.FP اسم_عدد.هن 100, one hundred
الف NOUN_NUM.MP اسم_عدد.هم 1000, one thousand
آلاف NOUN_NUM.MP اسم_عدد.هم 1000, thousands
احدعش NOUN_NUM.MP اسم_عدد.هم 11, eleven
اثنعش NOUN_NUM.MP اسم_عدد.هم 12, twelve
ثلتعش NOUN_NUM.MP اسم_عدد.هم 13, thirteen
اربعتعش NOUN_NUM.MP اسم_عدد.هم 14, fourteen
خمستعش NOUN_NUM.MP اسم_عدد.هم 15, fifteen
ستعش NOUN_NUM.MP اسم_عدد.هم 16, sixteen
سبعتعش NOUN_NUM.MP اسم_عدد.هم 17, seventeen
ثمنتعش NOUN_NUM.MP اسم_عدد.هم 18, eighteen
تسعتعش NOUN_NUM.MP اسم_عدد.هم 19, nineteen
اثنين NOUN_NUM.MD اسم_عدد.هما♂ 2, two
عشرين NOUN_NUM.MP اسم_عدد.هم 20, twenty
ميتين NOUN_NUM.FP اسم_عدد.هن 200, two hundred
ثلاث NOUN_NUM.MP اسم_عدد.هم 3, three
ثلاثة NOUN_NUM.FP اسم_عدد.هن 3, three
ثلاثين NOUN_NUM.MP اسم_عدد.هم 30, thirty
ثلاثمية NOUN_NUM.FP اسم_عدد.هن 300, three hundred
اربع NOUN_NUM.MP اسم_عدد.هم 4, four
اربعة NOUN_NUM.FP اسم_عدد.هن 4, four
اربعين NOUN_NUM.MP اسم_عدد.هم 40, forty
اربعمية NOUN_NUM.FP اسم_عدد.هن 400, four hundred
خمس NOUN_NUM.MP اسم_عدد.هم 5, five
خمسة NOUN_NUM.FP اسم_عدد.هن 5, five
خمسين NOUN_NUM.MP اسم_عدد.هم 50, fifty
خمسمية NOUN_NUM.FP اسم_عدد.هن 500, five hundred
ست NOUN_NUM.MP اسم_عدد.هم 6, six
ستة NOUN_NUM.FP اسم_عدد.هن 6, six
ستين NOUN_NUM.MP اسم_عدد.هم 60, sixty
ستمية NOUN_NUM.FP اسم_عدد.هن 600, six hundred
سبع NOUN_NUM.MP اسم_عدد.هم 7, seven
سبعة NOUN_NUM.FP اسم_عدد.هن 7, seven
سبعين NOUN_NUM.MP اسم_عدد.هم 70, seventy
سبعمية NOUN_NUM.FP اسم_عدد.هن 700, seven hundred
ثمان NOUN_NUM.MP اسم_عدد.هم 8, eight
ثمانية NOUN_NUM.FP اسم_عدد.هن 8, eight
ثمانين NOUN_NUM.MP اسم_عدد.هم 80, eighty
ثمانمية NOUN_NUM.FP اسم_عدد.هن 800, eight hundred
تسع NOUN_NUM.MP اسم_عدد.هم 9, nine
تسعة NOUN_NUM.FP اسم_عدد.هن 9, nine
تسعين NOUN_NUM.MP اسم_عدد.هم 90, ninety
تسعمية NOUN_NUM.FP اسم_عدد.هن 900, nine hundred


Examples for MSA and other dialects will be added soon.

4.2.6 PRON - ضمير

Bound pronouns - الضمائر المتصلة
Bound pronouns are morphemes that cannot occur independently of another morpheme. They are related to other words called their hosts.

Notes:

  • These pronouns bind to verbs to mark a direct object, to nouns to mark possession, and to prepositions.

Examples

Below is a list of the bound pronouns in different Arabic varieties. Note that there are differences in pronunciation between the different varieties; however, orthographically they are spelled the same way without diacritics.

Word Form POS.Features قسم الكلام.الخصائص English Gloss Dialect
+نا PRON.1P ضمير.نحن our, us MSA,GLF,EGY
+ني PRON.1S ضمير.انا me MSA,GLF,EGY
PRON.1S ضمير.انا me, my MSA,GLF,EGY
+كن PRON.2FP ضمير.انتن you, your MSA,GLF
PRON.2FS ضمير.انت♀ you, your GLF
PRON.2FS ضمير.انت♀ you, your MSA, EGY
+كم PRON.2MP ضمير.انتم you, your MSA,GLF
+كم PRON.2P ضمير.انتم⚥ you, your GLF,EGY
PRON.2MS ضمير.انت♂ you, your MSA,GLF,EGY
+هن PRON.3FP ضمير.هن their, them MSA,GLF
+ها PRON.3FS ضمير.هي her, it, its MSA,GLF,EGY
+هم PRON.3MP ضمير.هم♂ their, them MSA,GLF
+هم PRON.3P ضمير.هم⚥ their, them GLF,EGY
PRON.3MS ضمير.هو him, his, it, its MSA,GLF,EGY

Unbound pronouns - الضمائر المنفصلة
Unbound pronouns are free morphemes that occur as separate words.

Examples

Below is a list of the unbound pronouns in different Arabic varieties. Note that there are differences in pronunciation between the different varieties; however, orthographically they are spelled the same way without diacritics.

Word Form POS.Features قسم الكلام.الخصائص English Gloss Dialect
احن PRON.1P ضمير.نحن we GLF
احنا PRON.1P ضمير.نحن we GLF,EGY
حنّا PRON.1P ضمير.نحن we GLF
نحن PRON.1P ضمير.نحن we MSA,GLF
انا PRON.1S ضمير.انا I MSA,GLF,EGY
انتي PRON.2FS ضمير.انت♀ you EGY, GLF
انت PRON.2MS ضمير.انت♂ you MSA,GLF,EGY
انتو PRON.2P ضمير.انتم⚥ you GLF,EGY
هي PRON.3FS ضمير.هي she MSA,GLF,EGY
هو PRON.3MS ضمير.هو he, it MSA,GLF,EGY
هم PRON.3P ضمير.هم⚥ they GLF,EGY

4.2.7 PRON_DEM - ضمير_إشارة

Demonstrative pronouns
Demonstrative pronouns are pronouns used for proximal or distal references.

Notes:

  • Demonstrative pronouns can be basewords and/or proclitics. Some baseword pronouns take no features; see the examples below.
  • The distinction between proximal and distal reference does not seem to exist in Egyptian Arabic.

Examples

Below is a list of the demonstrative pronouns in different Arabic varieties. Note that there are differences in pronunciation between the different varieties; however, orthographically they are spelled the same way without diacritics.

Word Form POS.Features قسم الكلام.الخصائص English Gloss Dialect Examples
ا+ PRON_DEM ضمير_إشارة that;this EGY أهو
ه+ PRON_DEM ضمير_إشارة that;this GLF هالبيت
ذا PRON_DEM.MS ضمير_إشارة.هو that;this MSA,GLF
ذاك PRON_DEM.MS ضمير_إشارة.هو that;this GLF
ذيك PRON_DEM.FS ضمير_إشارة.هي that;this GLF
ها PRON_DEM.MS ضمير_إشارة.هو that;this MSA,GLF
هاذي PRON_DEM.FS ضمير_إشارة.هي that;this GLF
هاذن PRON_DEM.FP ضمير_إشارة.هن these;those GLF
هاك PRON_DEM.MS ضمير_إشارة.هو that;this GLF
هاكي PRON_DEM.FS ضمير_إشارة.هي that;this GLF
هاي PRON_DEM.FS ضمير_إشارة.هي that;this GLF
هاييل PRON_DEM.UP ضمير_إشارة.هم⚥ these;those GLF
هذا PRON_DEM.MS ضمير_إشارة.هو that;this MSA,GLF
هذاك PRON_DEM.MS ضمير_إشارة.هو that;this GLF
هذايك PRON_DEM.FS ضمير_إشارة.هي that;this GLF
هذوه PRON_DEM.MS ضمير_إشارة.هو that;this GLF
ده PRON_DEM.MS ضمير_إشارة.هو that;this EGY
دي PRON_DEM.FS ضمير_إشارة.هي that;this EGY
دول PRON_DEM.UP ضمير_إشارة.هم⚥ these;those EGY

4.2.8 PRON_INTERROG - ضمير_استفهام

Interrogative Pronouns
Interrogative pronouns are independent words that are used to form direct questions.

Notes:

  • Interrogative pronouns don't take features.

Examples

Below is a list of the interrogative pronouns in different Arabic varieties. Note that there are differences in pronunciation between the different varieties; however, orthographically they are spelled the same way without diacritics.

Word Form POS.Features قسم الكلام.الخصائص English Gloss Dialect
شو PRON_INTERROG ضمير_استفهام what GLF, LEV
ايش PRON_INTERROG ضمير_استفهام what GLF
ايه PRON_INTERROG ضمير_استفهام what EGY
اي PRON_INTERROG ضمير_استفهام which;who MSA,GLF,EGY
كم PRON_INTERROG ضمير_استفهام how_much;how_many MSA,GLF,EGY
كمن PRON_INTERROG ضمير_استفهام how_much;how_many GLF
كيف PRON_INTERROG ضمير_استفهام how MSA,GLF,EGY
ما PRON_INTERROG ضمير_استفهام which;what MSA
ماذا PRON_INTERROG ضمير_استفهام which;what MSA
من PRON_INTERROG ضمير_استفهام whom;who MSA,GLF
منو PRON_INTERROG ضمير_استفهام whom;who GLF
مين PRON_INTERROG ضمير_استفهام whom;who GLF,EGY

4.2.9 PRON_REL - ضمير_موصول

Relative pronouns
Relative pronouns introduce relative clauses.

Notes:

  • Most dialectal relative pronouns don't take any features.
  • MSA relative pronouns can take gender and number features.

Examples

Below is a list of the relative pronouns in different Arabic varieties. Note that there are differences in pronunciation between the different varieties; however, orthographically they are spelled the same way without diacritics.

Word Form POS.Features قسم الكلام.الخصائص English Gloss Dialect
الذي PRON_REL.MS ضمير_موصول.هو which;that;whom;who MSA
التي PRON_REL.FS ضمير_موصول.هي which;that;whom;who MSA
الذين PRON_REL.MP ضمير_موصول.هم which;that;whom;who MSA
اللواتي PRON_REL.FP ضمير_موصول.هن which;that;whom;who MSA
اللي PRON_REL ضمير_موصول which;that;who GLF,EGY, LEV
ما PRON_REL ضمير_موصول which;that;what MSA,GLF,EGY, LEV
من PRON_REL ضمير_موصول whom;who MSA,GLF

4.2.10 PRON_EXCLAM - ضمير_تعجب

Exclamative Pronouns
Exclamative pronouns introduce exclamative structures.

Notes:

  • Exclamative Pronouns don't take any features.

Examples

Below is a list of the exclamative pronouns in different Arabic varieties. Note that there are differences in pronunciation between the different varieties; however, orthographically they are spelled the same way without diacritics.

Word Form POS.Features قسم الكلام.الخصائص English Gloss Dialect
ما PRON_EXCLAM ضمير_تعجب how;what MSA,GLF,EGY
وش PRON_EXCLAM ضمير_تعجب how;what GLF
ايه PRON_EXCLAM ضمير_تعجب how;what EGY

4.2.11 ADJ - صفة

Adjectives
Adjectives are nominals that describe or clarify a noun.

Notes:

  • Adjectives must inflect for gender and number according to the agreement rule with nouns.
  • The agreement rule states that adjectives must agree with the noun that they modify in gender and number, except for plural irrational (غير عاقل) nouns, which always take feminine singular adjectives. For example, the word مهم inflects in agreement with the noun in امرأة مهمة and نساء مهمات. It also agrees in كتاب مهم, but not in كتب مهمة: because كتاب is an irrational noun, its plural كتب takes a feminine singular adjective.
  • In the case where there is no noun in the sentence, the word will be tagged as an adjective if a specific noun can be recovered directly from the context. Otherwise, the word will be tagged as a noun. For example, the word عرب 'Arabs' could be a noun or an adjective. عرب as in جاء الرجال العرب 'The Arab men came' is tagged as an adjective, whereas in معجم لسان العرب 'Lisan Al Arab dictionary' عرب is tagged as a noun.
  • Although adjectives must inflect in agreement with the noun, the features of the adjectives are annotated independently. In the example كتب مهمة, the adjective مهمة is annotated with the features feminine singular and NOT masculine plural.
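The agreement rule in the notes above can be written down as a short function, which may help annotators check an adjective's features against the noun it modifies. This is a sketch only: the function name `expected_adjective_gn` and the `rational` flag are hypothetical, since rationality is not one of the CAMEL POS features and would have to come from the annotator or a lexicon.

```python
def expected_adjective_gn(noun_gender, noun_number, rational):
    """Return the gender+number code an agreeing adjective is expected to carry.

    noun_gender: "M" or "F"; noun_number: "S", "D" or "P".
    rational: True for rational nouns, False for irrational ones
    (a hypothetical flag, not part of the tagset).
    """
    if noun_number == "P" and not rational:
        # Plural irrational nouns take feminine singular adjectives.
        return "FS"
    # Otherwise the adjective copies the noun's gender and number.
    return noun_gender + noun_number
```

With the examples from the notes, a feminine plural rational noun gives FP, while a masculine plural irrational noun gives FS, which is also the tag the adjective itself receives.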

Examples

Below are examples of inflection tables based on basic paradigms, which means that the inflections are for the complete set of possible morphological features except for clitic features. Adjectives, just like nouns, can inflect to full or partial paradigms.

MSA: أَمِين 'faithful;loyal'
Word Form POS.Features قسم الكلام.الخصائص
أَمِين ADJ.MS.IU صفة.نكرة.هو
أَمِينٌ ADJ.MS.IN صفة.نكرة.هو.مرفوع
أَمِينٍ ADJ.MS.IG صفة.نكرة.هو.مجرور
أَمِيناً ADJ.MS.IA صفة.نكرة.هو.منصوب
أَمِينُ ADJ.MS.DN صفة.معرفة.هو.مرفوع
أَمِينُ ADJ.MS.CN صفة.مضاف.هو.مرفوع
أَمِينِ ADJ.MS.CG صفة.مضاف.هو.مجرور
أَمِينَ ADJ.MS.CA صفة.مضاف.هو.منصوب
أَمِينَة ADJ.FS.IU صفة.نكرة.هي
أَمِينَةٌ ADJ.FS.IN صفة.نكرة.هي.مرفوع
أَمِينَةٍ ADJ.FS.IG صفة.نكرة.هي.مجرور
أَمِينَةً ADJ.FS.IA صفة.نكرة.هي.منصوب
أَمِينَةُ ADJ.FS.DN صفة.معرفة.هي.مرفوع
أَمِينَةُ ADJ.FS.CN صفة.مضاف.هي.مرفوع
أَمِينَةِ ADJ.FS.CG صفة.مضاف.هي.مجرور
أَمِينَةَ ADJ.FS.CA صفة.مضاف.هي.منصوب
أَمِينانِ ADJ.MD.IN صفة.نكرة.هما♂.مرفوع
أَمِينَيْنِ ADJ.MD.IG صفة.نكرة.هما♂.مجرور
أَمِينَيْنِ ADJ.MD.IA صفة.نكرة.هما♂.منصوب
أَمِينانِ ADJ.MD.DN صفة.معرفة.هما♂.مرفوع
أَمِينا ADJ.MD.CN صفة.مضاف.هما♂.مرفوع
أَمِينَيْ ADJ.MD.CG صفة.مضاف.هما♂.مجرور
أَمِينَيْ ADJ.MD.CA صفة.مضاف.هما♂.منصوب
أَمِينَتانِ ADJ.FD.IN صفة.نكرة.هما♀.مرفوع
أَمِينَتَيْنِ ADJ.FD.IG صفة.نكرة.هما♀.مجرور
أَمِينَتَيْنِ ADJ.FD.IA صفة.نكرة.هما♀.منصوب
أَمِينَتانِ ADJ.FD.DN صفة.معرفة.هما♀.مرفوع
أَمِينَتا ADJ.FD.CN صفة.مضاف.هما♀.مرفوع
أَمِينَتَيْ ADJ.FD.CG صفة.مضاف.هما♀.مجرور
أَمِينَتَيْ ADJ.FD.CA صفة.مضاف.هما♀.منصوب
أُمَناء ADJ.MP.IU صفة.نكرة.هم♂
أُمَناءُ ADJ.MP.IN صفة.نكرة.هم♂.مرفوع
أُمَناءَ ADJ.MP.IG صفة.نكرة.هم♂.مجرور
أُمَناءَ ADJ.MP.IA صفة.نكرة.هم♂.منصوب
أُمَناءُ ADJ.MP.DN صفة.معرفة.هم♂.مرفوع
أُمَناءُ ADJ.MP.CN صفة.مضاف.هم♂.مرفوع
أُمَناءِ ADJ.MP.CG صفة.مضاف.هم♂.مجرور
أُمَناءَ ADJ.MP.CA صفة.مضاف.هم♂.منصوب
أَمِينات ADJ.FP.IU صفة.نكرة.هن
أَمِيناتٌ ADJ.FP.IN صفة.نكرة.هن.مرفوع
أَمِيناتٍ ADJ.FP.IG صفة.نكرة.هن.مجرور
أَمِيناتٍ ADJ.FP.IA صفة.نكرة.هن.منصوب
أَمِيناتُ ADJ.FP.DN صفة.معرفة.هن.مرفوع
أَمِيناتُ ADJ.FP.CN صفة.مضاف.هن.مرفوع
أَمِيناتِ ADJ.FP.CG صفة.مضاف.هن.مجرور
أَمِيناتِ ADJ.FP.CA صفة.مضاف.هن.منصوب
GLF: مِتكَشِّخ 'looking elegant'
Word Form POS.Features قسم الكلام.الخصائص
متكشخ ADJ.MS صفة.هو
متكشخة ADJ.FS صفة.هي
متكشخين ADJ.MP صفة.هم
متكشخات ADJ.FP صفة.هن
EGY: جَدَع 'strong;macho'
Word Form POS.Features قسم الكلام.الخصائص
جدع ADJ.MS صفة.هو
جدعة ADJ.FS صفة.هي
جدعان ADJ.UP صفة.هم⚥
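
To make the tag layout in the tables above concrete, here is a minimal parsing sketch for nominal tags of the form POS.features. The letter meanings (gender M/F/U, number S/D/P, state I/D/C, case N/G/A/U) are inferred from the Arabic glosses in these tables and should be treated as assumptions; the function name is hypothetical.

```python
import re
from typing import Dict, Optional

# Letter meanings inferred from the Arabic glosses in the tables (assumption).
GENDER = {"M": "masculine", "F": "feminine", "U": "unspecified"}
NUMBER = {"S": "singular", "D": "dual", "P": "plural"}
STATE = {"I": "indefinite", "D": "definite", "C": "construct"}
CASE = {"N": "nominative", "G": "genitive", "A": "accusative", "U": "unspecified"}

# POS, then gender+number, then an optional state+case block (used for MSA).
_NOMINAL_TAG = re.compile(
    r"^(?P<pos>[A-Z_]+)\."
    r"(?P<gender>[MFU])(?P<number>[SDP])"
    r"(?:\.(?P<state>[IDC])(?P<case>[NGAU]))?$"
)


def parse_nominal_tag(tag: str) -> Dict[str, Optional[str]]:
    """Split a nominal tag such as ADJ.MS.IN or ADJ.UP into its parts.

    State and case are None when the tag omits them, as in the
    dialectal tables.
    """
    match = _NOMINAL_TAG.match(tag)
    if match is None:
        raise ValueError(f"not a nominal POS.features tag: {tag!r}")
    return match.groupdict()


parts = parse_nominal_tag("ADJ.FD.CG")
print(parts["pos"], GENDER[parts["gender"]], NUMBER[parts["number"]],
      STATE[parts["state"]], CASE[parts["case"]])
```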


4.2.12 ADJ_NUM - صفة_عدد

Ordinal numbers
Ordinal numbers are used for ranking.

Examples

Word Form POS.Features قسم الكلام.الخصائص English Gloss Dialect
الاوائل ADJ_NUM.MP صفة_عدد.هم the first GLF,EGY
اول ADJ_NUM.MS صفة_عدد.هو first GLF,EGY
الثانية ADJ_NUM.FS صفة_عدد.هي second GLF,EGY

Examples for MSA and other dialects will be added soon

4.2.13 ADJ_COMP - صفة_مقارنة

Comparative Adjectives
A comparative adjective is a form derived from verbs according to their inflectional category.

Notes:

  • No morphological distinction is made between the comparative and the superlative meanings. The distinction is made based on the use of idafa (construct) with the superlative.

Examples

Word Form POS.Features قسم الكلام.الخصائص English Gloss Dialect
ابرك ADJ_COMP.MS صفة_مقارنة.هو better;best GLF
اقل ADJ_COMP.MS صفة_مقارنة.هو less;least GLF
احسن ADJ_COMP.MS صفة_مقارنة.هو better;best GLF

Examples for MSA and other dialects will be added soon

4.2.14 VERB - فعل

Verbs
A verb is a word used to describe an action, state, or occurrence, and forming the main part of the predicate of a sentence.

Examples

Below are examples of inflection tables based on basic paradigms, which means that the inflections are for the complete set of possible morphological features except for clitic features.

GLF: سار 'go;walk'
Word Form POS.FEATURES قسم_الكلام.الخصائص
سرت VERB.P1S فعل.ماضي.انا
سرنا VERB.P1P فعل.ماضي.نحن
سرت VERB.P2MS فعل.ماضي.انت♂
سرتي VERB.P2FS فعل.ماضي.انت♀
سرتوا VERB.P2P فعل.ماضي.انتم⚥
سرتوا VERB.P2MP فعل.ماضي.انتم♂
سرتن VERB.P2FP فعل.ماضي.انتن
سار VERB.P3MS فعل.ماضي.هو
سارت VERB.P3FS فعل.ماضي.هي
ساروا VERB.P3P فعل.ماضي.هم⚥
ساروا VERB.P3MP فعل.ماضي.هم♂
سارن VERB.P3FP فعل.ماضي.هن
اسير VERB.I1S فعل.مضارع.انا
نسير VERB.I1P فعل.مضارع.نحن
تسير VERB.I2MS فعل.مضارع.انت♂
تسيرين VERB.I2FS فعل.مضارع.انت♀
تسيرون VERB.I2P فعل.مضارع.انتم⚥
تسيرون VERB.I2MP فعل.مضارع.انتم♂
تسيرن VERB.I2FP فعل.مضارع.انتن
يسير VERB.I3MS فعل.مضارع.هو
تسير VERB.I3FS فعل.مضارع.هي
يسيرون VERB.I3P فعل.مضارع.هم⚥
يسيرون VERB.I3MP فعل.مضارع.هم♂
يسيرن VERB.I3FP فعل.مضارع.هن
سير VERB.C2MS فعل.أمر.أنت♂
سيري VERB.C2FS فعل.أمر.أنت♀
سيروا VERB.C2P فعل.أمر.انتم⚥
سيروا VERB.C2MP فعل.أمر.انتم♂
سيرن VERB.C2FP فعل.أمر.انتن
EGY: مِشِي 'walk;proceed'
Word Form POS.FEATURES قسم_الكلام.الخصائص
مشيت VERB.P1S فعل.ماضي.انا
مشينا VERB.P1P فعل.ماضي.نحن
مشيت VERB.P2MS فعل.ماضي.انت♂
مشيتي VERB.P2FS فعل.ماضي.انت♀
مشيتوا VERB.P2P فعل.ماضي.انتم⚥
مشي VERB.P3MS فعل.ماضي.هو
مشيت VERB.P3FS فعل.ماضي.هي
مشيوا VERB.P3P فعل.ماضي.هم⚥
امشي VERB.I1S فعل.مضارع.انا
نمشي VERB.I1P فعل.مضارع.نحن
تمشي VERB.I2MS فعل.مضارع.انت♂
تمشي VERB.I2FS فعل.مضارع.انت♀
تمشوا VERB.I2P فعل.مضارع.انتم⚥
يمشي VERB.I3MS فعل.مضارع.هو
تمشي VERB.I3FS فعل.مضارع.هي
يمشوا VERB.I3P فعل.مضارع.هم⚥
امشي VERB.C2MS فعل.أمر.أنت♂
امشي VERB.C2FS فعل.أمر.أنت♀
امشوا VERB.C2P فعل.أمر.انتم⚥
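
The verb tags in the tables above pack aspect, person, gender, and number into a single feature string. The sketch below splits them apart. It assumes the reading P = perfective, I = imperfective, C = imperative (command), as suggested by the Arabic glosses, and it only covers the singular and plural values that appear in these dialectal tables; the function name is hypothetical.

```python
import re
from typing import Dict, Optional

# Aspect letters as read from the Arabic glosses (assumption).
ASPECT = {"P": "perfective", "I": "imperfective", "C": "imperative"}

# Gender is optional: tags such as P1S or I2P leave it unmarked.
_VERB_TAG = re.compile(
    r"^VERB\.(?P<aspect>[PIC])(?P<person>[123])"
    r"(?P<gender>[MF])?(?P<number>[SP])$"
)


def parse_verb_tag(tag: str) -> Dict[str, Optional[str]]:
    """Split a verb tag such as VERB.P3MS into aspect/person/gender/number."""
    match = _VERB_TAG.match(tag)
    if match is None:
        raise ValueError(f"not a verb POS.features tag: {tag!r}")
    return match.groupdict()


for tag in ("VERB.P1S", "VERB.I2FS", "VERB.C2P"):
    parts = parse_verb_tag(tag)
    print(tag, ASPECT[parts["aspect"]], parts["person"],
          parts["gender"], parts["number"])
```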

Examples for MSA will be added soon

4.2.15 VERB_PSEUDO - شبه_فعل

Pseudo Verbs
Pseudo verbs are words that have the same syntactic behavior as verbs in that they take a subject and a predicate, or a sentential complement.

Notes:

  • Pseudo verbs don't take any features

Examples

Word Form POS.Features قسم الكلام.الخصائص English Gloss Dialect
Example
ريت VERB_PSEUDO شبه_فعل if only;wish GLF,EGY
يا ريتْني ما جيت
ترى VERB_PSEUDO شبه_فعل by the way;for your information GLF
ترى الرحلة طويلة لإيطاليا
تو VERB_PSEUDO شبه_فعل just now;at the moment GLF
كانت توها داشة الفيلا

Examples for MSA and other dialects will be added soon

4.2.16 VERB_NOM - اسم_فعل

Non-Inflectional verbs, also called Frozen Verbs
These are frozen expressions that behave like verbs syntactically but not morphologically. From a morphological point of view they are not inflectional, meaning that they do not inflect for all their tenses, sometimes none, and they do not have gender/number agreement. Syntactically, they subcategorize for arguments in the form of prepositional phrases and direct objects.

Examples

Word Form POS.Features قسم الكلام.الخصائص English Gloss Dialect
(ما) عدا VERB_NOM اسم_فعل except GLF,EGY
آمين VERB_NOM اسم_فعل Amen GLF,EGY
اف VERB_NOM اسم_فعل ugh GLF,EGY
اخص VERB_NOM اسم_فعل Shame on you! EGY
آه VERB_NOM اسم_فعل Ah! GLF,EGY
اوه VERB_NOM اسم_فعل Ah! GLF,EGY
ما VERB_NOM اسم_فعل not GLF,EGY
حاشا VERB_NOM اسم_فعل except GLF,EGY
يالله VERB_NOM اسم_فعل hurry up;come on GLF,EGY
ايا VERB_NOM اسم_فعل watch out GLF,EGY

Examples for MSA will be added soon

4.2.17 ADV - ظرف

Adverbs
Adverbs are invariable and terminal words that give information about the time, location, manner, cause, purpose, or any other adverbial function modifying the verb or sentence.

Notes:

  • A word is invariable when it does not participate in an idafa construction. A word is terminal when nothing modifies it.
  • Some adverbs take pronominal clitics; in such cases, the pronouns are cliticized as usual, for example يا دوب +ك. Also note that adverbs with initial يا are treated as two separate words.
  • Adverbs don't take features.

Examples

Word Form POS.Features قسم_الكلام.الخصائص English Gloss Dialect
لسة ADV ظرف still EGY
يا دوب ADV ظرف It’s high time; a little;yet;still;just EGY
دلوقت ADV ظرف Now;at this moment;at this time;at present EGY
هنا ADV ظرف here GLF,EGY
هناك ADV ظرف there GLF,EGY
كمان ADV ظرف also;too GLF,EGY
برضك ADV ظرف also;too;nevertheless;even so;all the same. intensifier « really, surely » EGY
برضو ADV ظرف also;too;nevertheless;even so;all the same. intensifier « really, surely » EGY
بعدين ADV ظرف later;next GLF,EGY
امال ADV ظرف hence;then;so EGY
بس ADV ظرف only;enough GLF,EGY
بقى ADV ظرف so;then EGY
بعد ADV ظرف also;still GLF
هني ADV ظرف here GLF
سيدا ADV ظرف straight ahead GLF

Examples for MSA will be added soon

4.2.18 ADV_INTERROG - ظرف_استفهام

Interrogative Adverbs
Interrogative adverbs are invariable words that introduce questions that give specific information about time, location, manner, or purpose.

Notes:

  • Interrogative Adverbs don't take any features

Examples

Word Form POS.Features قسم الكلام.الخصائص English Gloss Dialect
شلون ADV_INTERROG ظرف_استفهام how GLF
كيف ADV_INTERROG ظرف_استفهام how GLF
وشي ADV_INTERROG ظرف_استفهام how GLF
متى ADV_INTERROG ظرف_استفهام when GLF
وين ADV_INTERROG ظرف_استفهام where GLF
ليش ADV_INTERROG ظرف_استفهام why GLF

Examples for MSA and other dialects will be added soon

4.2.19 ADV_REL - ظرف_موصول

Relative Adverbs
Relative adverbs are invariable words that introduce adverbial relative clauses that give specific information about time, location, manner, or purpose.

Notes:

  • Relative Adverbs don't take any features

Examples

Word Form POS.Features قسم الكلام.الخصائص English Gloss Dialect
Example
وين ADV_REL ظرف_موصول where GLF
لازم تخبرني وين سرت
فين ADV_REL ظرف_موصول where EGY
لازم تقول لي رحت فين
أين ADV_REL ظرف_موصول where MSA
أخبرني إلى أين ذهبت

4.2.20 PREP - حرف_جر

Prepositions
The term preposition is used to represent the closed class of items which have traditionally been identified as prepositions in Arabic.

Notes:

  • Prepositions don't take any features.

Examples

Word Form POS.Features قسم الكلام.الخصائص English Gloss Dialect
في PREP حرف_جر in, at MSA,GLF,EGY
مع PREP حرف_جر with MSA,GLF,EGY
ويّا PREP حرف_جر with MSA,GLF,EGY
ب+ PREP حرف_جر with MSA,GLF,EGY
و+ PREP حرف_جر by MSA,GLF,EGY

4.2.21 INTERJ - تعجب

Interjections
Interjections are words or phrases (response particles) that express the speaker’s reaction to a particular proposition or sentence.

Notes:

  • Interjections don't take any features.

Examples

Word Form POS.Features قسم الكلام.الخصائص English Gloss Dialect
بس INTERJ تعجب enough GLF,EGY
مرحبا INTERJ تعجب hello GLF,EGY
الو INTERJ تعجب hello (on phone) GLF,EGY
يالله INTERJ تعجب hurry up;come on! GLF,EGY
لأ INTERJ تعجب no GLF,EGY
انزين INTERJ تعجب OK GLF
اوكيه INTERJ تعجب OK GLF,EGY
حشى INTERJ تعجب GLF

Examples for MSA will be added soon

4.2.22 CONJ - حرف_عطف

Coordinating Conjunctions
Conjunctions are used to coordinate and link independent constituents with each other.

Notes:

  • Conjunctions don't take any features.

Examples

Word Form POS.Features قسم الكلام.الخصائص English Gloss Dialect
بس CONJ حرف_عطف But GLF,EGY
و+ CONJ حرف_عطف and MSA,GLF,EGY
ف+ CONJ حرف_عطف and, then MSA,GLF,EGY
ولا CONJ حرف_عطف or (in questions) GLF,EGY

4.2.23 CONJ_SUB - أداة_ربط

Subordinating Conjunctions
A subordinating conjunction marks a clause as dependent on another, independent clause, which is called the main clause.

Notes:

  • Subordinating conjunctions don't take any features

Examples

Word Form POS.Features قسم الكلام.الخصائص English Gloss Dialect
و+ CONJ_SUB أداة_ربط while MSA,GLF,EGY

More examples will be added soon

4.2.24 PART_VOC - حرف_نداء

Vocative Particles

Notes:

  • Vocative Particles don't take any features

Examples

Word Form POS.Features قسم الكلام.الخصائص English Gloss Dialect
يا PART_VOC حرف_نداء O!;hey! MSA,GLF,EGY

More examples will be added soon

4.2.25 PART_RESTRICT - اداة_استثناء

Restrictive Particles
A restrictive particle is used in a negative construction marking a restriction.

Notes:

  • Restrictive Particles don't take any features

Examples

Word Form POS.Features قسم الكلام.الخصائص English Gloss Comments/Examples/Dialect
الا PART_RESTRICT اداة_استثناء except for;only MSA,GLF,EGY

More examples will be added soon

4.2.26 PART_NEG - اداة_نفي

Negative Particles
A negative particle negates what comes after it.

Notes:

  • Negative particles don't take any features

Examples

Word Form POS.Features قسم الكلام.الخصائص English Gloss Dialect
م+ PART_NEG اداة_نفي not GLF,EGY
ما PART_NEG اداة_نفي not MSA,GLF,EGY
مب PART_NEG اداة_نفي not GLF
هب PART_NEG اداة_نفي not GLF
لا PART_NEG اداة_نفي not;neither;nor MSA,GLF,EGY

More examples will be added soon

4.2.27 PART_DET - اداة_تعريف

Determiner Particles
A clitic that attaches to nominals.

Notes:

  • Determiner particles don't take any features

Examples

Word Form POS.Features قسم الكلام.الخصائص English Gloss Comments/Examples/Dialect
ال+ PART_DET اداة_تعريف the MSA,GLF,EGY

4.2.28 PART_INTERROG - اداة_استفهام

Interrogative Particles
Interrogative particles introduce questions.

Notes:

  • Interrogative particles don't take any features

4.2.29 PART_FUT - اداة_استقبال

Future Particles
Future particles mark the future when attached to imperfective verbs.

Notes:

  • Future particles don't take any features

Examples

Word Form POS.Features قسم الكلام.الخصائص English Gloss Dialect
ب+ PART_FUT اداة_استقبال will GLF
ح+ PART_FUT اداة_استقبال will EGY
ه+ PART_FUT اداة_استقبال will EGY
س+ PART_FUT اداة_استقبال will MSA
سوف PART_FUT اداة_استقبال will MSA
رح PART_FUT اداة_استقبال will GLF

4.2.30 PART_FOCUS - اداة_تفصيل

Focus Particles
Focus particles highlight the topic of the sentence or add emphasis.

Notes:

  • Focus particles don't take any features.

Examples

Word Form POS.Features قسم الكلام.الخصائص English Gloss Dialect
اما PART_FOCUS اداة_تفصيل as for MSA,GLF,EGY

More examples will be added soon

4.2.31 PART_EMPHATIC - اداة_توكيد

Emphatic Particles
Emphatic particles add emphasis.

Notes:

  • Emphatic particles don't take any features.

Examples

Word Form POS.Features قسم الكلام.الخصائص English Gloss Comments/Examples/Dialect
ل+ PART_EMPHATIC اداة_توكيد that MSA,GLF,EGY

More examples will be added soon

4.2.32 PART_RC - جواب_شرط

Response Conditional Particles
Response conditional particles are used in conditional sentences introducing the apodosis sentence/main clause.

Notes:

  • Response conditional particles don't take any features.

Examples

Word Form POS.Features قسم الكلام.الخصائص English Gloss Comments/Examples/Dialect
ف+ PART_RC جواب_شرط so then MSA,GLF,EGY

More examples will be added soon

4.2.33 PART - حرف

Particles
Particles do not assign case and they can be omitted without affecting or altering meaning and/or structure.

Notes:

  • Particles don't take any features.

Examples

Word Form POS.Features قسم الكلام.الخصائص English Gloss Dialect
Example
و+ PART حرف and GLF,EGY
جان PART حرف if, and so GLF
لولاها جان ما كانت الكل بالكل

More examples will be added soon

4.2.34 PART_PROG - حرف_مضارعة

Progressive Particle
A progressive particle denotes that the action of the verb is in progress.

Notes:

  • Progressive particles attach to imperfective verbs only.
  • Progressive particles don't take any features.

Example

Word Form POS.Features قسم الكلام.الخصائص English Gloss Dialect
Example
ب+ PART_PROG حرف_مضارعة EGY

More examples will be added soon

4.2.35 PART_CONNECT - حرف_ربط

Connective Particles
Connective particles connect two clauses. They are most commonly used to introduce a comment clause after a clause starting with اما.

Notes:

  • Connective particles don't take any features

Examples

Word Form POS.Features قسم الكلام.الخصائص English Gloss Dialect
ف+ PART_CONNECT حرف_ربط {Discourse connective} GLF,EGY
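
Several forms listed in the tables above appear under more than one POS tag (for example و+, ف+, ب+, and بس). The following sketch collects a few of those rows into a lookup of candidate tags per form and dialect. The row list is copied from the tables in this section and is not exhaustive; the structure and function name are hypothetical, and choosing among the candidates still depends on context.

```python
from typing import List, Tuple

# (form, POS tag, dialects) rows copied from the tables in this section.
ROWS: List[Tuple[str, str, str]] = [
    ("و+", "PREP", "MSA,GLF,EGY"),
    ("و+", "CONJ", "MSA,GLF,EGY"),
    ("و+", "CONJ_SUB", "MSA,GLF,EGY"),
    ("و+", "PART", "GLF,EGY"),
    ("ف+", "CONJ", "MSA,GLF,EGY"),
    ("ف+", "PART_RC", "MSA,GLF,EGY"),
    ("ف+", "PART_CONNECT", "GLF,EGY"),
    ("ب+", "PREP", "MSA,GLF,EGY"),
    ("ب+", "PART_FUT", "GLF"),
    ("ب+", "PART_PROG", "EGY"),
    ("بس", "ADV", "GLF,EGY"),
    ("بس", "INTERJ", "GLF,EGY"),
    ("بس", "CONJ", "GLF,EGY"),
]


def candidate_tags(form: str, dialect: str) -> List[str]:
    """Return the POS tags listed for a form in a given dialect, in table order."""
    return [
        tag
        for row_form, tag, dialects in ROWS
        if row_form == form and dialect in dialects.split(",")
    ]


# The same proclitic has different candidate tags depending on the dialect.
print(candidate_tags("ب+", "GLF"))
print(candidate_tags("ب+", "EGY"))
```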

4.3 Additional Annotation Tasks

4.3.1 Lemmatization

The lemma is the citation form of the word. Across all our guidelines, we follow the lemma specification in (Graff et al., 2009), where:

  • The lemma of all nominals is the masculine singular form of the word or the feminine singular form if no masculine form exists.
  • The lemma of a verb is the perfective 3rd person masculine singular form.
  • For all others (i.e. particles, adverbs, ... etc) the lemma is the same form as the baseword.

Notes:

  • For some nominal cases such as اسم الوحدة ‘collective plurals’ which are also uncountable nouns, the lemma is the same as the noun. See examples below.
  • The diacritization of the lemma includes adding all the short vowel diacritics except for the sukun ‘absence of a vowel’.
  • Cases to look out for are the /oo/ and /ee/ long vowels. The vowel /oo/ is marked as ـَو, /ee/ is marked either as ـَي or ـِا.
  • In the case of long vowels /aa/, /uu/, and /ii/ a short vowel marker of the same kind precedes the long vowel (i.e. ـَا, ـُو, and ـِي).
Word Form Lemma POS English Dialect
Comments
كتب كِتاب NOUN.MP books GLF,EGY
كتبوا كَتَب VERB.P3MP They wrote GLF,EGY
تفاح تُفَّاح NOUN.MS apples GLF,EGY
collective plurals
تفاحة تُفَّاحَة NOUN.FS apple GLF,EGY
تفاحات تُفَّاحَة NOUN.FP apples GLF,EGY
تمر تَمر NOUN.MS dates (fruit) GLF,EGY
collective plurals
تمور تَمر NOUN.MP dates (fruit) GLF,EGY
تمرة تَمرَة NOUN.FS date (fruit) GLF,EGY
تمرات تَمرَة NOUN.FP dates (fruit) GLF,EGY
ناس نَاس NOUN.MP people, humans GLF,EGY
collective plurals
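
The lemma rules above can be sketched as a small selection function. This is a minimal illustration under stated assumptions: the `paradigm` dictionary (feature string to diacritized form) is a hypothetical data structure, the set of nominal POS tags is only a partial guess, and the helper names are not from any existing tool. The second helper reflects the note that the sukun is left out of the lemma diacritization.

```python
from typing import Dict

# Partial, assumed set of POS tags treated as nominals in this sketch.
NOMINAL_POS = {"NOUN", "ADJ", "ADJ_NUM", "ADJ_COMP"}

SUKUN = "\u0652"  # Arabic sukun diacritic


def choose_lemma(pos: str, baseword: str, paradigm: Dict[str, str]) -> str:
    """Pick the citation form following the three lemma rules.

    paradigm maps feature strings (e.g. "MS", "FS", "P3MS") to forms;
    it is a hypothetical structure used only for illustration.
    """
    if pos in NOMINAL_POS:
        # Masculine singular, or feminine singular if no masculine exists.
        return paradigm.get("MS") or paradigm["FS"]
    if pos == "VERB":
        # Perfective 3rd person masculine singular.
        return paradigm["P3MS"]
    # Particles, adverbs, etc.: the lemma is the baseword itself.
    return baseword


def strip_sukun(lemma: str) -> str:
    """Remove sukun marks, which the lemma diacritization leaves out."""
    return lemma.replace(SUKUN, "")


print(choose_lemma("NOUN", "كتب", {"MS": "كِتاب"}))
print(choose_lemma("NOUN", "تفاحات", {"FS": "تُفَّاحَة"}))
print(choose_lemma("VERB", "كتبوا", {"P3MS": "كَتَب"}))
```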

4.3.2 Gloss

The English gloss refers to the semantic translation of the Arabic lemma.

Notes:

  • For nominals the gloss is the singular form of the word.
  • For verbs the gloss is the infinitive form.
Lemma Word Form POS. English Gloss Comments/Examples/Dialect
كِتاب كتب NOUN.MP book GLF,EGY
كَتَب كتبوا VERB.P3MP write

4.3.3 Dialect Identification

Dialect identification (DID) is the task of tagging a certain context with a given dialect tag.

Deciding the dialect tag depends on the context of the sentence and/or the document. As dialects may share words among themselves or with MSA, the dialect is inferred from the sentence structure and word order characteristic of that specific dialect.

Although all words belonging to the same sentence may get the same dialect tag, in some cases two different dialectal structures can occur in the same sentence; hence we tag per word.
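
As a rough illustration of per-word dialect tagging, a sentence can be stored as a list of (word, dialect tag) pairs, from which it is easy to check whether more than one dialect occurs. The representation and function names are hypothetical. The first sentence reuses a GLF example from the relative adverbs section; the mixed sentence uses placeholder tokens rather than real data.

```python
from typing import List, Set, Tuple

TaggedWord = Tuple[str, str]  # (word form, dialect tag)


def dialects_in(sentence: List[TaggedWord]) -> Set[str]:
    """Return the set of dialect tags used in a sentence."""
    return {dialect for _, dialect in sentence}


def is_mixed(sentence: List[TaggedWord]) -> bool:
    """True when the words of a sentence carry more than one dialect tag."""
    return len(dialects_in(sentence)) > 1


# Every word of this GLF example gets the same tag.
glf_sentence = [(word, "GLF") for word in "لازم تخبرني وين سرت".split()]

# Placeholder tokens standing in for a sentence with two dialectal structures.
mixed_sentence = [("w1", "GLF"), ("w2", "GLF"), ("w3", "EGY")]

print(is_mixed(glf_sentence), is_mixed(mixed_sentence))
```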

5. Acknowledgments

  • This work was funded by a Research Enhancement Fund from New York University Abu Dhabi
  • Portions of the Egyptian Arabic Guidelines are based on the LDC's Egyptian Arabic Morphological Guidelines (Maamouri et al., 2013)
  • Portions of the Gulf entries are from the textbook: Ramsah, An Introduction to learning Emirati Dialect and Culture (Nasser Isleem and Ayesha Al Hashemi, 2015)

  1. Refer to the phonology guidelines for the complete CAPHI reference.  

  2. 'Ta Marbuta' suffix is usually used to mark the feminine gender. 

  3. Colorless green ideas sleep furiously 

  4. The feminine singular form مَلِكَة translates to 'Queen' in English.