
Received: 21 April 2018 Revised: 21 January 2019 Accepted: 15 February 2019

DOI: 10.1111/exsy.12397

ORIGINAL ARTICLE

Creating sentiment lexicon for sentiment analysis in Urdu: The case of a resource‐poor language

Muhammad Zubair Asghar1 | Anum Sattar1 | Aurangzeb Khan2 | Amjad Ali3 | Fazal Masud Kundi1 | Shakeel Ahmad4

1 Institute of Computing and Information Technology (ICIT), Gomal University, Dera Ismail Khan, Pakistan
2 Department of Computer Science, University of Science and Technology, Bannu, Pakistan
3 Department of Computer and Software Technology, University of Swat, Saidu Sharif, Pakistan
4 Faculty of Computing and Information Technology in Rabigh (FCITR), King Abdul Aziz University (KAU), Jeddah, Saudi Arabia

Correspondence
Muhammad Zubair Asghar, Institute of Computing and Information Technology (ICIT), Gomal University, Dera Ismail Khan, KP, Pakistan.
Email: zubair@gu.edu.pk

Abstract
Sentiment analysis (SA) applications are becoming popular among individuals and organizations for gathering and analysing users' sentiments about products, services, policies, and current affairs. Due to the availability of a wide range of English lexical resources, such as part-of-speech taggers, parsers, and polarity lexicons, the development of sophisticated SA applications for the English language has attracted many researchers. Although there have been efforts to create polarity lexicons in non-English languages such as Urdu, they suffer from many deficiencies, such as the lack of publicly available sentiment lexicons with a proper scoring mechanism for opinion words and modifiers. In this work, we present a word-level translation scheme for creating a first comprehensive Urdu polarity resource, “Urdu Lexicon,” using a merger of existing resources: a list of English opinion words, SentiWordNet, an English–Urdu bilingual dictionary, and a collection of Urdu modifiers. We assign two polarity scores, positive and negative, to each Urdu opinion word. Moreover, modifiers are collected, classified, and tagged with proper polarity scores. We also perform an extrinsic evaluation in terms of subjectivity detection and sentiment classification, and the evaluation results show that the polarity scores assigned by this technique are more accurate than the baseline methods.

KEYWORDS

polarity lexicon, sentiment analysis, Urdu sentiment lexicon, Urdu SentiWordNet

1 | INTRODUCTION

The Web is a rich resource of information available for the online community to learn about services, policies, issues, and products (Asghar, Khan, Bibi, Kundi, & Ahmad, 2017). Online blogs, news, and posts on social media sites such as Facebook and Twitter have attracted many researchers in the area of opinion mining and sentiment analysis (SA) to develop applications that assist individuals, companies, and government organizations in decision-making (Khan, Asghar, Ahmad, Kundi, & Ismail, 2017).
Sentiment lexicons are repositories that store opinionated terms along with their sentiment class (+ive, −ive, or neutral) and numeric scores. For example, “awesome,” “superb,” and “love” are positive terms, whereas “hate,” “painful,” and “terrible” are negative sentiment terms. Such sentiment lexicons are considered a driving wheel for the development of most sentiment analysis systems.
Owing to the availability of powerful lexical repositories for the English language, developing sentiment analysis applications is relatively easier for English than for languages with poor resources. Different sentiment lexicons are available for English, such as SentiWordNet (Baccianella, Esuli, & Sebastiani, 2010), General Inquirer (Stone & Hunt, 1963), WordNet Affect (Mukhtar & Khan, 2018), and opinion word lists (Wilson, Wiebe, & Hoffmann, 2005; Liu & Hu, 2004).

Expert Systems. 2019;e12397. wileyonlinelibrary.com/journal/exsy © 2019 John Wiley & Sons, Ltd 1 of 19
https://doi.org/10.1111/exsy.12397
2 of 19 ASGHAR ET AL.

There are three major techniques for creating and annotating sentiment lexicons, namely, (a) manual, (b) dictionary-based, and (c) corpus-based (Asghar et al., 2017). The manual annotation scheme relies on dictionaries built by hand by a group of linguists of the target language. Such lexicons are reliable; however, their development is time-consuming and subject to annotator bias, so they are not used independently but together with automated approaches to minimize the errors committed during the semantic orientation of sentiment words (Mukhtar, Khan, & Chiragh, 2017). The dictionary-based approach takes a list of initial seed words and expands it over other lexical resources, such as WordNet and SentiWordNet (Asghar, Khan, Ahmad, Khan, & Kundi, 2015). The key drawback of this technique is its limited coverage of the words required for processing domain-specific content. The corpus-based technique depends mainly on a corpus of user reviews labelled with +ive and −ive classes of sentiment words and gives maximum coverage to domain-specific content.
The aforementioned approaches for sentiment lexicon generation are widely used for creating sentiment lexicons to process English text. Several studies (Afraz, Muhammad, & Martinez-Enriquez, 2011; Awais, 2012; Badaro, Baly, Hajj, Habash, & El-Hajj, 2014; Bakliwal, Arora, & Varma, 2012; Dashtipour et al., 2016; Dehkharghani, Saygin, Yanikoglu, & Oflazer, 2016) have been conducted to perform sentiment analysis in languages other than English; however, to date, most of the research efforts in the area of sentiment analysis deal with English text (Mukhtar et al., 2017). This is because the extraction and analysis of sentiments from text need a rich collection of lexical resources for the language concerned. Unlike English, Urdu is a resource-poor language, and therefore, the creation of a sentiment lexicon for Urdu text is an important and challenging task (Afraz et al., 2011).
Previous work on Urdu sentiment lexicons (Afraz et al., 2011; Awais, 2012) is based on morphological, syntactic, phonological, and orthographic aspects. However, these approaches did not provide sufficient coverage of Urdu sentiment words along with their sentiment scores. Such studies have merely focused on storing and classifying words into objective (neutral) and subjective (positive or negative) classes, with no provision for assigning numeric scores to sentiment words, which is one of the basic requirements of most lexicon-based Urdu sentiment analysis systems. Furthermore, polarity modifiers are not covered in such lexicons; incorporating them can strengthen the sentiment lexicon and support more accurate sentiment classification.
The proposed technique is inspired by previous work on developing sentiment lexicons (Afraz et al., 2011; Awais, 2012; Das & Bandyopadhyay, 2010; Ijaz & Hussain, 2007). The previous studies integrated morphological concepts with corpus-based techniques by classifying words according to their part of speech, concepts, and synsets using pre-annotated corpora. In contrast, we propose a word-level translation technique to generate an Urdu sentiment lexicon based on an opinion word list, an English–Urdu bilingual dictionary, a collection of Urdu modifiers, and a sentiment scoring technique for opinion words and modifiers. Mohammad, Dunne, and Dorr (2009), in their work on English sentiment lexicon construction, ignored part-of-speech tagging and focused on classifying the words (Urdu) into positive and negative polarities along with their sentiment scores. However, our lexicon is not only a lexical resource of sentiment words with their polarity class, scores, and part of speech but is also publicly available. Finally, our lexicon also contains a set of positive and negative Urdu modifiers along with their sentiment scores. A synopsis of contributions is as follows:

• Proposes and implements a word-level translation mechanism for creating an Urdu sentiment lexicon.
• Translates English opinion words to the corresponding Urdu words using a bilingual dictionary.
• Proposes a novel sentiment scoring scheme for Urdu opinion words.
• Acquires Urdu modifiers and assigns their sentiment scores.
• Demonstrates the effectiveness of the proposed lexicon relative to the compared lexicons using extrinsic evaluation.

The rest of the paper is structured as follows. Section 2 discusses the challenges in Urdu sentiment analysis. Section 3 reviews the literature on the creation of sentiment lexicons in Urdu and in other languages. In Section 4, we describe the proposed method for creating Urdu Lexicon. Section 5 describes the dataset compilation and extrinsic evaluation of the developed lexicon. Section 6 concludes the work with a discussion on how it can be expanded in future.

2 | URDU AND ITS CHALLENGES IN SENTIMENT ANALYSIS

Urdu is the national language of Pakistan and is also widely spoken in India. Sentiment analysis in Urdu faces several challenges, including the lack of established lexical resources (Afraz et al., 2011; Ijaz & Hussain, 2007).
Most Urdu websites publish their content in graphic format rather than in a proper Urdu text encoding scheme, which makes it difficult to construct a machine-readable corpus. Therefore, the development of a gold-standard machine-readable corpus in Urdu is very important. The sentiment lexicon is a basic building block for the development of an automated sentiment analysis system in any language. English is a resource-rich language with a number of well-established sentiment lexicons, such as SentiWordNet. However, Urdu is a resource-poor language, and limited work has been performed on the creation of such a lexicon. To the best of our knowledge, there is no publicly available lexicon that can assist researchers and developers in performing sentiment scoring of opinion words and modifiers during the development of sentiment analysis systems. Therefore, construction of an Urdu sentiment lexicon is considered one of the most focused areas of research in Urdu text processing.
In addition to the aforementioned challenges, a number of other issues make it tedious and challenging to develop a fully functional sentiment analysis system. These include word segmentation, variation in morphology, flexibility in vocabulary, and case markers. As our work deals with the creation of a sentiment lexicon, we concentrate on the challenge of Urdu sentiment lexicon creation and propose an effective methodology to generate a valuable resource for both academia and industry.

3 | RELATED WORK

Several methods are proposed for the development of sentiment lexicons (Figure 1). In this section, we present some of the relevant works
performed for the creation of sentiment lexicons in English, non‐English, and Urdu languages.

3.1 | Sentiment lexicons in English

WordNet (Miller, Beckwith, Fellbaum, Gross, & Miller, 1990) is one of the most widely used lexicons for the development of sentiment analysis systems. It groups the synonyms of a word into synsets, and synsets are organized into verbs, adverbs, nouns, and adjectives. Synsets are linked to each other through different semantic relations, namely, hyponymy, meronymy, holonymy, and hypernymy. WordNet is considered a pivotal lexical resource for different sentiment analysis systems.
General Inquirer (GI) is a manually annotated English language sentiment lexicon. It classifies words into different categories, such as “positive,” “negative,” “hostile,” “power,” “active,” and “passive” (Stone & Hunt, 1963). It tags the parts of speech of a word with different types of linguistic information, namely, syntactic, semantic, and pragmatic.
SentiWordNet (SWN) is an English language sentiment lexicon, with more than 50,000 words retrieved automatically from the WordNet lexicon (Baccianella et al., 2010). Each entry in SWN is associated with three polarity scores: positive, negative, and neutral. These scores reflect the positivity, negativity, and objectivity of each word. Each score lies in the interval 0 to 1, such that the three scores sum to 1 for each synset.

3.2 | Sentiment lexicons in other languages

Kanayama and Nasukawa (2006), in their work on opinion lexicon construction, developed a sentiment lexicon for the Japanese language based on a general-purpose opinion lexicon and a repository of Japanese lexical items, called polar constructs. They used context coherency for obtaining the candidate polar items. However, the system was not capable of assigning sentiment scores to domain-specific words.
Banea, Mihalcea, and Wiebe (2008) presented a bootstrapping approach for constructing a sentiment lexicon using an online lexicon and a manually created seed list of subjective terms. The system is tested on a Romanian language corpus, and the seed list is classified on the basis of two measures: pointwise mutual information (PMI) and latent semantic analysis (LSA). These measures compute the similarity between the words present in the original candidate list and each of the associated words. LSA is much faster and requires fewer training samples than PMI. However, the outcomes of both techniques were the same. The system can be improved further by enriching the initial seed list.
Borzì, Faro, Pavone, and Sansone (2015) developed a lexical resource to be used in Italian language sentiment applications. It is composed of two repositories: an Italian sentiment dictionary containing words annotated with their sentiment scores and a set of more than 200 polarity modifiers. The authors claimed it as the first lexical resource that could assist in developing sentiment analysis systems in the Italian language.
Dehkharghani et al. (2016), in their work on sentiment lexicons, developed the first Turkish sentiment lexicon using a semiautomatic approach by assigning sentiment scores to all synsets in the Turkish WordNet using SentiWordNet and the pointwise mutual information technique. The results showed that the polarity scores assigned by their lexicon are more accurate than those of the baseline methods.

FIGURE 1 Literature classification diagram



Badaro et al. (2014) generated a publically available Arabic sentiment lexicon by combining different existing resources, such as English
WordNet, English SentiWordNet, Arabic WordNet, and Standard Arabic Morphological Analyzer (SAMA). The lexicon is evaluated using non‐linear
SVM classifier for subjectivity and sentiment classification and produced better results as compared with WordNet‐based technique.
In order to classify Hindi product reviews, Bakliwal et al. (2012) proposed a graph-based method to create a subjectivity lexicon consisting of adjectives and adverbs along with their polarity scores. They extended the initial seed lexicon by using both synonym and antonym relations. The limitations of their work include the lack of word sense disambiguation and the dependency on a pre-annotated corpus.
Das and Bandyopadhyay (2010) in their work on SentiWordNet creation for Indian languages proposed a bilingual dictionary‐based approach
using SentiWordNet and subjectivity lexicon. The proposed lexicon addressed three Indian languages, namely, Bengali, Hindi, and Telugu. To
evaluate the performance of the resulting lexicon, different extrinsic and intrinsic evaluation methodologies have been used. Furthermore, they
developed an online sentiment evaluation game to measure the performance of the final lexicon.

3.3 | Sentiment lexicons in Urdu language

While working on a sentiment-annotated Urdu lexicon, Afraz et al. (2011) proposed a method to distinguish between the subjective and objective terms in a text. In the next step, the semantic orientation of the subjective text (positive or negative) is checked, and sentiment words are intensified accordingly. For example, “بہت خوبصورت” (bohat khoubsurat, very beautiful) is a subjective phrase in which the intensifier “بہت” (bohat, very) shows the intensity of the opinion word “خوبصورت” (khoubsurat, beautiful). They achieved satisfactory results, with an accuracy of 74%. However, the lexicon lacks sentiment scores for opinion words, and modifiers and their sentiment scores are not addressed.
In an early work on Urdu sentiment analysis, Afraz et al. (2011) presented a framework for analysing the text, by identifying and extracting the
sentiment information expressed in the review. The proposed method operates on two major steps: creating sentiment‐annotated lexicon and
building a classification model for sentiment classification. They achieved 72% accuracy for movie domain and 72% for product domain. However,
an extension of modifiers by adding more adjectives and enhancement of lexicon by including sentiment scores of opinion words could improve
their system.
In another work on Roman–Urdu text processing, Javed and Afzal (2013) developed a bilingual sentiment analysis system for English and
Roman–Urdu. They used a bilingual classifier to separate and classify English and Roman–Urdu tweets. For this purpose, bilingual sentiment
lexicon is created using SentiStrength, WordNet, and a bilingual list of words. The major drawback of their system is that they considered only
Roman–Urdu text and provided no mechanism to deal with pure Urdu language text.
An overview of recent sentiment lexicon creation techniques is presented in Table 1. Although substantial work has been carried out in English and in other languages such as Turkish, Arabic, and Hindi, there is still considerable room for improvement in languages with poor lexical resources, such as Urdu. Compared with other languages, there are few studies on the creation of an Urdu sentiment lexicon. Furthermore, such studies lack a proper mechanism for assigning polarity scores to sentiment words, and there is no provision for modifiers and their polarity scores. In order to bridge this gap, there is a need to create a comprehensive sentiment lexicon for Urdu that provides sufficient coverage of sentiment words and modifiers along with accurate polarity scores. Figure 2 shows the block diagram of the proposed lexicon.

3.4 | Sentiment analysis in Urdu

In this section, we present some of the latest works carried out in Urdu SA.
An Urdu SA system is proposed by Rehman and Bajwa (2016) using different lexical resources such as English and Urdu lexicons. An efficient filtering strategy is proposed to discard irrelevant words, and the system achieves an accuracy of more than 65%. The major limitation of their work is the use of a less efficient sentiment scoring technique with no coverage of emoticons and slang terms.
To classify Roman Urdu text, Bilal, Israr, Shahid, and Khan (2016) proposed a supervised machine learning technique by applying different machine learning classifiers, namely, KNN, decision tree, and naïve Bayes. The experimental results show that naïve Bayes yielded improved results in terms of different evaluation metrics such as accuracy, precision, recall, and F-measure. The major limitation is the small size of the dataset, which limited the improvement in results.
Hashim and Khan (2016) proposed a lexicon-based technique for SA of Urdu text in headline news. They proposed nouns and adjectives as opinion carriers and achieved an accuracy of 86%. The main drawback of the approach is the limited number of sentiment words stored in the lexicon. Further improvement can be made by using an extended set of seed terms.
A supervised machine learning based approach is proposed by Mukhtar and Khan (2018) for the sentiment classification of Urdu blogs, using a dataset of 151 blogs from 14 categories. They used different machine learning classifiers such as KNN, decision tree, SVM, and IBK. The experimental results show that IBK performed better than the other classifiers. A concept-level paradigm in Urdu SA can produce improved results.
TABLE 1 Overview of studies on sentiment lexicons

Study | Approach | Type | Domain | Results
Dehkharghani et al. (2016) | Hybrid: SentiWordNet; pointwise mutual information | Turkish sentiment lexicon | Turkish reviews | 86.11% to 91.11% accuracy
Badaro et al. (2014) | Dictionary-based: English WordNet; Arabic WordNet; English SentiWordNet; SAMA | Arabic sentiment lexicon | Arabic news and blogs | Average F-measure of more than 65%
Bakliwal et al. (2012) | Corpus-based: pre-annotated seed list; breadth-first traversal for seed expansion | Hindi sentiment lexicon | Product reviews | 79% accuracy on sentiment classification of product reviews; more than 70% agreement with human annotators
Das and Bandyopadhyay (2010) | Hybrid: corpus-based; WordNet; bilingual dictionary-based translation | Sentiment lexicon for three Indian languages: Bengali, Hindi, and Telugu | News and blogs | More than 80% coverage of target words
Ijaz and Hussain (2007) | Corpus-based: POS tags; lemmas; phonemic transcription | Urdu lexicon without polarity classification | Data collected from different offline and online sources in different domains such as news, sports, finance, and others | Comparison of total number of words and number of unique words obtained from raw and clean corpus
Afraz et al. (2011) | Objective and subjective word identification; linguistic features such as orthographic, phonological, syntactic, and morphological features | Urdu lexicon with polarity classification, but no provision of sentiment scores for opinion words | Manually compiled reviews about movies and electronics | 74–79% accuracy for positive reviews and 66–74% accuracy on negative reviews
Javed and Afzal (2013) | Lexicon-based: SentiStrength; English to Roman–Urdu dictionary; bigram-based cosine similarity; Dice coefficient and Jaccard similarity | Roman–Urdu sentiment lexicon on a limited scale | Roman–Urdu tweets collected from Twitter | Lack of result comparison with state-of-art methods


FIGURE 2 Block diagram of proposed lexicon

4 | MATERIAL AND METHODS

4.1 | Resources

We have used the following resources during the development of Urdu Lexicon.
Bing Liu's list of opinion words: This list is composed of more than 6,000 sentiment terms tagged as +ive and −ive (Asghar et al., 2015).
English SentiWordNet: SentiWordNet (SWN; Wilson et al., 2005) is a lexical dictionary that assigns three sentiment scores, +ive, −ive, and objective, to each entry.
English to Urdu bilingual dictionary: We used the Oxford English–Urdu Dictionary (Haqqee, 2015) to translate each English opinion word to the corresponding Urdu word.
Urdu Opinion Lexicon (Chaoticity): A list of positive (2,607) and negative (4,728) Urdu opinion words (Awais, 2012).

4.2 | Construction

The problem addressed in this work is to construct a sentiment lexicon for Urdu, depicting the sentiment scores for all of the opinion words
translated from English opinion words. The assigned sentiment values are the score triplets showing the sentiment strength of each word. More-
over, Urdu modifiers along with their sentiment scores are also included. The construction steps are presented in Figures 3 and 4 and Algorithms 1
and 2; an explanation of each step is given in the subsequent subsections.

FIGURE 3 Flowchart of sentiment scoring of Urdu opinion words

To construct the Urdu sentiment lexicon, a word-level translation strategy is used (Das & Bandyopadhyay, 2010; Stone & Hunt, 1963). Two thousand positive and negative opinion words are selected from Bing Liu's collection (Liu & Hu, 2004). The proposed technique of lexicon generation for opinion words (Figure 3) works as follows:

1. A word from the list of opinion words (Liu & Hu, 2004) is looked up in SentiWordNet (SWN; Baccianella et al., 2010).
2. Its average sentiment score is determined from SWN by using Equations 1–4. For example, the word “compassionate” has an average positive score of +0.3125. Details of sentiment scoring are presented in Section 4.3.
3. The Urdu translation of the word is looked up in an English to Urdu dictionary (Haqqee, 2015; http://www.urduenglishdictionary.org/; http://hamariweb.com/dictionaries/urdu-english-dictionary.aspx?eu=brilliantly). As such bilingual dictionaries return more than one translation of a word, we select the first meaning, ignoring less common, literary, and poetic translations (Zafar et al., 2012). For example, the positive opinion word “compassionate” in Table 2 has multiple meanings in Urdu; however, we select the first one, as it is the nearest translation.
4. As we use a word-level bilingual translation strategy, the sentiment score of the English opinion word computed at Step 2 is assigned to its equivalent Urdu word. Therefore, the word “رحمدل” (rehm-dil, compassionate) receives a positive score of +0.3125.

FIGURE 4 Modifier collection and scoring mechanism in Urdu Lexicon

Algorithm 1 Urdu sentiment lexicon generation

Input:
(i) O-L: List of +ive and −ive English words taken from Opinion Lexicon
(ii) SWN: SentiWordNet sentiment lexicon for sentiment scoring
(iii) EU-Dict: English to Urdu dictionary
Output:
UOW-List: List of +ive and −ive Urdu opinion words annotated with sentiment scores
Begin
1. For each English word ew ∈ O-L do
2.   Translate English word ew to Urdu word uw using EU-Dict
3.   If (ew exists in SWN) then
4.     Compute sentiment score of ew using Eq. 1, Eq. 2, Eq. 3, and Eq. 4
5.   End if
6.   If (ew exists in SWN) and (SWN-based score of ew computed at Step 4 is correct) then
7.     Compute sentiment score of Urdu word uw using Eq. 5
8.   Else (SWN-based score of ew is incorrect, or ew does not exist in SWN)
9.     Compute sentiment score of Urdu word uw using Eq. 6
10.  End if
11.  UOW-List.add(uw, score, pos)
12. End for
13. Return UOW-List
End
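
The control flow of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function and parameter names (`build_urdu_lexicon`, `swn_score`, `score_is_correct`, `manual_score`) are hypothetical, the correctness check that the paper leaves to human judgement is passed in as a callable, part of speech is omitted, and the toy dictionary holds only a few values taken from the examples in this section.

```python
from typing import Callable, Dict, List, NamedTuple, Optional


class LexiconEntry(NamedTuple):
    urdu_word: str
    score: float
    source: str  # "swn" (Equation 5) or "manual" (Equation 6)


def build_urdu_lexicon(
    opinion_words: List[str],
    en_ur_dict: Dict[str, List[str]],
    swn_score: Callable[[str], Optional[float]],
    score_is_correct: Callable[[str, float], bool],
    manual_score: Callable[[str], float],
) -> List[LexiconEntry]:
    """Translate each English opinion word and attach a sentiment score."""
    lexicon: List[LexiconEntry] = []
    for ew in opinion_words:
        translations = en_ur_dict.get(ew, [])
        if not translations:
            continue  # assumption: words without a translation are skipped
        uw = translations[0]  # first listed meaning, as in Step 3
        score = swn_score(ew)  # None when ew is not in SWN
        if score is not None and score_is_correct(ew, score):
            # Equation 5: the Urdu word inherits the SWN-based score.
            lexicon.append(LexiconEntry(uw, score, "swn"))
        else:
            # Equation 6: fall back to the averaged annotator score.
            lexicon.append(LexiconEntry(uw, manual_score(uw), "manual"))
    return lexicon


# Toy resources (hypothetical stand-ins for the real dictionary and SWN).
toy_dict = {"compassionate": ["رحمدل", "ھمدرد"], "illegal": ["ناجائز"]}
toy_swn = {"compassionate": 0.3125, "illegal": 0.0}
toy_manual = {"ناجائز": -1.0}

lexicon = build_urdu_lexicon(
    ["compassionate", "illegal"],
    toy_dict,
    swn_score=toy_swn.get,
    # Hypothetical stand-in for the human check: a zero score for a word
    # taken from an opinion list is treated as incorrect.
    score_is_correct=lambda ew, s: s != 0.0,
    manual_score=toy_manual.__getitem__,
)
```

With these toy inputs, “compassionate” keeps its SWN-based score, whereas “illegal” is routed to the manual branch because its SWN-based score of 0 is rejected.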

Using the aforementioned strategy, we acquired 1,044 (positive) and 1,002 (negative) Urdu opinion words. Additionally, we acquired 2,607
positive and 4,728 negative Urdu opinion words from a list proposed by Awais (2012). In this way, the size of the resulting lexicon (proposed)
has increased to 3,651 (positive) and 5,730 (negative) words.

4.3 | Sentiment scoring

The sentiment scoring module operates in two steps: (a) English words' scoring and (b) Urdu words' scoring.

4.3.1 | Sentiment scoring of English words

To associate sentiment scores with English opinion words, we use SentiWordNet (SWN) because of its wide range of terms and their sentiment scores. SWN is a sentiment lexicon, widely used in sentiment analysis systems, with more than 55,000 words retrieved automatically from WordNet (Baccianella et al., 2010). Each term in SWN is associated with three numeric scores: positive, negative, and objective. Each score ranges from 0.0 to 1.0. To handle the multiple senses of a sentiment term, we calculate three average values, sent_score+, sent_score−, and sent_scoreo, over all senses of a term ew:

$$\mathit{sent\_score}^{+}(ew) = \frac{1}{numSyn} \sum_{i=1}^{n} sent^{+}(i), \tag{1}$$

$$\mathit{sent\_score}^{-}(ew) = \frac{1}{numSyn} \sum_{i=1}^{n} sent^{-}(i), \tag{2}$$

$$\mathit{sent\_score}^{o}(ew) = \frac{1}{numSyn} \sum_{i=1}^{n} sent^{o}(i), \tag{3}$$

where sent_score+, sent_score−, and sent_scoreo are the average positive, negative, and neutral sentiment scores of the term ew; sent+(i), sent−(i), and sento(i) are the corresponding scores of its sense i; and numSyn is the total number of synsets of the term, so that the sums run over all n = numSyn senses.
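
As an illustration of Equations 1–3, the following sketch averages the three scores over the senses of a term. The names `SenseScores` and `average_scores` are hypothetical, and the three example senses are invented values chosen only so that the arithmetic is easy to follow; they are not SentiWordNet entries.

```python
from typing import List, NamedTuple


class SenseScores(NamedTuple):
    pos: float  # sent+(i)
    neg: float  # sent-(i)
    obj: float  # sento(i)


def average_scores(senses: List[SenseScores]) -> SenseScores:
    """Average positive, negative, and objective scores over all senses."""
    if not senses:
        raise ValueError("the term has no senses to average over")
    num_syn = len(senses)  # numSyn in Equations 1-3
    return SenseScores(
        pos=sum(s.pos for s in senses) / num_syn,
        neg=sum(s.neg for s in senses) / num_syn,
        obj=sum(s.obj for s in senses) / num_syn,
    )


# Three invented senses; each triple sums to 1, as SWN entries do.
example_senses = [
    SenseScores(0.5, 0.0, 0.5),
    SenseScores(0.25, 0.125, 0.625),
    SenseScores(0.0, 0.25, 0.75),
]
example_average = average_scores(example_senses)
```

For these invented senses the averages are 0.25 (positive), 0.125 (negative), and 0.625 (objective), and they still sum to 1.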
For example, the term “relieve” has three senses with their average positive, negative, and neutral score, as shown in Table 3.
The final polarity score of a term is computed by choosing its maximum polarity as

$$
sen_{swn}(ew) =
\begin{cases}
sen^{+} & \text{if } \max(sen^{+}, sen^{-}, sen^{o}) = sen^{+} \\
sen^{-} & \text{if } \max(sen^{+}, sen^{-}, sen^{o}) = sen^{-} \\
sen^{o} & \text{otherwise}
\end{cases}
\tag{4}
$$

The score senswn(ew) is +ive if the average +ive score (sen+) is greater than both the average negative (sen−) and neutral (seno) scores. We apply the same rule to obtain the negative and neutral scores. For example, the sentiment score set {sen+, sen−, seno} for the word “relieve” is {0.216, 0.102, 0.175}; therefore, senswn(“relieve”) = 0.216, and the term is positive because sen+ is the maximum. The same principle is applied for −ive or objective sentiment.
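
The selection rule of Equation 4 can be written as a short function. This is a sketch with a hypothetical name (`select_polarity`); ties are resolved in the order in which the cases of Equation 4 are listed, which is an assumption, since the text does not discuss ties.

```python
from typing import Tuple


def select_polarity(pos: float, neg: float, obj: float) -> Tuple[str, float]:
    """Return the dominant polarity label and its averaged score (Equation 4)."""
    top = max(pos, neg, obj)
    if top == pos:
        return ("positive", pos)
    if top == neg:
        return ("negative", neg)
    return ("objective", obj)


# Averaged scores of "relieve" as listed in Table 3.
relieve_label, relieve_score = select_polarity(0.216, 0.102, 0.175)
```

For “relieve,” the function returns the label "positive" with the score 0.216, matching the worked example in the text.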

TABLE 2 A sample opinion word in English and its translation in Urdu

English opinion word | Urdu translation | Sentiment score and POS
Compassionate | (i) رحمدل (ii) ھَمدَرد (iii) دَردمَند (iv) رَحیم | (i) رحمدل, +0.3125

TABLE 3 SentiWordNet example entry of word “relieve”

Term | Sense ID | Synsets | Pos-score | Neg-score | Neu-score
Relieve | 00064095 | relieve#1 relieve#2 relieve#3 | 0.216 | 0.102 | 0.175



4.3.2 | Sentiment scoring of Urdu words

Sentiment scoring of Urdu words is performed in three ways: (a) SWN-based, (b) manual-driven, and (c) corpus-based.

SWN-based
The Urdu words are obtained by translating the English opinion words using the English to Urdu dictionary, and the sentiment score of each such English word has already been computed using Equation 4. Therefore, we assign the SWN-based score of the English word to the corresponding Urdu word as follows:

$$sen_{swn}(uw) = sen_{swn}(ew). \tag{5}$$

For example, the score of the English word “satisfaction” is 0.125, which is assigned to its corresponding word in Urdu, “اطمینان” (itminan, satisfaction) (see Table 4).

Manual‐driven scoring
Manual-driven scoring is used to assign sentiment scores to words that are either not available in SWN or have received an incorrect score in SWN. The proposed scheme works as follows:
Five human annotators manually assigned sentiment scores to the Urdu words in our lexicon. The annotators were told to assign scores on a scale of ±0.1 to ±1. After receiving five scores for each word, the average of the annotators' scores is computed for each word. The overall inter-annotator agreement is 91.2%, with a kappa (κ) score of 0.85, which is quite high. The kappa score is widely used to verify inter-annotator agreement (McHugh, 2012). It measures the extent to which the observed agreement between the human annotators exceeds the agreement expected by chance. It is computed as κ = (pa − pe)/(1 − pe), where pa is the simple (observed) annotator agreement and pe is the likelihood that the agreement is attributable to chance. Following Landis and Koch (1977), the kappa score κ and its associated agreement levels are presented in Table 5.
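
The kappa formula and the agreement levels of Table 5 can be combined in a few lines. This is a sketch with hypothetical function names (`kappa`, `agreement_level`). The values of pa and pe in the example are illustrative only, because the paper does not report pe; treating κ = 0 as “slight” and any value above 0.80 as “almost perfect” is an assumption made to cover the gaps between the table's ranges.

```python
def kappa(p_a: float, p_e: float) -> float:
    """Chance-corrected agreement: (p_a - p_e) / (1 - p_e)."""
    if p_e >= 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_a - p_e) / (1.0 - p_e)


def agreement_level(k: float) -> str:
    """Map a kappa score to the Landis and Koch (1977) levels of Table 5."""
    if k < 0.0:
        return "Below chance agreement"
    if k <= 0.20:
        return "Slight agreement"
    if k <= 0.40:
        return "Fair agreement"
    if k <= 0.60:
        return "Moderate agreement"
    if k <= 0.80:
        return "Substantial agreement"
    return "Almost perfect agreement"


# Illustrative inputs (not taken from the paper).
example_kappa = kappa(p_a=0.9, p_e=0.5)

# The kappa score reported in the text.
reported_level = agreement_level(0.85)
```

With the illustrative inputs, κ is 0.8 ("Substantial agreement"), and the reported κ of 0.85 falls in the "Almost perfect agreement" band.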
For example, the word “stampede” is negative opinion word; however, it received a score 0, which is incorrect (see Table 6). Therefore, the
corresponding translated word in Urdu “‫( ﺑﮭﮕﮉﺭ‬bhagdar, stampede)” will also receive an incorrect score of 0, which is incorrect. The manual‐
driven scoring scheme is formulated as follows:

 5

senmanual ðuwÞ ¼ ∑ f− þ 0:1 to − þ 1g=5; ðew ∉ SWNÞ or ðsenswn ðewÞ is incorrectÞ (6)
i¼1

Equation 6 demonstrates that if an English word “ew” is not available in SWN (ew ∉ SWN) or its senswn(ew) score is incorrect, then for the equiv-
alent Urdu word (uw), the score (senmanual(uw)) ranges between the average of the five scores of {−+0.1 and −+1}. For example, the word “illegal” is
a negative opinion word; however, it received a score 0 using Equation 4, which is incorrect (see Table 6). Therefore, the corresponding translated

TABLE 4 Partial list of positive words from Urdu Lexicon

English word Urdu word translation Roman transliteration Sentiment score Remarks

Expert ‫ﻣﺎﮨﺮ‬ Mahir 0.208 Score assigned using Equation (5)


Satisfaction ‫ﺍﴳﯿﻨﺎﻥ‬ Itminan 0.125 Score assigned using Equation (5)
Important ‫ﺍﮬﻢ‬ Aiham 0.375 Score assigned using Equation (5)
Accessible ‫ﻗﺎﺑﻞ ﺭﺳﺎﺋﯽ‬ qabil‐e‐rasai 0.375 Score assigned using Equation (5)
Beneficiary ‫ﻓﺎﺋﺪﮦ ﺍ ﭨﮭﺎﻧﮯ ﻭﺍﻟﻮﮞ ﻣﯿﮟ‬ Faeda uthanay walun maen 0.083 Score assigned using Equation (5)

TABLE 5 Kappa measures and agreement levels

Kappa score (κ)   Agreement level

<0.0   Below chance agreement
>0.0 and ≤0.20   Slight agreement
≥0.21 and ≤0.40   Fair agreement
≥0.41 and ≤0.60   Moderate agreement
≥0.61 and ≤0.80   Substantial agreement
≥0.81 and ≤0.99   Almost perfect agreement

TABLE 6 Partial list of negative words from Urdu Lexicon

English word Urdu word translation Roman transliteration Sentiment score Remarks

Stampede ‫ﺑﮭﮕﮉﺭ‬ Bhagdar −0.731 Score assigned using manual annotation scheme (Equation 6)
Incomplete ‫ﺍﺩﮬﻮﺭﺍ‬ Adhoura −0.1825 SWN‐based score assigned using Equation (5)
Illegal ‫ﻧﺎﺟﺎﺋﺰ‬ Najaiz −1 Score assigned using manual annotation scheme (Equation 6)
Slave ‫ﻏﻼﻡ‬ Ghulam −0.03125 SWN‐based score assigned using Equation (5)
Thorn ‫ﮐﺎﻧﭩﺎ‬ Kanta −0.125 SWN‐based score assigned using Equation (5)

word in Urdu “ﻧﺎﺟﺎﺋﺰ” (najaiz, illegal) also receives an incorrect score of 0 (using Equation 5). To assign the correct sentiment score to the Urdu word “ﻧﺎﺟﺎﺋﺰ”
(najaiz, illegal), the manual‐driven scoring technique (Equation 6) is used, and it therefore receives a score of −1.
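A minimal sketch of the manual scoring in Equation 6 follows, assuming that each of the five scores is a signed value with magnitude between 0.1 and 1. The function name and the individual scores in the usage note are hypothetical; the text reports only the resulting averages.

```python
from statistics import mean


def manual_score(scores):
    """Average five manually assigned scores (Equation 6).

    Each score is assumed to be signed, with magnitude in [0.1, 1].
    """
    if len(scores) != 5:
        raise ValueError("Equation 6 averages exactly five scores")
    for s in scores:
        if not 0.1 <= abs(s) <= 1.0:
            raise ValueError(f"score {s} is outside the +/-0.1 to +/-1 range")
    return mean(scores)
```

For instance, if all five scores for “illegal” were −1, `manual_score([-1.0] * 5)` would return −1.0, which agrees with the value listed in Table 6.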

Modifier Scoring
Modifiers play an important role in the sentiment classification of Urdu text. In Urdu, certain words, called modifiers, increase
or decrease the sentiment strength of opinion words, such as “ﺑﮩﺖ” (bohat, very), “ﮐﭽﮫ” (kuch, some), and “ﺍﻧﺘﮩﺎﺋﯽ” (intehai, extremely). For example,
in the sentence “ﺁﺝ ﺑﮩﺖ ﮔﺮﻡ ﺩﻥ ﮨﮯ” (aaj bohat garam din hay, today is a very hot day), the word “ﺑﮩﺖ” (bohat, very) is a modifier that precedes the adjective
“ﮔﺮﻡ” (garam, hot) and increases the sentiment strength of this opinion word.
The modifier scoring uses a hand‐ranked percentage scale to represent a variety of modifiers together with their sentiment scores. We used the
modifiers proposed by Schmidt (1999). We assigned a polarity value to each Urdu modifier by translating it into its English equivalent and adopting
the sentiment values proposed by Polanyi and Zaenen (2006), which yields the positive and negative modifier lists shown in Tables 7 and 8. To
compute the overall polarity of a word, we add the polarity score of enhancer modifiers (Table 7) or subtract the value of reducer modifiers (Table 8)
by using the lists of +ive and −ive modifiers along with the hand‐ranked polarity scale. The modifier collection and scoring (Figure 4 and Algorithm 2) works as follows:

1. Collection of Urdu Modifiers: Collect Urdu modifiers from Urdu grammar (Schmidt, 1999); partial lists are presented in Table 7 and Table 8.
2. Translation of Urdu modifier to English: Translate Urdu word/modifier into English using English to Urdu dictionary. For example, the Urdu
modifier “‫ ”ﺍﻧﺘﮩﺎﺋﯽ‬is translated into English as “extremely,” which is enhancer modifier.
3. Calculation of sentiment score for English modifier: Retrieve sentiment values for each English modifier from Polanyi and Zaenen (2006). For
example, sentiment value for enhancer modifier “extremely” is retrieved as “+1.”
4. Calculation of sentiment score for Urdu modifier: Assign sentiment value obtained at Step 3 to corresponding Urdu modifier. For example, the
equivalent translated word of “extremely” in Urdu is “‫ﺍﻧﺘﮩﺎﺋﯽ‬.” Therefore, it receives a sentiment value of +1.

TABLE 7 Partial list of positive/enhancer modifiers

Modifier   Strength

ﺑﮩﺖ (bohat, very)   0.75
ﺯﯾﺎﺩﮦ (ziyada, more)   0.75
ﻣﮑ ّﻤﻞ (mukam'mal, completely)   1
ﺑﮯ ﺣﺪ (be‐had, excessive)   1
ﺑﮯ ﺍﻧﺘﮩﺎ (be‐inteha, endless)   1
ﺍﺻﻞ ﻣﯿﮟ (asal main, in fact)   0.5
ﺗﺮﯾﻦ (tareen, extreme)   1
ﺩﺭﺍﺻﻞ (dar‐asal, verily)   0.5
ﺷﺪﯾﺪ (shaded, intense)   1
ﺍﻧﺘﮩﺎﺋﯽ (intehai, extremely)   1
ﺑﺎ (ba, with)   0.5
ﻭﺍﻗﻌﯽ (waqe'e, really)   0.5
ﺑﻠﮑﻞ (bilkul, totally)   0.5
ﺍﺗﻨﯽ (itni, this much)   0.5
ﺍﻋﻠﯽ (aala, great)   1
ﻧﺮﺍ (nira, completely)   1

TABLE 8 Partial list of negative/reducer modifiers

Modifier   Strength

ﲟﺸﮑﻞ (bmushkil, hardly)   −0.5
ﻣﺸﮑﻞ ﺳﮯ (mushkil se, hardly)   −0.75
ﮐﭽﮫ (kuch, some)   −0.8
ﭼﻨﺪ (chand, some)   −0.8
ﺑﮯ (be, without)   −1
ﺫﺭﺍ (zara, somewhat)   −0.5
ﮐﻢ (kam, less)   −0.5
ﺗﮭﻮﮌﺍ (thora, little)   −0.5

5. Modifier list compilation: Compile a list of positive modifiers or enhancers and a list of negative modifiers or reducers along with their sentiment
scores (Tables 7 and 8).
6. Polarity score calculation of opinion word: If a word is found in a set of positive or negative modifiers, then the polarity of the neighbouring
opinion word is computed using Equation 7.

Let MODE denote the set of positive modifiers (enhancers) and MODR the set of negative modifiers (reducers):

MODE = {set of enhancers/positive modifiers}
MODR = {set of reducers/negative modifiers}

If a modifier is found in the positive modifier list, then the polarity of the neighbouring opinion word is calculated as

$$\mathrm{Score}(ow) = \mathrm{Score}(ow) + \big(\mathrm{Score}(ow) \times \mathrm{Score}(en)\big), \tag{7}$$

where ow represents the opinion word and en is the enhancer modifier.


For example, to compute the polarity of the sentence “ﯾﮧ ﻟﺒﺎﺱ ﺍﻧﺘﮩﺎﺋﯽ ﺧﻮﺑﺼﻮﺭﺕ ﮨﮯ” (yeh libas intehai khoubsurat hay, this dress is extremely beautiful),
the score of the opinion word “ﺧﻮﺑﺼﻮﺭﺕ” (khoubsurat, beautiful) is combined with that of the modifier “ﺍﻧﺘﮩﺎﺋﯽ” (intehai, extremely) using Equation 7 as

$$\mathrm{Score}(ow) = 0.688 + (0.688 \times 100\%) = 0.688 + (0.688 \times 1.0) = 1.376,$$

where the score of the word “ﺧﻮﺑﺼﻮﺭﺕ” (khoubsurat, beautiful) is computed using Equation 5 (Section 4.3.2), and the score of the enhancer modifier “ﺍﻧﺘﮩﺎﺋﯽ”
(intehai, extremely) is retrieved from Table 7.
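If the modifier is found in the negative modifier list, then the polarity of the neighbouring opinion word is calculated as

$$\mathrm{Score}(ow) = \mathrm{Score}(ow) + \big(\mathrm{Score}(ow) \times \mathrm{abs}(\mathrm{Score}(rd))\big), \tag{8}$$

where ow represents the opinion word and rd is the reducer modifier.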


For example, to compute polarity of the sentence “‫( ”ﯾﮧ ﺑﺎﺕ ﺫﺭﺍ ﻓﻀﻮﻝ ﺳﯽ ﮨﮯ‬yeh bat zara fazol si hay, it is somewhat useless talk), the polarity of
opinion word “‫( ”ﻓﻀﻮﻝ‬fazol, useless) is computed using Equation 5 and the score of modifier “‫( ”ﺫﺭﺍ‬zara, somewhat) is retrieved from Table 8. Finally,
we compute sentiment score of neighbouring opinion word “‫( ﻓﻀﻮﻝ‬fazol, useless)” as follows:

$$\mathrm{Score}(w) = -0.625 + \big(-0.625 \times \mathrm{abs}(-0.5)\big) = -0.937.$$

The score of the word “ﻓﻀﻮﻝ” (fazol, useless) is computed using Equation (5) (Section 4.3.2), and the score of the reducer modifier “ﺫﺭﺍ” (zara,
somewhat) is retrieved from Table 8.
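The two modifier rules can be sketched in a few lines. As printed, Equations 7 and 8 both scale the opinion word score by (1 + |modifier strength|), so a single hypothetical helper, `apply_modifier`, covers both cases; this reflects the equations as written rather than any additional behaviour.

```python
def apply_modifier(word_score: float, modifier_strength: float) -> float:
    """Adjust an opinion word score by a neighbouring modifier.

    Equation 7 (enhancer): score + score * strength, with strength > 0.
    Equation 8 (reducer):  score + score * abs(strength), with strength < 0.
    Both reduce to score * (1 + |strength|).
    """
    return word_score + word_score * abs(modifier_strength)
```

Evaluating the two worked examples, `apply_modifier(0.688, 1.0)` gives 1.376 for “extremely beautiful,” and `apply_modifier(-0.625, -0.5)` gives −0.9375 for “somewhat useless,” which the text reports as −0.937.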

Algorithm 2 Urdu modifier scoring

Input:
(i) UM‐L: List of +ive and −ive Urdu modifier words taken from Urdu Grammar
(ii) SWN: SentiWordNet sentiment lexicon for sentiment scoring
(iii) EU‐Dict: English to Urdu dictionary

Output:
UMW-List: List of +ive and −ive Urdu modifiers annotated with sentiment scores
Begin
1. For each Urdu modifier word umw ∈ UM-L do
2.    Translate Urdu modifier word umw to English modifier word emw using EU‐Dict
3.    Retrieve sentiment score of English modifier word emw from (Strapparava & Valitutti, 2004)
4.    Assign the sentiment score of emw retrieved at Step 3 to the corresponding Urdu modifier umw
5.    UMW-List.add(umw, score)
6. end for
7. Return UMW-List
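The loop in Algorithm 2 can be sketched as follows. The dictionaries stand in for the bilingual dictionary and the English modifier score source; their names, the romanized keys, and the choice to skip modifiers with no translation or score are assumptions made for illustration. The strengths are the ones listed in Tables 7 and 8.

```python
def score_urdu_modifiers(urdu_modifiers, urdu_to_english, english_scores):
    """Return {urdu_modifier: score} following the steps of Algorithm 2."""
    scored = {}
    for umw in urdu_modifiers:
        emw = urdu_to_english.get(umw)        # Step 2: translate
        if emw is None or emw not in english_scores:
            continue                          # assumption: skip unknown words
        scored[umw] = english_scores[emw]     # Steps 3-5: retrieve and assign
    return scored


# Hypothetical toy inputs (romanized for readability).
um_l = ["intehai", "zara", "bohat", "xyz"]
eu_dict = {"intehai": "extremely", "zara": "somewhat", "bohat": "very"}
modifier_scores = {"extremely": 1.0, "somewhat": -0.5, "very": 0.75}
umw_list = score_urdu_modifiers(um_l, eu_dict, modifier_scores)
```

Here `umw_list` maps "intehai" to 1.0, "zara" to −0.5, and "bohat" to 0.75, while the untranslatable "xyz" is left out.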

4.3.3 | Corpus‐based sentiment scoring

To make the resulting lexicon more generic with respect to applied datasets, we calculate sentiment scores of dataset dependent terms by
computing the difference between tf‐idf value for predicting positive and negative classes as follows:

$$\Delta_{\text{tf-idf}}(t) = \text{tf-idf}(t, c_p) - \text{tf-idf}(t, c_n), \tag{9}$$

where Δtf‐idf(t) indicates the tendency of a term “t” towards a sentiment class (cp: positive class; cn: negative class). The term is assigned to the
“positive” class if Δtf‐idf(t) > 0 and to the “negative” class if Δtf‐idf(t) < 0; otherwise, it is tagged as “neutral.” For example, the Δtf‐idf(t) score of the term
“ﺳﺎﻣﺴﻨﮓ” is +0.386 > 0; therefore, it is tagged as positive. Table 9 shows a sample list of sentiment scores of individual terms in the Urdu datasets.
Using the aforementioned strategy, we assigned sentiment scores to all such dataset‐dependent terms available in the corpus.
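Equation 9 and its decision rule can be sketched as below. The sketch assumes the per‐class tf‐idf values have already been computed, since the text does not specify how tf‐idf(t, c) is obtained for each class; the function names are hypothetical.

```python
def delta_tfidf(tfidf_pos: float, tfidf_neg: float) -> float:
    """Equation 9: difference between positive- and negative-class tf-idf."""
    return tfidf_pos - tfidf_neg


def corpus_sentiment_class(delta: float) -> str:
    """Tag a term as positive, negative, or neutral from its delta score."""
    if delta > 0:
        return "positive"
    if delta < 0:
        return "negative"
    return "neutral"
```

Using the third row of Table 9, `delta_tfidf(0.977, 0.153)` is approximately 0.824, so the term is tagged as positive.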
The lexicon is made available in Appendix S1. In future, an interface will be provided that enables the user to search for an Urdu word and
retrieve its equivalent English word.

5 | EXPERIMENTAL SETUP

5.1 | Dataset compilation

As already mentioned in Section 2, user reviews in Urdu are not easily available in electronic form for experimentation, for the following reasons: (a)
there is no publicly available machine‐readable corpus of user reviews in Urdu, (b) most Urdu websites present text in graphic form,
which makes it difficult for sentiment analysis systems to analyse (Ijaz & Hussain, 2007), and (c) some news websites contain easily accessible
Urdu text, but they do not contain opinionated reviews (Afraz et al., 2011). Because of these problems,
three datasets were manually collected from the drug, mobile, and book domains. The reviews were acquired from different users. There are 493 drug
reviews (DR), of which 55% are labelled as +ive and 45% as −ive. The mobile reviews (MR) dataset comprises 373 reviews, of which 58% are +ive and 42%
are −ive. There are 431 reviews in the book domain (BR), with 65% positive and 35% negative. There are 1,201 reviews in the mixed
dataset. The reviews are stored in two separate text files to compile the testing and training corpora. The details of each dataset are shown in
Table 10.

5.2 | Evaluation

We performed an extrinsic evaluation of the developed Urdu Lexicon by using it to determine the sentiment of sentences in datasets of +ive and −ive
sentences. Owing to the lack of an Urdu corpus with gold annotations, we acquired three small datasets manually annotated at the sentence level.
The detailed statistics of these datasets are presented in Section 5.1.

TABLE 9 Sentiment scoring using tf‐idf computations

Term (t)   tf‐idf(t, cp)   tf‐idf(t, cn)   Δtf‐idf(t)

ﺳﺎﻣﺴﻨﮓ   0.637   0.251   +0.386
ﻓﻮﻧﺰ   0.430   0.411   +0.019
ﺧﻮﺑﺼﻮﺭﺕ   0.977   0.153   +0.824

TABLE 10 Urdu datasets statistics

Dataset   Total reviews   Total number of sentences   Average number of sentences in a review   Total number of words   Average number of words in a review   Total number of distinct words

Drug   493   721   1.6   6,921   14.03   4,856
Mobile   373   589   1.57   5,841   15.65   3,943
Book   431   641   1.48   6,210   14.40   4,321
Mixed reviews   1,201   1,201   1   10,109   8.417   3,655

To evaluate the performance of the resulting lexicon, we conducted two experiments: subjectivity detection and sentence level sentiment
classification.

5.2.1 | Subjectivity analysis

This experiment classifies sentences as subjective or objective. To assess the coverage of Urdu Lexicon, the sentences in the acquired corpus
are passed through a subjectivity classifier. The classifier labels a sentence as subjective if it
contains one or more subjective words (words present in Urdu Lexicon) and as objective if it contains no subjective word;
example sentences are shown in Table 11. For example, the sentence
(Samsung mobile bohat shandar mobile ha aur behtareen mobile kehlanay
kay qabil ha), when passed through the subjectivity classifier, is declared subjective owing to the presence of three opinion words, namely,
“ﺷﺎﻧﺪﺍﺭ” (shandar, brilliant), “ﺑﮩﱰﯾﻦ” (behtareen, best), and “ﻗﺎﺑﻞ” (qabil, worth). The sentence “45% ﺍﻣﺮﯾﮑﻦ ﺍﯾﭙﻞ ﻣﻮﺑﺎﺋﻞ ﺍﺳﳣﻌﺎﻝ ﮐﺮﺗﮯ ﮨﯿﮟ” (45% American
Apple mobile istemal karty han, 45% Americans use Apple mobile) is declared objective, as it contains no opinion word. The results reported
in Table 12 show that, on all datasets, the subjectivity classifier based on Urdu Lexicon attains higher precision, recall, and F‐measure than the two baseline lexicons, which indicates that the coverage of Urdu Lexicon is satisfactory.
Table 13 reports, for the testing data, how many sentences are identified as subjective and how many as objective.
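The subjectivity rule described above amounts to a lexicon lookup, which can be sketched as follows. Whitespace tokenization, lower‐casing, the romanized tokens, and the three‐word toy lexicon are assumptions made for illustration; they are not the paper's preprocessing or its lexicon.

```python
def is_subjective(sentence: str, lexicon: set) -> bool:
    """A sentence is subjective if any of its tokens occurs in the lexicon."""
    tokens = sentence.lower().split()  # assumption: whitespace tokenization
    return any(token in lexicon for token in tokens)


# Hypothetical toy lexicon holding the three opinion words from the example.
toy_lexicon = {"shandar", "behtareen", "qabil"}

review = ("Samsung mobile bohat shandar mobile ha aur behtareen "
          "mobile kehlanay kay qabil ha")
fact = "45% American Apple mobile istemal karty han"
```

With these inputs, `is_subjective(review, toy_lexicon)` is true and `is_subjective(fact, toy_lexicon)` is false, matching the two cases discussed in the text.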

Sentence level sentiment classification


The second experiment deals with the sentiment classification of subjective sentences into positive and negative classes. For this purpose, we
implement a simple polarity classifier algorithm (Kaur & Gupta, 2014) for computing polarity of text at the sentence level. The algorithm computes
sentiment score of text by calculating polarity of each word and summing up all such individual polarity scores.

TABLE 11 Subjectivity classification of sentences

S# Sentence Subjectivity class

1 ‫ﮈﺳﭙﺮﯾﻦ ﻓﻮﺭﭦ ﺩﻣﮧ ﮐﮯ ﻣﺮﯾﻀﻮﮞ ﮐﮯ ﻟﺌﮯ ﻧﻘﺼﺎﻥ ﺩﮦ ﮨﮯ‬ Subjective


2 ‫ﺍﺱ ﺩﻭﺍ ﮐﻮ ﴍﻭﻉ ﮐﺮ ﺩﯾﺎ ۔‬ Objective
3 ‫ﻣﯿﮟ ﻧﮯ ﮐﭩﺎﻭﺱ ﮐﻮ ﺍﯾﮏ ﺩﻭ ﺳﺎﻝ ﮐﮯ ﻟﺌﮯ ﻟﯿﺎ‬ Objective
4 ‫ ﮐﯽ ﻣﺜﺎﻟﯿﮟ ﻻﺟﻮﺍﺏ ﮨﯿﮟ‬Loops ‫ﺍﺱ ﻣﯿﮟ‬ Subjective
5 ‫ﺍﺱ ﺳﮯ ﭘﮩﻠﮯ ﮐﺒﮭﯽ ﭘﺮﻭﮔﺮﺍﻣﻨﮓ ﻧﮩﯽ ﭘﮍﮬﯽ ﻟﯿﮑﻦ ﺍﺱ ﮐﺘﺎﺏ ﻧﮯ ﺳﺐ ﻣﻨﻄﻖ ﺻﺎﻑ ﮐﺮﺩﯾﮧ‬ Subjective

TABLE 12 Results for subjectivity classification

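Lexicon   Domain   Precision (%)   Recall (%)   F‐measure (%)

Urdu–Sentiment Lexicon (Dehkharghani et al., 2016)   Drug   74.82   81.25   77.90
Urdu–Sentiment Lexicon (Dehkharghani et al., 2016)   Mobile   75.51   81.89   78.57
Urdu–Sentiment Lexicon (Dehkharghani et al., 2016)   Books   71.46   79.23   75.14
Urdu–Sentiment Lexicon (Dehkharghani et al., 2016)   Mixed   69.05   71.10   73.01
Urdu–Subjectivity Lexicon (Haqqee, 2015)   Drug   76.56   82.51   79.42
Urdu–Subjectivity Lexicon (Haqqee, 2015)   Mobile   74.67   79.81   77.15
Urdu–Subjectivity Lexicon (Haqqee, 2015)   Books   71.63   77.56   74.47
Urdu–Subjectivity Lexicon (Haqqee, 2015)   Mixed   74.0   71.0   65.0
Urdu Lexicon (proposed)   Drug   79.00   86.01   82.35
Urdu Lexicon (proposed)   Mobile   76.10   83.4   79.58
Urdu Lexicon (proposed)   Books   79.89   85.45   82.57
Urdu Lexicon (proposed)   Mixed   79.0   79.0   76.0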

TABLE 13 Number of testing sentences classified as subjective and objective in different datasets

Dataset Total sentences # of subjective sentences # of objective sentences

Drug 721 572 149


Mobile 589 521 68
Book 641 595 46
Mixed reviews 1,201 980 221

$$\mathrm{Pol}(r) = \sum_{i=1}^{n} \mathrm{sen}(w_i), \quad w_i \in r, \tag{10}$$

where “wi” is the ith word in review sentence “r,” n is the number of words in “r,” and sen(wi) is the sentiment score of the word “wi.” The review sentence is classified as +ive if
the net score is greater than zero, as −ive if it is less than zero, and as neutral otherwise.
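Equation 10 and the thresholding rule can be sketched as follows. Words missing from the lexicon are assumed to contribute a score of zero, and the toy lexicon with romanized keys is hypothetical.

```python
def sentence_polarity(tokens, lexicon) -> float:
    """Equation 10: sum the sentiment scores of the words in a sentence."""
    # Assumption: words that are not in the lexicon contribute 0.
    return sum(lexicon.get(token, 0.0) for token in tokens)


def polarity_class(score: float) -> str:
    """Label a sentence from its net score."""
    if score > 0:
        return "+ive"
    if score < 0:
        return "-ive"
    return "neutral"


# Hypothetical toy lexicon (romanized keys, scores taken from the examples).
toy_scores = {"khoubsurat": 0.688, "fazol": -0.625}
tokens = "yeh libas khoubsurat hay".split()
```

In this toy case, `sentence_polarity(tokens, toy_scores)` is 0.688, so `polarity_class` labels the sentence as +ive. Modifier handling (Equations 7 and 8) is deliberately left out of this sketch.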
We selected such simple classifiers for subjectivity detection and sentence‐level sentiment classification for two reasons: (i) there is no
benchmark Urdu dataset that could be used to train a sophisticated supervised classifier, and (ii) we wanted to minimize the influence of other factors, such as contextual
features, on the classification results and to focus on how the sentiment lexicons themselves perform in sentiment classification. Example sentences,
classified into positive and negative classes, are presented in Table 14.
Table 15 shows the number of subjective sentences classified as positive and negative.
Table 16 shows that Urdu Lexicon performs better than the baseline methods, namely lexicon‐based unigram presence and subjectivity‐based
classification, in terms of accuracy, precision, and F‐measure for both polarity classes; its recall is also higher, except against the subjectivity‐based baseline on the positive class (88.2% vs. 95.23%).
Using the sample sentence “‫( ”ﯾﮧ ﺑﺎﺕ ﺫﺭﺍ ﻓﻀﻮﻝ ﺳﯽ ﮨﮯ‬yeh bat zara fazol si hay, it is somewhat useless talk), we explain how the sentiment
classification using our approach outperforms unigram presence technique. In the previous sentence, we have one opinion word “‫( ”ﻓﻀﻮﻝ‬fazol,
useless) and one modifier “‫( ”ﺫﺭﺍ‬zara, somewhat). When we account for the presence of unigrams, then the opinion word “‫( ”ﻓﻀﻮﻝ‬fazol, useless)

TABLE 14 Sentence level sentiment classification of sentences

S# Review sentence Sentiment score Sentiment class

1 ‫ﯾﮧ ﺑﺎﺕ ﺫﺭﺍ ﻓﻀﻮﻝ ﺳﯽ ﮨﮯ‬ −0.9 Negative


2 ‫ﮐﯿﻮ ﻣﻮﺑﺎﺋﻞ ﺧﻮﺑﺼﻮﺭﺕ ﮨﮯ‬ +0.689 Positive
3 ‫‐ﮐﯿﻮ ﻣﻮﺑﺎﺋﻞ ﮐﺎﻓﯽ ﭘﺘﻼ ﮨﮯ‬ +0.45 Positive
4 ‫ﺳﯿﻼﺏ ﻧﮯ ﺗﺒﺎﮬﯽ ﺑﺮﭘﺎ ﮐﺮ ﺩﯼ‬ −0.471 Negative
5 ‫ﺷﺎﮬﺪ ﺍﻓﺮﯾﺪﯼ ﻧﮯ ﺑﮭﺖ ﺯﺑﺮﺩﺳﺖ ﮐﺎﺭﮐﺮﺩﮔﯽ ﺩﮐﮩﺎﯼ‬ +0.784 Positive

TABLE 15 Detail of number of subjective sentences classified as positive and negative

Dataset Total # of subjective sentences # of positive sentences # of negative sentences

Drug 572 346 226


Mobile 521 411 110
Book 595 487 108
Mixed reviews 980 551 429

TABLE 16 Sentence level sentiment classification

Lexicon   Method   Polarity class   Accuracy (%)   Precision (%)   Recall (%)   F‐measure (%)

Urdu–Sentiment Lexicon (Afraz et al., 2011)   Lexicon‐based unigram presence   Positive   79.83   83.34   71.24   76.83
Urdu–Sentiment Lexicon (Afraz et al., 2011)   Lexicon‐based unigram presence   Negative   74.44   85   78   81
Urdu–Subjectivity Lexicon (Awais, 2012)   Subjectivity‐based classification   Positive   72.61   74.07   95.23   83.33
Urdu–Subjectivity Lexicon (Awais, 2012)   Subjectivity‐based classification   Negative   69.82   82.02   85.0   84
Urdu Lexicon (proposed)   Word‐level translation + N‐gram presence   Positive   89.62   96.01   88.2   92.4
Urdu Lexicon (proposed)   Word‐level translation + N‐gram presence   Negative   86.28   98.5   91.4   94.5

and one modifier “ﺫﺭﺍ” (zara, somewhat) are treated individually, and the sentence is labelled as neutral; when we classify the sentence using
our lexicon, however, the sentence is labelled as negative. The score of the opinion word “ﻓﻀﻮﻝ” (fazol, useless) is −0.625, and the score of the modifier “ﺫﺭﺍ”
(zara, somewhat) is −0.5. The overall score of the opinion word is computed as Score(w) = −0.625 + (−0.625 × abs(−0.5)) = −0.937. Therefore,
the sentence is tagged as negative, which is correct.

Comparing the results with the latest work done in Urdu sentiment analysis
In another experiment, we compared our approach with recent work on Urdu sentiment analysis that uses a supervised learning
technique (Mukhtar & Khan, 2018). We performed sentiment classification with that technique and with the
proposed Urdu lexicon and recorded the results. The performance evaluation results presented in Table 17 show that sentiment
classification using the proposed lexicon outperforms the supervised technique of Mukhtar and Khan (2018).

Applying standard statistical technique to validate the results


We performed this experiment to test the following claim: sentiment classification using the proposed sentiment lexicon is
significantly better than that using the baseline Urdu lexicon (Afraz et al., 2011). In other words, we investigated whether the performance difference
between sentiment classification using the proposed Urdu sentiment lexicon and the baseline study (Afraz et al., 2011) is statistically significant
and does not occur by chance. A sample of 550 sentences was randomly drawn from the datasets used in this work, and each sentence was
classified using the scores taken from the proposed lexicon and from the baseline lexicon (Afraz et al., 2011). Table 18 shows the results obtained
by applying McNemar's test (https://machinelearningmastery.com/mcnemars-test-for-machine-learning/), with the following settings.

H0. Classifications using the two lexicons have the same error rate (null hypothesis).

H1. The error rates of the classifications using the two lexicons are significantly different.

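McNemar's test uses the chi‐square computation shown in Equation (11):

$$\chi^2 = \frac{(a_{01} - a_{10})^2}{a_{01} + a_{10}}, \tag{11}$$

where a01 and a10 are the numbers of sentences on which the two classifications disagree (the off‐diagonal cells of Table 18).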

TABLE 17 Sentiment classification‐based comparison

Technique P R F A

Sentiment classification supervised technique (Mukhtar & Khan, 2018) 81 80 82 81


Sentiment classification with proposed lexicon (our work) 95.2 88.4 91.3 92.4

TABLE 18 Measuring the performance difference between sentiment classification using the baseline lexicon and the proposed lexicon by applying a significance test

   Correct classification with baseline lexicon (Afraz et al., 2011)   Incorrect classification with baseline lexicon (Afraz et al., 2011)

Correct classification with proposed lexicon   137   95
Incorrect classification with proposed lexicon   58   260

Note. The difference between sentiment classification using the two lexicons (proposed and baseline) is statistically significant (two‐tailed p value = 0.004, χ2 = 8.5,
1 degree of freedom), so the null hypothesis is rejected and the alternative hypothesis is accepted.

TABLE 19 Summary statistics

McNemar's chi‐squared statistic 8.5


Degrees of freedom 1
p value (two‐tailed) 0.004
Odds ratio 1.638
Lower 95% CL 1.169
Upper 95% CL 2.311

TABLE 20 Lexicon coverage

Total terms in lexicon   No. of positive terms   No. of negative terms   % of terms with manually assigned sentiment scores   % of terms with SWN‐based sentiment scores   % of terms with corpus‐based sentiment scores

9,381   3,651   5,730   28%   56%   16%

The significance test shows that the performance difference between sentiment classification with the proposed Urdu lexicon and
with the baseline lexicon (Afraz et al., 2011), which has limited coverage of sentiment words and scores, is significant. Table 18 shows that there are
153 discordant sentences (95 + 58), that is, sentences on which the classifications with the two lexicons disagree. We computed a two‐
tailed p value of 0.004 and a chi‐square value of 8.5 with one degree of freedom (Table 19). Because the p value is below 0.05, the null hypothesis is rejected
and the alternative hypothesis is supported; that is, the performance difference between sentiment classification using the two lexicons
(proposed vs. baseline) is statistically significant.
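The reported statistics can be checked with a short sketch of McNemar's test. The function name is hypothetical. The continuity correction is an assumption: with the discordant counts 95 and 58 from Table 18, the uncorrected form of Equation 11 gives about 8.95, whereas the corrected form, (|a01 − a10| − 1)² / (a01 + a10), gives about 8.47, which appears consistent with the reported value of 8.5. The p value uses the fact that a chi‐square variable with one degree of freedom is a squared standard normal.

```python
import math


def mcnemar_test(a01: int, a10: int, continuity: bool = True):
    """Return (chi2, two_tailed_p) for the discordant counts a01 and a10."""
    if a01 + a10 == 0:
        raise ValueError("at least one discordant pair is required")
    diff = abs(a01 - a10)
    if continuity:
        diff = max(diff - 1, 0)  # continuity correction (assumed)
    chi2 = diff ** 2 / (a01 + a10)
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


chi2, p_value = mcnemar_test(95, 58)
odds_ratio = 95 / 58
```

For the Table 18 counts this yields a chi‐square of roughly 8.47, a two‐tailed p value between 0.003 and 0.004, and an odds ratio of about 1.638, in line with Table 19.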

5.3 | Lexicon coverage

The proposed Urdu Sentiment Lexicon provides sufficient coverage of Urdu opinion terms (Table 20). The percentages of terms whose
sentiment scores were assigned using the SWN‐based, manual, and corpus‐based strategies are also reported: most of the terms (56%) were scored
using SWN, 28% were scored by manual annotation, and the remaining 16% are corpus‐driven.

6 | CONCLUSION AND FUTURE WORK

This work presents a word‐level translation strategy for creating a sentiment lexicon in Urdu, a resource‐poor language, by combining different
linguistic and lexical resources, such as an English opinion word list, SentiWordNet, an English‐to‐Urdu bilingual dictionary, modifiers taken from
Urdu grammar, and a novel scoring mechanism. The proposed
method consists of the following modules: (a) English opinion word collection, (b) translation of English opinion words into the Urdu language, (c)
sentiment scoring using SentiWordNet and a manual scoring technique, and (d) modifier collection and scoring.
The proposed technique assists in creating an Urdu sentiment lexicon based on an opinion word list, an English–Urdu bilingual dictionary, a novel
manual scoring scheme, and a novel modifier scoring mechanism. The results achieved in terms of precision, recall, and F‐measure show the
efficacy of the proposed approach. In the near future, we will publicize the entire resource with all Urdu words and their associated synsets.
The developed lexical resource for sentiment analysis is the initial version (1.0) of Urdu Lexicon (UrduSentiNet). We will extend this resource
at the earliest by (a) including all Urdu words, classified as positive, negative, and neutral, (b) including synonyms of each word, (c) proposing an
automatic scoring mechanism instead of using SentiWordNet and manual techniques, and (d) incorporating negations for more efficient coverage.

FUNDING

The authors received no specific funding for this work.

CONFLICT OF INTEREST

The authors declare no conflict of interest.

AUTHOR CONTRIBUTIONS

M. Z. A. and A. S. conceived and designed the experiments; A. A. and F. M. K. performed the experiments; M. Z. A. and A. S. analysed the data;
A. K. contributed reagents/materials/analysis tools; M. Z. A. wrote the paper.

INFORMED CONSENT

All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and
national) and with the Helsinki Declaration of 1975, as revised in 2008 (5). Additional informed consent was obtained from all patients for which
identifying information is included in this article.

HUMAN AND ANIMAL RIGHTS

This study did not involve any experimental research on humans or animals; hence, approval from an ethics committee was not applicable in
this regard. The data collected from the online forums are publicly available, and no personally identifiable information of the forum users
was collected or used for this study.

ORCID
Muhammad Zubair Asghar https://orcid.org/0000-0003-3320-2074

REFERENCES
Afraz, Z. S., Muhammad, A., & Martinez‐Enriquez, A. M. (2011). Sentiment‐annotated lexicon construction for an Urdu text based sentiment analyzer.
Pakistan Journal of Science, 63(4), 222‐225.
Asghar, M. Z., Khan, A., Ahmad, S., Khan, I. A., & Kundi, F. M. (2015). A unified framework for creating domain dependent polarity lexicons from user
generated reviews. PLoS ONE, 10(10), e0140204. https://doi.org/10.1371/journal.pone.0140204
Asghar, M. Z., Khan, A., Bibi, A., Kundi, F. M., & Ahmad, H. (2017). Sentence‐level emotion detection framework using rule‐based classification. Cognitive
Computation, 9(6), 868–894. https://doi.org/10.1007/s12559‐017‐9503‐3
Awais, (2012). Retrieved from http://chaoticity.com/urdusentimentlexicon/urdusentimentlexicon.zip, last accessed 09 April 2016.
Baccianella, S., Esuli, A., & Sebastiani, F. (2010, May). Sentiwordnet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In Lrec
(Vol. 10, No. 2010, pp. 2200–2204).
Badaro, G., Baly, R., Hajj, H., Habash, N., & El‐Hajj, W. (2014). A large scale Arabic sentiment lexicon for Arabic opinion mining. In Proceedings of the EMNLP
2014 Workshop on Arabic Natural Language Processing (ANLP) (pp. 165–173).
Bakliwal, A., Arora, P., & Varma, V. (2012, May). Hindi subjective lexicon: A lexical resource for Hindi polarity classification. In Proceedings of the Eight
International Conference on Language Resources and Evaluation (LREC) (pp. 1189–1196).
Banea, C., Mihalcea, R., & Wiebe, J. (2008, May). A bootstrapping method for building subjectivity lexicons for languages with scarce resources. In LREC
(Vol. 8, pp. 2–764).
Bilal, M., Israr, H., Shahid, M., & Khan, A. (2016). Sentiment classification of Roman–Urdu opinions using naïve Bayesian, decision tree and KNN classifica-
tion techniques. Journal of King Saud University‐Computer and Information Sciences, 28(3), 330–344. https://doi.org/10.1016/j.jksuci.2015.11.003
Borzì, V., Faro, S., Pavone, A., & Sansone, S. (2015). Prior polarity lexical resources for the Italian language. arXiv preprint arXiv:1507.00133.
Das, A., & Bandyopadhyay, S. (2010). SentiWordNet for Indian languages. In Proceedings of the Eighth Workshop on Asian Language Resources (pp. 56–63).
Dashtipour, K., Poria, S., Hussain, A., Cambria, E., Hawalah, A. Y., Gelbukh, A., & Zhou, Q. (2016). Multilingual sentiment analysis: State of the art and inde-
pendent comparison of techniques. Cognitive Computation, 8(4), 757–771. https://doi.org/10.1007/s12559‐016‐9415‐7
Dehkharghani, R., Saygin, Y., Yanikoglu, B., & Oflazer, K. (2016). SentiTurkNet: A Turkish polarity lexicon for sentiment analysis. Language Resources and
Evaluation, 50(3), 667–685. https://doi.org/10.1007/s10579‐015‐9307‐6
Haqqee, S. H. (2015). Retrieved from Oxford English–Urdu Dictionary bilingual dictionary, ISBN: 9780195793406, Oxford University Press, Karachi,
Pakistan 2015.
Hashim, F., & Khan, M. A. (2016). Sentence level sentiment analysis using Urdu nouns. In Proceedings of the Conference on Language &
Technology 2016 (pp. 101–108).
Ijaz, M., & Hussain, S. (2007, August). Corpus based Urdu lexicon development. In the Proceedings of Conference on Language Technology (CLT07), University
of Peshawar, Pakistan (Vol. 73).
Javed, I., & Afzal, H. (2013). Opinion analysis of bi‐lingual event data from social networks. In ESSEM@ AI* IA (pp. 164–172).
Kanayama, H., & Nasukawa, T. (2006, July). Fully automatic lexicon expansion for domain‐oriented sentiment analysis. In Proceedings of the 2006 conference
on empirical methods in natural language processing (pp. 355–363). Association for Computational Linguistics.
Kaur, A., & Gupta, V. (2014). Proposed algorithm of sentiment analysis for Punjabi text. Journal of Emerging Technologies in Web Intelligence, 6(2), 180–183.
Khan, A., Asghar, M. Z., Ahmad, H., Kundi, F. M., & Ismail, S. (2017). A rule‐based sentiment classification framework for health reviews on mobile social
media. Journal of Medical Imaging and Health Informatics, 7(6), 1445–1453. https://doi.org/10.1166/jmihi.2017.2208
Landis, J., & Koch, G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159–174. https://doi.org/10.2307/2529310
Liu, B., & Hu, M. (2004). Retrieved from http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html#lexicon, last accessed 2 December 2015.
McHugh, M. L. (2012). Interrater reliability: The kappa statistic. Biochemia Medica, 22(3), 276–282.
Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. J. (1990). Introduction to WordNet: An on‐line lexical database. International Journal of
Lexicography, 3(4), 235–244. https://doi.org/10.1093/ijl/3.4.235
Mohammad, S., Dunne, C., & Dorr, B. (2009, August). Generating high‐coverage semantic orientation lexicons from overtly marked words and a thesaurus.
In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2‐Volume 2 (pp. 599–608). Association for
Computational Linguistics.
Mukhtar, N., & Khan, M. A. (2018). Urdu sentiment analysis using supervised machine learning Approach. International Journal of Pattern Recognition and
Artificial Intelligence, 32(02), 1851001. https://doi.org/10.1142/S0218001418510011
Mukhtar, N., Khan, M. A., & Chiragh, N. (2017). Effective use of evaluation measures for the validation of best classifier in Urdu sentiment analysis. Cognitive
Computation, 9(4), 446–456. https://doi.org/10.1007/s12559‐017‐9481‐5
Polanyi, L., & Zaenen, A. (2006). Contextual valence shifters. In Computing attitude and affect in text: Theory and applications (pp. 1–10). Dordrecht:
Springer.
Rehman, Z. U., & Bajwa, I. S. (2016, August). Lexicon‐based sentiment analysis for Urdu language. In Innovative Computing Technology (INTECH), 2016 Sixth
International Conference on (pp. 497–501). IEEE.

Schmidt, R. L. (1999). Urdu, an essential grammar. New York: Psychology Press, Routledge Publishing.
Stone, P. J., & Hunt, E. B. (1963, May). A computer approach to content analysis: Studies using the General Inquirer system. In Proceedings of the May
21–23, 1963, spring joint computer conference (pp. 241–256). ACM.
Strapparava, C., & Valitutti, A. (2004, May). WordNet Affect: An affective extension of WordNet. In Lrec (Vol. 4, pp. 1083–1086).
Wilson, T., Wiebe, J., & Hoffmann, P. (2005, October). Recognizing contextual polarity in phrase‐level sentiment analysis. In Proceedings of the conference on
human language technology and empirical methods in natural language processing (pp. 347–354). Association for Computational Linguistics.
Zafar, A., Mahmood, A., Abdullah, F., Zahid, S., Hussain, S., & Mustafa, A. (2012). Developing Urdu WordNet using the merge approach. In Proceedings of the
Conference on Language and Technology (pp. 55–59).

AUTHOR BIOGRAPHIES


Dr Muhammad Zubair Asghar is an assistant professor at the Institute of Computing and Information Technology, Gomal University, Dera Ismail
Khan, KP, Pakistan, and an approved PhD supervisor recognized by the Higher Education Commission (HEC), Pakistan. His PhD
research includes recent issues in machine learning, text mining, opinion mining and sentiment analysis, computational linguistics and natural
language processing, and big data solutions for social networks. He has more than 40 publications in journals of international repute (JCR and ISI
indexed) and more than 15 years of university teaching and laboratory experience in artificial intelligence and intelligent system
design. He is a guest editor of special issues on Social Computing in Health Informatics and Business Intelligence. He is a reviewer for many impact
factor journals and an associate editor of IEEE Access and PLoS ONE.

Anum Sattar is currently pursuing an MSCS at Gomal University, Dera Ismail Khan, KP, Pakistan. Her research interests include text mining,
machine learning, deep learning, and data science. She received her MCS in computer science from Gomal University, Dera Ismail Khan, KP,
Pakistan, in 2013, with a focus on developing intelligent systems.

Dr Aurangzeb Khan is an associate professor and chairman at Department of Computer Science, University of Science and Technology, Bannu,
Pakistan. His area of interest includes data mining, text mining, and opinion mining.

Dr Amjad Ali is chairman and assistant professor at the Department of Computer and Software Technology, University of Swat. He
received his PhD (Real‐time Systems) from Gyeongsang National University, South Korea, and his MS (Computer Science) from the University of
Peshawar.

Dr Fazal Masud Kundi received his MSIT from the University of Peshawar and his PhD in Computer Science from Gomal University. He is working as an
assistant professor at the Institute of Computing and Information Technology, Gomal University, DI Khan, Pakistan. His research interests include
recent trends in data mining, text mining, and computational linguistics. He has more than 40 publications in journals of international repute.
He has extensive laboratory experience with research tools and languages such as Python with NLTK, Weka, RapidMiner, Excel Miner, and R,
and he is currently working with the KNIME Analytics Platform.

Prof. Dr Shakeel Ahmad received his BSc with distinction from Gomal University, Pakistan (1986), and his MSc (Computer Science) from
Quaid‐i‐Azam University, Pakistan (1990). He received his PhD degree in computer science in January 2008 and completed a 1‐year post‐doctoral study
at Universiti Sains Malaysia (USM) in 2010. He started his career as a lecturer in 1990 and served for 11 years at the Institute of Computing
and Information Technology (ICIT), Gomal University, Pakistan. He then served as assistant professor, associate professor, professor, and
director at the Institute of Computing and Information Technology (ICIT), Gomal University, Pakistan, during 2001–2014. He is currently serving as
professor in the Faculty of Computing and Information Technology at Rabigh (FCITR), King Abdulaziz University, Jeddah, Kingdom of Saudi Arabia. Dr
Shakeel has more than 27 years of teaching and research
experience in performance modelling, sentiment analysis and text mining, optimization of congestion control techniques, and electronic learning. He
has produced many publications in journals of international repute and presented papers at international conferences.

SUPPORTING INFORMATION
Additional supporting information may be found online in the Supporting Information section at the end of the article.

How to cite this article: Asghar MZ, Sattar A, Khan A, Ali A, Masud Kundi F, Ahmad S. Creating sentiment lexicon for sentiment analysis in
Urdu: The case of a resource‐poor language. Expert Systems. 2019;e12397. https://doi.org/10.1111/exsy.12397
