Name and date of conference: 2018 15th International Conference on Smart Cities: Improving Quality of Life Using ICT & IoT (HONET-ICT). October 09, 2018.
Title of output: Lexicon and Heuristics Based Approach for Identification of Emotion in Text
Name and date of conference: 2018 12th International Conference on Open Source Systems and Technologies (ICOSST). December 19, 2018.
Author(s): Anwar Ur Rehman, Zobia Rehman, Junaid Akram, Waqar Ali, Munam Ali Shah, Muhammad Salman
Name and date of conference: 2018 24th International Conference on Automation and Computing (ICAC). September 06, 2018.
Title of output: Efficient Energy Utilization in Fog Computing based Wireless Sensor Networks
Name and date of conference: 2019 2nd International Conference on Computing, Mathematics and Engineering Technologies (iCoMET). January 30, 2019.
Abstract—Wireless Sensor Networks (WSNs) consist of many sensing devices distributed inside a given area. Each sensor node consists of multiple heterogeneous components such as a power supply, CPU, memory, and a transceiver. Since the location of sensors is needed in most WSNs, trilateration-based localization (TBL) has been used to locate the sensors in the network. This study examines how wireless sensor networks can take advantage of computational intelligence techniques, using both single- and multi-objective particle swarm optimization (PSO), with the overall aim of concurrently minimizing the time required for localization, minimizing the energy consumed during localization, and maximizing the number of nodes fully localized by adjusting the wireless sensor transmission ranges during the TBL process. A parameter study of the applied PSO variants is performed, leading to results that show algorithmic improvements of up to 21% in the evaluated objectives.
Index Terms—Wireless Sensor Networks, Trilateration, Localization, Particle Swarm Optimization

I. INTRODUCTION
Wireless sensor networks (WSNs) consist of many sensing devices distributed inside a given area. Sensors in the network carry out different tasks such as recording weather conditions, sensing motion, or recording sounds, in addition to many other tasks. In WSNs, sensors cooperate with each other to form a fully connected network that allows information sharing between the network nodes. Such networks have many applications for both civilian and military purposes. The position of sensing devices that record the humidity of a place and the position of a military vehicle in a war zone are two examples of applications where knowing the location of the information source is very important.
Wireless sensor nodes in WSNs may be positioned permanently or dynamically in a field, depending on the localization protocol and node functionalities, as thoroughly discussed in [1]. For permanent localization scenarios, knowing the location of the sensor is not a problem throughout the lifetime of the network; but in dynamic networks, localizing nodes can be time and power consuming and, in some scenarios, a lack of accuracy may occur. To solve problems of localization accuracy and increase the number of localized nodes in time-critical localization scenarios, meta-heuristic solutions and novel range-based iterative localization algorithms have previously been proposed in [2], [3] and [4]. Additionally, to allow localization solutions to be mapped into real-world scenarios, relaxations of the localization problem regarding node ordering, anchor node distribution, or global information sharing were also discussed in [5] and [6].
Trilateration-based localization (TBL) and multilateration-based localization (MBL) techniques are among the best-known and most used methods for localization. In this study, the various performance aspects of the TBL algorithm are examined through the application of single- and multi-objective variants of particle swarm optimization (PSO). We implemented two versions of PSO in this study to allow nodes to vary the transmission power level when broadcasting messages during localization. Trade-offs between multiple objectives (the number of transmitted messages, the number of localized nodes, power consumption, and the time needed to localize as many nodes as possible) are studied.
The paper presents the results of the two implemented versions of particle swarm optimization and shows their performance while optimizing the operation of the WSN. The key novelty of this paper is the optimization of the power consumption of the whole network without the need to cluster or build any small sensor islands such as in [7]. This study takes advantage of the functionalities of today's WSN nodes to enhance the performance of the whole network. The ZigBee technology of transceivers in wireless nodes made that possible by allowing us to use multiple transmission power levels, where the different variants of PSO were used to programmatically change the transmission level after the evaluation of the designed fitness functions.

II. LITERATURE REVIEW
WSNs consist of many sensing devices distributed inside a given area. Sensors in the network carry out different tasks such as recording weather conditions, sensing motion, or recording sounds, in addition to many other tasks [8]. In WSNs, sensors cooperate with each other to form a fully connected network that allows information sharing between the network nodes. The collected information can also be sent to a command and control center for processing and decision making. Such networks have many applications for both civilian and military purposes, yet finding the actual location of a single sensor in any type of WSN is important. For example, it can be used in military applications to detect
• Messages sent: Depending on the localization procedure and communication mechanism between nodes, the number of messages sent back and forth between nodes will vary. However, in this study we assume that each already localized node will broadcast once in order to help other non-localized nodes achieve localization. Thus, the number of sent messages depends on the number of localized nodes.
• Time required: In the proposed method, one unit of time is equivalent to one step in which sender nodes broadcast their locations and receivers receive the information. The step ends by running the location estimation using the TBL method for each blind node that receives at least three messages from three different localized nodes. The localization procedure terminates when no new blind node was able to localize itself by the end of a step.
• Power consumption: The variance in this objective mainly comes from the use of discrete and continuous transmission ranges, leading to various levels of power consumption. The power consumption is measured based on the power level or the transmission range each node uses to broadcast its message. Accordingly, the power consumption is the sum of each node's power consumption as chosen by PSO.
• Number of nodes: Choosing which power range a node will use to transmit messages, or the transmission range of each node, plays a significant role in the number of localized nodes. Through use of this objective, the proposed method maximizes the number of nodes capable of localizing using the least amount of consumed power, which means the least average transmission range over all nodes.

TABLE I: Binary PSO Positions Matrix

Range    Min  Mid  Max
node1    0    1    0
node2    1    0    0
node3    0    0    1

$$v_i = \begin{cases} mRange & \text{if } v_i < mRange \\ xRange & \text{if } v_i > xRange \\ v_i & \text{if } mRange \le p_i \le xRange \end{cases} \quad (2)$$

$$s(p_i) = \frac{1}{1 + e^{-p_i}} \quad (3)$$

$$p_i = \begin{cases} mRange & \text{if } s(p_i) \le Ran \\ xRange & \text{otherwise} \end{cases} \quad (4)$$

Here $s(p_i)$ is the sigmoid function value of particle $i$, $e$ is Euler's number, and $Ran$ is any random binary value. The sigmoid function was used in the equation to scale the value so that it stays in the range [0, 1]. The inertia weight (ω) in Eq. 1 can be modified dynamically (instead of using a constant value) using mechanisms such as Simulated Annealing to increase the probability of finding a near-optimal solution in fewer iterations and with less computational time.

IV. SIMULATIONS AND DISCUSSION
Fig. 1 shows the flow chart of the simulation procedure. Note that in Step-2, the implemented Java code reads the positions of each node from a saved topology file, which stores each node's type (anchor or blind node) in addition to the X and Y coordinates of the anchor nodes. For this study, a random WSN topology file is used in which 240 nodes, 40 of which are anchor nodes, are scattered over a field of 1000 × 1000 meters. In Step-3, one of the two proposed PSO versions is used. Step-4 and Step-5 are part of the fitness function, where each particle's solution is examined by flooding the network and using the TBL localization method.
used will aid in decreasing the localization time, representing
a clear trade-off between these two objectives.
Parameter Value
Particles 100
Iterations 200
Min Tran Range 64
Max Tran Range 132
C1 and C2 1.49445
Inertia Weight (ω) RIW
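The step-based localization procedure that these parameters feed into (each localized node broadcasts once, and a blind node localizes after hearing three distinct localized nodes) can be sketched as below. This is a simplified illustration under stated assumptions, not the paper's simulator: the function name and data layout are hypothetical, the position estimate itself is skipped, and the sum of the chosen ranges stands in for the power objective.

```python
import math


def localization_rounds(anchors, blind, ranges):
    """Simulate step-based localization.

    anchors: dict node_id -> (x, y) for nodes that start localized
    blind:   dict node_id -> (x, y) true positions of blind nodes
    ranges:  dict node_id -> transmission range chosen for that node
    Returns (steps, localized_ids, total_range_used).
    """
    positions = {**anchors, **blind}
    localized = set(anchors)
    senders = set(anchors)               # nodes that broadcast in the next step
    heard = {b: set() for b in blind}    # localized nodes each blind node has heard
    steps = 0
    total_range = 0.0
    while senders:
        steps += 1                       # the final step with no new node is counted too
        for s in senders:
            total_range += ranges[s]     # every localized node broadcasts exactly once
            for b in blind:
                in_reach = math.dist(positions[s], positions[b]) <= ranges[s]
                if b not in localized and in_reach:
                    heard[b].add(s)
        # A blind node localizes after hearing three distinct localized nodes.
        newly = {b for b in blind if b not in localized and len(heard[b]) >= 3}
        localized |= newly
        senders = newly
    return steps, localized, total_range
```

A fitness function for PSO could then combine the three returned values, for example by rewarding a larger `localized_ids` set and penalizing `steps` and `total_range_used`; how the paper weights them is not stated in this excerpt.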
it outperformed the two previous methods at all levels (i.e. localizing all nodes during the shortest time possible and with power consumption less than any other solution found by the methods before).
During the 50 trials, 115 different, yet optimal, solutions were found. Of these, 28 outperformed the baseline in terms of power consumption while maintaining the same time and number of localized nodes. The total power consumption ranged from 4% to 21% lower than the baseline measurements. In the best case, the MOPSO method improved power consumption by 29%, but was only capable of localizing 145 nodes, which is a clear trade-off.

TABLE IV: BMOPSO Parameters Value

Parameter             Value
Particles             100
Iterations            200
Min Tran Range        64
Max Tran Range        132
Mutation Percentage   15%
Mutation Value        Min Tran Range
C1 and C2             1.49445
Inertia Weight (ω)    RIW

Fig. 5: Power consumption, localization time, and number of localized nodes of a solution set containing 115 solutions while using the BMOPSO method over 50 trials.

V. CONCLUSIONS
This paper has presented single-objective and multi-objective binary PSO-based solutions for the power consumption of WSNs during trilateration-based localization. The overall performance of the TBL algorithm was evaluated and improved through the simultaneous optimization of various objective functions. Results clearly show that the use of SOPSO and MOPSO to optimize the TBL algorithm in terms of power consumption is effective, providing improvements of up to 21% in the Transmit mode of transceivers alone. Also, as shown by the study, using a single global output power is less stable in localizing nodes than using multiple levels, and using the maximum possible output level is not a cost-effective solution to the stability of localization. PSO was therefore found to solve the problem without negatively affecting the TBL work, in terms of localizability in particular. However, our study can be mapped to real test beds using techniques such as component-based localization, node clustering, and RTS/CTS methods, in addition to many others, as suggested by [5] and [6].

REFERENCES
[1] K. M. Modieginyane, B. B. Letswamotse, R. Malekian, and A. M. Abu-Mahfouz, “Software defined wireless sensor networks application opportunities for efficient network management: A survey,” Computers & Electrical Engineering, vol. 66, pp. 274–287, 2018.
[2] S. K. Rout, A. K. Rath, P. K. Mohapatra, P. K. Jena, and A. Swain, “A fuzzy optimization technique for energy efficient node localization in wireless sensor network using dynamic trilateration method,” in Progress in Computing, Analytics and Networking. Springer, 2018, pp. 325–338.
[3] E. Tuba, M. Tuba, and M. Beko, “Two stage wireless sensor node localization using firefly algorithm,” in Smart Trends in Systems, Security and Sustainability. Springer, 2018, pp. 113–120.
[4] S. Arora and S. Singh, “Node localization in wireless sensor networks using butterfly optimization algorithm,” Arabian Journal for Science and Engineering, vol. 42, no. 8, pp. 3325–3335, 2017.
[5] T. Liu, X. Luo, and Z. Liang, “Enhanced sparse representation-based device-free localization with radio tomography networks,” Journal of Sensor and Actuator Networks, vol. 7, no. 1, p. 7, 2018.
[6] X. Wang, Y. Liu, Z. Yang, K. Lu, and J. Luo, “Robust component-based localization in sparse networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 5, pp. 1317–1327, 2014.
[7] W. Cheng, N. Zhang, X. Cheng, M. Song, and D. Chen, “Time-bounded essential localization for wireless sensor networks,” IEEE/ACM Transactions on Networking, vol. 21, no. 2, pp. 400–412, 2013.
[8] C. Sivaranjani, A. Surendar, and T. Sakthevel, “Energy efficient deployment of mobile node in wireless sensor networks,” International Journal of Communication and Computer Technologies, vol. 1, no. 20, pp. 75–78, 2013.
[9] G. Han, J. Jiang, C. Zhang, T. Q. Duong, M. Guizani, and G. K. Karagiannidis, “A survey on mobile anchor node assisted localization in wireless sensor networks,” IEEE Communications Surveys and Tutorials, vol. 18, no. 3, pp. 2220–2243, 2016.
[10] B. Martinez, M. Monton, I. Vilajosana, and J. D. Prades, “The power of models: Modeling power consumption for iot devices,” IEEE Sensors Journal, vol. 15, no. 10, pp. 5777–5789, 2015.
[11] J. J. Robles, S. Tromer, M. Quiroga, and R. Lehnert, “A low-power scheme for localization in wireless sensor networks,” in Meeting of the European Network of Universities and Companies in Information and Communication Engineering. Springer, 2010, pp. 259–262.
[12] M. Z. A. Bhuiyan, G. Wang, J. Cao, and J. Wu, “Energy and bandwidth-efficient wireless sensor networks for monitoring high-frequency events,” in Sensor, Mesh and Ad Hoc Communications and Networks (SECON), 2013 10th Annual IEEE Communications Society Conference on. IEEE, 2013, pp. 194–202.
[13] W. Everywhere, “Wirelessly connecting everywhere,” Wireless Connectivity, no. 2Q, pp. 1–72, 2013.
[14] T. Instruments, “Cc2420: 2.4 ghz ieee 802.15.4/zigbee-ready rf transceiver,” 2006.
[15] H. Ren and M. Q.-H. Meng, “Power adaptive localization algorithm for wireless sensor networks using particle filter,” IEEE Transactions on Vehicular Technology, vol. 58, no. 5, pp. 2498–2508, 2009.
[16] K.-B. Chang, Y.-B. Kong, and G.-T. Park, “Clustering algorithm in wireless sensor networks using transmit power control and soft computing,” in Intelligent Control and Automation. Springer, 2006, pp. 171–175.
[17] T. Instruments, “Cc2420 datasheet,” Reference SWRS041B, 2007.
2018 International Conference on Frontiers of Information Technology (FIT)
Abstract—The integration of cloud and fog computing with the Smart Grid (SG) improves the efficiency of the SG. The SG is a modern electricity network that improves performance, reliability, stability and energy consumption. Integrating the SG with cloud computing improves the allocation of resources. Another concept, fog computing, is introduced to reduce the load on the cloud and improve the allocation of resources. The fog provides the same services as the cloud. However, the fog is closer to the end users, which improves response time and resource utilization. The fog covers a smaller area than the cloud and stores data temporarily; for permanent storage, the fog communicates with the cloud. The main features of fog are mobility, low latency and location awareness. In this paper, we present a cloud and fog based framework for information management. Fog computing makes the system efficient by using load balancing algorithms to allocate Virtual Machines (VMs). The load balancing algorithms evaluated in this paper are Round Robin, Throttled, Active Virtual Machine, Particle Swarm Optimization, Ant Colony Optimization and the odds algorithm. Particle Swarm Optimization outperforms the other algorithms.
Index Terms—Fog Computing, Microgrid, Smart Grid, Response Time, Tasks Scheduling, Load Balancing, Cloud Computing

I. INTRODUCTION
The Smart Grid (SG) manages the distribution of energy and allows Internet of Things (IoT) devices to exchange information with each other. The SG provides two-way interaction between end-users and the service provider, which enables them to monitor their energy consumption and pricing. Because of the large number of SG users, the SG is sensitive to response time and processing time. The large number of devices, such as smart meters, generates a large number of requests that have to be managed, processed and stored by the cloud. The evolution of the cloud has minimized the need for large local computation power [1]. The smart grid obtains private data usage and energy usage information through the smart meter, and this information is stored permanently on the cloud. The integration of cloud and IoT provides numerous services and allows data to be manipulated smartly.
The rapid growth of IoT devices has caused response time and processing time on the cloud to increase, which is critical for big applications. The SG provides privacy when transmitting data to the cloud [2]. As the number of end-users in the cloud increases, it becomes difficult to manage requests, and load balancing issues arise in the cloud [3]. The cloud and fog concept is introduced to manage a large number of requests.
The Internet of Things is integrated with cloud computing because of its useful features, and to fulfill the promise of our connected world. One of these features is storage: the cloud provides a large volume of storage based on the user's requirements. Similarly, computing and analysis is another important feature for IoT. The cloud provides better and faster online computing services for processing tasks, together with modern algorithms to analyze data and present it in different visual forms. Moreover, it provides a metered service, which means we pay only for what we have used. Self service is another attractive feature, where all the IT resources we need have self-service access so that we can customize them as we want [4].
Fog provides various important features such as location recognition and low response time. Fog computing maximizes the validity of the SG, increases security, and decreases computation cost and response time [5]. In a cloud-fog based platform, the requests of end-users are sent to the fog and forwarded to obtain Micro Grid (MG) services. The consumers send requests to the fog, and the fog searches for the nearest MG and interacts with the microgrid controller. With the help of fog, the processing time and response time are minimized and overall performance increases.
In this paper, a cloud and fog integrated smart grid model is proposed. In the proposed model, the fog manages the flow of data and forwards requests for electricity to the microgrid. Load is managed dynamically on the fog servers. Each fog server consists of virtual machines. VMs are responsible for processing requests. These requests are balanced across servers using load balancing techniques for efficient utilization of resources. Six load balancing algorithms are discussed in this paper.

II. MOTIVATION
Due to the rapid increase in Internet of Things devices, the computation power of the smart grid is adversely affected, so a centralized cloud and fog based platform is presented [6]. Request response time and data storage have become critical issues over the past few decades; to manage this, the fog computing concept is presented [7]. Fog computing acts as an extra level of computational and communication nodes that unburden the cloud from multi-tasking while handling large amounts of user requests [4]. The key concern with fog computing is to manage the incoming requests from the user in a competent
fog for computations and other required operations. The fog serves the users' requests through efficient utilization of resources. This efficient utilization of resources is obtained by the various load balancing techniques employed at the fog. Virtualization is used for efficient utilization of resources. In virtualization, multiple Virtual Machines (VMs) are created on a single server providing services to the users. The response time (RT) is the time between a user's request being sent to the fog and the response to that request being received. The processing time (PT) is the time during which the fog processes the user's request. The load balancing algorithms and virtualization help in reducing the RT and PT of the fog. The total number of VMs associated with a particular fog is given by the following equation.

$$VM_{Total} = \sum_{i=1}^{v} VM_i \quad (1)$$

The number of clusters of buildings in a region is specified by the following equation.

$$CL_{Total} = \sum_{n=1}^{c} CL_n \quad (2)$$

The following equation gives the number of houses in a particular cluster of buildings.

$$BL_{Total} = \sum_{m=1}^{b} BL_m \quad (3)$$

The total number of user requests UR that a fog can handle is given by the equation below.

$$UR_{Total} = \sum_{j=1}^{u} UR_j \quad (4)$$

The PT of a user request is given in the equation below, where $P_{ck}$ is the processing time and $S_c$ is the task assigned to each request:

$$PT = \sum_{c=1}^{m} \sum_{k=1}^{n} (P_{ck} \ast S_c) \quad (5)$$

Response time is calculated using the following equation:

$$RT = DelayTime + FinishTime - ArrivalTime \quad (6)$$
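As a small worked illustration of Eqs. (5) and (6), the following Python sketch evaluates the double sum for PT and the three-term expression for RT. The function names and the nested-list layout of the inputs are assumptions made for this example; the numbers are illustrative and are not taken from the paper's experiments.

```python
def processing_time(p, s):
    """Eq. (5): PT = sum over c and k of P[c][k] * S[c]."""
    return sum(p[c][k] * s[c]
               for c in range(len(p))
               for k in range(len(p[c])))


def response_time(delay_time, finish_time, arrival_time):
    """Eq. (6): RT = DelayTime + FinishTime - ArrivalTime."""
    return delay_time + finish_time - arrival_time


def total(counts):
    """Eqs. (1) to (4) are plain sums over per-item counts."""
    return sum(counts)
```

For example, with `p = [[1, 2], [3, 4]]` and `s = [10, 100]`, the double sum is (1 + 2) * 10 + (3 + 4) * 100 = 730.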
V. LOAD BALANCING ALGORITHMS
Load balancing is the process of distributing resources fairly for efficient resource utilization. Resources may include nodes, network resources and processing units. Efficient resource utilization means decreasing the response time, increasing throughput, and minimizing overheads. Details of the techniques are discussed below:

A. Round Robin
Round robin processes requests by selecting a virtual machine randomly and then allocating the workload in a circular manner. However, it does not consider the processing time. A request has to wait in a queue if no virtual machine is available.

B. Throttled
The throttled algorithm assigns the workload uniformly to the virtual machines. It ensures that only a predefined number of tasks is allocated to each virtual machine. Throttled maintains an index table with information about each virtual machine and assigns a request to a virtual machine that has the capacity to fulfill the end-user's demand. The data center controller receives the request and forwards it to the load balancer to allocate a virtual machine. The load balancer searches the index table for an available virtual machine and allocates it to the respective task.

C. Active Vm Algorithm
Active Vm distributes the workload equally to all the virtual machines. Requests sent to the datacenter are kept in a queue, so that equal numbers of tasks are assigned to each virtual machine. No virtual machine is underutilized.

D. Particle Swarm Optimization
In the PSO algorithm, solutions are known as particles and the population is known as the swarm. Each particle in PSO has a fitness value, obtained by evaluating the fitness function. Each particle has a particular position in the entire group. This algorithm is used to allocate users' requests.

E. Ant Colony Optimization
The ACO algorithm finds a suitable path from a large range by seeking the optimal path. In the real world, ants roam to find food, and after finding food they return to their colony, laying down a pheromone track. The following ants follow the same path to find food by searching for the shortest path.
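Two of the simpler policies above, round robin and throttled, can be sketched in a few lines of Python. This is an illustrative sketch rather than the CloudAnalyst implementation: the names are hypothetical, the round-robin cycle starts at the first VM instead of a randomly chosen one, and the throttled index table assumes each VM handles one request at a time.

```python
from itertools import cycle


def round_robin(requests, vm_ids):
    """Assign each request to the next VM in circular order."""
    vms = cycle(vm_ids)
    return {req: next(vms) for req in requests}


class ThrottledBalancer:
    """Index-table balancer: a VM is either available or busy."""

    def __init__(self, vm_ids):
        # Index table: VM id -> busy flag (assumed capacity of one request per VM).
        self.busy = {vm: False for vm in vm_ids}

    def allocate(self, request):
        """Return the first available VM, or None so the caller can queue the request."""
        for vm, is_busy in self.busy.items():
            if not is_busy:
                self.busy[vm] = True
                return vm
        return None

    def release(self, vm):
        """Mark a VM as available again once its request has finished."""
        self.busy[vm] = False
```

With three VMs, `round_robin(["r1", "r2", "r3", "r4"], ["vm0", "vm1", "vm2"])` wraps around and sends `r4` back to `vm0`, while a `ThrottledBalancer` with two VMs returns `None` for a third concurrent request until one VM is released.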
F. Odds Algorithm
To find a virtual machine that is available to perform useful tasks, the odds algorithm uses an optimal stopping technique. In the odds technique, virtual machines are checked one by one, and the search stops at the first virtual machine that has enough capacity to perform the tasks.

VI. SIMULATIONS AND DISCUSSION
A system with a Core i3 processor, 4 GB of RAM and the Windows 7 operating system is used for simulations. In this study, for simulation purposes, PSO is compared against round robin, Active Vm, ant colony optimization, throttled and the odds algorithm. The performance parameters considered are response time and processing time. CloudAnalyst

TABLE I: Response Time Summary

Algorithms    Average (ms)  Minimum (ms)  Maximum (ms)
Round Robin   116.01        41.79         584.69
Active Vm     63.03         40.56         87.37
ACO           143.60        42.63         577.13
Throttled     63.07         38.70         87.37
Odds          144.10        42.63         581.35
PSO           63.02         38.70         87.37
[Figure: response time (ms) by algorithm: RR, Active, ACO, Throt, Odds, PSO.]
[Figure: response time (ms) for Cluster 1 to Cluster 4, one series each for Round Robin, Active Vm, ACO, Throttle, Odds and PSO.]
Fig. 2: Average RT of Clusters.
colony optimization. Round robin performs better than both the odds algorithm and ant colony optimization. Table I shows the response time summary of the above-mentioned algorithms. PSO outperforms the other algorithms.
Fig. 3 displays the average response times obtained by round robin, Active Vm, ACO, throttled, the odds algorithm and PSO, which are 116.01 ms, 63.03 ms, 143.60 ms, 63.07 ms, 144.10 ms and 63.02 ms respectively.

Fig. 4: Average PT of Fogs.

PSO optimizes the load on the fog by efficient allocation of resources. PSO finds the most suitable solution for every task and schedules the requests in the most efficient way. Results show that it is a better algorithm for optimizing processing time in comparison to round robin, Active Vm, ACO, throttled and the odds algorithm. Results show very little difference between the odds algorithm and ant colony optimization. Round robin performs better than both the odds algorithm and ant colony optimization. Table II shows the processing time summary of the above-mentioned algorithms. PSO outperforms the other algorithms.

TABLE II: Processing Time Summary

Algorithms    Average (ms)  Minimum (ms)  Maximum (ms)
Round Robin   66.33         0.30          531.56
Active Vm     13.40         0.17          26.12
ACO           93.95         1.14          530.31
Throttled     13.45         0.30          26.12
Odds          94.59         0.64          530.53
PSO           13.39         0.17          26.12

[Figure: overall processing time (ms).]

REFERENCES
[1] Z. Cao, J. Lin, C. Wan, Y. Song, Y. Zhang, and X. Wang, “Optimal cloud computing resource allocation for demand side management in smart grid,” IEEE Transactions on Smart Grid, vol. 8, no. 4, pp. 1943–1955, 2017.
[2] J. Kim and Y. Kim, “Benefits of cloud computing adoption for smart grid security from security perspective,” The Journal of Supercomputing, vol. 72, no. 9, pp. 3522–3534, 2016.
[3] M. A. Al Faruque and K. Vatanparvar, “Energy management-as-a-service over fog computing platform,” IEEE Internet of Things Journal, vol. 3, no. 2, pp. 161–169, 2016.
[4] M. Chiang and T. Zhang, “Fog and iot: An overview of research opportunities,” IEEE Internet of Things Journal, vol. 3, no. 6, pp. 854–864, 2016.
[5] M. Hussain, M. S. Alam, M. Beg et al., “Fog computing in iot aided smart grid transition-requirements, prospects, status quos and challenges,” arXiv preprint arXiv:1802.01818, 2018.
[6] G. Ramadhan, T. W. Purboyo, and R. Latuconsina, “Experimental model for load balancing in cloud computing using throttled algorithm,” International Journal of Applied Engineering Research, vol. 13, no. 2, pp. 1139–1143, 2018.
[7] F. Luo, J. Zhao, Z. Y. Dong, Y. Chen, Y. Xu, X. Zhang, and K. P. Wong, “Cloud-based information infrastructure for next-generation power grid: Conception, architecture, and applications,” IEEE Transactions on Smart Grid, vol. 7, no. 4, pp. 1896–1912, 2016.
[8] S. Bera, S. Misra, and J. J. Rodrigues, “Cloud computing applications for smart grid: A survey,” IEEE Transactions on Parallel & Distributed
2018 International Conference on Frontiers of Information Technology (FIT)
Abstract—Recognition of emotion from text is emerging as a new field of research which can add another dimension to the analysis of textual data. People directly or indirectly express their emotion through facial expressions, speech or written content. People are putting a lot of textual content on social media and microblogging platforms. This data can be very useful in discovering different aspects, including emotions. Recognition of emotion is a very complex task. We propose a technique which uses word lexicons, emoticons, negations and intensity modifiers. Our approach follows Ekman's emotion model. Word lexicons are generated using WordNet from a set of seed words collected manually. Twitter is used to collect emoticons. Negations and intensity modifiers are collected manually from literature. Several heuristic rules are designed to support our keyword spotting technique. Experiments on data obtained from Twitter show high precision and recall for all emotional classes.
Index Terms—emotion recognition, natural language processing, text analysis, text classification

steps. First, we build a lexicon of words using both human judgment and WordNet [1]; seed words are used to automatically generate emotion lexicons from WordNet. Second, we consider emoticons and abbreviations such as ":)", ">:O" and "ROFL" through a large emoticon dataset. Third, we reduce many of the problems linked with keyword spotting techniques by using various heuristic rules. Our approach classifies text into Ekman's six emotion classes, i.e. Happiness, Sadness, Anger, Fear, Disgust and Surprise [2]. Therefore, we can say that the described technique can analyze online fragmented text communication that is rich in contextual expressions.
We concluded that our proposed algorithm gives promising results, showing high precision and recall for classification. The detailed study of results and methods is discussed in further sections.
are applied if more than one emotion keyword is found in a sentence. In [19], a computational keyword spotting technique is proposed. It combines syntactic and semantic information to detect emotion. The emotion matched with the keywords is considered the emotion of the sentence. There are many techniques for creating dictionaries for different emotions. One of the freely available tools is WordNet, which groups words into sets of synonyms.

the quality and size of dataset. The dataset should contain annotated emotion-rich text. In [20], a large dataset of tweets is collected. Support Vector Machines and a majority classifier are used to predict emotion from input text.
This alternative to keyword identification also raises problems: lack of semantic precision, the large corpus required for built execution, and rejection in many cases and other syntactic constructions [21].
Our approach is based on lexicons generated from WordNet synsets and emoticons from Twitter and other social networking sites. Some common abbreviations and slang terms are also included in the emoticon dataset, which is generated manually. Both lexicons and emoticons are associated with weights corresponding to Ekman’s six basic emotions, i.e. happiness, sadness, fear, anger, surprise and disgust. Each emotion is weighted between 0 and 1. We chose Ekman’s emotional model because it is the most commonly used model for textual emotion classification.

A. Lexicon Dataset
The method we use to generate lexicons was proposed in [23]. We start with a small set of seed words that are unambiguously synonymous with the relevant emotional category. These seed words act as the starting point for collecting lexicons from WordNet. Our approach assumes that the closer a word is semantically to the seed words, the higher its weight for that emotion. To create the set of seed words, we conducted a study with three persons with a background in English literature. We asked them to list at least six words synonymous with each emotion, and we selected these words as our seed words. Examples are “satisfaction” for happiness, “depression” for sadness, “panic” for fear, “frustration” for anger, “unexpected” for surprise and “filth” for disgust.
We then use these seed words to generate synsets from WordNet. A synset is a group of synonyms that are semantically similar. Many words have more than one meaning and can express different emotions with different intensities, so they are likely to be found in more than one synset. The rules for generating lexicons are described in Algorithm 1.
Algorithm 1 searches WordNet for synsets of the seed words. The word used to generate the synsets is added to the lexicon dataset with the calculated weights of the emotional categories. The synsets are then added to the list of seed words together with the emotion category and the iteration number. This step is repeated I times; in our case I is 3. 1400 words are commonly used in 70% of communication. I is set to 3 because three iterations generate more than 3700 lexicons, which is sufficient to recognize the emotion in text. With each iteration the weights are penalized by 10 percent. When all lexicons have been generated, they are stored in a file. The lexicon dataset consists of more than 3700 words.

Algorithm 1 Generation of Emotion Lexicon Dataset
Input: Seed Words for lexicon generation
Output: Emotion Lexicon Dataset

SW = Set of Seed Words
SW.type = Emotion Category of Word
// 1 = Happy, 2 = Sad, 3 = Angry,
// 4 = Fear, 5 = Disgust, 6 = Surprise
SW.Iter = Iteration of Seed word
LD = Lexicon Dataset
LD.emo = 6-tuple of weights of emotion categories

for i = 1 to I = 3 do
    N = number of seed words
    for n = 1 to N do
        Find synsets of SW[n]
        Add synsets to SW
        Category of synsets = SW[n].type
        Iteration of synsets = (SW[n].Iter + 1)

        if SW[n] does not exist in LD then
            Add SW[n] to LD
        end if

        value = 1 - (SW[n].Iter * 0.1)
        if SW[n] exists in LD then
            if value > LD.emo[SW[n].type] then
                LD.emo[SW[n].type] = value
            end if
        else
            LD.emo[SW[n].type] = value
        end if
        Delete SW[n] from SW
    end for
end for

TABLE II: A Small Portion of Lexicon Dataset
Word          Happy  Sad   Anger  Fear  Disgust  Surprise
frustrated    0      0     0.6    0     0        0
satisfaction  1.0    0     0      0     0        0
feisty        0      0.4   0      0.4   0.4      0
offend        0      0.25  0.25   0     0        0.25
sudden        0      0     0      0     0        1.0

B. Emoticon Dataset
An emoticon is a symbol that expresses emotion by typographically forming a facial expression, such as :(, :’(, :). Emoticons are used to tag text with an emotion, showing more explicitly how the text should be perceived.
Given the wide use of emoticons on online communication platforms such as Facebook and Twitter, we argue that any textual emotion recognition algorithm should consider emoticons. Unfortunately, WordNet and other lexical databases do not contain emoticons. Therefore, we manually created the emoticon dataset. We collected the most popular and commonly used emoticons from Skype, Twitter and Facebook. Emoticons are quite self-explanatory in their emotional semantics, so we assigned them suitable weights. We also included slang and abbreviations such as “OMG”, “LOL”, “damn” and “yuck” in our emoticon dataset.
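The lexicon generation procedure of Algorithm 1 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the synonym map `TOY_SYNONYMS` is hypothetical toy data standing in for WordNet synsets, the function name `build_lexicon` is introduced here for illustration, and seed words are assumed to start at iteration 0 so that they receive a weight of 1.0.

```python
EMOTIONS = ("happy", "sad", "anger", "fear", "disgust", "surprise")

# Hypothetical toy synonym map standing in for WordNet synsets.
TOY_SYNONYMS = {
    "satisfaction": ["contentment", "pleasure"],
    "contentment": ["ease"],
    "ease": ["calm"],
    "panic": ["terror"],
    "terror": ["dread"],
}


def build_lexicon(seeds, synonyms, iterations=3, penalty=0.1):
    """Expand seed words through synonyms, penalizing each iteration.

    seeds maps a seed word to its emotion category. The result maps
    each collected word to a dict of six emotion weights.
    """
    lexicon = {}
    # Each frontier entry is (word, emotion category, iteration number).
    frontier = [(word, emotion, 0) for word, emotion in seeds.items()]
    for _ in range(iterations):
        next_frontier = []
        for word, emotion, iteration in frontier:
            value = round(1.0 - iteration * penalty, 2)
            entry = lexicon.setdefault(word, dict.fromkeys(EMOTIONS, 0.0))
            # Keep the highest weight seen for this emotion category.
            if value > entry[emotion]:
                entry[emotion] = value
            # Synonyms become seed words of the next iteration.
            for synonym in synonyms.get(word, []):
                next_frontier.append((synonym, emotion, iteration + 1))
        frontier = next_frontier
    return lexicon


lexicon = build_lexicon({"satisfaction": "happy", "panic": "fear"}, TOY_SYNONYMS)
```

With three iterations, words reached directly from a seed get a weight of 0.9 and words two steps away get 0.8; anything further away is not collected.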
TABLE III: A Small Portion of Emoticon Dataset
Emoticon  Happy  Sad  Anger  Fear  Disgust  Surprise
:(        0      1.0  0      0     0        0
:)        1.0    0    0      0     0        0
lol       1.0    0    0      0     0        0
damn      0      0.5  1.0    0     0        0

C. Negations and Intensity Modifiers
Negations are a very important part of a language. A negation can flip the emotion of a sentence: “I am happy” and “I am not happy” have opposite emotional semantics. Keyword spotting and machine learning techniques have difficulty dealing with negations. We therefore created a dataset of commonly used negations from the literature, such as “no”, “not”, “wouldn’t” etc., and developed some heuristics to tackle this problem.
Some adverbs strongly increase the emotional intensity of a sentence, but keyword spotting techniques neglect this as well. Adverbs do not carry emotional semantics themselves but have a large impact on the intensity of emotions. We therefore collected commonly used adverbs from the literature that act as intensity modifiers of emotions, such as “highly”, “extremely”, “eminently” etc.

IV. EMOTION RECOGNITION
Briefly, our approach takes text as input, parses it into sentences, parses the sentences into words, compares these words to the lexicon dataset and then applies heuristics to the text. Java’s BreakIterator class is used to parse the text.
We have developed several heuristics to overcome issues associated with keyword spotting techniques, such as the detection and handling of negations. Finally, the algorithm computes the overall emotion of the text.
The heuristics applied to the text are as follows:
• If there is an exclamation mark in the sentence, the emotion is intensified by 20%.
• If the sentence contains “?!”, “!?” or any other combination of these characters, the surprise weight is set to 1.
• If a word is in uppercase, the emotion is intensified by 50%.
• If a word is preceded by an intensity modifier, the emotion is intensified by 50%.
• If a negation is spotted in a sentence, the values of the positive emotion (happiness) and the negative emotions (sadness, anger, fear and disgust) get switched.
Algorithm 2 explains the identification of emotion in text. As an example, take the sentence “I am not brokenhearted!”. In this sentence the algorithm detects only the emotion keyword “brokenhearted”, which has the emotion vector [0.0, 0.8, 0.0, 0.0, 0.0, 0.0] and thus carries only the weight of sadness. Because there is a negation in the sentence, the emotional weights get switched, and happiness now carries the weight of 0.8. The exclamation mark at the end of the sentence intensifies the weights by 20%, so the new vector is [0.96, 0.0, 0.0, 0.0, 0.0, 0.0]. The dominant emotion of the sentence is happiness.

Algorithm 2 Emotion Recognition
Input: A string T of text
Output: Recognized Emotion E from the text

Split T into Sentences S
N = number of sentences
Total Weights = 0
// Happiness = Sadness = Anger = 0
// Fear = Disgust = Surprise = 0

for n = 1 to N do
    if S[n] has Exclamation Mark then
        Increase emotion weights of S[n] by 20%
    end if
    if S[n] has “?!” then
        Set S[n]’s emotion weight of surprise to 1
    end if
    Split Sentence S[n] into words W
    M = number of words
    for m = 1 to M do
        if W[m] is emoticon then
            W[m] weights = Emoticon’s weights
            Break for loop
        else
            if W[m] is in uppercase then
                Increase weights of W[m] by 50%
            end if
            if W[m] is preceded by intensity modifier then
                Increase weights of W[m] by 50%
            end if
        end if
    end for
    if S[n] has negation then
        Flip emotion weights of S[n]
        // Assign Happiness weight to Sadness and
        // Assign Sadness + Anger weights to Happiness
    end if
    Add emotion weights of S[n] to Total Weights
end for
return emotion with highest weight

V. RESULTS
We collected 150 random tweets to evaluate our system. The collected dataset was presented to five participants from the National University of Sciences and Technology (NUST). The participants were fluent in English and familiar with modern-day slang and abbreviations. They were regular users of social networks and microblogging platforms. The participants annotated the dataset based on Ekman’s six emotions. The emotion of each tweet was selected through a voting-based approach: the emotion chosen by the majority of the participants was assigned to the respective tweet.
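The heuristics of Algorithm 2 in Section IV can be sketched in Python as follows. This is an illustrative sketch, not the authors' Java implementation: the four small datasets are hypothetical excerpts, the function names are introduced here, sentence splitting uses a simple regular expression instead of BreakIterator, and the negation flip follows one reading of the comment in Algorithm 2 (happiness receives sadness plus anger, sadness receives the old happiness weight). The sentence-level modifiers are applied after the words are scored so that they act on non-zero weights.

```python
import re

EMOTIONS = ("happy", "sad", "anger", "fear", "disgust", "surprise")

# Tiny illustrative datasets (hypothetical excerpts, not the paper's files).
LEXICON = {"brokenhearted": {"sad": 0.8}, "happy": {"happy": 1.0}}
EMOTICONS = {":)": {"happy": 1.0}, ":(": {"sad": 1.0}}
NEGATIONS = {"no", "not", "wouldn't"}
INTENSIFIERS = {"highly", "extremely", "eminently"}


def score_sentence(sentence):
    """Return the six emotion weights of one sentence."""
    weights = dict.fromkeys(EMOTIONS, 0.0)
    negated = False
    previous = ""
    for token in sentence.split():
        if token in EMOTICONS:
            # An emoticon contributes its own weights and ends the word scan.
            for emotion, value in EMOTICONS[token].items():
                weights[emotion] += value
            break
        word = token.strip("!?.,;:")
        negated = negated or word.lower() in NEGATIONS
        factor = 1.0
        if len(word) > 1 and word.isupper():
            factor *= 1.5  # uppercase word: +50%
        if previous in INTENSIFIERS:
            factor *= 1.5  # preceded by an intensity modifier: +50%
        for emotion, value in LEXICON.get(word.lower(), {}).items():
            weights[emotion] += value * factor
        previous = word.lower()
    if negated:
        # Assumed reading of the flip described in Algorithm 2.
        happy = weights["happy"]
        weights["happy"] = weights["sad"] + weights["anger"]
        weights["sad"], weights["anger"] = happy, 0.0
    if "!" in sentence:
        weights = {e: w * 1.2 for e, w in weights.items()}  # +20%
    if "?!" in sentence or "!?" in sentence:
        weights["surprise"] = 1.0
    return weights


def recognize_emotion(text):
    """Sum the sentence weights and return the dominant emotion, or None."""
    total = dict.fromkeys(EMOTIONS, 0.0)
    for sentence in re.findall(r"[^.!?]+[.!?]*", text):
        for emotion, value in score_sentence(sentence.strip()).items():
            total[emotion] += value
    best = max(EMOTIONS, key=total.get)
    return best if total[best] > 0 else None
```

For the sentence “I am not brokenhearted!” this sketch yields a happiness weight of 0.8 × 1.2 = 0.96 and returns happiness as the dominant emotion.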
TABLE IV: Precision, Recall, and F-Measure for Each Emotion Class
Emotion   Precision  Recall  F-Measure
Happy     0.9512     0.8297  0.886
Sad       0.7941     0.7941  0.7941
Anger     0.7222     0.9286  0.8125
Fear      0.9        0.6833  0.7832
Disgust   0.9230     0.9230  0.9230
Surprise  0.846      0.7333  0.7856
Average   0.8576     0.8153  0.8307

Out of 150 tweets, 109 were classified correctly, so the system shows 73% accuracy. However, accuracy alone is not a good measure for classification, so we also computed precision, recall and F-measure, as shown in Table IV.
Our approach shows very high precision for happiness, fear and disgust, i.e. 0.9512, 0.9 and 0.923 respectively. Precision for sadness and anger is somewhat lower, i.e. 0.7941 and 0.7222 respectively. We achieved very high recall for anger and disgust, i.e. 0.9286 and 0.923 respectively. The F-measure scores range from 0.7832 to 0.923, which is significantly above baseline scores for all classes.

VI. CONCLUSIONS AND FUTURE WORK
We have proposed an approach to identify emotion in text that can handle formal and informal communication. We have incorporated emoticons and slang commonly used in communication on social networks and microblogging platforms. In addition, we have handled negations by building a dataset of commonly used negations, since common machine learning based approaches have difficulty handling negations. We have also collected commonly used adverbs that intensify the emotion of a sentence. Our approach takes text as input, matches the words against our lexicon and emoticon datasets, and sets the emotional weights accordingly. Various heuristics are designed to support the algorithm. Our approach shows very high precision for happiness, disgust and fear. F-measure scores are significantly above baseline scores for all the classes.
In future, we will improve the performance of our approach. We achieved relatively lower precision for sadness and anger, so we will focus more on these classes and try to design heuristics that improve the performance for them.
We will refine our lexicon dataset and extend the emoticon dataset. Handling negations is a very complex task. For example, “Not very disappointed, just a little bit” does not imply that the emotional weights should be flipped but that they should be decreased. We will study semantics further and try to handle this issue. Double negations in a sentence are another issue we would like to resolve.

REFERENCES
[1] G. A. Miller, “Wordnet: a lexical database for english,” Communications of the ACM, vol. 38, no. 11, pp. 39–41, 1995.
[2] P. Ekman, “An argument for basic emotions,” Cognition & emotion, vol. 6, no. 3-4, pp. 169–200, 1992.
[3] N. Sharma, R. Pabreja, U. Yaqub, V. Atluri, S. Chun, and J. Vaidya, “Web-based application for sentiment analysis of live tweets,” in Proceedings of the 19th Annual International Conference on Digital Government Research: Governance in the Data Age. ACM, 2018, p. 120.
[4] F. Poecze, C. Ebster, and C. Strauss, “Social media metrics and sentiment analysis to evaluate the effectiveness of social media posts,” Procedia computer science, vol. 130, no. C, pp. 660–666, 2018.
[5] A. Summa, B. Resch, and M. Strube, “Microblog emotion classification by computing similarity in text, time, and space,” in Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES), 2016, pp. 153–162.
[6] A. Sen, M. Sinha, S. Mannarswamy, and S. Roy, “Multi-task representation learning for enhanced emotion categorization in short text,” in Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 2017, pp. 324–336.
[7] J. A. Russell, “A circumplex model of affect,” Journal of personality and social psychology, vol. 39, no. 6, p. 1161, 1980.
[8] U. Gupta, A. Chatterjee, R. Srikanth, and P. Agrawal, “A sentiment-and-semantics-based approach for emotion detection in textual conversations,” arXiv preprint arXiv:1707.06996, 2017.
[9] V. K. Jain, S. Kumar, and S. L. Fernandes, “Extraction of emotions from multilingual text using intelligent text processing and computational linguistics,” Journal of Computational Science, vol. 21, pp. 316–326, 2017.
[10] L. Dini and A. Bittar, “Emotion analysis on twitter: The hidden challenge,” in LREC, 2016.
[11] S. M. Mohammad and F. Bravo-Marquez, “Wassa-2017 shared task on emotion intensity,” arXiv preprint arXiv:1708.03700, 2017.
[12] X. Kang, F. Ren, and Y. Wu, “Exploring latent semantic information for textual emotion recognition in blog articles,” IEEE/CAA Journal of Automatica Sinica, vol. 5, no. 1, pp. 204–216, 2018.
[13] I. Perikos and I. Hatzilygeroudis, “Recognizing emotions in text using ensemble of classifiers,” Engineering Applications of Artificial Intelligence, vol. 51, pp. 191–201, 2016.
[14] S. P. Tiwari, M. V. Raju, G. Phonsa, and D. K. Deepu, “A novel approach for detecting emotion in text,” Indian Journal of Science and Technology, vol. 9, no. 29, 2016.
[15] N. Kanger and G. Bathla, “Recognizing emotion in text using neural network and fuzzy logic,” Indian Journal of Science and Technology, vol. 10, no. 12, 2017.
[16] S. Grover and A. Verma, “Design for emotion detection of punjabi text using hybrid approach,” in Inventive Computation Technologies (ICICT), International Conference on, vol. 2. IEEE, 2016, pp. 1–6.
[17] A. Joshi, V. Tripathi, R. Soni, P. Bhattacharyya, and M. J. Carman, “Emogram: An open-source time sequence-based emotion tracker and its innovative applications,” in AAAI Workshop: Knowledge Extraction from Text, 2016.
[18] D. Ghazi, D. Inkpen, and S. Szpakowicz, “Prior and contextual emotion of words in sentential context,” Computer Speech & Language, vol. 28, no. 1, pp. 76–92, 2014.
[19] H. Binali, C. Wu, and V. Potdar, “Computational approaches for emotion detection in text,” in Proceedings of the IEEE international conference on digital ecosystems and technologies (DEST 2010). IEEE, 2010, pp. 172–177.
[20] S. M. Mohammad and S. Kiritchenko, “Using hashtags to capture fine emotion categories from tweets,” Computational Intelligence, vol. 31, no. 2, pp. 301–326, 2015.
[21] A. Neviarouskaya, H. Prendinger, and M. Ishizuka, “Affect analysis model: novel rule-based approach to affect sensing from text,” Natural Language Engineering, vol. 17, no. 1, pp. 95–135, 2011.
[22] Y.-S. Seol, D.-J. Kim, and H.-W. Kim, “Emotion recognition from text using knowledge-based ann,” in ITC-CSCC: International Technical Conference on Circuits Systems, Computers and Communications, 2008, pp. 1569–1572.
[23] C. Ma, H. Prendinger, and M. Ishizuka, “Emotion estimation and reasoning based on affective textual interaction,” in International Conference on Affective Computing and Intelligent Interaction. Springer, 2005, pp. 622–628.
International Conference on Open Source Systems and Technologies (ICOSST)
Abstract—Vehicle-to-Everything (V2X) communication consists mainly of two types of communication: Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I). V2X has many uses, ranging from road safety, where it can help avoid collisions, to infotainment such as traffic information, navigation and multimedia. As driver assistance systems are growing rapidly, user interaction with and acceptance of these systems have to be evaluated. The aim of this work is to improve an existing communication model for V2X communication by minimizing the handover duration. IEEE 802.11p has been a standard for V2X communication. In IEEE 802.11p, data collisions occur due to high density, so we replaced IEEE 802.11p with a 5G SDN-based approach. The current communication simulation implements the X2 interface based Vehicle Unit Inter-system Handover (VUIH) in a partially 5G SDN-based V2X communication. Our proposed approach achieved better results than the baseline: we reduced the handover preparation time by 1.24 ms and the handover completion time by 0.8308 ms.
Index Terms—Handover, V2X Communication, Software Defined Networks, Vehicle Unit (VU).

I. INTRODUCTION
Vehicular communications, especially Vehicle-to-Infrastructure (V2I), use several wireless technologies. However, they are either capacity limited or coverage limited. Scalability is another issue in such platforms. Vehicular Ad-hoc Networks (VANETs) that use the IEEE 802.11p standard have been proposed for vehicular communications. Efficient traffic control and management is a prime parameter for avoiding road accidents: it enables vehicles to cooperate in order to balance the traffic flow, steering wheel angles and end-to-end transportation. Various research works have been carried out on improving the Vehicle-to-Everything (V2X) architecture. In this research we explore other important aspects that improve performance and robustness, specifically for Vehicle-to-Vehicle (V2V), Vehicle-to-Person (V2P) and Vehicle-to-Infrastructure (V2I) communications using LTE, since 3GPP Release 16 also supports V2X communications, which makes it a potential candidate for V2V in terms of latency, reliability and throughput. To meet these requirements and to address major capacity and coverage issues, we have to use radio resource management algorithms or revisit the previous algorithms in order to decrease interference and enable better Quality of Service (QoS). The signal from one place to another should be transmitted or received in optimal time, which leads to low latency in V2X communication.
V2X communication is basically the passing of data from a vehicle to any element that may affect the vehicle. V2X brings many services, such as automatic driving, traffic information, current and upcoming position, vehicle control information, and traffic and road safety services. IEEE 802.11p has been standardized for V2X. In IEEE 802.11p, data collisions occur due to high density, so we replace IEEE 802.11p with a 5G SDN-based approach.
The remainder of this paper is organized as follows. In Section II, the relevant literature is reviewed. The proposed system model and methodology, including handover management and experimental explanation, are considered in Sections III and IV. Section V presents the simulation and a discussion of the proposed methodology. Finally, Section VI concludes the paper.

II. LITERATURE REVIEW
The rapid growth of mobile data is widely recognized, and the vehicular communication industry is preparing to deal with a 100x rise in data by 2021 over 2011 [1]. Additionally, connected individuals see their smartphones as an extension of their working area while on the move, so continuous enhancement of the mobile specification is becoming increasingly necessary to support the performance needs of persistent wireless connections.
Furthermore, due to the growing demand for wireless communications for a wide range of purposes, wireless network infrastructure needs to be more flexible and adaptive to match actual demands. In this context, Software Defined Networking (SDN) [2] and Network Function Virtualization (NFV) [3] have received a great deal of attention from the research community and standardization organizations in recent years. SDN provides versatile ways to analyze and manage the network efficiently by separating the control and data planes, and virtualization allows the hardware infrastructure to be provided as a service by abstracting and sharing physical resources [4]. SDN and NFV are projected to play a significant role in 5G [5]. On the other side, the SDN paradigm separates the control and data plane [6]. Control
We set the simulation parameters as follows: the data rate is set up to 1 Gbps, the number of UDP packets is set to 1000 and the simulation interval is 10 ms, as seen in Table I.

Fig. 7: Comparison Chart.

Fig. 8: Simulation graph.

The graph in Fig. 8 shows our simulation results for handover (HO) preparation and completion time in terms of data rate and delay time. Our system achieved better results than the baseline: we reduced the HO preparation time by 1.24 ms and the HO completion time by 0.8308 ms.

VI. CONCLUSIONS
In this paper, we proposed the implementation of X2-based VU inter-system handover in a partially 5G Software Defined Network (SDN) based architecture. We built a simulator for this network within the ns-3 setup. The functions of all network elements, such as the DP exchange and the OpenFlow protocol, are fulfilled by the simulation. We verified our proposed system experimentally and achieved an HO preparation time of 5.7 ms and an HO completion time of 7.48 ms, which is better than the baseline results, where the HO preparation and completion times are 6.94 ms and 8.31 ms. Our implementation results and the comparison with the existing implementation are shown in Fig. 8 and Fig. 7 respectively. There are still some bugs related to X2 handover (as we found in ns-3 discussion forums). Apart from these limitations, further analysis of this simulation is possible by enabling the built-in trace support, such as PHY layer traces, MAC layer traces, RLC traces and PDCP traces, so that quality research simulation can be done within the limits of this simulator. In 5G we have to achieve low latency for the control and data planes. These results fulfill the control plane delay requirements of 5G networks.

REFERENCES
[1] C. V. N. Index, “Cisco visual networking index: global mobile data traffic forecast update, 2014–2019,” Tech. Rep., 2015.
[2] H. Chan, D. Liu, P. Seite, H. Yokota, and J. Korhonen, “Requirements for distributed mobility management,” Tech. Rep., 2014.
[3] G. Liu and D. Jiang, “5g: Vision and requirements for mobile communication system towards year 2020,” Chinese Journal of Engineering, vol. 2016, 2016.
[4] P. Ameigeiras, J. J. Ramos-Muñoz, L. Schumacher, J. Prados-Garzon, J. Navarro-Ortiz, and J. M. López-Soler, “Link-level access cloud architecture design based on sdn for 5g networks,” IEEE Network, vol. 29, no. 2, pp. 24–31, 2015.
[5] J. Costa-Requena, J. L. Santos, V. F. Guasch, K. Ahokas, G. Premsankar, S. Luukkainen, O. L. Pérez, M. U. Itzazelaia, I. Ahmad, M. Liyanage et al., “Sdn and nfv integration in generalized mobile network architecture,” in Networks and Communications (EuCNC), 2015 European Conference on. IEEE, 2015, pp. 154–158.
[6] T. Taleb, M. Corici, C. Parada, A. Jamakovic, S. Ruffino, G. Karagiannis, and T. Magedanz, “Ease: Epc as a service to ease mobile core network deployment over cloud,” IEEE Network, vol. 29, no. 2, pp. 78–88, 2015.
[7] Y. Zhou, K. Chen, J. Zhang, J. Leng, and Y. Tang, “Exploiting the vulnerability of flow table overflow in software-defined network: Attack model, evaluation, and defense,” Security and Communication Networks, vol. 2018, 2018.
[8] H. I CL, “S., xu, z., et al.:‘new paradigm of 5g wireless internet’,” IEEE J. Sel. Areas Commun, vol. 34, no. 3, pp. 474–482, 2016.
[9] P. Khuntia and R. Hazra, “Resource sharing for device-to-device communication underlaying cellular network,” in 2018 4th International Conference on Recent Advances in Information Technology (RAIT). IEEE, 2018, pp. 1–5.
[10] X. Xiaodong, Z. Huixin, D. Xun, H. Yanzhao, T. Xiaofeng, and Z. Ping, “Sdn based next generation mobile network with service slicing and trials,” China Communications, vol. 11, no. 2, pp. 65–77, 2014.
[11] A. Nakao, P. Du, Y. Kiriha, F. Granelli, A. A. Gebremariam, T. Taleb, and M. Bagaa, “End-to-end network slicing for 5g mobile networks,” Journal of Information Processing, vol. 25, pp. 153–163, 2017.
[12] S. Kukliński, Y. Li, and K. T. Dinh, “Handover management in sdn-based mobile networks,” in Globecom Workshops (GC Wkshps), 2014. IEEE, 2014, pp. 194–200.
[13] L. M. Contreras, L. Cominardi, H. Qian, and C. J. Bernardos, “Software-defined mobility management: Architecture proposal and future directions,” Mobile Networks and Applications, vol. 21, no. 2, pp. 226–236, 2016.
[14] L. Richardson, M. Amundsen, and S. Ruby, RESTful Web APIs: Services for a Changing World. O’Reilly Media, Inc., 2013.
[15] J. Prados-Garzon, O. Adamuz-Hinojosa, P. Ameigeiras, J. J. Ramos-Munoz, P. Andres-Maldonado, and J. M. Lopez-Soler, “Handover implementation in a 5g sdn-based mobile network architecture,” in Personal, Indoor, and Mobile Radio Communications (PIMRC), 2016 IEEE 27th Annual International Symposium on. IEEE, 2016, pp. 1–6.
Efficient Resource Distribution in Cloud
and Fog Computing
Abstract. Smart Grid (SG) is a modern electrical grid that combines the traditional grid with Information and Communication Technology. SG includes various energy measures including smart meters and energy-efficient resources. With the increase in the number of Internet of Things (IoT) devices, the data storage and processing complexity of SG increases. To overcome these challenges, cloud computing is used with SG to enhance energy management services and provide low latency. To ensure privacy and security in cloud computing, the fog computing concept is introduced, which increases the performance of cloud computing. The main features of fog are location awareness, low latency and mobility. Fog computing decreases the load on the cloud and provides the same facilities as the cloud. In the proposed system, we use three load balancing algorithms: Round Robin (RR), Throttled and the Odds algorithm. The Cloud Analyst simulator is used to compare and examine the performance of the algorithms.
1 Introduction
1.1 Motivation
The rapid growth of technologies affects the computing power of SG, so a centralized cloud platform is presented in [6]. In the past decade, response time and storage in the cloud have been crucial issues; to handle them, the fog computing concept is presented in [7]. The fog is an intermediate layer between the end user and the cloud [8] that distributes resources closer to the customers, which reduces response time. A large number of consumers use the cloud and fog, which may cause load balancing issues [9]. To handle a large number of requests, load balancing techniques are used in this paper.
1.2 Contribution
Cloud and fog integration with SG provides benefits to end users. The optimal allocation of resources to consumers is tackled by load balancing techniques. The contributions of this paper are as follows:
2 Related Work
Cloud and fog computing provide shared resources and allocate VMs to reduce the load on the cloud. Different load balancing algorithms are applied on the cloud and fog to manage tasks. The authors in [10] proposed a dragonfly optimization algorithm. This algorithm outperforms others in terms of execution time, response time, task migration and load balance between machines. However, the limitation of the proposed system is that there is no categorization of the cloud. In [11], the authors proposed a self-similarity-based load balancing (SSLB) mechanism. SSLB outperforms other algorithms in terms of reducing cost and achieving the best allocation of resources. However, its limitation is that execution time increases. In [12], the authors presented the cloud load balancing (CLB) algorithm. CLB performs better than other load balancing algorithms at reducing the processing time. However, its limitation is that cost is not balanced.
In [13], the authors proposed Ant Colony Optimization, which optimizes response time and balances the load. However, its limitation is a high execution time. In [14], the authors proposed a Particle Swarm Optimization and Pattern Search algorithm that achieves reliability and power utilization. The proposed algorithm performs better at reducing cost. However, its limitation is that the main function can be improved for better performance. In [15], authors proposed energy management system that
The first issue is satisfying the budget constraint and the second issue is reducing the schedule length of applications. A schedule with low time complexity is used to reduce the schedule length. The experimental results demonstrate that the proposed algorithm produces a low schedule length under different conditions and different budget limits.
In [18], the authors presented a tail-matching sub-sequence mobility prediction based approach that predicts the mobility pattern of users in order to place resources near the users. For optimal offloading decisions, an improved genetic algorithm is proposed. To enhance the efficiency of the genetic algorithm, the reproduction operations crossover and mutation are used. Integer encoding is developed to consider multiple-cloudlet scenarios, in which several cloudlets may be accessible to a user. The prediction is demonstrated experimentally using a human mobility dataset, which is used to evaluate cloudlet reliability and serves as a feature in the offloading scenario.
Table 1 shows the summary of related work.
The remainder of the paper is organized as follows: Sect. 2 discusses the related work, Sect. 3 illustrates the proposed system model and the load balancing algorithms, Sect. 4 presents the simulation results and discussion, and Sect. 5 concludes the paper.
3 System Model
In this paper, a cloud and fog based environment is considered. Fog is an extension of the cloud. The combination of cloud and fog provides enhanced services to end users and helps to minimize latency and the load on cloud data centres. Fog computing is an environment that provides data storage, computation and communication between devices and the cloud.
Figure 1 shows a fog and cloud based platform with a three-layered architecture: the first layer contains the cloud, the second layer contains the fog and the third layer comprises the end users. Communication occurs between all layers: end users communicate with the fog, and the fog in turn interacts with the cloud. The cloud and fogs are connected with the MG. The fogs exchange information with the clusters of buildings directly or through the smart meter.
At the end user layer, two clusters of buildings are assumed, and each cluster comprises multiple homes. Buildings communicate with the fog through the smart meter and send their requests to the fog. The fog finds the nearest MG and establishes a connection with it. If the MG has sufficient power supply, it responds to the fog; otherwise, the fog interacts with the cloud to find another nearby MG. Fogs contain multiple VMs to handle user requests, which minimizes response time and enhances the computing power that is important for load balancing in the MG.
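The request flow described above (the fog tries the nearest MG first and falls back to the cloud to locate another MG) can be sketched as follows. This is only an illustration under stated assumptions: the class `MG`, its fields `distance_km` and `available_kw`, and the function `route_request` are hypothetical names introduced here, and "nearest" is modelled as a simple distance value.

```python
from dataclasses import dataclass


@dataclass
class MG:
    name: str
    distance_km: float   # hypothetical distance from the requesting fog
    available_kw: float  # hypothetical spare power the MG can supply


def route_request(demand_kw, mgs):
    """Return (MG name, resolver) for a building's power request.

    The fog asks the nearest MG first. If that MG cannot cover the
    demand, the cloud looks for the next nearest MG that can.
    """
    ordered = sorted(mgs, key=lambda mg: mg.distance_km)
    if not ordered:
        return None, "cloud"
    nearest = ordered[0]
    if nearest.available_kw >= demand_kw:
        return nearest.name, "fog"
    for mg in ordered[1:]:
        if mg.available_kw >= demand_kw:
            return mg.name, "cloud"
    return None, "cloud"


mgs = [MG("mg-b", 7.0, 50.0), MG("mg-a", 2.0, 5.0)]
```

Here a demand of 3 kW is served by the nearest MG through the fog alone, while a demand of 20 kW is resolved through the cloud and lands on the farther MG.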
The fog layer handles latency problems and manages resources effectively. Fog devices contain resources such as main memory, data storage and computation. To run applications on a single hardware platform, virtual machines contain processors for executing applications. In virtual machines, numerous applications run on processors to execute services. Fog is a middle layer between the cloud and the end user.
The cloud layer contains data centres, which provide on-demand storage and computation to end users and offer services under the pay-as-you-go paradigm: the consumer pays only for the necessary services and resources.
The world is divided into six regions, corresponding to six continents, as shown in Table 2. Fogs are placed in different regions of the world. In this paper, region 3 and region 4 are considered; in each region two clusters are assumed and one fog is placed per region. These two regions have an energy crisis and load balancing issues, as a large number of end users send requests to the MG. The two fogs are located in these two regions and respond to the clusters of buildings in the region where they are located. Different load balancing algorithms are used to manage the user requests.
Region          Region Id
North America   0
South America   1
Europe          2
Asia            3
Africa          4
Oceania         5
3.2 Throttled
The Throttled algorithm maintains an index table of all VMs, which records the
state of each VM, and assigns each request to a VM that can fulfil it. To
allocate a virtual machine, the request is sent to the load balancer, which
identifies a machine that is able to perform the task efficiently. If all VMs
are busy, the request waits in the queue until a VM becomes available.
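The allocation rule described above can be sketched in a few lines. This is a minimal illustration under assumptions, not the Cloud Analyst implementation: the class name `ThrottledBalancer`, the one-request-per-VM limit and the VM identifiers are all hypothetical.

```python
from collections import deque


class ThrottledBalancer:
    """Toy throttled load balancer: each VM serves one request at a time."""

    def __init__(self, vm_ids):
        # Index table: VM id -> True if the VM is currently available.
        self.available = {vm: True for vm in vm_ids}
        # Requests that arrived while every VM was busy.
        self.waiting = deque()

    def submit(self, request):
        """Assign the request to the first available VM, or queue it."""
        for vm, free in self.available.items():
            if free:
                self.available[vm] = False
                return vm
        self.waiting.append(request)
        return None

    def release(self, vm):
        """Called when a VM finishes; hand it to the oldest queued request."""
        if self.waiting:
            request = self.waiting.popleft()
            return request, vm  # the VM stays busy with the queued request
        self.available[vm] = True
        return None


balancer = ThrottledBalancer(["vm0", "vm1"])  # hypothetical VM ids
assignments = [balancer.submit(r) for r in ("r1", "r2", "r3")]
handoff = balancer.release("vm0")
```

With two VMs and three requests, the third request is queued and is handed to whichever VM is released first.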
are considered. All virtual machines have the same capacity and are assigned
to tasks based on user requests. The total number of virtual machines (m) and
requests (R) are given by Eqs. 1 and 2.
$$T_{VMs} = \sum_{i=1}^{m} VM_i \qquad (1)$$
In this paper, the Cloud Analyst tool is used to perform the simulation and to
inspect the performance of the load balancing algorithms in the same scenario.
The RR, Throttled and Odds algorithms are compared on the basis of the fogs'
response time, processing time and total cost.
4.1 Discussions
Figure 2 compares the overall response times of the RR, Throttled and Odds
algorithms.
Figure 3 shows the response time of each algorithm. The total response time of
Throttled in the scenario is 56.94 ms and that of Round Robin is 57.35 ms,
whereas the overall response time of the Odds algorithm is 57.42 ms.
Figure 3 shows that the response times of the Odds and Throttled algorithms are
better than that of Round Robin, because Round Robin allocates tasks without
checking the existing task allocation on the VMs, which increases the load on
the VMs and the response time.
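The effect of allocating without looking at the current load can be seen in a small sketch. The task lengths below are made-up values, and `load_aware` is a simple least-loaded rule used only as a stand-in for a load-checking policy; it is not the Throttled or Odds algorithm from the simulation.

```python
def round_robin(task_lengths, n_vms):
    """Assign task i to VM (i mod n_vms), ignoring the current load."""
    load = [0] * n_vms
    for i, length in enumerate(task_lengths):
        load[i % n_vms] += length
    return load


def load_aware(task_lengths, n_vms):
    """Assign each task to the VM that currently has the smallest load."""
    load = [0] * n_vms
    for length in task_lengths:
        target = load.index(min(load))
        load[target] += length
    return load


tasks = [8, 1, 8, 1, 8, 1]  # hypothetical task lengths
rr_load = round_robin(tasks, 2)
aware_load = load_aware(tasks, 2)
busiest_rr, busiest_aware = max(rr_load), max(aware_load)
```

For this input, round robin places every long task on the same VM, so its busiest VM carries more work than under the load-aware rule.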
Figure 4 shows the average time a fog takes to fulfil a request when the RR,
Throttled and Odds algorithms are used. RR has a high processing time for the
same reason as mentioned above.
Figure 5 shows the total cost of the fogs when processing with the RR,
Priority-based and Throttled algorithms; each fog manages multiple requests
from the clusters. The total cost includes the VM cost, the cost of the power
station and the data transfer cost.
218 M. Mehmood et al.
Fig. 2. Algorithms. [Bar chart: overall response time for the Odds, Throttled and RR algorithms.]
[Figure: response time (ms) per cluster (Cluster 1 to Cluster 4) for Odds, Round Robin and Throttle.]
[Figure: values per fog (Fog 1, Fog 2) for Odds, Round Robin and Throttle.]
[Figure: cost ($) per fog (Fog1, Fog2) for Odds, Round Robin and Throttle.]
5 Conclusion
In the given scenario, the Throttled algorithm outperforms the Round Robin
algorithm. Under some conditions the Odds algorithm dominates both the Round
Robin and Throttled algorithms. However, as the number of requests sent to the
fogs by users residing in different regions of the world increases, the Odds
algorithm behaves inefficiently and is outperformed by both the Round Robin and
Throttled algorithms.
References
1. Cao, Z.: Optimal cloud computing resource allocation for demand side management
in smart grid. IEEE Trans. Smart Grid 8(4), 1943–1955 (2017)
2. Kim, J.Y., Kim, Y.: Benefits of cloud computing adoption for smart grid security
from security perspective. J. Supercomput. 72(9), 3522–3534 (2016)
3. Faruque, A., Abdullah, M., Vatanparvar, K.: Energy management-as-a-service over
fog computing platform. IEEE Internet Things J. 3(2), 161–169 (2016)
4. Chiang, M., Zhang, T.: Fog and IoT: an overview of research opportunities. IEEE
Internet Things J. 3(6), 854–864 (2016)
5. Hussain, Md., Alam, M.S., Beg, M.M.: Fog Computing in IoT Aided Smart Grid
Transition-Requirements, Prospects, Status Quos and Challenges. arXiv preprint
arXiv:1802.01818 (2018)
6. Ramadhan, G., Purboyo, T.W., Latuconsina, R.: Experimental model for load
balancing in cloud computing using throttled algorithm. Int. J. Appl. Eng. Res.
13(2), 1139–1143 (2018)
7. Luo, F., Zhao, J., Dong, Z.Y., Chen, Y., Xu, Y., Zhang, X., Wong, K.P.: Cloud-
based information infrastructure for next-generation power grid: Conception, archi-
tecture, and applications. IEEE Trans. Smart Grid 7(4), 1896–1912 (2016)
8. Chiang, M., Zhang, T.: Fog and IoT: an overview of research opportunities.
IEEE Internet Things J. 3(6), 854–864 (2016). https://doi.org/10.1109/JIOT.2016.
2584538
9. Bera, S., Misra, S., Rodrigues, J.: Cloud computing applications for smart grid: a
survey. IEEE Trans. Parallel Distrib. Syst. (2014). https://doi.org/10.1109/TPDS.
2014.2321378
10. Branch, S.R., Rey, S.: Providing a load balancing method based on dragonfly opti-
mization algorithm for resource allocation in cloud computing (2018)
11. Li, C., et al.: SSLB: self-similarity-based load balancing for large-scale fog comput-
ing. Arab. J. Sci. Eng., 1–12 (2018)
12. Chen, S.L., Chen, Y.Y., Kuo, S.H.: CLB: a novel load balancing architecture and
algorithm for cloud services. Comput. Electric. Eng. 58, 154–160 (2017)
13. Dam, S., et al.: An ant-colony-based meta-heuristic approach for load balancing in
cloud computing. Appl. Comput. Intell. Soft Comput. Eng., 204–232 (2018)
14. Gabbar, H.A., Labbi, Y., Bower, L., Pandya, D.: Performance optimization of
integrated gas and power within MG using hybrid PSOPS algorithm. Int. J. Energy
Res. 40(7), 971–982 (2016)
15. Varela Souto, A.: Optimization and Energy Management of a Microgrid Based on
Frequency Communications (2016)
Efficient Resource Distribution in Cloud and Fog Computing 221
16. Armant, V., De Cauwer, M., Brown, K.N., O’Sullivan, B.: Semi-online task assign-
ment policies for workload consolidation in cloud computing systems. Future
Gener. Comput. Syst. (2018)
17. Chen, W., Xie, G., Li, R., Bai, Y., Fan, C., Li, K.: Efficient task scheduling for bud-
get constrained parallel applications on heterogeneous cloud computing systems.
Future Gener. Comput. Syst. 74, 1–11 (2017)
18. Shi, Y., Chen, S., Xiang, X.: MAGA: a mobility-aware computation offloading
decision for distributed mobile cloud computing. IEEE Internet Things J. 5(1),
164–174 (2018)
19. Devi, D.C., Uthariaraj, V.R.: Load balancing in cloud computing environment
using improved weighted round robin algorithm for nonpreemptive dependent
tasks. Sci. World J. 2016, 1–14 (2016)
20. Wickremasinghe, B., Buyya, R.: CloudAnalyst: a cloudsim-based tool for modelling
and analysis of large scale cloud computing environments. MEDC Project Rep.
22(6), 433–659 (2009)
Proceedings of the 24th International Conference on
Automation & Computing, Newcastle University,
Newcastle upon Tyne, UK, 6-7 September 2018
Abstract: Natural Language Processing (NLP) is a branch of Artificial
Intelligence that helps computers manipulate and interpret human languages. In
NLP, text mining is a technique to derive useful information from text. A Topic
Model (TM) is a statistical model that extracts topics from a large collection
of unlabeled text using NLP and machine learning techniques. Several effective
TMs are available for languages such as English, German and Arabic. However, no
compelling TM is available for Urdu, a resource-poor South Asian language. In
this research study, our focus is to work on an existing TM, Latent Dirichlet
Allocation (LDA), to overcome the issues of the Urdu language in text mining.
We studied and analyzed LDA as an unsupervised model for Urdu topic
identification at two levels: Variational Bayes (VB) based LDA for Urdu
(VB-ULDA) with a stemmer and without a stemmer. Experiments are performed on a
large, self-created collection of Urdu documents in four different corpora. The
experimental study shows that VB-ULDA outperforms the existing Urdu LDA (ULDA)
in identifying topics from Urdu text documents in terms of accuracy and
efficiency, and the results also reveal the high impact of the stemming
algorithm on Urdu topic identification.
Keywords: Urdu; topic modeling; Artificial Intelligence; Urdu language
Processing; Natural Language Processing; variational Bayes

I. INTRODUCTION
1.1 Urdu: Language
Urdu is a widely spoken language in South Asia and is one of the most spoken
languages in Pakistan and India. There are 11 million and 300 million Urdu
speakers in Pakistan and in other parts of the world respectively [1]. It is
very different from widely used languages such as English, French and Chinese.
Urdu has different grammatical forms and various antonyms and synonyms, and the
meanings of its words vary according to factors such as word order. Despite
these complexities, limited work has been dedicated to the Urdu language in
NLP, especially compared with English, which has been addressed by many
research communities. The majority of NLP software, Application Program
Interfaces (APIs) and tools do not meet the requirements of the Urdu language
for Information Retrieval (IR), etc. To use these software and tools for Urdu
data, substantial changes and additional work would be needed for better
performance. Unlike other languages, Urdu is written from right to left, has 39
basic letters, and typically follows the Nastaliq writing script, like the
Arabic language [1].
1.1.2 Urdu: Encoding
One of the basic purposes of a computer program is to recognize the characters
in the text of any language. However, in the case of Urdu, computer programs
face problems and use Unicode Transformation Format 8 (UTF-8) encoding to
recognize Urdu characters.
1.2 Topic models
In recent years, the use of electronic documents on the web has increased
enormously. Many news outlets, such as newspapers and news agencies, and
journalists or column writers release their daily news and articles on their
websites. This shift of news content and other electronic documents to the
internet has brought different challenging tasks for researchers. How can one
understand what is happening in a collection of trillions of unstructured
documents? This is an increasingly common problem: filtering the emails of an
organization, searching millions of news articles for a topic, identifying a
scientific research area, etc. Because they can address such problems, NLP
techniques have been receiving strong attention these days.
A topic model (TM) is a statistical method [2] that can uncover the hidden
topics in a text document. TM is mostly used in text mining to find the hidden
linguistic structure in the text of a document. It supports automatic
understanding, summarization, searching and organization of huge amounts of
electronic text data. A group of documents may cover various topics, and often
a single document presents a mixture of topics. To select a specific topic
through a computer program or algorithm, a TM technique is needed.
Various TMs are available to identify the topics of given text documents. The
earliest topic model is Latent Semantic Analysis (LSA), proposed by
Papadimitriou et al. [3], followed by Probabilistic Latent Semantic Analysis
(PLSA) [4]. The statistical approach to topic modeling was further improved by
Blei et al. [5] with Latent Dirichlet Allocation (LDA). Other models include
hierarchical topic models (hLDA) [6] and the Correlated Topic Model (CorrLDA)
[7], and Chong et al. proposed online variational inference for LDA in [8]. LDA
is one of the most widely used topic models for extracting and identifying the
meaningful topics from documents and in various other domains related to
text [9].
1.2.1 Latent Dirichlet Allocation
LDA [5] is a generative Bayesian probabilistic model in which each text
document is expressed as a mixture of topics. Every topic is a distribution
over the words of the vocabulary. Suppose a document covers different subjects;
that document then contains words that refer to each specific subject. The
basic purpose of the LDA algorithm is to map each word to its corresponding
topic. LDA extracts sets of words from text documents such that the words in a
set are statistically related across the collection of documents. Table 1
presents topics from the text given in Fig. 2, and Fig. 1 shows the graphical
form of LDA. The generative, iterative process of the LDA algorithm is as
follows:
• For each topic k ∈ {1, …, K}, β_k is a multinomial distribution over the
  vocabulary
• For every document d ∈ {1, …, M}, θ_d is a multinomial distribution drawn
  from a Dirichlet distribution with parameter α
• For every word position n ∈ {1, …, N}, choose a latent topic z_n from the
  multinomial distribution with parameter θ_d
• Select the observed word w_n from the distribution β_{z_n}
Figure 1. LDA graphical form.
For English and other western languages, different methods, tools and
resources are available for topic modeling, classification and identification,
such as LDA [5] and other effective techniques such as term frequency-inverse
document frequency (TF-IDF) [10], topic unigram [10], cache model [10], etc.
However, there is no compelling research available for the Urdu language
because it lacks resources and annotated datasets and has a complex
morphological structure. In this research study, we work on the identification
of Urdu topics by explaining TM for Urdu to the research community.
Furthermore, we study in depth and implement unsupervised LDA using VB for Urdu
topic identification. We also create an effective dataset for this purpose.
The rest of the paper is organized as follows: the introduction is in section
1, related work and methodology are in sections 2 and 3 respectively, section 4
presents the experiments and discussion, and the conclusion is in section 5.

II. RELATED WORK
In recent years, different TM techniques have been introduced, which are
briefly explained in the survey [9]. However, we concentrate on approaches
relevant to Urdu topic modeling. Some proposed LDA-based approaches for topic
modeling are discussed here.
The authors of [11] used LDA as a tool to analyze a huge collection of news
articles. They used New York Times articles on nuclear technology from 1945 to
2015 as the corpus for topic modeling, and concluded that LDA performed well in
identifying topics and analyzing the patterns and trends in a large collection
of digital news articles. Arthur et al. [12] highlighted the importance of, and
showed the connection between, the most commonly used inference techniques for
TM: Gibbs sampling, variational inference, maximum a posteriori estimation,
etc. The authors described which techniques are good for accurate learning,
smoothing, hyperparameter selection and identification of topics in TM, and
how. Zhai et al. [13] proposed variational inference for LDA to extract topics
from a large-scale dataset. Their technique performed well in terms of
scalability, held-out likelihood and execution speed.
A stemming-based LDA approach is proposed in [14]. The authors studied
different stemming-based methods for Arabic TM, using LDA to generate topics
from Arabic news articles. They discovered interesting topics from the news
articles by using stemming and LDA together. Arabic LDA (ALDA) and Arabic Name
Entity Recognition (RenA) are proposed in [15]. The authors successfully
extracted names and organizations from online news articles using RenA and
identified topics with high accuracy from online news using ALDA. RenA and
ALDA outperformed well-known Arabic NER systems and existing LDA. Marwa et al.
[16] studied Arabic text for TM. They studied in depth and implemented
unsupervised LDA on Arabic text at two levels: stemming and selection of the
LDA hyperparameters α and β. Their study shows that LDA is very effective for
topic identification in Arabic text given the right selection of
hyperparameters and a strong stemming algorithm.
An experimental study is conducted in [17] to identify topics in Chinese text.
The author extracts topics using LDA from the Center for Chinese Linguistics
corpus and compares the results with the K nearest neighbours algorithm, the
TF-IDF scheme and the probabilistic latent semantic indexing model. The results
showed that LDA performs better in terms of precision than the other existing
techniques. Xia et al. [18] classified Chinese text using LDA. They combined
LDA with a Support Vector Machine (SVM) and LDA with TF-IDF and compared the
results on Chinese text documents. The experimental study showed that LDA with
SVM performs better in terms of running time and classification accuracy.
The first and only Urdu topic detection framework is proposed by Khadija et al.
in [19]. The authors proposed Urdu LDA (ULDA), combining LDA with the Gibbs
sampler algorithm. They implemented the proposed
پاکستان کے زير انتظام کشمير ميں بين الاقوامی معيار کے صرف دو کرکٹ
گراؤنڈز ہيں اور کچھ کثير المقاصد ميدان بھی ہيں ليکن ان ميں مناسب
سہوليات کا فقدان ہے۔ جس کی وجہ سے اکثر نوجوان سڑکوں اور گليوں
ميں کرکٹ کھيلتے نظر آتے ہيں۔ کشمير کے اس علاقے ميں سو کرکٹ
کلب ہيں جو پاکستان کرکٹ بورڈ کے ساتھ رجسٹرڈ ہيں ليکن ان ميں
کوچنگ ميسر نہيں ہے۔ پاکستان کے زير انتظام کشمير کے وزير اعظم
راجہ فاروق حيدر خان نے کشمير سپر ليگ کے انعقاد کو خوش آئند قرار
ديا ہے۔ فاروق حيدر نے کہا 'يہ ٹيلنٹ کو آگے لانے ميں بنيادی کردار ادا
کرے گا اور ہماری حکومت آزاد کشمير ميں کھيلوں کے فروغ کے ليے
مکمل تعاون کرے گی۔ پاکستان کے باقی حصوں کی طرح کشمير کے اس
خطے ميں بھی نوجوانوں ميں کرکٹ مقبول ترين کھيل ہے اور زياده تر
لوگ کرکٹ ہی ديکھتے اور کھيلتے ہيں ليکن اس خطے کا کوئی بھی
شخص اب تک پاکستان کی قومی کرکٹ ٹيم ميں شامل نہيں ہوا ہے۔ کشمير
سپر ليگ کے انعقاد کے ليے حکومت کے علاوه کئی غير سرکاری
تنظيموں اور کمپنيوں نے بھی مدد کی يقين دہانی کرائی ہے۔ پاکستان کی
قومی کرکٹ ٹيم کے سابق ٹيسٹ کرکٹر عاقب جاويد کا کہنا ہے کہ کشمير
سپر ليگ ايک مفيد ايونٹ ثابت ہو گا۔ عاقب جاويد نے کہا کہ 'کشميری عوام
اس ايونٹ کو سپورٹ کريں، جب تک آپ گراونڈز ميں نہيں آئيں گے اس
کی خوبصورتی نہيں بنے گی اور بين الاقوامی معيار کے کھلاڑی نہيں
پيدا ہوں گے۔

There are only two cricket grounds of international standards in
Pakistan-administered Kashmir. There are also some multipurpose grounds, but
they lack adequate facilities. Due to which many youngsters appear to play
cricket in the streets. There are hundred cricket clubs in this area of Kashmir
who are registered with the Pakistan Cricket Board but they do not have
coaching opportunities. Raja Farooq Haider Khan, Prime Minister of
Pakistan-administered Kashmir, has declared the holding of the Kashmir Super
League. Farooq Haider said, "It will play a key role in bringing talent forward
and our government will fully cooperate with the promotion of sports in Azad
Kashmir." Like in the rest of Pakistan, in this region of Kashmir, cricket in
the youth is the most popular game and most people watch and play cricket, but
none of this region has been included in Pakistan's national cricket team. In
addition to the government, many non-government organizations and companies
have also been given help to hold the Kashmir Super League. Former Test
cricketer of Pakistan's cricket team, Aqib Javaid says that Kashmir Super
League will prove to be a useful event. Aqib Javed said, "The people of Kashmir
should support this event, unless you come to the grounds, it will not be
beauty and will not create international players."
Figure 2. Urdu text and its translation. Different colors represent different
topics.

Table 1: Topics from the Urdu and English text in Figure 2.
English topics | English topics | Urdu topics | Urdu topics
Kashmir | Cricket | کرکٹ | کشمير
Pakistan | Grounds | گراؤنڈز | پاکستان
International | Coaching | کوچنگ | بين الاقوامی
Prime Minister | Super League | سپر ليگ | وزير اعظم
Region | Cricketer | کرکٹر | خطے
National | Cricket team | کرکٹ ٹيم | قومی
Azad Kashmir | Cricket Board | کرکٹ بورڈ | کشميری
Beauty | Players | کھلاڑی | خوبصورتی
Government | Cricket clubs | کرکٹ کلب | حکومت
Pakistan-administered | Sports | کھيلوں | آزاد کشمير
 | Game | ميدان |

ULDA on three corpora and compared the results with LDA and ALDA. The
experimental study showed that ULDA performed very well, with an accuracy of
75%, as compared to LDA and ALDA.

III. METHODOLOGY
TM is an efficient technique in the domain of text mining for IR. It helps
extract the abstract or theme from a document that has different themes. Urdu,
a resource-poor language, needs an accurate and effective tool for text mining,
especially to generate topics from a document. The Variational Bayes LDA for
Urdu (VB-ULDA) TM approach is proposed for the Urdu language to overcome the
issues of existing TMs in extracting topics from Urdu text. Fig. 3 illustrates
the methodology. The important steps involved in the proposed technique are
described below.
3.1 Dataset
For the experiments, we built our own corpus containing Urdu text articles from
well-known Urdu forums and news websites. We created four corpora for our
proposed technique. Corpus 1 contains BBC Urdu sports articles taken from
https://www.bbc.com/urdu/sport, corpus 2 consists of Urdupoint politics news
articles collected from https://www.urdupoint.com/, and corpus 3 consists of
entertainment articles from the well-known newspaper “Express”, collected from
https://www.express.pk/. Corpus 4 contains economy articles from all of the
above sources. All of the above corpora are newly created and have not been
published or used in any research study. We evaluated our proposed technique on
these corpora after completing the steps discussed in section 3.2. Table 2
briefly describes the four corpora.

Table 2: Details of four corpora.
Class | No of Documents | Total words | Unique words
Sports | 1600 | 81653 | 61757
Politics | 1250 | 61547 | 29123
Entertainment | 1450 | 76333 | 48949
Economy | 600 | 32739 | 18919

3.2 Text Pre-Processing
Text pre-processing helps enhance the accuracy of a topic detection technique
and manages the structural representation of the input text data efficiently
and smoothly. In this step, diacritics removal, tokenization, stop word removal
and stemming are applied to the input text.
3.2.1 Removal of Diacritics
Diacritics are signs that change the pronunciation of a letter when written
above or below it. In Urdu, diacritics are called Aerab, which include Zer,
Zabar and Pesh [1]. The use of diacritics is optional in Urdu and depends on
the writer’s preference. However, this creates ambiguities. Removal of
diacritics is optional in Urdu pre-processing. To standardize the corpus, we
remove diacritics. A few examples are given in Table 3.

Table 3. Few Urdu diacritics words.
Without diacritics | Meaning | With diacritics | Meaning
بکری | Bakree (Goat) | بکری | Bekree (Sale)
عالم | Aaalem (Educated) | عالم | Aaalam (World)
جلدی | Jaldee (Quickly) | جلدی | Jeldee (Of skin)

3.2.2 Tokenization
The initial step for text analysis in any language task is tokenization of a
given text into words. Urdu is
morphologically a very rich language, as its words and characters have
different variations and nature. For resource-poor languages like Urdu,
tokenization is a difficult task because spaces are used inconsistently between
words. In English, spaces are mostly used to mark the boundaries between words.
Tokenization and sentence boundary detection of Urdu text are difficult tasks
[20]. During tokenization, we convert the corpus into tokens for use in the
topic model.

Table 4. Few Urdu space exclusion example.
Non-joined/separate | Joined/combined | Meaning
کے ليے | کيليے | For
کی طرف | کيطرف | Towards
آپ کے | آپکے | Yours

During tokenization, we focus on the space exclusion problem, as spaces play a
vital role in tokenization [20]. A few space exclusion examples are shown in
Table 4.
In Urdu, some letters are joiners when written at the end of a word, while
others are non-joiners. In Table 4, the same words are written joined (without
a space) and non-joined (with a space). A word written in joined form gives a
single token in tokenization, whereas the same word written in non-joined form
gives two different tokens.
3.2.3 Removal of Stop Words
All natural languages are composed of two types of words: functional words and
content words. Functional words carry little meaning of their own, and content
words carry useful meaning. Stop words are regarded as functional words; in
Urdu, stop words are called حرف جار (Haroof-e-Jar). Stop words do not give any
meaningful information when written on their own; however, they provide useful
information when written with another word. We excluded stop words from our
corpus to keep only meaningful words. Removal of stop words also helps reduce
the size of the corpus. We created our own stop word list. A few of the most
frequently used Urdu stop words are given in Table 5.

Table 5. Few Urdu Stop words.
Urdu | Transliteration
کا | Ka
کی | Kee
کے | Kay
يہ | Yeh (This)
اور | Aur (And)
وه | Woh (That)
ہے | Hay
ہيں | Hain

3.2.4 Stemming
Another important data pre-processing step is stemming. The main focus of
stemming is to reduce a word to its stem or root. Stemming is mostly used when
dealing with text data for the sake of IR, Data Mining (DM) and NLP. Jiaul et
al. [21] proposed a rule-based stemmer for Urdu language processing that is
based on affix stripping to obtain the root or stem of a word. We performed
rule-based stemming on our corpus for efficient results. A few examples of
stemming are given in Table 6.

Table 6. Urdu stemming examples.
Urdu word | Stem
بکرياں (Bakriyan, Goats) | بکری (Bakri, Goat)
کتابيں (Kitabain, Books) | کتاب (Kitab, Book)
لڑکياں (Larkian, Girls) | لڑکی (Larki, Girl)
مردوں (Mardon, Men) | مرد (Mard, Man)

3.3 Variational Bayes
In machine learning, Variational Bayes (VB) is a technique for approximating
the intractable integrals that arise in Bayesian Inference (BI). VB methods are
mostly used in complex models that have observed data, latent variables and
unknown parameters. For statistical inference on the unobserved variables, the
VB method approximates their posterior probability.
3.3.1 Variational Inference
Variational Inference (VI) is a method that makes the processing of a specific
distribution tractable. VI is an alternative to MCMC and Gibbs sampling, and it
can also be considered an extension of the EM algorithm. For variational
inference, we used the following equations. Assume that c = c_{1:n} are the
observations, d = d_{1:m} are the latent variables and α is an additional fixed
parameter. Equation (1) gives the posterior distribution, which links the data
to the model.
$$p(d \mid c, \alpha) = \frac{p(d, c \mid \alpha)}{\int_{d} p(d, c \mid \alpha)} \qquad (1)$$
Computing the posterior is very complex in many interesting models. The main
purpose of the variational method is to choose a variational distribution over
the latent variables.
$$q(d_{1:m} \mid \nu) \qquad (2)$$
Equation (2) is used to find the setting of the parameter ν that makes q close
to the posterior distribution.
3.3.2 Kullback-Leibler (KL) Divergence
KL divergence measures how one probability distribution diverges from another.
It comes from information theory and provides a deep link between statistics
and machine learning. Here, KL divergence is used to measure the closeness
between the variational distribution and the posterior.
$$KL(q \,\|\, p) \equiv E_q\left[\log \frac{q(d)}{p(d \mid c)}\right] \qquad (3)$$
To minimize the KL divergence, we optimize a function that equals the negative
KL divergence up to a constant, called the evidence lower bound (ELBO). We
maximize the ELBO to get a tight bound on the log probability of the data,
obtained by applying Jensen’s inequality, Equation (4).
$$E_q[\log p(c, d)] - E_q[\log q(d)] \qquad (4)$$
Minimizing the KL divergence is the same as maximizing the ELBO. We used
mean-field variational inference to factorize the variational family as in
Equation (5). In mean-field variational inference, the latent variables are
partitioned into groups and the variational distribution factorizes over the
groups. The variational family generally does not contain the true posterior,
because the latent variables are dependent.
$$q(d_1, \ldots, d_m) = \prod_{j=1}^{m} q(d_j) \qquad (5)$$
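The relationship between Equations (3) and (4) can be checked numerically on a toy model. The sketch below assumes a single discrete latent variable d with three states and made-up joint probabilities p(c, d) for one fixed observation c; none of these values come from the paper. It verifies that log p(c) minus the ELBO equals KL(q || p(d | c)), which is why maximizing the ELBO is the same as minimizing the KL divergence.

```python
import math

# Hypothetical joint probabilities p(c, d) for one fixed observation c
# and a latent variable d with three possible states.
joint = [0.10, 0.25, 0.05]
evidence = sum(joint)                      # p(c), the normalizer in Equation (1)
posterior = [p / evidence for p in joint]  # p(d | c)


def kl(q, p):
    """KL(q || p) = E_q[log q(d) - log p(d)] for discrete distributions."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def elbo(q, joint_probs):
    """E_q[log p(c, d)] - E_q[log q(d)], as in Equation (4)."""
    return sum(
        qi * (math.log(pj) - math.log(qi))
        for qi, pj in zip(q, joint_probs)
        if qi > 0
    )


q = [0.3, 0.5, 0.2]  # an arbitrary variational distribution over d
gap = math.log(evidence) - elbo(q, joint)  # equals kl(q, posterior)
```

Setting q equal to the exact posterior closes the gap, so the bound is tight exactly when the KL divergence is zero.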
4.1 VB-ULDA Evaluation
We evaluated VB-ULDA with and without a stemmer to study the effect of stemming
on its results. We therefore define two versions: VB-ULDA with stemmer,
VB-ULDA(WS), and VB-ULDA without stemmer, VB-ULDA(WiS). We conducted
experiments on each corpus separately, with and without the stemmer.
Experimental results show that VB-ULDA(WS) outperforms VB-ULDA(WiS) in terms of
performance and accuracy, although the difference in accuracy is small. To
evaluate the performance of VB-ULDA, we chose existing Gibbs
[Figure: percentage (axis range 82 to 84) for VB-ULDA(WS) and VB-ULDA(WiS).]
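The two variants compared above differ only in whether the stemming step of Section 3.2 is applied. A minimal sketch of such a pipeline follows, under assumptions: the stop-word set reuses the examples of Table 5, the two suffixes are hypothetical affix-stripping rules chosen to match two rows of Table 6 (they are not the rule set of the stemmer in [21]), and whitespace splitting stands in for a real Urdu tokenizer that would handle the space exclusion problem.

```python
import re

# Arabic-script combining marks U+064B..U+0652 (which include zabar, zer and
# pesh) plus the superscript alef U+0670.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
PUNCTUATION = "\u06D4\u060C"  # Urdu full stop and Arabic comma

# Example stop words taken from Table 5; a real list would be much longer.
STOP_WORDS = {"کا", "کی", "کے", "يہ", "اور", "وه", "ہے", "ہيں"}

# Hypothetical plural suffixes: yeh + noon ghunna, waw + noon ghunna.
SUFFIXES = ("\u064A\u06BA", "\u0648\u06BA")


def remove_diacritics(text):
    return DIACRITICS.sub("", text)


def tokenize(text):
    # Whitespace split; punctuation is stripped from both ends of each token.
    tokens = (t.strip(PUNCTUATION) for t in text.split())
    return [t for t in tokens if t]


def stem(token):
    # Strip the first matching suffix, keeping a stem of at least two letters.
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 1:
            return token[: -len(suffix)]
    return token


def preprocess(text, use_stemmer=True):
    tokens = tokenize(remove_diacritics(text))
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [stem(t) for t in tokens] if use_stemmer else tokens


sample = "کتابيں اور مردوں۔"
with_stemmer = preprocess(sample, use_stemmer=True)      # WS-style input
without_stemmer = preprocess(sample, use_stemmer=False)  # WiS-style input
```

Keeping the stemmer behind a flag makes it straightforward to feed the same corpus to a topic model in both configurations.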
Abstract: Wireless Sensor Networks (WSNs) consist of thousands of sensing nodes
that are deployed at remote locations for continuous probing of the surrounding
environment in order to collect useful data. In general, each node is equipped
with a fixed battery and has a limited working duration. The fixed battery
condition is dominant these days, and therefore energy efficiency is an
important factor to be considered when protocols are designed. In this paper,
we implement an enhanced version of the Low-Energy Adaptive Clustering
Hierarchy (LEACH) protocol, named LEACH with Dijkstra’s Algorithm (LEACH-DA),
under a cloud environment, which optimizes power consumption or energy
utilization based on shortest path selection. The proposed framework also
incorporates load balancing by picking an appropriate cluster head (CH) node
among its alternates, based on their traffic situation with the sink/base
station or cloud. In addition, we employ a fog computing model for our scenario
(i.e. LEACH-DA-Fog) in order to increase the lifespan of the network as
compared to the original implementation of the underlying protocol. Some test
cases are presented to show that the proposed updates to the classical LEACH
protocol improve the network efficiency as well as the durability of the whole
network.
Keywords: Wireless sensor networks, LEACH protocol, Dijkstra’s algorithm, Fog
computing

I. INTRODUCTION
In the case of large-scale sensor networks configured in a cloud-like
environment, the nodes have to deal with massive volumes of data persistently.
Processing such data requires more time and effort, which consumes more energy.
To cope with this situation, certain new trends have emerged, like Fog
architectures that provide services closer to the network [4].
The goal of the Fog archetype is to facilitate the interconnection of the
sensor nodes with the back-end cloud via a gateway, so that they are able to
make their own decisions for reducing computational time [5][6] or energy
consumption requirements. A Fog architecture includes numerous fog nodes (FNs)
that are placed nearest to an edge network and gather data from
Internet-of-Things (IoT) devices or simply sensor nodes. Employing FNs helps in
minimizing latency and avoiding duplicate information between the base station
and the end users.
Thus, by having such a setup in place, we can save backbone bandwidth and
reduce the energy consumption of the underlying core network [7]. The fog
computing architecture with respect to sensor nodes is shown in Figure 1.
Fig. 6. Structure of Network (N=500)
We first consider a simple network of 100 sensor nodes that are randomly
distributed in an area of 300×500 m², as shown in Figure 4. In Figure 4.1, the
graph plots the energy levels of the nodes after 2500 rounds for different
variants of LEACH. We can observe that the residual energy of each node
decreases as the number of rounds increases. As a result, we estimate that
LEACH-DA-Fog consumes less energy during transmission than LEACH-DA and the
LEACH protocol respectively.