ICIST 2017 Proceedings
Publisher: Society for Information Systems and Computer Networks, Belgrade
Editors: Zdravković, M., Trajanović, M., Konjović, Z.
ISBN: 978-86-85525-19-3
Issued in Belgrade, Serbia, 2017
7th International Conference on Information Society and Technology
ICIST 2017
2. Network effect – a key element for switching the provider of mobile communications
in the Republic of Macedonia
Pages 3-8
Nasteska Slavica, Gavrilovska Liljana
5. Software system for simultaneous localization and mapping based on video content
analysis
Pages 18-22
Perić Ivan, Kondić Miroslav, Anđelić Stefan, Jocić Marko, Obradović Đorđe
11. Industry 4.0 paradigm: The viewpoint of the small and medium enterprises
Pages 50-54
Dassisti Michele, Panetto Hervé, Lezoche Mario, Merla Pasquale, Semeraro Concetta, Giovannini
Antonio, Chimienti Michela
14. IoT enabled End User engagement towards energy efficient lifestyles
Pages 65-69
Batić Marko, Tomašević Nikola, Vraneš Sanja
21. Pairing BPM and IoT For Sensor Error Discovery and Recovery Automation
Pages 98-101
Popov Srdjan, Zarić Miroslav, Ćosić Đorđe
25. An IoT business environment for Multi Objective Cloud Computing Sustainability
Assessment Framework
Pages 120-125
Timčenko Valentina, Zogović Nikola, Jevtić Miloš, Đorđević Borislav
29. Schema on Read Modeling Approach as a Basis of Big Data Analytics Integration with
EIS
Pages 142-147
Janković Slađana, Mladenović Snežana, Mladenović Dušan, Vesković Slavko, Glavić Draženko
32. Transforming an Enterprise E-Health System from Process Oriented to Model Driven
Architecture
Pages 159-162
Atanasovski Blagoj, Bogdanović Miloš, Velinov Goran, Stoimenov Leonid, Sahpaski Dragan,
Skrceska Irena, Kon-Popovska Margita, Janković Dragan, Jakimovski Boro
35. GenAn – DSL for describing web application GUI based on TextX meta-language
Pages 173-178
Zoranović Bojana, Nikolić Lazar, Radišić Rade, Milosavljević Gordana, Dejanović Igor
36. MicroBuilder: A Model-Driven Tool for the Specification of REST Microservice
Architectures
Pages 179-184
Terzić Branko, Dimitrieski Vladimir, Kordić Slavica, Milosavljević Gordana, Luković Ivan
37. Modelling of Control Logic for Start-Up and Shut-Down Sequences of the Heat Pump
Pages 185-189
Pršić Dragan, Vičovac Aleksandar
42. Data assimilation for operational reservoir management on the Danube river
Pages 210-213
Rosić Nikola, Jaćimović Nenad, Prodanović Dušan, Stojanović Boban
43. Data quality exchange in service-oriented time series data management platform for
hydropower systems
Pages 214-217
Milivojević Vladimir, Milivojević Nikola, Divac Dejan, Marinković Miljana, Cirović Vukašin
49. Ontologies framework for semantic driven code generation over testbed premises
Pages 241-245
Nejković Valentina, Tošić Milorad, Jelenković Filip
50. Influence of DTX Function on the Base Station Power Decrease in GSM Systems
Pages 246-250
Lebl Aleksandar, Matić Rade, Mileusnić Mladen, Mitić Dragan, Markov Žarko, Tomić Željka
52. Enterprise information system for automation of electricity theft detection and
monitoring in electric utility companies
Pages 257-260
Stanimirović Aleksandar, Bogdanović Miloš, Puflović Darko, Frtunić Milena, Davidović Nikola,
Stoimenov Leonid
55. Artificial General Intelligence Approach for Reasoning in Clinical Decision Support
Pages 271-274
Kaplar Aleksandar, Simić Miloš, Kovačević Aleksandar
56. Computerized Radial Artery Pulse Signal Classification for Lung Cancer Detection
Pages 275-278
Zhang Zhichao, Zhuang Xujie, Zhang Yuan, Kos Anton
57. Sensitivity of selfdynamisable internal fixator to change of bar length and clamp
distance
Pages 279-281
Simeonov Marko, Korunović Nikola, Trajanović Miroslav, Zehn Manfred, Mitković Milorad
59. Personalized anatomically adjusted plate for fixation of human mandible condyle
process
Pages 288-292
Mitić Jelena, Vitković Nikola, Manić Miodrag, Trajanović Miroslav, Mišić Dragan
60. Contextual Spectrum Inpainting with Feature Learning and Context Losses
Pages 293-298
Jiao Libin, Wu Hao, Bie Rongfang, Kos Anton, Umek Anton
64. Selection of sensor networks for biofeedback systems in healthcare and sports
Pages 312-315
Kos Anton, Umek Anton
68. Analysis of the Performance of the New Generation of 32-bit Microcontrollers for IoT
and Big Data Application
Pages 330-336
Ivković Jovan, Lužija Ivković Jelena
78. Rapid prototyping of business information systems based on Django framework and
Kroki mockup tool
Pages 377-381
Filipović Milorad, Turzai Tihomir, Fakaš Arpad, Milosavljević Gordana, Abdo Amel
82. Cataloguing dataset in Library Information Systems using the MARC 21 format
Pages 395-399
Rudić Gordana, Ivanović Dragan
84. Reengineering of the current process of collecting data about business customers
Pages 405-408
Radak Zlata, Marković Goran
89. Simulation model of suburban passenger trains service on upgraded railway line
Pages 430-434
Jeremić Dušan, Milosavljević Milan, Milinković Sanjin, Vujović Dušan
90. A Microservice System for Unified Communication over Multiple Social Networks
Pages 435-440
Vujanović Angelina, Igić Nemanja, Luković Ivan, Gajić Dušan, Ivančević Vladimir
91. A Performance Evaluation of Computing Singular Value Decomposition of Matrices
on Central and Graphics Processing Units
Pages 441-446
Gajić Dušan, Manoilov Đorđe
92. Design of Reconfigurable Memory for Fast Network Packet Header Parsing
Pages 447-452
Efnusheva Danijela, Cholakoska Ana, Tentov Aristotel
ICIST 2017 Proceedings Vol.1
Abstract — This paper presents an analysis of mobile market competition developments in the Republic of Macedonia for the period from 2005 to 2015 from the consumer's perspective. The calculated results, based on a previously developed empirical model, are compared with the outputs of an administered questionnaire distributed to consumers of all existing mobile operators. The research focuses on the impact of the so-called "network effect" on switching costs, but also on switching as such and its relation to market competition. The analysis locates switching barriers and points out the weak points of switching procedures. The results inform appropriate regulatory activities.

Keywords — Competition, Network effect, Switching costs, Consumer behavior, Regulation

I. INTRODUCTION

The telecommunication market, especially the mobile market, has experienced significant development in the past decade. The entrance of new players on the market is evidence of an industry paradigm change and a market transition. Mobile communication services are shifting from voice-centered communication to a combination of high-speed data communication and multimedia. The introduction of 4G/LTE and the growth of wireless Internet services mark the start of a market transition period.

The ability and willingness of consumers to switch their service provider is very important. Effective competition delivers increased choice and lower prices to consumers. In order to benefit from competition, consumers must have the confidence to make a choice and to profit from it. When moving to a product or service offered by a competing company, consumers face significant costs, the so-called switching costs. The network effect is another hot issue that has attracted the regulators' attention during the last decades. It is the phenomenon whereby a product or a service becomes more valuable when more people use it: the network effect increases the value of a customer's subscription as the number of existing customers grows.

Many authors have studied switching costs and network effects on mobile markets in different countries, using adjusted customer surveys, with no direct measurement of switching costs. This paper relies on the empirical research developed by Shy [1], which offers a model to calculate the switching costs precisely from the observed prices and market shares alone. The empirical model is presented in Section II. It is then tested on actual data for the mobile market in the Republic of Macedonia for the period 2005-2015, in which different stages of the market are easily recognized. Section III presents the results. The calculated consumer switching costs for the existing mobile operators in this period were analyzed from a customer behavior perspective with respect to lock-in [2]. The paper identifies the rationale behind the particular calculated switching costs and highlights their relation to the network effects. Finally, Section IV compares the computed switching costs with estimates based on a survey and provides recommendations for future activities, locating some switching barriers. Section V concludes the paper.

II. EMPIRICAL MODEL

Direct measurement of switching costs is a complex procedure. The implemented Shy's model offers a quick and easy methodology for their calculation based only on observed variables, such as prices/fees and market shares [1].

The methodology assumes that the companies involved in price competition recognize the switching cost of their consumers and maximize their prices accordingly. These prices are subject to the restriction that no other company
will decide that it is profitable to lower its prices in order to subsidize the switching cost of its consumers.

The model initially assumes that the prices of each company satisfy the Undercut-proof Property [1]: no company can increase its revenue by undercutting the rival company, and no company can increase its price without being profitably undercut by the competing company.

If we define Si as the switching cost of an i-company consumer, and assume that all Si (i = 1, ..., I, where i denotes the order of appearance of the company on the market) are known to all companies and consumers, then each company i ≠ I takes the price pI charged by company I as known and sets the maximal pi that satisfies

πI = pI NI ≥ (pi − Si)(Ni + NI)    (1)

... customers who have a high value of time (i.e., they use the same brand for a longer time).

             N        π              p      S
"Mobimak"    877.142  70.443.528,45  80,31  64,45
"Cosmofon"   384.186  20.007.406,50  52,07  -3,77
operator "Cosmofon" consumers, since the former already for commercial launch of 4G technology services within 9
uses network effects of its realized subscriber base, as months of authorization receipt.
presented in Fig. 1.
III.3.1 SWITCHING COSTS AFTER THE START OF THE 3RD
OPERATOR
In order to check the validity of the estimated switching
costs, it is necessary to identify the costs a subscriber faces
The switching costs for each of the three mobile operators
when he/she replaces operator "Mobimak" with operator
"T-Mobile", "ONE" and "VIP", only 3 months after the
"Cosmofon" and vice versa. In the observed period, the
commercial launch of the third operator on the market and
subscriber had the following costs:
the relevant network parameters are presented in Table II.
amount of one-time fee for service usage and
lost time due to the change of subscriber’s phone TABLE II: STATUS ON MOBILE MARKET (31.12.2007)
number. “T-Mobile” “Cosmofon” “VIP”
N 1.212.610 592.970 141.136
Considering the pricelist of the operators, the calculated π 152.876.032,14 57.681.066,36 1.463.232,93
value of the switching costs is within the applicable prices p 126,00 97,27 10,36
and market conditions. S (T->V) 124,92 (C->V) 95,28 (V->T) -102,50
(T->C) 94,05 (C->T) 12,65 (V->C) -68,20
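With three operators, the same undercut-proof logic can be applied pairwise; the sketch below is our illustration (the operator names and the pairing are read off Table II, not taken from the paper's code) and reproduces the tabulated S values.

```python
def pairwise_switching_cost(n_from, p_from, n_to, p_to):
    """Estimated cost for a customer switching between two operators,
    applying Shy's two-firm formula to the pair in isolation:
        S(from->to) = p_from - N_to * p_to / (N_from + N_to)
    """
    return p_from - n_to * p_to / (n_from + n_to)

operators = {  # 31.12.2007 values from Table II: (subscribers N, price p)
    "T-Mobile": (1_212_610, 126.00),
    "Cosmofon": (592_970, 97.27),
    "VIP": (141_136, 10.36),
}
n_t, p_t = operators["T-Mobile"]
n_v, p_v = operators["VIP"]
print(round(pairwise_switching_cost(n_t, p_t, n_v, p_v), 2))  # → 124.92, i.e. S (T->V)
print(round(pairwise_switching_cost(n_v, p_v, n_t, p_t), 2))  # → -102.5, i.e. S (V->T)
```

The large negative values for "VIP", the newest entrant with the smallest base, again reflect the subsidy it must offer to overcome the incumbents' network effects.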
TABLE IV: STATUS ON MOBILE MARKET (31.12.2015)

                       N          π              p      S
"Makedonski telekom"   1.001.578  46.291.596,00  46,22  13,88
"one.Vip"              1.082.005  67.403.023,00  62,29  40,07

At the end of 2015, only two mobile operators, "Makedonski telekom" and "one.Vip", were on the market. Table IV presents the relevant market parameters based on data in [6].

The fact that even after the merging, the operator "one.Vip" impacts the market (as two separate operators versus its competitive operator) is reflected in significantly higher switching costs despite relatively small differences in subscriber base size. Therefore, the merging of the operators has a negative effect on the development and encouragement of competition. From a regulatory aspect, the case is evidence of a market failure.

III.5 ANALYSIS OF SWITCHING COSTS FOR "T-MOBILE"

Since the mobile operator "T-Mobile" has been present on the Macedonian mobile market for almost 20 years, it was interesting to analyze the behavior of the switching costs for its subscribers in relation to its subscriber base over the studied period. The trends are shown in Fig. 4.

Figure 4 – Network effect vs. switching costs for customers of "T-Mobile"

It is evident that the network effect has a great impact on the switching costs for consumers of "T-Mobile". The graph also clearly presents a distinction between those costs which may arise from legitimate commercial customer retention strategies (period 2005-2014) and those that may arise due to the market failure, as happened in the middle of 2015.

IV. SWITCHING COSTS AND NETWORK EFFECT

Performed calculations for realistic scenarios of the presented market characteristics define the dependence between the network effects and the switching costs and barriers. Novel and interesting methodological research may arise when computations based on the model are compared with the estimates based on a survey.

IV.1 MATERIALS AND METHODS

We have used a survey with a structured questionnaire to test the relation between the switching costs and the network effect. Customers of all three mobile operators were sampled for the study. The study was performed during the market distortion period. The implemented research instrument was a structured questionnaire. The design of the questionnaire benefited from the extant literature dealing with the effects of switching costs and barriers on consumer retention.

The questionnaire was divided into three main sections: the first section deals with general demographic data, the second with consumer behavior and satisfaction, and the third with the switching of an operator. The second section used a 5-point Likert scale ranging from "strongly agree" to "strongly disagree". The received data were analyzed using the SPSS (Statistical Package for the Social Sciences) software package.

IV.2 RESULTS AND DISCUSSION

The questionnaire analysis leads to some conclusions and observations:

- All respondents have used mobile communication services of the three mobile operators for more than three years: 50% of the respondents are consumers of the oldest operator, 30% are consumers of the second operator to appear on the market, and 20% are consumers of the last operator to appear.
- Mobile communication services are used more for business (66.66% of the respondents) than for private purposes (31.67% of the respondents), which is in line with the dominance of post-paid respondents over pre-paid ones.
- The number of post-paid customers prevails over the number of pre-paid consumers (ratio 95:5 (%)), which is in line with the research about legal commercial customer retention strategies that operators use.
- The first legal commercial policy is the usage of loyalty agreements with a duration of 1 or 2 years in return for favorable offers of service packages (benefits in included call minutes, included SMSs and included volume of Internet traffic in the monthly subscription) and favorable offers for products, such as mobile handsets, TV sets, IT equipment, etc.
Abstract — This paper presents an approach to determining the popularity of bicycle-sharing stations based on the influence of nearby geographical objects. For this purpose, we created new datasets by integrating historical bicycle trip data (made freely available by the bicycle sharing company Capital BikeShare) with a set of diverse public data sources. We then performed statistical analysis and predictive modeling to test three claims made by Capital BikeShare about their customers: (1) the top trip purposes overall were personal/non-work trips, (2) a large share of members also used the bicycles for their trip to work, and (3) Capital BikeShare served as a feeder service to reach transit stops. The results of the statistical analysis confirmed all three claims, while predictive modeling provided evidence only for the first and third claim.

I. INTRODUCTION

In recent years, bicycle sharing systems have become increasingly popular as an alternative to traditional modes of transport. Cheaper transport and positive health and environmental implications are just a few of the reasons behind the growth of bicycle sharing systems [1, 2]. The popularization of bicycle sharing systems is particularly evident in large cities where traffic jams are frequent. Companies that own bicycle sharing systems offer their users the ability to rent, use and return bicycles at any of their bicycle stations. Stations must have a good circulation of traffic in order to be profitable. Therefore, companies need to identify the factors that positively or negatively influence the amount of traffic through the stations (also referred to as the popularity of the station).

The goal of our research was to identify these factors using statistical analysis and predictive modeling, i.e., machine learning (ML). We specifically address the question of the impact that neighboring geographical objects have on station popularity and usefulness. Besides business, the main reasons for renting bicycles are also personal trips, for example entertainment, shopping, etc. [1]. Therefore, we considered a wide range of object classes, assuming that all of them might affect a station's popularity, and then tried to determine the possible impact of every object class separately. In order to factor the distance of the objects from the station into our study, we assigned each object a real value called rank. Ranks were obtained by dividing the area around the station into concentric circles. In order to validate our approach, we collected the bicycle trip history data from the Capital BikeShare system and integrated it with different geographical object data sources. Geographical object classes were constructed by manually aggregating the OpenStreetMaps classes by type (e.g., cultural, transport, etc.). We then assigned each station a categorical popularity grade by dividing the annual sum of all rented and returned bicycles for that station into discrete intervals. We experimented with different machine learning and statistical analysis methods using the popularity grade as the class label.

The results of both statistical analysis and ML revealed that attractions, shops, facilities for fun and relaxation, tourist facilities, as well as facilities for the purchase of food and beverages have a positive impact on the popularity of a station. A positive impact of business objects could only be confirmed by statistical analysis.

The rest of the paper is organized as follows. Related approaches are reviewed in Section II. In Section III we describe our methodology. Discussion and experimental results are given in Section IV. Section V concludes the paper and provides directions for further work.

II. RELATED WORK

Researchers have thus far approached the problem of predicting the popularity of bicycle sharing stations by concentrating on specific, predefined subsets of object classes and their influence on stations individually (for example, markets and public transport stations). Conway [3] developed machine learning models to predict station popularity based on its availability to users for work-related trips. The model was created using the Capital BikeShare (CBS) data integrated with data derived from the OpenStreetMaps and OpenTripPlanner tools. The authors obtained good results on the CBS system; however, when their model was tested on similar data from different companies (or cities), it did not perform as well. Rixey [4] studied the effects of demographics and built-environment characteristics near the bicycle sharing stations on shared bicycle ridership levels. The model was developed using linear regression based on the data from the CBS, Nice Ride MN and Denver B-Cycle systems integrated with OpenStreetMaps and various public census data. The model performed well and proved to be directly applicable to datasets from other communities. Hampshire et al. [5] used panel-data-based models to explain the factors that contribute to bike sharing usage in Barcelona and Seville. Their results showed that the number of bike stations, labor market size and population density have a strong correlation with bike sharing usage.

III. METHODOLOGY

Our methodology is based on the integration of historical bicycle station usage data with different geographical object data sources. Further in the paper, these ...
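The two discretizations described in the Introduction — object ranks from concentric circles around a station, and station popularity grades from discrete traffic intervals — can be sketched as follows. The circle radii, rank scale and interval edges are our illustrative assumptions; the paper does not publish its exact values.

```python
def object_rank(distance_m, radii_m=(100, 250, 500, 1000)):
    """Assign a real-valued rank to a geographical object based on which
    concentric circle around the station it falls into: the innermost band
    gets the highest rank, objects beyond the outermost circle get 0.
    Radii and the linear rank scale are illustrative assumptions."""
    for i, r in enumerate(radii_m):
        if distance_m <= r:
            return float(len(radii_m) - i)  # 4.0 for the closest band, ..., 1.0
    return 0.0

def popularity_grade(annual_trips, edges=(500, 2_000, 8_000, 20_000, 40_000)):
    """Map a station's annual sum of rentals and returns to a discrete
    grade 0..5 by counting how many interval edges it exceeds.
    The edge values are illustrative assumptions, not the paper's."""
    return sum(annual_trips > e for e in edges)

print(object_rank(80), object_rank(300), object_rank(2000))  # → 4.0 2.0 0.0
print(popularity_grade(5_000))                               # → 2
```

In a real pipeline, these per-object ranks would be summed per object class to form the station's feature vector, with the grade serving as the class label.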
Figure 2. Stations' grade frequency

TABLE III. DISTRIBUTION OF STATIONS ACROSS GRADES

Grade   Frequency   %
0       13          4
1       17          5
2       72          19
3       179         48
4       72          19
5       17          5
Total   370

We purposefully left grade 0 in our data (even though removing it would likely improve the overall results) in order to check whether there was any obvious influence on the absence of traffic. Our results were, however, inconclusive.

As the results in Tables I and II indicate, our models can not only be used to analyze the influence of nearby objects, but also to predict the popularity of a new station to be built. Our models can easily be applied given that the coordinates of the desired station location are available, as well as the information about the surrounding objects. We do note that our models were trained on the data from just one city (Washington DC) as provided by CBS, so applying them to data from other cities (without re-training) would most probably result in lower performance because of territorial, cultural and other differences.

[Footnote 4] We note that Capital BikeShare has not listed any reasons for the presence of these stations in their data.

ACKNOWLEDGMENTS

This work has been partially supported by the Ministry of Education, Science and Technological Development of the Republic of Serbia (project III47003).

REFERENCES

[1] Capital BikeShare System, "2014 Capital BikeShare Member Survey Report," Tech. Rep., 2015. [Online]. Available: http://www.capitalbikeshare.com/system-data [Accessed 14.04.2017.]
[2] B. Alberts, J. Palumbo, and E. Pierce, "Vehicle 4 change: Health implications of the capital bikeshare program," The George Washington University, Tech. Rep., 2012. [Online]. Available: http://mobilitylab.org/wp-content/uploads/2013/05/2013_CapitalBikeshare_HealthSurvey_Report.pdf [Accessed 14.04.2017.]
[3] M. W. Conway, "Predicting the popularity of bicycle sharing stations: An accessibility-based approach using linear regression and random forests," 2014. [Online]. Available: http://files.indicatrix.org/Conway-Bikeshare-Accessibility.pdf [Accessed 14.04.2017.]
[4] R. A. Rixey, "Station-Level Forecasting of Bike Sharing Ridership: Station Network Effects in Three U.S. Systems," 2013.
[5] R. C. Hampshire and L. Marla, "An Analysis of Bike Sharing Usage: Explaining Trip Generation and Attraction from Observed Demand," 2011. [Online]. Available: http://nacto.org/wp-content/uploads/2012/02/An-Analysis-of-Bike-Sharing-Usage-Explaining-Trip-Generation-and-Attraction-from-Observed-Demand-Hampshire-et-al-12-2099.pdf [Accessed 14.04.2017.]
[6] "OpenStreetMaps Documentation," [Online]. Available: https://www.openstreetmap.org [Accessed 09.04.2017.]
[7] "OverPass API Language Guide," [Online]. Available: http://overpass-api.de [Accessed 09.04.2017.]
[8] "OSM TagFinder API Documentation," [Online]. Available: https://tagfinder.herokuapp.com/apidoc [Accessed 09.04.2017.]
[9] R. C. Browning, E. A. Baker, J. A. Herron and R. Kram, "Effects of obesity and sex on the energetic cost and preferred speed of walking," Journal of Applied Physiology, pp. 390–398, 2006.
[10] S. Boslaugh, Statistics in a Nutshell, 2nd ed., O'Reilly Media, Inc., 2012.
Abstract — We present a method for image tagging, i.e., assigning a set of tags/labels to an image. Three popular architectures of deep convolutional neural networks were used: VGG16, Inception-V3, and ResNet-50, which were pretrained on the ImageNet data set for a classification problem, and then fine-tuned on the HARRISON data set for the image tagging problem. The final model consists of an ensemble of these three convolutional neural networks, whose outputs were combined by different methods: averaging, voting, union, intersection, and a two-layer feed-forward neural network. We verified these models on the task of hashtag recommendation for images from a social network, with a predefined set of 50 possible hashtags. All of the models were evaluated using the following metrics: precision, recall and F1-measure.

I. INTRODUCTION

In the field of computer vision, the number of studies on image understanding and visual analysis has recently grown significantly, with various works attempting to challenge topics such as object classification [1][2], object detection [3], scene classification [4], action recognition [5], image captioning [6], and even visual question answering [7].

A related problem is image tagging, or hashtag recommendation, where the task is to predict a set of labels for an image, from a predefined set of labels. In contrast to the image classification problem, where the task is to predict exactly one class from given M classes (1 of M), image tagging is the task of predicting multiple classes (N of M, where N is arbitrary for every image). The image annotation task [8] is related to the image tagging task in regards to the diversity of labels. The labels and the related annotations mostly consist of directly apparent information such as objects in the image and locations of the image. However, hashtags include inferential words which require a contextual understanding of images. Additionally, a slew of words currently popular in social networks are also included in hashtags. We define a hashtag as any word attached to the prefix character '#' that is used in online social networks such as Facebook, Twitter, and Instagram. Hashtags are commonly used to summarize the content of a user's post and attract the attention of other social network users.

On the Instagram site, simple hashtags such as #dog and #beach describe the presence of simple objects or locations in a photo. Hashtags such as #happy or #sad express the user's emotions, while abstract hashtags such as #fashion and #spring specify topics. Inferential hashtags such as #colourful and #busy represent situational or contextual information. There are also advertising hashtags such as #likeforlike, which are not related to the photo's content. Since there are no rules for making and tagging hashtags, they can be diversely generated and freely used.

For example, there are simple cognate hashtags in both the singular and plural form (e.g. #girl, #girls), hashtags in the lower and upper case (e.g. #LOVE, #love), hashtags in various forms of the same root word (e.g. #fun, #funny), sentence-like hashtags (e.g. #iwantthissomuch, #kissme), slang-inspired hashtags (e.g. #lol), as well as meaningless hashtags used to gain the attention of followers (e.g. #like4like, #followforfollow). Moreover, users can repeatedly tag the same hashtag for emphasis. Considering the wide variety and depth of context, the recommendation of proper hashtags is a highly interesting and useful task in the age of social media [9].

Recommendation of hashtags has been studied in the field of natural language processing (NLP), where some of the previous work focused on content similarity of Twitter text posts (tweets) [10][11], and unsupervised topic modeling with Latent Dirichlet Allocation (LDA) [12][13][14]. Although significant progress in the field of hashtag recommendation has recently been published by the NLP community, hashtag recommendation from images has not been strongly investigated in the field of image understanding. In a recent paper [15], the authors presented a hashtag recommendation system which uses metadata of user information, such as gender and age, in addition to image data. However, if formulated as a vision-only problem where no metadata is used, hashtag recommendation remains an open problem.

In this paper, we used deep convolutional neural networks (CNN), which have recently been proven to be a very useful tool in various computer vision problems, such as image classification [1], image captioning [6], image segmentation [16] and object detection [3]. With the introduction of multiple specialized frameworks for training deep neural networks on the GPU instead of the CPU, the time needed for training sophisticated architectures has dropped by orders of magnitude, making this whole area of research bloom.

To train and evaluate our model, we used the HARRISON data set [9], a benchmark data set for hashtag recommendation of real world images in social networks.
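The 1-of-M versus N-of-M distinction drawn in the Introduction can be made concrete with a minimal pure-Python sketch; the scores and the 0.5 threshold are illustrative assumptions, not values from the paper.

```python
def classify_1_of_m(scores):
    """Single-label classification: pick the one class with the highest score."""
    return max(range(len(scores)), key=lambda j: scores[j])

def tag_n_of_m(scores, threshold=0.5):
    """Image tagging: keep every label whose score clears a threshold,
    so the number of predicted tags N varies per image."""
    return [j for j, s in enumerate(scores) if s >= threshold]

scores = [0.91, 0.08, 0.77, 0.30, 0.62]  # illustrative per-label scores
print(classify_1_of_m(scores))  # → 0
print(tag_n_of_m(scores))       # → [0, 2, 4]
```

The same score vector thus yields one label under classification but a variable-sized label set under tagging, which is why tagging is treated as a multilabel problem.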
The HARRISON data set is a realistic data set which provides actual images that were posted with associated hashtags on the online social network Instagram. The HARRISON data set has a total of 57,383 images and approximately 260,000 hashtags (with 997 unique hashtags). Each image has an average of 4.5 associated hashtags (minimum 1 and maximum 10) due to the removal of infrequently used hashtags.

II. MODEL FOR IMAGE TAGGING

We propose a model for image tag prediction which has two levels: the first level consists of three different CNN architectures which are trained to predict tags for images, and on the second level we employ several different methods to combine the outputs from the first level models in order to improve the performance of the final prediction. Prior to training, all images are resized to 224x224, but the resizing is done by preserving the aspect ratio in order to maintain the natural proportions of the objects in the image.

A. First level: fine-tuning pretrained CNNs

Although deep convolutional neural networks are currently the de facto standard tool for computer vision tasks, training them often requires a lot of training data (e.g. hundreds of thousands or millions of images) and plenty of resources (most state-of-the-art CNN architectures are trained for 1-3 weeks on multiple GPUs). However, it is possible to use so-called transfer learning, i.e. taking a model that is already trained to solve one problem and retraining it for another problem [17]. This procedure is usually called fine-tuning, and it often dramatically reduces the needed amount of training data and the training time.

Usually, this procedure is performed using already available CNN models pretrained on the ImageNet data set for the classification task. This data set consists of 1.2 million images, classified into 1000 different classes, and is used for benchmarking state-of-the-art computer vision systems on the classification task. Since 2010, the ImageNet project has been running an annual contest, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [18], where researchers prepare software programs that compete to correctly classify and detect objects and scenes. In the deep learning era, winners of this contest publish their CNN architectures and corresponding parameters (neurons' weights) online, making them available to other researchers. We used several of these pretrained networks in this research.

It is important to note that, in order to fine-tune these CNNs, images for the new task need to be similar to the images in the ImageNet data set, i.e. they should be real-world images. For example, it is considered impractical to fine-tune ImageNet CNNs on medical images, since real-world images and medical images come from completely different data generating processes.

In this paper, we used three of the most popular architectures that competed in the ImageNet ILSVRC: Oxford's VGG16 [19], Google's Inception-V3 [20] and Microsoft's ResNet-50 [2]. Table 1 shows a brief comparison of these models.

TABLE I. A brief comparison of three popular CNN architectures and their scores in the ImageNet ILSVRC contest for the classification task

Architecture   # of layers   # of parameters   ImageNet top-5 error (%)
VGG16          16            ~140M             7.3
Inception-V3   42            ~4M               6.7
ResNet-50      50            ~0.85M            5.2

In order to fine-tune these models, we changed the number of neurons in the output layer from 1000 to 50, as we took only the 50 most frequent labels (hashtags) to be relevant to our task, and then retrained them on the HARRISON data set.

The most notable difference between the base models and the fine-tuned models is the activation function in the output layer – the pretrained models use a softmax activation function, whereas our fine-tuned models have a sigmoid activation function, which allows the fine-tuned models to be trained for the multilabel classification task.

B. Notation

Here, we introduce a notation to facilitate the explanation of the proposed model and the different ensemble techniques.

Let x^i be the i-th input image, where i ∈ {1, 2, …, N} and N is the size of the whole data set of images. Outputs from all models are represented as a vector y^i = [y^i_1, y^i_2, …, y^i_M]^T, where y^i_j ∈ [0, 1] is the output for the j-th label and M is the total number of labels to predict, in our case M = 50. Outputs for the i-th image from the first level models will be denoted as y^i_VGG, y^i_INC and y^i_RES for the VGG16, Inception-V3 and ResNet-50 CNNs, respectively. As an example, the output vector of a first level model looks like y^i_VGG = [y^i_VGG,1, y^i_VGG,2, …, y^i_VGG,M]^T. We will use y^i_L1 = {y^i_VGG, y^i_INC, y^i_RES} to denote the set of outputs of all first level models for an image x^i.

The task of the first level models is to learn a function f: x → y, and thus the output of the first level models is y^i_L1 = f(x^i).

The task of the second level models is to learn a function g: y_L1 → y, and thus the output of the second level models is y^i_L2 = g(y^i_L1) = g(y^i_VGG, y^i_INC, y^i_RES), where y^i_L2 = {y^i_AVG, y^i_VOTE, y^i_UN, y^i_IN, y^i_FFNN} denotes the set of outputs of all second level models: averaging, voting, union, intersection and feed-forward neural network, respectively.

Since the input for the second level models consists of the outputs from the first level models, the total number of inputs for the second level models is 3×50 = 150.
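The aspect-preserving resize to 224x224 described in Section II can be sketched with Pillow. This is only an illustration: the paper does not say how a non-square image is reconciled with the square network input, so padding the shorter side with black borders is our assumption, and the function name is ours.

```python
from PIL import Image

def resize_keep_aspect(img, size=224, fill=(0, 0, 0)):
    # Scale the longer side to `size` while preserving the aspect ratio,
    # then pad the shorter side so the result is exactly size x size.
    w, h = img.size
    scale = size / max(w, h)
    new_w, new_h = round(w * scale), round(h * scale)
    resized = img.resize((new_w, new_h), Image.BILINEAR)
    canvas = Image.new("RGB", (size, size), fill)
    canvas.paste(resized, ((size - new_w) // 2, (size - new_h) // 2))
    return canvas
```

Center cropping would be an equally plausible reading of the text; padding merely keeps every object in the frame.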
C. Second level: ensembles

In order to increase the prediction accuracy of our model, we combined the outputs from the models in the first level using different approaches: averaging, voting, union, intersection and lastly, as a more sophisticated technique, a two-layer feed-forward neural network.

1) Averaging
For each label, the output values from the first level models are averaged.

y^i_AVG,j = (y^i_VGG,j + y^i_INC,j + y^i_RES,j) / 3    (1)

2) Voting
For each label, the output values from the first level models are first rounded and then averaged.

y^i_VOTE,j = (round(y^i_VGG,j) + round(y^i_INC,j) + round(y^i_RES,j)) / 3    (2)

3) Union
For a label to be positive, it suffices that at least one first level model predicts it as positive.

y^i_UN,j = round(y^i_VGG,j) + round(y^i_INC,j) + round(y^i_RES,j)    (3)

4) Intersection
For a label to be positive, the predictions of all first level models must be positive.

y^i_IN,j = round(y^i_VGG,j) · round(y^i_INC,j) · round(y^i_RES,j)    (4)

5) Feed-forward neural network
As a more advanced ensemble technique, we use a simple two-layer feed-forward neural network, which has 150 input neurons, 150 neurons in the hidden layer, and 50 output neurons.

A diagram of the final model is shown in Figure 1.

III. EXPERIMENTS AND RESULTS DISCUSSION

To quantify the performance of our model we chose the following metrics: precision, recall and F1-measure (the harmonic mean of precision and recall). Since these metrics only work with binary values {0, 1}, we clipped and rounded the predictions of each of the models prior to evaluation.

The first level models, i.e. the pretrained CNNs, are fine-tuned with a stochastic gradient descent (SGD) optimizer with a learning rate of 1e-3, momentum 0.9, learning rate decay 1e-5, and batch size 32. Training of all CNNs was done on a single Titan X Maxwell GPU, and training each model took around 12 hours. The loss function we used consisted of two terms: binary cross-entropy and Kullback-Leibler divergence. As a form of regularization we performed data augmentation – all images were randomly rotated in the [-30, 30] degrees range, shifted horizontally and vertically in the [-10%, 10%] range, zoomed in the [-10%, 10%] range, and flipped horizontally.

The HARRISON data set was randomly split into 80% training data and 20% testing data. We should also note that, since we took only the 50 most frequent hashtags, this reduced the total number of images from 57,383 to 52,626. Since these models take a significant amount of time to train, our validation procedure consisted of validating against the test data, which we resorted to in order to save resources. We used early stopping as another form of regularization – the F1-measure on the test data is monitored each epoch, and if it doesn't improve for 10 epochs, the training of a model is stopped to prevent overfitting.

All second level models, except the feed-forward neural network, don't need training, since their outputs are calculated directly. The two-layer feed-forward neural network is trained with the Adam optimizer with a learning rate of 1e-3 and a batch size of 512. For the hidden neurons, we used the tanh activation function. This neural network was trained in a couple of seconds.

Table 2 shows the evaluation results of the first level models and the final models, depending on which ensemble was used. All results are averaged over all images in the test set. We can see that the ResNet-50 model achieved the best results of all first level models, while the feed-forward neural network outperformed the other second level models.

Figure 2 demonstrates examples of results for the HARRISON data set in various cases. The shown examples are results of our best final model, i.e. the two-layer feed-forward neural network ensemble.
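The four rule-based ensembles of Eqs. (1)–(4) are direct element-wise operations on the three output vectors; a minimal NumPy sketch (function names are ours, and the union is clipped back to {0, 1}, matching the clipping applied before evaluation):

```python
import numpy as np

def averaging(y_vgg, y_inc, y_res):
    # Eq. (1): mean of the raw sigmoid outputs
    return (y_vgg + y_inc + y_res) / 3

def voting(y_vgg, y_inc, y_res):
    # Eq. (2): round each model's output to {0, 1}, then average
    return (np.round(y_vgg) + np.round(y_inc) + np.round(y_res)) / 3

def union(y_vgg, y_inc, y_res):
    # Eq. (3): a label is positive if at least one model predicts it
    s = np.round(y_vgg) + np.round(y_inc) + np.round(y_res)
    return np.clip(s, 0, 1)

def intersection(y_vgg, y_inc, y_res):
    # Eq. (4): a label is positive only if all three models predict it
    return np.round(y_vgg) * np.round(y_inc) * np.round(y_res)
```

Each function takes three vectors of per-label scores in [0, 1] and returns one combined vector of the same length.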
Figure 2 – Examples of results for random images in the test split of the HARRISON data set (predicted vs. true hashtags, e.g. funny, fashion, snow, friend, dog, beach, beautiful, sun, nature)
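The clip-round-then-score protocol of Section III can be sketched as follows (micro-averaged over all labels of all images; plain Python, our naming):

```python
def micro_f1(y_true, y_pred):
    # y_true and y_pred are lists of equal-length binary label vectors,
    # one vector per image; predictions are assumed already clipped and
    # rounded to {0, 1}, as described in Section III.
    tp = fp = fn = 0
    for true_vec, pred_vec in zip(y_true, y_pred):
        for t, p in zip(true_vec, pred_vec):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)
```

Micro-averaging pools the per-label counts before computing the ratios, so frequent hashtags weigh more than rare ones.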
Abstract— This paper consists of the formal specification and implementation of a system for simultaneous localization and mapping based on video content analysis using one video camera, without the usage of any additional sensors. The lack of sensors lowers the system's price, but the software complexity becomes very high. The Shi-Tomasi corner detection method is used in the system implementation, as well as the Lucas-Kanade optical flow method, the RANSAC algorithm for optical flow vector filtering and the Kalman filter for correction of the camera position.

I. INTRODUCTION

This paper holds the formal specification of a system for simultaneous localization and mapping based on video content analysis. Simultaneous localization and mapping (a.k.a. SLAM) is a problem usually found in robotics. The basic idea lies in the fact that a moving robot should be able to map the space around it and to track its location within that space. The system presented in this paper uses only one video camera for that purpose, without additional sensors. The system is implemented using computer vision techniques, numerical algorithms and others. The formal specification of the system is presented in the next section of this paper. The third section contains the system verification from the performance and accuracy viewpoints, while the fourth section contains some further research plans.

…extracting the data which will be used in the localization and mapping process.

2.1. Camera calibration subsystem

Camera lens characteristics and imperfections are presented in this segment. Some of those imperfections can affect video processing in the SLAM problem, which is why the physics of the camera lens is explained in Figure 2 [1].

Figure 2. Camera lens model

The fact that the video camera lens exists and is not perfectly thin results in image defects which are known as lens aberrations. The reason why spherical distortion of the image occurs (spherical aberration) is that the focal point of the light rays depends on the distance of the source from the optical axis. The consequence is that straight lines in the scene will be curved in the image. Spherical distortion removal is a problem that can be solved by the camera calibration process. The types of image distortion are displayed in Figure 3.
…subsystem. Every frame that is processed in this subsystem is forwarded to the other subsystems.

2.3. Objects of interest detection subsystem

To enable the analysis of motion in the video, it is necessary to define some key points which will be used to calculate the motion parameters. This process can be divided into several phases. A block diagram of this subsystem is shown in Figure 5.

E(u, v) = Σ_{x,y} w(x, y) [I(x + u, y + v) − I(x, y)]²    (1)

2.4. Subsystem for optical flow calculation

This chapter contains a specification of the subsystem for optical flow calculation for the objects in the scene. The main task of this subsystem is to determine the displacement vector of the camera. In order to analyze the displacement of the camera or the scene, it is necessary to have at least two reference positions that will be used to calculate those displacement vectors. The whole system is based on the analysis of video content, so consecutive frames of the video recording will be analyzed. A conceptual diagram of this subsystem is shown in Figure 6.
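Equation (1) above is the weighted window sum evaluated by Shi-Tomasi-style corner detection; a toy illustration in pure Python (uniform weights w(x, y) = 1 are assumed here, and the function name is ours, not the system's):

```python
def corner_energy(I, u, v, window):
    # E(u, v) = sum over (x, y) in the window of
    #           w(x, y) * (I(x + u, y + v) - I(x, y))^2,
    # with uniform weights w = 1; I is a 2-D list indexed as I[y][x].
    return sum((I[y + v][x + u] - I[y][x]) ** 2 for x, y in window)
```

A point is a good corner when this energy is large for shifts (u, v) in every direction; in practice libraries evaluate it via the structure-tensor eigenvalues rather than this direct sum.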
The subsystem for movement parameters estimation has the task of calculating the intensity and angle of the displacement of the camera, using the analysis of the optical flow field for the objects in the scene. Figure 9 shows a conceptual diagram of the subsystem for the estimation of motion parameters.

Figure 9. Subsystem for movement parameters estimation

The optical flow field can be quite noisy, especially if the scene is almost homogeneous, resulting in harder object detection and tracking. An example of a noisy optical flow field is given in Figure 10.

2.6. Subsystem for estimation improvement

If the estimated value of the new camera position is used in the rest of the system in its original form, with no improvement, it may happen that the map of the movement looks unnatural. The supporting structure to which the camera is attached, for example a vehicle, won't usually be perfectly still. Small vibrations of the support structure will result in a trajectory that will not be perfectly straight, even if the camera moved along a straight line. If the camera is attached to a vehicle, vibration is a logical phenomenon. When a man wears a camera, vibration will occur during the walk, and if the camera is held in hands, the small shaking of the hands will result in vibration. No matter how small these vibrations are, they will be detected at the transition between adjacent frames and they will bring noise into the system. The basic idea behind the methodology for improving the estimation lies in the fact that the camera moves according to the laws of physics, because the camera will generally be attached to a moving vehicle. It is not possible for a vehicle at high speed to make a huge
turning angle, for example 90°. Those "sharp" angles will happen when vibration affects the camera, and they can be eliminated if we take into account the laws of motion in physics. Based on the theory of the Kalman filter [9], it can be concluded that precisely this filter could be applied to this kind of estimation improvement, even though there are many others.

The input of this subsystem consists solely of the coordinates of the positions of the camera in the real world. Those coordinates are used for the prediction of the future position. The Kalman filter uses the current value and a previous measurement, so the input of the subsystem will be the estimation of the current position of the camera and the previous position of the camera (Figure 12).

Figure 12. Subsystem for estimation improvement

As the model is formed in the discrete domain, the movement of the object is modeled by approximating the velocity and the acceleration by difference equations, wherein the time interval T equals the time step, i.e. T = 1.

v[n] = (x[n] − x[n−1]) / T    (4)

a[n] = (v[n] − v[n−1]) / T = (x[n] − 2x[n−1] + x[n−2]) / T²    (5)

If the equations above are included in the approximation of Newton's laws of motion in the discrete domain, the following expression is obtained:

x[n+1] = x[n] + v[n]·T + a[n]·T²/2    (6)

Based on these equations, the model of the Kalman filter used in the system is formed.

2.7. Subsystem for movement map creation

The task of this subsystem is the mapping of the camera odometry into the corresponding metric system. The input of this subsystem is just the new camera position, which will be added to the map, as shown in Figure 13.

Figure 13. Subsystem for movement map creation

This system measures distance in meters, so this subsystem has to switch metrics from pixels to meters before the drawing of the path is performed. To make something like this possible, several assumptions were introduced. The camera is fixed at a height h which is not changeable. This way, the width and height of the field of view that the camera covers can be measured, and this enables easy calculation of the length of any line in that plane. The width and height in pixels correspond to the resolution in which the video content is recorded, and the width and height in meters are experimentally determined by measuring, after the camera is fixed at a suitable height h.

III. SYSTEM VERIFICATION

Verification of the accuracy of the system is done by empirical methods, by analyzing the plotted trajectory, while the performance verification is done experimentally. The system performance was analyzed on machines with single-core and quad-core processors of the latest generation. For high-definition video, the system reached 12 and 9 processed frames per second, depending on the machine. For a video format with a resolution of 640x480, processing of 28 and 26 frames per second is achieved, which is enough to operate in real time. The accuracy of the system is satisfactory if the trajectory of the camera is longer. When turning the supporting structure of the system in a small area, a loss of short rotation vectors after RANSAC filtering happens and the resulting angle is not good (camera rotation around one of its corners) - Figure 14a. If steering relies only on translational movement, without rotation of the camera, the results are good - Figure 14b.

Figure 14. Resulting and expected trajectories in the case of rotation around a camera edge and translational movement

IV. CONCLUSION

Based on the system verification, empirical methods show that it operates well in partially controlled conditions. If the camera rotation angle is too sharp over a small moving distance, there is a loss of some rotation vectors and noise appears in the resulting trajectory. In order to avoid such a case, the camera may not be rotated around one of its corners. Steering should be performed by translation of the supporting structure, for example a vehicle, not by direct camera rotation. Further directions of development would involve the introduction of cheap sensors which could follow the direction of the camera without relying solely on the analysis of the optical flow, so that the camera wouldn't have to be fixed at a certain height in order to measure the length of the distance traveled. The system would be more accurate because it would have two types of measurements - the measurements from the sensors and the optical flow measurements. At the same time, the system should work faster because the computationally demanding analysis of the optical flow could be simplified, but the cost of the system would be higher. In the approach used in this paper, the system has a higher software complexity but it's cheaper and doesn't require any additional equipment besides one video camera and a computer.
…recognition). This imposes a need for recreating the same, or at the very least similar, text processing steps as the ones described in [4].

A. Reuters-21578

The collection of news articles contained in the Reuters dataset represents one of the most used sets for the development of algorithms in the fields of text processing and information retrieval. The set was initially published in 1987, and it consists of 21578 documents that were manually labeled by the employees of Reuters in collaboration with the Carnegie Group corporation [6].

Most of the documents in this set have multiple labels joined to them, with the total number of distinct labels (categories) being 115. The main issue with the dataset is a very noticeable category imbalance. For example, the "Capacity Utilisation" class is assigned to only 4 documents, while the category with the highest number of documents, "Earnings and Earnings Forecasts", has 3923. It is for this reason that only subsets of this dataset are being used.

For the split into test and training documents, a standard split from the literature exists for this dataset and is called the modApté split. The final number of documents for the R8 set is 5485 for training and 2189 for testing, while the R52 subset contains 6532 training documents and 2568 test files.

B. 20 Newsgroups

This dataset is a collection of around 20000 documents, split into 20 different categories. These categories all refer to a news theme covered in each of the documents. Originally collected by Ken Lang [5], much like the Reuters dataset this set gained a reputation as one of the most used sets for experiments in information retrieval, such as text categorization and clustering.

The text contained in the documents can be found in the form of messages, specifically emails, that contain elements such as signatures, greetings and in some cases HTML, therefore creating a need for additional pruning of the input. The good thing about this set is the great number of named entities contained in the documents, making it a good candidate for this paper.

The dataset is characterized by a good balance in the frequency of the used categories. Even so, a subset of the 20000 documents is used, reducing it to 18821 text files. The theme with the fewest assigned documents has a total of 628 documents, while the most prominent category counts 999 documents.

The standard approach to splitting the set into training and test subsets is taken from the literature and implies a "by date" split, giving 11293 older documents for training and 7528 newer ones for testing. This kind of split is intuitively a sound decision, providing a great way to test the predictive properties of the trained classifiers.

C. WebKB

WebKB, or the "World Wide Knowledge Base" project, had as its task the collection of web pages related to different fields of computer science, hosted by universities. The project was started in 1997 and resulted in a total of 8282 web pages from the mentioned sources. The data gathered from 4 different universities were manually classified into 7 different categories, including a miscellaneous category that adds a fair amount of noise to the dataset.

The set is mostly well balanced, with the exception of two categories, "staff" and "department", that have respectively 137 and 182 documents. The very next category with the lowest document count is "project", counting 504 documents. For this reason, following [4], the "staff" and "department" categories, as well as the miscellaneous class, are excluded from the analysis.

The literature does not provide a standard train and test split for this dataset. Therefore a typical split of randomly selected documents in an 80 to 20 percent ratio is applied to the WebKB dataset. This split provides a total count of 3357 training and 840 test documents.

V. METHODOLOGY

Standard algorithms for document dimensionality reduction by feature selection are applied before training any classifiers, and frequencies are normalized through the use of tf-idf weighting factors. The evaluation measures used are based on the datasets themselves, since they vary in category balance. This chapter contains an overview of the methods for text processing that are applied to all of the documents in a dataset.

A. Text processing

Non-structured text obtained from the documents is not suitable for ML algorithms and needs to be transformed. For that reason every document is represented as an n-dimensional array, representing the bag-of-words model. In this way it is possible to quantify the attributes of each document.

1) Text tokenization
To acquire the attributes of a text, i.e. to form the vocabulary for a dataset, each document is split into n-grams [8]. Because this research requires NER, tokenization is conducted in conjunction with POS analysis. This kind of analysis is tasked with finding n-grams, or sequences of words with a specific function in a sentence. As the result, a tree structure is formed that describes the parts of a sentence with a joined function label [9].

After forming the tree, and labeling every part of the sentence as nouns (NNP or NN), verbs (VBD), and similar, the NER system starts labeling the n-grams that correspond to named entities. This kind of labeling is done in steps, by traversing the tree structure in a way that allows complete n-grams to be declared as named entities. If a term is not deemed an entity, then it is taken as a regular key word token.

The described method of tokenization is used only in the cases where named entities are included in classifier training.

2) Lemmatization
The raw words gained through text tokenization are still not in the best shape for classifier training. They can be found in different tenses, noun cases, and such. The bag-of-words model by itself does not hold any knowledge of grammatical constructs, so additional transformations should be applied.

The terms acquired through tokenization can, for the purposes of this paper, be categorized as either key words, which are always made out of single words, or named entities, which can be found in the form of n-grams.
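The tf-idf weighting mentioned in Section V can be sketched in pure Python over tokenized documents. The paper does not spell out its exact weighting variant, so raw term count times log(N/df) is an assumption here, as are the names:

```python
import math
from collections import Counter

def tf_idf(docs):
    # docs: list of token lists, one list per document.
    # Returns one dict per document mapping term -> tf * idf,
    # with idf = log(N / df) (a common variant; an assumption here).
    n = len(docs)
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)               # raw term counts
        weights.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return weights
```

A term occurring in every document gets weight 0, which is exactly the damping of ubiquitous words that motivates tf-idf.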
Words from the first group are reduced down to their lemma [10], or "dictionary" form. For example, the words "walking", "walks" and "walked" are all transformed into "walk". With this, the total vocabulary is reduced greatly, and grammatical constructs are ignored.

3) Stop words removal
Removing stop words is a usual step in text processing: stop words are words that do not hold meaning by themselves and are generally found in any text, such as "a", "the", "again", "even" and many others. Aside from these, since the source for many of the datasets are web sites, the text is cleaned of all HTML tags that might occur.

B. Entity Power Coefficient

This work builds on some of the findings made by Gui, Yaocheng et al. [1]. The researchers had great results when applying named entities, in conjunction with other document features, to forming a clustering model for the Reuters-21578 dataset. They apply named entities by implementing a coefficient [ ] that indicates the impact that named entities have on making a decision. Here [ ] indicates the complete absence of named entities and [ ] means that only named entities are used while forming the model. In [1] the authors claim to have gotten the best performance with [ ] close to 1, thereby justifying the use of named entities on this particular problem.

This paper takes the conclusions made in [1] and implements a coefficient [ ] that indicates the impact of named entities, in later text referred to as the EPC (Entity Power Coefficient). The coefficient is used for creating virtual occurrences of all terms that are found to be named entities (including n-grams). The coefficient multiplies the term count found in all documents, thereby modifying the real bag-of-words model.

VI. EXPERIMENTAL EVALUATION

The evaluation of the proposed classification method is run in three different settings for each of the 4 described datasets, namely the two subsets of Reuters-21578 (R8 and R52), 20 Newsgroups and WebKB. The classifiers trained consist of the three standard ML algorithms for text classification: Naïve Bayes, kNN (k-Nearest Neighbors) and SVM (Support Vector Machine).

Each of these classifiers is trained for every dataset provided, in three different settings. The first setting, named "BASE", results in a reference model that is trained without any regard to named entities. The second setting, named "NE", uses the proposed EPC and trains multiple models based on different values of the coefficient. The "NE_ONLY" setting uses only named entities, without paying any regard to other tokens during training and testing. The evaluation metrics used are accuracy, recall, precision and F-measure, with micro-averaging of the results. For each of the three settings, the best terms are picked through chi-square dimensionality reduction.

A. Reuters-21578 (R8)

The R8 subset is processed in the manner described previously and shows results, in Table 1, that correspond to the ones provided in [4].

1) BASE
Evaluation measures for the BASE setting on the R8 dataset are shown in Table 2, with a vocabulary totaling 21680 terms.

TABLE II. R8 BASE RESULTS

Classifier   Accuracy   Precision   Recall   F1        k
NB           0.9123     0.915       0.9123   0.91186   4896
SVM          0.9320     0.9324      0.9320   0.9304    21680
kNN          0.91       0.9084      0.91     0.907     754

Table 2 shows the micro-averaged evaluation measure values. The average values for specific categories do not fluctuate greatly in comparison to the global average. Both the Naïve Bayes and SVM classifiers see performance similar to that provided in [4], with slight divergences that can be attributed to certain differences in the text processing and tokenization techniques. Also, Naïve Bayes and kNN show better performance for cases with a smaller number of attributes. NB even shows a degradation in performance when increasing the attribute count. SVM works best when using all of the attributes in the vocabulary.

2) NE
Figure 1 displays the change corresponding to the various values of the EPC in the range from 0 to 10. The number of terms in the vocabulary is 29767, an increase brought in by the occurrence of named entities.

Table 4 shows the best measures in regard to [ ]. The last column displays the number of terms gained through chi-square and the total number of named entities figuring in that subset.

The acquired results show a variation between the algorithms themselves when deciding on the best case for the EPC. All algorithms show, at best, performance similar to the BASE setting, although for different EPC values. For example, the best performance for NB is shown to be for [ ] and for SVM [ ]. NB shows an oscillation in precision and a noticeable increase in recall that follows the EPC.

3) NE ONLY
The last evaluation performed for R8 is on a bag-of-words model containing only named entities and no key words.

TABLE III. R8 NE ONLY RESULTS

Classifier   Accuracy   Precision   Recall   F1       k
NB           0.808      0.8167      0.808    0.7640   1072
SVM          0.8429     0.8435      0.8429   0.8344   11524
kNN          0.7783     0.7450      0.7783   0.7446   1072

Classifier   Accuracy   Precision   Recall   F1       k
NB           0.6023     0.5797      0.6023   0.5806   7845
SVM          0.598      0.5834      0.598    0.5763   15267
kNN          0.4432     0.3740      0.4432   0.3859   423
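The EPC's "virtual occurrences" described in Section B amount to multiplying the counts of entity terms before the model is built; a one-function sketch (the function name and the dict representation of the bag of words are ours):

```python
def apply_epc(term_counts, named_entities, alpha):
    # Multiply the count of every term recognized as a named entity
    # (including n-grams) by the Entity Power Coefficient alpha,
    # creating "virtual occurrences" in the bag-of-words model.
    return {t: (c * alpha if t in named_entities else c)
            for t, c in term_counts.items()}
```

With alpha = 0 entity terms vanish from the model, while a large alpha approaches the NE_ONLY regime where entities dominate.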
…added to the vocabulary. Table 7 displays the results and again shows a noticeable fall in performance.

Classifier   Accuracy   Precision   Recall   F1       k
kNN          0.5312     0.5412      0.5312   0.5124   3590

C. 20 Newsgroups (20_NG)

This dataset is characterized by a greater number of documents than the R8 and R52 sets. The Newsgroups set is also a collection of e-mail messages, as opposed to Reuters' news articles. The same classifiers are trained just like in the former cases, starting with the BASE setting.

1) BASE
The referential classifier for the 20_NG dataset shows results very close to the ones provided in [4]. 110120 distinct terms have been found and added to the vocabulary. The results for the referential classifier are shown in Table 8.

TABLE VIII. 20 NEWSGROUPS BASE RESULTS

Classifier   Accuracy   Precision   Recall   F1       k
NB           0.8141     0.8162      0.8141   0.8116   59000
SVM          0.8107     0.8117      0.8107   0.8023   110120

2) NE
Regarding classifiers that are made aware of named entities, the performance does not trail far from the referential results. On the other hand, the value for the EPC is higher for better performance. Figure 3 shows the results for the three classifiers trained on a total of 125238 terms, including named entities. Table 9 shows the best results gained.

TABLE IX. 20 NEWSGROUPS NE RESULTS

Classifier   α        Accuracy   Precision   Recall   F1       k/ent
NB           2.2068   0.8162     0.8194      0.8162   0.8138   125238/32997
SVM          1.6792   0.8571     0.8572      0.8571   0.8553   51613/25735
kNN          0.1010   0.7476     0.7733      0.7476   0.7298   1162/153

3) NE ONLY
Much like in the previous cases with the Reuters dataset, classifiers trained solely on named entities show the worst performance. The total number of vocabulary entries is 40736 for this configuration, as is shown in Table 10.

D. WebKB

The World Wide Knowledge Base dataset is the last set on which the evaluation is performed. The documents in this dataset are web pages from four universities. Standard steps and configurations apply.

1) BASE
The referential Naïve Bayes classifier fails to reach the results provided in [4], while SVM remains in the neighborhood of those results. The total number of words without any NER amounts to 35731. Table 11 contains the results of this analysis.

TABLE XI. WEBKB BASE RESULTS

Classifier   Accuracy   Precision   Recall   F1       k
NB           0.8583     0.8576      0.8583   0.8574   100
SVM          0.6059     0.6454      0.6059   0.5735   100
kNN          0.4928     0.5894      0.4928   0.4379   100
28
7th International Conference on Information Society and Technology ICIST 2017
Abstract—The Collaborative Filtering technique is a state-of-the-art method in recommender systems. This technique has proved to be successful both in research and industry. However, it is challenging to find an adequate algorithm for calculating user similarity, which enables the recommender system to generate the best recommendation for a given domain and data set. In this paper, we propose an architecture which allows users to additionally personalize the recommendation by letting them choose the algorithm for calculating user similarities, or to implement their own algorithm on a distributed service. Online tests on the implemented application for movie recommendation show that the architecture proposed in this paper gives users the ability to personalize the recommendation for more efficient results.

Keywords–Machine Learning; Collaborative Filtering; Recommender system;

I. INTRODUCTION
Collaborative Filtering [1] is a subfield of Machine Learning that aims at creating algorithms to predict user preferences based on past user behavior in purchasing or rating items. It is a technique that has been acclaimed both in research and industry.
The input for the Collaborative Filtering prediction algorithm is a set of users' ratings on items. A Collaborative Filtering algorithm is typically decomposed into three generic stages [2]:
(1) calculating the similarity of all users to the active user (the user for whom a recommendation is searched),
(2) selecting a certain number of users most similar to the active user,
(3) computing the active user's rating prediction. This is done for a target item whose rating is unknown, and is obtained by calculating the normalized and weighted average of the ratings, on the target item, of the most similar users found at (2), according to the user-to-user similarity computed at (1).
In a Collaborative Filtering recommender system, the items that are finally recommended to the active user are typically those with the maximum rating prediction. In some cases all the items are suggested, but in that case they are ranked according to the predicted ratings [2].
Over the last 20 years, a great effort has been invested in research on how to automatically recommend different things to people. During that period, a wide variety of methods have been proposed. The Recommender Systems Handbook [3] provides an in-depth discussion of a variety of recommender methods and topics, focused primarily on Collaborative Filtering. There is also a growing interest in problems surrounding recommendation. Algorithms for understanding and predicting user preferences are merely one piece of a broader user experience. A recommender system must interact with the user, both to learn the user's preferences and to provide recommendations; these concerns pose challenges for user interface and interaction design. There is room for algorithm refinement and much work to be done on user experience, personalization, data collection and other problems which make up the whole of the recommender experience [4].
A wide variety of algorithms is used for calculating user similarities. Finding an adequate algorithm which enables the recommender system to generate the best recommendation for a given domain and data set is challenging.
The specific problem our work addresses is to find an architecture which allows system users to additionally personalize the recommendation by letting them choose the algorithm for calculating user similarities, or to implement their own algorithm for the same task. In order to maintain the efficiency of the recommender system, users can use distributed execution of newly implemented algorithms for calculating user similarities. To the best of our knowledge, no other solution to this problem has been published.
The rest of this paper is organized as follows. In Section II, several related works are presented and discussed. In Section III, we first introduce the architecture and design of our system, and after that we describe its implementation. In Section IV the experimental results of our system are presented and analyzed. Finally, we make brief concluding remarks and outline future work in Section V.

II. RELATED WORK
Berkovsky, Kuflik and Ricci [2] showed that the reliability of user-to-user similarity computation can be solved, and the accuracy of Collaborative Filtering recommendations can be improved, by partitioning the collaborative user data into distributed repositories and aggregating information coming from these repositories. Their work explores a content-dependent partitioning of collaborative movie ratings, where the ratings are partitioned according to the genre of the movie, and presents an evaluation of four aggregation approaches. The evaluation demonstrated that the aggregation improves the accuracy of a
centralized system containing the same ratings, and proved the feasibility and advantages of distributed Collaborative Filtering. The main shortcoming of their work is the difficulty of applying it in recommendation applications: it requires additional work to classify the items into several topics or domains.
Han, Xie, Yang and Shen [5] proposed a distributed Collaborative Filtering algorithm together with significance refinement and unanimous amplification to further improve scalability and prediction accuracy. Their experiments showed that a distributed Collaborative Filtering-based recommender system has much better scalability than traditional centralized ones, with comparable prediction efficiency and accuracy.
Ekstrand, Riedl and Konstan [4] presented various algorithms for calculating user similarity. They showed that the appropriate algorithm for a particular application should be selected based on a combination of the characteristics of the target domain and context of use, the computational performance requirements of the application, and the needs of the users of the system. They concluded that understanding what those needs are can simplify algorithm selection.
Liu, Hu, Mian, Tian and Zhu [6] analyzed the shortcomings of the existing similarity measures in Collaborative Filtering and proposed a new similarity model to overcome the drawbacks of existing algorithms. A comparison of many similarity measures on real data sets showed that a Collaborative Filtering recommendation system can reach different performance on the same dataset depending on the similarity measure used.

III. ELABORATION
The only way to accurately assess recommendation efficiency is to do an online evaluation – to test a system with real users. Therefore, the majority of recommendation systems nowadays are incorporated in Web applications [4]. In this paper we propose an architecture that allows the user to implement the algorithm for calculating user similarity. For performance reasons, this calculation is safe to execute on a distributed service.

A. System Architecture and Design
We present a recommendation system, implemented as a Web application using the Model-View-Controller (MVC) design pattern [7]. The system's modular design contains four modules, shown in Figure 1.

Figure 1. The system modular design

The Data acquisition and analysis module is based on Linked Data [8] and NoSQL databases [9]. This module contains data about user ratings from the data set, and data about target items used for recommendations. Recommendation systems must handle accurate data in order to compute their recommendations and preferences, which is why it is important to collect reliable data and reduce the noise in the user preferences data set.
The Presentation module connects the system user with the Recommendation module. This module allows users to register themselves on the system and manage their distributed services. Besides this, it forwards the user's data and displays recommendations.
The Communication module communicates with distributed services. This module attempts to conform to the design principles of Representational State Transfer (REST) [10].
The Recommendation module communicates with the other modules and performs the recommendation. This module is the basis of our architecture. System users can choose between already implemented user similarity algorithms and their own algorithms implemented on distributed Web services.

B. Implementation
The Data acquisition and analysis module stores data about users and target recommendation items in two separate collections in a MongoDB [11] database. For every user in the users collection, data about the user's ratings of items are stored in JavaScript Object Notation – Linked Data (JSON-LD) [12] format. This format enables easier usage of the data later in recommendation and eventual transfer to distributed user similarity calculation services via the Communication module of the system. Target recommendation items data is stored in a separate collection, using JavaScript Object Notation (JSON) format. Depending on the information domain, there are libraries in this module for easier collection of the data. For movies as the information domain, the omdb [13] Python library was used to communicate with the OMDb API Web service (http://www.omdbapi.com/) to obtain movie data from the Internet Movie Database (IMDb, http://www.imdb.com/).
The Presentation module is realized as a single-page application using the AngularJS framework [14]. Communication with the Recommendation module relies on Representational State Transfer (REST). This module interacts with the user, collects their preferences and provides recommendations. Therefore, our implementation of this module provides a solution respecting good practices of user interface and interaction design.
The Communication module performs user similarity calculation on distributed services. Using the HTTP protocol, this module sends data about two users, received from the Recommendation module, to distributed services in JSON format. After the user similarity calculation on the distributed service, the results are delivered to the Communication module, which forwards them back to the Recommendation module. Although the implementation of the Communication module was done using the Django REST framework [15] and the Python programming language, the usage of the HTTP protocol allows the
user to implement a distributed calculation service using any programming language and framework.
The Recommendation module calculates recommendations using Collaborative Filtering, based on input data from the Presentation and Data acquisition and analysis modules. According to the user's choice, this module calculates user similarity using already implemented algorithms (in this solution: Euclidean Distance Score and Pearson Correlation Score [4]) or using an external service through the Communication module. To complete the active user's rating prediction, we calculate the normalized and weighted average of the ratings of the most similar users according to the calculated user similarity. The calculated recommendations are then sent to the Presentation module in order to be displayed to the system user. The implementation of the Recommendation module was done using the Python programming language and the Django REST Framework. The active user's data, including account credentials and distributed service locations, are stored in an SQLite relational database [16].

IV. EVALUATION
In this section, we describe the data set and the experimental results of the online evaluation.

A. Data set
The information domain for the system evaluation consists of a data set collected from the Turi Machine Learning platform (https://turi.com/) with 82000 user preferences for movies represented as (User, Movie, Rating) triples. Within the Data acquisition and analysis module, the collected data set was analyzed and filtered in order to provide accurate data for the Recommendation module. Triples with missing and duplicated values were removed from the data set, and rating values were scaled to fit the [0, 10] interval. Triples were grouped by user in order to create user profiles, and movie names were used to collect movie data from the Internet Movie Database. After this processing, the users collection in the MongoDB database was populated with 334 user profiles, and the movies collection was populated with data on 5917 movies collected using the omdb Python library.

B. Experimental results
Recommendation algorithms with similar numeric performance have been known to return observably different results, and a decrease in error may or may not make the system better at meeting the user's needs. The only way to accurately assess system accuracy is to do an online evaluation [4].
To evaluate our system, we decided to perform a comparison of Collaborative Filtering recommendations using different user similarity calculation algorithms. Therefore, we implemented the Jaccard Similarity Coefficient algorithm [4] on a distributed service using the Flask framework [17] and registered it in our application. We then performed the recommendation calculation for the same input data. We rated a couple of movies and got Collaborative Filtering recommendations with three different similarity calculation algorithms:
(1) Euclidean Distance Score,
(2) Pearson Correlation Score,
(3) Jaccard Similarity Score (on a distributed service).

Input: Forrest Gump (1994), 10; A Beautiful Mind (2001), 10; Sin City (2005), 10
Output (Pearson):   Rory O'Shea Was Here (2004); The Manchurian Candidate (2004); The Terminal (2004)
Output (Euclidean): Seal Team Six: The Raid on Osama Bin Laden (2012); Crash (2004); Secret Window (2004)
Output (Jaccard):   The Imitation Game (2014); Mission: Impossible – Ghost Protocol (2011); Tower Heist (2011)
Table 2. Online evaluation results

The results in Table 2 show that for the same input data, the user gets different recommendations, confirming that the architecture provided in this paper gives the user the ability to personalize the recommendation, by implementing distributed services using any programming language and framework, for more efficient results.

V. CONCLUSION
In this paper, we propose an architecture which allows users to additionally personalize the Collaborative Filtering recommendation by letting them choose an already implemented algorithm for calculating user similarity, or to implement their own algorithm for the same task, with the efficiency of the recommendation preserved by enabling distributed execution of the newly implemented algorithm. The proposed architecture allows users to implement a distributed service using any programming language and any framework which conforms to the design principles of Representational State Transfer (REST). An online evaluation for movies as the information domain confirms that the provided architecture gives users the ability to additionally personalize Collaborative Filtering recommendation results by selecting the appropriate algorithm for the given information domain and data set. Also, distributed execution of that algorithm preserves system efficiency.
Our future work includes investigation of a data set extension by using Linked Data benefits to include data from social media. We would also like to investigate the possibilities of collecting and analyzing the feedback of system users to join Collaborative Filtering and Content-Based recommendation into a hybrid recommender system.

REFERENCES
[1] J. Herlocker, J. A. Konstan and J. Riedl, "Explaining Collaborative Filtering Recommendations", in Proceedings of the ACM Conference on Computer Supported Cooperative Work, 2000.
[2] S. Berkovsky, T. Kuflik and F. Ricci, "Distributed collaborative filtering with domain specialization", in Proceedings of the 2007 ACM Conference on Recommender Systems, pp 33-40, 2007.
[3] F. Ricci, L. Rokach, B. Shapira and P. B. Kantor, "Recommender Systems Handbook", Springer, 2010.
[4] M. Ekstrand, J. Riedl and J. Konstan, "Collaborative Filtering Recommender Systems", in Foundations and Trends in Human-Computer Interaction, vol. 4, pp 81-173, 2011.
[5] P. Han, B. Xie, F. Yang and R. Shen, "A scalable P2P recommender system based on distributed collaborative filtering", in Expert Systems with Applications, vol. 27, pp 203-210, 2004.
[6] H. Liu, Z. Hu, A. Mian, H. Tian and X. Zhu, "A new user similarity model to improve the accuracy of collaborative filtering", in Knowledge-Based Systems, vol. 56, pp 156-166, 2014.
[7] R. Grove and E. Ozkan, "The MVC-Web Design Pattern", in Proceedings of the 7th International Conference on Web Information Systems and Technologies, 2011.
[8] C. Bizer, T. Heath and T. Berners-Lee, "Linked Data – The Story So Far", in International Journal on Semantic Web and Information Systems, vol. 5, pp 1-22, 2009.
[9] C. Strauch, "NoSQL Databases", Hochschule der Medien, Stuttgart, 2011.
[10] R. T. Fielding, "Architectural Styles and the Design of Network-based Software Architectures", University of California, Irvine, 2000.
[11] "MongoDB Documentation", 2017, [Online]. Available: https://docs.mongodb.com/, [Accessed: 13-Apr-2017].
[12] M. Lanthaler and C. Gutl, "On using JSON-LD to create evolvable RESTful services", in Proceedings of the Third International Workshop on RESTful Design, pp 25-32, 2012.
[13] "omdb 0.2.0", 2017, [Online]. Available: https://pypi.python.org/pypi/omdb/0.2.0, [Accessed: 13-Apr-2017].
[14] "Angular Docs", 2017, [Online]. Available: https://angular.io/docs/ts/latest/, [Accessed: 13-Apr-2017].
[15] "Django REST framework", 2017, [Online]. Available: http://www.django-rest-framework.org/, [Accessed: 13-Apr-2017].
[16] "SQLite Documentation", 2017, [Online]. Available: https://www.sqlite.org/docs.html, [Accessed: 13-Apr-2017].
[17] "Flask (A Python Microframework)", 2017, [Online]. Available: http://flask.pocoo.org/, [Accessed: 13-Apr-2017].
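For reference, the three generic Collaborative Filtering stages from the Introduction can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation: the ratings dictionary, user names and the helper functions `pearson` and `predict` are assumptions made for the example, with the Pearson Correlation Score standing in for one of the similarity choices the paper mentions.

```python
import math

# Illustrative sketch (not the authors' code) of the three generic CF stages:
# (1) similarity of every user to the active user, (2) selection of the k
# most similar users, (3) similarity-weighted average as the prediction.

def pearson(a, b):
    """Pearson Correlation Score over the items both users have rated."""
    common = [i for i in a if i in b]
    if len(common) < 2:
        return 0.0
    xs = [a[i] for i in common]
    ys = [b[i] for i in common]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0

def predict(ratings, active, item, k=2):
    # (1) similarity to every other user who has rated the target item
    sims = [(pearson(ratings[active], r), r[item])
            for u, r in ratings.items() if u != active and item in r]
    # (2) keep at most k most similar users with positive similarity
    top = sorted((s for s in sims if s[0] > 0), reverse=True)[:k]
    if not top:
        return None
    # (3) similarity-weighted average of their ratings for the item
    return sum(sim * r for sim, r in top) / sum(sim for sim, _ in top)

ratings = {  # hypothetical users and 0-10 movie ratings
    "ana":   {"Forrest Gump": 10, "Sin City": 9, "Crash": 7},
    "boris": {"Forrest Gump": 9, "Sin City": 8, "Crash": 6, "The Terminal": 8},
    "vera":  {"Forrest Gump": 2, "Sin City": 3, "The Terminal": 4},
}
print(round(predict(ratings, "ana", "The Terminal"), 6))  # → 8.0
```

A distributed variant would expose the similarity function behind an HTTP endpoint, much as the paper's Flask service does for the Jaccard score.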
Abstract—The deregulation of the electricity market is a process which is currently a very hot topic in the Southeast European region. Namely, a few of the countries have had their own power exchanges for a long time, a few of them have just formed ones, and the remaining countries need to decide in the near future which direction they are going to follow towards resolving this problem. With the introduction of the power exchanges and the fact that half of the power markets in the Southeast European region have existed for less than one year, forecasting the electricity price on those markets is becoming a very attractive research area and is of great importance. In this paper, 24-hour-ahead forecasting of the electricity price on these newly formed power exchanges is performed. To this end, artificial intelligence models, specifically neural networks, are used, which take as inputs all information that is relevant for the corresponding power exchange price forecasting. The results show that among the newly formed power exchanges in the region, the price on the Bulgarian power exchange is the most unpredictable one, while, on the other hand, the price on the Serbian power exchange is the most predictable one. Additionally, the results show in which hours of the day and in which days of the week the prices have the highest variations.

I. INTRODUCTION
The electricity market has been in a process of deregulation that started in countries around the world in the 90s. In this process, the traditional monopolistic power companies have been replaced by competitive electricity markets.
In the region of Southeast Europe (SEE) this process is still ongoing. Namely, in some of the countries in the SEE region there already exists a day-ahead market, while the remaining countries are in the process of deciding whether they should create their own markets or couple with an existing power exchange. The oldest power exchange in the region is the market in Romania, which was launched in 2000 and is administrated by OPCOM. Following are the power exchange in Greece, which was in the process of establishment in the period from 2005 to 2010, and the power market in Hungary (HUPX), which has existed since 2010. Relatively new are the power exchanges in Bulgaria (IBEX), Serbia (SEEPEX) and Croatia (CROPEX), which were created in the first quarter of 2016.
With the introduction of the power exchanges and the fact that half of the power markets in the Southeast European region have existed for less than one year, forecasting the price on those markets is becoming a very attractive topic and is of great importance. Furthermore, with proper accuracy of the predictions, a power company can optimize its strategy on the market as well as its production or consumption in order to minimize the financial risk and maximize the profit [1]. Compared with the forecasting of electricity consumption, the price has greater variability and significant spikes.
In this paper, 24-hour-ahead forecasting of the electricity price on the newly formed power exchanges in the Southeast region is done. For this purpose, an artificial intelligence model, specifically a neural network, is implemented, which takes as inputs all information that is relevant for the corresponding power exchange price forecasting.
Electricity price forecasting is a problem that has attracted much attention in the research community. Primarily, the studies differ in the model used to perform the forecasting and in the area in which the corresponding power exchange is located. According to these two conditions, the input characteristics upon which the electricity price forecasting depends are determined. The models used in this research area may generally be divided into five groups: multi-agent, fundamental, reduced-form, statistical and computational intelligence based models [1, 2]. The most widely used models, with which many authors report excellent results in the literature, are the neural networks, which belong to the field of computational intelligence [1]. Basically, neural networks are very flexible and can cope with complex and non-linear problems, which makes them suitable for short-term predictions. For these reasons, neural networks were also used in this study for day-ahead electricity price forecasting. When it comes to the areas in which the analyzed power exchanges are located, in the literature electricity forecast analyses have been made for the power exchanges in various countries such as Austria [3], India [4], Ontario [5], the Scandinavian countries [6], Greece [7] etc. In this paper, the main emphasis is given to the new day-ahead markets in the Southeast region, which, to the best of our knowledge, have not been analyzed in the literature yet.
The paper is structured as follows. First, the methodology used, i.e., the neural network model, is described. Afterwards, a detailed analysis of the input data is made in the following section, from which the selection of the input variables of the neural network is made. The results and a corresponding discussion are presented in Section IV, and finally the next section concludes the paper, where directions for further research are also presented.
II. NEURAL NETWORKS
The neural networks that are used in this paper are represented as a directed acyclic graph with several layers of neurons. The network parameters are obtained by minimizing the sum-of-squares error over the N training examples, where $y_i$ is the network output and $t_i$ is the target price:

$E_D = \frac{1}{2}\sum_{i=1}^{N}(y_i - t_i)^2 = \frac{1}{2}\sum_{i=1}^{N} e_i^2$   (2)

Figure 1. Electricity prices on IBEX (2016)

The electricity price on the Croatian power exchange has a more constant increasing trend during the analyzed period, with some peaks of up to about 95 EUR/MWh (Figure 2).

Figure 4. Correlation between the electricity prices on IBEX (BG), CROPEX (CR) and SEEPEX (SR)

There is an obvious daily trend of the electricity prices at the hourly level on the three power exchanges (Figure 5). For each of the power exchanges there is an evident peak of the price in the morning period, and there is one more peak in the afternoon period for the Croatian and Serbian power exchanges.
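As an illustration of training against the sum-of-squares error $E_D$ in Eq. (2), the following minimal NumPy sketch trains a one-hidden-layer feed-forward network by full-batch gradient descent. This is not the authors' code: the synthetic data, layer size and learning rate are assumptions made for the example; in the paper the inputs are the calendar and lagged-price variables selected later.

```python
import numpy as np

# Illustrative sketch only: a one-hidden-layer feed-forward network trained
# by gradient descent on E_D = 1/2 * sum_i (y_i - t_i)^2 from Eq. (2).

rng = np.random.default_rng(0)

def train_mlp(X, t, hidden=6, lr=0.05, epochs=2000):
    """Minimize E_D = 0.5 * sum((y - t)**2) by full-batch gradient descent."""
    n, d = X.shape
    W1 = rng.normal(0.0, 0.5, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1)); b2 = np.zeros(1)
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)        # hidden-layer activations
        y = h @ W2 + b2                 # network outputs y_i
        e = y - t                       # errors e_i = y_i - t_i
        # Backpropagated gradients of E_D with respect to each parameter
        gW2 = h.T @ e
        gb2 = e.sum(axis=0)
        dh = (e @ W2.T) * (1.0 - h ** 2)
        gW1 = X.T @ dh
        gb1 = dh.sum(axis=0)
        for p, g in ((W1, gW1), (b1, gb1), (W2, gW2), (b2, gb2)):
            p -= lr * g / n             # averaged gradient step
    return W1, b1, W2, b2

def predict(params, X):
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2

# Synthetic stand-in for normalized input vectors and target prices
X = rng.uniform(-1.0, 1.0, (200, 3))
t = np.sin(X.sum(axis=1, keepdims=True))     # smooth target profile
params = train_mlp(X, t)
sse = 0.5 * np.sum((predict(params, X) - t) ** 2)
```

Varying `hidden` and adding a second hidden layer reproduces the kind of structure search over layers and neurons reported for Figure 7.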
Figure 5. Daily trend of the average hourly electricity prices

The weekly trend of the average daily electricity price is presented in Figure 6. The lowest price on the Serbian power exchange is on Sunday, on the Croatian power exchange on Monday and Sunday, and the lowest price on the Bulgarian power exchange is on Saturday. The highest prices on each of the three power exchanges occur mainly during the working days of the week.

Figure 6. Weekly trend of the average daily electricity prices

A. Selection of input variables
Based on this analysis of the data, a selection of the input variables upon which the forecasting of the electricity price depends is made. In particular, the following inputs for the neural network were selected:
Hour of the day
Day of week
Holiday flag
Average price of the previous day
Price for the same hour of the previous day
Price for the same hour-day combination of the previous week

… exchange were used for testing, and the rest of the data were used for training.
At the beginning, a comparison is made between the MAPE obtained when using one and two hidden layers and different numbers of neurons in the hidden layers for each of the three analyzed power exchanges (Figure 7). As shown, the most suitable network structure for SEEPEX is a network with one hidden layer and six neurons in the hidden layer, by which a MAPE of 9.28% is achieved.
The price on the Croatian power exchange (CROPEX) is more unpredictable, presenting a minimal MAPE of 16.9%. This value is obtained by using one hidden layer with eight neurons. The results for the Bulgarian power exchange (IBEX) show that the price on this market is much more unpredictable compared to SEEPEX and CROPEX, and the minimal value is obtained with a completely different structure of the neural network; the minimal MAPE is 21.5%.
A comparison between the actual data and the forecasted data for the electricity prices for SEEPEX is presented in Figure 8. It can be noted that the two curves match quite well. As presented in Figure 9, for CROPEX, there are more peaks in the actual data which are not predicted by the neural network. The results for IBEX (Figure 10) show that there are a lot of peaks of the price in the testing period that are not predicted, mainly because these peaks were not present in the data for the training period.

Figure 7. MAPE of the electricity price forecasting with respect to the number of neurons and number of layers for SEEPEX, CROPEX and IBEX
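The MAPE values quoted above can be reproduced with a few lines. For reference, a minimal implementation of the metric follows; the price values used here are illustrative numbers, not data from the paper:

```python
import numpy as np

# Reference implementation of MAPE (mean absolute percent error).
def mape(actual, forecast):
    """mean(|actual - forecast| / |actual|) * 100, in percent."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(actual - forecast) / np.abs(actual)) * 100.0)

# Hypothetical hourly prices in EUR/MWh
actual = [40.0, 42.0, 38.0, 50.0]
forecast = [38.0, 43.0, 40.0, 45.0]
print(round(mape(actual, forecast), 2))  # → 5.66
```

Grouping the hourly errors by hour of the day or by day of the week before averaging gives the kind of error breakdowns analyzed in the figures that follow.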
Figure 8. Actual and forecasted electricity prices for SEEPEX
Figure 9. Actual and forecasted electricity prices for CROPEX
Figure 10. Actual and forecasted electricity prices for IBEX

The forecasting accuracy for the three power exchanges is measured using the MAPE (mean absolute percent error). This metric shows that the most unpredictable is the Bulgarian power exchange, with a MAPE of about 21.5%.

Figure 11. Comparison of the MAPE for IBEX, CROPEX and SEEPEX

Additionally, analyses were made of the average absolute percent error by hour of the day (Figure 12) and by day of the week (Figure 13). The highest errors in the predictions are during the afternoon peak of the price, the morning peak and during the night hours (when the price is lowest). Concerning the forecasting error by day of the week, the highest error is mainly on Sundays, which means that there are high variations of the price on this day.

V. CONCLUSION AND FUTURE WORK
Three models based on neural networks were set up in this paper for forecasting the electricity prices for the Serbian
(SEEPEX), Croatian (CROPEX) and Bulgarian (IBEX) power exchanges.
From the obtained results it can be concluded that:
- there is a strong correlation between the new power exchanges in the region,
- the prices on the Bulgarian power exchange are the most unpredictable (MAPE 21%),
- the prices on the Serbian power exchange are the most predictable (MAPE 9%).
Additionally, the critical hours of the day and days of the week were presented, where the forecasting error is the highest.
Furthermore, this work may be extended in the future by comparing the forecasts for these newly created power exchanges with those for the older power exchanges in the region.

ACKNOWLEDGMENT
This work was partially financed by the Faculty of Computer Science and Engineering at the "Ss. Cyril and Methodius" University.

REFERENCES
[1] R. Weron, "Electricity price forecasting: A review of the state-of-the-art with a look into the future," International Journal of Forecasting, vol. 30-4, pp. 1030-1081, 2014.
[2] I. P. Panapakidis, A. S. Dagoumas, "Day-ahead electricity price forecasting via the application of artificial neural network based models," Applied Energy, vol. 172, pp. 132-151, 2016.
[3] F. Ziel, R. Steinert, S. Husmann, "Forecasting day ahead electricity spot prices: The impact of the EXAA to other European electricity markets," Energy Economics, vol. 51, pp. 430-444, 2015.
[4] G. P. Girish, "Spot electricity price forecasting in Indian electricity market using autoregressive-GARCH models," Energy Strategy Reviews, vol. 11-12, pp. 52-57, 2016.
[5] H. S. Sandhu, L. Fang, L. Guan, "Forecasting day-ahead price spikes for the Ontario electricity market," Electric Power Systems Research, vol. 141, pp. 450-459, 2016.
[6] T. Kristiansen, "A time series spot price forecast model for the Nord Pool market," Electrical Power and Energy Systems, vol. 61, pp. 20-26, 2014.
[7] P. Andrianesis, P. Biskas, G. Liberopoulos, "An overview of Greece's wholesale electricity market with emphasis on ancillary services," Electric Power Systems Research, vol. 81, pp. 1631-1642, 2011.
[8] Independent Bulgarian Energy Exchange (IBEX), http://www.ibex.bg/en/market-data/dam/prices-and-volumes/ (accessed April 27, 2017)
[9] Croatian power exchange (CROPEX), https://www.cropex.hr/en/trading/day-ahead-market.html (accessed April 27, 2017)
[10] Serbian Power Exchange (SEEPEX), http://seepex-spot.rs/en/market-data/day-ahead-auction (accessed April 27, 2017)
Although the beginning of beekeeping is lost in the past, it is evident that the first contact between man and bee was both sweet and painful. Ever since, man has tried to balance these two conflicting objectives, more sweetness and less pain, by employing his knowledge of bee control. Man built simple beehives to host bees in a known place, where he could access bee products by destroying the bee colony according to his needs. Such an approach is referred to as traditional beekeeping, Fig. 3. As knowledge about bees grew, man set himself new objectives, such as pollination-related objectives and the production of bee products other than honey. About 200 years ago some beekeepers recognized two fundamental properties of bee colonies: the queen is significantly larger than the worker bees, and the worker bees keep building the nest until the pasture is exhausted. Modern beehives, the beehives commonly used worldwide today, are built on these two properties; a general description of the hive is given in Fig. 1, [10]. The brood and honey chambers are separated by a queen excluder, which keeps the queen in the brood chamber while allowing the worker bees to store surplus honey in the honey chamber.

Current practice can be described as well-established rational beekeeping, Fig. 4, [3], where the beekeeper can take bee products from the honey chamber of a beehive without disturbing the bees enough to trigger their defence mechanism.

With the development of modern information and communications technologies (ICT), such as wireless sensor networks (WSN), the idea of precision beekeeping (PB) arises. Rational beehives are equipped with a number of wirelessly connected sensors, Fig. 5, which can report measurements to an Internet-connected beekeeper located anywhere. PB is expected to provide more precise, non-invasive measurement of a number of parameters related to the inside and outside of beehives and apiaries [1, 4-6, 14]. However, PB lacks a systematic approach and a definition of the qualifier "precision".

Since monitoring is the Observe phase of the Observe, Orient, Decide, and Act (OODA) control cycle, and taking beekeeping as a process, in this paper we try to characterize beekeeping from the perspective of dynamic multi-objective control in all OODA phases and to set the basis for Cyber-Physical Beekeeping (CPB): controlled beekeeping that will enable sustainable beekeeping [7]. "Precision" in CPB relates to improving the resolution of the variables that describe the spatial and temporal aspects of beekeeping, as well as to improving the level of detail in the characterization of the beekeeping process. Naturally, we build the CPB concept on the achievements of previous beekeeping approaches, Fig. 2.
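The OODA view of beekeeping control described above can be sketched as a simple closed loop. The phase names come from the paper; the data structure, threshold values, and state/action names below are purely illustrative assumptions, not measurements from the paper.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """Raw measurements a precision beehive could report (illustrative)."""
    hive_temp_c: float
    hive_humidity_pct: float
    daily_weight_gain_kg: float

def observe(sensor_readings: dict) -> Observation:
    # Observe: collect raw in-hive measurements.
    return Observation(**sensor_readings)

def orient(obs: Observation) -> str:
    # Orient: interpret measurements against assumed nominal ranges.
    if obs.hive_temp_c < 32.0:
        return "brood-chamber-too-cold"
    if obs.daily_weight_gain_kg < 0.0:
        return "pasture-exhausted"
    return "nominal"

def decide(state: str) -> str:
    # Decide: map the diagnosed state to an action.
    return {
        "brood-chamber-too-cold": "activate-indoor-climate-control",
        "pasture-exhausted": "plan-migration-to-new-pasture",
        "nominal": "no-action",
    }[state]

def act(action: str) -> str:
    # Act: here only reported; in CPB this would drive an actuator
    # or notify the beekeeper.
    return action

def ooda_cycle(sensor_readings: dict) -> str:
    return act(decide(orient(observe(sensor_readings))))

print(ooda_cycle({"hive_temp_c": 30.5,
                  "hive_humidity_pct": 60.0,
                  "daily_weight_gain_kg": 0.2}))
# -> activate-indoor-climate-control
```

In a full CPB setting, each phase would span the whole apiary rather than a single hive, and the Decide phase would trade off the multiple objectives discussed below.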
prepared to support decision making. Currently, ICT application in this phase is focused on processing the raw data collected from beehives and on opening data belonging to public organizations; the processed data should support the decision-making phase. ICT solutions for decision making in beekeeping are currently rare and of low TRL (technology readiness level). For example, the BeeWeb platform (http://www.beeweb.co/), at TRL 4, supports beekeepers, growers and transporters in making decisions on the optimal distribution of beehives over pastures while minimizing the use of transportation means. After a decision is made, appropriate actions should be carried out. Currently, the main actuator in beekeeping is the beekeeper himself, and the actuation phase of beekeeping control is manual or semi-automatic. Some low-TRL solutions have been reported, such as devices for indoor climate control [11-13].

Figure 6. OODA beekeeping cycle

Generally, the desirable objectives in beekeeping that should be achieved through control are, qualitatively:
i. Maximization of production efficiency for all bee products
ii. Maximization of product quality
iii. Maximization of the total amount of pollinated plants
iv. Minimization of the average transhumance cost per beehive
v. Improvement of knowledge about the algorithms according to which bees behave [2]
vi. Improvement of apitherapy methods
Once achieved, and maintained thereafter, these objectives will bring beekeeping to sustainability.

As the first step towards CPB, we try to characterize the subject system. Since the system is complex, we present a basic qualitative characterization at layers II and III, which are the main area of beekeepers' control.

The process of beekeeping consists of the following subprocesses: preparation for wintering, spring build-up, swarming, selection and queen breeding, selection of the target pasture and migration to it, extraction of bee products from the hive, storing of bee products, healthcare of bees and protection from bee enemies.

Bee products are: honey, beebread, pollen, royal jelly, propolis, beeswax, and bee venom.

The means currently used in beekeeping control are numerous and can be divided into several categories:
i. Beehives – characterized by capacity, the number and size of frames, the number of boxes, etc.
ii. Tools – bee smoker, bee brush, beekeeper's knife, etc.
iii. Accessories – veils, jackets, suits, gloves, etc.
iv. Equipment – equipment for queen production, for centrifugation of honey, for fixing the foundations, beekeeping supplies for melting wax, etc.
v. Means of transportation (MoT):
   a. inside/outside MoT
   b. standard/adaptive/modified/specialized
vi. Buildings and infrastructure

The constants that determine the beekeeping process and system can be divided as follows:
i. Time – annual cycle, day/night cycle, queen/drone/worker bee life span, time schedule and duration of particular pastures, queen egg-laying rate, etc.
ii. Space – maximal range of worker flight, pasture size, etc.
iii. The amount of nectar in the target plant

The parameters that beekeeping should adapt to are:
i. Weather conditions – temperature, air pressure, humidity, illumination
ii. Pollution by inorganic gases such as CO2, SO2 and NOx, particulate matter (PM), pesticides, etc.
iii. Human factors affecting beekeeping, such as:
   a. the amount of cultivated honey plants in agriculture,
   b. the market state of bee products,
   c. etc.

A. Challenges
1. Although a number of particular aspects of beekeeping can be improved by ICT separately, an integral framework for ICT application in beekeeping is still missing.
2. Even where ICT solutions for particular beekeeping aspects exist, their cost must be acceptable for use in beekeeping, and a cost vs. functionality tradeoff is inevitable.
3. The gap between the ICT and beekeeping communities should be addressed by a multi/interdisciplinary approach to the problem.
4. The legal aspects of migrating decision making from human to machine should be arranged. So far, ICT can improve beekeeping up to a semi-automatic level.

B. Opportunities
1. To provide traceability of the means used in beekeeping, RFID and QR codes can be applied to every significant single piece of means, e.g. beehives and even beehive frames.
2. Databases belonging to layers IV-VII can be opened.
3. The state of the art already provides solutions, which may eventually have to be customized, for the automation of simple actuation tasks at layer III, e.g. robotized lawn mowers, or robotized forklifts to manipulate standalone beehives.
4. The second challenge can be addressed by raising the existing expensive solutions to the upper layers; e.g., imaging by an IR camera can be provided at layer IV by a local beekeepers' organisation instead of by a beekeeper itself at apiary layer III.
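The qualitative characterization above (means, constants, and adaptation parameters) can be captured in a small data model. The category names follow the lists in the text; the class layout and the example values are illustrative assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum

class MeansCategory(Enum):
    # Categories i-vi of the means used in beekeeping control.
    BEEHIVE = "beehive"
    TOOL = "tool"
    ACCESSORY = "accessory"
    EQUIPMENT = "equipment"
    TRANSPORT = "means of transportation"
    INFRASTRUCTURE = "buildings and infrastructure"

@dataclass
class BeekeepingCharacterization:
    """Qualitative characterization of the beekeeping system at layers II-III."""
    means: dict = field(default_factory=dict)      # MeansCategory -> list of items
    constants: dict = field(default_factory=dict)  # time/space constants of the process
    parameters: dict = field(default_factory=dict) # externally driven, to adapt to

apiary = BeekeepingCharacterization()
apiary.means[MeansCategory.TOOL] = ["bee smoker", "bee brush", "beekeeper's knife"]
apiary.constants["worker_flight_range_km"] = 3.0   # assumed illustrative value
apiary.parameters["weather"] = {"temperature": None, "humidity": None}

print(len(apiary.means[MeansCategory.TOOL]))  # -> 3
```

Such a model would let the layers above the apiary (e.g. a beekeepers' organisation) exchange characterizations in a uniform, machine-readable form.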
Table: monitored parameters and their measurement methods, e.g. PM 10 measured by in-hive/out-hive sensors. Blue-shaded parameters and methods are those reported in [8].
instruction is preventive grass cutting around the hive, a robotized lawn mower should be activated. Moreover, besides the notifications sent to a beekeeper, the diagnosis service can issue notifications for the upper beekeeping layers, store the results of processed data for tracking purposes, etc.
Table: bee diseases with their symptoms, appropriate treatments, appropriate sensors and appropriate actuators. Recoverable rows include: acute bee paralysis virus (viral; black, shiny bees without bristles; 24-hour starvation and controlled antimycotic feeding; visual camera with image processing and artificial nose; controlled feeder); a brood disease with patchy brood pattern and ammonia-like smell (incineration of infected frames and controlled antimycotic feeding; visual camera with image processing and artificial nose; controlled feeder); chalkbrood (fungal; presence of hard, grey mummies in the brood and in/around the entrance; brood reduction and feeding with vitamin C and nystatin; camera with image processing; controlled feeder and robot manipulator); a condition with decreased over-wintering capability (chemotherapeutic measures; colony state signals; controlled atomizer); chilled brood (bloated abdomen; camera with image processing; water feeder); and chemical poisoning (exposure inside/outside the beehive, on a pasture or even on flowers; feeding with sugar water and controlled pesticide application during the blooming period; image processing combined with colony state signals; controller, controlled feeder and robot manipulator).
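The disease table maps sensor-detectable symptoms to treatments and actuators; a diagnosis service built on it could be a simple rule table. The symptom/treatment/actuator triples below follow the table; the key names and the matching function are illustrative assumptions.

```python
# Rule table: sensor-detectable symptom -> (disease, treatment, actuator).
# Triples follow the disease table; encoding and names are illustrative.
DIAGNOSIS_RULES = {
    "black_shiny_bees_without_bristles": (
        "acute bee paralysis virus",
        "24-hour starvation and controlled antimycotic feeding",
        "controlled feeder",
    ),
    "grey_mummies_in_brood_or_at_entrance": (
        "chalkbrood",
        "brood reduction and feeding with vitamin C and nystatin",
        "controlled feeder and robot manipulator",
    ),
    "pesticide_exposure_detected": (
        "chemical poisoning",
        "feeding with sugar water",
        "controlled feeder",
    ),
}

def diagnose(observed_symptoms: set) -> list:
    """Return (disease, treatment, actuator) findings for the observed symptoms.
    A real diagnosis service would also notify the beekeeper and upper layers."""
    return [DIAGNOSIS_RULES[s] for s in sorted(observed_symptoms)
            if s in DIAGNOSIS_RULES]

print(diagnose({"black_shiny_bees_without_bristles"})[0][0])
# -> acute bee paralysis virus
```

Each rule's left-hand side would in practice be produced by the listed sensors (image processing, artificial nose, colony state signals) rather than entered manually.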
Abstract— A rapidly increasing and vast number of "things" will be connected to the Internet, in all sectors of society (people, places, sensors, homes, industry, government services, etc.). The urgency of finding sustainable solutions requires the "things" and services of the overall system to display autonomic, intelligent behavior. The ability of cloud infrastructure to orchestrate fine-grained and agile control of "things" is limited. This mandates an alternative approach that intelligently moves control to the "things" themselves. Thereby we minimize the reliance on cloud infrastructure and are able to build more agile, intelligent and effective solutions in various application areas, such as Health and Transport, through Automation. We provide examples of such novel solutions tested in the "Urban ICT Arena", a large-scale smart city testbed in Stockholm.

Keywords: Decentralized IoT, 5G, Smart Cities

I. INTRODUCTION

We seek to build agile, intelligent and effective solutions in various application areas in order to meet the challenges that face our societies. Current solutions in a wide range of areas, such as Health, Transport, and Industry, are not sustainable. Increasing the level of Automation by some orders of magnitude has been identified as a key factor for achieving sustainable solutions. We seek entirely new levels of Automation by leveraging a rapidly increasing and vast number of "things" connected to the Internet, in all sectors of society (people, places, sensors, homes, industry, government services, etc.).

The Internet-of-Things (IoT) is thus transforming every business sector. Currently, much effort is being spent on Big Data and Analytics, since the perception is that the ability to store sensor information and extract patterns from it is essential to building successful solutions that display intelligent responses or orchestrate components in large systems to attain high levels of automation.

A. Centralized IoT

The common assumption underlying these efforts is to use Cloud Services and Cloud Infrastructure, since these have become cost-effective with respect to storage and availability. This assumption has a price. From an organizational point of view it is sub-optimal to rely on one entity (a Cloud service) for coordination. It also constitutes a single point of failure, as demonstrated by security incidents such as Cloud Hopper [1]. Maintaining intelligence and coordination in the cloud, or at most delegating management to edge gateways, prevents sensors and end devices from acting autonomously on local and global sensor information.

B. Decentralized IoT

The underlying idea of the Internet-of-Things is that everything in the world can be online as the result of advances in miniaturization and communication. Electronics can connect anything with anyone and any place in order for them to collaborate and achieve a common goal. More importantly, this reverses the perspective of a cloud coordinating all things: human and non-human protagonists need to collaborate in order to achieve a common goal, immersed in a universe of sensing [2].

This idea is illustrated in Figure 1, showing how things go from being "puppets" connected to and orchestrated by one entity (a centralized cloud service) to autonomous actors with relations between them. The Distributed Context exChange Protocol (DCXP) [3], which is further discussed below, allows objects to take action based on shared context knowledge which, unlike in BlockChain [4], is not required to be present in a node; only the capability to reach it is required, which makes DCXP resource efficient and suitable for small devices.

Figure 1. Decentralizing "Things"

The remainder of this paper is structured as follows. Section II discusses motivations and requirements for Immersive IoT in comparison to related work. Section II.C examines the properties and role of DCXP as Immersive IoT. Section IV presents a novel method for programmable map-reduce interworking with Clouds based on local and global sensor information. Section V presents case studies with prototypes built with the open-source platform based on our architecture, and finally Section VI summarizes our conclusions and discusses directions for future work.

II. TOWARDS IMMERSIVE IOT

In this section we briefly discuss the motivations and requirements behind Immersive IoT.

A. Requirements

Internet-of-Things platforms today are built to slow operations down by design, although management of communication links and other processing may be delegated to edge gateways (a.k.a. Edge and Fog Computing). Such solutions maximize control by maintaining it in Cloud infrastructure, sacrificing autonomy, flexibility and, equally importantly, speed in end devices. Centralized storage and management information
function and handed back to the cloud-based IoT platform. A sample scenario could be the division and distribution of tasks into smaller ones among a vast number of nodes in a large WSN, which amounts to substantial computational resources.

Moreover, as is clear from Figure 3, the rules applied to the information in nodes can trigger events cascading across relations. Thus, much more complex scenarios operating on combined scenarios are possible. Figure 4 further illustrates this, permitting different end-devices to make joint decisions based on the context information available in each node and to send the result back to the platform. The utility of this becomes evident in Subsection V.B, "Ambient Assisted Living in Smart Homes".

V. CASE STUDIES

The following case studies evaluate the validity of the requirements and approach. The architecture and open-source distribution of DCXP and associated support (mediasense.se) was made available to partners in different EU projects and, recently, to the Urban ICT Arena (urbanictarena.se), a smart city testbed for novel services based on fiber, 5G and IoT infrastructures.

A. Decentralized Intelligent Transport Services

The EU FP7 MOBIS project (2012-2015) [16] created Personalized Mobility Services for energy efficiency and security through advanced Artificial Intelligence techniques, employing the decentralized IoT platform (mediasense.se) for crowd-sensing purposes to federate novel artificial intelligence services and traditional information platform services coming from the following sources: existing private or public transport service providers, ambient data based on sensor infrastructures, and social networking data.

Figure 5 shows an overview of the system, where the longer-term AI and analysis processes are located in the central platform, while the objects on the left maintain direct relations, sharing context information via the Internet. Figure 5 further illustrates how transportation resources in a city-wide network collaborate in order to achieve an optimal travel experience whilst minimizing the overall energy footprint.

The advantages of delegating autonomy to the individual actors based on the platform (mediasense.se) were multiple. The actors could take immediate action based on observed local context changes whilst being aware of non-local events (such as the availability of other modes of transportation). Coordination by the cloud would not have been as responsive. Moreover, as none of the end-devices were

Equally importantly, there was a guarantee of continued operation even without access to cloud services, as mobile communication is inherently unreliable. Finally, based on relevance and relations between persons, transportation (connected bikes, vehicles, etc.) and homes, irrespective of which access network provider was involved, actors could make decisions relevant to the local context. For instance, a home could reduce its heating during the day but turn on the heat once its occupants were on their way home. Leveraging such capabilities, the direct collaboration of the sources and sinks minimized the overall energy footprint even further.
Figure 5. Optimized Personal Mobility with Minimized Overall Energy Footprint in Smart City
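The local decision pattern described above, an actor subscribing to shared context (such as the occupants' travel status) and acting on changes immediately, can be illustrated with a generic publish/subscribe sketch. This is not the DCXP API; the class, the UCI-like identifier string, and the 30-minute threshold are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Callable

class ContextRegistry:
    """Toy stand-in for a distributed context layer: nodes register context
    items under an identifier, and others react to changes by reference
    without keeping a local copy."""
    def __init__(self):
        self._values = {}
        self._subscribers = defaultdict(list)

    def publish(self, uci: str, value) -> None:
        # A node updates a context item; subscribers react immediately.
        self._values[uci] = value
        for callback in self._subscribers[uci]:
            callback(value)

    def subscribe(self, uci: str, callback: Callable) -> None:
        self._subscribers[uci].append(callback)

registry = ContextRegistry()
actions = []
# An autonomous "thing" (the home) reacts to context published by another
# thing (the occupants' transport), with no cloud in the decision path.
registry.subscribe("dcxp://home/occupants/eta_minutes",
                   lambda eta: actions.append("heating_on" if eta <= 30
                                              else "heating_off"))
registry.publish("dcxp://home/occupants/eta_minutes", 25)
print(actions)  # -> ['heating_on']
```

The point of the sketch is the inversion of control: the decision logic lives in the subscribing device, so it keeps working even when the central platform is unreachable.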
Figure 6. Shared Awareness of Well-Being for Ambient Assisted Living in Smart Homes
Extending the scenario for ITS and transport, in [9] we outlined support for Cooperative Autonomous Systems, such as autonomous vehicles, with resilient (co-)operation across unreliable infrastructure. Due to the interworking and delegation support presented in Section IV, this may be orchestrated on a larger scale from a Remote Operations Center in the cloud via Edge Gateways, which is highly relevant to the Industrial Internet in general.

B. Ambient Assisted Living in Smart Homes

The EU AAL SALIG++ project (2013-2016) [17] created a distributed solution for the shared awareness of well-being between the elderly, informal caregivers and professional care, as depicted in Figure 6.

A gateway for smart homes that hosts a DCXP peer makes sensors and appliances not only accessible from the Internet and to all participants within the elderly's AAL community but, more importantly, accessible as if they were local to them as well. The gateways are connected through the distributed information sharing system (mediasense.se). This service enables communication with sensors and agents running in smart-home devices agnostic of IP addresses, since the Context User Agents (CUAs) running in the devices are accessed by the Universal Context Identifiers (UCIs) employed by the DCXP protocol.

Similar to the case study in the previous section, the service portal contains functionality for longer-term decision support and the functions expected of such a health service, such as a calendar and a medication schedule. What is novel is that the service detects anomalies as the result of collaboration between two end devices, such as a pillbox and image recognition installed in the smart home. The pillbox reasons about context changes such as which pill has been taken and whether the schedule is being followed. The image recognition can detect different social states, such as sleeping, eating and, in this case, taking a pill. When the two events occur shortly after one another, there is near certitude that the elderly person has taken his or her medication; otherwise the portal is alerted.

The fact that the service platform delegates these tasks to the sensors makes the solution resilient to unreliable communication, which is important not only in urban environments but also in remote areas. Further, the relations and context-sharing that occur seamlessly across the Internet enable next-of-kin as well as professional care to maintain shared awareness irrespective of location. The system is also plug & play across the Internet, allowing new ad hoc relations such that new members as well as sensors can be added to increase the shared awareness of well-being. For instance, for safety we added flood detection, window and door sensors, stove on/off detection, etc.

C. Mobile-AR and IoT-GIS for Creative Industries

In [2], we demonstrated how our decentralized IoT infrastructure enabled scenarios in prototypes that employ Mobile Augmented Reality (M-AR) and IoT Geographic Information System (GIS) applications. In M-AR applications, immediate responses on par with human reaction times (~0.1 sec.) are key to a realistic human experience and decision-making, underlining the importance of our approach.

IoT-GIS applications in such scenarios can further benefit from our approach, since ranking can prevent unnecessary streaming of sensor information via wireless and mobile networks: sensors themselves can make decisions about their relevance in a particular context [15]. Thereby we can minimize OPEX while maximizing the utility of the overall system by sending only what is important in a global context.

VI. CONCLUSIONS AND FUTURE WORK

In this paper, we presented the motivations behind, and the urgency of, decentralizing IoT, due to limitations of the centralized approach but also because enabling the autonomy of things in principle requires decision-making to move to end devices. The principle of keeping decision-making in managed services infrastructure makes sense when it comes to longer-term decision-making involving analytics, utilizing machine learning to discover patterns in vast amounts of stored data. Agile responses matter to a person accomplishing a task or to devices with autonomous behavior. 5G, edge computing and, more recently, fog computing allow data processing and management of communication to move to the edge of the network. This does not solve the problem, which is that a network service is involved. The
efficiency and effectiveness on the field from the company, more than following an axiomatic statement of smartness.

III. METHODOLOGY

The first approach was a set of personal interviews and the explication of field experience in the Italian company, used to generalise the advantages SMEs expect from implementing the I4.0 paradigm. These resulted as follows:
• To analyze the different production lines, in order to identify problems that, when compared, determine an exponential improvement in the performance of the whole system;
• To evaluate the possible actions to take when the production processes are exposed to external events;
• To make decisions and more accurate forecasts in terms of production and consumption;
• To identify and quantify the resources that contribute to the increase in efficiency of the systems;
• To check and supervise the use of resources in the individual phases of the production process;
• To share and integrate information among all members of the company;
• To optimize business performance.

Descending from these, a set of requirements for the implementation of the I4.0 paradigm in SMEs was traced, derived from a wide bibliographical analysis. These needs were summarised into three main I4.0 solution requirements that meet SMEs' requests and thus promote adoption:
• minimal invasiveness: I4.0 solutions must rest on (and not replace) the existing systems, hardware and software (ERP, MES, SCADA, etc.) [9], [4];
• turnkey: I4.0 solutions need minimal intervention from the end user when the use scenarios change, i.e. they must embed the knowledge necessary for the different application classes [9], [10];
• extensibility: I4.0 solutions must be flexible with respect to subsequent interventions, so as to support a gradual approach; i.e. they must ensure the possibility of reusing all the components when scaling up the overall system.

TABLE I: CPS'S INDUSTRIAL APPLICATIONS

Solution                          Minimal invasiveness   Turnkey   Extensible
General Electric: Predix [12]     High                   Low       Low
RTI: Connext DDS [13]             High                   Low       Low
Emerson: Syncade [14]             High                   Medium    Low
Bosch: Bosch IoT Suite [15]       High                   Low       Low
ADACOR [16]                       Medium                 High      Low
SkillPro [17]                     Medium                 High      Low
ASG [18]                          Medium                 Medium    Medium
@MES [19]                         Medium                 High      Low

IV. CASE STUDY

The case discussed here is the study of an assembly work cell in an Italian small manufacturing company (Master Italy s.r.l.). This case is presented as an example of a generic small manufacturing company, since it embeds all the requirements and needs for the transition from the present company state toward the implementation of the I4.0 paradigm. The key to interpreting what can be called the "I4.0 transition path" was thought to be tracing the information flows and understanding their meanings and functionalities. The analysis performed provided unequivocal support for deciding the digitalisation path. The I4.0 transition does not necessarily concern an increase of data/information/knowledge (generically called info here) from the field, or the automation/integration of the information flows: it should respond to the need for this information to be available for use.
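Table I's qualitative scores can be used directly to screen candidate platforms against the three requirements. The ranking sketch below is an illustrative assumption (equal weighting of the three requirements); only the platform names and scores come from the table.

```python
# Qualitative levels from Table I mapped to numbers (an assumption).
LEVEL = {"Low": 0, "Medium": 1, "High": 2}

# Scores from Table I: (minimal invasiveness, turnkey, extensible).
PLATFORMS = {
    "GE Predix [12]":       ("High", "Low", "Low"),
    "RTI Connext DDS [13]": ("High", "Low", "Low"),
    "Emerson Syncade [14]": ("High", "Medium", "Low"),
    "Bosch IoT Suite [15]": ("High", "Low", "Low"),
    "ADACOR [16]":          ("Medium", "High", "Low"),
    "SkillPro [17]":        ("Medium", "High", "Low"),
    "ASG [18]":             ("Medium", "Medium", "Medium"),
    "@MES [19]":            ("Medium", "High", "Low"),
}

def score(platform: str) -> int:
    # Equal weighting of the three I4.0 requirements (an assumption;
    # an SME would weight them according to its own transition path).
    return sum(LEVEL[v] for v in PLATFORMS[platform])

ranked = sorted(PLATFORMS, key=score, reverse=True)
print(ranked[0], score(ranked[0]))  # several platforms tie at 3 of 6
```

No platform scores high on all three axes, which restates the paper's point that commercial and academic solutions each satisfy the requirements only partially.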
Figure 1: Knowledge and information flowing through the work cell (actors in coloured boxes are either single operators or departments). Symbols refer to Figure 2.
For the specific company considered, the following needs are also expected:
• Acquire the cell production info automatically.
• Detect the actual time of execution and tooling.
• Detect the actual quantity produced.
• Process the real-time data to verify and monitor the progress of production in the cell.
• Measure the actual efficiency when there is a deviation from the standard (i.e. time and waste).

According to this analysis, an I4.0 implementation strategy for the work cell should proceed according to the following steps, which are to a certain extent a measure of the readiness of the SME for I4.0 implementation:

1. Control of the instructions of the work sequence by the operator:
Implement devices and/or sensors that indicate the exact number of pieces to withdraw and assemble in each phase, and show the operator the sequence of assembly and/or packaging operations. An appropriate device to check that all the phases of the assembly/packaging process are done in the right order is required.

2. Control of the quality of the components:
It is necessary to equip the work cell with devices and/or sensors that:
• Autonomously identify the various quality problems of the components compared to the standards (dimensions, tolerances, finishes, quantity).
• Alert operators through proper alarm systems about abnormal or out-of-tolerance situations.
• Analyse and correlate the symptoms and causes of failures and defects in production.
• Support the choice of corrective actions to eliminate the detected failures and defects.

3. Material handling:
It is necessary to equip the work cell with devices and/or sensors that:
1. Record consumption and control the online stock, avoiding interruptions in the production cell due to the unavailability of components in the warehouse stock.
2. Generate alarms to signal when material needs to be supplied to the workplace, to prevent interruptions for the operators.

V. SOLUTION/DISCUSSION

Industry 4.0 is thought of as the widespread use of sensors for the acquisition, processing and analysis of data at lower and lower costs. The most critical point in this implementation is indeed the definition and selection of a complete set of the info really needed: this should be easily understandable and manageable by the different actors and tools involved, otherwise an increase of complexity with respect to the present operating conditions is to be expected.

This situation is clear from the analysis of the case considered, where the complexity lies in explicating the experience of the operator more than in the organisational or sensing architecture. Provided the case can be assumed to be a generic SME, from the above it is clear that commercial platforms respond only partially to the set of on-field requirements, providing only some tools for the creation of personalized I4.0 solutions ([12], [15]). These are in fact mainly focused on data mining and communication between resources. They are capable of integrating existing legacy systems and thus are generally minimally invasive, even though they require strong ICT competences (the turnkey requirement is often not fulfilled). Moreover, since no standard model for the formalization of managerial and technological knowledge is available so far, commercial platforms do not satisfy the fundamental requirement of being an extensible system. The academic solutions, on the other hand ([16]-[19]), are tailored to the applications they were designed for (no extensibility; strong ICT competences required, hence no turnkey). These latter solutions also only partially fulfill minimal invasiveness, since they do not take into account the interaction with existing legacy information systems, e.g. ERP.

In summary, the main problem related to the implementation of the I4.0 paradigm in SMEs is the lack of perception of the link between managerial and technological knowledge and the satisfaction of production needs ([18], [20], [27]). The conclusion of this paper is the formalisation of the I4.0 implementation problem for SMEs as a problem of formalization and standardization of managerial and technological knowledge, to make it independent of the specific application and user and thus allow the adoption of general-purpose ICT solutions.

This statement brings the need to conceive a sort of library of Industry 4.0 formal patterns to identify, by measuring similarities based on a set of formal criteria and some experience/knowledge (modelled into some domain ontology), the readiness of an SME for deploying Industry 4.0 concepts and technologies, and also the proneness of these technologies to satisfy the true needs. The adoption of an existing Smartness Capability Maturity Model (SCMM), similar to the well-known CMMI for guiding process improvement across a project, a division or an entire organisation [28], would then greatly support this decisional process.

A formalisation of the SCMM is not provided here; although the main elements of the SCMM are suggested and highlighted, the burden of providing an efficient model is left to a future paper by the Authors.

REFERENCES

[1] H. Kagermann et al., "Recommendations for Implementing the Strategic Initiative INDUSTRIE 4.0: Securing the Future of German Manufacturing Industry; Final Report of the Industrie 4.0 Working Group", 2013.
[2] M. Dassisti and M. D. Nicolò, "Enterprise Integration and Economical Crisis for Mass Craftsmanship: A Case Study of an Italian Furniture Company", in On the Move to Meaningful Internet Systems: OTM 2012 Workshops, P. Herrero, H. Panetto, R. Meersman, and T. Dillon, Eds. Springer Berlin Heidelberg, 2012, pp. 113–123.
[3] M. Brettel, N. Friederichsen, M. Keller, and M. Rosenberg, "How virtualization, decentralization and network building change the
Abstract — Cyber-Physical Systems (CPS) are leading to the 4th Industrial Revolution (Industry 4.0), which promises highly flexible production and easier, more accessible participation of all parties involved in business processes. The Industry 4.0 production paradigm is characterized by autonomous behaviour and intercommunicating properties of its production elements across all levels of manufacturing processes, so one of the key concepts in this domain will be the semantic interoperability of systems. This goal can benefit from formal methods well known in various scientific domains such as artificial intelligence, machine learning and algebra. Our current investigation therefore focuses on a promising approach named Formal Concept Analysis (FCA) to structure the knowledge and to optimize CPS interoperability.

Keywords: Formal Concept Analysis, Model, Cyber-Physical System, Semantic Interoperability, Interoperability optimization, Industry 4.0

I. INTRODUCTION

CPS (Cyber-Physical System) is the term that describes a broad range of network-connected, multi-disciplinary, physically-aware engineered systems that integrate embedded computing (cyber-) technologies into the physical world (adapted from [1]). Inside this kind of network, each smart component (a sub-system of the CPS) is equipped with sensing, data collection, transmission and actuation capabilities, together with vast endpoints in the cloud, offering large amounts of heterogeneous data.

CPSs are also expected to lead to the 4th Industrial Revolution (Industry 4.0), which promises highly flexible production and easier, more accessible participation of all parties involved in business processes. Their emergence is based on developments in computer science and information and communication technologies. The Industry 4.0 production paradigm is characterized by autonomous behaviour and intercommunicating properties of its production elements across all levels of manufacturing processes.

In this regard, the following research directions related to CPS and the Industry 4.0 paradigm take an important place: optimization of sensor network organization, handling big datasets, and challenges in information representation and processing. These research domains can benefit from scientific methods well known in artificial intelligence, machine learning and algebra. On this motivation, we are currently investigating the application of a promising approach named Formal Concept Analysis (FCA).

The proposal addressed in this paper concerns the study of FCA-based patterns for optimizing CPS interoperability in Industry 4.0. Cooperative manufacturing systems involve a large number of information systems distributed over a large, complex networked architecture connected to physical machines. Such cooperative enterprise information systems (CEIS) have access to a large amount of information and have to interoperate among themselves and with the machines to achieve their purpose. CEIS architects and developers thus face a hard problem: interoperability. There is a growing demand for integrating such systems tightly with organizational and manufacturing work so that these information systems can be fully, directly and immediately exploited by intra- and inter-enterprise processes [2].

The main prerequisite for achieving the interoperability of information systems (and thus of a set of collaborating CPSs, as noted by the authors) is to maximize the amount of semantics that can be used and to enact it by making it increasingly explicit [3]. There are different approaches to conceptual modelling, and these differences are reflected in the conceptual languages used for the modelling activity. Entity-Relationship (E-R) approaches have been widely used and extended, and have led to the development of different languages for data modelling [4, 5, 6].

The Object-Oriented Modelling (OOM) approach [7] addresses the complexity of a problem domain by considering the problem as a set of related, interacting objects. However, the abstract semantics inherent to these approaches forces the modeller to make subjective choices between entity, attribute and relationship artefacts when modelling a universe of discourse [8]. In order to cope with such heterogeneous modelling patterns, we focus our interest on approaches that enable their normalization into a fine-grained semantic model by fragmenting the represented knowledge into atoms called formal concepts.

II. FORMAL CONCEPT ANALYSIS

A. Basic Definitions

FCA is at its core a mathematical formalism which over time has been developed and extended through many theoretical and applied studies. Starting with a set of objects and a set of attributes, FCA finds generalizations of the descriptions of arbitrary subsets of objects.

Let G and M be sets, called the set of objects and the set of attributes, respectively, and let I be a relation: I ⊆ G×M.
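The derivation operators behind these definitions can be sketched in a few lines of code. The context below (objects g1–g3 and their attributes) is a hypothetical illustration, not the paper's case study; a formal concept is a pair (A, B) with A′ = B and B′ = A.

```python
# Minimal sketch of FCA derivation operators over a formal context (G, M, I).
# The concrete objects and attributes here are illustrative assumptions.
from itertools import combinations

G = {"g1", "g2", "g3"}                      # objects
M = {"sensing", "actuation", "cloud"}       # attributes
I = {("g1", "sensing"), ("g1", "cloud"),    # incidence relation I ⊆ G×M
     ("g2", "sensing"), ("g2", "actuation"),
     ("g3", "sensing")}

def intent(A):
    """A' : attributes shared by all objects in A."""
    return {m for m in M if all((g, m) in I for g in A)}

def extent(B):
    """B' : objects possessing all attributes in B."""
    return {g for g in G if all((g, m) in I for m in B)}

def concepts():
    """Enumerate all formal concepts (A, B) with A' = B and B' = A."""
    found = []
    for r in range(len(G) + 1):
        for A in combinations(sorted(G), r):
            B = intent(set(A))
            if extent(B) == set(A):          # A is closed: (A')' = A
                found.append((set(A), B))
    return found
```

Ordering the resulting concepts by inclusion of their extents yields the concept lattice discussed later in the paper; this brute-force enumeration is only workable for small contexts.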
TABLE III.
CASE STUDY: AUTOMATIC SMART LIGHT SWITCH APPLICATION, COMPONENTS OF THE SYSTEM AND THEIR PROPERTIES
57
7th International Conference on Information Society and Technology ICIST 2017
TABLE IV.
APPLICATION OF INTERORDINAL SCALING TO THE ‘POSITION’ ATTRIBUTE OF THE MULTI-VALUED CONTEXT OF TABLE III
x≤0 x≤2 x≥0 x≥2 y ≤ 2.5 y≤3 y ≤ 3.5 y ≥ 2.5 y≥3 y ≥ 3.5
g1 X X X X X X X
g2 X X X X X X X
g3 X X X X X X X
g4 X X X X X X
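The interordinal scaling applied in Table IV can be sketched programmatically: each numeric attribute value is replaced by binary attributes of the form "x ≤ t" and "x ≥ t" over a set of thresholds. The thresholds and coordinates below are illustrative assumptions, not the actual component positions of the case study.

```python
# Sketch of interordinal scaling: a many-valued (numeric) attribute becomes
# a set of binary attributes "x<=t" / "x>=t" for chosen thresholds t,
# as in the column headers of Table IV.

def interordinal_scale(values, thresholds):
    """Map {object: number} to {object: set of scaled attribute names}."""
    scaled = {}
    for g, x in values.items():
        attrs = set()
        for t in thresholds:
            if x <= t:
                attrs.add(f"x<={t}")
            if x >= t:
                attrs.add(f"x>={t}")
        scaled[g] = attrs
    return scaled

# Hypothetical x-coordinates scaled with thresholds 0 and 2:
positions_x = {"g1": 0, "g2": 1, "g3": 3}
scaled = interordinal_scale(positions_x, [0, 2])
```

The same scaling applied to the y-coordinate (thresholds 2.5, 3, 3.5 in Table IV) produces the remaining binary columns of the scaled context.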
Figure 2. Concept lattice of the scaled context of Table V (a)
(a) S. Yevtushenko, ConExp, https://sourceforge.net/projects/conexp/
composition guards to the definition of formal concept, which in turn prevents the definition of proper derivation operators. Such an approach is claimed to provide advantages in system implementation as well as in its analysis.

The authors of [14] describe an approach to investigating the architecture of software. The main purpose is to facilitate identification of the components that implement features of the software. In the proposed semi-automatic approach, one has to design scenarios of program execution that reveal the features to be investigated. A profiler tool is then used on these scenarios during execution of the source code to identify the program components involved in supporting the features of the scenarios. The set of program components called during execution of a scenario is named an execution profile. A context is constituted by these profiles; that is, each component has as its description (object intent) the set of scenarios in which it was called. The resulting concept lattice represents a hierarchy of groups of components, where the upper part of the lattice describes specialized components specific to certain features, whereas the bottom part of the lattice corresponds to general-purpose components utilized in many scenarios. As certain scenarios model a number of features, analysis of the lattice is done with respect to this correspondence.

Studies have also been conducted investigating the use of FCA for RDF-annotated datasets [15]; [16] provides a case study where RDF descriptions of sensor capabilities are organized and stored in a lattice structure.

VI. CONCLUSION

Formal Concept Analysis has been applied in many domains as a knowledge representation and discovery tool. The current paper takes a step towards adapting the approach and evaluating it for the needs of cyber-physical systems modelling and analysis. We have demonstrated the basics of FCA on a simple example, and outlined the major interest in its application in the interoperability context.

It is worthwhile to note the distinctive character of modelling the control flow of a system with FCA. FCA is a bottom-up approach in the sense that it starts with particularities of the domain and builds upon them a structure that captures general dependencies. Traditional graph-based formalisms (such as Petri nets for process modelling or feature models in software development), on the other hand, are conceived specifically for modelling and appear to be more expressive. Research studying the relation between FCA and graph modelling methods [17][18] indicates the necessity of additional filtering after the lattice is constructed, which basically doubles the cost of constructing the model.

Further research will aim at answering the following questions: How can relationships between concepts of a lattice be interpreted in terms of the interoperability of the corresponding parts of the system? How can one benefit from establishing links between concepts of the lattices of two or more collaborating cyber-physical systems in terms of improving their interoperability?

REFERENCES

[1] V. Gunes, S. Peter, T. Givargis, and F. Vahid, "A Survey on Concepts, Applications, and Challenges in Cyber-Physical Systems," KSII Trans. Internet Inf. Syst., vol. 8, no. 12, pp. 4242–4268, 2014.
[2] S. Izza, "Integration of industrial information systems: from syntactic to semantic integration approaches," Enterprise Information Systems, vol. 3, no. 1, pp. 1–57, Taylor & Francis, 2009.
[3] L. Obrst, "Ontologies for semantically interoperable systems," in Proceedings of the 12th International Conference on Information and Knowledge Management, New Orleans, USA, 2003.
[4] R. Barker, CASE* Method: Entity Relationship Modelling, Addison-Wesley, Wokingham, England, 1990.
[5] B. Czejdo, R. Elmasri, M. Rusinkiewicz, and D. W. Embley, "A graphical data manipulation language for an extended entity-relationship model," IEEE Computer, pp. 26–37, March 1990.
[6] U. Hohenstein and G. Engels, "Formal semantics of an entity-relationship-based query language," in Entity-Relationship Approach: the Core of Conceptual Modelling, Proc. 9th ER Conf., H. Kangassalo, Ed., Elsevier Science Pub., Amsterdam, 1991.
[7] J. Rumbaugh, M. Blaha, W. Premerlani, F. Eddy, and W. Lorensen, Object-Oriented Modeling and Design, Prentice Hall, New York, USA, 1991.
[8] M. Lezoche, H. Panetto, and A. Aubry, "Conceptualisation approach for cooperative information systems interoperability," in ICEIS 2011, Beijing, China, 2011.
[9] B. Ganter and R. Wille, Formal Concept Analysis: Mathematical Foundations, Springer, 1999.
[10] S. K. Khaitan and J. D. McCalley, "Design techniques and applications of cyberphysical systems: A survey," IEEE Systems Journal, vol. 9, no. 2, pp. 350–365, 2015.
[11] G. Stumme, R. Taouil, Y. Bastide, N. Pasquier, and L. Lakhal, "Computing iceberg concept lattices with TITANIC," Data & Knowledge Engineering, vol. 42, no. 2, pp. 189–222, 2002.
[12] B. Ganter and S. O. Kuznetsov, "Pattern structures and their projections," in International Conference on Conceptual Structures, pp. 129–142, Springer Berlin Heidelberg, 2001.
[13] Y. Tan, M. C. Vuran, S. Goddard, Y. Yu, M. Song, and S. Ren, "A concept lattice-based event model for Cyber-Physical Systems," in Proceedings of the 1st ACM/IEEE International Conference on Cyber-Physical Systems, pp. 50–60, ACM, 2010.
[14] T. Eisenbarth, R. Koschke, and D. Simon, "Locating features in source code," IEEE Transactions on Software Engineering, vol. 29, no. 3, pp. 210–224, 2003.
[15] M. Alam, A. Buzmakov, V. Codocedo, and A. Napoli, "Mining definitions from RDF annotations using formal concept analysis," in International Joint Conference on Artificial Intelligence, 2015.
[16] W. Derguech, S. Bhiri, S. Hasan, and E. Curry, "Using formal concept analysis for organizing and discovering sensor capabilities," The Computer Journal, vol. 58, no. 3, pp. 356–367, 2015.
[17] J. Carbonnel, K. Bertet, M. Huchard, and C. Nebut, "FCA for Software Product Lines Representation: Mixing Configuration and Feature Relationships in a Unique Canonical Representation," in CLA: Concept Lattices and their Applications, pp. 109–122, CEUR Workshop Proceedings, 2016.
[18] D. Morozov, S. O. Kuznetsov, M. Lezoche, and H. Panetto, "Formal methods for process knowledge extraction," in 10ème Colloque sur la Modélisation des Systèmes Réactifs (MSR 2015), 2015.
Abstract — The main issue with smart home/building management is that products from different manufacturers are incompatible with each other, and even products of the same brand are unlikely to truly interoperate. To provide a fully interoperable solution, or a solution that inherently leans towards improved interoperability, a robust architectural approach to designing the underlying system is compulsory. This paper presents one possible architectural approach for an integrative management platform of different smart home and building automation systems. The proposed approach is devised upon service-oriented deployment principles to mediate and transform messages across a variety of systems, services and APIs. It addresses open interoperability standards and exploits existing semantic data models to support different integration and interoperability patterns. At its core, the proposed approach is leveraged by an open-software Smart Home Gateway responsible for communication with proprietary solutions and vendor data formats. Furthermore, it envisions a Canonical Data Model which serves as a unified (XML-based) messaging format across the smart home management platform, providing an abstraction of proprietary data semantics.

I. INTRODUCTION

Nowadays, the main obstacle to smart home/building interoperability is that products from different manufacturers are incompatible with each other, and even products of the same brand are unlikely to truly interoperate. True interoperability is achieved when heterogeneous smart components can utilize data and suggest actions from and to each other. Unfortunately, while the vision of smart buildings is nothing new, agreeing on standards between larger groups of companies has been unsuccessful, except in entertainment electronics. In recent years many solutions for the integration and interoperability of smart home and building automation systems have been proposed, but they do not offer a high degree of interoperability and are usually strongly technology-oriented. Innovative user-oriented services usually have to control and supervise a number of smart home devices and technical systems (e.g. lighting, video surveillance, motion detection, power supply, etc.) within the target building or home [1]. In such a case, interfacing with numerous smart home devices, coming from different vendors and using different proprietary communication protocols, is a difficult problem to solve. Therefore, in order to support more advanced and intelligent smart home services, generic and scalable solutions need to be provided in terms of easy, plug-and-play interfacing with different smart home devices, monitoring equipment, etc. Such solutions should provide means for integration with low-level devices (such as sensors and actuators), but also with legacy building management systems (BMS) such as, for instance, the widely utilized systems for supervisory control and data acquisition (SCADA) within buildings and more complex infrastructures.

This paper presents one possible solution for the integration and interoperability of smart home and building automation systems, built upon a centrally managed architecture in which the sub-system interconnections are guaranteed by a dedicated gateway. Thanks to the gateway, interoperability inside the home or building becomes possible, considering an integrated environment with many devices that cooperate as a single entity, exchanging data and providing high-quality services. The proposed integration approach leverages service-oriented deployment principles, which enable the mediation and transformation of messages across a variety of systems, services and APIs. In order to support various integration and interoperability patterns, it addresses relevant open interoperability standards and exploits existing semantic data models (in the form of ontologies) for the extraction of additional semantics and the provision of a common vocabulary for system components. As mentioned previously, at its core the proposed approach is leveraged by an open-software Smart Home Gateway responsible for communication with proprietary solutions and vendor data formats. Inherently, it provides embedded modules for translation from vendor data formats so that the rest of the system can understand and further process the data acquired from smart devices, sensors and metering equipment. To achieve this, the proposed solution envisions the Canonical Data Model (CDM), which serves as a unified (XML-based) messaging format across the smart home management platform, providing an abstraction of proprietary data semantics and a common understanding of the acquired data. In other words, the CDM serves as a common language of the system components into which dedicated modules of the proposed Smart Home Gateway translate the data received from the low-level devices.

The remainder of this paper is organized as follows. An overview of state-of-the-art solutions for the integration and interoperability of systems within a home or building is provided in Section 2. Section 3 elaborates relevant interoperability aspects that should be taken into account and describes the proposed methodology for the integration and interoperability of smart home and building automation systems. Section 4 specifies the particularities of the integration platform deployment by describing the test-bed platform implemented at the Institute Mihajlo Pupin.
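The gateway's translation step described above can be illustrated with a short sketch: a vendor-specific meter record is mapped into a canonical XML message. All element and field names below (reading, deviceId, etc.) are hypothetical placeholders, not the paper's actual CDM schema.

```python
# Illustrative sketch of translating a proprietary device record into a
# canonical (CDM-style) XML message. Element names are assumptions.
import xml.etree.ElementTree as ET

def to_cdm(vendor_reading: dict) -> str:
    """Translate a vendor data record into a canonical XML message."""
    root = ET.Element("reading")
    ET.SubElement(root, "deviceId").text = vendor_reading["id"]
    ET.SubElement(root, "quantity").text = "activePower"
    value = ET.SubElement(root, "value", unit="W")   # unit made explicit
    value.text = str(vendor_reading["pwr_w"])
    ET.SubElement(root, "timestamp").text = vendor_reading["ts"]
    return ET.tostring(root, encoding="unicode")

# A hypothetical smart power meter record in a proprietary format:
msg = to_cdm({"id": "meter-01", "pwr_w": 1530, "ts": "2017-03-01T12:00:00Z"})
```

The point of such a canonical format is that downstream services parse only one message shape, regardless of which vendor module produced the data.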
Finally, concluding remarks and a discussion are given in Section 5.

II. STATE OF THE ART SOLUTIONS

An extensive amount of research has been focused on smart home and building automation interoperability, as elaborated by Jarvinen and Vuorimaa [2] and reported by Jiang et al. [3]. There are many possible typologies of architectures to improve interoperability with smart home devices, as investigated by Capitanelli et al. [4]. For instance, many studies have concentrated on connecting heterogeneous devices and subsystems together and providing a unified interface on top (such as the studies performed by Perumal et al. [5], Wang et al. [6], Tokunaga et al. [7], and Miori et al. [8]). One of the protocols for a unified high-level Web services interface is open Building Information eXchange (oBIX) (Considine [9]), proposed by the Organization for the Advancement of Structured Information Standards (OASIS). This data format is based on XML and defines, similarly to any common programming language, a small set of primitive data types for describing the data. Coyle et al. suggested a sensor fusion-based middleware for smart homes, called ConStruct (Coyle et al. [10]). Similarly to an integration platform, this solution interconnects heterogeneous devices and presents the information in a unified format, in this case RDF. In this regard, there are a number of specifications, such as OneM2M, Smart Energy Profile (SEP 2.0) and the Common Information Model (CIM), that should be mentioned as efforts towards smart home interoperability on different (and partly overlapping) communication levels.

Furthermore, a number of standardization committees such as CENELEC TC59X WG07, IEC TC57 WG21 and the OpenADR Alliance are working on smart home/smart grid interoperability standards. An overview of relevant associations and standardization bodies, at both international and national level (in the case of the US and Germany), is presented in Figure 1 [11]. Recently, the use of semantic technologies in smart environments has been suggested by many authors (such as in the studies by Chen et al. [12], Wang et al. [13], and Zhang et al. [14]). Semantic technologies provide a way to represent data on a higher, semantically meaningful level and to share a common understanding of the concepts using ontologies. In addition, different application domains can define their own ontologies. In turn, these ontologies can link concepts and properties to common ontologies, for example in the Linking Open Data (LOD) cloud. Various studies propose ontology-based context models for representing smart home context information semantically (Gu et al. [15], Kim and Choi [16], and Xu et al. [17]). Ontologies such as Cobra-Ont (Chen et al. [18]) and CONON (Wang et al. [19]) have been defined to model context information. However, all these research and standardization efforts lack a holistic approach and should be combined to provide an adequate open-standard-based architectural backbone in order to ensure long-term interoperability with different proprietary smart home and building automation systems.

III. PROPOSED METHODOLOGY

One possible architecture of the platform for the integration and interoperability of smart home and building automation systems was devised as part of the proposed methodology. The aim of the specified architecture was to describe how systems and components interact, with their interoperability taken as one of the primary concerns. In devising the platform architecture conceptually, interoperability as a term related to the ICT domain was considered in the first place. However, in broader terms, interoperability can cover different aspects, such as:

Technical interoperability: related to the integration of multiple heterogeneous ICT systems and services, and linking them together;

Semantic interoperability: common understanding of the meaning of exchanged information among the involved parties (e.g. the specification, relation and structure of the concepts and properties utilized for data description);

Organizational interoperability: orchestration of processes and actions in terms of where the data are used and transformed (e.g. shared definitions of the roles, responsibilities and interactions of the involved parties).

The methodology proposed in this paper is closely related to the technical and semantic interoperability of smart home devices and building automation systems, while it sets the baseline for building organizational interoperability (e.g. to provide high-level services to the end user or in multi-stakeholder scenarios). In order to deliver such an integration platform which will serve
as a holistic and efficient collaboration platform, the proposed methodology was aimed at the following general targets:

development of software-based services, and

specification of a service-based interoperability domain among the involved parties.

To accomplish these targets, the proposed methodology embeds interoperability support spanning technical, semantic and pragmatic interoperability.

To understand the data from different domains and interoperate with heterogeneous low-level devices, the proposed integrative management platform exploits to its maximum the potential of service-oriented deployment principles (as shown in Figure 2). Such a service-oriented architecture ensures the mediation and transformation of messages across a variety of systems, services and APIs. In this way, it addresses open interoperability standards (shown at the right-hand side of Figure 2) and supports different integration patterns, thus enabling horizontal interoperability among various heterogeneous systems and business applications. To support vertical interoperability with low-level devices and systems, the proposed approach is leveraged by an open-software Smart Home Gateway responsible for communication with proprietary smart home/building solutions and vendor data formats (e.g. ZigBee, Z-Wave, OPC UA, BACnet, KNX, etc.). Following the latest trends in the Internet of Things (IoT), such as AIOTI [www.aioti.org], FIWARE [www.fiware.org] and OpenIoT [www.openiot.eu], an event-driven architecture is also incorporated, allowing the real-time distribution of data and alerts to and from any kind of cloud application. To further support such an approach, the specification of the Canonical Data Model (CDM) is seen as a prerequisite; it will serve as a unified (XML-based) messaging format across the smart home/building management platform and provide an abstraction of proprietary data semantics. As suggested by the literature [20], the deployment of ontologies, which brings a semantic layer to the smart home/building integration platform, can be useful for developing intelligent applications and supporting the overall system interoperability. Therefore, the proposed approach envisions the exploitation of existing semantic data models and specifications (e.g. SAREF, SESAME-S, BOnSAI, and IFC) in providing a common vocabulary for the involved actors, the interacting hardware and software components, and a context for the semantic enrichment of raw data. Furthermore, to ensure long-term interoperability with smart home services, open interoperability standards (such as OneM2M, SEP 2.0, CIM and oBIX) are respected and followed by the underlying system communication paradigms.

IV. TEST-BED INTEGRATION PLATFORM

For the demonstration of the proposed integration platform concept, the campus of the Institute Mihajlo Pupin (PUPIN) in Belgrade was taken as a test-bed, equipped with various metering equipment and technical systems (e.g. power supply, access control, video surveillance, etc.). For the testing purposes of this paper, only integration with energy-related systems and monitoring equipment was considered. At the PUPIN campus, the SCADA View6000 system [21] operates as the main BMS, providing supervision and control of the underlying systems. In this regard, the SCADA system was taken for integration with the proposed gateway, as it currently supervises both electrical energy production (by a PV plant) and thermal energy production (by a local thermal power plant). On the other hand, electricity consumption is provided by smart power meters (i.e. non-invasive, self-powered wireless current sensors), which were not yet integrated under the SCADA system and were therefore considered for direct communication with the gateway. Nevertheless, the proposed gateway was envisioned in a flexible and scalable manner so it could be easily extended to encompass any subsequently defined metering points.

The proposed gateway was implemented to support two-way communication, i.e. both data acquisition from the metering points, the SCADA system or other external data services (e.g. meteorological data), and issuing commands towards the SCADA system for performing control actions (e.g. increase the temperature
interoperability in intelligent buildings. Automation in Construction, 19(2):160–168.
[6] Wang, S., Xu, Z., Cao, J., and Zhang, J. (2007). A middleware for web service-enabled integration and interoperation of intelligent building systems. Automation in Construction, 16(1):112–121.
[7] Tokunaga, E., Ishikawa, H., Kurahashi, M., Morimoto, Y., and Nakajima, T. (2002). A framework for connecting home computing middleware. In ICDCSW, pages 765–770, Washington, DC, USA. IEEE Computer Society.
[8] Miori, V., Tarrini, L., Manca, M., and Tolomei, G. (2006). An open standard solution for domotic interoperability. 52(1):97–103.
[9] Considine, T. (2010). Open Building Information Exchange (oBIX) version 1.1. OASIS Working Draft 06, June 2010.
[10] Coyle, L., Neely, S., Stevenson, G., Sullivan, M., Dobson, S., and Nixon, P. (2007). Sensor fusion-based middleware for smart homes. International Journal of Assistive Robotics and Mechatronics, 8(2):53–60.
[11] Baumeister & Fuchs (2015). Interoperability activities for Smart Home Environment.
[12] Chen, L., Nugent, C., Mulvenna, M., Finlay, D., and Hong, X. (2009). Semantic smart homes: towards knowledge rich assisted living environments. In Intelligent Patient Management, pages 279–296. Springer.
[13] Wang, X., Dong, J. S., Chin, C., Hettiarachchi, S. R., and Zhang, D. (2004a). Semantic space: an infrastructure for smart spaces. Pervasive Computing, IEEE, 3(3):32–39.
[14] Zhang, D., Gu, T., and Wang, X. (2005). Enabling context-aware smart home with semantic web technologies. International Journal of Human-friendly Welfare Robotic Systems, 6(4):12–20.
[15] Gu, T., Wang, X. H., Pung, H. K., and Zhang, D. Q. (2004). An ontology-based context model in intelligent environments. In Proceedings of the Communication Networks and Distributed Systems Modeling and Simulation Conference, volume 2004, pages 270–275.
[16] Kim, E. and Choi, J. (2006). An ontology-based context model in a smart home. In Computational Science and Its Applications – ICCSA 2006, pages 11–20. Springer.
[17] Xu, J., Lee, Y.-H., Tsai, W.-T., Li, W., Son, Y.-S., Park, J.-H., and Moon, K.-D. (2009). Ontology-based smart home solution and service composition. In International Conference on Embedded Software and Systems, ICESS '09, pages 297–304. IEEE.
[18] Chen, H., Finin, T., and Joshi, A. (2003). An ontology for context-aware pervasive computing environments. The Knowledge Engineering Review, 18(3):197–207.
[19] Wang, X. H., Zhang, D. Q., Gu, T., and Pung, H. K. (2004b). Ontology based context modeling and reasoning using OWL. In Proceedings of the Second IEEE Annual Conference on Pervasive Computing and Communications Workshops, 2004, pages 18–22.
[20] Reinisch, C., Kofler, M. J., Iglesias, F., and Kastner, W. (2011). ThinkHome energy efficiency in future smart homes. EURASIP Journal on Embedded Systems, 1:1–18.
[21] SCADA View6000 – Project VIEW Center, Institute Mihajlo Pupin, Serbia, www.view4.rs
[22] OSGi – Open Services Gateway initiative, www.osgi.org
[23] OPENMUC, Fraunhofer ISE, Germany, www.openmuc.org
[24] Process management in the energy industry, Institute Mihajlo Pupin, http://www.pupin.rs/en/products-services/process-management/scada/process-management-in-the-energy-industry/
64
7th International Conference on Information Society and Technology ICIST 2017
Abstract – The overall increase in the number of intelligent devices and home appliances, effectively taking part in the IoT world, has led to the development of numerous innovative solutions across various domains. This paper introduces an ICT-based solution for energy End User engagement towards more efficient behaviors and sustainable lifestyles. The proposed solution focuses on the integration of comprehensive energy consumption monitoring platforms and/or Home Automation solutions with advanced energy services through a highly flexible cloud-based IoT platform and a service-based approach. The employed energy services range from consumption data analytics and profiling over performance evaluation and consumption prediction to demand benchmarking and optimization. The ability to develop and offer advanced energy services independently, i.e. as a functional upgrade of an existing IoT platform, is used to formulate innovative business models and deliver a cost-effective solution.

I. INTRODUCTION

A. Motivation

The overarching objective of this paper is to propose and elaborate on an ICT solution which facilitates End Users' behavior change towards increased energy efficiency (EE). The current efforts in this domain are best summarized in a comprehensive EU study produced by the European Environment Agency, "Achieving energy efficiency through behaviour change: what does it take?" [1], which is based on factual data as well as extensive literature research. As highlighted there, the experience gained in previous deployments of metering and feedback technologies demonstrates that different ICT-enabled engagement platforms, strategies and methodologies lead to varying levels of behavior change and demand reduction, ranging between 3% and 20% and lasting for varying periods of time. A similar experience is reported from the US, where Behavioural Energy Efficiency (BEE) programs are helping energy providers come up with innovative solutions, raising awareness of other EE programs, and achieving significant cost-effective savings. As indicated in the McKinsey 2013 study [2], BEE's potential is far from tapped: the study revealed that behavioural interventions could reduce total US residential energy use by as much as 20%. In this regard, a number of ICT solutions capable of capturing users' attention and generating efficient and sustainable behaviours over time, including users with low ICT literacy, have flooded the EE market. Although proven effective to a certain extent, these technologies still face significant adoption barriers that can be roughly summarized into three basic aspects: a) barriers to technology acceptance, b) barriers to technology use, and c) barriers to technology effectiveness. These barriers differ significantly between the commercial, residential and industrial sectors. In parallel, significant advances in other areas, including evolving information and communication technology infrastructures, the emergence of smart devices and the Internet of Things (IoT), the Smart Meter rollout for 2020 [3], the increase of cost-effective and novel energy storage technologies, the electrification of transport, as well as innovative funding and payment mechanisms and business models, yield the optimal conditions for delivering innovative solutions.

B. Research questions and SoA

Inducing a behavioral change (towards EE) represents an unsolved societal challenge with potentially enormous environmental impact. But why is it so difficult to induce a behavioral change towards EE? There are several factors: a) relatively limited perceived benefits to Users associated with EE measures; in particular, low EE-associated monetary benefits to the individual (an average energy bill¹ already does not represent a significant portion of the home budget (cca. 5%)², and even less if we consider that only a share of it (max. 20%) can be saved); b) low capability to change energy demand patterns and energy use practices due to aspects related to lifestyle, habits, daily routines and work/family commitments, resulting in low levels of Users' demand flexibility; c) lack of global conscience related to the impact of energy generation and use on the environment (pollution, Greenhouse Gas emissions), and the tendency to perceive individual consumption as insignificant ("My individual consumption does not make the difference"); and d) the complexity of existing ICT platforms/tools (e.g. those leveraging a gamification approach), which require intensive engagement that often interferes with everyday activities.

¹ https://www.ovoenergy.com/guides/energy-guides/the-average-gas-bill-average-electricity-bill-compared.html
² http://www.deceuninck.es/blog/cual-es-el-gasto-medio-de-energia-en-un-hogar-espanol/

Additionally, there is an increasing number of Home Automation platforms on the market which provide various services related to energy efficiency, home security, comfort regulation, remote access to home appliances etc. However, when it comes to energy
efficiency, they are typically focused on providing energy monitoring capabilities and supplying the End User with information related to energy consumption and, in the case of advanced solutions, on providing some simple control functionalities. However, these solutions rarely provide any additional semantics attached to the captured consumption, and since the kWh (or other, e.g. kJ, calories etc.) figures may mean very little to an average End User unless attached and correlated with the actual context (e.g. number of occupants, climate conditions etc.), they stand very little chance of driving the change in End User behavior towards more efficient lifestyles. The following is a brief overview of the solutions aiming at overcoming these barriers.

When it comes to user engagement, a widely adopted solution comes from the US market leader Opower/Oracle [4]. Despite the fact that they work for the utility industry, their solutions aim at helping customers understand their energy use in the first place and then better manage it. By offering a wide range of solutions, from Demand Response and Customer Engagement to Thermostat Management, they are focused on solving the key energy efficiency challenges. The Opower approach mainly leverages a personalized messaging system that delivers important Energy Conservation Measures (ECMs), as well as notifications and warnings related to the user's energy consumption, transferred through their custom-designed Web and Mobile platforms. However, the key disadvantage of this solution is that the methodology driving the personalized messaging system seems to take into account only the user's energy consumption and not the actual performance. The solution has the highest market penetration in terms of the number of active users, probably owing to the fact that it does not require any hardware installation at the location of each customer and offers a free-of-charge service to the End User. However, greater energy savings and more focused energy conservation measures may only be possible if both consumption and user behavior are monitored in real time and benchmarked against similar users. The Swedish utility E.ON recruited 10,000 participants for what they call "Sweden's Largest Energy Experiment" [5], in which real-time energy consumption was presented to customers through a Mobile App during a period of one year. The monitored consumption was visualized in five different ways, in order to investigate what would make customers save the largest amount of energy. This approach benefited from a large amount of collected data but lacked the means for user engagement and social competition between the customers. Solutions coming from Energy Aware [6] (which emerged from Neurio and Wattson), Engage and AlertMe [7] mainly rely on the installation of additional smart metering, used to track the users' energy consumption, and a Web or Mobile interface for visualization. In some cases, e.g. Neurio, advanced algorithms were employed to derive user behavior from the monitored consumption and formulate personalized messages carrying corresponding ECMs. As with the previous solution, a benchmarking platform, which proved to be the key factor for user engagement, is missing. Simple Energy [8], Welectricity [8] and Leafully [8] do not require any hardware installation but rather focus their approach on the benchmarking aspects, leveraging proprietary social networks where people can compare and compete. However, due to the lack of detailed consumption data, only broad and generic ECMs are delivered. Hoping that this would lead to the critical commitment of users, they all provide proprietary social networks where people can compare and compete, but also learn from each other. The pitfall of their approach seems to be the lack of dynamic energy consumption data (they all use energy bills), implying very poor resolution (monthly), which consequently leads to very broad energy conservation measures. Furthermore, the social pressure that should come from the employment of social networks has very low impact, because they use their own social networks which, unfortunately, have a limited number of active users. Instead, utilization of widely accepted social networks (such as Facebook, LinkedIn etc.) would allow for better user engagement, since Users already use them, and unlimited dissemination possibilities.

Finally, the experience gained in previous deployments of metering and feedback technologies demonstrates that different engagement platforms, strategies and methodologies lead to varying levels of behavior change and demand reduction (ranging between 3% and 20% in kWh) that last for varying periods of time. While proven effective, these technologies still face significant adoption barriers that can be roughly divided into three main categories: 1) technology acceptance, 2) technology use, and 3) technology effectiveness. These barriers differ significantly between the commercial, residential and industrial sectors.

II. METHODOLOGY

The proposed solution aims to spark End User behavior change by tailoring ECMs and providing incentives to each User according to their own constraints, i.e. to provide information and advice that are relevant to the specific User and that the User can act upon without compromising comfort and convenience - the key is to understand barriers to action and target those specifically. To achieve this, features such as fair comparison (fair, as perceived and defined by the User) can enable unique social nudges (i.e., positive reinforcement [9]) towards the behavior change. In this regard, the proposed solution seeks to evaluate the User's energy social practice [10] and energy performance, rather than (just) consumption, so as to enable fair benchmarking and competition among Users and thus create a unique social pressure capable of driving the behaviour change. The performance evaluation is used to contextualize measured energy consumption by normalizing it against a range of 'objective parameters' such as building size, construction material properties, number of occupants, climate conditions etc. The authors have opted for User-oriented and User-tailored approaches as they are likely to improve the ICT effectiveness for two main reasons: first, tailored advice, which fits the user's abilities and constraints, is more likely to be followed and to deliver change; second, successful actions that are visible to the User (in terms of
[Figure: High-level architecture of the proposed solution. Integrated energy monitoring and Home Automation at the building/dwelling/apartment level (meteo, occupancy and consumption sensors; appliances such as washing machine, refrigerator, oven/stove and AC; storage and generation) is coupled with advanced energy services - consumption data analytics, performance evaluation and benchmarking, and integrated energy demand optimization - exposed through an application program interface, user-friendly GUIs, a Web client and social networks.]
The listed functionalities will be used to devise a unique analytics engine with a twofold objective: first, to look for performance deviations by comparing current with previous performance based on measured consumption, which will enable the generation of specific, User-tailored Energy Conservation Measures (ECMs) communicated to the User (e.g. a User may consume more energy under the same outdoor conditions and occupancy due to an open window, a malfunctioning HVAC device etc.); and second, to anticipate wasteful practices based on User-centric energy consumption modelling and consumption forecasts combined with pricing/load/stability information received from ESCOs (e.g. User consumption is approaching the contracted power peak, the electricity tariff will change in the next X minutes, the grid is approaching its limit etc.).

B. Performance Evaluation and Benchmarking

Calculating User energy performance and using it for benchmarking, instead of using only energy consumption, is a pivotal energy service that the proposed platform aims to offer. Therefore, adopting methodologies that enable fair comparison and benchmarking of Users against themselves and against others with similar profiles is the cornerstone of the proposed approach. Energy consumption data, per se, represents required, but not sufficient, information to estimate someone's performance or to enable fair comparison. For example, Users may consume more energy due to a change in outdoor climate conditions, a change in occupancy, or specific work/family requirements or events, while still keeping the same level of performance. The performance evaluation will consider normalization of consumption against a range of relevant parameters such as applicable climate conditions and number of occupants, but also building size, construction material thermal properties etc. This will allow for setting realistic and achievable goals for the Users, rather than unrealistic ones that discourage further engagement and reduce the effectiveness of the technology over time. In particular, the proposed solution will integrate the performance evaluation approach from Energy Star's Portfolio Manager [13], [14], developed by the US EPA, which represents the industry-leading, free-of-charge online tool that allows users to benchmark, track, and manage energy and water consumption and greenhouse gas emissions against similar users/buildings nationwide. It is based on the calculation of the so-called Energy Star Score [15], which is expressed as a number on a simple scale of 1-100 and rates the User's performance on a percentile basis: buildings with a score of 50 perform better than 50% of their peers; buildings earning a score of 75 or higher are in the top quartile of energy performance. Although the comprehensive methodology behind the score calculation takes into account normalization for building type, number of inhabitants, type of their activity etc., it lacks means for better User engagement through the social web that would unleash the full capacity for achieving positive social pressure.

C. Integrated Energy Demand Optimization

Apart from detecting Users' wasteful practices through a set of analytical services, the proposed solution also offers energy conservation and cost reduction measures obtained as the output of an integrated energy dispatch optimization. As a result of such optimisation, an optimal demand profile (highlighting the deviation from the regular practices) is suggested, together with an optimal dispatch (allocation in time) of multiple energy supply options and available energy assets. For this purpose, an optimisation framework based on the Energy Hub concept [16], [17], further extended conceptually during the Epic-HUB project [18], will be employed. The optimizer is based upon a mathematical algorithm for integrated optimisation of energy supply and demand of the target building, taking into account available energy assets, multiple carriers at the supply side (both conventional and renewable energy sources), fixed and variable pricing tariffs, the type of load (critical, reschedulable and curtailable), comfort levels, etc.

D. IoT-based integration platform

Once developed, the advanced energy services are integrated with the corresponding data sources and devices able to perform energy conservation measures, as well as with third-party services. In other words, the technological prerequisite for deployment of the proposed platform is to connect sensors, actuators, and devices to a network and to enable the collection, exchange, and analysis of the generated data. The described constellation will form an IoT network characterized by many devices (i.e. things) that use dedicated gateways to communicate through a network with the platform back-end server, which will run on an open-source IoT platform and will be used to integrate the deployed sensors with the advanced energy services. The integration platform will thus represent an IoT Cloud Platform and will leverage an open-source communication middleware enabling data exchange between the underlying systems and services required for an IoT solution.

The proposed platform will operate on a cloud infrastructure (e.g. OpenShift, AWS, Microsoft Azure, or Cloud Foundry) or inside an enterprise data centre, and will be able to scale both horizontally, to support the large number of connected devices, and vertically, to facilitate deployment of different IoT solutions. The IoT Cloud Platform will ensure the following core features, as indicated in Figure 2:

- It will leverage middleware responsible for the orchestration of system components and services
- It will be able to interact with very large numbers of devices and gateways using different protocols and data formats. Moreover, it will provide a framework for utilization of a common data format (e.g. Canonical Data Model) so as to allow for easy integration into the rest of the platform.
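To make the benchmarking idea from Section B concrete, the percentile-style scoring can be sketched in a few lines. The normalization factors (floor area, occupants) and the peer data below are illustrative assumptions, not the platform's actual model, which would normalize against a richer set of 'objective parameters':

```python
"""Illustrative sketch of performance benchmarking (Section B):
consumption is normalised against 'objective parameters' (here only
floor area and occupancy, as a simplification) and each User is rated
on a 1-100 percentile scale, in the spirit of the Energy Star Score."""
from dataclasses import dataclass

@dataclass
class Dwelling:
    name: str
    kwh: float        # measured monthly consumption
    area_m2: float    # floor area
    occupants: int

def normalised_intensity(d: Dwelling) -> float:
    # Consumption per m2 and per occupant: a crude proxy for
    # 'performance' rather than raw consumption.
    return d.kwh / (d.area_m2 * d.occupants)

def score(d: Dwelling, peers: list[Dwelling]) -> int:
    """Percentile-style score on a 1-100 scale: higher means a lower
    normalised intensity than a larger share of the peer group."""
    mine = normalised_intensity(d)
    worse = sum(1 for p in peers if normalised_intensity(p) > mine)
    return max(1, round(100 * worse / len(peers)))

peers = [
    Dwelling("A", 300, 60, 2),   # 2.50 kWh per m2 per occupant
    Dwelling("B", 450, 90, 3),   # ~1.67
    Dwelling("C", 500, 50, 2),   # 5.00
    Dwelling("D", 280, 70, 4),   # 1.00
]
for d in peers:
    print(d.name, score(d, peers))
```

Note how dwelling C, despite consuming less than nobody in absolute terms, is ranked fairly once size and occupancy are factored in; this is the kind of "fair comparison" the methodology argues for.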
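The dispatch idea from Section C (allocating reschedulable loads to cheap tariff slots without exceeding the contracted power peak) can be illustrated with a toy greedy scheduler. A real Energy Hub optimiser solves a full mathematical program over multiple carriers and load types; the heuristic and all figures below are invented examples:

```python
"""Toy sketch of integrated demand dispatch (Section C): place
reschedulable loads into the cheapest tariff hours while respecting
a contracted power peak. Greedy, single-carrier, for illustration only."""

def dispatch(loads, tariff, base_load, peak_kw):
    """loads: list of (name, kW, hours_needed); tariff: price per hour slot.
    Returns {name: [hour, ...]} choosing the cheapest feasible hours."""
    free = {h: peak_kw - base_load[h] for h in range(len(tariff))}
    plan = {}
    for name, kw, hours in sorted(loads, key=lambda l: -l[1]):  # big loads first
        chosen = []
        for h in sorted(free, key=lambda h: tariff[h]):         # cheapest hours first
            if free[h] >= kw:
                chosen.append(h)
                if len(chosen) == hours:
                    break
        if len(chosen) < hours:
            raise ValueError(f"cannot schedule {name} under the peak limit")
        for h in chosen:
            free[h] -= kw
        plan[name] = sorted(chosen)
    return plan

# Invented example: 4 hour slots, night tariff cheaper, 5 kW contracted peak.
tariff = [0.05, 0.05, 0.20, 0.20]   # price per slot
base   = [1.0, 1.0, 2.0, 2.0]       # non-shiftable load, kW
plan = dispatch([("washer", 2.0, 1), ("boiler", 3.0, 2)], tariff, base, peak_kw=5.0)
print(plan)
```

In this example the boiler fills both cheap slots, so the washer is pushed into an expensive slot by the peak constraint - exactly the kind of trade-off (tariffs versus contracted peak) the optimiser has to resolve.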
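The "common data format" bullet from Section D can likewise be sketched: payloads arriving from heterogeneous gateways are mapped onto one canonical reading before entering the platform. The two vendor formats and all field names below are invented for illustration; a real deployment would define the canonical model (e.g. a Canonical Data Model schema) up front:

```python
"""Sketch of gateway payload normalisation into a common data format
(Section D). Input shapes are hypothetical examples."""
import json
from datetime import datetime, timezone

CANONICAL_KEYS = ("device_id", "metric", "value", "unit", "ts")

def from_vendor_a(payload: dict) -> dict:
    # e.g. {"id": "meter-7", "kWh": 1.42, "t": 1490000000}
    return {"device_id": payload["id"], "metric": "energy",
            "value": payload["kWh"], "unit": "kWh",
            "ts": datetime.fromtimestamp(payload["t"], tz=timezone.utc).isoformat()}

def from_vendor_b(payload: dict) -> dict:
    # e.g. {"sensor": "temp-3", "reading": {"v": 21.5, "u": "C"}, "time": "..."}
    return {"device_id": payload["sensor"], "metric": "temperature",
            "value": payload["reading"]["v"], "unit": payload["reading"]["u"],
            "ts": payload["time"]}

def to_canonical(source: str, raw: str) -> dict:
    """Dispatch to the right adapter and enforce the canonical shape."""
    adapters = {"vendor_a": from_vendor_a, "vendor_b": from_vendor_b}
    record = adapters[source](json.loads(raw))
    assert tuple(record) == CANONICAL_KEYS
    return record

print(to_canonical("vendor_a", '{"id": "meter-7", "kWh": 1.42, "t": 1490000000}'))
```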
Abstract— The agribusiness industry and food production deal with perishable products, unpredictable production, supply chain management variations, the tendency to maximize waste reduction, and adherence to strict regulations related to food safety. The Internet of Things (IoT) includes several technologies that can be put into practice in order to respond to all these demands and resolve challenges, as it allows remote control of the production conditions and the distribution of products. In this paper, an architecture for IoT-based information systems in seed-to-fork food production is proposed. The architecture includes IoT from the first to the last phase of production and distribution, and even consumer feedback. The authors propose a hybrid solution that combines the IoT and the Future Internet.

I. INTRODUCTION

Freshness is one of the most important attributes of food products for consumers. Therefore, when purchasing food products, consumers always tend to buy products which are fresher, due to the fact that the fresher the product, the higher its nutritional value. In order to determine the freshness of a product, it is common to inspect its visual characteristics, or to observe the expiration date to get an idea of the freshness. This traditional method of determining the freshness of the product is not completely reliable. Namely, when a food product is kept for a long time in cold storage and then brought to the point of sale, the buyer may think that the product can retain freshness for several days. This could indeed be the case if the product was stored correctly. However, at that moment the buyer does not have any data on whether the product was stored correctly or not. He/she can only evaluate the freshness of the product based on its visual characteristics and draw a conclusion about how the product was stored. Therefore, it is very important that every customer has an opportunity to get the data from the complete product life cycle. One solution is to use the IoT (Internet of Things) for product tracking during its life cycle.

The Internet of Things (IoT) is a network of devices, equipment, and machines capable of interacting with each other [1]. Among many definitions of IoT, the following are most common:

- The interconnection via the Internet of computing devices embedded in everyday objects, enabling them to send and receive data [2].
- IoT is a recent communication paradigm that envisions a near future in which the objects of everyday life will be equipped with microcontrollers, transceivers for digital communication, and suitable protocol stacks that will make them able to communicate with each other and with the users, becoming an integral part of the Internet [3].
- IoT is the network of physical objects that contain embedded technology to communicate and sense or interact with their internal states or the external environment [4].

The IoT can be used in everyday life [5], in health-care applications [6], military applications [7], rescue operations [8], and other applications for monitoring, tracking and control systems [9]-[12].

Having in mind such a wide range of IoT applications, it is not a surprise that the usage of IoT in the monitoring of agricultural products is also well developed and adopted. Most consumers are aware that, using their mobile devices (and a connection to the Internet), they can easily reach important information about a product. Based on this information, in many cases, the consumer will make the final decision about buying a product. This behaviour of consumers is becoming a global trend. In order to facilitate and accelerate the applications that will make up the Future Internet, the European Union supported the development of FIWARE. FIWARE is a platform for the development and global deployment of applications for the Future Internet.

This platform can be very helpful in putting information collected by the IoT into practice. FIWARE provides an enhanced, OpenStack-based cloud environment, plus a rich set of open standard APIs that make it easier to connect to the IoT, process and analyse Big Data and real-time media, or incorporate advanced features for user interaction [9].

In this paper, a reference architecture of an IoT-based information system in seed-to-fork food production built on FIWARE is presented.

II. INTEROPERABILITY OF IOT AND FIWARE

The Internet of Things has been identified as one of the emerging technologies in IT, as noted in the Hype Cycle [10] (Fig. 1). The Hype Cycle is a way to represent the emergence, adoption, maturity, and impact on applications of specific technologies. It has been forecast that the IoT will take 5-10 years to reach market adoption [10].

Gartner also forecasts [11] that the IoT will reach 26 billion units by 2020, up from 0.9 billion in 2009, and will impact the information available to supply chain partners and how the supply chain operates.

IoT is built on three pillars, related to the ability of smart objects to [12]:
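Before turning to the reference architecture, the FIWARE interaction described above can be sketched briefly. FIWARE context management is typically exposed through the Orion Context Broker's NGSI v2 REST API, where an entity is created by POSTing a JSON document to `/v2/entities`; the entity id, attribute names and broker address below are illustrative assumptions, and since sending the request requires a running broker, this sketch only constructs the payload:

```python
"""Hedged sketch: building an NGSI v2 entity for a seed-to-fork
shipment, to be registered with a FIWARE Orion Context Broker.
All identifiers are invented examples."""
import json

ORION = "http://localhost:1026"   # assumed broker address

def shipment_entity(shipment_id: str, temp_c: float, humidity: float) -> dict:
    """In NGSI v2 every attribute carries an explicit value and type."""
    return {
        "id": f"urn:ngsi-ld:Shipment:{shipment_id}",
        "type": "Shipment",
        "temperature": {"value": temp_c, "type": "Number"},
        "humidity": {"value": humidity, "type": "Number"},
    }

entity = shipment_entity("042", 4.2, 87.0)
body = json.dumps(entity)
# To register the entity one would POST `body` to f"{ORION}/v2/entities"
# with the header Content-Type: application/json.
print(body)
```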
III. DESIGNED REFERENCE ARCHITECTURE

The system described in this paper allows traceability of agricultural products and all their lifecycle phases from seed to fork. This allows final users to get safe, high-quality products for the best price. Also, the other entities involved receive data which allow them to improve their business and optimize their services, resulting in higher income and more satisfied clients. The proposed system can be divided into three main groups: seed-to-fork stakeholders, the seed-to-fork service and IoT devices. Along with these three groups, the system also contains two connectors: a web user interface for the connection between stakeholders and the seed-to-fork service, and a device manager for the connection between IoT devices and the seed-to-fork service. The reference architecture of the implemented system is shown in Fig. 2.

Seed-to-fork stakeholders represent the different users of the system. Stakeholders can be: seed suppliers, farmers, logistics services, warehouse management services, production services, laboratories, auction services, retailers, customers and authorities. Seed suppliers can track the yield of their products (seed) in various conditions and get feedback on their product from other users of the system (most often farmers, laboratories, customers, etc.). Farmers have all the information needed about their business in one place, can easily print out reports, get feedback from customers, organize transport and storage of their products, and get professional advice and training. Logistics and warehouse management services allow easy organization of transportation and storage of agricultural products for all involved users. The system allows logistics and warehouse management services to utilize their capacities to the maximum. Production services enable the transformation of raw products into products that are ready to be used by the final customer. The service allows production to get the cheapest and best quality product, as well as to receive feedback on their output products. Laboratories enable the tracking of agricultural product quality, as well as of the quality of the whole product lifecycle. In case the product is not within its desired quality boundaries, the system allows detection of problematic links in the product lifecycle and notifies all stakeholders related to the problematic product. Auction services enable buyers to have access to all the information they need about the products they are buying, and allow sellers to have an estimated selling price, so they can easily plan their budget. Retailers can choose the best products for their customers, based on feedback left by customers and on information about the product. Customers can find products that best fit their needs, ensuring that they have a safe, quality product for the best price. Authorities can easily track every link in the product lifecycle, ensuring that the whole system is functioning according to regulations, and informing all involved entities with relevant information, such as changes in regulations.

The seed-to-fork service allows collecting and processing data about agricultural products and their lifecycle phases. Seed-to-fork services include the following: sensor data, seed purchase, farmer production and monitoring, production planning and monitoring, production analysis, product quality forecast, warehouse management, transport planning and monitoring, delivery planning, quality monitoring (process and product), return product, and problem handling. Sensor data enables the collection of all relevant data about a product and its lifecycle phases, as well as the storage of the collected data into relevant databases. Seed purchase enables purchasing the best quality seeds from proven and reliable seed suppliers. Farmer production and monitoring provides tracking of all relevant information about the farmer's production, as well as planning of future production. Production planning and monitoring provides tracking of all information related to production and planning of future production, based on other available information. Production analysis enables detection of potential problems and bottlenecks, and its results are used to increase production capacities, reduce costs and provide better management of resources. Product quality forecast is used to dynamically calculate the expiration date and quality of a product, based on information from production, transportation and storage. Warehouse management and transport planning and monitoring enable optimal usage of warehouse and transportation capacities and finding optimal storage and transportation for products, based on the conditions required for the specific product. Delivery planning enables the cheapest and fastest delivery of products to their final customers, based on timetables defined by both suppliers and final customers. Quality monitoring (process and product) enables feedback from all related entities in the product's lifecycle and analysis of data from laboratories, and thus assessing the quality of both the product itself and the whole seed-to-fork process (complete product lifecycle). Return product enables the final user to return products of unsatisfactory quality, as well as to receive a refund. Return product also allows identification of the problem source and notification of all problem-related stakeholders in the product's lifecycle. Problem handling allows defining strategies and procedures in case of problems or issues, for example, the withdrawal of a product from the market because of high pesticide contamination.

For the implementation of the seed-to-fork service the FIWARE platform is used. Fig. 3 shows the FIWARE architecture of the implemented system.
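The "product quality forecast" service described above, which dynamically calculates the expiration date from production, transportation and storage data, can be illustrated with a short sketch. The Q10-style spoilage model and all numbers below are generic textbook assumptions, not the paper's actual method:

```python
"""Illustrative sketch of a dynamic shelf-life forecast: remaining
shelf life is computed from the logged temperature history instead
of a fixed expiration date. Model and figures are assumptions."""

def remaining_shelf_life(temps_c, shelf_life_h_at_ref=240.0, ref_c=4.0, q10=2.0):
    """temps_c: hourly storage temperatures logged so far.
    Each hour consumes shelf life faster when warmer: with q10=2 the
    spoilage rate doubles for every 10 C above the reference temperature."""
    consumed = sum(q10 ** ((t - ref_c) / 10.0) for t in temps_c)
    return max(0.0, shelf_life_h_at_ref - consumed)

cold_chain  = [4.0] * 48                 # stored correctly for two days
broken_link = [4.0] * 24 + [14.0] * 24   # one day at 14 C during transport

print(remaining_shelf_life(cold_chain))   # 192.0 hours left
print(remaining_shelf_life(broken_link))  # fewer hours left
```

This is exactly the situation from the introduction: two products with the same label date can have very different remaining shelf lives once the cold-chain history is taken into account.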
on the type of users and roles. PEP Proxy - Wilma is completely integrated with the FIWARE ecosystem and FIWARE account, and together with the Authorization PDP and Identity Management it provides a complete and robust security system for the application. The PEP Proxy represents a secure wrapper for the implemented web services, which provides access control for these services based on the data from the previous two security components. The Application block represents web and mobile applications that can be developed to meet the needs of specific users (farmer, transporter, consumer, etc.).

IV. CONCLUSION

In this paper an architecture for IoT-based information systems in seed-to-fork food production is presented. The described architecture allows the collection of data related to food products through the whole product lifecycle, from the production of seeds, sowing, cultivation, harvest, processing, storage, and distribution to the final customer. All stakeholders can easily exchange data with each other, in order to facilitate and accelerate data processing. This is especially important for enterprises that consist of independent entities, and for associations of several independent enterprises. Enterprises can improve their business by analyzing the data in the system, which can decrease the costs of production, storage and delivery, and reduce waste generation. On the other side, final customers can buy quality, safe food products at reasonable prices, with the possibility of returning food products if they are not satisfied with their quality.

In future work, further development of the service for seed-to-fork food production is planned. This means the implementation of different methods for data processing and data classification, and of artificial intelligence methods for prediction. Another option is to enable enterprises to integrate third-party components in order to further improve their business by adapting the seed-to-fork food production service to their specific needs.

REFERENCES
[1] S. Stankovski, G. Ostojić, X. Zhang, "Influence of Industrial Internet of Things on Mechatronics," Journal of Mechatronics, Automation and Identification Technology, vol. 1, pp. 1–6, January 2016.
[2] http://www.oxforddictionaries.com/definition/english/Internet-of-things, Retrieved 16 June 2015.
[3] L. Atzori, A. Iera, and G. Morabito, "The internet of things: A survey", Comput. Netw., vol. 54, no. 15, pp. 2787–2805, 2010.
[4] http://www.gartner.com/it-glossary/internet-of-things/, Retrieved 16 June 2015.
[5] A.B. Zaslavsky, C. Perera and D. Georgakopoulos, Sensing as a service and big data. In: International Conference on Advances in Cloud Computing (ACC-2012), Bangalore, India, July 2012, pp. 21–29.
[6] L. Da Xu, W. He and S. Li, "Internet of things in industries: A survey", IEEE Trans. on Indust. Inform., vol. 10(4), pp. 2233–2243, 2014.
[7] D. Singh, G. Tripathi and A.J. Jara, A survey of internet-of-things: Future vision, architecture, challenges and services. In: 2014 IEEE World Forum on Internet of Things (WF-IoT), Seoul, Korea (South): IEEE, March 2014, pp. 287–292.
[8] J. Gubbi, R. Buyya, S. Marusic and M. Palaniswami, "Internet of things (IoT): A vision, architectural elements, and future directions", Futur. Gener. Comp. Syst., vol. 29(7), pp. 1645–1660, 2013.
[9] https://www.fiware.org/developers-entrepreneurs/
[10] Gartner's hype cycle special report for 2011, Gartner Inc., 2012. http://www.gartner.com/technology/research/hype-cycles/
[11] Gartner. (2014, March 19). Gartner says the Internet of Things will transform the data center. Retrieved from http://www.gartner.com/newsroom/id/2684616
[12] D. Miorandi, S. Sicari, F. De Pellegrini and I. Chlamtac, "Internet of things: Vision, applications and research challenges", Ad Hoc Networks, vol. 10, pp. 1497–1516, 2012.
[13] K. Finkenzeller, RFID Handbook: Fundamentals and Applications in Contactless Smart Cards and Identification, John Wiley & Sons, Ltd, ISBN: 978-0-470-84402-7, New York, 2003.
[14] L. Atzori, A. Iera and G. Morabito, "The Internet of Things: A survey", Comp. Netw., vol. 54(15), pp. 2787–2805, 2010.
[15] http://www.cio.com/article/2945732/cio100/the-internet-of-things-now-includes-the-grocery-stores-frozen-food-aisle.html
[16] Elizabeth Weise, Amazon just opened a grocery store without a checkout line, USA TODAY, Dec. 9, 2016.
[17] https://forge.fi-ware.org/plugins/mediawiki/wiki/fiware/index.php/FIWARE.OpenSpecification.Cloud.DCRM
[18] G. Ostojić, S. Tegeltija, M. Lazarević, D. Oros, S. Stankovski, Implementation of IoT and FIWARE in Production Systems, Proceedings of XIII International SAUM Conference on Systems, Automatic Control and Measurements, Niš, Serbia, November 9–11, 2016.
[19] C.N. Verdouw, R.M. Robbemond, T. Verwaart, J. Wolfert, A.J.M. Beulens, "A reference architecture for IoT-based logistic information systems in agri-food supply chains," Enterprise Information Systems, DOI: 10.1080/17517575.2015.1072643, 2015.
Abstract — In this paper we propose a simple method for measuring the similarity of interests between Facebook users and grouping them with this measure in mind. The ultimate goal is to build a web application which will integrate with Facebook and offer users the ability to find other users similar to them, starting from the premise that users are similar as much as their interests match. A user of this application will be able to look through a list of similar users (sorted by descending degree of mutual similarity) and see the particular shared interests with each of those similar users.

I. INTRODUCTION

Social networks have emerged as a large factor in information dissemination, search, marketing, expertise and influence discovery, and are potentially an important tool for mobilizing people. Social media has made social networks ubiquitous, and has also given researchers access to massive quantities of data for various types of analyses. These data sets are a strong foundation for studying the dynamics of individual and group behavior, the structure of networks and global patterns of the flow of information between them. Sometimes, the shape of underlying networks is not directly visible, but can be inferred from the flow of information from one individual to another [1].

With the great expansion of Facebook and Twitter (which appeared in 2004 and 2006, respectively), various parties have shown their need for tracking the behavior and activities of social media users, from the founder companies to data scientists and enthusiasts. Data collected from social media can be used for a variety of purposes, so every party which investigates it may have its own point of view and method of interpretation.

Large amounts of data collected from social networks have found their role in social media marketing. By some definitions, "social media marketing" encompasses advertising and promotional efforts that use social media Web sites [2]. It is a form of viral marketing, a term coined by Harvard professor Jeffrey F. Rayport in 1996 to illustrate how a message spreads through an online community rapidly and effortlessly [3]. Marketing agencies are heavily involved in social media data analysis, and are mostly occupied with collecting large amounts of such data, which is more or less publicly available. The majority of such agencies base their insights into the state of the market on knowledge gathered from a variety of social networks. Groups of users which show similar aspirations on the network may adopt a particular product with greater success than a group which is composed of people with completely different viewpoints.

Social media companies also have the benefit of collecting data for themselves. Twitter and Facebook make a large part of their profit through the sale of such data, or conclusions made from the analysis of it. On the other hand, they offer some kind of recommendation services to their users, which rely on paid advertising. Also, users are given the opportunity to find new connections inside or outside of the boundaries of a particular social network, and this mechanism is also based on the analysis of user data which could be collected by the companies themselves.

In increasingly frequent situations nowadays, data originating from social media can be an object of interest to public authorities, such as the police, as well as a range of government agencies, because of a possibility of fraud such as impersonation, pedophilia and sex trafficking.

In this paper, the focus is on Facebook, because it is the world's most popular social network, and makes available an application programming interface, or API - Graph API [4], through which various data from Facebook can be easily consumed, and later analyzed. Also, this research uses some user information which is not publicly available, and for that purpose utilizes appropriate Access Tokens [5].

This paper is laid out in the following chapters: the first chapter gives an introduction into the research conducted in this paper, as well as the motivation for the research. Chapter II gives an overview of some related work. Chapter III outlines basic facts about the data model and the possibilities it offers, together with a brief description of Facebook's Graph API, while Chapter IV describes the chosen method of determining the measure of similarity between Facebook users. Chapter V shows the results of the proposed method, concluding with directions for future research.

II. RELATED WORK

As noted, this section deals with existing related work in the area of finding similar social network users. Since the people who have conducted research in this field so far have different backgrounds, they have yielded a few possible solutions distinct by approach. In this chapter we will present three classes of approaches: sociological approaches in subsection A, approaches based on queries in subsection B, and finding clones in subsection C.

A. Approaches relying on social relationships

This class is based on social relationships existing among users. In many cases, this information is instrumental in producing suggestions (e.g. for friendship relationships or community membership). In particular, the approach shown by Spertus, Saham & Buyukkokten in 2005 [6] analyzes the affiliation of users to multiple virtual
communities and tells them if they should join new ones. For this purpose, their approach considers Orkut, once a big social network operated by Google, as a reference scenario, and experimentally compares the effectiveness of a range of techniques for computing user similarities (e.g. tf-idf coefficients, or parameters coming from Information Theory).

The AboutMe system by Geyer, Dugan, Millen, Muller & Freyne from 2008 [7] is able to complete the profile of a user u by examining the list of topics used by his acquaintances in a social network. The resulting profiles are more accurate, and ultimately, are relevant to enhancing user participation in social activities.

The approach of Groh & Ehmig from 2007 [8] suggests using friendship lists to identify resources relevant to users. In particular, this approach handles the friendship list of a user u, and the ratings that the users on this list assigned to an object o, to predict the rating that u would assign to o.

The effectiveness of these approaches, however, crucially depends on the number of social relationships created by users. In fact, if a user is involved in few friendship relationships, the information at our disposal is poor, and thus the quality of suggestions will inevitably be poor [9].

B. Approaches based on queries

To model similarity, this class of approaches assumes the existence of a set of queries, and two users are deemed similar if their answers to these queries are (mostly) identical. Technically, each user has a vector of preferences (answers to queries), and two users are similar if their preference vectors differ in only a few coordinates. The preferences are unknown to the system initially, and the goal of the algorithm is to classify the users into classes of roughly the same preferences by asking each user to answer the least possible number of queries. This type of method can prove nearly matching lower and upper bounds on the maximum number of queries required to solve the problem. Specifically, it presents an "anytime" algorithm that asks each user at most one query in each round, while maintaining a partition of the users. The quality of the partition improves over time: for n users and time T, groups of Õ(n/T) users with the same preferences will be separated (with high probability) if they differ in sufficiently many queries. Also, it presents a lower bound that matches the upper bound, up to a constant factor, for nearly all possible distances between user groups [10].

C. Finding clones

The viewpoint of this approach is that social networks provide a framework for connecting users, often by allowing one to find his preexisting friends. Some social networks even allow users to find other users based on a particular interest or tag. However, users would ideally like the option of finding people who share many of their interests, thus allowing one to find highly similar, like-minded individuals whom they call "clones". This approach explores a means for finding "clones" within a large, sparse social graph, and it presents its findings that concern patterns of shared interests. Additionally, holders of these ideas explore how this correlates with connectivity and degrees of separation in a social graph. Since analyzing social media deals with large amounts of data, the number of possible clones in a given dataset is enormous (for n users in a dataset there are n²/2 possible clone pairs). Computing similarity values for each of these pairs is practically infeasible. Hence, this approach offers a way to first detect candidate pairs for clones, and later compute the actual similarity. To generate candidate pairs, Min-Hashing and Locality Sensitive Hashing (LSH) may be employed [11]. This improvement is a valuable contribution of this approach, and can be used in other methods as well.

III. PRELIMINARIES

The approach proposed in this paper consists of using n-dimensional vectors to represent Facebook users and calculating the mutual similarity between each possible pair using a few simple methods for measuring the distance between two vectors in n-dimensional space. Since this paper is focused on Facebook, it is necessary to give an overview of the fundamentals of the way Facebook content (especially page) categorization works.

A. Facebook pages categorization

Facebook categorizes its content by employing a wide range of methods, and the most important method in this paper is Facebook Page categorization. When a user or organization wants to make a Facebook page, it must provide a short description and a couple of categories which best suit it. Hinging on the entered categories, each page may be classified into one or more categories. Each category can succeed some categories, as it can be placed as the ancestor of none or plenty of other categories. Thus, if we pay attention to Facebook's official page, we will see that it belongs to two categories - Product and Service. Since Facebook users are inclined to "like" pages, Facebook records such behavior. This implies that every page "like" a user has executed is stored somewhere in Facebook's system, and can be accessed through its services, which are available to third-party Facebook applications. Using the appropriate permissions, we are capable of using that data to represent a user. Hence, the set of all known Facebook user interests in this work consists of all categories currently present in the system, and it dynamically increases through the registration of new users. When a new user joins the system, the categories of his likes are filtered, and all the categories that are not yet present in the system will be added afterwards.

B. Modeling user

All approaches mentioned in the previous chapter can utilize n-dimensional vectors as a model of a concrete user on Facebook. N-dimensional vectors are encouraged as the main data structure throughout this work, and all algorithms will be executed against them. The problem of measuring the distance between two n-dimensional vectors comes as a logical continuation. The following subsections deal with […]
In accordance with that, we can now also assume a user who has "liked" 3 different movies, 2 distinct sport clubs, and 7 bands; in that case its corresponding vector can consist of some other positive integers and zeros. One such vector could be:

[3, 2, 0, 7, 0, 0]

The next issue is how to calculate the distance between two points in the described n-dimensional coordinate system, in which n represents the number of categories currently known to the system. In order to achieve a measure of similarity, we can measure the angle formed by two vectors representing users, or directly calculate the Euclidean distance between them. In a two-dimensional coordinate system, this problem (shown graphically in Figure 1) is easily recognizable.

[…, GAE, Web, Design, UX, Android, API]

And that there is one referent vector against which we want to measure distance (given as referent in the following equation) in the coordinate system defined by the categories structure:

referent = [Publishing, WEB, API]

And vectors A, B, C and D are those whose distance is going to be measured:

A = [Algorithms, Programming, Mining, Python, Ruby]
B = [Publishing, Server, Cloud, Heroku, Jekyll, GAE]
C = [Web, Design, UX]
D = [Python, Android, Mining, Web, API]
Let's see what evidence induces this kind of behavior. In Figure 2 the referent vector is shown above, and vectors D and B are shown below.

similarity = cos(θ) = (A · B) / (‖A‖ · ‖B‖) = (Σⁿᵢ₌₁ AᵢBᵢ) / (√(Σⁿᵢ₌₁ Aᵢ²) · √(Σⁿᵢ₌₁ Bᵢ²))

The value of the similarity in this approach is between -1 and 1. If the similarity is equal to -1, then the considered vectors are completely different, and if it is equal to 1, then they are completely similar.

F. Pearson Correlation Coefficient

The Pearson Correlation Coefficient (sometimes referred to as the PPMCC or PCC [12]), when used as a method for measuring the similarity of n-dimensional vectors, represents a slightly more sophisticated technique which actually measures the linear correlation of the considered vectors. The actual similarity is calculated by the following equation:

similarity = (Σⁿᵢ₌₁ aᵢbᵢ − (Σⁿᵢ₌₁ aᵢ · Σⁿᵢ₌₁ bᵢ)/n) / √((Σⁿᵢ₌₁ aᵢ² − (Σⁿᵢ₌₁ aᵢ)²/n) · (Σⁿᵢ₌₁ bᵢ² − (Σⁿᵢ₌₁ bᵢ)²/n))

G. Social graph

The social graph in the Internet context is a graph that depicts the personal relations of Internet users. In other words, it is a social network, with the word "graph" taken from graph theory to emphasize the rigorous mathematical analysis applied (as opposed to the relational representation in a social network). Mark Zuckerberg, the founder of Facebook, has referred to the social graph as "the global mapping of everybody and how they're related" [13]. The term was popularized at the Facebook F8 conference on May 24, 2007, when it was used to explain how the newly introduced Facebook Platform would take advantage of the relationships between individuals to offer a richer online experience [14]. The definition has since been expanded to refer to a social graph of all Internet users. Figure 3 illustrates the structure of a social network [15].

Facebook's Graph API is the primary way for applications to read and write to the Facebook social graph [4]. The Graph API has multiple versions available, and for this research version 2.2 is used. This API is Facebook's official mechanism for making it possible to perform many different queries, such as fetching data about certain events, apps, groups, pages and users, as well as user-specific data, such as a user's timeline, his friends, photos, etc. A complete list of root nodes of the Graph API version v2.2 is available at https://developers.facebook.com/docs/graph-api/reference/v2.2.

The Graph API is secured by an authentication flow based on the OAuth 2.0 protocol, exposed through Facebook Login. Therefore, every application using the Graph API must provide a valid access token obtained through the noted authentication flow. The received access token is a so-called short-lived access token, and it expires after approximately 2 hours. However, this short-lived access token can be exchanged for a long-lived access token, which expires after 60 days.

I. Collecting the data used in this research

This work uses User Access Tokens, which means that users must give their permission explicitly at the end of the Facebook Login process. Generally, on the first page of our application, users are faced with the proposed privacy policy, and only if they decide to agree with it and provide us their tokens are we able to obtain information from the Graph API. This information consists of a list of the user's […]
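The two similarity measures defined by the equations above (cosine similarity and the Pearson correlation coefficient) can be sketched in a few lines. This is a minimal illustration, not the paper's code; the function names are ours, and the example vectors are the category-count vectors used in the paper's worked example:

```python
import math

def cosine_similarity(a, b):
    # similarity = (A · B) / (‖A‖ · ‖B‖), a value in [-1, 1]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def pearson_similarity(a, b):
    # similarity = (Σab − Σa·Σb/n) / sqrt((Σa² − (Σa)²/n) · (Σb² − (Σb)²/n))
    n = len(a)
    sum_a, sum_b = sum(a), sum(b)
    sum_ab = sum(x * y for x, y in zip(a, b))
    sum_a2 = sum(x * x for x in a)
    sum_b2 = sum(y * y for y in b)
    num = sum_ab - sum_a * sum_b / n
    den = math.sqrt((sum_a2 - sum_a ** 2 / n) * (sum_b2 - sum_b ** 2 / n))
    return num / den

a = [4, 5, 6, 7, 8]
b = [4, 5, 0, 7, 8]
print(cosine_similarity(a, b))   # ≈ 0.90
print(pearson_similarity(a, b))  # ≈ 0.51
```

Because like counts are never negative, cosine similarity stays in [0, 1] for this data, while the Pearson coefficient centers the vectors first and so can still be negative.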
The system possesses knowledge of some of the categories, and a vector which describes all known categories is given below:

categories = [Algorithms, Programming, Mining, Python, Ruby]

There are also four users currently using the system:

- User A, who has liked 4 pages that Facebook categorized as "Algorithms", 5 pages categorized as "Programming", 6 pages in the "Mining" category, 7 "Python" pages, and 8 "Ruby" pages. This user is then described with the following vector:

A = [4, 5, 6, 7, 8]

- User B liked the same pages as A, except those which are categorized as "Mining". B is absolutely not interested in that sort of thing, so it is described with:

B = [4, 5, 0, 7, 8]

- User C has the same interests as B, but C liked one less page from the "Ruby" category. User C is described with:

C = [4, 5, 0, 7, 7]

Combination of Cosine similarity and Jaccard index

All methods described in the previous parts of this paper do not care about exactly the same pages being liked by both compared users, but only about page categories. This fact was the motivation for the introduction of the Jaccard index.

The Jaccard index, also known as the Jaccard similarity coefficient (originally coined coefficient de communauté by Paul Jaccard), is a statistic used for comparing the similarity and diversity of sample sets. It measures similarity between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets, as given in the equation below:

J(A, B) = |A ∩ B| / |A ∪ B| = |A ∩ B| / (|A| + |B| − |A ∩ B|)

Facebook itself uses algorithms which recommend users to each other if they have liked the same pages. The idea of combining cosine similarity with the Jaccard index strives to merge the good parts of Facebook's solution into the results of this work.
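The Jaccard index over liked pages, and its combination with category-level cosine similarity, can be sketched as follows. Treating each user's liked pages as a set of page IDs is our reading of the text, and taking the product of the two scores is our own assumption, since the paper does not fix the combining operator here:

```python
import math

def jaccard_index(pages_a, pages_b):
    # |A ∩ B| / |A ∪ B| over the sets of liked page IDs
    union = len(pages_a | pages_b)
    return len(pages_a & pages_b) / union if union else 0.0

def cosine_similarity(u, v):
    # Category-level similarity of two like-count vectors.
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def combined_similarity(u, v, pages_a, pages_b):
    # Hypothetical combination: weight the category-level cosine score
    # by the page-level Jaccard overlap.
    return cosine_similarity(u, v) * jaccard_index(pages_a, pages_b)

pages_a = {"page1", "page2", "page3"}  # illustrative page IDs
pages_b = {"page2", "page3", "page4"}
print(jaccard_index(pages_a, pages_b))  # 2 shared / 4 total = 0.5
```

The product rewards users who not only like the same kinds of pages but also many of the exact same pages, which is the stated motivation for introducing the Jaccard index.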
V. RESULTS
Abstract — Computational systems, such as the transactional systems discussed in this work, provide information during a match in any sport. This information helps the coach to decide about possibilities for reorganizing the team and to analyze situations in which the team has deficiencies. In this way, it is possible to act on these deficiencies to improve the team's technical and tactical performance. The objective of this study was to present the development of a digital system for the analysis of games in the Paralympic sport of goalball. The information from the scout analysis reflects the reality of the facts and helps the coach to modify the lineup in the best way to win the game. In some cases, a player who performs a wonderful move may have committed three or four compromising errors, so it is an illusion to believe that the player is performing well in the block. This model of system led to an improvement in the understanding of the game by the players, aiding the "feedback" between the technical staff and the athletes and helping to identify problems that occur with players during drills and games. The reports produced at run time showed that it is possible to analyze the team during the match. The basic charts generated offer a preview of the team's results in real time. Another point considered is the volume of information acquired during the match, due to the ease of data management. We conclude that the results obtained indicate that the digital system for analyzing goalball games is a necessary means for technical and tactical analysis, both individually and collectively, serving as a tool for the preparation of strategies for drills and games and helping in the decision-making process.

I. INTRODUCTION

This information helps the coach to decide about possibilities for reorganizing the team and to analyze situations in which the team has deficiencies. Thus, it is possible to act on deficiencies to improve the team's technical and tactical performance.

The clear majority of scout systems are transactional systems that require more than one operator to capture all information on performance during the match [24]. The information can also be collected after recording the game, confirming the performance of the team through video [7].

Although many people still consider it a sport where luck or exploiting chances determines the outcome of the games, and many coaches continue to use conservative methods in their training, this subjectivity has, little by little, yielded to scientifically substantiated interpretations.

The evolution of goalball is characterized by high physical, technical, nutritional and psychological requirements, besides the tactical aspect, which is becoming a decisive factor in obtaining success for a team [9]. Current goalball has gained much more dynamism and requires that players be in constant movement, with or without the ball. This has transformed specific tactical preparation into an area of growing interest, assigned to digital game analysis systems [24].

Given this fact, this study approached the development of a digital application for the analysis of games in the sport of goalball.
Various types of data, information and knowledge can be influencing factors in decision making [22]. According to [21], information is defined "as data that has been converted into a meaningful and useful context for specific end users." The right information at the right time can lead business executives, technical committees of sports teams and others to make decisions that make them succeed in carrying out their activities. Quality information is information that is available in a timely manner for decision making and is as accurate as possible [14].

According to Balieiro [11], a scout system is a statistical system that monitors a certain team in the fundamentals of volleyball, or a certain team (usually the opponent) with respect to the performed plays (attack position at the net, height and speed of the ball, among others). Scout systems assist in decisions about player replacement, serve distribution, and the position and direction of the serve over a match. During training, this helps to define which fundamentals deserve more attention, and which players must accomplish specific training activities, among others.

B. The Goalball modality

Goalball is a match between two teams of three players each, which throw balls against each other alternately in order to score goals against the opponent. Despite the different visual classifications of the athletes, all compete together and blindfolded, so that no one is disadvantaged [25]. […] opponents with body contact with this area. As the defense area is the main point of reference for the spatial orientation of the players, there are different markings inside it, differentiating it from the other areas, which only have the rectangular external marking [11]. These internal markings in the team area are references for the positions of the players: left wing, right wing and center (or pivot).

The game space is the same size as a volleyball court (18 meters long and 9 wide). The goals, which are at the end line of each side, measure 9 m wide by 1.30 m tall [9]. Each half of the court is divided into three areas of 3 x 9 m: the attack (or launch) area, the guide area and the neutral area.

Players cannot invade the opponent's half-court and, limited by their attack area, must make the ball hit the ground or bounce towards the opponent's goal. The rules do not allow the release of high balls; therefore, the first contact of the ball with the ground after its release by the player must obligatorily happen within the team's own attack area. In the image below we can see the goalball court with its respective dimensions and block sectors, adapted from Morato 2012 [13].

Therefore, this work aims to adapt this scout methodology for goalball and to develop it for digital platforms, based on studies by Morato [2] and Tosim [7], which present the algorithm sequence for the dynamics of the goalball game, in the image below:
coach to modify the team in the best way to win the game. At the end, this monitoring system can generate a statistical report of performance data for each athlete by fundamentals, which in the hands of an appropriate professional can help modify the team in the most effective way, or combine the best moves.

C. Game Analysis Systems

One of the sports that makes most use of technology is fencing. One of the first innovations was the creation of a vest made of metal which, when touched by the electrified sword, sends a signal indicating the touch [17]. As in fencing, taekwondo also usually uses sensors to detect the touch; this sensor is below the padded vest used to lessen the impact. The brutality turned into something more technical, with more beauty [3].

In tennis, the technology is already much more advanced. It is called Hawk-Eye, and it consists of a system of cameras strategically placed around the court to capture the ball from all angles [10]. The images are passed from the cameras to a central computer, which replays the path of the ball in 3D, where one can see precisely where the ball fell. This system is triggered only when an athlete feels harmed by the referee's decision. Decisions became fairer, more accurate and more exciting, which helped the credibility of the sport and the product [4].

The simplest systems are based on notes made on paper by a "trained analyst" recording the frequency of hits and misses of the fundamentals executed by each athlete during the match. The operator can watch the match live (in the stadium or on television) or may work from recorded images [27]. Although this is laborious and can only handle a small amount of information at a time, coaches still use paper notes [24]. For COSTA [12], when done in a structured manner, such records provide relevant information on the performance of athletes and/or the team; of course, the paper record can be replaced by electronic means, such as a computer, tablet or smartphone, facilitating the processing of the information.

A very intense form of being in the world is through the possibilities that vision allows us. Being in the world through the design and recognition of the world around us engages the shapes, colors and primary sensations of being human. It is through these relationships and associations that we converge our sensorimotor actions [19].

The analysis of statistical data collected in goalball (scouting) is very important for the coach. From this analysis, the coach can get interesting information on progress, not only in games but also in training, where athletes can see in the scout their shortcomings and successes so that they can achieve maximum performance during matches. However, there are few research projects involving scouting in this modality, which does not yet allow a clear view of the subject.

Currently there are two scout models used by Brazilian teams in this sport, with more technical forms of collection, which require a certain time for their notes, are subject to observer variability and error, and use a proprietary system to record what is observed [5].

The specific literature on game analysis suggests that the game record be made in the context of the game, during competitions of excellence, in order to observe the highest level of preparation and sporting performance [27], [12].

II. PROCEDURES

The organization in modules allowed all similar functionality to be developed together and ordered, optimizing the schedule. In the first iteration, the basic entries of the system were developed: team management, athlete management, game management, types-of-moves management, user management and the main screen of the system. In the second iteration, the management of logs and the main system registry were developed; their data is generated from speech recognition, standing out as the most important feature of the system. In the third iteration, the demonstrations of the data generated by the system were expanded, and field tests were carried out.

The iOS operating system was built based on the concept of direct manipulation, using multi-touch gestures. Interaction with the OS includes gestures like touching the screen, sliding a finger, and the "pinch" movement used to enlarge or reduce the image. Some applications use internal accelerometers to respond to shaking the device (commonly resulting in the undo command) or rotating it (commonly resulting in a change from portrait to landscape mode). iOS consists of four abstraction layers: the Core OS layer, the Core Services layer, the Media layer, and the Cocoa Touch layer.

In iOS, each application triggers a new process in the operating system, with the possibility of multiple application processes running concurrently. The set of tools that enables the development of applications for iOS is called the iOS SDK (development kit for iOS). According to Lecheta [17], this SDK can be used in various development environments, including Eclipse, which is an open-source integrated development environment (IDE) widely used for application development in Java, PHP, C/C++ and other programming languages [12]. The iOS Tools Development (RTD) is a plugin that serves to integrate the emulator into the Eclipse IDE, providing development facilities. RTD extends the Eclipse features so that one can configure new Apple projects, create an application interface, add components, and debug applications using the iOS SDK tools.

In order to test the developed application, we used the emulator, which allows the installation and execution of applications that could be installed on the development computer [3]. The menus used by the mobile device or tablet work the same way in the emulator, which also enables the use of the Internet. This emulator is contained in the iOS SDK. As stated in Lecheta [5], iOS is integrated with the SQLite database, which facilitates the use of a database in the developed applications. Thus, a SQLite database was designed.

To develop this system, we used the ICONIX software development methodology, because it supports prototyping and is oriented to be more suitable for small-sized systems.
presented in later times. (functional and structural), as in this proposal. Designed
Therefore, it is necessary to carefully define the criteria by ICONIX Software Engineering, the ICONIX is a
/ categories and indicators to watch, prepare a simple and practical software development process, but is
knowledgeable and experienced in the analyzed mode a powerful methodology accompanied by a component
7th International Conference on Information Society and Technology ICIST 2017
analysis and a sound and effective representation of the problems [4].

The ICONIX process starts from the development of use case diagrams based on the gathered requirements and the interface. The functional requirements identified were: registering a player by name; registering games with date information and player (shirt) numbers; and registering the fundamentals performed by the players in the game, i.e. the history of the game, where each fundamental follows a sequence of button presses. Reports are generated on the players: total throws, attacks or blocks, the number of successful executions of each fundamental, and the total errors in a game or match. The non-functional requirements identified were: generating backups; and validating the score, since beyond a certain point in a match it is not possible to collect information on a given fundamental.

III. RESULTS

Briefly, a goal in a game of goalball starts with a throw (the player, positioned in his orientation area, makes a throw towards some position of the opponent's court, with some outcome). Then the blocking movement arises (who performed it and what its effect was). These actions can be repeated until the goal, which is precisely when a hit occurs in the offensive action and a failure in the defensive action [16]. The figure below shows the sequence for income (performance) analysis used by the study team:

[Figure 3. Sequence for income analysis. The figure relates the athletes and the analysis criteria to the attack and defense possibilities (BA, BC, BF, GF) and the team result.]

Based on the raised requirements, an application was developed with the distinguishing feature of being mobile, taking into account the versatility and usability of the system, such as the buttons and other input mechanisms, and thus improving the time needed to record fundamentals during a move. The work therefore has mobility and usability as positive factors, since the data collection can be performed in the stands through easily accessible buttons on a touch screen. In the works developed by Raimann [9] and Niderauer [11], data entry is complex for the conditions of a goalball game, as it is done via keyboard and command line.

This scout model provided an improved understanding of the game by the players, helping the feedback between the coaching staff and the athletes and contributing to the identification of problems that occur with players during practices and games. The run-time demonstrations showed that it is possible to analyze the team during the match. The generated results provide a preview of the team's results in real time. Another point considered is the volume of information acquired, due to the ease of data management.

IV. CONCLUSION

With this study, we understand the importance of a digital application for the adapted sport of goalball, making it possible to highlight the positive and negative points of the athletes in game situations, enhancing their sporting performance and aiding decision-making by the athletes and the technical commission. The generated graphs offered a preview of the team's results in real time, and the volume of information acquired during the game was very significant because of the ease of data entry and management. Thus, we conclude that the results obtained indicate that the digital system for goalball game analysis is a necessary means for technical and tactical analysis, both individually and collectively, serving as a tool for the preparation of strategies for drills and games and helping in the decision-making process.

ACKNOWLEDGMENT

The authors thank the Association of the Visually Impaired People of Paraná (AEDV-PR) and its participants for their prompt service and for sharing valuable information, and the Coordination for the Improvement of Higher Education Personnel (CAPES) and the Pontifical Catholic University of Paraná (PUCPR) for financial support.

REFERENCES
[1] J. A. Gavião et al. (Orgs.). Goalball: invertendo o jogo da inclusão. Campinas: Autores Associados, 2009, p. 1.
[2] M. Morato. O ensino da educação física para deficientes visuais. Rev. Bras. Cienc. Esporte, v. 25, n. 3, pp. 117-131, 2004.
[3] M. Morato. Análise de jogo de goalball: modelação e interpretação dos padrões de jogo da Paralimpíada de Pequim. Tese de Doutorado em Atividades Motoras Adaptadas, Universidade de Campinas, 2012.
[4] M. L. M. Okumura. A engenharia simultânea aplicada no desenvolvimento de produtos inclusivos: uma proposta de framework conceitual. Dissertação de mestrado em Engenharia de Produção e Sistemas, Pontifícia Universidade Católica do Paraná, 2012.
[5] T. D. Porta, A. O. Zamberlan, R. Perozzo. Scout Vôlei de areia para dispositivos móveis. Trabalho de Conclusão de Curso (Bacharelado em Sistemas de Informação) – Instituto de Ciências Exatas e Tecnológicas, Feevale, Novo Hamburgo, RS, 2010.
[6] J. R. Thomas, J. K. Nelson. Métodos de pesquisa em atividade física. 3 ed. Porto Alegre: Artmed Editora, 2002, p. 280.
[7] A. Tosim. Sistemas técnicos e táticos no goalball. Revista Mackenzie de Educação Física e Esporte, São Paulo, v. 7, n. 2, pp. 141-148, 2012.
[8] F. Xavier. O processamento da informação nos jogos desportivos. 2ª ed. Porto: Universidade do Porto, 2012.
[9] A. O. Zamberlan et al. A IA entrando na quadra de vôlei: Scout Inteligente. Rio Grande do Sul, 2006.
[10] Apple. iOS. Accessed: 19/01/2016. [Online]. Available: http://www.apple.com.
[11] S. Balieiro. Jogada de alta tecnologia. INFO: tecnologia da informação, n. 224, novembro 2004.
[12] E. Butzen. Proposta de um módulo data mining para sistema de scout no voleibol. Novo Hamburgo, RS, 2009.
[13] P. R. Foina. Tecnologia da Informação: planejamento e gestão. São Paulo: Ed. Atlas, 2001.
[14] ICONIX Software Engineering. Accessed: 12/02/2016. [Online]. Available: http://www.iconixsw.com.
[15] S. Kus. Coaching Volleyball Successfully. Champaign: Human Kinetics, 2004.
[16] R. R. Lecheta. Google Android: Aprenda a criar aplicações para dispositivos móveis com o Android SDK. São Paulo: Novatec, 2009.
[17] D. S. Niederauer. Gramática de comandos scout no voleibol. Trabalho de Conclusão de Curso (Bacharelado em Sistemas de Informação) – Instituto de Ciências Exatas e Tecnológicas, Feevale, Novo Hamburgo, RS, 2009.
[18] L. O. Pereira, M. L. Silva. iOS Para Desenvolvedores. Rio de Janeiro: Brasport, 2009.
[19] L. H. Raimann. Scout: Sistema de monitoração em equipes de voleibol. Trabalho de Conclusão de Curso (Bacharelado em Sistemas de Informação) – Instituto de Ciências Exatas e Tecnológicas, Feevale, Novo Hamburgo, RS, 2008.
[20] D. Shondell, C. Reynaud. The Volleyball Coaching Bible. Champaign: Human Kinetics, 2002.
[21] E. Turban. Tecnologia da Informação para Gestão. 3ª ed. Porto Alegre: Ed. Bookman, 2004.
[22] C. Mosquera. Educação Física para deficientes visuais. 1 ed. Rio de Janeiro: Sprint, 2000.
[23] D. F. Nascimento, M. P. Morato. Goalball: manual de orientação para professores de educação física. Brasília: Comitê Paraolímpico Brasileiro, 2006.
[24] S. M. Ribeiro, P. F. Araújo. Formação acadêmica refletindo na expansão do desporto adaptado: uma abordagem brasileira. Rev. Bras. Cienc. Esporte, v. 25, n. 3, pp. 57-69, 2004.
[25] S. G. Santos. Métodos e técnicas de pesquisa quantitativa aplicada à Educação Física. Florianópolis: Tribo da Ilha, 2011.
[26] M. P. Adabaugh. NIDRR's Long Range Plan - Technology for Access and Function Research. Section Two: NIDRR Research Agenda, Chapter 5: Technology for Access and Function.
[27] J. A. O'Brien. Sistemas de Informação e as decisões gerenciais na era da Internet. 2ª ed. São Paulo: Ed. Saraiva, 2004.
[28] A. Zamberlam. Em direção a uma técnica para programação orientada a agentes BDI [Dissertação de Ciência da Computação]. Pontifícia Universidade Católica do Rio Grande do Sul, Porto Alegre, 69 fl., 2002.
Abstract— Short texts, tweets and other kinds of comments found on social networks carry significant information about the person who wrote them and about the object to which they refer, when they are observed cumulatively. Interest in the machine analysis of the sentiments expressed in text is growing, but algorithms that perfectly determine the emotion a text carries have still not been found. The problems that occur in the machine analysis of text are varied, ranging from the complexity of the language in which the text is written to the complexity of the feelings expressed by it. This paper provides an overview of some of the existing solutions for sentiment analysis and presents our own solution. Sentiment analysis is demonstrated on tweets written in the Serbian language. Although the method is still in the development phase, the resulting accuracy of 82% is encouraging at this stage.

I. INTRODUCTION

The increasing use of social networks offers the opportunity to explore a variety of social issues by analyzing the comments that people post. The problem of sentiment analysis and the determination of comment sentiment is not new, but with the appearance of machine processing of natural language and of machine learning algorithms for classification, greater progress has been made in this area. Most of the developed tools for sentiment analysis target texts in English [1-4]. Determining the sentiment of a text is not easy; there are problems which are difficult to resolve, such as irony and sarcasm. The complexity of the grammar of the Serbian language makes the analysis of Serbian texts even more difficult. It is necessary to create an algorithm that manages to classify text written in Serbian as positive or negative without the use of huge and abundant lexical resources.

Sentiment analysis belongs to the field of natural language processing (NLP), which is on the rise, unlike areas such as spam detection or the detection and labeling of the type, gender and number of words in a sentence, where quite good results have already been obtained. Most generally speaking, sentiment analysis is a classification of text documents into two or more classes. The simpler task of sentiment analysis may require determining whether a text has a positive or negative sentiment; the more complex task may require a numerical quantification of the sentiment of a text on a scale, for example, from 1 to 10. Text classification, and sentiment analysis as a special case of it, are problems to be solved with the help of artificial intelligence, i.e. with supervised machine learning algorithms.

Since this is language processing, some preprocessing is necessary; it helps to obtain better results even on small training data sets. Such preprocessing includes putting the text into a single format (for the Serbian language, selecting one alphabet, for example the Latin alphabet), processing special characters, removing redundant words (stop words), tokenizing the text, etc.

For the Serbian language, only a few solutions for sentiment analysis have been developed [5-9]. Some of the problems of sentiment analysis of texts in the Serbian language are:
- the lack of labeled data;
- the lack of a vocabulary of terms with positive and negative sentiment for the Serbian language;
- non-uniform normalization (stemming, lemmatization), because of the complexity of the Serbian language;
- various pronunciations (ekavica, ijekavica) and dialects;
- characters with diacritical marks (č, ć, š, ...);
- two alphabets (requiring translation into one, Latin or Cyrillic);
- informal writing on social networks.

The process of text classification based on sentiment, together with the preprocessing process, can still be improved by seeking solutions to problems such as normalization, keyword extraction, symbols, negation, sarcasm, etc. Previous solutions for sentiment analysis of the Serbian language are described in the sequel. Section 3 describes the methodology of the work. The normalization and feature extraction are presented in Section 4 and
Section 5. The applied machine learning methods and the results are discussed in Section 6, while Section 7 contains our conclusions and some directions for future work.

II. PREVIOUS SOLUTIONS FOR THE SERBIAN LANGUAGE

As previously stated, there are several solutions which attempt to classify text in the Serbian language based on the sentiment that the text carries. They differ in the domain to which they have been applied, the method of text normalization used, and the classification methods used.

Milošević [5] described the implementation of a sentiment analyzer for the Serbian language using the Naive Bayes machine learning algorithm. As part of the sentiment analyzer, a hybrid stemmer for the Serbian language was developed, which works on the principles of suffix stripping and a dictionary. The sentiment analyzer performs a binary classification of text into two classes, positive and negative. Negation is processed by adding a prefix to the words near a negation.

Batanović et al. [8] presented a dataset balancing algorithm that minimizes the sample selection bias by eliminating irrelevant systematic differences between the sentiment classes. They used it to create the Serbian movie review dataset, SerbMR, the first balanced and topically uniform sentiment analysis dataset in Serbian. They also proposed an incremental way of finding the optimal combination of simple text processing options and machine learning features for sentiment classification.

Mladenović's doctoral thesis [7], whose main topic is the analysis of emotions in text, presents research related to the sentiment classification of texts in the Serbian language using a probabilistic machine learning method, multinomial logistic regression, i.e. the maximum entropy method. She developed a system for the sentiment analysis of Serbian texts with the help of digital resources such as semantic networks, specialized lexicons and domain ontologies.

The doctoral dissertation of Grljević [9] provides a comprehensive approach to modeling and automating the analysis of sentiments contained in student reviews of teaching staff available on social media and social networking sites. Reviews of professors from various institutions of higher education in Serbia were collected from the site "Oceni profesora". The prepared and annotated corpus was used for training supervised algorithms (Naive Bayes, SVM, K-Nearest Neighbors). Five dictionaries were developed based on the annotation of sentiment words, negation keywords, and observations made during the annotation process, and were subsequently used in lexicon-based sentiment analysis.

III. DATA SET AND METHODOLOGY

The first step in sentiment analysis is the collection of the resources (texts) to be analyzed. In our case, these are tweets in the Serbian language related to one area (for example, different media in Serbia).

Collecting tweets in the Serbian language is not an easy task. The available services do not offer enough options to limit the collection to tweets written only in the Serbian language. Although there is the possibility of specifying Serbian as the language, Cyrillic is the default, so specifying the language leaves out the tweets that are written in the Latin alphabet (which are more numerous than the Cyrillic ones).

Another problem with the data collection is the unequal ratio between positive and negative tweets. Tweets in the field of print media, for which we have been collecting data, carry more negative than positive sentiment. Therefore, it is necessary to extend the model with positive tweets in order to design a relevant training set. Collecting data for sentiment analysis in resource-limited languages carries a significant risk of sample selection bias, since the small quantities of available data are most likely not representative of the whole population [8].

The collected text data are the input to the process of sentiment analysis, which takes place in several phases, depending on the specific algorithm. Our approach applies machine learning algorithms in order to train a polarity classifier using a labeled corpus. One famous approach is Pang and Lee's algorithm for sentiment analysis [1], which contains the following steps:
1. Pre-processing;
2. Extraction of characteristics (attributes), which can be numeric or textual;
3. Classification using an appropriate classification algorithm (Naive Bayes, MaxEnt, SVM, ...) over the previously extracted attributes.

[Figure 1. The algorithm for sentiment analysis applied to the set of tweets.]

Our work was organized in the following eight steps (Fig. 1):
1. Collecting data using the Twitter API;
2. Labeling the data manually;
3. Normalization;
4. Tokenization;
5. Stemming;
6. Splitting the data into training and test sets;
7. Feature extraction;
8. Applying the method for training and testing.

Our initial data set from Twitter has no assigned sentiments, so one of the first steps must be their
labeling. The dataset is divided in a 2:1 ratio into data for training and testing, respectively.

IV. NORMALIZATION

As part of the pre-processing step in our algorithm, all excessive punctuation, links and tags that do not affect the content of the message are removed. Tokenization is performed based on the rules that apply to writing on Twitter: for example, the tagging of posts is done with a hashtag (#), and the @ sign is used for usernames in tweets. In the tokenization process, before applying sentiment analysis, special attention should be paid to emoticons and the way they are written. Special regular expressions that recognize known emoticons are used; correct processing of emoticons can be important for labeling the starting data.

Another aggravating circumstance for text written in the Serbian language is the use of two alphabets. Therefore, all data are transliterated into one (Latin). A collection of stop words, which carry no sentiment and occur frequently, is removed from the data. Normalization of documents in the Serbian language can be done by lemmatization or stemming. It is hard to build a perfect lemmatizer, so we have to be satisfied with stemmers [10]. The disadvantage of stemming, caused by removing only the suffix (not the prefix), can be addressed using n-gram analysis [11]. Normalization and tokenization in our work are done in the following steps:
- removing retweets;
- tokenization:
  o handling dates,
  o handling currency expressions,
  o handling numbers in specific formats,
  o handling URLs,
  o handling hashtags,
  o handling emoticons;
- transliteration into the Latin alphabet.

V. FEATURE EXTRACTION

Feature extraction was based on lexical resources and the selection of key terms. Sentiment words express desired and undesired situations and can be divided into positive and negative sentiment terms. We constructed general-purpose lexicons of positive and negative sentiment terms. Based on the existing corpus, we also created a sentiment term lexicon specific to the domain that is the subject of the analysis: since the general lexicons do not cover the entire set of words, a lexicon representing the adjustment of the general sentiment lexicon to the corpus was also extracted. We thus obtained two types of sentiment lexicons:
- a general sentiment lexicon of positive and negative terms, created manually;
- a model-generated sentiment lexicon.
Each lexicon has its own advantages and disadvantages:
- general lexicon advantage: a large word set;
- general lexicon disadvantage: ambiguity of word meaning;
- model-generated lexicon advantage: closely related to the specific field;
- model-generated lexicon disadvantage: limited by the size and quality of the model.
Negation is treated by isolating the set of words which form negation in the Serbian language and using this set to create special rules for handling negation.

VI. MACHINE LEARNING METHOD

Although there are a number of very complex supervised machine learning algorithms, the best results in sentiment analysis and in the general classification of human language and speech have been obtained with one of the simplest algorithms, Naive Bayes. Our data were trained and tested using the Simple method, Naive Bayes and SVM, as shown in Table I.

TABLE I. RESULTS OF APPLYING DIFFERENT MACHINE LEARNING METHODS
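The normalization, lexicon-feature and "Simple method" steps described above can be sketched as follows. Note that the lexicons, the stop-word list and the (deliberately partial) Cyrillic-to-Latin table below are toy placeholders for illustration, not the paper's actual resources.

```python
import re

# Toy placeholder resources (NOT the paper's actual lexicons):
POSITIVE = {"odlican", "dobar", "sjajan"}    # hypothetical positive terms
NEGATIVE = {"uzasan", "los", "katastrofa"}   # hypothetical negative terms
STOP_WORDS = {"je", "i", "u", "na"}          # hypothetical stop words
CYR2LAT = str.maketrans("абвгде", "abvgde")  # full azbuka omitted for brevity

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
EMOTICON_RE = re.compile(r"[:;]-?[)(DP]")

def normalize(tweet):
    """Normalize and tokenize one tweet (Section IV steps)."""
    if tweet.startswith("RT "):            # remove retweets
        return []
    t = tweet.lower().translate(CYR2LAT)   # one alphabet (Latin)
    t = URL_RE.sub(" ", t)                 # URL handling
    t = MENTION_RE.sub(" ", t)             # username handling
    t = EMOTICON_RE.sub(" EMO ", t)        # keep emoticons as one token
    t = HASHTAG_RE.sub(r"\1", t)           # keep the hashtag word, drop '#'
    tokens = re.findall(r"[a-zA-Z]+", t)
    return [w for w in tokens if w not in STOP_WORDS]

def features(tokens):
    """Two of the training features: positive/negative lexicon hit counts."""
    pos = sum(1 for w in tokens if w in POSITIVE)
    neg = sum(1 for w in tokens if w in NEGATIVE)
    return pos, neg

def simple_classify(tokens):
    """'Simple method': compare raw lexicon counts (ties default to positive)."""
    pos, neg = features(tokens)
    return "positive" if pos >= neg else "negative"
```

With these toy lexicons, `simple_classify(normalize("Film je odlican! #vesti"))` returns `"positive"`; the Naive Bayes and SVM variants would feed the same counts, plus the token count, into a trained classifier instead of comparing them directly.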
The data set consists of 3508 tweets for training and 1726 tweets for testing. The training features we used are the following:
- the number of positive terms from the lexicon;
- the number of negative terms from the lexicon;
- the number of positive terms appearing only in positive tweets and not in negative tweets;
- the number of negative terms appearing only in negative tweets and not in positive tweets;
- the number of terms in the tweet.

The Simple method simply counts features: terms from the positive/negative lexicon and terms appearing only in positive/negative tweets. Naive Bayes and SVM use the same features as the Simple method and, additionally, the number of terms in the tweet.

The results are still not satisfactory, mostly because of the small data set. Cross-validation indicates that this method can achieve good results, but we still have to work on new features. The remaining problems are irony, incomplete tweets, informal communication on Twitter, and the small number of tweets with positive sentiment.

VII. CONCLUSION

The presented algorithm is adapted to the structure of tweets and to the topic to which the tweets refer. The accuracy of the classification was increased by a deeper observation of the individual areas and aspects of the analyzed text, as well as by creating a specific vocabulary and rules for each area. The method should be improved with:
- a dataset with a better positive-negative ratio;
- handling of negation and irony;
- normalization with n-grams.

Our method is at the beginning of its development and a small number of features is used for training, yet the model already gives results that are encouraging for this phase of the research. The contribution of this work is mostly in the constructed set of lexical resources, which are general-purpose and do not depend on the data source.

ACKNOWLEDGMENT

This work was partially funded by the Ministry of Education and Science of the Republic of Serbia under project III-44007.

REFERENCES
[1] B. Pang, L. Lee, and S. Vaithyanathan, "Thumbs up? Sentiment Classification using Machine Learning Techniques", EMNLP '02: Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, vol. 10, 2002, pp. 79–86.
[2] C. D. Manning, J. Bauer, J. Finkel, and S. J. Bethard, "The Stanford CoreNLP Natural Language Processing Toolkit", Association for Computational Linguistics, 2014, pp. 55–60.
[3] S. Paumier, Unitex, Institut Gaspard-Monge (IGM), University Paris-Est Marne-la-Vallée, Paris, 2002.
[4] M. Silberztein, NooJ, http://www.nooj4nlp.net, 2002.
[5] N. Milošević, "Mašinska analiza sentimenta rečenica na srpskom jeziku", Master's thesis, University of Belgrade, Belgrade, Serbia, 2012.
[6] M. Mladenović, J. Mitrović, C. Krstev, D. Vitas, "Hybrid Sentiment Analysis Framework for a Morphologically Rich Language", Journal of Intelligent Information Systems, vol. 46:3, 2016, pp. 599–620.
[7] M. Mladenović, Information Models in Sentiment Analysis Based on Linguistic Resources, Doctoral dissertation, University of Belgrade, Belgrade, Serbia, 2016.
[8] V. Batanović, B. Nikolić, M. Milosavljević, "Reliable Baselines for Sentiment Analysis in Resource-Limited Languages: The Serbian Movie Review Dataset", Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia, pp. 2688–2696.
[9] O. Grljević, Sentiment in Social Networks as Means of Business Improvement of Higher Education Institutions, Doctoral dissertation, University of Novi Sad, Novi Sad, Serbia, 2016.
[10] N. Milošević, Stemmer for Serbian language, arXiv preprint arXiv:1209.4471, 2012.
[11] U. Marovac, A. Pljasković, A. Crnišanin, E. Kajan, "N-gram analiza tekstualnih dokumenata na srpskom jeziku", Proceedings of TELFOR 2012, 2012, pp. 1385–1388.
[12] W. Medhat, A. Hassan, H. Korashy, "Sentiment analysis algorithms and applications: A survey", Ain Shams Engineering Journal, 2014, pp. 305–328.
use (“pay as you go”). Downside of the cloud based cloud platform is used, the ability to shut down unused
solution is the uncertainty of cost, as cloud providers can machines, some cost reduction is noticed, i.e. some
modify pricing model at any time. Also, upside might be machines can be shut down during the night. New costs
the possibility of discounts for Smart Grid companies. for a small, medium and large utilities are USD 36,500,
70,000 and 61,500, respectively. These prices are for
IV. RESULTS Azure platform. In case of cost reduction for Amazon
Using estimated CPU and memory, we have come to Web Services, it has come to next calculations: USD
conclusion which machines can be used, to run ADMS for 34,500, 66,500 and 69,000 for a small, medium and large
different sized utilities. Industry experts estimate that utility.
calculated upfront investment costs for supporting data
centers goes from 1 million dollars for small, 1.4 million V. DISCUSSION
dollars for medium and up to 2 million dollars for big Even though the prices for on premise solution seems
utilities. The assumptions are (for the software quality smaller, there are hidden costs which are not included in
sake) that the hardware should be replaced every 5-7 the initial price. One of the advantages of cloud based
solution is elimination of system administrator, since the
TABLE II. maintenance is shifted to the cloud provider, which
MONTHLY COST COMPARISON provides 24/7 support to the utility. Another advantage of
cloud solution is loss of licenses, because cloud virtual
On-premise [$] Azure [$] Amazon [$]
machines have licenses included, which represents the
Small 17,000 36,500 34,500 cost of 15% of on premise solution. In this paper, we are
Medium 24,000 70,000 66,500
not calculating the cost of databases needed for running
the ADMS software, as this can be used for further
Large 34,000 61,500 69,000 research. Also, security and optimization of different
cloud solutions are going to be considered in further
research.
years, which entails repetition of these costs. If we use 5
years as a period for hardware replacement, we can scale REFERENCES
this costs for monthly basis and get estimated numbers: [1] Tsai, Wei-Tek, Xin Sun, and Janaka Balasooriya. "Service-
17.000 dollars, 24.000 dollars and 34.000 dollars for a oriented cloud computing architecture." Information Technology:
small, medium and large utility, respectively. These estimated costs include licenses, which make up 15% of the whole cost. As this is an on-premise solution, hardware is bought at the very beginning of the deployment process, which means that hardware maintenance, electricity, facility renting and fees for various employees remain as monthly expenses. In Table 2, a monthly cost comparison for the on-premise and cloud solutions is provided.
In the case of the cloud solution, the estimated expenses are different, i.e. there is no upfront investment and maintenance costs are distributed differently. Using the same estimated values for CPU, memory and IOPS for the machines used in the on-premise solution, we have calculated the monthly expenses of the system. In the case of a small utility, the monthly cost is around USD 41,000 if the system is deployed on a cloud platform, specifically on the Microsoft Azure cloud platform. The same calculation was done for a medium and a large utility, which gives expected costs of USD 79,000 and 86,000, respectively. For comparison, the same calculation was performed on another cloud platform, Amazon Web Services. For a small, medium and large utility, the calculated expenses are USD 38,000, 75,500 and 97,000, respectively. If one of the main advantages of
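The provider comparison above reduces to a simple lookup over the estimated monthly figures. A minimal sketch (the dollar amounts are taken from the text; the table layout and helper function are illustrative, not part of the study):

```python
# Estimated monthly cloud costs in USD, as reported in the text (illustrative layout).
MONTHLY_COST = {
    "Azure": {"small": 41_000, "medium": 79_000, "large": 86_000},
    "AWS":   {"small": 38_000, "medium": 75_500, "large": 97_000},
}

def cheaper_provider(utility_size: str) -> str:
    """Return the provider with the lower estimated monthly cost for a utility size."""
    return min(MONTHLY_COST, key=lambda provider: MONTHLY_COST[provider][utility_size])

for size in ("small", "medium", "large"):
    print(size, cheaper_provider(size))
```

Under these estimates AWS is cheaper for the small and medium utility, while Azure is cheaper for the large one.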
92
7th International Conference on Information Society and Technology ICIST 2017
By searching papers related to this topic, we were able to notice that most of the work is derived from national urban projects related to the smart city concept, such as the Chinese [8] and Spanish [9] national projects. There is a lack of research that analyzes the smart tourism concept from a technical point of view. Research [10] that came out of the Chinese initiative was a good basis for this research. Little research appears to have been done on a detailed analysis of the two-way stakeholder-tourist information exchange and on the implementation of a system which supports such communication.
There are many applications being developed around the world related to the smart tourism concept. For example, Belgium is developing a system for tagging places, objects and other "taggable" units, named "TagTagCity", using the above-mentioned NFC and QR technologies (https://www.tagtagcity.com/) [11]. The city of Amsterdam uses beacons to let tourist signs translate

III. COMMUNICATION FLOW ANALYSIS

The traditional communication model is one-way communication, where all tourists are given the same broadcast information, i.e. the same advertisement. Analyzing the mass public advertising communication flow allows us to notice some flaws. Stakeholders are missing tourist target groups, which means that they are broadcasting the same advertisement to all subscribed users. Studies have shown that this way of advertising proves to be worse in practice [14].

Figure 1. Extending traditional communication model

A one-way communication flow prevents tourists from being engaged in communication, leaving stakeholders without any feedback. Stakeholders usually gather feedback information by hiring marketing research agencies which conduct surveys. This way of gathering feedback is expensive, and it takes time for the agencies to conduct the surveys. If the services that stakeholders offer require business models that adapt dynamically and quickly to consumers, relying on survey agencies can cause considerable expense and wasted time.
A real-time feedback system would be an important feature, able to provide stakeholders with useful data related to the services they offer, in order to improve their STEs [4].
One of the most important advantages of personalization as experienced by tourists is an increased comfort level, both emotional and physical, such as getting things just the way they like, the feeling of being looked after, and being well informed based on their preferences [16]. Understanding the needs, wishes and desires of travelers becomes
increasingly critical for the competitiveness of destinations [17].
The second step would be the introduction of a feedback communication flow (see Figure 1). In the traditional way of advertising, stakeholders are mostly left without feedback. The traditional one-way communication flow is now upgraded to a two-way communication flow in which the tourist is able to leave feedback related to the stakeholder's advertisements. Such feedback can be very useful for the stakeholder in improving STEs. The benefits of introducing a feedback system would enhance existing business models through the following techniques for improvement [18]:
• adapting: modifying behavior to fit the environment
• inferring: drawing conclusions from rules and observations
• classifying: making classifications based on observations
• predicting: interpolating and extrapolating from data
• learning: using experience to improve performance
• self-organizing: developing and incrementing their business models to a new level
Some of the listed methods can be automated using artificial intelligence (AI) techniques. In the data mining sphere, sentiment analysis (SA) can be useful in processing textual information to systematically identify, extract, quantify, and study affective states and subjective information. With knowledge extracted from tourists' feedback, it becomes much simpler to reorganize and adapt business models to the mood and requests expressed in that feedback.
A feedback system can be abused or misused by leaving biased feedback or by flooding it with spam data. This implies that it cannot be fully automated, and it requires additional human supervision for situations where knowledge cannot be extracted, for example: invalid semantics, grammatical errors, etc.

V. SYSTEM ARCHITECTURE THAT SUPPORTS COMMUNICATION FLOW ENHANCEMENT

Broadcasting advertisements that correlate with tourist interests enhances the tourist's experience [3]. An enhanced tourist experience stimulates the tourist to fulfil his own interests at the related destination. Such an action can exceed their prior expectations. The aforementioned indicates a stimulation of tourist consumption, which ultimately raises the industry-level service.
A graphic representation of the system architecture is shown in Figure 3, describing the main entities and the communication flows between them. Hereinafter, the main system architecture entities are described in the order in which they communicate in the real system.

A. Back representation server (BRS)

First of all, it is important that the stakeholder provides data to a trusted third party (TTP) offering their services. Such an action is available through the "Stakeholders portal Web application" (see Figure 3), allowing stakeholders to register, to provide information about the services they offer, and to receive the feedback provided by tourists. Such an application could be used from platforms that do not necessarily need to support user mobility. The stakeholder's web portal is hosted on the BRS, whose main role is to gather information from stakeholders and forward it to the TTP.

B. Front representation server (FRS)

Tourists then use the FRS through the TA web application, whose role is to allow tourists to create their own personal profiles by entering their personal data via devices that support mobility. Most devices that support mobility have an integrated GPS feature that allows us to track the user's geographical position, which is an important attribute for the information personalization performed by the recommender system.
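The sentiment analysis (SA) step discussed above could, in its simplest form, be approximated by a lexicon-based scorer. A toy sketch (the lexicon words and the fallback label are invented for illustration; a production system would use a trained model and, as noted, human supervision for text it cannot handle):

```python
# Toy lexicon-based sentiment scorer for tourist feedback (illustrative lexicon).
POSITIVE = {"great", "friendly", "clean", "helpful", "amazing"}
NEGATIVE = {"dirty", "rude", "expensive", "crowded", "awful"}

def sentiment(feedback: str) -> str:
    """Classify feedback as positive/negative, or 'unknown' when undecidable."""
    words = feedback.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "unknown"  # hand off to a human reviewer, as discussed above

print(sentiment("great clean rooms but rude staff"))
```

The "unknown" branch mirrors the point made above: spam, bias and malformed text mean the pipeline cannot be fully automated.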
[8] L. Ago, T. Tanel and H. Priit, Smart City: A Rule-based Tourist Recommendation, Eliko Competence Center, Tallinn, Estonia: Springer, Vienna, 2011.
[9] M. Anderson, Advertising Feedback in Messaging Systems, Milpitas, CA (US): United States Patent Application Publication, 2011.
[10] K. Alfred and T. Maximilian, Contextualized Communication of Privacy Practices and Personalization Benefits: Impacts on Users' Data Sharing and Purchase Behavior, Springer.de, 2004.
[11] I. H. Group, Creating 'moments of trust': The key to building successful brand relationships in the Kinship Economy, InterContinental Hotel Group, 2014.
[12] C. Donald, J. Robert, S. Michael, H. Abbie and E. Darcy, Teaching Strategies: A Guide to Effective Instruction, Cengage Learning, 2016.
[13] European Union Open Data Portal, [Online]. Available: https://data.europa.eu/euodp/en/data/. [Accessed April 2017].
[14] Eventya Co, "Urby - stuff to do & places to go in your city," [Online]. Available: http://www.urbyapp.com/. [Accessed April 2017].
[15] B. Dimitrios and A. Aditya, Smart Tourism Destinations, Springer International Publishing Switzerland, 2013.
[16] Chinese government, "Beautiful China—2014 Year of Smart Travel" Theme Publicity and Promotion Activities Organized in Macao and Hong Kong," [Online]. Available: http://en.cnta.gov.cn/focus/travelnews/201507/t20150707_721236.shtml. [Accessed April 2017].
[17] Spanish Ministry of Industry, "Innovacion y Turismo. Tecnologías y Turismo. Segittur," [Online]. Available: http://www.segittur.es/en/inicio/. [Accessed April 2017].
[18] G. Yang, L. Hongbo and C. Yi, The Embedding Convergence of Smart Cities and Tourism Internet of Things in China: An Advance Perspective, China: Advances in Hospitality and Tourism Research, 2014.
[19] S. Marianna, X. Zheng, K. Chulmo and G. Ulrike, Smart Tourism: Foundations and Developments, Electron Markets, 2015.
[20] B. Dimitrios and A. Aditya, Smart Tourism Destinations: Enhancing Tourism Experience Through Personalisation of Services, Springer International Publishing Switzerland, 2015.
Abstract— Business process management (BPM) systems are becoming increasingly data-driven. At the same time, IoT (Internet-of-Things) sensing devices are generating large quantities of data. Some of this data represents information about critical events. These events may be used to trigger and drive automated BPM response procedures to minimize the negative effect of a critical event or to repair the potential damage indicated by those events. In this paper, we propose pairing a BPM-driven application with IoT for sensor error discovery and recovery automation.

I INTRODUCTION

The onset of the age of the Internet of Things (IoT) has brought unprecedented possibilities regarding data collection and analysis, and promises the ability to significantly improve critical functions in many different activities. The possibility of using new devices connected to the Internet (ranging from simple sensors to smart devices or complex devices capable not only of sensing but also of actuating certain work), and of using the data gathered from these devices and interacting with them, is having an ever larger impact on the economy. This change has lately been denoted as the 4th industrial revolution. New applications taking advantage of the vast amount of data created by IoT devices are created at an increasingly fast rate. The usage scenarios of IoT may range from simple applications aimed at personal use (tracking fitness activities, or monitoring vital signs of chronically ill patients), through applications aimed at improving the day-to-day activities of a larger population (like applications using sensor data to manage public transport congestion), to applications that may have a wider scope of use.
Similarly, business process management (BPM) oriented applications have also gained strong acceptance in a variety of business solutions in recent years. BPM applications were initially focused on streamlining traditional interactive business processes by defining clear and understandable process models and orchestrating activities to improve process efficiency. However, automated business processes, containing primarily automatically performed tasks, have become an important tool for achieving various integration scenarios between different software systems. Optimizing emergency management by formalizing emergency response procedures in BPMN and deploying them to appropriate systems has been the focus of several research papers, such as [1,2].
Since business process models may be designed to allow a process instance to be invoked by different external events (timers, signals, messages), pairing the IoT devices, as a source of critical events, with automated business processes promises faster reaction to events received from the IoT devices.

II THE HEALTH OF THE SENSOR

A substantial number of IoT devices are still different types of Internet-connected sensor devices, often situated in remote places. Those sensors provide valuable data that are processed for further value-added services. Their malfunction, especially when not noticed and corrected in reasonable time, can lead to unreliable or incomplete data series, deficient data analysis, difficulties in making business decisions and other negative effects for systems relying on the analysis of those data.
Therefore, IoT sensors regarded as critical devices for a specific business case require good maintenance and prompt resolution of potential problems. A trend has been developing [3,4] to merge automated BPM systems with IoT-based systems to provide new ways of using the data generated by IoT sensor devices, or to bring faster response to detected erroneous situations. If IoT devices are used as an integral part of any emergency management system [1,2], the health of the IoT sensor device and its availability must be monitored.
In this paper, we concentrate on using the data that specific sensors provide to determine the health of the sensor device itself, and to trigger appropriate maintenance procedures to prevent its failure, or to minimize the downtime of such a device.

III A CASE STUDY – ENVIRONMENTAL SENSORS FOR AGRICULTURAL USE

The WAHASTRAT project [5] has developed a network of 8 automated monitoring stations (designated as WH1 – WH8) to provide information about environmental variables: soil moisture, wind speed and direction, air temperature, air humidity and atmospheric precipitation. The data from these monitoring stations are collected and analyzed to provide information specifically about water shortage and drought conditions. Currently, a new cross-border project is under consideration, to increase the number of sensors (monitoring stations) and to provide data exchange between the countries, in order to get better coverage and an overall understanding of weather patterns in the region and their impact on agriculture.
The availability of the data provided by these sensors is invaluable to modern agriculture, and may be used to make informed decisions about irrigation needs during drought
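A health signal for such a station can be derived from the data stream alone, in the spirit described above: flag a station whose readings stop arriving, or whose sensor gets stuck at a constant value. A minimal sketch (the thresholds, data layout and status names are hypothetical; this is not the system implemented in the paper):

```python
from datetime import datetime, timedelta

# Hypothetical thresholds; a real deployment would tune these per station.
MAX_SILENCE = timedelta(hours=2)  # no reading for this long -> suspect failure
STUCK_COUNT = 12                  # this many identical consecutive readings -> suspect fault

def station_health(readings, now):
    """readings: list of (timestamp, value) pairs sorted by time; returns a status string.
    A non-'ok' status is where a recovery process instance would be triggered."""
    if not readings:
        return "no-data"
    last_time, _ = readings[-1]
    if now - last_time > MAX_SILENCE:
        return "silent"
    tail = [value for _, value in readings[-STUCK_COUNT:]]
    if len(tail) == STUCK_COUNT and len(set(tail)) == 1:
        return "stuck"
    return "ok"
```

In a BPM pairing, each non-"ok" status would be published as an external event (message or signal) that starts the modeled response procedure.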
the error details, the authorized user can decide whether a manual inspection and a visit to the monitoring station are necessary, or whether the issue may be resolved in some other way. After the issue is resolved, the normal service of the monitoring station may be restored.
If a problem has occurred on a monitoring station that will support remote commands (as none currently does), an attempt to contact the monitoring station will be made. For this purpose, a service task invoking the communication is added to the process.
Since the monitoring station may in fact already have failed, the communication task is timed, and if no response is received in the given time, authorized user action is again needed. If communication is successful and the monitoring station is responding, the newly received data is analyzed to determine whether further action is needed.

V CONCLUSIONS AND FURTHER WORK

In our paper, we have focused on the problem of detecting a failure of an automated environmental monitoring station used primarily for acquiring data about soil humidity. There is an operational network of such monitoring stations. The monitoring stations have been acquiring data for several years, and these data are regularly analyzed. However, the detection of monitoring station failure has not been properly addressed. Although failures of monitoring stations have been fairly rare, they do happen from time to time. Sometimes such a failure might have been prevented if corrective actions had been taken.
By adapting the data analysis algorithm, we can detect critical events and use them to trigger the automatic execution of a modeled response procedure. In the simplest implementation, the recovery procedure notifies the authorized personnel, responsible for the given monitoring station, to check the health of the monitoring station. In this case we get a faster response, compared to the standard review and detection of the problem from the plotted graphs.
If the monitoring stations are upgraded and bidirectional communication is made possible, some form of automated recovery may be attempted directly from the recovery process.
BPMN is a well-known and widely adopted process modeling language, and there are several commercial and open source process engines that support the deployment and execution of processes described in BPMN. In this case, the use of a process-model-driven system as an implementation platform gives extra flexibility, making it possible to adapt the current base model to further, more sophisticated needs. A similar approach may be used for any kind of sensor, as long as error conditions may be derived from the data obtained from the sensor itself (or from the absence of that data).

REFERENCES
[1] Norwegian Emergency Management Process By Using Business Process Modeling Notation, V. Nunavath and A. Prinz, 8th IADIS International Conference Information Systems, Madeira, Portugal, 2015.
[2] Design Patterns for Emergency Management Processes, T. Ludík and T. Pitner, Proceedings of the 11th IFIP WG 5.11 International Symposium, ISESS 2015, IFIP Advances in Information and Communication Technology, vol. 448, pp. 434-444, Melbourne, VIC, Australia, 2015.
[3] BPM and IoT: Transformation at the Digital Interface to the Physical World, BPM Whitepaper [http://bpm.com/resources/featured-whitepapers/1081-bpm-and-iot-transformation-at-the-digital-interface-to-the-physical-world]
[4] Integrating the Internet of Things with Business Process Management: A Process-aware Framework for Smart Objects, G. Merroni, CEUR Workshop Proceedings, 2015.
[5] Water shortage hazard and adaptive water management strategies in the Hungarian-Serbian cross-border region [https://wahastrat.vizugy.hu]
[6] IoT-BPM Case Study, [http://www.pnmsoft.com/wp-content/uploads/2016/02/IoT-BPM-case-study.pdf]
[7] Internet of Things-aware Process Modeling: Integrating IoT Devices as Business Process Resources, S. Meyer, A. Ruppen, C. Magerkurth, 25th International Conference, CAiSE 2013, Valencia, Spain, June 17-21, 2013, Proceedings.
[8] The R Project for Statistical Computing, [https://www.r-project.org/], accessed in April 2017.
Abstract—Modern supervisory control and data acquisition (SCADA) systems collect a large number of measurements, inherently time series data, from various sensors in critical infrastructures such as the transmission and distribution of electrical power, water and gas. Analyzing such a number of measurements individually, even with modern utility management systems (UMS) on top of SCADA, is rarely feasible, especially in real-time smart and sensing systems that are continuously acquiring sensor readings. One approach to overcoming this problem is to extract the important characteristics of such data using a business intelligence solution that is capable of performing temporal aggregation and providing access to a significantly smaller amount of aggregated data, which can then be analyzed instead of the raw data. Developing a technique for efficient temporal aggregation of such large data sets thus becomes an essential but non-trivial task. This paper proposes one such algorithm that can efficiently compute and retrieve temporal aggregates for large volumes of time series data, continuously, as such data arrives. The algorithm's efficiency was proven both theoretically and experimentally. Complexity analysis showed that the algorithm achieves logarithmic update processing time cost, logarithmic query time cost and linear storage space consumption. Moreover, performance results from experimental studies give a strong indication that the algorithm could be applied in practice for aggregating time series data on a large scale in real time.

I. INTRODUCTION

Time series data plays an important role in the analysis of critical infrastructures with modern UMS. With a large amount of such data being collected by modern SCADA from various sensors, such analysis becomes extremely difficult and temporal aggregation becomes increasingly important.
In query languages, aggregation is identified as the process of computing a single value, an aggregate, which summarizes a set of attribute values by means of an aggregation function. In a conventional database management system (DBMS), aggregation is done by first partitioning the argument relation into groups of tuples with equivalent values for one or more attributes and then applying an aggregation function to each group in turn. Classical aggregation functions include min, max, sum, count and average. However, for interval-valued databases, such as temporal databases, partitioning predicates other than equality, such as temporal overlapping, emerge as being important, along with a variety of aggregation functions, such as moving average. Therefore, the aggregation possibilities are greatly expanded, and the process of temporal aggregation is a far more complex problem than aggregation in a conventional DBMS and, hence, more prone to inefficiencies.
In general, a temporal database is a database system that supports the time domain and is thus able to manage time-varying data [1]. On the other hand, temporal aggregation is a process based on temporal grouping, where the timeline is partitioned over time, tuples are grouped over these partitions and aggregate values are computed over these groups [2]. There are two types of temporal grouping [3]: span grouping and instant grouping. Span grouping is based on pre-defined time periods, such as days or weeks, while instant grouping depends on tuples whose intervals contain a given time instant. Aggregations based on span and instant groupings are called span temporal aggregation (STA) and instant temporal aggregation (ITA), respectively. There are also two types of aggregation functions [4]: computational, obtained by accumulating certain computations on tuples, such as sum and count, and selective, obtained by selecting a representative value among the values of tuples, such as min and max.
Since each time series acquired by SCADA originates from a different sensor called a point, it is essential to compute temporal aggregates for each point individually, so that each point can still be analyzed as a separate entity, but with a significantly smaller amount of data. Examples of such analysis are the detection of correlation between variables and forecasting [5]. In such a case, it is impractical to use ITA because, when applied to individual points, it results in aggregated data of the same amount as the raw time series data. This makes STA an incomparably better choice.
This paper introduces an algorithm that efficiently performs continuous STA of time series data (C-STATS) for each point individually, as such data arrives. The C-STATS algorithm is based on a new I/O and main memory efficient data structure, the linked sequence tree (LS-tree), which is used for managing both raw and aggregated time series data. Also, while most existing approaches consider only regular time spans expressed in terms of temporal granularities, such as years and months [6], this paper proposes a new, generic granularity model (G-model) as an approach to the definition of temporal granularities, or simply granularities, which is also crucial for the algorithm's efficiency. The proposed algorithm is applicable to both computational and selective aggregation functions.
We analyzed the proposed algorithm as an access method by its complexity and performance, which are characterized by three costs [7]: (1) the update processing time cost (the time to update the method's data structures as a result of a change: an insert, update or delete operation), (2) the query time cost for each of the basic queries and (3) the storage space cost for physically storing the data structures. The complexity of the algorithm was derived from a theoretical proof, while its performance was tested
experimentally on large data sets. The conducted experiments were designed to simulate the ratio between the number of points and the number of tuples for each point in laboratory conditions, rather than to fully imitate the amount of data that would accumulate over time and reach much greater numbers in practice.
The rest of the paper is organized as follows. Section 2 discusses closely related work. Section 3 introduces core concepts used in the rest of the paper. The proposed G-model and LS-tree are introduced in sections 4 and 5, respectively. The C-STATS algorithm is introduced in section 6 and its complexity is analyzed in section 7, while performance results are discussed in section 8. Finally, section 9 presents conclusions and open problems for further research.

II. RELATED WORK

A considerable amount of work has been done by the scientific community to support temporal aggregation, which has been formalized in [8]. The most closely related work is focused on solving the problems of answering basic temporal aggregation queries in terms of temporal databases and coping with user-defined granularities. A large number of techniques have been proposed to efficiently support ITA [4, 6, 9-12]. However, there are far fewer techniques proposed for STA, none of which considers efficient aggregation of time series data sets as large as the ones in modern UMS.
An approach for computing temporal aggregates with multiple levels of granularities was proposed in [13]. A model called the Hierarchical Temporal Aggregation (HTA) model was introduced to aggregate more recent data with finer detail and older data at coarser granularities. However, the proposed algorithm is focused only on computational aggregation functions. It uses a structure called the 2SB-tree that maintains two segment balanced trees [10] with tuples that, beside values, have timestamps instead of time intervals. However, each of these timestamps implies an interval from it to +∞, unlike a timestamp recorded for a point in a UMS, which implies a time interval from it to the next recorded timestamp for the same point. The HTA model divides the timeline into dynamically adjustable segments and maintains a separate index structure for aggregates in each segment. Depending on what triggers the adjustment, the HTA model can be separated into two sub-models. The fixed-storage model aggregates older data at a coarser granularity when the total storage of the aggregate index exceeds a fixed threshold. The fixed-time-window model assumes that the lengths of all segments except the first one are fixed, which causes dividing times to increase with the advancement of time. The time complexity of the algorithm under the fixed-storage model is O(G⋅log_B(S/G)) for tuple insertion and O(log_B(S/G)) for computing aggregates. Here, G is the number of used time granularities, S is the storage limit in the number of blocks and B is the block capacity in the number of records. The time complexity of the algorithm under the fixed-time-window model cannot be guaranteed, because the advancing of dividing times doesn't depend on the index size but on the length of segments. In the worst case, all tuples fall on the last partition and are thus kept at the finest granularity.
Another approach for using multiple levels of granularities for temporal aggregation was proposed in [14]. Here, aggregation for different granularities resembles the roll-up operation examined in data warehousing studies. In a data warehouse, information is stored at the finest granularity and the roll-up takes place at query time, which allows a user to obtain aggregates at a coarser granularity. This approach provides a set of conversion functions that can be used to convert aggregates between granularities on the basis of the temporal relationships between them.
A generic algorithmic framework that can be applied to many different forms of aggregation was proposed in [15] and implemented in the context of Tripod [16][17], a prototype of a spatiotemporal object-oriented DBMS. The proposed technique is based on the construction of hash tables for both the attributes that determine how tuples are partitioned and the attributes that aggregates are computed for (aggregation attributes). Once the two hash tables are populated, they can be used to determine which aggregation attribute value belongs to which partition. Then, an aggregation function can be applied to the aggregation attribute values per partition in order to obtain the desired aggregates. The results of performance studies indicate that the framework provides a scalable solution for many cases of aggregation. However, it does not provide any guarantee on performance, because its primary target is generality rather than efficiency. One of the open challenges it leaves behind is handling update-intensive applications based on sampled continuous functions, such as the time series data discussed in this paper.
Several approaches to the generalization of temporal aggregation and the formulation of temporal query languages were proposed in [18]. These approaches are based on the temporal multi-dimensional aggregation operator (TMDA-operator) [19] that generalizes temporal aggregation by decoupling the partitioning of the timeline for resulting aggregates from the specification of the groups of tuples associated with the resulting aggregates. The TMDA-operator was evaluated with two algorithms: one for ITA and another for STA. The algorithm for STA processes tuples in chronological order and computes aggregates that are then stored in a table and updated as tuples are being scanned. The average time complexity of the algorithm is O(N⋅logM) for N tuples and M resulting aggregates. A general model that offers a uniform way of expressing the various forms of temporal aggregation was later proposed in [20]. This model is based on the formal relational database framework available in SQL that includes aggregation [21]. However, it implies an implementation that requires costly operations, such as joins and multiple scans of the argument relation, which would make it inefficient.

III. CORE CONCEPTS

This section provides formal definitions of the time series, granularity and STA concepts used in the rest of the paper.
Time series are sequences of values ordered with respect to associated timestamps from the time domain. Formally:

DEFINITION 1. Time domain.
A time domain is a pair (T, ≤) where T ≠ Ø is a set of timestamps and ≤ is a total order on T.

DEFINITION 2. Time series.
A time series τ is an ordered sequence of values τ = {xt : t ∈ T} where (t = t1, ..., tn) ˄ (∀i ∈ {1, ..., n−1})(ti < ti+1).
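Span grouping, as defined in the Introduction, partitions the timeline into pre-defined spans and aggregates each point separately. A minimal sketch of per-point STA over fixed spans (the one-hour span, Unix-timestamp layout and function name are illustrative; this is not the C-STATS algorithm itself):

```python
from collections import defaultdict

SPAN = 3600  # span length in seconds (one hour); an illustrative pre-defined period

def span_aggregate(tuples):
    """tuples: iterable of (key, time, value) TS tuples, time as a Unix timestamp.
    Returns {(key, span_start): aggregates} computed per point and per span,
    combining computational (count, sum) and selective (min, max) functions."""
    groups = defaultdict(list)
    for key, time, value in tuples:
        groups[(key, time - time % SPAN)].append(value)
    return {
        group: {"count": len(vs), "sum": sum(vs), "min": min(vs), "max": max(vs)}
        for group, vs in groups.items()
    }
```

Note how each point keeps its own aggregates, so it can still be analyzed as a separate entity, while the output is much smaller than the raw series.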
Time series tuples (TS tuples) are denoted as <key, time, value>, where key represents an identifier of an individual time series called a point, time represents a timestamp associated with value, and value represents a time series value itself.
Granularity concepts were formalized in [22]. Granularities are mappings γ from an index set to the time domain. Each non-empty subset γ(i) is called a granule of the granularity γ. Granules in a granularity do not overlap and their index order is the same as their time domain order. Formally:

DEFINITION 3. Granularity.
A granularity is a function γ : N0 → T such that

γ(i) ≠ Ø ˄ γ(j) ≠ Ø ˄ i < j ⇒ γ(i) < γ(j) (1)

and

γ(i) ≠ Ø ˄ γ(j) ≠ Ø ˄ i < k < j ⇒ γ(k) ≠ Ø (2)

where i ∈ N0, j ∈ N0 and k ∈ N0.

Granules can be composed of a set of contiguous time instants (a time interval), a single time instant, or even a set of non-contiguous time instants. However, STA applies only to granules composed of time intervals.
Finally, a span temporally aggregated time series (STATS) can be defined as a series of aggregates that summarize original time series values by means of aggregation functions on time intervals of pre-defined granularities. Formally:

DEFINITION 4. STATS.
A STATS is an ordered sequence of elements α = {ai : i ∈ N0}, where ai = {f1(x),…, fk(x)} is the set of aggregates that result from aggregation functions {f1,…, fk} for a time series τ on a granule γ(i).

STATS tuples are denoted as <key, start time, end time, aggregates>, where key represents the point identifier, start and end time represent the aggregated time interval, and aggregates represents the set of computed aggregates.

IV. THE G-MODEL

The property of granularities that does not allow granules to overlap consequently ensures that granules in a granularity can be uniquely identified by their start and end times. This paper exploits these properties to model granularities with two functions: σ and ε. Given a timestamp as an input, these functions produce the start and end time, respectively, of the granule that overlaps with the given timestamp. Formally:

DEFINITION 5. Functions σ and ε.
Let s(i) and e(i) be the start and end time, respectively, of a granule γ(i). Then, functions σ: T → T and ε: T → T are such that

σ(t) = s(i) ˄ ε(t) = e(i) ⇔ s(i) ≤ t < e(i) (3)

Fig. 1 shows an example of the G-model whose granules represent calendar days.

    Time σ(Time t):
        return new Time(t.Year, t.Month, t.Day)

    Time ε(Time t):
        return σ(t).AddDays(1)

Figure 1. An example of the G-model.

V. THE LS-TREE

The LS-tree is a new tree data structure for indexing an ordered sequence of elements. Its design goal is to provide an index structure that the C-STATS algorithm can utilize for indexing TS and STATS tuples with the desired efficiency. To meet these requirements, the LS-tree indexes tuples by their unique identifiers and keeps each tuple linked to the previous and next tuple. The unique identifiers, in this case, are the keys and times of TS tuples or the keys and start times of STATS tuples. The purpose of these identifiers is to enable efficient retrieval of tuples and application of changes (insert, update and delete operations). On the other hand, the purpose of the linking is to enable efficient computation of aggregation functions, such as integral, that depend not only on changed tuples but also on the tuples they are linked to.
In this paper, the LS-tree is implemented as a B+ tree [23] with doubly linked leaves and doubly linked tuples for the same key in each leaf. B+ trees provide I/O efficiency in a block-oriented storage context by indexing tuples by their unique identifiers. Doubly linked lists provide I/O efficiency for retrieving previous and next LS-tree leaves and main memory efficiency for retrieving previous and next tuples. The LS-tree inherits the complexity of the B+ tree. The additional construction of a doubly linked list when a leaf is read into main memory does not affect the LS-tree complexity, because the list is constructed while the tuples in the leaf are being read in search of a particular tuple.
Fig. 2 shows an example of an LS-tree of order 5 that indexes TS tuples. For the purpose of simplicity, keys, times and values are represented as natural numbers.

    < 1, 6 >  < 2, 5 >

    < 1, 1, 220 >   < 1, 6, 222 >   < 2, 5, 217 >
    < 1, 4, 219 >   < 2, 2, 218 >   < 2, 7, 221 >
    < 1, 5, 220 >   < 2, 4, 219 >   < 3, 6, 222 >

Figure 2. An example of the LS-tree.

VI. THE C-STATS ALGORITHM

The novel C-STATS algorithm efficiently performs STA of time series data for individual points, for multiple granularities and for both computational and selective aggregation functions. It ensures consistency between raw and aggregated data at all times by continuously changing STATS tuples according to changes upon TS tuples. Computing STATS tuples in advance therefore ensures efficiency at query time when such tuples are requested. To achieve this, the algorithm exploits the properties of the G-model and the LS-tree. TS tuples are indexed by their keys and times in one LS-tree, and STATS tuples are indexed by their keys and start times in multiple LS-trees separated by associated granularities. Therefore, the algorithm maintains G + 1 LS-trees, where G is the total number of granularities. Each change of time series data causes all LS-trees to update their structures, and the G-model helps to identify the affected tuples faster.
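The day-granularity σ and ε functions of the G-model (Fig. 1) can be sketched in Python as follows. This is our illustrative sketch mirroring the paper's C#-style pseudocode, not the authors' implementation; the function names are assumptions.

```python
from datetime import datetime, timedelta

def sigma(t):
    """Start time of the calendar-day granule that contains t."""
    return datetime(t.year, t.month, t.day)

def epsilon(t):
    """End time of the calendar-day granule that contains t."""
    return sigma(t) + timedelta(days=1)
```

For any timestamp t, the pair satisfies the defining property σ(t) ≤ t < ε(t); e.g. `sigma(datetime(2017, 3, 15, 13, 45))` is midnight of March 15. Other granularities (hour, week) only change how the granule start is truncated.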
Given the key, time and value of a changed TS tuple, the C-STATS algorithm performs the following steps:
1. Change the TS tuple. First, the position where a TS tuple with the given key and time should be placed is located in the LS-tree that holds TS tuples. If a TS tuple exists at the located position, its value is preserved for step 2 as the old value. Then, the change is applied either by inserting a new TS tuple at the located position, updating the value of the existing tuple, or deleting the tuple at the located position.
2. Change the STATS tuples. To change the STATS tuples that are affected by incoming changes, their positions must first be located in the LS-trees by the given key and their start times. Their start times are calculated by applying the functions σ and ε to the times of TS tuples. In case of an insert operation, if there are no STATS tuples at the located positions, new ones are created. Finally, each located STATS tuple is changed according to the operation performed upon the TS tuple. If needed, aggregation functions can use both the old and new value of the changed TS tuple, as well as the links to previous and next tuples provided by the doubly linked lists.
Fig. 3 shows pseudocode of the C-STATS algorithm that performs insert, update and delete operations upon TS tuples, as well as selection of STATS tuples. In this pseudocode, integral is computed as an aggregation function that depends both on time series values and their times. Here, the integral represents the surface covered by time series values between the start and end times of STATS tuples in a coordinate system where times fall on the x axis and associated values fall on the y axis. For the purpose of simplicity, these coordinates are interpolated using step interpolation.

VII. COMPLEXITY ANALYSIS

The complexity of the algorithm was analyzed through three costs: (1) the update processing time cost, (2) the query time cost and (3) the storage space cost.

THEOREM 1. Update processing time cost.
The update processing time cost of the C-STATS algorithm for processing a single change in time series data is O(logN), where N is the total number of TS tuples.

PROOF. Let us denote G as the total number of granularities, M as the number of STATS tuples for a single granularity, k as the number of STATS tuples affected by a change in time series data, F as the total number of aggregation functions and r as the number of TS tuples required for computation of aggregation functions. To process a single change in time series data, the algorithm needs to perform two steps: (1) update the one LS-tree that holds N TS tuples and (2) update G LS-trees, each holding M STATS tuples. In the first step, the algorithm can apply a change directly to the LS-tree that holds TS tuples using the given key, time and value. Since LS-trees inherit the complexity of B+ trees, the first step is performed in O(logN) time. In the second step, the algorithm needs to find k STATS tuples in each of the G LS-trees and, for each of these tuples, to perform F aggregation functions using r TS tuples. Since the algorithm can identify affected STATS tuples in the LS-trees using the given key and calculated start times, and since all required TS tuples can be retrieved using the doubly linked lists, the second step is performed in O(k∙G∙(logM+F+r)) time. Thus, the overall update processing time cost of the algorithm is O(logN+k∙G∙(logM+F+r)). This can be simplified to O(logN) because N has a higher growth rate than k, G, M, F and r.

THEOREM 2. Simple query time cost.
The query time cost of the C-STATS algorithm for selecting a STATS tuple associated with a given granularity, key and start time is O(logM), where M is the number of STATS tuples for a single granularity.

    def STA:
        LSTree TSTuples
        Map<Granularity, LSTree> STATSTuples

        Insert(Key k, Time t, Value v):
            TSTuple x = TSTuples.AddTuple(k, t, v)
            foreach (Granularity g in STATSTuples):
                Time s = g.σ(t), e = g.ε(t)
                STATSTuple y = STATSTuples[g].GetOrAddTuple(k, s, e, null)
                if (y.Integral = null):
                    if (x.Prev ≠ null): y.Integral = (e - s) * x.Prev.Value
                    else: y.Integral = 0
                Time nextStartTime = y.StartTime
                if (x.Next ≠ null): nextStartTime = g.σ(x.Next.Time)
                while (y.StartTime ≤ nextStartTime):
                    Time from = y.StartTime, to = y.EndTime
                    if (from < t): from = t
                    if (x.Next ≠ null ˄ x.Next.Time < to): to = x.Next.Time
                    if (x.Prev ≠ null): y.Integral -= (to - from) * x.Prev.Value
                    y.Integral += (to - from) * v
                    y = y.Next

        Update(Key k, Time t, Value v):
            TSTuple x = TSTuples.GetTuple(k, t)
            Value o = x.Value
            x.Value = v
            foreach (Granularity g in STATSTuples):
                STATSTuple y = STATSTuples[g].GetTuple(k, g.σ(t))
                Time nextStartTime = y.StartTime
                if (x.Next ≠ null): nextStartTime = g.σ(x.Next.Time)
                while (y.StartTime ≤ nextStartTime):
                    Time from = y.StartTime, to = y.EndTime
                    if (from < t): from = t
                    if (x.Next ≠ null ˄ x.Next.Time < to): to = x.Next.Time
                    y.Integral += (to - from) * (v - o)
                    y = y.Next

        Delete(Key k, Time t):
            TSTuple x = TSTuples.GetTuple(k, t)
            Value o = x.Value
            TSTuples.Remove(x)
            foreach (Granularity g in STATSTuples):
                STATSTuple y = STATSTuples[g].GetTuple(k, g.σ(t))
                Time nextStartTime = y.StartTime
                if (x.Next ≠ null): nextStartTime = g.σ(x.Next.Time)
                while (y.StartTime ≤ nextStartTime):
                    Time from = y.StartTime, to = y.EndTime
                    if (from < t): from = t
                    if (x.Next ≠ null ˄ x.Next.Time < to): to = x.Next.Time
                    y.Integral -= (to - from) * o
                    if (x.Prev ≠ null): y.Integral += (to - from) * x.Prev.Value
                    y = y.Next

        Select(Granularity g, Key k, Time from, Time to):
            STATSTuple[] result
            STATSTuple a = STATSTuples[g].GetNextTuple(k, g.σ(from))
            while (a ≠ null ˄ a.StartTime < to ˄ a.EndTime > from):
                result.Add(a)
                a = a.Next
            return result

Figure 3. Pseudocode of the C-STATS algorithm.
PROOF. To select a STATS tuple, the algorithm needs to perform two steps: (1) find the LS-tree that holds the STATS tuples for the given granularity and (2) find the STATS tuple, among the M STATS tuples in that particular LS-tree, by its key and start time. Since granularities are directly mapped to the LS-trees that hold STATS tuples, and since LS-trees inherit the complexity of B+ trees, these steps are performed in O(1) and O(logM) time, respectively. Therefore, the overall query time cost of the algorithm is O(1+logM) ⇒ O(logM).

THEOREM 3. Range query time cost.
The query time cost of the C-STATS algorithm for selecting STATS tuples associated with a given granularity and key, and whose time intervals overlap with a given time range, is O(logM+k), where M is the number of STATS tuples for a single granularity and k is the number of STATS tuples that satisfy the given conditions.

PROOF. Theorem 3 is a generalization of Theorem 2. Here, the algorithm needs to perform the following steps: (1) calculate the start time of the first granule that overlaps with the given time range using the σ function, (2) find a STATS tuple for the given granularity using the given key and calculated start time and (3) use the doubly linked lists to find the next k STATS tuples that have the same key and whose time intervals also overlap with the given time range. These three steps are performed in O(1), O(logM) and O(k) time, respectively. Therefore, the overall query time cost of the algorithm is O(1+logM+k) ⇒ O(logM+k).

THEOREM 4. Storage space cost.
The storage space cost of the C-STATS algorithm for storing both raw and aggregated time series data is O(N), where N is the total number of TS tuples.

PROOF. The storage space cost of LS-trees is the same as for B+ trees, which is O(X), where X is the number of tuples in the tree. The C-STATS algorithm maintains one LS-tree that holds N TS tuples and G LS-trees, each holding M STATS tuples, where G is the total number of granularities and M is the number of STATS tuples for a single granularity. Therefore, the overall storage space cost of the algorithm for both raw and aggregated time series data is O(N+G∙M). This can be simplified to O(N) because N has the highest growth.

VIII. PERFORMANCE RESULTS

The C-STATS algorithm was implemented in C#. All experiments were performed on a PC with an eight-core 4GHz processor, 16GB of main memory and a 500MB/s solid state drive. The size of LS-tree blocks was set to 8KB. The algorithm was set to serialize blocks and persist them to disk each time they change. However, in order to reduce disk reads, blocks were removed from main memory only if they had not been used for at least one second. The algorithm was run on 1 billion TS tuples equally distributed among 10 to 30 thousand points within a one-week time period. The purpose of this experimental amount of time series data was to simulate the ratio between the number of points and their TS tuples in laboratory conditions, rather than to fully imitate the amount of data that would accumulate over time and reach much greater numbers in practice. Experimental time series data was aggregated for three granularities (hour, day and week) with five aggregation functions (min, max, sum, count and integral).

Experimental results confirm the complexity analysis of the algorithm. Fig. 4 shows that the algorithm achieves logarithmic time cost for update processing, the simple query and the range query. None of the experiments showed major differences between insert, update and delete operations. This was also the case when using different numbers of aggregation functions. Update processing time was mostly influenced by the amount of time series data, which was much larger than the amount of aggregated data, as shown in Fig. 6. The simple and range query showed only minor differences between them, influenced by the number of STATS tuples selected with the range query. However, there was a major difference between update processing time and query time in general, as expected from their complexities. Fig. 5 shows how the average speed of update processing drops as more points and granularities are used. A larger number of granularities requires an equally large number of LS-trees, and a larger number of points results in a larger number of STATS tuples. As there is more aggregated data for the algorithm to maintain, the performance decreases. However, increasing the number of points showed a much smaller decrease in performance than increasing the number of granularities. Therefore, one of the main challenges for further improvement of the C-STATS algorithm is to reduce these performance differences. Fig. 6 shows linearly increasing storage space consumption, mostly dependent on TS tuples. Using different numbers of points, granularities or aggregation functions did not make any considerable difference in the total amount of consumed storage space. In the worst case, when 30 thousand points were aggregated for all three granularities and all five aggregation functions, two orders of magnitude more storage space was still consumed by TS tuples (~39GB) than by STATS tuples (~400MB).

Figure 4. Performance of the C-STATS algorithm for aggregating 10 thousand points, for the hour granularity, and for performing the simple query and the range query that selects 100 STATS tuples.

Figure 5. Performance of the C-STATS algorithm for aggregating 1 billion TS tuples for different numbers of points (P) and different granularities (G).
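The range selection of Theorem 3 (the Select routine of Fig. 3) can be sketched over a plain sorted list instead of an LS-tree: bisection stands in for the B+ tree descent, and the list order stands in for the doubly linked leaf walk. The list representation and names below are our assumptions for illustration only.

```python
from bisect import bisect_left

def range_select(stats, sigma, frm, to):
    """stats: list of (start, end, aggregates) STATS tuples of one
    key, sorted by start time; sigma maps a time to its granule's
    start. Returns the tuples whose interval overlaps [frm, to)."""
    starts = [s for s, _, _ in stats]
    # Steps 1 and 2: locate the first granule that may overlap.
    i = bisect_left(starts, sigma(frm))
    out = []
    # Step 3: walk forward while the overlap test of Fig. 3 holds
    # (a.StartTime < to and a.EndTime > from).
    while i < len(stats) and stats[i][0] < to:
        if stats[i][1] > frm:
            out.append(stats[i])
        i += 1
    return out
```

With hourly granules encoded as 10-unit intervals, e.g. `range_select([(0, 10, 'a'), (10, 20, 'b'), (20, 30, 'c')], lambda t: t - t % 10, 12, 18)` returns only the middle tuple, matching the O(logM+k) locate-then-walk cost.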
Abstract — Recently there has been a rapid increase in the application of software intensive systems in the domain of Electric Power Distribution (EPD). EPD companies use a significant number of field devices that are usually connected via a cloud infrastructure. While communicating with each other, such devices produce enormous amounts of data that should be processed in order to be of any value for the EPD and the software that is deployed into its infrastructure. Systems that deal with processing of such data, especially in a cloud-based environment, are denoted as Stream Computing Systems. They usually have very strong quality requirements for reliability, performance, scalability, etc. In this paper we propose an architectural approach to manage dependability and, more specifically, reliability of stream computing systems. For this purpose we apply the principles of autonomic computing to an architecture that is based on microservices.

I. INTRODUCTION

Internet of Things (IoT) is an emerging concept that comprises the interconnection of various hardware devices, such as end user devices, home appliances, smartphones, field sensors, etc. Such devices are usually connected via a cloud infrastructure that should be based on the concept of microservices [18]. The communication of such devices naturally leads to the production of enormous amounts of data, which requires a lot of processing in order to be available for other devices and software systems. Moreover, in many cases this data should be processed simultaneously with real-time requirements. The tendency is to have such devices communicate via software services in a cloud-based environment. The main purpose of stream computing systems is to make this possible. In this context, the quality requirements, both for stream computing systems and the IoT devices, are usually very high, and this holds not only for the hardware but also for the software.
Software quality in terms of dependability has become an important quality characteristic of contemporary software systems [2]. Dependability has several attributes like availability, reliability, safety, maintainability, etc. Traditionally, software dependability is determined by means of software testing [8]. In the context of IoT, however, it is very difficult to rely only on testing, because of the natural obstacles that arise when testing in a distributed environment. Additionally, a wide class of embedded IoT systems has ultra-high reliability requirements.
Currently there is a trend of increasing usage of information systems in the domain of Electric Power Distribution (EPD). One of the main problems of Electric Power Distribution Companies (EPDCs) is to have a properly calculated value of the delivered energy on time. The EPDC in Serbia and its regional center in Niš obtains it through monthly meter readings. Such monthly meter readings are later used to properly bill customers for delivered energy. Lately, with the introduction of smart meters, measurements can be read remotely every 15 minutes, so it becomes easier for EPDCs to have meter readouts ready for monthly bills. However, most of the currently used meters are analog devices that just display the currently consumed power. Such meters require a human operator to collect the data. But timely data on power consumption are not only needed for billing. Such data are needed for proper calculation of technical losses and estimation of commercial losses. Reducing such losses is of the highest importance for the local EPDCs.
In order to obtain accurate and timely data for such analyses, the Serbian EPDC's regional center in Niš is introducing various devices that are being used in day to day operations [20]. Each person in the field is equipped with a ruggedized Android based PDA smart device that is used for electric meter readings. Also, the company is purchasing modern grid analyzers, systems that can be used to remotely read various measurements, such as voltage level, electric current strength, delivered energy and so on.
Having these devices deployed in the field, set up and connected, it is possible to harvest all the data needed to do various kinds of analyses. Based on the meter readings and readings from grid analyzers, it is possible to make an estimation of the level of the technical losses and afterwards calculate the commercial losses. But such analyses are usually applied to 15-minute readings. Making the required data accurate and received on time is the major requirement of the presented system. Given that such operations are time constrained but rely on the network connection and human input, and can reveal sensitive business and personal data of the consumer, the main research question that arises is: how to raise the dependability of the results of operation of the future stream computing system that will incorporate analyses of the presented 15-minute resolution readings.
The goal of this paper is to define an approach, and a reference architecture, for management of reliability of a big stream computing system in the EPD domain. The approach presented here is based on the concept of self-adaptation of a microservice architecture, as the latter is currently the most common case used in cloud based IoT systems. From a purely scientific perspective, it is also of
significant importance to apply and validate the notion of autonomic computing in different application domains.
The rest of the paper is organized as follows: Section 2 describes some related work in terms of employment of stream computing in EPD; Section 3 presents the theoretical background of our approach; Section 4 shows the main concepts of our architectural solution for software dependability management of EPD stream computing systems; and finally Section 5 concludes the paper and gives some directions for further research in the area.

II. STREAM COMPUTING IN ELECTRIC POWER DISTRIBUTION

With the introduction of the Smart Grid [1], the interdependency of the electric power grid and information and communication technology is rapidly emerging [21]. This means not only that electric power grid devices are shifting from analogue technology to digital, which in turn provides more possibilities for data measurement and control, but also that these devices are starting to be interconnected.
But the Smart Grid also means that more stakeholders are involved in the power generation and distribution process, hence more data are being gathered and more diversified analyses of such data are needed. Nowadays, dynamic load tracking, dynamic tariffs, and clients that can consume but also produce electricity that can be delivered to the grid are becoming a reality [17]. This induces possibilities for massive data collection in different locations of the power grid. But in order to be able to process such high volumes of data, conventional approaches cannot be used anymore. If such data are only stored into a data warehouse and analyzed later, it will not have the same impact as if the analyses are obtained in real time or near real time. One possible approach is to use stream computing systems to be able to process such high volumes of data as they come.
Stream computing and stream mining of real-time or near-real-time data is not a novel approach. It has already been used for quite some time in various specific fields such as monitoring of highway traffic [3]. There are also generic approaches with platforms that can be customized for real-life deployments, such as S4 [15]. Only lately, after the Smart Grid development has led to such high data volumes being obtained [7], has it started to be used in the field of EPD. In [18] the authors identify the most important technologies that are used in Smart Grids, namely: Streaming Data Analytics, Load Forecasting, Cloud, Demand Response and Static Analytics. Obviously, the first three and the fifth require stream computing of incoming data to make the system adaptable to the dynamic changes that happen throughout the grid. Predicting power supply and demand is one of the big challenges that can be solved using Stream Computing [6]. But not only academia is trying to give answers to such demands for real-time data analysis; companies like Accenture are already providing solutions that can fulfil such needs [22].
In order to have stream computing implemented, the underlying infrastructure needs to be able to handle the different loads of data that occur in different time periods. Periods with high consumption but also with high production of electricity can push the hardware to its limits. Such redundancy in the equipment can pose a significant financial requirement. The best answer today is migrating the infrastructure to the cloud. Stream computing big data analytics in Smart Grids goes hand in hand with Cloud technology [14]. Its characteristics, like cost effectiveness, reliability, scalability, availability and elasticity, are very useful for such environments. It also means that EPDs can focus on maintaining the infrastructure on which their core business is based, namely the power grid and the devices that are needed for its functioning. Using the Cloud, the need for in-house computer hardware infrastructure becomes obsolete.
Most solutions that can be found in the literature focus on smart meters that can be read automatically. This is an undesired prerequisite, since the process of changing meters from the old analogue ones into digital ones is long and tedious and won't be completed in some countries for the foreseeable future. Therefore, alternative approaches should be sought to include manual readings and human input into such big data stream computing, to enable self-adaptiveness and reconfiguration based on the input stream of data from various sources.

III. THEORETICAL CONCEPTS

In this section we briefly present the three fundamental notions that are used in our solution – software reliability and dependability, autonomic computing and microservices.

A. Software reliability and dependability management

Software reliability is a significant software quality parameter for a large variety of systems in different domains. Formally, it is the continuity of correct service delivering, i.e. the belief that a software system will behave as per specification over a given period of time [1]. It is measured by statistical values and may be represented as:
- Probability of failure
- Failure rate
- Mean time to failure
Generally, reliability is part of the broader notion of dependability. The latter is defined as the ability of a computing system to deliver services that can justifiably be trusted [2]. It is characterized by a number of attributes as follows:
- Availability represents the readiness of the system to deliver correct service.
- Safety is concerned with the absence of catastrophic consequences on the user(s) and the environment in case of system failure.
- Confidentiality is the absence of unauthorized disclosure of information.
- Integrity means the absence of improper system state alterations.
- Maintainability is the ability of the system to undergo repairs and modifications.
In the contemporary IoT context, reliability is gaining even more attention because of the high customer and industry requirements towards distributed smart devices. The most common way to evaluate reliability is to use data from system testing [8]. There exist a number of software reliability models tailored to process such data and, more specifically, the number of failures and the testing
(or execution) time elapsed between two subsequent The idea behind this is that self-adaptive systems can
system failures [10], [11]. This way practical importance dynamically change their behavior or properties based on
of software reliability is to determine when enough testing external stimulus. A key mechanism used for running the
is been performed and the system is ready to be shipped to self-adaptation is the so called control loop or feedback
the market. However, it is inherent for all reliability loop [5]. It contains the following main phases:
models to make assumptions in order to make estimation Collect – the relevant data is obtained from the
possible and such assumptions may reduce applicability of surrounding environment and stored for further
model into development industry. processing. Usually, this data is collected through
However, most IoT systems and their constituent parts, agents or sensors.
like embedded and safety critical systems have ultra-high Analyze – the data is analyzed and related to
reliability requirements. Studies have shown that existing models.
satisfying such requirements is practically infeasible, as Decide – the best action from a list of options is
one needs to test with uncorrelated test data for thousands selected
of years [4]. Act – the system puts the change in place and the
This way in many IoT domains, like EPD, alternative loop starts over.
approaches for reliability management are needed. In next
section we continue with the description of the concept of C. Microservices
autonomic computing, which we are going to employ in Service Oriented Architecture (SOA) is well known
our reliability management approach. architectural concept for building distributed software
systems that has dominated this software engineering
B. The notion of autonomic computing domain during the last two decades. Main reason is that it
Autonomic computing has been introduced several is claimed to enable reusability and straightforward
years ago as an attempt to define structural approach integration of the main building blocks, called services.
towards design and development of dynamic and self- However recently, mainly because of quality requirements
adaptive software systems. for system that operate in cloud environment, the
Self-adaptive systems can be viewed from multiple architectural style of microservices has emerged [12],
points depending on their direction of autonomy. These points are called self-* properties of the system. They represent the ability of the system to take autonomous actions in the following four areas [13]:

Self-configuration – the system is able to configure itself in order to comply with initially defined high-level goals;
Self-optimization – the system is able to detect and optimize weak points or places that can be improved in terms of specific requirements;
Self-healing – the system is able to detect problems, create a strategy for fixing them and apply it;
Self-protection – the system is able to detect potential threats and defend itself against external attacks.

The idea behind our reference architecture is to design an approach that targets the Self-configuration and Self-optimization properties of the distributed system. The main aspect of autonomic computing is the so-called control loop (Fig. 1).

Figure 1. Autonomic Control Loop

[16]. Microservices and SOA share some common characteristics, but differ in the following:

Service Granularity – microservices are small and fine-grained. Their small codebase allows developers to quickly identify and fix issues and also to test them easily in isolation. In contrast, SOA does not provide any recommendation regarding service granularity. Services can vary in size significantly, and developers can easily be tempted to build large enterprise services that are hard to maintain.

Component Sharing – the microservice approach tends to make each service independent of shared components. A common technique is to copy the same functionality into all services and evolve it independently. This approach gives developers the freedom to update their services independently by reducing external dependencies and team coordination. On the contrary, SOA follows the principle of sharing as much as possible of a given functionality. Although this approach solves the problem of code duplication, it increases the coupling between service components. When a shared component needs to be updated, teams need to analyse the impact on other services and align their release plans with them.

Remote Access Protocols – microservices favour Representational State Transfer (REST) [9] for exchanging messages. Compared to other protocols, REST is relatively simple and lightweight and allows easy ad hoc development and integration. This helps development teams quickly implement and deploy new services that can exchange messages with others. However, classical SOA does not have any specific prescriptions regarding remote access protocols.
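To make this contrast concrete, a REST endpoint really can be stood up and queried in a few lines. The following minimal sketch (in Python; the endpoint path, meter identifier and payload are illustrative assumptions, not part of the reference architecture) shows the kind of lightweight, ad hoc integration the microservice style favours:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

class MeterReadingHandler(BaseHTTPRequestHandler):
    """Minimal REST endpoint exposing a (hypothetical) latest meter reading as JSON."""
    def do_GET(self):
        if self.path == "/readings/latest":
            body = json.dumps({"meter_id": "M-001", "kwh": 42.5}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the example quiet

# Bind to port 0 so the OS picks a free ephemeral port
server = HTTPServer(("127.0.0.1", 0), MeterReadingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# A consumer integrates with nothing more than an HTTP GET
with urlopen(f"http://127.0.0.1:{server.server_port}/readings/latest") as resp:
    reading = json.loads(resp.read())
server.shutdown()
server.server_close()
print(reading["kwh"])  # → 42.5
```

The whole service contract here is one URL and one JSON document, which is what makes ad hoc development and integration easy compared to heavyweight enterprise protocols.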
110
7th International Conference on Information Society and Technology ICIST 2017
As a result, protocols are often heavyweight even for simple and well-separated applications.

To complement what has been stated above, it should be noted that simplicity is a fundamental factor that distinguishes SOA and microservices. The latter strive for simplicity at every level of implementation. Microservices are fine-grained because this way they are simpler to support. Protocol and message options are reduced, as it is much easier to operate with a limited set of interfaces that need to be supported. In consequence, microservice applications are more robust and resilient. They can easily be adapted to new market requirements and attract new customers.

SOA does not provide any guideline in terms of simplicity. Although service-oriented systems can be built for simplicity, most of the time you will encounter large and complex service-based applications. They can integrate with a large set of heterogeneous systems, work with large amounts of data and support complex rules for processing messages. But such applications are hard to support and change rapidly. Often such complexity introduces more trouble than solutions.

In general, microservices are a form of SOA. Since SOA practices are significantly less restrictive than microservices, it is reasonable to think of microservices as "a form of SOA, perhaps service orientation done right" [12].

In the next section we discuss design issues of the system for dependability management based on the notions of microservices and autonomic computing.

IV. ARCHITECTURAL APPROACH FOR DEPENDABILITY MANAGEMENT IN EPD SMART COMPUTING SYSTEMS

A. System design and analysis

Following the approach of autonomic computing, we should first consider the collection phase. In our proposed system, data is collected using three main sources [19]:

Smart meters that will be read automatically in 15-minute resolution.
Mobile PDAs that will send data using 3G or WiFi networks. The resolution of these readings will rely on the workers in the field.
Grid analyzer devices that will be deployed on various points of the electric grid. These devices provide data readings in 15-minute or smaller resolutions.

The presented collection phase can have two potential problems. Namely, reliable and secure transfer of massive amounts of data from autonomous working devices such as smart meters and grid analyzers, as well as from the field PDA devices, needs to be assured. In addition, human error detection needs to be applied to data received from PDA devices.

Reliable data transfer from smart meters and grid analyzers is addressed by the retailer, which needs to provide appropriate technology that assures it. In most cases it is based on data transfer through the power grid itself. But the reliability problem arises from grid instability, where fault reduction and elimination need to be applied. Moreover, storing and working with such massive data on the company side requires proper hardware capabilities, in the form of data warehouses that need to provide mechanisms which prevent data loss. Given that sensitive business data, but also data that relates to consumers' privacy, is transferred, measures for assuring safe and secure data transfer need to be applied. Since PDA mobile devices depend on human data insertion, the system needs to provide proper signals to the users in cases when inserted values are not within a defined threshold. Such a threshold can be statically set in advance, but it is better if it is dynamically calculated based on already received data.

When the system starts receiving data related to the targeted part of the grid, the analyze phase of the control loop starts. This is done by grid analyzers that are usually deployed on transformer station feeders, but can also be used on any power line that powers some smaller part of the grid. Based on the type of the electricity meters in that part of the grid, metering data are read automatically or are collected by workers carrying mobile PDA devices. Two types of analyses take place. The first one is primary and addresses the electricity losses by estimating technical and commercial losses. Knowing the power line characteristics and the energy and the electric current on the output, as well as the amount of energy delivered to the consumer, it is possible to calculate technical losses. When the calculated technical losses and the delivered energy are subtracted from the energy measured using the grid analyzer, the amount of commercial losses is determined. The second type of analysis is used for further improvement of the system. All values that deviate from the average values for the given period of the year are checked for human error. Also, the human error warning threshold is calculated and sent back to the devices after each block of analysis.

The planning phase relies on the results of the analyses. Based on the analysis results, if commercial losses are above the given threshold, the system should plan its reconfiguration in order to narrow down the source of the losses. Whether the source is a single customer or a subpart of the grid, in the planning phase the system should mark the locations where grid analyzers should be deployed for the next session. Such locations should be shown on the GIS module of the system. In addition, based on the meter types, meter reading locations should also be marked.

By executing the change plan, the system will be reconfigured. Given that grid analyzers require physical location changes and proper mounting on the power line, and that the analog meters require the presence of workers in the field, we can talk about a semi-autonomous execution phase. Once the devices are properly deployed, the new collection phase can start over. In the meantime, the results in the form of calculations of commercial losses can be consumed in order to prevent them in the future. Generally, losses can come from broken equipment or from theft. Therefore, system execution results should be used by the technical and legal departments in order to mitigate such problems.

B. Micro-services self-adaptive architecture

Next, we present our microservice-based architecture that implements the aforesaid approach. It considers some of the specifics of microservices, like technology heterogeneity, the large number of services in production, the highly distributed nature, service componentization, etc. Therefore, it can be applied regardless of the technology that is used for developing the services. Each of the deployed services could be self-adaptive.
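As a sketch of what such a self-adaptive service instance could look like, the following illustrative Python fragment pairs a managed component with an autonomic manager running a monitor-analyze-plan-execute loop; the metric (failure rate), threshold and adaptation strategy (halving the batch size) are assumptions made for the example, not prescriptions of the architecture:

```python
class ManagedComponent:
    """Business functionality of a service instance (hypothetical metering reader)."""
    def __init__(self):
        self.batch_size = 100
        self.processed = 0
        self.failed = 0

    def process(self, ok: bool):
        self.processed += 1
        if not ok:
            self.failed += 1

class AutonomicManager:
    """Wraps a managed component and runs the autonomic control loop around it."""
    def __init__(self, component, failure_threshold=0.2):
        self.component = component
        self.failure_threshold = failure_threshold
        self.adaptations = []  # knowledge of applied strategies

    def run_loop(self):
        c = self.component
        # Monitor: read metrics from the managed component.
        rate = c.failed / c.processed if c.processed else 0.0
        # Analyze + plan: select an adaptation strategy if the goal is violated.
        plan = None
        if rate > self.failure_threshold:
            plan = ("reduce_batch", max(1, c.batch_size // 2))
        # Execute: apply the change to the managed component.
        if plan:
            c.batch_size = plan[1]
            self.adaptations.append(plan)
        return rate

svc = ManagedComponent()
mgr = AutonomicManager(svc)
for ok in [True, False, False, True]:  # simulated invocations, 50% failures
    svc.process(ok)
mgr.run_loop()
print(svc.batch_size)  # batch size halved from 100 to 50
```

The point of the split is that the managed component stays a plain service, while all self-adaptation logic lives in the manager and can be evolved, or looked up in an adaptation registry, independently.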
Each self-adaptive service can monitor itself and the operating environment and change based on internal or external stimuli. The approach is still a work in progress, and many of the identified components need to be further scrutinized and described in more detail. The proposed architecture is presented in Figure 2 and has a lot of common characteristics with SOA. The reason for this is that microservices and SOA are tightly connected – microservices are often considered a proper way of implementing SOA. However, it is the self-adaptation components that distinguish our proposed architecture from standard SOA implementations.

Figure 2. Reference Architecture for Self-Adaptive Microservices in EPD stream computing systems

Below we provide an overview of each of the components shown above.

Service Consumer
Service consumers are the users or systems that consume functionality exposed by the services. For example, these are the employees with the PDA devices, other services and/or legacy components of the EPD system or, eventually, the smart sensors after they are installed by the EPD company. Consumers make service invocations in order to use the desired functionality. Service consumers will use the REST protocol in order to reach the services.

Service Registry
Service registries serve as a catalog that provides information about the services. This includes information about the history of service invocations and about the service itself. The history of service invocations would be used to calculate metrics about service quality and specifically to monitor reliability. JDBC would be used to provide the metrics data.

Service Instance
The service instance is the actual service implementation and in our architecture consists of two components – a managed component and an autonomic manager. The managed component contains the functionality of the service instance. The autonomic manager implements the adaptation rules and the data analyzer part of the autonomic control loop. This way it handles the service's self-adaptation capabilities. It monitors the managed component and applies changes based on the identified adaptation strategy.

Adaptation Registry
The adaptation registry is the key component that allows scaling and quickly expanding the identified adaptation mechanisms. It stores adaptation practices that the service instances can use to manage themselves. Services could search for adaptation practices, via their autonomic managers, based on specific situations or context, but they could also register new adaptation strategies based on their knowledge.

V. CONCLUSION

Electric power distribution (EPD) software-intensive systems represent a typical Internet-of-Things related domain. EPD companies usually have to collect and process information coming from a large number of distributed smart meters, field sensors and other devices. In this paper we have proposed an architectural approach to manage the reliability of software systems that deal with processing of such information. Our solution is based on an adaptive architecture built on microservices, which makes it specifically oriented towards execution in a cloud environment. The microservice-based architecture also fosters the scalability of such systems.

Directions for further research include investigation of other quality characteristics, like performance. More specifically, efforts should focus on algorithms for caching the information flowing within the system.

ACKNOWLEDGMENT

Research presented in this paper was partially supported by the DFNI I02-2/2014 (ДФНИ И02-2/2014) project, funded by the National Science Fund, Ministry of Education and Science in Bulgaria.

REFERENCES
[1] Amin, S.M. and Wollenberg, B.F., 2005. Toward a smart grid: power delivery for the 21st century. IEEE Power and Energy Magazine, 3(5), pp. 34-41.
[2] Avižienis, A., Laprie, J.-C., Randell, B., Basic concepts and taxonomy of dependable and secure computing, IEEE Transactions on Dependable and Secure Computing, Vol. 1, Issue 1, Jan-March 2004.
[3] Biem, A., Bouillet, E., Feng, H., Ranganathan, A., Riabov, A., Verscheure, O., Koutsopoulos, H.N., Rahmani, M. and Güç, B., 2010. Real-Time Traffic Information Management using Stream Computing. IEEE Data Eng. Bull., 33(2), pp. 64-68.
[4] Butler, R., and G. Finelli. "The infeasibility of quantifying the reliability of life-critical real-time software." IEEE Transactions on Software Engineering 19.1 (1993): 3-12.
[5] Brun, Y., et al. Engineering Self-Adaptive Systems through Feedback Loops. In Software Engineering for Self-Adaptive Systems. LNCS, Vol. 5525. Springer-Verlag. 2009. 48-70.
[6] Couceiro, M., Ferrando, R., Manzano, D. and Lafuente, L., 2012, May. Stream analytics for utilities. Predicting power supply and demand in a smart grid. In Cognitive Information Processing (CIP), 2012 3rd International Workshop on (pp. 1-6). IEEE.
[7] Diamantoulakis, P.D., Kapinas, V.M. and Karagiannidis, G.K., 2015. Big data analytics for dynamic energy management in smart grids. Big Data Research, 2(3), pp. 94-101.
[8] Dimov, A., S. Chandran and S. Punnekkat. (2010). How do we Collect Data for Software Reliability Estimation? Proceedings of the 11th International Conference on Computer Systems and Technologies (CompSysTech). ACM ICPS, vol. 471. Sofia, Bulgaria. June 17-18, 2010. 155-160.
[9] Fielding, R. Architectural styles and the design of network-based software architectures. Diss. University of California, Irvine, 2000.
[10] Farr, W. and M. Lyu, "Software reliability modeling survey" in Handbook of Software Reliability Engineering, New York: McGraw-Hill, pp. 71-117, 1996.
[11] Gokhale, S. "Architecture-Based Software Reliability Analysis: Overview and Limitations", IEEE Transactions on Dependable and Secure Computing, vol. 4, no. 1, pp. 32-40, 2007.
[12] James, L. and M. Fowler, "Microservices: A definition of this new architectural term", http://martinfowler.com/articles/microservices.html, March 2014.
[13] Kephart, Jeffrey O., and David M. Chess. "The vision of autonomic computing." Computer 36.1 (2003): 41-50.
[14] Naveen, P., Ing, W.K., Danquah, M.K., Sidhu, A.S. and Abu-Siada, A., 2016, March. Cloud computing for energy management in smart grid - an application survey. In IOP Conference Series: Materials Science and Engineering (Vol. 121, No. 1, p. 012010). IOP Publishing.
[15] Neumeyer, L., Robbins, B., Nair, A. and Kesari, A., 2010, December. S4: Distributed stream computing platform. In Data Mining Workshops (ICDMW), 2010 IEEE International Conference on (pp. 170-177). IEEE.
[16] Richards, M., "Microservices vs. service-oriented architecture", O'Reilly Media, Inc., 2015.
[17] Siano, P., 2014. Demand response and smart grids – A survey. Renewable and Sustainable Energy Reviews, 30, pp. 461-478.
[18] Sornalakshm, K., Vadivu, G., 2015. A Survey on Realtime Analytics Framework for Smart Grid Energy Management. International Journal of Innovative Research in Science, Engineering and Technology, Vol. 4, Issue 3, pp. 1054-1058, ISSN (Online): 2319-8753.
[19] Stoimenov, L., Davidovic, N., Stanimirovic, A., Bogdanovic, M. and Nikolic, D., 2016. Enterprise integration solution for power supply company based on GeoNis interoperability framework. Data & Knowledge Engineering, 105, pp. 23-38.
[20] Stoimenov, L., Davidović, N., Stanimirović, A., Bogdanović, M., Krstić, A. and Nikolić, D., 2011. GinisED Enterprise GIS Framework for the Utility of the Future. In CIRED 2011, Frankfurt, Germany, 6-9 June.
[21] Taft, J.D. and Becker-Dippmann, A.S., 2015. The Emerging Interdependence of the Electric Power Grid & Information and Communication Technology (No. PNNL-24643). Pacific Northwest National Laboratory (PNNL), Richland, WA (United States).
[22] The Right Big Data Technology for Smart Grid – Distributed Stream Computing, https://www.accenture.com/us-en/blogs/blogs-the-right-big-data-technology-for-smart-grid-distributed-stream-computing, last accessed 12.04.2017.
Abstract— With the emergence of new requirements in manufacturing, mainly the product customization trend, implementing a solution for product quality control becomes an increasingly complicated task. In this modern industrial context, those solutions need to take into account the constant updates in the product design and combine their resources in the most effective manner in order to meet the new product specifications. Because of the lack of available technical solutions, this task is usually carried out by human operators, which is time-consuming, tiring and not always reliable. In the present work, a self-* automatic quality control platform based on cyber-physical systems and the multi-agent paradigm is presented. The solution is validated within a real industrial use case.

Keywords: Industry 4.0, Cyber-Physical System (CPS), Multi-Agent System (MAS), Automatic Quality Control (AQC).

I. INTRODUCTION

The modern industrial context is characterized by two important dimensions: on one hand, the need for collaboration in the form of company networks in order to face competitiveness, especially for small and medium enterprises; on the other hand, the need to enable product customization to better meet customer needs and demands.

Traditional manufacturing approaches have proven unable to address the emergent needs, mainly because of their centralized and rigid conception. Consequently, nearly all of the recent production paradigms seek to adopt more agile approaches enabling responsiveness, robustness, self-* properties and flexibility. These new needs have coincided with huge advancements in the technological area, mainly the high availability and affordability of sensors, computer networks, the Internet of Things and Artificial Intelligence. These circumstances have been at the origin of the emergence of the Industry 4.0 paradigm. This new concept, which was used for the first time in 2011 at the Hannover Fair in Germany, involves the main technological innovations applied to production processes in the fields of automation, control and information technologies [6]. It enables companies to schedule maintenance, predict failures and adapt themselves to new requirements and unplanned changes in the production processes in an autonomous way [7].

The fourth industrial revolution led to the introduction of Cyber-Physical Systems (CPS) as a new technological enabler allowing to build modular, intelligent and flexible systems. According to Lee [10], CPS are defined as autonomous systems that have a self-* behaviour and are able to make decisions; this intuitively leads to considering the multi-agent system paradigm [24] as a valuable alternative for designing CPS. Indeed, agents are able to respond promptly and correctly to changes, due to their inherent capabilities of self-adapting to emergent events without external intervention [12].

In the present work, we are interested in automating product quality control, which represents one of the key application domains of Industry 4.0 for several reasons:
• It enables efficient collaboration, since it guarantees better conformity to the product specifications.
• It allows partners to track the product state during the different manufacturing phases and to have an idea about eventual problems.
• It gives customers the possibility to update on-the-fly the specifications of the product to be manufactured, without interrupting the process.

Traditionally, quality control is performed according to a test plan where the measurements and the decision criteria are defined. Those specifications are usually updated with human intervention, which is suitable for mass production. But with today's growing need for product customisation, the test plan has to be constantly updated and reconfigured because of the huge variety of products. This task is no longer possible to perform by humans. For this reason, proposing an autonomous self-* quality control approach becomes essential.

In this paper, a CPS-agent based quality control solution, in line with the Industry 4.0 paradigm, is proposed. At this early development phase, there is a need to clearly define the CPS conception. In the following section of this paper, a proposal of a CPS design inspired by multi-agent systems is introduced. The proposed architecture is then used to implement a CPS-agent based framework for quality control. The framework is finally validated within a real industrial use case.

II. THEORETICAL BACKGROUND

Advances in technology have engendered the emergence of the Internet of Things (IoT) [1], where embedded sensors are in charge of collecting data and sharing it through the network. Today's technology is unable to store and process the huge amount of data generated through this process; this phenomenon is known as the Big Data challenge. In order to address this problem, many innovative concepts have been proposed, particularly in the Industry 4.0 context. One of the most interesting key enablers that have drawn attention in the manufacturing field is the Cyber-Physical System, which is considered as an autonomous and reactive entity that interacts with its physical and logical environment.
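The perceive-decide-act cycle of such a reactive entity can be sketched in a few lines. The fragment below (in Python, while the paper's implementation is in Java) gives it a quality-control flavour; the nominal dimension, tolerance and action names are invented for the illustration:

```python
def decide(measurement_mm, nominal_mm=20.0, tolerance_mm=0.5):
    """Reactive rule of a hypothetical quality-control entity: map a perceived
    physical measurement to an action on the environment."""
    deviation = abs(measurement_mm - nominal_mm)
    if deviation <= tolerance_mm:
        return "accept"
    if deviation <= 2 * tolerance_mm:
        return "flag_for_inspection"  # borderline part: involve a human operator
    return "reject"

# Perceive -> decide -> act, for a batch of simulated sensor readings
readings = [20.1, 20.4, 21.3, 19.2, 22.0]
actions = [decide(r) for r in readings]
print(actions)  # → ['accept', 'accept', 'reject', 'flag_for_inspection', 'reject']
```

The interesting part of a CPS is, of course, everything around this rule: how it is learned, adapted and validated, which the following sections develop.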
Such a system observes and predicts behaviour and makes decisions based on analytics and inferences. Every physical entity has its digital twin: a software model that mimics the behaviour of the physical asset. In contrast, the IoT in common parlance is generally limited to the physical assets, not their digital models. In this early development phase, CPS design and implementation present many challenges and are still not well defined. In the following section, an approach to implementing CPS based on IoT technologies and the multi-agent system paradigm is proposed.

A. CYBER PHYSICAL SYSTEMS

The reliability of a solution is measured by the requirements it is able to satisfy. The starting point is the 5C architecture for Cyber-Physical Systems as seen in [10] and presented in Figure 1. The pyramid represents a data life cycle through CPS: from raw data collected by the sensors into knowledge on which decision-making is based.

coming from sensors and already preprocessed in the level below. The aim at this level is to build a complete cyber image of the concerned entity. Complex analytics such as deep learning are performed at this level. The resulting image, called the cyber twin, is used to infer additional knowledge.
• Cognition level: At this level a cyber-physical system should be able to analyse the knowledge provided by the cyber twin in order to predict its potential failures, be aware of its degradation and estimate the time left to reach a failure. To this end, specific prediction algorithms are adapted based on historical data concerning the health evaluation.
• Configuration level: Based on the predictions made at the foregoing level, a machine has to react in order to reduce the potential impacts. Generally, and based on a predefined policy, a machine raises alerts to the factory managers. In case of an approaching failure, the machine self-configures its behaviour to minimize the risk. This can be done, for example, by reducing the production throughput or completely stopping the process.

Based on the CPS architecture explained in the previous paragraphs, the main CPS requirements are listed in Table I. The most common technological enablers for each requirement are then proposed.

TABLE I
TECHNOLOGICAL ENABLERS FOR CPS

Challenges                               | Enablers
Plug and play functionality              | IoT [16], [1]; Sensor network
Horizontal and vertical communication    | MAS [25]
Data processing; Incomplete data estimation | Kalman filter [15]
Data accuracy assessment                 | Machine learning [4]
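As an illustration of the Kalman filter listed in Table I for data processing and incomplete data estimation, the following sketch smooths noisy scalar sensor readings; the process and measurement noise parameters q and r are assumed values for the example, not taken from the paper:

```python
def kalman_1d(measurements, q=1e-3, r=0.25):
    """Scalar Kalman filter: estimate a (nearly) constant signal from noisy
    sensor readings. q is the process noise, r the measurement noise."""
    x, p = measurements[0], 1.0  # initial state estimate and its variance
    estimates = [x]
    for z in measurements[1:]:
        p = p + q              # predict: state assumed constant, variance grows
        k = p / (p + r)        # Kalman gain: how much to trust the new reading
        x = x + k * (z - x)    # update the estimate with the new measurement
        p = (1 - k) * p        # update the estimate's variance
        estimates.append(x)
    return estimates

# Noisy readings scattered around a true value of 5.0
readings = [5.2, 4.9, 5.4, 4.7, 5.1, 4.8, 5.3]
smoothed = kalman_1d(readings)
print(round(smoothed[-1], 2))
```

The final estimate sits much closer to the true value than the last raw reading, which is the property the pre-processing level relies on.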
been extended by adding to the agent the social dimension, the reactivity and the pro-activity characteristics [24]:
• Autonomy: An agent processes the data collected from its neighbourhood and calculates its actions without any external intervention.
• Social dimension: In multi-agent systems, a standard language is defined that enables agents to communicate.
• Reactivity: Agents react and change their behaviour based on their perception.
• Pro-activity: Agents adapt their behaviour by themselves in order to achieve an ultimate aim.

Two types of agents can be distinguished based on their way of reacting:
• Reactive agent: These agents' reactions are based on the external stimuli received. They run algorithms based on predefined rules that take the perceptions gathered by sensors as inputs and actions on the environment as the output.
• Cognitive agent: These agents are able to run complex operations in order to fulfil a final goal. They are characterized by their representation and reasoning skills. They run machine learning algorithms in order to evolve and constantly adapt their behaviour, to deal with emergent environmental changes and to resolve conflicting situations.

For the sake of reaching its ultimate goal autonomously and efficiently, an agent runs complex operations based on decision-making paradigms.

C. DECISION MAKING PARADIGMS

The most frequently used decision-making methods that come to a conclusion based on sensor signal data are briefly reviewed in this section. These methods include paradigms like neural networks, fuzzy logic, genetic algorithms and hybrid systems.

1) Neural networks: An artificial neural network (NN) is an information processing paradigm inspired by the biological nervous system [18]. NNs are known for their remarkable ability to derive meaning from complicated or imprecise data. NNs acquire their knowledge through a training process that is either supervised or unsupervised learning, depending on the application and the available data:
• Supervised learning: In this case, neural networks are trained by providing standard input information and their expected output.
• Unsupervised learning: With this learning mode, only input stimuli are provided to the neural network. These inputs are automatically clustered into sets which are closely related.

Neural networks have proved to be highly efficient when it comes to tool failure recognition and risk prediction. In [2], a method that automatically detects and classifies tool malfunctions using probabilistic neural networks is presented. An accurate flank wear estimation in turning is proposed in [9], based on acoustic emission signals.

2) Fuzzy logic: Fuzzy logic refers to the theory of fuzzy sets [14]. A fuzzy set is defined as a set with no clearly defined boundaries. It is characterised by a membership function, which is a curve that defines how to associate each point of the input with a membership value in the interval [0,1]. Fuzzy logic allows modelling complex dynamic systems in a more intuitive way and has the advantage of not needing a lot of data for training. It is commonly applied in decision-making support systems, and particularly to sensor monitoring and estimation in machining. An example is [3], where a solution for tool wear estimation is presented, or [21], where a fuzzy logic approach to sensor monitoring in machining is proposed: features extracted from the acoustic emission signal are used to develop an intelligent system for classifying the tool wear level.

Based on the CPS definition and requirements provided in the previous section, cognitive agents cover an important number of the CPS requirements. In the next section, a conceptual approach to CPS based on cognitive multi-agent systems is proposed.

III. THE PROPOSED APPROACH

At this early development phase, the CPS conception needs to be defined. The contribution brought through the present work consists in proposing a conceptual solution to build a CPS network.

A. CPS TECHNOLOGICAL ENABLERS

Table II highlights the common points between the CPS requirements and the multi-agent abilities. Remedies are also proposed in case a CPS requirement cannot be satisfied by MAS abilities.

TABLE II
COMPARING CPS REQUIREMENTS WITH MAS ABILITIES

CPS requirements          | Provided by MAS | Solution (if not provided by MAS)
Self-*                    | Yes             |
Reactivity                | Yes             |
Near real time processing | Yes             |
Prediction                | Partial         | Use Artificial Intelligence
Plug and play             | Partial         | Use IoT as infrastructure
Link with physical world  | No              | Use embedded system technologies

The multi-agent system paradigm is used in order to fulfil requirements such as autonomy, self-* behaviour, reactivity and near real-time processing. The prediction and decision-making abilities are ensured by using Artificial Intelligence, in particular Artificial Neural Networks and Fuzzy Logic. The connection with the physical world needs a hardware technology implementation, which can be resolved by the use of Embedded Systems solutions.

Figure 2 gives a clearer idea about the technological enablers and infrastructure involved in a CPS implementation.
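As a minimal illustration of the fuzzy membership functions mentioned above, the following sketch fuzzifies a (hypothetical) normalised acoustic-emission level into tool-wear classes using triangular membership functions; the class boundaries are invented for the example and are not taken from [3] or [21]:

```python
def tri(x, a, b, c):
    """Triangular membership function: 0 at a, peak (1.0) at b, back to 0 at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def classify_wear(signal):
    """Fuzzify a normalised sensor level into wear classes and return the class
    with the highest membership degree, together with all degrees."""
    memberships = {
        "low":    tri(signal, -0.01, 0.0, 0.4),
        "medium": tri(signal, 0.2, 0.5, 0.8),
        "high":   tri(signal, 0.6, 1.0, 1.01),
    }
    return max(memberships, key=memberships.get), memberships

label, m = classify_wear(0.45)
print(label)  # "medium" has the highest membership degree at 0.45
```

A full fuzzy controller would combine such degrees through inference rules and defuzzify the result, but the membership step above is where the "no clearly defined boundaries" idea becomes concrete.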
This proposition takes into consideration both the software is essential to check the validation of the decision
and hardware implementation issues. proposed by the decision module. This validation takes
into consideration the coherence of the decision with
the company business rules on one hand. On the other
hand, this module checks if the proposed decision is
compatible with other ongoing processes and does not
create conflicts. Do do that it uses optimization methods
like scheduling algorithms.
• Actuators: These are components in charge of exe-
cuting the received commands. They can be network
gateways sending alerts, reports or messages to the
concerned parts. They can also, as commonly known,
seen as entities able to act on the physical environment.
Fig. 2. CPS technological enablers
B. PROPOSED FRAMEWORK
The proposition consists on proposing a CPS-Agent based
framework, that takes advantages from the technological
enablers mentioned in the previous section. The idea is to
implement a CPS-agent network in an industrial context to
supervise and control the production systems. The figure 3
presents the different components of a single CPS-agent.
• Sensing module: This module represents the CPS-agent
Fig. 3. The proposed framework architecture
entrance point of the incoming data. CPS network is
based on an IoT infrastructure, that means that every
CPS-Agent is equipped with a network interface. This This solution is applied to implement a quality control system
interface is considered as the sensor responsible of and is in validation phase with a real industrial use case.
collecting logical data, mainly messages from other CPS-Agents and commands from the information system. Physical sensors are part of the sensing module responsible for collecting physical data. Plug-and-play functionality is required to enable self-configuration: a CPS-Agent is able to autonomously instantiate the type and the number of sensors that suit the respective demand.
• Pre-processing module: This module acts as a physical filter and is responsible for raw data cleansing. This process consists of removing noise and redundancy from the incoming data. The module also assesses data accuracy based on the level of confidence assigned to the source. When several sensors are implemented, data fusion is performed in this step in order to remove incoherences. The output of this module is the pre-processed data used by the decision module.
• Decision module: The added value of a CPS resides in its decision capacities, so this module is considered the most important. Its mission is to process the received pre-processed data and provide a decision. Its cognitive capacities are based on artificial neural networks and fuzzy logic, used essentially for prediction tasks. The predictions are used to automatically self-configure the agent's behaviour, optimize production income, reduce impacts and avoid failures.
• Validation module: The validation module is responsible for sending commands to the actuators. This module

IV. APPLICATION
The proposed framework is implemented with an industrial partner, APR (Application Plastique du Rhone), a French company specialised in plastic manufacturing. Considering the company's requirements and business rules, a quality control solution for plastic items is being implemented. Figure 4 shows examples of items manufactured by APR. These items require a compliance control of up to 100%.

Fig. 4. Example of items to control

A. SOFTWARE IMPLEMENTATION
Practically, the proposed approach is implemented on a Raspberry Pi board, using the Java language. Modern guidelines based on modularity have been followed in order to ensure a scalable and generic solution. The reason for choosing modularity when implementing a CPS-Agent solution is that it enables the flexibility which is mandatory to provide self-* behaviour. Figure 5 shows a simplified class diagram of a CPS-Agent.
The controller module has the same main functionalities as a logical agent: it perceives, decides and applies decisions. It is also responsible for instantiating the types and numbers
7th International Conference on Information Society and Technology ICIST 2017
Abstract— The aim of this paper is to highlight the potential of the Internet of Things (IoT) in supporting sustainable economic, ecological, social and especially business growth, according to the Multi-Objective (MO) Cloud Computing (CC) Sustainability assessment model. IoT is highly related to CC as its inherent environment and platform for operations. This has motivated us to present ways of harnessing IoT capacities in the realization of the defined sustainable development goals (SDGs), aiming to bring sustainable development that takes care of present needs without jeopardizing the ability of next generations to meet their own necessities. The MO CC Sustainability Assessment Framework is well matched to the IoT paradigm, taking the role of a support platform for storage and analysis of data collected from the IoT networked devices. Our idea is to provide a wide sustainability assessment framework that encompasses all the diversity of design and use variations, as well as the integration of CC technology as a support platform for more specific environments and architectural designs.

I. INTRODUCTION
Cloud Computing (CC) is one of the most transformative paradigms in business and society, delivering a spectrum of computing services and various usage alternatives to customers. As such, CC has recently triggered additional attention and efforts towards examining its sustainability conformance and proposing a proper sustainability assessment model. The existing literature indicates that there have been certain modeling efforts, but unfortunately no comprehensive proposal. We have proposed a Multi-Objective Cloud Computing Sustainability Assessment framework based on four different pillars, i.e. specific areas that cloud computing technologies and their development strategies strongly influence [1, 2]. It is a Multi-Objective (MO) model designed to analyze the effects of CC technologies at the economy, business, ecology, and society levels. The framework is designed to free users from predefined constraints when choosing relevant objectives for their system/working environment. Such a high level of flexibility allows the design and joint use of the CC and Internet of Things (IoT) paradigms. The driving motivation for this paper is to provide an analysis of the IoT technology potentials that would support sustainable economic, ecological, social and business growth, according to the MO CC Sustainability Assessment Framework. CC is an environment which facilitates IoT functioning, as it supports handling the high data volumes generated by a large number of devices, conserves energy, memory and processing, and minimizes costs.

II. MOTIVATION FOR RESEARCH
The increasingly rapidly developing and greatly consumerist human life environment implies that natural resources are consumed at an increasing rate. Furthermore, the way of using resources is rapidly changing, so the issue of ensuring the needed level of sustainability becomes one of the most important research questions [3]. The MO CC Sustainability Assessment Framework is well matched to the IoT paradigm, as the Cloud has become the ultimate support platform for storage and analysis of data collected from IoT devices, allowing smooth utilization of applications that rely on virtualization technologies and intensive data computation.
There is a rising interest in merging Cloud and IoT, as their joint use has become pervasive, making them inevitable components of the Future Internet. The research question stated for this study is how to provide a sustainability assessment framework that will encompass all the diversity of design and use of IoT, as well as the integration of CC technology as a support platform for specific environments, data and architectural designs.

III. RELATED WORK
Even though there have been some initiatives to provide a suitable sustainability assessment approach, the work related to sustainability in CC-based IoT still lacks an integrated proposal. Existing models mostly direct attention to one specific sustainability pillar. For instance, a group of authors has addressed the most prominent business models for long-term sustainability, the Cloud Cube Model (CCM) and the Hexagon model [4]. There are studies oriented towards models focused on the financial aspect, such as the Capital Asset Pricing Model (CAPM) [5]. The search for more comprehensive studies leads to the United Nations (UN) general model, which relies on 10 principles and 17 sustainable development goals (SDGs) for stable sustainability development modeling [6, 7]. Our methodology strongly relies on the UN directions, while further expanding the proposed MO CC Sustainability assessment model [1, 2] with business-area-based aspects that are highly related to the proper application and use of IoT technologies. For analysis and comparison to the UN model, we have taken the UN SDGs into consideration.
infrastructure provider when applied to IoT, and sheds light on the possibilities of using CC infrastructure.
secure and reliable network environment. Thus, the second level of the model classifies users: service providers and end users, which have specific interests with respect to the defined objectives related to the infrastructure, platform, and application (level 3). At the fourth level, infrastructure provider objectives can be divided into those related to income maximization and those related to expense minimization. Income objectives are classified as quality of service (QoS) based and security based. Expense objectives are divided into resource usage and efficiency of resource usage. Accordingly, we have defined and provided a mathematical model of a set of featured objectives to fulfill the IoT business sustainability principles. We focus on QoS and security, while the other goals, due to the available paper length, are explained only briefly. Quality of service (QoS) can be estimated provided that the set of design variables listed in Table I is available:

TABLE I. QOS DESIGN VARIABLES

  Number of servers: N_S
  Server i: S_i
  Number of VMs: N_VM
  Virtual machine j: VM_j
  No. of ph. CPUs on server S_i: S_i^CPU
  No. of virtual CPU cores on VM_j: VM_j^CPU
  RAM size on server S_i [GB]: S_i^RAM
  RAM size on VM_j: VM_j^RAM
  Ph. storage size on ph. server S_i [TB]: S_i^str
  VM_j image size [TB]: VM_j^im
  Server network traffic [GB/s]: S_i^bw
  VM_j network traffic bandwidth: VM_j^bw
  Hypervisor type and parameters: Hypervisor_i^type
  CPU scheduling type and parameters related to hypervisor: CPU_i^sch
  No. and type of magnetic HDs; Hard Disk i: HDD_num; HDD_i
  Hard Disk i size [TB]: HDD_i^size
  Type and number of SSDs; SSD Disk i: SDD_num; SDD_i
  SSD Disk i size [TB]: SDD_i^size
  RAID type and parameters: RAID_i^type
  Filesystem type and parameters: FS_i^type
  VMs distribution: S_i -> VM_j
  VMs migration: VM_j(S_i)
  VMs images migration: VM_j^im(S_i)
  System of the VMs images replicas: VM_j^im_rep(S_i)

QoS is assessed by a number of appropriate relations and is defined by the goals required to be accomplished when delivering the IoT-Cloud to the user:

  QoS(S_i, VM_j) = f(QoS_DesignVariables)   (1)

where QoS(S_i, VM_j) is a multivariable function for evaluation of the performances. In general it also targets the estimation of fault tolerance, available resources, load balancing, and scalability, which depend on a set of features that are grouped as quantifiers. The first set represents quantifiers of physical resources:

  I_qf = f(N_S, S_i, S_i^CPU, S_i^RAM, S_i^st, S_i^str, S_i^bw, HDD_param, SDD_param, RAID_i^type, FS_i^type)   (2)

where HDD_param and SDD_param depend on the number, features and size of the HDD/SDD disks. The next group of quantifiers describes virtual resources:

  II_qf = f(N_VM, VM_j, VM_j^CPU, VM_j^RAM, VM_j^im, VM_j^bw, HP^type, CPU_HP^sch)   (3)

The third set of quantifiers defines the activities {migrations, distributions, replications} in the virtual environment:

  III_qf = f(S_i -> VM_j, VM_j(S_i), VM_j^im(S_i), VM_j^im_rep(S_i))   (4)

QoS maximization as Goal O1 is further defined based on relations (1)-(4):

  O1 = max QoS(S_i, VM_j)   (5)

QoS performance relies on a large number of variables:

  QoS(performances) = f(I_qf, II_qf, S_i -> VM_j, VM_j(S_i -> S_k))   (6)

while reliability and availability are highly dependent on the VM manipulation operations: migration, image migration and the system of image replicas (7, 8):

  QoS(reliability, availability) = f(RAID_i^type, VM_manip)   (7)

  VM_manip = f(VM_j(S_i -> S_k), VM_j^im(S_i -> S_k), VM_j^im_rep(S_i -> S_k))   (8)

Another important aspect to consider in IoT is the security plan. The goals related to security focus on achieving high levels of sensitivity, specificity and accuracy of the security systems. Yet, the main obstacles to the best possible security performance are the QoS and energy efficiency objectives, as these diverge from the security limitations.
Intrusion Detection/Prevention Systems (IDS/IPS) represent the basic support for system security provisioning. IDS/IPS provide detection, classification and further reaction to detected malicious events in the supervised system environment. As there is no ideal algorithm that provides absolute detection accuracy, some normal activity is classified as malicious, gaining the tag of false positive (FP). Alternatively, some malicious traffic can escape the IDS scanning capabilities and be treated as normal traffic (false negative (FN)). True positives (TP) and true negatives (TN) stand for real normal events detected as such, and
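A minimal sketch of how the Table I design variables can feed a QoS evaluation in the sense of relations (1) and (5). The paper leaves the function f unspecified, so the headroom-based score below is an invented placeholder, as are the class and function names; it only illustrates the shape of "QoS as a function of design variables, maximized over placements".

```python
from dataclasses import dataclass

@dataclass
class Server:          # a physical server S_i
    cpu_cores: int     # S_i^CPU
    ram_gb: int        # S_i^RAM

@dataclass
class VM:              # a virtual machine VM_j
    vcpu: int          # VM_j^CPU
    ram_gb: int        # VM_j^RAM

def qos(server: Server, vm: VM) -> float:
    """Toy stand-in for QoS(S_i, VM_j) = f(design variables), relation (1):
    score a feasible placement by the headroom left on the server."""
    if vm.vcpu > server.cpu_cores or vm.ram_gb > server.ram_gb:
        return 0.0  # infeasible placement
    cpu_headroom = 1 - vm.vcpu / server.cpu_cores
    ram_headroom = 1 - vm.ram_gb / server.ram_gb
    return (cpu_headroom + ram_headroom) / 2

def goal_o1(servers, vm):
    """Goal O1 = max QoS(S_i, VM_j), relation (5): best server for the VM."""
    return max(qos(s, vm) for s in servers)
```

A real evaluation would fold in the remaining Table I variables (storage, bandwidth, hypervisor and RAID parameters, migration activity), which is exactly what the quantifier sets I_qf, II_qf and III_qf group together.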
malicious traffic that is correctly detected, respectively. IDS/IPS sensitivity and specificity are defined based on the collected TP and TN event rates.
Further we define Goal O2, the sensitivity maximization, and Goal O3, the specificity maximization:

  sensitivity = TP_rate_i = TP_i / (TP_i + FN_i),   O2 = max TP_rate(TP_rate_i)   (9)

  specificity = TN_rate_i = TN_i / (TN_i + FP_i),   O3 = max TN_rate(TN_rate_i)   (10)

Resource usage is measured through the energy used for the operation of the IoT devices and the adequate CC infrastructure, the cost of the administrative time used, and the use of non-IT components and time. Energy consumption is the most important component, as it depends on the CPU energy usage, the energy spent on network components, and the energy for storage operations. Resource efficiency is decomposed into the efficiency of CPU use, RAM memory use, storage space use, and network traffic. We do not repeat the discussion of these design variables due to their high overlap with the QoS objective variables. Resource use efficiency is calculated according to:

  Psi_1(S_i, VM_j) = (1/4) * [ (Sum_{j=1}^{N_VM} VM_j^CPU / Sum_{i=1}^{N_S} S_i^CPU)
                             + (Sum_{j=1}^{N_VM} VM_j^RAM / Sum_{i=1}^{N_S} S_i^RAM)
                             + (Sum_{j=1}^{N_VM} VM_j^im  / Sum_{i=1}^{N_S} S_i^str)
                             + (Sum_{j=1}^{N_VM} VM_j^bw  / Sum_{i=1}^{N_S} S_i^bw) ]   (11)

Table II provides details related to the set of selected goals.
We define Goal O4 as the maximization of Psi_1, which depends on the optimal deployment and migration of virtual machines. Goal O5 is defined as the minimization of the number of VM migrations. The number of migrations of virtual machines is calculated according to:

  Psi_2(VM) = Sum_{j=1}^{N_VM} VM_j(S_i -> S_k)   (12)

Goal O6 seeks the minimization of energy consumption. The listed variables depend on the values obtained from calculating the statistical power consumption for each CPU and for all CPUs.

TABLE II. CLOUD COMPUTING AND INTERNET OF THINGS RELATIONSHIP

The statistical power consumption depends on the probability P_j of a state of CPU_i:

  P_CPU = Sum_{i=0}^{N_CPU-1} Sum_{j=0}^{N_CPU_st-1} pi_ij^CPU * P_j^CPU   (13)

where the probability of a specific state for a CPU is:

  pi_ij^CPU = f(Hypervisor_i^type, CPU^sch, S_i -> VM_j, VM_j(S_i))   (14)

In a similar way, we can estimate the storage power consumption goal, which depends on the disk technology used, and relies on the statistical power consumption of the individual disks, P_HDDi and P_SSDi, and the statistical power consumption of all disks, P_HDD and P_SSD. Table II provides information on the calculation of goals O7 (P_str minimization), O8 (energy efficiency (EE) CPU/Storage maximization), O9 (maximization of
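The detection rates of relations (9)-(10) and the resource-use ratios of relations (11)-(12) translate directly into code. This is an illustrative Python sketch with invented sample values; the tuples stand for the (CPU, RAM, storage, bandwidth) design variables of Table I.

```python
def sensitivity(tp, fn):
    """Relation (9): TP_rate = TP / (TP + FN); Goal O2 maximizes it."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Relation (10): TN_rate = TN / (TN + FP); Goal O3 maximizes it."""
    return tn / (tn + fp)

def psi1(vms, servers):
    """Relation (11): mean of the four utilization ratios (CPU, RAM,
    storage, bandwidth); each item is a (cpu, ram, storage, bw) tuple,
    VM demands summed over all VMs, capacities over all servers."""
    ratios = [sum(v[k] for v in vms) / sum(s[k] for s in servers)
              for k in range(4)]
    return sum(ratios) / 4

def psi2(migrations_per_vm):
    """Relation (12): total number of VM migrations VM_j(S_i -> S_k)."""
    return sum(migrations_per_vm)
```

Goals O4 and O5 then amount to driving psi1 towards 1 (full, balanced utilization) while keeping psi2 small, which is the trade-off between deployment quality and migration cost noted above.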
EE networking performance) and O10 (maximization of networking energy efficiency), which are defined according to the values:

  P_HDD/SSD = Sum_{i=0}^{HDD/SSD_num-1} P_HDDi/SSDi
            = Sum_{i=0}^{HDD/SSD_num-1} Sum_{j=0}^{N_HDD/SSD_st-1} pi_ij^HDD/SSD * P_j^HDD/SSD   (15)

where pi_ij^DT can be calculated according to relation (16); DT stands for "disk type":

  pi_ij^DT = f(DT_params, N_VM, S_i, FS_i^type, RAID_i^type, III_qf)   (16)

The proposed MO analysis is a multiple-step process. The radar diagram given in Figure 4 provides one possible appropriate way of visualizing and comparing the selected goals and their respective values, obtained by arbitrary parameter estimation for the IoT network/system under analysis. It provides an example of two different estimations (A and B) of the selected objectives, and indicates the objectives that should be enhanced or given additional attention.

Figure 4. "IoT-Cloud" chart for multiple quantitative variables comparison

V. CONCLUSION
This paper provides a sustainability assessment perspective on the Internet of Things paradigm, taking into consideration its relation to the cloud computing environment and the demands for IoT business growth in such an environment. IoT is believed to be one of today's most important Mega Trends. The integration of IoT into the proposed Multi-Objective Cloud Computing Sustainability Assessment model is presented by means of a set of defined objectives that should be fulfilled in order to provide a proper, secure, and QoS-enabled environment.
Further work will focus on stronger support for the integration of Open/Private/Public data and, depending on the volume and ontological specification, the integration and more efficient use of Big/Smart Data. The appearance of diverse worldwide green initiatives, raised as a consequence of the intensified IoT involvement in all aspects of human life, is bringing additional bustle to IoT application possibilities and development directions.

ACKNOWLEDGMENT
The work presented in this paper has been partially funded by the Ministry of Education, Science and Technological Development of the Republic of Serbia: V. Timčenko by grants TR-32037/TR-32025, N. Zogović and B. Đorđević by grant III-43002, and M. Jevtić by grant TR-32051.

REFERENCES
[1] V. Timčenko et al., "Assessing Cloud Computing Sustainability," in Proc. of the 6th Int. Conf. on Information Society and Technology (ICIST 2016), pp. 40-45, 2016.
[2] N. Zogović et al., "A Multi-objective Assessment Framework for Cloud Computing," in Proc. of TELFOR 2015, Belgrade, Serbia, pp. 978-981, 2015.
[3] D. Griggs et al., "Policy: Sustainable development goals for people and planet," Nature, 495(7441), pp. 305-307, 2013.
[4] V. Chang, G. Wills, and D. De Roure, "A review of cloud business models and sustainability," in Proc. of the 2010 IEEE 3rd International Conference on Cloud Computing (CLOUD), IEEE, 2010.
[5] W.-C. Hu (ed.), Sustainable ICTs and Management Systems for Green Computing, IGI Global, 2012.
[6] A report to the SG of the UN by the LC of the SDSN, "Indicators and a monitoring framework for the sustainable development goals - Launching a data revolution for the SDGs," 2015. [Available] http://indicators.report/
[7] Y. Lu et al., "Policy: Five priorities for the UN sustainable development goals - comment," Nature, 520(7548), pp. 432-433, 2015.
[8] S.88 DIGIT Act: "Developing Innovation and Growing the Internet of Things Act" or "DIGIT Act," enacted by the Senate and House of Representatives of the United States of America in Congress assembled. [Available] https://www.congress.gov/bill/115th-congress/senate-bill/88/text
[9] Internet of Things Global Standards Initiative. [Available] http://www.itu.int/en/ITU-T/gsi/iot/Pages/default.aspx
[10] A. Botta et al., "Integration of Cloud computing and Internet of Things: A survey," Future Generation Computer Systems, 56, pp. 684-700, 2016.
[11] M. Gayler, "Open Data, Open Innovation and The Cloud," A Conference on Open Strategies - Summit of New Thinking, Berlin, 2012.
{d.tchoffa}{a.elmhamedi}{f.danioko}@iut.univ-paris8.fr, nicolas.figay@airbus.com
The emergence of the Internet of Things (IoT), coupled with other capabilities such as Cloud Computing, Big Data, Artificial Intelligence or Blockchain, creates opportunities for the development of Cyber-Physical Systems (CPS) at the digital business ecosystem scale. It will impact the way engineers work in order to design, develop, deploy, operate and maintain complex systems-of-systems, with increasing needs concerning security, agility and interoperability. This paper presents the evolutionary Federated Interoperability Framework (FIF), which has been developed continuously for Aeronautics, Space and Defence over the last years in order to ensure continuous Product Lifecycle Management (PLM) interoperability. The evolutionary nature of the FIF is demonstrated by describing how Cloud and Portal virtualization technologies were included in the FIF in order to shorten the time to usage of PLM standards within a Dynamic Manufacturing Network (DMN). The FIF should similarly support identifying and addressing new interoperability challenges and scientific gaps related to Virtual Manufacturing based on other emerging Internet technologies.

I. INTRODUCTION
Internet evolution, with emerging IoT, Cloud Computing, Big Data, Artificial Intelligence or Blockchain, creates opportunities for the development of a new kind of digital business ecosystem constituting Cyber-Physical Systems at scale. The way engineers work and PLM approaches [1] will be impacted when designing, developing, deploying, operating and maintaining complex systems of systems (e.g. an aircraft and its associated design, production, operating and support environment), in particular because the needs concerning continuous security, agility and interoperability will increase, as well as the complexity of the systems of systems to consider.
The Federated Interoperability Framework has been developed iteratively for Aeronautics, Space and Defence over the last years, in order to support continuous Through Lifecycle Interoperability [2]. It is based on the usage of open PLM standards elected by an eBusiness community mature in terms of interoperability [3].
FIF was designed to be evolutionary, i.e. it should be easy to adapt to a continuously changing environment. FIF was built starting from the ATHENA Interoperability Framework (AIF [4]), but had to continuously extend and adapt itself in order to effectively address PLM interoperability issues and brakes coming from successively emerging industrial drivers, states of the practice, and Information and Communication Technologies (ICT). The AIF's holistic and multi-layered approach, which combined the usage of Model Driven Architecture (MDA), Service Oriented Architecture (SOA), Ontology and Enterprise Architecture (EA), was insufficient. FIF extended it by considering the combined usage of a consistent set of open standards as key enablers. It also does not propose a new independent interoperability framework, but a way to federate legacy ones, in order to define and build the ideal system reusing legacy frameworks as building blocks of the proposed approach. FIF consequently had to support the identification and qualification of an accurate set of consistent open standards for the different layers and aspects to be considered when willing to achieve continuous PLM interoperability: Information & Communication Technologies (ICT), Information System, Business, Motivation and Enterprise.
FIF has been extended through successive research, standardization and operational projects, addressing these successive industry drivers: exchange and sharing of distributed Digital Mock-ups within the extended enterprise, technical Enterprise Application Interoperability, usage of digital behavioural aircraft, and establishment of Dynamic Manufacturing Networks. FIF has been addressing several identified scientific gaps, such as semantic preservation with a model driven approach [5], formal interoperability specification within a DMN (IMAGINE [6]), or a test based approach for shortening the time to usage of PLM standards (addressed by SIP [7]). The whole description of the FIF is out of the scope of this paper. The aim is to illustrate the evolutionary nature of the FIF, by describing how this last scientific gap was addressed through innovative usage of emerging Cloud technologies combined with other enabling technologies selected by the FIF.

II. RESEARCH QUESTIONS
The research questions addressed in this paper are related to the introduction of Cloud technologies in the FIF, combined with other enablers, in order to respond to the
previously described motivations, context, constraints and needs, by proposing a test based approach which will reduce the time to usage of the open PLM manufacturing standards. Such an approach should allow assessing standards and their implementations on top of an experimental test bed platform simulating a complex DMN at an acceptable price. Performing a continuous State of the Practice (SoP) and a continuous State of the Art (SoA), no approach addressing the identified needs and solving the identified scientific gap was found: at this time, no formal method exists for modelling DMN interoperability use cases, derived collaboration models and associated testing procedures with the needed test data. However, a new approach is required, addressing data, service and process interoperability at the business, information and ICT layers simultaneously. We only provide the most representative references extracted from those studied in the SoA and the SoP. Like many proposed testbed approaches, [8] only deals with schema conformance, which is not interoperability testing, within a Business to Business context. [9] proposes exactly what should be considered as interoperability testing, but restricted to the ICT layer only. For the innovative approach we are proposing, we adopted the same philosophy as [9], but considering the holistic FIF multi-layered approach, the manufacturing context and the enabling cloud virtualization technologies in order to shorten the time to usage of PLM manufacturing standards. For that, a DMN Software Factory (DSF) is proposed, which takes advantage of a Model Driven Approach for the fast generation and assembly of the components of a testing environment in the Cloud. The FIF, the building blocks of the testing environments and the building blocks of the DSF should support extension and reconfiguration in order to support the evolution of technologies and practices for continuous interoperability.
portfolio of cross organization collaboration scenarios, formalized as cross organizational collaborative process models (Pc), from which it will be possible to derive the associated testing procedures, Standardized Cross Organizational collaborative processes (SPc) as workflow models, data flows, associated test data sets, and the required simulation and testing capabilities. The DMN model, formalized by means of ArchiMate, captures the actual data exchange environment at the ICT, Information System (IS) and Business layers, and is used as the input of the DSF. Such an approach was demonstrated in [10]. However, the collaboration environment is simulated on a multi-tenant network of virtual networks. The applications are also simulated with Application Reference Components (ARCs), which are applications that provide interfaces implementing the subset of the PLM standard required in order to play the collaboration scenarios. An ARC can be a legacy software product, as well as software components automatically generated from standards in order to play the expected role. Such an approach was demonstrated in [4] and [5].
The second phase (cf. Figure 2) consists in testing software solutions or applications in the role of the ARCs used in the collaboration scenario. ARCs are replaced by the applications or software systems to be tested. They can then
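The two-phase idea above (play the scenario with generated reference components, then substitute the real application under test) can be sketched as follows. This is a hypothetical Python illustration, not the actual DSF tooling; the PLM payload, the schema name's use here, and all class and function names are invented.

```python
class ReferenceARC:
    """Generated Application Reference Component: returns a payload
    conformant to the agreed subset of the PLM exchange standard."""
    def exchange(self, part_id):
        return {"part": part_id, "schema": "STEP-AP242"}

class ApplicationUnderTest:
    """Stands in for a real PLM tool plugged into the scenario (phase 2)."""
    def exchange(self, part_id):
        return {"part": part_id, "schema": "STEP-AP242"}

class BrokenApplication:
    """A non-conformant tool: its payload deviates from the standard."""
    def exchange(self, part_id):
        return {"part": part_id, "schema": "proprietary-xml"}

def play_scenario(component, part_ids):
    """Replays a collaboration scenario, comparing each exchanged message
    with what the reference component would produce."""
    reference = ReferenceARC()
    return all(component.exchange(p) == reference.exchange(p)
               for p in part_ids)
```

Swapping `ReferenceARC` for the application under test without changing the scenario driver mirrors the ARC-substitution step of the second phase.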
appropriate Information flow Coverage (IC), i.e. what can be exchanged using a well-defined subset of the used application protocols.

IV. CLOUD BASED SOLUTION: DISCUSSION ON WHAT WAS ACHIEVED
The adopted approach combined the usage of ICT infrastructure, Business Application and Enterprise virtualization, as illustrated by Figure 4.
solution components and fit perfectly well with infrastructure network virtualization solutions.
The considered criteria for assessing the categories of solutions and electing the available realizations as components off the shelf were simplicity, openness, robustness and reusability as parametric building blocks of the DSF. This means it should be easy to deploy and aggregate them for building and reconfiguring, on demand, complex heterogeneous simulation and PLM testing infrastructures. A model based approach automating the creation of components relying on such building blocks was defined and will be assessed in operational tests, as implied by the FIF principles. It is out of the scope of this article, but it will lead to new research publications in the following months. Applying this approach, Model Based System Engineering, usually applied to manufactured products, can be applied to a composite organization, using standard libraries of pre-installed machines (appliances) and associated parametric models formalized as ArchiMate blueprint models. The approach is illustrated by Figure 6.
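The blueprint-driven generation of a test environment can be illustrated with a toy sketch: a blueprint model, here reduced to a plain dictionary rather than a real ArchiMate model, is expanded into deployable appliance definitions on a tenant's virtual network. All field names and values are invented for illustration; they are not part of the actual DSF.

```python
# Hypothetical blueprint: one PLM hub and two supplier ARCs in a tenant.
BLUEPRINT = {
    "tenant": "dmn-demo",
    "appliances": [
        {"role": "plm-hub", "image": "plm-hub-appliance"},
        {"role": "supplier-a", "image": "arc-appliance"},
        {"role": "supplier-b", "image": "arc-appliance"},
    ],
}

def generate_environment(blueprint):
    """Expands the blueprint into per-appliance deployment records,
    each attached to the tenant's virtual network."""
    network = f"{blueprint['tenant']}-net"
    return [
        {"name": f"{blueprint['tenant']}-{a['role']}",
         "image": a["image"],
         "network": network}
        for a in blueprint["appliances"]
    ]
```

The point of the sketch is the parametric reuse: reconfiguring the simulated DMN is a change to the blueprint, not to the deployment code.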
VI. CONCLUSION
The Federated Interoperability Framework (FIF) has
been developed continuously for Aeronautics, Space and
Defence over the last years in order to ensure continuous
Product Lifecycle Management (PLM) interoperability.
This paper aims at demonstrating the evolutionary nature
of the FIF, by describing how Cloud and Portal virtualization technologies were included in the FIF in order to shorten the time to usage of PLM standards within a Dynamic Manufacturing Network (DMN). Similarly, the FIF is being extended in order to identify and address new interoperability challenges and scientific gaps related to Virtual Manufacturing with the usage of emerging Internet technologies. It provides a way to shorten the time for using emerging standards for Manufacturing 4.0 or digital twins. An important identified scientific gap to address in the future is related to the need to consider complex systems of systems, for which the systemic paradigm is less and less accurate when considering multi-scale virtualized systems.

ACKNOWLEDGMENT
This work has been partly funded by the European Commission through the project IMAGINE: "Innovative End-to-end Management of Dynamic Manufacturing Networks" (Grant Agreement No. 285132). The authors wish to acknowledge the Commission for its support. We also wish to express our gratitude and appreciation to the ATHENA, IMAGINE and SIP project partners who contributed to the development of the various ideas and concepts presented in this paper.

REFERENCES
[1] "About PLM", CIMDATA, http://www.cimdata.or/
[2] "Through Life Cycle Interoperability. A critical strategic level of competitiveness Rev 1.0", 09/05/2014, ASD/TLCI/01, http://www.asd-ssg.org/through-life-cycle-interoperability
[3] H. Panetto, "Towards a Classification Framework for Interoperability of Enterprise Applications", International Journal of Computer Integrated Manufacturing, Taylor & Francis, 2007, 20(8), pp. 727-740. <10.1080/09511920600996419>
[4] "ATHENA Interoperability Framework", http://athena.modelbased.net/
[5] N. Figay, "Interoperability of technical enterprise applications", thesis report, 2009, https://tel.archives-ouvertes.fr/tel-00806072
[6] IMAGINE project, "Innovative end-to-end Management of Dynamic Manufacturing Networks", http://www.imagine-futurefactory.eu
[7] SystemX Standard Interoperability PLM project web site, http://www.irt-systemx.fr/systemx-lance-le-projet-sip-standards-interoperabilite-plm/Data
[8] N. Ivezic, B. Kulvatunyou, A. Jones, "A manufacturing B2B interoperability testbed", in ICEC '03: Proceedings of the 5th international conference on Electronic commerce, pp. 195-204.
[9] ETSI, "Interoperability Best Practices Edition 2", ETSI, http://www.etsi.org/about/how-we-work/testing-and-interoperability
[10] D. Tchoffa, N. Figay, P. Ghodous, E. Expósito, L. Kermad, T. Vosgien, A. El Mhamedi, "Digital factory system for dynamic manufacturing network supporting networked collaborative product", Data & Knowledge Engineering, Volume 105, September 2016, pp. 130-154.
[11] N. Figay, P. Ghodous, B. Shariat, E. Exposito, D. Tchoffa, L. Kermad, E. M. Dafaoui, T. Vosgien, "Model Based Enterprise Modelling for Testing PLM Interoperability in Dynamic Manufacturing Network", IWEI 2015, pp. 141-153.
Abstract—Systems that can tightly integrate physical and virtual components have represented a priority for Systems Engineering research. Research efforts have been concentrated in domains such as the Internet of Things, the Internet of Services and, recently, Cyber-Physical Systems. The main objective of this paper is the investigation of applying process mining techniques to streams of data collected from existing sensing networks. A method based on the automated planning problem is proposed and implemented.

I. INTRODUCTION
In recent years, advances in various fields such as pervasive computing, sensor and computing technology, and plan and activity recognition (especially human activity recognition) have opened several paths to achieve novel solutions for applying process mining techniques in various new settings.
In this paper, the practical aspects of using process mining techniques are investigated, specifically the case in which data is collected from existing sensor networks. The objective will be to extract the process model that best explains the observed behavior – the result of applying process discovery techniques – with minimal effort on the part of the experts required to set up the system.
This problem can be aligned with challenges from the Process Mining Manifesto [1], specifically:
C1 – Finding, Merging and Cleaning Event Data;
C7 – Cross-Organizational Mining – as will be presented in the case study;
C11 – Improving Understandability for Non-Experts, and the related guiding principle "GP5: Models Should Be Treated as Purposeful Abstractions of Reality".

II. RESEARCH QUESTIONS AND RELATED APPROACHES
The Cyber-Physical System (CPS) paradigm addresses the relation between the "Cyber" (virtual, computation environment) and "Physical" (natural environment) components of ICT-enabled systems.
Adopting the CPS paradigm in the context of enterprise systems must consider the following:
the heterogeneity of components in regard to interoperability and composition;
the complexity of system components, and thus the need for new modelling techniques;
the complex interaction between human operators, equipment and software systems;
the stochastic nature of processes;
the dynamic reconfiguration of systems;
measuring system state with the aid of sensing systems and changing state with the aid of actuation systems;
adaptive control systems and networked control systems.

Figure 1. Generic CPS system architecture

In the field of process mining, from the early stages of its evolution, there has been a keen interest in the practical aspects and benefits of the developed methods and techniques, specifically in the case of process discovery techniques.
In [2] the authors analyze the usage of process mining techniques in a hospital setting, as well as the application of process discovery techniques in various Dutch organizations – public (municipalities) and private. The authors of [3] present an overview of the (perceived) success of process discovery techniques. Closest to this investigation – an industrial setting and an existing sensor network – is the case study presented in [4], involving the extraction of a process model from a wafer machine.
As all these practical (even simulated) case studies reveal, even though the process discovery algorithms have matured to the point of achieving their intended objective with minimal input from the human operators (parameter tuning), the main issues revolve around the pre-processing stages.
The "canonical" process discovery problem is defined in [5] as extracting a process model from an event log
without any a-priori information, and it is the focus of the process mining research field. Event logs are considered the central data structure in the process mining field and are bags (multi-sets – sets in which each element has an associated cardinality) of event sequences, where each sequence represents a process realization/instance (or case, according to the process mining terminology). Each sequence element has at least a reference to an activity/action type, although usually other event attributes are available, such as the timestamp, the actors and/or resources involved in performing the action, etc. It should be noted that not all attributes can be used to extract useful information using the current process mining techniques.
The authors of [6] discuss some of the issues surrounding the resulting process models obtained through the usage of process discovery methods. They employ a "process model as a map" metaphor, mentioning that the resulting process models should meet the expectations of the end-user and that, similar to maps, some of the aspects of the process (representing events or activities) should be abstracted away (omitted) or aggregated into a higher-level representation.
This line of thinking aligns with the scenario considered in this paper – data collected from existing sensor networks – in that it becomes obvious that a pre-processing stage is required in order to achieve the abstraction level desired or required by the end-user, which in most cases is a business analyst or system stakeholder. For example, in a logistics scenario, the data collected from a set of RFID readers connected through a public informatics infrastructure will be represented by a stream of events corresponding to the passing of various objects through monitored gates. In this case, it is obvious that the business analyst or end-user would not be able to extract any useful information from a process model constructed directly from this data stream. As such, a set of data transformation steps will need to be employed to aggregate some RFID event patterns into higher-level activity references – for example, a sequence involving 3 RFID events – one referring to a product entering a packing area, one from a container entering the same area and one from the container exiting the packaging area – may refer to a single action.
Some of the methods of dealing with this abstraction problem, proposed in the field of process mining, have focused on the usage of clustering techniques [7] and the extraction of common patterns [8][9]. These methods have the advantage of requiring limited domain-dependent, expert-provided explicit background knowledge (usually in the form of user/operator intervention), thus simplifying their adoption.
However, as mentioned above, the final, discovered process model should be tailored according to the needs of the end-user, and the previously mentioned methods rely on metrics based on event correlation frequency that might not suffice or provide relevant information to business experts. Furthermore, the limited expert input that these methods accept significantly hinders their flexibility and their ability to provide the correct level of abstraction and/or perform the desired aggregations in every application.
As an alternative to these methods, in this paper we investigate the possibility of reusing techniques developed in the fields of plan and activity recognition. Several reviews of these research fields, proving the growing interest in them, are presented in [10][11][12][13][14].
A key knowledge artifact, common to the majority of activity and plan recognition methods, is an action library that specifies the necessary (partially-ordered) sequences of events that correspond to an activity or plan. But methods of this type have an important limitation caused by the considerable effort required to build these knowledge bases. Furthermore, in most cases, the created knowledge bases will be domain-specific and cannot be transferred from one system implementation to another.
An alternative explored in this paper involves a generative solution, in which the plans / partially-ordered sequences are constructed during the execution of the activity recognition procedure using simpler action models. Such an approach is similar to the one presented in [15], where the possible set of plans for an agent is constructed based on an action library represented using the PDDL notation. The described algorithm is capable of identifying the distribution of the achievable objectives based on a set of observations regarding the already performed actions and a cost function.
Thus, the problem of action recognition can be cast into a problem of constructing the (partially-ordered) sequence of actions whose execution determines a trajectory in the state space explaining the sequence of collected observations. Such an approach can be implemented using "model checking" techniques [16] that allow the exploration of a system's state space in order to check the validity of a set of properties, usually expressed as temporal logic formulas.
These techniques have been used to solve the automated planning problem, this scheme being investigated in a number of papers [4][5][6][7]. Broadly speaking, solving the automated planning problem using a "model checker" involves the conversion of the planning domain into a system model. Most "model checking" techniques are capable of providing a counter-example in case the verified (temporal) property is violated, the counter-example representing the solution to the planning problem.
A related approach can be found in the research into the (discrete event) system diagnosis problem. The system diagnosis problem refers to the identification of all behaviors of a system that can explain a set of observations, and to determining which of them are correct and which are faulty, a faulty behavior usually containing a set of forbidden actions.
Given this problem's similarity to model checking, system diagnosis techniques can be considered for solving the activity recognition problem. Such an approach is described in [10] and accepts as a system model a PDDL domain. Another similar solution for the system diagnosis problem is presented in [12]. The authors highlight the similarities between this problem and the automated planning problem and propose a PDDL transformation schema for it. After this transformation, the problem is solved using existing planners, leveraging the heuristics for efficient state space exploration developed over the last two decades. Similar approaches, relying on existing automated planners and PDDL transformation schemas, are presented in [13]. Closely related to [13], in [14] the authors develop the LTS++ application, tackling the problem of generating hypotheses regarding the
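The event-log structure and the RFID aggregation step described above can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's implementation; the activity names and the three-event pattern are hypothetical stand-ins for the logistics example.

```python
# Sketch: an event log as a multiset of traces, plus a pre-processing rule
# that folds a low-level RFID event pattern into one higher-level activity.
from collections import Counter

# An event log: a multiset (bag) of traces, each trace a sequence of activities.
event_log = Counter([
    ("register", "check", "ship"),
    ("register", "check", "ship"),
    ("register", "reject"),
])

# The 3-event RFID pattern from the logistics example (product enters the
# packing area, container enters, container exits) is folded into "pack".
PATTERN = ("product_enter_packing", "container_enter_packing",
           "container_exit_packing")

def aggregate(trace, pattern=PATTERN, activity="pack"):
    """Replace every occurrence of `pattern` in the trace by one activity."""
    out, i = [], 0
    while i < len(trace):
        if tuple(trace[i:i + len(pattern)]) == pattern:
            out.append(activity)       # the whole pattern becomes one action
            i += len(pattern)
        else:
            out.append(trace[i])
            i += 1
    return tuple(out)

raw = ("register", "product_enter_packing", "container_enter_packing",
       "container_exit_packing", "ship")
print(aggregate(raw))   # ('register', 'pack', 'ship')
```

A real pre-processing stage would of course match patterns with timestamps and per-object correlation rather than exact adjacency; the sketch only shows the shape of the abstraction step.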
Ontology) ontologies, and should be used by domain experts to encode all the static aspects of the monitored environment; its assertional component is used by the rest of the system's components as an information storage and retrieval medium.
The other three components of the system deal with each of the 3 sub-problems – process instance identification, activity recognition and process model discovery, the latter one using the results of the first ones. It should be noted that the instance identification and activity recognition steps can be performed in either order, depending on the domain. All these components extract information using domain-dependent SPARQL queries.
The activity recognition component generates a planning problem file (in PDDL format) using the event data fetched from the system's KB. A set of simple syntactic processing rules is required to generate these files. The resulting plan, after a post-processing phase that removes redundant actions, represents the sequence of activities that explain the observed behavior in the monitored system. As in the case of the process instance identification phase, the result of this phase is used to create a set of "Activity" OWL individuals in the system's KB. Another important result from this step is a set of independence relations between the plan items (activities) that will be exploited in the process discovery phase.
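The post-processing step that removes redundant actions from the generated plan can be sketched as follows. This is a minimal illustration with a hypothetical action/effect model, not the paper's actual PDDL machinery: an action is treated as redundant when none of its effects supports a collected observation.

```python
# Sketch: prune from a generated plan the actions whose effects do not
# explain any of the collected observations. Action names are invented.

# Each action maps to the set of observable effects it produces.
EFFECTS = {
    "move_to_gate": set(),             # internal action, nothing observable
    "scan_product": {"rfid_product"},
    "scan_container": {"rfid_container"},
}

def prune_plan(plan, observations):
    """Keep only the actions whose effects explain at least one observation."""
    needed = set(observations)
    return [a for a in plan if EFFECTS.get(a, set()) & needed]

plan = ["move_to_gate", "scan_product", "move_to_gate", "scan_container"]
print(prune_plan(plan, ["rfid_product", "rfid_container"]))
# ['scan_product', 'scan_container']
```

In the actual system this filtering would operate on the planner's output trace rather than a static effect table, but the intent is the same: only actions contributing to the explanation of the observed events survive into the "Activity" individuals.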
expert-provided knowledge, the ontology has been populated with entities pertaining to the static aspects of the modeled environment.
During the next phase, the sequence of semantically lifted observations is transformed into high-level actions using the planning-based action recognition method. In the case of this supply chain, due to the presence of packing and unpacking actions, no correlation rule can be created in order to correctly partition the events into process instances. Consequently, the instance identification phase is executed based on the results obtained during the action recognition step. The observations are added to an existing (expert-provided) set of action definitions as special "constraint" actions that force the planner to generate a solution (a sequence of action instances) capable of explaining the collected events. After registering the resulting action instances in the system's work ontology, the instance identification phase can be performed. Using an implementation of the algorithm mentioned in the previous section, a set of candidate correlation rules has been identified. A human operator is required to specify the parameters of the heuristic rules guiding the candidate generation process, to purge irrelevant atomic and conjunctive rules and, finally, to select the correlation rule that best fits the process logic.
The application of the user-selected event correlation rule leads to the creation of new OWL individuals corresponding to the identified process instances, which can be used to build the "event log" data structure required for process discovery. For this relatively small case, the POD tool was used, which, besides the event log, can also accept a set of independence relationships between detected actions. These relations can be extracted from the plans generated during the action recognition phase. The result of running the POD tool on the simulated data is the Petri Net (process model) depicted in Fig. 5.

Figure 5. Process model

CONCLUSIONS
A solution to the problem of applying process mining techniques to events collected using sensor networks has been discussed, focusing on three main components: the event representation, instance identification and activity recognition components.
The need to use the event data acquired from sensors in a conventional format led to the development of a system knowledge base, allowing domain-independent implementations.
Although the proposed solution offers a domain-independent representation, it still needs expert knowledge for the correlation rules and the domain definition.

REFERENCES
[1] Van Der Aalst, W., Adriansyah, A., De Medeiros, A. K. A., Arcieri, F., Baier, T., Blickle, T., ... & Burattin, A. (2011, August). Process mining manifesto. In International Conference on Business Process Management (pp. 169-194). Springer Berlin Heidelberg.
[2] Rebuge, Á., & Ferreira, D. R. (2012). Business process analysis in healthcare environments: A methodology based on process mining. Information Systems, 37(2), 99-116.
[3] Mans, R., Reijers, H., Berends, H., Bandara, W., & Rogier, P. (2013). Business process mining success. In Proceedings of the 21st European Conference on Information Systems. AIS Electronic Library (AISeL).
[4] Rozinat, A., de Jong, I. S., Gunther, C. W., & van der Aalst, W. M. (2009). Process mining applied to the test process of wafer scanners in ASML. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 39(4), 474-479.
[5] Van Der Aalst, W. (2012). Process mining: Overview and opportunities. ACM Transactions on Management Information Systems (TMIS), 3(2), 7.
[6] Van der Aalst, W. M. (2009). Process-aware information systems: Lessons to be learned from process mining. In Transactions on Petri Nets and Other Models of Concurrency II (pp. 1-26). Springer Berlin Heidelberg.
[7] De Medeiros, A. K. A., Guzzo, A., Greco, G., Van Der Aalst, W. M., Weijters, A. J. M. M., Van Dongen, B. F., & Saccà, D. (2007, September). Process mining based on clustering: A quest for precision. In International Conference on Business Process Management (pp. 17-29). Springer Berlin Heidelberg.
[8] Bose, R. J. C., & van der Aalst, W. M. (2009, September). Abstractions in process mining: A taxonomy of patterns. In International Conference on Business Process Management (pp. 159-175). Springer Berlin Heidelberg.
[9] Günther, C. W., & Van Der Aalst, W. M. (2007, September). Fuzzy mining – adaptive process simplification based on multi-perspective metrics. In International Conference on Business Process Management (pp. 328-343). Springer Berlin Heidelberg.
[10] Chen, L., & Khalil, I. (2011). Activity recognition: Approaches, practices and trends. Activity Recognition in Pervasive Intelligent Environments, 1-31.
[11] Aggarwal, J. K., & Ryoo, M. S. (2011). Human activity analysis: A review. ACM Computing Surveys (CSUR), 43(3), 16.
[12] Turaga, P., Chellappa, R., Subrahmanian, V. S., & Udrea, O. (2008). Machine recognition of human activities: A survey. IEEE Transactions on Circuits and Systems for Video Technology, 18(11), 1473-1488.
[13] Ye, J., Dobson, S., & McKeever, S. (2012). Situation identification techniques in pervasive computing: A review. Pervasive and Mobile Computing, 8(1), 36-66.
[14] Krüger, V., Kragic, D., Ude, A., & Geib, C. (2007). The meaning of action: a review on action recognition and mapping. Advanced Robotics, 21(13), 1473-1501.
[15] Ramírez, M., & Geffner, H. (2010). Probabilistic plan recognition using off-the-shelf classical planners. In Proceedings of the Conference of the Association for the Advancement of Artificial Intelligence (AAAI 2010) (pp. 1121-1126).
[16] Baier, C., Katoen, J. P., & Larsen, K. G. (2008). Principles of Model Checking. MIT Press.
[17] Rozsnyai, S., Slominski, A., & Lakshmanan, G. T. (2011, July). Discovering event correlation rules for semi-structured business processes. In Proceedings of the 5th ACM International Conference on Distributed Event-Based Systems (pp. 75-86). ACM.
Abstract— The continuous need for collaboration among a large number of systems has created a requirement to achieve interoperability of the systems involved in the collaborations. The subject of this research work is to find a solution that ensures the interoperability of those systems whose business is rigid and where there is no possibility of introducing standards. The existing solutions dealing with similar problems are described. However, those solutions are very limited when flexibility and easy expandability are needed. This paper proposes an approach based on a domain-specific knowledge base that contains the referent rules and models for each system which participates in the collaboration.

I. INTRODUCTION
There is a need to find a flexible solution to achieve interoperability in systems whose business processes are rigid and in which there is no possibility for the introduction of standards, due to the business policy and rules on which the business is based. In order to facilitate the achievement of interoperability among a large number of systems, we work on improving the existing solutions for interoperability of information systems. The goal is an adaptive solution for achieving interoperability of a large number of systems, which can be adapted to the business of each individual system.

II. THE MAIN PROBLEM
The main problem of our research is how to achieve interoperability in the case of a large number of systems that need to communicate with one system acting as a broadcaster or service provider. The most common case of this communication occurs when, due to changes of the regulations in a country, all companies of some type need to start communicating with the system of some institution.
When the business processes and rules of these companies are not rigid, or the business processes are not yet defined, there is a possibility for the introduction of standards in this communication. The two most recently used global standards in business communication are EDI and XML [6][7]. The goal of this project assignment is to achieve the interoperability of the systems, taking into account the fact that the existing information systems of these companies are more or less developed as closed systems with established business processes and adopted standards.
The problem lies in the different data structures that these companies use in their systems and the different sequences of activities in some of the business processes. In addition, there are certain business rules that are specific to only one of the companies. Therefore, we need a flexible solution to achieve the interoperability of these systems.

Figure 1. The customary mode of communication of two software systems (System 1's Service 1 calls Service 2 of System 2)

Fig. 1 represents the customary mode of communication of two software systems, based on the direct communication of their services. This is a widely accepted and good way of communication if one of the systems adapts to the needs of the other system, taking into account the data structures, business processes and rules. If these systems want to use different types of data structures in their communication, the described communication mode is not a good solution.
This typical problem of interoperability is solved in many of the proposed solutions by a mediator layer, which performs the necessary transformations for the systems. The generally accepted solution, the Enterprise Service Bus (ESB) architecture implementation, is based on the mediator layer and is an essential part of the implementations that require such a layer [1]. One solution that introduces a dimension of knowledge, whose usage can achieve interoperability, is the approach suggested by the authors in [2].
The ESB architecture and the introduction of the knowledge dimension are the main components of the approach described in [3]. They define the distributed knowledge quite formally, by using ontologies, which in our case was not an option because, due to the need for faster and easier maintenance of the solution, a domain-specific knowledge base is necessary.
The ESB is a very good solution for the described interoperability problem if we have a finite and not very large number of companies. The solution is to configure an adapter for every pair of systems in communication, as represented in Fig. 2. But if the number of systems increases too much, it becomes very hard to maintain this architecture.

III. THE ADAPTIVE INTEROPERABILITY OF INFORMATION SYSTEMS
This paper proposes an approach that is based on the introduction of a domain-specific knowledge base that contains the referent rules and models for each system that subscribes to the usage of the services of the main system acting as broadcaster or service provider (the system that exposes services). Another domain-specific approach is described in [5]. The main difference is that that approach is based on a domain-specific language, while our approach is based on a domain-specific knowledge base. The proposed architecture is shown below in Fig. 3. As can be seen, some stereotypes are defined to simplify the understanding of this architecture in any domain where it is used.
[Figure (architecture diagram): a system acting as broadcaster or service provider exposes Services 1..n through an ESB; Adapters 1..n connect them to the corresponding services of the systems of Companies 1..n (bindings: SOAP, REST, RNI, JNI)]
The adapter is defined, which performs the transformation of data from one format to another by consulting the knowledge base, where the transformation rules are stored and defined for each user system in relation to the system that exposes services. At some points the adapter layer must also consult the base system that introduces this architecture, with its earlier defined services exposed to external user systems.
It is important to note that the knowledge base and the adapter layer belong to the system that provides the services. The internal database maintenance is also done by this system. Namely, when there is a new system that initiates collaboration with the existing system provider, in their mutual communications the formats of data that the new system wants to use are defined and, next to that, the way to carry out some business functionality or the execution order of activities, depending on the level at which there is a need to achieve interoperability – the level of data, the level of services, the level of processes or the level of the business systems. In the defined format, the rules are stored in the knowledge base as referent rules and models.
The knowledge base has its own services, depending on the type of physical implementation, for example stored procedures. All the knowledge about the rules, the types of structures and the flows is in the data, not in the structure of the tables. So, it is very easy to add new rules or flows, or to change the type of structure some user system wants to use.
Just like in similar proposed solutions based on the mediator layer, the user system is not aware of the existence of the adapter layer. The main difference is that we define only one adapter layer, no matter how many user systems participate in communication with the system provider.

A. The knowledge base
Our solution introduces the domain-specific knowledge base that contains the referent rules and models for each organization (company) that subscribes
TABLE I.
SERVICES DEFINED IN THE PROPOSED ARCHITECTURE IN THE DOMAIN OF INSURANCE

Service code | Service name
CR1          | Execute the algorithm of the rule
BN4          | Get the value of the rule

Figure 6. The proposed architecture in the domain of insurance

TABLE II.
THE REPRESENTATIONS OF BUSINESS RULES ABOUT THE RULES OF FLOWS

TABLE III.
THE REPRESENTATIONS OF BUSINESS RULES ABOUT DATA FORMATS

Company | Name of structure of data | Schema of structure of data | Structure of data type | Structure of data classification | Date valid from | Date valid to | Status
A | input  | NULL | Input  | Dataset | 01/01/2017 | NULL | TRUE
A | output | <?xml version="1.0" encoding="UTF-8" ?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> ... </xs:schema> | Output | XML | 01/01/2017 | NULL | TRUE
B | input  | NULL | Input  | Dataset | 01/01/2017 | NULL | TRUE
B | output | NULL | Output | Dataset | 01/01/2017 | NULL | TRUE
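The adapter's consultation of the knowledge base, as represented in TABLE III, can be sketched in Python. This is an illustrative sketch only: the in-memory dictionary stands in for the knowledge-base services (e.g. stored procedures), and all names (`knowledge_base`, `adapt`) are hypothetical, not part of the paper's implementation.

```python
# Sketch: look up the data-format rule registered for a company and
# transform the payload accordingly (row-like "Dataset" vs. XML).
from xml.etree.ElementTree import Element, SubElement, tostring

# Stand-in for the knowledge base of TABLE III: per-company structure
# classification for the input and output direction.
knowledge_base = {
    ("A", "input"): "Dataset",
    ("A", "output"): "XML",
    ("B", "input"): "Dataset",
    ("B", "output"): "Dataset",
}

def adapt(company, direction, record):
    """Transform a record (dict) into the format registered for the company."""
    target = knowledge_base[(company, direction)]
    if target == "Dataset":
        return record                 # already a row-like structure
    if target == "XML":
        root = Element("record")      # serialize each field as an element
        for field, value in record.items():
            SubElement(root, field).text = str(value)
        return tostring(root, encoding="unicode")
    raise ValueError("no transformation rule for " + target)

print(adapt("A", "output", {"policy": "P-1", "premium": 120}))
# <record><policy>P-1</policy><premium>120</premium></record>
```

Because the rules live in data rather than in code, registering a new company or changing its preferred structure only means updating the knowledge base, which is the maintenance advantage the paper argues for.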
Abstract— Big Data analytics refers to the analysis of large volumes of data of various types and sources for patterns that identify potential risks and opportunities for an organization. This analysis is almost certainly performed by highly skilled data analysis specialists, such as data scientists, in a flexible analytical environment called an analytics sandbox. However, most organizations do not base their business on such flexible environments, but on traditional Enterprise Information Systems (EISs). In order to enable organizations to use the results of Big Data analysis specialists, Big Data analytics should be integrated with existing EISs, used by business users and consumers. This research focuses on the development of a model for Big Data analytics integration with existing EISs. We propose Big Data analytics integration based on the schema on read modeling approach, and three scenarios of integration: 1) integration on the corporate data model level and on the data warehouse level, 2) integration on the corporate data model level and on the report level and 3) integration on the corporate data model level, on the data warehouse level and on the report level. In order to evaluate our approach we have realized a case study in the traffic domain. We carried out custom analysis of road traffic data on the Hadoop platform and integrated it with a local SQL Server database, a custom traffic geo-application and a Business Intelligence tool, according to the proposed integration approach.

1. INTRODUCTION
Databases only contain a small percentage of the data in an organization. The vast majority of organizational data is unstructured. It is important to be able to analyze unstructured data as well as integrate unstructured data with structured data [1]. Big Data analytics holds the promise of creating value through the collection, integration, and analysis of many large, disparate data sets. As the size of these data sets is extremely large, it is unadvisable and infeasible to collect and move data to only one or several centers for analysis. Data-driven analysis needs the contrary analysis direction, which brings the analysis tasks to the data sites. In data-intensive computation problems, the data is the driver, not the analytical humans or machines. Different analyses will employ a variety of data sources, implying the potential need to use the same datasets multiple times in different ways.
Schema on write has been the standard for many years in relational databases. Before any data is written in the database, the structure of that data is strictly defined, and that metadata is stored and tracked. The schema – the columns, rows, tables and relationships – is all defined first for the specific purpose that the database will serve. Then the data is filled into its pre-defined positions. The data must all be cleansed, transformed and made to fit in that structure before it can be stored [3].
Schema on read is the revolutionary concept that we do not have to know what we are going to do with our data before we store it. Data of many types, sizes, shapes and structures can all be thrown into the data storage systems. When we access the data, when we query it, then we determine the structure we want to use [4]. Therefore, this approach can be called "schema on demand".
In the second section of this paper, a Big Data analytics overview is given. In the third section of this paper our Big Data analytics integration approach is presented. In order to evaluate our approach we have realized a case study in the traffic domain. We carried out custom analysis of road traffic data on a Big Data platform and integrated it with the local SQL Server database, a Business Intelligence (BI) tool and a traffic geo-application, according to the proposed integration approach.
Why a Big Data approach to traffic analytics? Companies focused on logistics management and transportation have historically used data warehouses and business intelligence tools to report on and analyze customer behavior, optimize operations, and build advanced routing solutions [5]. As logistics management and transportation networks become larger, more complex and driven by demands for more exacting service levels, the type of data that is managed also becomes more complex. Today, these data sources can include: traditional enterprise data from operational systems, traffic & weather data from sensors and forecast systems, vehicle diagnostics, driving patterns and location information, financial business forecasts, advertising response data, web site browsing pattern data, and social media data [6]. By understanding signals coming from machines and sensors, server logs and other new data sources, transport organizations can predict future events and become more proactive. Sensors and other intelligent devices can capture traffic data, but this creates a large, ongoing flow
reuse has specific ramifications to the environment and of data. Such data require Big Data management systems
implies that the data management architecture must for processing and reporting. As a result of the
support capture and access to these datasets in a consistent complexity, diversity and stochastic nature of
and predictable manner [2]. In this research we propose transportation problems, the analytical toolbox required of
schema on read modeling approach, as a basis for the transportation analyst must be broad [7]. Where Big
consistent and predictable Big Data analytics integration Data differs from other technologies is in the
in traditional EISs (Enterprise Information Systems). sophistication of the analysis it applies. Also, while
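The schema-on-read idea described above (store everything raw, impose structure only at query time) can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's implementation; the record fields and counter names are invented for the example.

```python
import json

# Schema on read: store raw, heterogeneous records as-is, with no
# upfront structure. Some records may not even parse.
raw_store = [
    '{"counter": "NS-01", "time": "2015-03-01T08:15", "vehicles": 42}',
    '{"counter": "NS-02", "time": "2015-03-01T08:15", "speed_kmh": 61.5}',
    'malformed line from a flaky sensor',
]

def read_with_schema(store, required_fields):
    """Impose a schema at query time: keep only records that fit it."""
    rows = []
    for line in store:
        try:
            record = json.loads(line)
        except ValueError:
            continue  # raw data may contain noise; skip what does not parse
        if all(field in record for field in required_fields):
            rows.append(record)
    return rows

# Two different "schemas on demand" defined over the same raw store.
counts = read_with_schema(raw_store, ["counter", "vehicles"])
speeds = read_with_schema(raw_store, ["counter", "speed_kmh"])
print(len(counts), len(speeds))  # each schema selects a different view
```

The same raw store serves both queries; neither schema existed when the data was written, which is the flexibility argument made above.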
142
7th International Conference on Information Society and Technology ICIST 2017
Also, while traditional analysis is often designed around the conditions that allow valid statistical inference about the characteristics of a population based on measurements on a small sample, Big Data-style analysis is built around the possibility of learning about systems by observing them in their entirety [6].

The case study and the results of the implementation of our integration approach will be presented in the fourth section of the paper. Finally, we will present our conclusions about the possibilities and constraints of our integration approach.

2. BIG DATA ANALYTICS OVERVIEW

Large volumes of data of various types and sources are analyzed for patterns that identify potential risks and opportunities for an organization. This analysis is typically performed by highly skilled data analysis specialists, such as data scientists, in a flexible analytical environment called an analytics sandbox. The analysts create process and event models that identify opportunities to be leveraged or risks to be avoided [1]. The models are then integrated with real-time streaming data that, when certain patterns or situations occur, trigger warnings and transactions to respond immediately to the highlighted situation.

A. Analytic models

Analytic models fall into three broad categories: descriptive, predictive and prescriptive [8].

Descriptive models apply systems thinking to produce an understanding that goes deeper than the prima facie characteristics of the data. For example, the characteristics of an investment portfolio may be the subject of a descriptive model that assesses investment risk based on statistical analysis.

Predictive models apply statistics and data mining to make assessments about the likelihood of future behaviors or outcomes. For example, regression models may be used on past performance to produce a model that predicts whether someone will default on a loan.

Prescriptive models actually prescribe a future course of action based on a desired outcome. For example, a prescriptive model can prompt a call center operator to make a specific offer to a customer with a complaint, based on the customer's lifetime value, likelihood to leave, and likelihood to accept the offer.

All of these types of models require Big Data and Big Data analysis techniques to be successful.

B. Big Data analysis techniques

Data analysis techniques include a wide range of disciplines such as data mining, machine learning, artificial neural networks and signal processing, most of which have shown their capabilities in processing Big Data. Data mining, including classification, regression and clustering, is the collection of artificial intelligence techniques that mine hidden knowledge and patterns from given data [9]. Several classification and clustering algorithms have been proposed for large-scale samples, such as linguistic fuzzy rule based classification systems, the clustering large applications algorithm and so on. Because of the complex uncertainties of Big Data, more and more theoretical tools, such as fuzzy reasoning, are adopted for modeling the uncertainties of both raw data and the outputs of algorithms.

Machine learning, including supervised learning and unsupervised learning, is another main field of artificial intelligence. It can derive valuable knowledge automatically once a designed algorithm learns behaviors from empirical data. Traditional machine learning algorithms, such as the support vector machine, have improved their effectiveness based on large-scale parallel frameworks like Map/Reduce. Scalable machine learning algorithms, for example the boosting based sparse approximation algorithm proposed by Sun and Yao [10], have been presented to suit large-scale problems. Multi-classification is another hot spot in machine learning, which has been used in various applications including big biological data [11] and sensor data [12]. Multi-classification on multi-dimensional data, such as stock market prediction [13], becomes one of the challenges in Big Data analysis for prediction using the extended techniques of multi-classification on a single data stream. Artificial neural networks have been successfully employed for pattern recognition, adaptive control and so on. However, there is a big challenge if they are employed for analyzing Big Data, because of the natural contradiction between the necessity of more hidden layers and nodes for higher performance and the memory and time consumption of a neural network. Two approaches have been adopted to ease this contradiction. The first is to reduce the size of the data by sampling techniques, and the second is to put neural networks in a parallel and distributed setting [14].

Opinion mining (OM), or sentiment analysis, implemented by specific machine learning techniques and lexicon-based techniques, plays an important role in processing Big Data that includes texts [15]. This ongoing field has been discussed by a large number of contributions because it enables users to understand sentiment polarities from huge volumes of texts. The granularity of OM varies from document level, sentence level and word level to concept level and clause level. To fit the huge volume of Big Data and scale up OM, these algorithms can be implemented on Big Data platforms like MapReduce and Storm [16].

As a kind of representation learning, deep learning techniques compose simple but non-linear modules into a representation at a higher and more abstract level. These techniques can start with the raw inputs, in contrast with traditional machine learning algorithms [17]. To meet the requirements of analyzing Big Data, specific deep learning techniques have been developed by many researchers [18, 19].

Accordingly, a contemporary analytic environment includes: descriptive analytics, to provide insight into the past and answer: "What has happened?"; predictive analytics, to understand the future and answer: "What could happen?"; and prescriptive analytics, to advise on possible outcomes and answer: "What should we do?".

C. Big Data analytics use cases

Big Data analytics use cases can be categorized in terms of their users in three major categories: data discovery for data scientists and analysts, business reporting for business users and real time intelligence for consumers (Fig. 1).
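The Map/Reduce pattern mentioned above can be sketched in a single process to show its three phases. This is an illustrative sketch only; a real Hadoop job distributes these phases across a cluster, and the vehicle-class records here are invented.

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit (key, 1) pairs, here counting vehicle classes."""
    for record in records:
        yield record["vehicle_class"], 1

def shuffle(pairs):
    """Shuffle: group intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's list of values."""
    return {key: sum(values) for key, values in groups.items()}

records = [
    {"vehicle_class": "car"}, {"vehicle_class": "truck"},
    {"vehicle_class": "car"}, {"vehicle_class": "bus"},
]
totals = reduce_phase(shuffle(map_phase(records)))
print(totals)  # {'car': 2, 'truck': 1, 'bus': 1}
```

Because each phase operates on independent keys, the same logic parallelizes naturally, which is why frameworks like Map/Reduce scale the algorithms discussed above.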
Figure 1. Big Data analytics use cases

We can identify key Big Data business use cases that are delivering positive business outcomes: customer analytics and loyalty marketing, sentiment analysis, multi-channel marketing, customer micro-segmentation, ad fraud detection, capacity and pricing optimization and predictive maintenance analytics [20]. Big Data and analytics can help companies create a comprehensive 360-degree view of the customer, dramatically improving customer interaction at every touch point and across the end-to-end customer experience.

More complete "persona-level" customer profiles can help organizations discover new ways to interact with individual customers, as well as improve service delivery and marketing strategies. Deep analysis of customer buying behavior patterns across all channels allows organizations to move beyond marketing to the masses and toward more segmented, targeted marketing tactics. Analytics insights help develop more relevant marketing strategies and self-service capabilities to meet individual customers' needs. The ability to analyze more historical information with higher frequency – even in near-real time – allows for more dynamic and smarter pricing actions, optimized capacity planning and effective yield management.

Predictive maintenance analytics solutions can capture equipment sensor data in real time and integrate it with data from visual inspections, manual measurements, performance videos, operational data, etc. Big Data and analytics solutions can help companies utilize this varied information to improve yield on assets, improve service levels and reduce the risk of unplanned service delays and outages, lower the cost of parts inventory and implement smarter maintenance planning [21].

3. OUR INTEGRATION APPROACH

In this research, we are concerned with solving the following issues: storing of raw data in Big Data storages, analysis of data using Big Data techniques and tools, and integration of the results of Big Data analysis with existing Data Warehouses (DWs), BI and custom applications. We bring the analysis tasks to the data sites, which is why our integration approach is data-driven. We store what we get from the source systems, as it comes in from those systems, and define the schema at the time of data use. Therefore, from the modeling perspective, our approach is schema on read (Fig. 2).

Schema on read is simple at first glance: you just write the information to the data store. Unlike schema on write, which requires you to expend time before loading the data, schema on read involves very little delay and you generally store the data at a raw level [22]. Schema on read means you can write your data first and then figure out how you want to organize it later (Fig. 2). So why do it that way? As we can see in Table I, the key drivers are flexibility, reuse of raw/atomic data and querying multiple data stores and types at once. Additionally, it possesses greater flexibility when the structured data is heterogeneously stored [23]. In the data management layer, the data is enforced to be integrated and valid.

There are three reasons why we have chosen the schema on read approach to Big Data analytics modeling:
1. our Big Data analytics modeling is modeling for non-relational storage,
2. this approach is extremely flexible from an integration perspective,
3. it allows reuse of raw/atomic data and thus data sharing.

TABLE I.
SCHEMA ON READ VS. SCHEMA ON WRITE

Schema on write
  Advantages: You know exactly where your data is. The structure is optimized for the specific purpose. Simple SQL queries. Fast answers. The data's quality is checked. The data is checked against business rules. Answers we get from querying this data are precise and trustworthy.
  Disadvantages: The data has been altered and structured specifically to serve a specific purpose. Query limitations. ETL (Extract, Transform and Load) processes and validation rules take time to build, time to execute, and time to alter if you need to change it to suit a different purpose.

Schema on read
  Advantages: Query capabilities are very flexible. Different types of data generated by different sources can be stored in the same place. This allows you to query multiple data stores and types at once.
  Disadvantages: Since the data is not subjected to ETL and data cleansing processes, nor does it pass through any validation, that data may be riddled with missing or invalid data, duplicates... The SQL queries tend to be very complex. They take time to write, and time to execute.
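The contrast in Table I can be made concrete with a small Python sketch: SQLite stands in for a schema-on-write store, and a list of raw JSON lines for a schema-on-read store. The table, records and field names are invented for illustration.

```python
import sqlite3
import json

# Schema on write: the structure is fixed before any data is loaded,
# and a record that does not fit the declared columns is rejected.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE counts (counter TEXT, hour INTEGER, vehicles INTEGER)")
db.execute("INSERT INTO counts VALUES ('NS-01', 8, 420)")
rejected = False
try:
    db.execute("INSERT INTO counts VALUES ('NS-02', 9)")  # too few values
except sqlite3.OperationalError:
    rejected = True

# Schema on read: everything is stored; structure is applied per query.
raw = ['{"counter": "NS-01", "hour": 8, "vehicles": 420}',
       '{"counter": "NS-02", "hour": 9}']            # incomplete record kept
parsed = [json.loads(line) for line in raw]
with_volume = [r for r in parsed if "vehicles" in r]  # schema chosen now
print(rejected, len(parsed), len(with_volume))
```

The write-time rejection illustrates the left column of Table I (validated but rigid), while the raw store keeps the incomplete record and leaves the filtering decision to each query (flexible but unvalidated).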
We propose Big Data analytics integration in EISs based on the schema on read modeling approach, on three levels: the corporate data model level, the data warehouse level and the report level (Fig. 3). Possible integration scenarios are:
1. integration on the corporate data model level and on the data warehouse level;
2. integration on the corporate data model level and on the report level;
3. integration on the corporate data model level, on the data warehouse level and on the report level.

All three integration scenarios start with the design of schemas for data analysis at the time of reading the raw source Big Data. Schemas on read are designed in such a way that they can be integrated into existing relational corporate data models and/or existing business reports, taking into account the structure of the source data files. The first scenario includes ETL (Extract, Transform and Load) operations over the query results, in order to normalize and clean the data. This scenario represents an integration on the data model level and on the data level. The second scenario does not include ETL operations, but rather the distribution of the results of Big Data analysis, which are ready for presentation in reports for end users. This scenario represents an integration on the data model level and on the application level. The third scenario includes ETL operations and an integration on the data model level, the data level, and the application level.

From the Big Data analytics use cases point of view, the first and the third integration scenarios are appropriate for business users and business reporting, as well as for data analysts and data discovery. The second scenario is appropriate only for business users and business reporting. None of the proposed integration scenarios is appropriate for consumers and decision making in real time, because they are batch processing oriented.

4. CASE STUDY

Our approach to Big Data analytics integration in EIS was applied through a case study of the analysis of traffic data. Excellent tailor-made traffic data is the best basis for excellent transportation models [24]. We want to provide the traffic engineers and authorities with pre-attributed maps tailored to their specific needs. For the analysis of traffic flow, traffic engineers calculate the indicators on an annual basis. For example, Annual Average Daily Traffic (AADT), along with its main characteristics, composition and time distribution (minute, hourly, daily, monthly, yearly), is the basic and key input to the traffic-technical dimensioning of road infrastructure and road facilities. This parameter is used in: capacity analysis, level of service analysis, cost benefit analysis, safety analysis, environmental impact assessment analysis of noise emission and air pollution, analyses of pavement construction, static calculation of road infrastructure objects, traffic forecasting, and others.

To count the traffic at the specified locations on the state roads in the Republic of Serbia, 391 inductive loop detectors were used [25]. These detectors are QLTC-10C automatic traffic counters. The case study included analysis of traffic data for ten locations on the state roads and streets in the town of Novi Sad, Serbia, which the traffic counters generated during 2015. The solution was implemented through the following phases:
1. Each counter generated 365 text files in 2015. Each file contained about 10,000 records on average, so that the collected data amounted to 10 × 365 × 10,000 = 36,500,000 records.
Lorraine, France
*** Program of Electrical Engineering and Industrial Computing Science (CPGEI), Federal University of Technology
Abstract—In the fourth industrial revolution, the current Enterprise Information Systems (EIS) are facing a set of new challenges raised by the applications of Cyber-Physical Systems (CPS) and the Internet of Things (IoT). In this scenario, a data-intensive EIS involves networks of physical objects with sensing, data collection, transmission and actuation capabilities, and vast endpoints in the cloud, thereby offering large amounts of data. Such systems can be considered as multidisciplinary complex systems with strong inter-relations between the involved components. In order to cope with the great heterogeneity of those physical objects and their intrinsic information, the authors propose a notification-based approach derived from the so-called Notification Oriented Paradigm (NOP), a new rule and event driven approach for software and hardware specification and execution. However, the heterogeneity of that information and its interpretation relative to an evolving context impose the definition of model-driven patterns based on formal knowledge modelled by a set of skill-based ontologies. Thus, the paper focuses on the open issue of the formalisation of such ontology-based patterns for their verification, ensuring the coherence of the whole set of data in each contextual engineering domain involved in the EIS.

I. INTRODUCTION

Nowadays, governments, organizations, and industries worldwide have noticed the trend of "the fourth industrial revolution" and intend to catch up with this new wave of transformation. Many high-level plans and projects are incessantly carried out, such as "Industrie 4.0" in Germany [1], "La Nouvelle France Industrielle" in France [2], the "Future of Manufacturing" in the United Kingdom [3], the "Factories of the Future (FoF)" in the European Union [4], the "Industrial Internet Consortium (IIC)" [5] and the "Advanced Manufacturing Partnership (AMP)" [6] in the United States, "Innovation in Manufacturing 3.0" in South Korea [7], the "Super Smart Society" in Japan [8], the "Research, Innovation and Enterprise (RIE) 2020 Plan" in Singapore [9], and "Made in China 2025" in China [10]. Because of these governmental driving forces, more and more research centres, companies, and universities have taken part and contributed to this new challenge through either laboratory experiments (such as the Smart Factory OWL 1) or industrial applications (such as the Digital Enterprise Software Suite 2 by Siemens). Although there are some differences in focus among the above-mentioned plans and projects, this new industrial revolution is commonly recognized as the technical integration of CPS [11] into manufacturing and logistics and the use of IoT [12] in industrial processes [1].

CPS is the term that describes a broad range of network connected, multi-disciplinary, physically-aware engineered systems that integrate embedded computing (cyber-) technologies into the physical world (adapted from [13]). Inside this kind of network, each smart component (a sub-system of the CPS) has sensing, data collection, transmission and actuation capabilities, and vast endpoints in the cloud, offering large amounts of heterogeneous data. Although there already exist some cloud-based IoT platforms (such as IBM Bluemix 3, Google Cloud Platform 4, and Microsoft Azure 5) that provide functions like connectivity, storage, analytics and visualization in a secure way and at scale, the critical issue of how the legacy EIS should transition to adopt this change still remains open.

In particular, the strong inter-relations among the CPS along a product's life cycle dramatically increase the complexity of managing their local coherency and their global interoperability [14]. There is a need to develop methods, or even paradigms, for formalizing those relationships and ensuring the coherence of CPS data exchange, based on the knowledge coming from the stakeholders employing the data produced jointly by sub-sets of those smart components. In this case, one of the possible solutions is to empower the current EIS with a new interoperability infrastructure.

This paper proposes a notification-based approach derived from the so-called Notification Oriented Paradigm (NOP) [15], a new rule and event driven approach for software and hardware specification and execution.

1 Smart Factory OWL: http://www.smartfactory-owl.de/index.php/en/
2 Digital Enterprise Software Suite: https://www.industry.siemens.com/topics/global/en/digital-enterprise-suite/pages/default.aspx
3 IBM Bluemix: https://console.ng.bluemix.net/
4 Google Cloud Platform: https://cloud.google.com/
5 Microsoft Azure: https://azure.microsoft.com/
To collect a comprehensive set of papers that contribute to the next generation EIS through CPS and IoT, two search strings were created. The first search string was constructed by combining the operator "or" between "Cyber Physical Systems", "CPS", "Internet of Things", and "IoT". The second search string was constructed by combining the operator "or" between "Enterprise Information Systems" and "EIS". Finally, these two search strings were joined together through the operator "and" for the systematic search in the largest abstract and citation database, Scopus.

Meanwhile, all the collected papers should also satisfy the following three conditions:
• They contain at least one term in each search string (both its singular form and plural form) in the paper titles, abstracts, or keywords;
• They were published in conference proceedings, book series, or journals;
• They were written in the English language.

The collected papers were then classified according to the following criteria:
• Not Related (NR): It is not focusing on the research related to CPS, IoT and EIS. NR-1: The definition of CPS, IoT, or EIS is out of our research context (e.g. EIS is used as the abbreviation of Electrochemical Impedance Spectra). NR-2: CPS, IoT, or EIS is only appearing once or twice as a cited expression.
• Closely Related (CR), inclusion: It is explicitly and specifically focusing on the review, survey, proposition or application related to interoperability issues within the IoT, CPS and EIS context.
• Partially Related (PR), inclusion: Its main focus is not about interoperability issues within the IoT, CPS and EIS context; only part of its contents is related.
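The composition of the two search strings can be sketched as follows. This is an illustration of the procedure described above; the TITLE-ABS-KEY field wrapper is an assumption about the Scopus query syntax, not something stated in the paper.

```python
# Terms of the two search strings described in the text.
cps_terms = ["Cyber Physical Systems", "CPS", "Internet of Things", "IoT"]
eis_terms = ["Enterprise Information Systems", "EIS"]

def or_group(terms):
    """Combine quoted terms with the 'or' operator."""
    return "(" + " OR ".join('"%s"' % t for t in terms) + ")"

# Join the two groups with the 'and' operator for the database search.
query = "TITLE-ABS-KEY(%s AND %s)" % (or_group(cps_terms), or_group(eis_terms))
print(query)
```

A paper matches only if it contains at least one term from each group, which is exactly the first inclusion condition listed above.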
Figure 1. The UML activity diagram of the systematic literature review phases based on the PRISMA flow chart.
discovered or tried to address; (2) for Q2, they are the research problems related to interoperability and the proposed solutions in each CR paper.

C. Qualitative and Quantitative Analysis

As can be seen from Fig. 1, the different phases of this systematic literature review are illustrated through the Preferred Reporting Items for Systematic review and Meta-Analysis (PRISMA) Statement flow chart [19]. In total, 20 papers out of the 57 initial ones were included.

1) New Requirements for Future EIS

During this phase, the qualitative and quantitative analysis of the collected new EIS requirements for Q1 was performed.

Firstly, seven main categories of requirements for the future EIS were summarized from [20], [21]:
• Omnipresence: The implementation or existence of EIS on a wide range of computing and other platforms (e.g. tablets, smart phones, smart watches) through cloud technology.
• Openness: The technological and legal accessibility of software entities, artefacts and distributed collaborative processes based on knowledge sharing between peers.
• Cyber Security and Privacy: The ability to ensure and protect the data that they exchange, store and use against misuse and unauthorized access.
• Self and Environmental Awareness: The ability to gather and organize phenomena or events within itself or from its environment in real time (e.g. through sensors of other devices).
• Semantic Awareness: The ability to assign a meaning to the observation (e.g. sensor data) and align it with higher level semantic constructs (e.g. through ontologies).
• Dynamic Re-configurability: The ability to function in a different manner, to perform the same function, or to accomplish a different set of functions, using the same or an improved set of resources.
• Computational Flexibility: The ability to perform fuzzy reasoning depending on the incomplete data, information, and knowledge available for various contexts.

Then, based on these categories, the collected new EIS requirements that each included paper discovered or tried to address were qualitatively classified into their corresponding categories. At the end, as can be seen from Fig. 2, the numbers and percentages of papers involved in each requirement category are quantitatively listed.

Figure 2. The number and percentage of papers involved in each new requirement category for the future EIS.

It can be found that 70.0% of the included papers (14 papers) were mentioning or addressing requirements within the Self and Environmental Awareness category. In this case, the future EIS should, first of all, be able to accept, process and perform various functions based on large amounts of heterogeneous lower level sensory data from distributed or co-located devices [20]. The heterogeneity of those data and their interpretation relative to an evolving context, namely the interoperability issues related to the networked smart CPS components and the future EIS, is becoming the primary subject to deal with.

2) Current Research Efforts about Interoperability for Future EIS

Nowadays, the envisaged paradigms, such as CPS and IoT, are gradually becoming a reality. More and more identifiable resources with communication and computing capability, like sensors, mobile devices and RFID tags, are introduced and integrated into the manufacturing field. On one hand, they will significantly contribute to different phases along a product's life cycle. On the other hand, they are also dramatically increasing the complexity for the traditional EIS to manage their local coherency and their global interoperability, which turns out to be incomplete and insufficient in attempting to address this issue. To empower those legacy EIS with a new interoperability infrastructure is one of the possible solutions.

Among all the included papers, 40.0% of them (8 papers) are explicitly and specifically focusing on addressing the interoperability issues within the IoT, CPS, and EIS context. More specifically, their main research problems and the corresponding proposed solutions collected for Q2 are illustrated in Table II.

TABLE II.
RESEARCH PROBLEMS AND PROPOSED SOLUTIONS ABOUT INTEROPERABILITY FOR FUTURE EIS

[23] Research problem: The application of RFID to real-life manufacturing enterprises. Proposed solution: An agent-based smart objects management system.
[22] Research problem: The role of information architecture to address evolving supply chain needs. Proposed solution: A review of the information architecture for supply chain quality management.
[24] Research problem: The lack of a standardized approach to support IoT solutions in enterprises. Proposed solution: A context-awareness and product lifecycle information management approach.
[21] Research problem: Legacy considerations of interoperability fail to meet the future IoT challenges. Proposed solution: Interoperability as a Property (IaaP) of every ubiquitous system.
[25] Research problem: Dynamic systems, evolved information models, and "liquid" semantic domains. Proposed solution: A mediator system to enable self-sustainable interoperability.
[26] Research problem: Few approaches efficiently combine modelling and simulation with real execution in the workflow. Proposed solution: Use web services and workflow for interoperability between simulation and real-world application.
[20] Research problem: The necessity to identify new requirements that address both theoretical and practical aspects of the EIS. Proposed solution: New perspectives for the future interoperable enterprise systems.
[14] Research problem: The extended view of EIS in the IoT introduces additional complexity to the interoperability problems. Proposed solution: The formal definition of the systems' interoperability capability.

It can be found that certain research efforts have been made, such as reviewing some existing
concepts and technologies [22], addressing some domain specific issues [23]–[26], and proposing new properties, perspectives, and capabilities of interoperability [14], [20], [21]. However, there is still a lack of unifying theories, systematic design methods, techniques or tools to formally model the relationships and ensure the coherence of data exchanges among the heterogeneous CPS of such data-intensive EIS.

III. PROPOSITION

In the IoT era, on one hand, engineers and engineering organizations no longer have to be restricted by the availability of advanced processing capabilities, but can adopt a 'pay as you go' approach which enables them to access and use software resources for engineering activities from any remote location in the world. On the other hand, such a network of systems is also producing and consuming a large amount of data coming from or going to all other systems and their environment. A data-intensive EIS, in such a case, is to facilitate the acquisition and processing of those data coming from the whole set of Cyber Physical Production Systems (CPPS). Therefore, to model those complex interrelationships and to ensure more coherent data exchange among CPPS, this work proposes a modelling formalism extracted from the NOP specification [27], in which each data transfer is considered as a kind of notification.

A. Ontology-based NOP Modelling Primitives

The essence of NOP is based upon the concept of small, smart, and decoupled pieces of entities that collaborate by means of notifications [15], [27], which potentially makes it easier to compose distributed systems. As can be seen from Fig. 3, the NOP modelling primitives essentially include the following eight main elements:
• Fact Base Element (FBE): It is a NOP factual element, which represents a system element identified in the CPPS, such as an ID card reader, a biometric (BIO) reader, a gate, or a recording system. It has quite active entities called Attributes and quite reactive entities called Methods.
• Attribute: It is a part of an FBE, which keeps one of its variables and can notify (by means of an event) this state to other entities. For example, it can be either the employee ID collected from the ID card reader, or the employee BIO data identified by the BIO reader.
• Method: It is a part of an FBE, which deals with one of its services and can be instigated by other entities, such as the open gate service inside the gate FBE and the entry record service inside the recording system FBE.
• Rule: It is a NOP logical unit, which represents a set of explicit principles (control rules) within the CPPS. It executes the pre-defined Actions (e.g. to open the gate and record the number of entries) when all its Conditions (e.g. both the employee ID and the employee BIO data are confirmed) are satisfied.
• Premise: It is an associated entity of one or more Conditions. It contains either one or two Attributes for the evaluation of their states. For example, it can be either "the employee ID is confirmed" (Premise 1) or "the employee BIO data is confirmed" (Premise 2).
• Condition: It is the decisional part of one Rule, which can contain one or more Premises. For example, it can
be the combination of Premise 1 and Premise 2 through the operator AND.

• Instigation: an entity associated with one or more Actions and linked to one or more Methods. For example, it can instigate the open-gate service (Instigation 1) or the entry-record service (Instigation 2).

• Action: the execution part of a Rule, which can contain one or more Methods. It describes the sequential or parallel relation between Instigations. For example, Instigation 1 and Instigation 2 can be carried out at the same time.

Moreover, in order to give future EIS not only the Self and Environmental Awareness capability but also the Semantic Awareness capability, the NOP modelling primitives were also represented as an ontology in the Protégé OWL editor; Fig. 4 illustrates the resulting class hierarchy. Based on these ontology-based model-driven patterns, the relationships among different system elements and the entire data exchange scenario of a CPPS can be formally described.

B. Notification-Oriented Data-Intensive EIS

In the NOP specification, a notification is an explicit advice from one NOP factual element to another, indicating that a change of a value or a state has occurred in the system. In this paper, a notification is considered the representation of a data transfer from one system (or system element) to another identified in the CPPS. Fig. 5 shows a notification-oriented data-intensive EIS, in which the data exchanges (1) between cyber- and physical-world entities, (2) within the Cloud-based IoT Platform, and (3) between the EIS and the CPPS will be formally described based on the NOP modelling primitives established in the previous section.

More specifically, for each data transfer in this data exchange scenario, a general view of the adopted modelling procedure, extended from [28], is as follows:

• Analyse the data transfer, aiming at:
1) identifying the source and target System Elements of the data transfer;
2) identifying the Function of the data transfer;
3) identifying the Variables of the source System Elements that are used during the data transfer;
4) identifying the Procedures in the target System Elements that are instigated during the data transfer;
5) identifying the Pre-conditions of the Function that relate to the identified Variables;
6) identifying the Post-conditions of the Function that relate to the identified Procedures.

• Create an FBE for every System Element (both source and target) identified in step 1. It is drawn as a dotted-line rectangle; four examples are given on both sides of Fig. 6.

• Create a Rule for every Function identified in step 1. It is drawn as a solid-line rectangle; an example is illustrated in the middle of Fig. 6.

• Create an Attribute for every Variable identified in step 1. It is drawn as a circle connected by an outgoing arrow of an FBE; an example is given on the left-hand side of Fig. 6.

Figure 4. The class hierarchy of the NOP modelling primitives.
Figure 6. The NOP modelling example of a data transfer.
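As an illustration only (not part of the paper, which defines NOP at the modelling level), the primitives above and their notification chain can be sketched in a few lines of Python, mirroring the paper's gate-access example; all class, function, and variable names here are assumptions chosen for the sketch:

```python
# Illustrative sketch of the NOP primitives: Attributes notify Premises,
# Premises notify Conditions, and a Rule fires its Actions (Instigations
# of Methods) once its Condition is satisfied. Names are assumptions.

class Attribute:
    """Part of an FBE; holds one variable and notifies Premises on change."""
    def __init__(self, name, value=None):
        self.name, self.value, self.premises = name, value, []

    def set(self, value):
        self.value = value
        for premise in self.premises:   # notification, not polling
            premise.notify()

class Premise:
    """Evaluates one Attribute's state and notifies its Conditions."""
    def __init__(self, attribute, expected):
        self.attribute, self.expected, self.conditions = attribute, expected, []
        attribute.premises.append(self)

    @property
    def satisfied(self):
        return self.attribute.value == self.expected

    def notify(self):
        for condition in self.conditions:
            condition.notify()

class Rule:
    """Executes its Methods (parallel Instigations) when its Condition holds."""
    def __init__(self, methods):
        self.methods = methods

    def execute(self):
        for method in self.methods:
            method()

class Condition:
    """Decisional part of a Rule: logical AND over its Premises."""
    def __init__(self, premises, rule):
        self.premises, self.rule = premises, rule
        for p in premises:
            p.conditions.append(self)

    def notify(self):
        if all(p.satisfied for p in self.premises):
            self.rule.execute()

# Methods (services) of the gate FBE and the recording-system FBE.
log = []
def open_gate():
    log.append("gate opened")
def record_entry():
    log.append("entry recorded")

employee_id = Attribute("employee_id_confirmed", False)
employee_bio = Attribute("employee_bio_confirmed", False)

rule = Rule([open_gate, record_entry])
Condition([Premise(employee_id, True), Premise(employee_bio, True)], rule)

employee_id.set(True)    # Premise 1 satisfied; Condition not yet
employee_bio.set(True)   # Premise 2 satisfied -> Rule fires both Instigations
print(log)               # -> ['gate opened', 'entry recorded']
```

The point of the sketch is the direction of the data flow: no entity polls another; each state change propagates as a notification, which is what the paper exploits to model data transfers.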
• Create a Method for every Procedure identified in step 1. It is drawn as a right angle connected by an incoming arrow of an FBE; an example is given on the right-hand side of Fig. 6.

• Create a Condition, with its corresponding Premises, for every Pre-condition identified in step 1. It is drawn as hollow arcs next to the relevant Attributes and connected by incoming arrows of a Rule; an example is given on the left-hand side of Fig. 6.

• Create an Action, with its corresponding Instigations, for every Post-condition identified in step 1. It is drawn as an arrow coming out of a Rule and pointing to a Method of an FBE; an example is illustrated on the right-hand side of Fig. 6.

• Finally, merge the FBEs and Rules with analogous FBEs and Rules previously created.

IV. CONCLUSION

The main objective of this work is to identify the challenges that legacy EIS face in the fourth industrial revolution, and to propose some potential solutions. Through a systematic literature review, seven main categories of requirements for future EIS were summarized. Based on the qualitative and quantitative analysis, the interoperability issues between networked smart CPS components and future EIS turn out to be the primary subject to deal with. Therefore, this paper mainly focuses on empowering current EIS with a new interoperability infrastructure that can formally model the relations and ensure the coherence of data exchanges among heterogeneous CPS. To achieve this goal, one solution is to employ a modelling formalism extracted from the NOP specification, namely the ontology-based model-driven patterns for notification-oriented data-intensive EIS. The next stage of this research will focus on collecting experimental data from a real smart-factory laboratory and on applying the proposed modelling patterns to the large amounts of data involved in a data-intensive EIS.

ACKNOWLEDGMENT

The authors would like to thank the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), the Pontifícia Universidade Católica do Paraná (PUCPR), and the Universidade Tecnológica Federal do Paraná (UTFPR) in Brazil for their financial support.

REFERENCES

[1] H. Kagermann, W. Wahlster, and J. Helbig, "Recommendations for implementing the strategic initiative Industrie 4.0," Final report of the Industrie 4.0 working group, Berlin, Germany, 2013.
[2] Conseil national de l'industrie, "The New Face of Industry in France," Report, Paris, France, 2013.
[3] Foresight, "The Future of Manufacturing: a New Era of Opportunity and Challenge for the UK," Report, London, UK, 2013.
[4] European Commission, "Factories of the Future PPP: towards competitive EU manufacturing." [Online]. Available: http://ec.europa.eu/research/industrial_technologies/factories-of-the-future_en.html. [Accessed: 22-Jun-2016].
[5] P. C. Evans and M. Annunziata, "Industrial Internet: Pushing the Boundaries of Minds and Machines," Report, Boston, US, 2012.
[6] R. Rafael, A. J. Shirley, and A. Liveris, "Report to the President: Accelerating U.S. Advanced Manufacturing," Report, Washington D.C., US, 2014.
[7] H. S. Kang et al., "Smart manufacturing: Past research, present findings, and future directions," Int. J. Precis. Eng. Manuf. - Green Technol., vol. 3, no. 1, pp. 111–128, 2016.
[8] Cabinet Office, "Report on The 5th Science and Technology Basic Plan," Report, Tokyo, Japan, 2015.
[9] National Research Foundation, "Research, Innovation and Enterprise (RIE) 2015 Plan," Report, Singapore, 2016.
[10] K. Li, "Made in China 2025," Report, Beijing, China, 2015.
[11] L. Atzori, A. Iera, and G. Morabito, "The Internet of Things: A survey," Comput. Networks, vol. 54, no. 15, pp. 2787–2805, 2010.
[12] S. K. Khaitan and J. D. McCalley, "Design Techniques and Applications of Cyberphysical Systems: A Survey," IEEE Syst. J., vol. 9, no. 2, pp. 350–365, 2015.
[13] V. Gunes, S. Peter, T. Givargis, and F. Vahid, "A Survey on Concepts, Applications, and Challenges in Cyber-Physical Systems," KSII Trans. Internet Inf. Syst., vol. 8, no. 12, pp. 4242–4268, 2014.
[14] M. Zdravković, F. Luis-Ferreira, R. Jardim-Goncalves, and M. Trajanović, "On the formal definition of the systems' interoperability capability: an anthropomorphic approach," Enterp. Inf. Syst., vol. 11, no. 3, pp. 389–413, 2017.
[15] J. M. Simão, C. A. Tacla, P. C. Stadzisz, and R. F. Banaszewski, "Notification Oriented Paradigm (NOP) and Imperative Paradigm: A Comparative Study," J. Softw. Eng. Appl., vol. 5, no. 6, pp. 402–416, 2012.
[16] A. Bryman, "Integrating quantitative and qualitative research: how is it done?," Qual. Res., vol. 6, no. 1, pp. 97–113, 2006.
[17] A. Nightingale, "A guide to systematic literature reviews," Surg., vol. 27, no. 9, pp. 381–384, 2009.
[18] C. Pickering and J. Byrne, "The benefits of publishing systematic quantitative literature reviews for PhD candidates and other early-career researchers," High. Educ. Res. Dev., vol. 33, no. 3, pp. 534–548, 2014.
[19] D. Moher, A. Liberati, J. Tetzlaff, and D. G. Altman, "Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement," PLoS Med., vol. 6, no. 7, pp. 264–269, 2009.
[20] H. Panetto, M. Zdravkovic, R. Jardim-Goncalves, D. Romero, J. Cecil, and I. Mezgár, "New perspectives for the future interoperable enterprise systems," Comput. Ind., vol. 79, pp. 47–63, 2016.
[21] M. Zdravković, O. Noran, H. Panetto, and M. Trajanovic, "Enabling interoperability as a property of ubiquitous systems for disaster management," Comput. Sci. Inf. Syst., vol. 12, no. 3, pp. 1009–1031, 2015.
[22] L. D. Xu, "Information architecture for supply chain quality management," Int. J. Prod. Res., vol. 49, no. 1, pp. 183–198, 2011.
[23] Y. Zhang, G. Q. Huang, T. Qu, and O. Ho, "Agent-based smart objects management system for real-time wireless manufacturing," in Proceedings of the 6th CIRP International Conference on Digital Enterprise Technology, 2010, vol. 66, pp. 1709–1721.
[24] S. Kubler and K. Främling, "CaPLIM: The Next Generation of Product Lifecycle Information Management?," in Proceedings of the 16th International Conference on Enterprise Information Systems, 2014, pp. 539–547.
[25] C. Agostinho, J. Ferreira, and R. Jardim-Goncalves, "Information Realignment in Pursuit of Self-Sustainable Interoperability at the Digital and Sensing Enterprise," IFAC-PapersOnLine, vol. 48, no. 3, pp. 38–45, 2015.
[26] J. Ribault and G. Zacharewicz, "Time-based orchestration of workflow, interoperability with G-DEVS/HLA," J. Comput. Sci., vol. 10, pp. 126–136, 2015.
[27] J. M. Simao and P. C. Stadzisz, "Inference Based on Notifications: A Holonic Metamodel Applied to Control Issues," IEEE Trans. Syst., Man, Cybern. - Part A: Syst. Humans, vol. 39, no. 1, pp. 238–250, 2009.
[28] J. M. Simão, H. Panetto, Y. Liao, and P. C. Stadzisz, "A Notification-Oriented Approach for Systems Requirements Engineering," in Proceedings of the 23rd ISPE Inc. International Conference on Transdisciplinary Engineering, 2016, pp. 229–238.
Abstract—The transition from traditional to Big Data technologies has put new challenges in front of existing database modeling techniques. The "design then implement" approach to relational database modeling cannot be fully applied to "discover and analyze" databases with rapid data growth. In the context of Big Data, models have preserved a descriptive role: they are used to analyze and understand the architecture of hybrid data structures. From the Model-Driven Engineering point of view, several questions emerge: Could Big Data and hybrid database models represent a formal specification of a solution that can be transformed into an executable specification for a specific target platform using automated transformations? Does the refining mechanism have a place in the process of hybrid data structure development? What is the role of modeling languages in the process of hybrid data structure specification? This paper presents an approach to modeling hybrid databases based on empirically validated concepts of Model-Driven Engineering. The approach was applied during the implementation of the Precision Agriculture Based Information Service (PAIS) application – a web-based service for the collection and archiving of diverse data on crops, and their presentation to farmers.

I. INTRODUCTION

The mass usage of smart phones and applications, the expansion of social networks, and the emerging Internet-of-Things (IoT) [1] technologies have opened the door to a new technological shift of our society. This created new challenges for the data management area, with requirements to handle data volumes expressed in petabytes. As a result, the concept of Big Data [2,3] emerged. This concept can be characterized by the "4 Vs": Volume (large data amounts), Velocity (large data change frequency), Variety (storage of data in different formats) and Value (valuable insights gained from the ability to analyze and discover new patterns and trends from high-volume and/or cross-platform systems).

Traditional database management systems [4,5] could not successfully handle the collection, storage, search and analysis of this type and amount of data. The complementary use of NoSQL ("not only SQL") [6,7] databases (MongoDB [8], Cassandra [9], etc.) and distributed computing systems, such as Hadoop [10], represents one way to face the Big Data challenge. Hybrid databases [11], combining relational and NoSQL characteristics, represent an alternative path for the implementation of the Big Data paradigm.

The advantages of combining SQL characteristics and NoSQL databases have been recognized by traditional relational database management system vendors. Microsoft, for instance, provided support for column storage in MSSQL Server 2012 [12]. On the other hand, projects such as Hive [13] or CQL [14] have become an SQL-like facade for NoSQL databases. The leader in Big Data applications, Google, developed Megastore [15], a hybrid distributed database.

The transition from traditional to Big Data technologies brought challenges to existing techniques for data structure modeling. The "design then implement", top-down approach for modeling relational databases with a predictable volume of information could not be applied to "discover and analyze" collaborative, interactive, non-relational databases with exponential and rapid data volume growth.

Models [16], in the context of hybrid structures, did not completely lose the position they had in the process of traditional database development. The ERwin® Data Modeler tool [17] supports embedding NoSQL schemas into sub-models of relational models, or creating mappings between relational and NoSQL databases, which provides a basis for migrating the data structure design. In the Big Data environment, models currently have a descriptive role: they are used to analyze and understand the architecture of hybrid data structures.

On the other hand, Model-Driven Engineering (MDE) [18,19,20] represents a methodological approach to software development in which models constitute a formal specification of the solution, which is then converted into an executable specification using automated transformations. From the MDE point of view, several questions emerge regarding the Big Data world: Can the refining mechanism [21,22] have a place in the process of hybrid data structure development? Do conceptual, logical and physical levels [23] exist, and what do they represent in Big Data technologies? What is the role of modeling languages [24,25] in the process of describing hybrid data structures? In this paper, the authors provide answers to some of these questions, with reference to the results of developing hybrid data structure models using existing MDE concepts.

The remainder of the paper is structured as follows. The model-driven relational database development process is described in Section 2. In Section 3, the methodology for model-driven hybrid database development is presented. An illustration of the application of this methodology using the PAIS1 project as a proof of concept is described in Section 4, whereas in Section 5 the conclusions and future steps are described.

1 This project is funded by FRACTALS (Future Internet Enabled Agricultural Applications, FP7 project No. 632874), under the funding framework of the European Commission.
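To make the "design then implement" versus "discover and analyze" contrast concrete, the following sketch (illustrative only; the table, field, and file names are invented, not taken from the PAIS project) stores crop observations in a fixed relational schema and in a schema-flexible document collection:

```python
# Illustrative contrast between a fixed relational schema and a
# schema-flexible document collection. All names are invented examples.

import sqlite3

# "Design then implement": every row must fit columns decided up front.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE observation (crop TEXT, moisture REAL)")
db.execute("INSERT INTO observation VALUES ('wheat', 0.31)")

# A new sensor adds attributes the schema never anticipated; the table
# would need a migration first. A document collection just accepts them.
documents = []  # stand-in for a document store: heterogeneous records coexist
documents.append({"crop": "wheat", "moisture": 0.31})
documents.append({"crop": "maize", "moisture": 0.28,
                  "aerial_image": "img_0042.png", "ndvi": 0.65})

# "Discover and analyze": the structure is inspected at read time.
keys = sorted({k for doc in documents for k in doc})
print(keys)   # -> ['aerial_image', 'crop', 'moisture', 'ndvi']
```

The Variety dimension of the "4 Vs" shows up directly here: the document side absorbs records of different shapes, while the relational side constrains them in exchange for an up-front, analyzable schema.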
2 Object Management Group (OMG) - http://www.omg.org/
3 https://www.theregister.co.uk/2007/06/14/data_modelling_layers/
III. MODEL DRIVEN DEVELOPMENT OF HYBRID DATABASES

The following requirements led to the choice of model-driven hybrid database development as the methodology for our solution:
i) one model needs to describe an arbitrary number of schemas for different database management systems; the databases whose schemas are integrated on the model level do not need to be physically integrated;
ii) it must be possible to include the model of a new or existing database schema in, or exclude it from, the existing model;
iii) the integrated model of the hybrid database schema can be partially transformed (different parts of the model can be used to generate different types of artifacts); and
iv) the approach needs to be supported by existing (meta-)modeling tools.

Basic model-driven relational database development concepts were generalized and applied in the context of designing hybrid databases. The three abstraction levels (CIM, PIM and PSM) were kept, whereas the models are transformed from one to the other using refinements. Because the CIM level does not need to present the specificities of different database management systems, the initial CDM was appropriate for modelling hybrid database structures on the conceptual level. On the other hand, the PIM and PSM levels were used to describe the informational and implementational characteristics of the data storages. This is why a controlled extension of the LDM and PDM had to be implemented, in order to cover NoSQL-specific concepts.

Fig. 4 illustrates part of the architecture for developing hybrid databases using the model-driven development approach:
i) specific database management systems were described using appropriate metamodels [34] (Document Metamodel, Key-Value Metamodel, Graph Metamodel, RDBMS Metamodel, etc.). Each metamodel contains the specific characteristics of the selected system. The process of creating metamodels is illustrated using relationships with the <<conceptualization>> stereotype. The metamodels can comply with any meta-metamodel (MOF, ECORE [35], KM3 [36]). The integration of the different metamodels was executed by transforming them into lightweight extensions [37,38,39] of the base metamodel; the metamodel of the LDM and PDM was selected as the base metamodel. The process of transforming metamodels into lightweight extensions is illustrated using relationships with the <<transformation>> stereotype;
ii) by definition, lightweight extensions can be dynamically applied to, or removed from, the base metamodel. Using this approach, it is possible to include new sub-models into existing models with no information loss;
iii) using the origin of specific elements (a relation with the <<instanceOf>> stereotype) on the model level (Hybrid Database Model in Fig. 4), virtual sub-models were created (Graph DBMS SubModel, Key-Value DBMS SubModel, Document DBMS SubModel). Virtual sub-models represent the inputs for dedicated generators, which output the appropriate artifacts (relational database SQL schema, Hive queries for NoSQL DBs, etc.); and
iv) models created by extending existing database modeling languages have the advantage of reusing existing technologies and user experience in the implementation of the MDE approach. Due to this, existing modeling tools that support the lightweight extension mechanism can be used to implement the described approach.
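The core mechanism of steps i)–iii) can be sketched in miniature. This is an illustrative approximation, not the authors' implementation: all element names, the stereotype dictionary, and the generator are assumptions standing in for proper metamodel machinery:

```python
# Minimal sketch of the lightweight-extension idea: a base metamodel
# element extended dynamically by a stereotype-like annotation, plus
# virtual sub-models selected by the elements' origin. Names invented.

class Entity:
    """Base LDM/PDM metamodel element."""
    def __init__(self, name):
        self.name = name
        self.stereotypes = {}   # lightweight extensions live here

    def apply(self, stereotype, **tags):
        self.stereotypes[stereotype] = tags      # applied dynamically...

    def unapply(self, stereotype):
        self.stereotypes.pop(stereotype, None)   # ...and removable, no info loss

model = [Entity("Field"), Entity("SensorReading"), Entity("AerialImage")]
model[1].apply("document", collection="readings")  # Document-DBMS extension
model[2].apply("document", collection="images")

# Virtual sub-models: views over the one integrated model, per target DBMS.
doc_submodel = [e for e in model if "document" in e.stereotypes]
rel_submodel = [e for e in model if not e.stereotypes]

# A dedicated generator consumes each virtual sub-model.
def relational_sql(entities):
    return [f"CREATE TABLE {e.name.lower()} (...);" for e in entities]

print(relational_sql(rel_submodel))     # -> ['CREATE TABLE field (...);']
print([e.name for e in doc_submodel])   # -> ['SensorReading', 'AerialImage']
```

The design point the sketch mirrors is requirement ii) above: because the extension is additive metadata rather than a change to the base element, removing it restores the plain relational view without information loss.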
concrete NoSQL databases and traditional database systems were described using lightweight extensions of existing metamodels.

The model-driven hybrid database development approach was used during the implementation of the PAIS application. PAIS represents a complex service with heterogeneous data, consisting of aerial images and land sensor measurements associated with spatial and temporal attributes. To suit the project's needs, the logical and physical database models were extended to provide support for a single NoSQL database type (document-oriented databases [3]). As part of future work, the aim is to create extensions to include key-value and column-oriented databases.

REFERENCES

[1] Gubbi, J., Buyya, R., Marusic, S. and Palaniswami, M., 2013. Internet of Things (IoT): A vision, architectural elements, and future directions. Future Generation Computer Systems, 29(7), pp.1645-1660.
[2] Mayer-Schönberger, V. and Cukier, K., 2013. Big data: A revolution that will transform how we live, work, and think. Houghton Mifflin Harcourt.
[3] Chen, M., Mao, S. and Liu, Y., 2014. Big data: A survey. Mobile Networks and Applications, 19(2), pp.171-209.
[4] Codd, E.F., 1990. The relational model for database management: version 2. Addison-Wesley Longman Publishing Co., Inc.
[5] Silberschatz, A., Korth, H.F. and Sudarshan, S., 1997. Database system concepts (Vol. 4). New York: McGraw-Hill.
[6] Han, J., Haihong, E., Le, G. and Du, J., 2011, October. Survey on NoSQL database. In Pervasive Computing and Applications (ICPCA), 2011 6th International Conference on (pp. 363-366). IEEE.
[7] Padhy, R.P., Patra, M.R. and Satapathy, S.C., 2011. RDBMS to NoSQL: reviewing some next-generation non-relational databases. International Journal of Advanced Engineering Science and Technologies, 11(1), pp.15-30.
[8] Parker, Z., Poe, S. and Vrbsky, S.V., 2013, April. Comparing NoSQL MongoDB to an SQL DB. In Proceedings of the 51st ACM Southeast Conference (p. 5). ACM.
[9] Abramova, V. and Bernardino, J., 2013, July. NoSQL databases: MongoDB vs Cassandra. In Proceedings of the International C* Conference on Computer Science and Software Engineering (pp. 14-22). ACM.
[10] Dede, E., Govindaraju, M., Gunter, D., Canon, R.S. and Ramakrishnan, L., 2013, June. Performance evaluation of a MongoDB and Hadoop platform for scientific data analysis. In Proceedings of the 4th ACM Workshop on Scientific Cloud Computing (pp. 13-20). ACM.
[11] Nayak, A., Poriya, A. and Poojary, D., 2013. Types of NoSQL databases and their comparison with relational databases. International Journal of Applied Information Systems, 5(4), pp.16-19.
[12] Larson, P.A., Clinciu, C., Fraser, C., Hanson, E.N., Mokhtar, M., Nowakiewicz, M., Papadimos, V., Price, S.L., Rangarajan, S., Rusanu, R. and Saubhasik, M., 2013, June. Enhancements to SQL Server column stores. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (pp. 1159-1168). ACM.
[13] Rallapalli, S. and Minalkar, A., 2015, August. Map Reduce programming for electronic medical records data analysis on cloud using Apache Hadoop, Hive and Sqoop. In 2015 5th International Conference on IT Convergence and Security (ICITCS) (pp. 1-4). IEEE.
[14] Bach, M. and Werner, A., 2014, May. Standardization of NoSQL database languages. In International Conference: Beyond Databases, Architectures and Structures (pp. 50-60). Springer International Publishing.
[15] Grov, J. and Ölveczky, P.C., 2014. Formal modeling and analysis of Google's Megastore in Real-Time Maude. In Specification, Algebra, and Software (pp. 494-519). Springer Berlin Heidelberg.
[16] Bézivin, J., 2005. On the unification power of models. Software & Systems Modeling, 4(2), pp.171-188.
[17] Erwin Inc., erwin Data Modeler: http://erwin.com/products/data-modeler/, Accessed April 2017.
[18] Schmidt, D.C., 2006. Model-driven engineering. IEEE Computer, 39(2), p.25.
[19] Favre, J.M., 2004, October. Towards a basic theory to model model driven engineering. In 3rd Workshop in Software Model Engineering, WiSME (pp. 262-271).
[20] France, R. and Rumpe, B., 2007, May. Model-driven development of complex software: A research roadmap. In 2007 Future of Software Engineering (pp. 37-54). IEEE Computer Society.
[21] Brown, A.W., Conallen, J. and Tropeano, D., 2005. Introduction: Models, modeling, and model-driven architecture (MDA). In Model-Driven Software Development (pp. 1-16). Springer Berlin Heidelberg.
[22] Kleppe, A.G., Warmer, J.B. and Bast, W., 2003. MDA explained: the model driven architecture: practice and promise. Addison-Wesley Professional.
[23] West, M., 2011. Developing high quality data models. Elsevier.
[24] Selic, B., 2009, July. The theory and practice of modeling language design for model-based software engineering—a personal perspective. In International Summer School on Generative and Transformational Techniques in Software Engineering (pp. 290-321). Springer Berlin Heidelberg.
[25] Guizzardi, G., 2007. On ontology, ontologies, conceptualizations, modeling languages, and (meta) models. Frontiers in Artificial Intelligence and Applications, 155, p.18.
[26] Watson, A., 2008. A brief history of MDA. Upgrade, the European Journal for the Informatics Professional, 9(2), pp.7-11.
[27] Bézivin, J., 2004. In search of a basic principle for model driven engineering. Novatica Journal, Special Issue, 5(2), pp.21-24.
[28] Völter, M., Stahl, T., Bettin, J., Haase, A. and Helsen, S., 2013. Model-driven software development: technology, engineering, management. John Wiley & Sons.
[29] Fowler, M., 2004. UML distilled: a brief guide to the standard object modeling language. Addison-Wesley Professional.
[30] OMG Meta Object Facility (MOF) Core Specification. OMG. http://www.omg.org/spec/MOF/2.5/, Accessed April 2017.
[31] XML Metadata Interchange (XMI) Specification Version 2.5.1. OMG. http://www.omg.org/spec/XMI/2.5.1, Accessed April 2017.
[32] Fondement, F. and Silaghi, R., 2004, October. Defining model driven engineering processes. In Third International Workshop in Software Model Engineering (WiSME), held at the 7th International Conference on the Unified Modeling Language (UML).
[33] Miller, J. and Mukerji, J., 2003. MDA Guide Version 1.0.1.
[34] Bézivin, J. and Gerbé, O., 2001, November. Towards a precise definition of the OMG/MDA framework. In Automated Software Engineering, 2001 (ASE 2001), Proceedings of the 16th Annual International Conference on (pp. 273-280). IEEE.
[35] Budinsky, F., 2004. Eclipse Modeling Framework: a developer's guide. Addison-Wesley Professional.
[36] Jouault, F. and Bézivin, J., 2006, June. KM3: a DSL for metamodel specification. In International Conference on Formal Methods for Open Object-Based Distributed Systems (pp. 171-185). Springer Berlin Heidelberg.
[37] Bruneliere, H., Garcia, J., Desfray, P., Khelladi, D.E., Hebig, R., Bendraou, R. and Cabot, J., 2015, July. On lightweight metamodel extension to support modeling tools agility. In European Conference on Modelling Foundations and Applications (pp. 62-74). Springer International Publishing.
[38] Robert, S., Gérard, S., Terrier, F. and Lagarde, F., 2009, August. A lightweight approach for domain-specific modeling languages design. In Software Engineering and Advanced Applications, 2009. SEAA'09. 35th Euromicro Conference on (pp. 155-161). IEEE.
[39] Fuentes-Fernández, L. and Vallecillo-Moreno, A., 2004. An introduction to UML profiles. UML and Model Engineering, 2.
Abstract—The e-health platform Pinga is an integrated health information system: a central electronic system in which all medical and health-related information about patients, health workers, facilities, documents, and procedures is stored and processed. The platform is implemented in Serbia as MojDoktor (My Doctor) and in Macedonia as MojTermin (My Appointment). The architecture of the system was designed to allow for process-oriented development with agile methodologies. This methodology allowed for fast deployment and adoption, but a change in the architecture to a more formal approach is required to assure its extensibility, soundness, interoperability and standardization. In this paper we propose a transformation of the design framework from Process Oriented to a Model-Driven Architecture.

I. INTRODUCTION

The health platform Pinga represents a central electronic system in which all medical and health-related information is stored and processed on-line. The system allows integration with existing and future systems used by hospitals, clinics, the Ministry of Health, and other public health institutions [1]. Pinga can be classified as a system that integrates several components: an EHR (Electronic Health Record), Electronic Prescriptions, Electronic Referrals, a Hospital Stay and Surgeries Information System, a Laboratory Information System and a Radiology Information System.

Pinga keeps the data in a centralized system. The various health organizations integrated with the system have autonomous heterogeneous systems that keep data locally, but a part of that data is written to the central database in the process of integration with Pinga. The platform was implemented in Macedonia as MojTermin (My Appointment). Pinga integrates all existing electronic information systems used in public and private health organizations that have an agreement with the Public Health Fund. Integration is done through a precisely defined API accessible through web services over an HTTPS connection. The web services can be public or protected. Public services provide access to public data about the health organizations and specialties, doctors, resources, available timeslots, etc. Protected services provide access to patient data, referral, exam and prescription details, etc., and require authentication. For the functionalities not covered by any existing system, a web interface provides authorized access; organizations that did not previously use any kind of information system use the web interface. A complete high-level architecture of the system is given in Figure 1.

From a data-collection point of view, the system continuously creates new records and updates existing ones. In Serbia, the system creates records for over 250,000 prescriptions, 80,000 referrals and 190,000 exams each working day, and handles over 4,800,000 requests through the APIs.

The authors in [8] recognize that the long-term strategic targets for a national e-health system should be standardization on a national level, progressing to full interoperability on a European level. The current architecture design enabled an implementation of the processes and the creation of a functional system; this allowed MojDoktor to be built and functional in 9 months.

A change to a more formal approach is required in order to make the platform interoperable and standardized, and to do so in a secure and sound way. The current implementation uses standardized international codes for doctor specialties, diagnoses (ICD10) and drug generics (ATC); the next step is to make the system compliant with EHR standards such as HL7 or openEHR.

II. RELATED WORK

When building an integrated health information system, the potential to incorrectly implement the requirements is very high, due to the complexity of the processes in health care. A change in any of the requirements introduces a need for a correction of the implemented system, and can lead to regression.
7th International Conference on Information Society and Technology ICIST 2017
This can be avoided if the system is established using a well-defined framework/architecture. Model Driven Architecture (MDA) has proven to be one of the best choices when requirements changes are expected [11]. MDA focuses on the consistent use of diagrammatic models to describe the contextual situation, which is successively transferred through different model layers into a technical model [2]. MDA specifies three layers:
1. Computation Independent Model (CIM), the business model;
2. Platform Independent Model (PIM), the software engineering model without technology aspects;
3. Platform Specific Model (PSM), the technology-related aspects of the target platform.

Business models extracted from domain experts are transformed through the layers to derive the software. Rules defined for the transformation between the layers allow changes in the business process to be easily transformed into correct code changes [3, 4].

Several attempts at using MDA for e-health systems have been published. The authors in [5] attempt to create a specific MDA for the healthcare domain. They needed a mechanism that fosters a sustainable tele-health platform, in terms of a methodology that provides an efficient way to deploy new artifacts on the platform and ensure these artifacts are compliant with the platform properties. They show how an MDA approach can be used to build a methodological foundation for the systematic creation of an application system. Their use case is an application for IT-based workflow support in an interdisciplinary stroke care project.

In [6], MDA is used to develop data collection tools that allow non-informatics experts to model requirements in their local domain. They recognized a problem in the process of implementing local improvement initiatives in healthcare systems, so they developed a Model-Driven framework and implementation that allows local teams in medical organizations to specify the metrics to track their performance during an intervention, together with the data points needed to calculate these metrics. Based on the metric specification, the software generates appropriate data collection pages.

In [7], MDA augmented with formal validation was used to develop m-health components promising portability, interoperability, platform independence and domain specificity. The authors are developing systems based on inter-communicating devices worn on the body (Body Area Networks) that provide an integrated set of personalized health-related services to the user. This mobile healthcare application feeds captured data into a healthcare provider's enterprise information system.

In their previous work [13], the same authors propose an extension of the model-driven approach in which formal methods are used to support the process of modelling and model transformation. The semi-formal modeling with UML is verified using tools such as SPIN [17] for model checking and TORX [18] for model testing. They present [14], [15] and [16] as possible formal approaches to the transformation.

In this paper we present a transformation from a Process Oriented design to a Model-Driven architecture for the Pinga platform.

III. METHODOLOGY

By analyzing the requirements of health institutions, several functional modules have been identified and implemented in Pinga (Public Portal [9][10], Electronic Health Record, Reporting and BI, Integration, etc.). Different modules have disjoint sets of processes; each process can depend on one or more processes from other modules.
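The modular decomposition described above (each module owning a disjoint set of processes, with dependencies allowed across module boundaries) can be sketched as a small model. The module and process names below are hypothetical placeholders, not identifiers from the actual Pinga implementation.

```python
# Sketch of a modular process model: each module owns a disjoint set of
# processes, and a process may depend on processes from other modules.
# All names here are illustrative only.

modules = {
    "PublicPortal": {"book_appointment", "view_schedule"},
    "EHR": {"record_visit", "issue_prescription"},
    "Reporting": {"monthly_report"},
}

# Cross-module dependencies: process -> set of processes it depends on.
depends_on = {
    "record_visit": {"book_appointment"},   # EHR depends on PublicPortal
    "monthly_report": {"record_visit"},     # Reporting depends on EHR
}

def check_disjoint(modules):
    """Verify that no process belongs to more than one module."""
    seen = set()
    for procs in modules.values():
        if seen & procs:
            return False
        seen |= procs
    return True

def owning_module(process, modules):
    """Return the module that owns a given process, or None."""
    for name, procs in modules.items():
        if process in procs:
            return name
    return None

print(check_disjoint(modules))                 # True
print(owning_module("record_visit", modules))  # EHR
```

A check like `check_disjoint` makes the stated design constraint (disjoint process sets per module) explicit and testable before any transformation toward a platform-specific model.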
[11] Loniewski, G., Insfran, E., Abrahão, S.: "A systematic review of the use of requirements engineering techniques in model-driven development." International Conference on Model Driven Engineering Languages and Systems. Springer Berlin Heidelberg, 2010.
[12] Truyen, F.: "The fast guide to model driven architecture: the basics of model driven architecture." Cephas Consulting Corp (2006).
[13] Jones, V. M., et al.: "A formal MDA approach for mobile health systems." (2004).
[14] Bolognesi, T., De Frutos, D., Langerak, R., Latella, D.: "Correctness preserving transformations for the early phases of software development." In: Bolognesi, T., van de Lagemaat, J., Vissers, C. A. (eds.), LOTOSphere: Software Development with LOTOS, pp. 348-368, Kluwer Academic Publishers, 1995.
[15] Jones, V.: "Realization of CCR in C." In: Bolognesi, T., van de Lagemaat, J., Vissers, C. A. (eds.), LOTOSphere: Software Development with LOTOS, pp. 348-368, Kluwer Academic Publishers, 1995.
[16] Jones, V. M.: "Engineering an implementation of the OSI CCR protocol using the information systems engineering techniques of formal specification and program transformation." University of Twente, Centre for Telematics and Information Technology, Technical Report series no. 97-19, 1997. ISSN 1381-3625.
[17] Holzmann, G. J.: The Spin Model Checker: Primer and Reference Manual. Addison-Wesley, 2003. ISBN 0-321-22862-6.
[18] Tretmans, J., Belinfante, A.: "Automatic testing with formal methods." In: EuroSTAR'99: 7th European International Conference on Software Testing, Analysis & Review, Barcelona, Spain, November 8-12, 1999. EuroStar Conferences, Galway, Ireland.
Abstract—Supply chains are complex systems with silos of information that is very difficult to integrate and analyze. The best way to effectively analyze these composite systems is the use of business intelligence (BI). However, traditional BI systems face many challenges that include processing of vast data volumes, demand for real-time analytics, enhanced decision making, insight discovery and optimization of supply chain processes. Big Data initiatives promise to answer these challenges by incorporating various methods, tools and services for more agile and flexible analytics and decision making. Nevertheless, the potential value of big data in supply chain management (SCM) has not yet been fully realized and requires establishing new BI infrastructures, architectures, models and tools. The first part of the paper discusses challenges and new trends in supply chain BI and provides background research on big data initiatives related to SCM. In this paper, the methodology and the unified model for supply chain big data analytics, which comprises the whole BI lifecycle, is presented. The architecture of the model is scalable and layered in such a way as to provide the necessary agility and adaptivity. The proposed BI model encompasses a supply chain process model, data and analytical models, as well as insights delivery. It enables the creation of next-generation cloud-based big data systems that can create strategic value and improve the performance of supply chains. Finally, an example of a supply chain big data solution that illustrates the applicability and effectiveness of the model is presented.

I. INTRODUCTION

As the globalized business environment is forcing supply chain networks to adapt to new business models, collaboration, integration and information sharing are becoming even more critical for ultimate success. Supply chain enterprise systems are experiencing a major structural shift as more organizations rely on a community of partners to perform complex supply chain processes. While supply chains are growing increasingly complex, from linear arrangements to interconnected, multi-echelon, collaborative networks of companies, there is much more information that needs to be stored, processed and analyzed than there was just a few years ago.

Supply chain business intelligence is a collection of activities to understand business situations by performing various types of analysis on the organization's data, as well as on external data from supply chain partners and other data sources (devices, sensors, social networks, etc.), to help make strategic, tactical, and operational business decisions and take the necessary actions for improving supply chain performance. This includes gathering, analyzing, understanding, and managing high volumes of varied data about operational performance, customer and supplier activities, financial performance, production, competition, regulatory compliance, quality controls, device data and the Internet.

Over the past few decades, the way in which companies need to collate, analyze, report and share their data has changed dramatically. Organizations need to be more adaptive, have increased access to information for decision-making, and effectively deal with a rapidly growing volume of data.

Today's business environment demands fast supply chain decisions and reduced time from raw data to insights and actions. Typically, supply chains are capturing enormous data volumes, including vast amounts of unstructured data such as files, images, videos, blogs, clickstreams and geo-spatial data, as well as data coming from various sensors, devices, and social networks.

Supply chain BI systems have proved to be very useful in extracting information and knowledge from existing enterprise information systems, but in recent years organizations face new challenges in terms of huge data volumes generated through the supply chain and externally, variety (different kinds of structured and unstructured data), as well as data velocity (batch processing, streaming and real-time data). Most of the existing analytical systems are incapable of coping with these new dynamics [1].

On the other hand, we have seen tremendous advancements in technologies like in-memory computing, cloud computing, Internet of Things (IoT), NoSQL databases, distributed computing, machine learning, etc. Big data is a term that underpins a raft of these technologies, which have been created in the drive to better analyze and derive meaning from data at a dramatically lower cost, while delivering new insights and products for organizations in the supply chain.

The key challenges for modern supply chain analytical systems include [2]:
• Data explosion – supply chains need the right tools to make sense of the overwhelming amount of data generated by a growing set of internal and external data sources.
• Growing variety of data – most of the new data is unstructured or comes in different types and forms.
• Data speed – data is being generated at high velocity, which makes data processing even more challenging.
• Real-time analysis – in today's turbulent business climate the ability to make the right decisions in real time brings real competitive advantage. Yet many supply chains do not have the infrastructure,
tools and applications to make timely and accurate decisions.
• Achieving simplified deployment and management – despite its promise, big data systems can be complex, costly and difficult to deploy and maintain. Supply chains need more flexible, scalable and cost-effective infrastructure, platforms and services, such as those offered in the cloud.

To succeed in a competitive marketplace, an agile supply chain requires a special big data analytical system to quickly anticipate, adapt, and react to changing business conditions. BI systems provide sustainable success in a dynamic environment by empowering business users at all levels of the supply chain and enabling them to use actionable, real-time information.

II. BACKGROUND RESEARCH

During the past two decades organizations have made large investments in SCM information systems in order to improve their businesses. However, these systems usually provide only transaction-based functionality and mostly maintain an operational view of the business. They lack the sophisticated analytical capabilities required to provide an integrated view of the supply chain. On the other hand, organizations that implemented some kind of enterprise business intelligence system still face many challenges related to data integration, storage and processing, as well as data velocity, volume and variety. Additional issues include the lack of predictive intelligence features, mobile analytics and self-service business intelligence capabilities [3].

The current approaches to BI have some fundamental challenges when confronted with the scale and characteristics of big data: types of data, enterprise data modeling, data integration, costs, master data management, metadata management, and skills [4].

The big data phenomenon, the volume, variety, and velocity of data, has impacted business intelligence and the use of information. New trends such as fast analytics and data science have emerged as part of business intelligence [5].

Sixty-four percent of supply chain executives consider big data analytics a disruptive and important technology, setting the foundation for long-term change management in their organizations [6]. Ninety-seven percent of supply chain executives report having an understanding of how big data analytics can benefit their supply chain, but only 17 percent report having already implemented analytics in one or more supply chain functions [7].

While data science, predictive analytics, and big data have been frequently used buzzwords, rigorous academic investigations into these areas are just emerging. Even though it is a hot topic, there is not much research related to big data analytics in SCM. Most of the papers deal with big data potential, possible applications and value propositions.

Wamba and Akter provide a literature review of big data analytics for supply chain management [8]. They highlight future research directions where the deployment of big data analytics is likely to transform supply chain management practices. Waller and Fawcett examine possible applications of big data analytics in SCM and provide examples of research questions from these applications, as well as examples of research questions employing big data that stem from management theories [9].

Identifying specific ways that big data systems can be leveraged to improve specific supply chain business processes and to automate and enhance performance becomes crucial for ultimate business success. The information and analytics delivered would be used to improve supply planning, vendor negotiations, capacity planning, warehousing and transportation performance, business productivity, shop floor performance, materials requirements planning, distribution and customer service.

Organizations can apply big data and BI in the following supply chain areas [10]:
• Plan Analytics — balancing supply chain resources with requirements.
• Source Analytics — improving inbound supply chain consolidation and optimization.
• Make Analytics — providing insight into the manufacturing process.
• Deliver Analytics — improving outbound supply chain efficiency and effectiveness.
• Return Analytics — managing the return of goods effectively and efficiently.

Some of the most important data-driven supply chain management challenges can be summarized as follows [11]:
• Meet rising customer expectations on supply chain management.
• Increase cost efficiency in supply chain management.
• Monitor and manage supply chain compliance and risk.
• Make supply chain traceability and sustainability a priority.
• Remain agile and flexible in volatile times and markets.

Supply chains which implemented certain big data systems have achieved the following benefits [12]:
• Improvement in customer service and demand fulfillment.
• Faster and more efficient reaction time to supply chain issues.
• An increase in supply chain efficiency.
• Integration across the supply chain.
• Optimization of inventory and asset productivity.

Despite new business requirements and technology innovations, big data methods, models and applications in SCM still need to be researched and studied. In the subsequent sections we present the supply chain big data model, the software architecture and an example of a big data analytical system.

III. MODELS AND METHODS

Big data analytics has more to do with ideas, questions and value than with technology. Therefore, the big data analytics methodology is a combination of sequential execution of tasks in certain phases and highly iterative execution of steps in certain phases.
Because of the scale issues associated with a supply chain big data system, an incremental approach is recommended, which includes modifying and expanding processes gradually across several activities, as opposed to designing the system all at once [13]. The methodology comprises the following steps:
• Analyze and evaluate the supply chain use case - frame the problem, gather sample data and perform data discovery and analysis.
• Develop business hypotheses - assemble illustrative supply chain use cases and perform fit-gap analysis.
• Build and prepare data sets - acquire data and understand data characteristics.
• Select and build the supply chain analytical models - test and validate with data, apply data visualization techniques and review results.
• Build the production-ready system - architect and develop the end-state solution.
• Measure and monitor - measure the effectiveness of the big data analytics solution.

Figure 1 provides a high-level view of the big data analytics methodology, and designers (i.e., architects, analysts, data modelers, etc.) are advised to iterate through the steps presented. Several cycles of design and experimentation during steps 2 through 5 should be completed. Each cycle should include additional and larger data samples and apply different analytics techniques as appropriate for the data and relevant for solving the supply chain problem. The entire framework should be revised periodically after the go-live in order to maintain the quality and accuracy of the supply chain analytical system.
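The iterative character of the methodology (repeating steps 2 through 5 with progressively larger data samples until the results are satisfactory) can be sketched as a simple driver loop. The phase functions below are stubs, and the sample sizes and quality threshold are hypothetical, intended only to show the control flow.

```python
# Sketch of the iterative methodology: steps 2-5 are repeated over
# progressively larger data samples until the model is judged good
# enough. All numbers and stubs are hypothetical placeholders.

def develop_hypotheses(sample):      # step 2: use cases, fit-gap analysis (stub)
    return {"use_case": "supplier quality", "rows": len(sample)}

def prepare_data(sample):            # step 3: acquire and clean data (stub)
    return [x for x in sample if x is not None]

def build_model(data):               # step 4: build the analytical model (stub)
    return sum(data) / len(data)     # trivial stand-in for a real model

def evaluate(model):                 # step 5: validate and review results (stub)
    return min(1.0, model / 100.0)   # toy quality score in [0, 1]

raw = list(range(1, 1001))           # stand-in for the full data set
quality, sample_size = 0.0, 100

while quality < 0.9 and sample_size <= len(raw):
    sample = raw[:sample_size]       # each cycle uses a larger sample
    develop_hypotheses(sample)
    data = prepare_data(sample)
    model = build_model(data)
    quality = evaluate(model)
    sample_size *= 2                 # grow the sample for the next cycle

print(round(quality, 2))
```

The loop stops either when the quality threshold is reached or when the full data set has been consumed, mirroring the recommendation to expand the scope gradually rather than design the system all at once.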
The proposed big data model unifies processes, methodologies and tools into a single business solution. The model has been developed in such a way as to seamlessly integrate within an overall BI and collaboration framework [14]. It is process-centric, metrics-based and modular. It introduces new supply network and data modeling approaches, as well as a layered application architecture which enables the creation of composite BI systems.

The data integration layer supports various data types (relational, unstructured, streaming, OLAP, etc.) via cloud ETL services. The data management layer is based on the Hadoop engine, but with additional services which provide more flexible data models and querying. The analytical layer hosts various analytical models and schemas. These can be exploration, data mining, or performance monitoring models. The final insights layer provides insights to all users through self-service BI, data search, collaboration and performance monitoring. The central component of this layer is a specialized supply chain BI portal, the unifying component that provides integrated analytical information and services, and also fosters collaborative decision making and planning.

This approach utilizes various cloud data management services, such as ETL jobs for data extraction, cleansing and import, as well as event hubs that act as a scalable data streaming platform capable of ingesting large amounts of high-velocity data from sensors and IoT devices throughout the supply chain.

Additionally, a supply chain-wide data catalog was created in the cloud. It is a fully managed cloud service that enables users to discover, understand, and consume data sources. The data catalog includes a crowdsourcing model of metadata and annotations, and allows all supply chain participants to contribute their knowledge to build various data models which can be integrated into specific applications and services.

In order to provide more flexibility, two big data stores are designed: a supply chain enterprise multidimensional data warehouse and a special in-memory tabular model for processing large amounts of data. Combined with specific cloud analysis services, it is possible to design different analytical models. For example, stream analytics services can be used to set up real-time analytic computations on data streaming from devices, sensors, e-commerce sites, social media, information systems, infrastructure systems, etc. Another example is a cloud machine learning service that enables supply chain participants to easily build, deploy, and share predictive analytics solutions (e.g. forecasting sales or inventory data) [15].

Finally, the information derived from such analytical models needs to be delivered to decision makers in a timely and user-friendly way. For this purpose, a special web portal is used. It acts as a single point of data analysis and collaborative decision making.

In order to demonstrate our approach, we have designed a supply chain BI solution for the analysis of supplier quality within the supply chain. Data from different sources (a relational database, files, and web feeds) is integrated via a cloud ETL job into the in-memory tabular data store. Various analytical reports are designed using different technologies and services. All these BI artifacts (reports, charts, maps, etc.) are then integrated in the web dashboards, as shown in Figure 3.
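The real-time stream computations mentioned above can be illustrated with a minimal tumbling-window aggregation over sensor readings. In the described architecture this work would be performed continuously by a cloud stream analytics service; the code below is only a conceptual stand-in with fabricated readings.

```python
# Minimal sketch of a tumbling-window aggregation over a sensor stream:
# readings are grouped into fixed-size time windows and averaged, which
# is conceptually what a stream analytics job does continuously. The
# readings and window size are fabricated examples.

from collections import defaultdict

# (timestamp_seconds, sensor_id, value)
stream = [
    (0, "temp-1", 20.0), (5, "temp-1", 22.0),
    (12, "temp-1", 30.0), (14, "temp-1", 34.0),
    (21, "temp-1", 25.0),
]

WINDOW = 10  # seconds per tumbling window

sums = defaultdict(lambda: [0.0, 0])
for ts, sensor, value in stream:
    window = ts // WINDOW            # assign the reading to its window
    key = (sensor, window)
    sums[key][0] += value
    sums[key][1] += 1

averages = {key: total / count for key, (total, count) in sums.items()}
print(averages[("temp-1", 0)])       # 21.0 - average over [0, 10)
print(averages[("temp-1", 1)])       # 32.0 - average over [10, 20)
```

A real deployment would emit each window's aggregate as soon as the window closes, feeding dashboards or alerts, rather than collecting the whole stream in memory first.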
Supplier quality analysis is a typical supply chain task. Two primary metrics in this analysis are the total number of defects and the total downtime that these defects caused. This sample has two main objectives:
• Understand who the best and worst supply chain suppliers are, with respect to quality.
• Identify which plants do a better job of finding and rejecting defects, to minimize downtime.

Each of the analytical segments on the dashboard can be further investigated by using various drill-down reports. For example, if a user wants to analyze how plants deal with defective materials and the resulting downtime, clicking on the map segment opens the supplier analysis dashboard (Figure 4), which can be used for deeper analysis and filtering in order to derive meaningful knowledge about supplier management processes and take corrective actions.
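The two primary metrics named above (total defects and total downtime per supplier) reduce to a simple aggregation, sketched here over fabricated sample records rather than data from the described solution.

```python
# Sketch of the supplier quality analysis: aggregate total defect counts
# and total downtime minutes per supplier, then rank the suppliers.
# The records are fabricated sample data.

from collections import defaultdict

records = [
    {"supplier": "Acme",  "plant": "P1", "defects": 10, "downtime_min": 120},
    {"supplier": "Acme",  "plant": "P2", "defects": 4,  "downtime_min": 30},
    {"supplier": "Bolt",  "plant": "P1", "defects": 2,  "downtime_min": 15},
    {"supplier": "Crank", "plant": "P2", "defects": 7,  "downtime_min": 200},
]

totals = defaultdict(lambda: {"defects": 0, "downtime_min": 0})
for r in records:
    totals[r["supplier"]]["defects"] += r["defects"]
    totals[r["supplier"]]["downtime_min"] += r["downtime_min"]

# Rank suppliers from worst to best by total downtime caused by defects.
ranked = sorted(totals, key=lambda s: totals[s]["downtime_min"], reverse=True)
print(ranked[0], totals[ranked[0]])    # worst supplier by downtime
print(ranked[-1], totals[ranked[-1]])  # best supplier by downtime
```

Grouping by `plant` instead of `supplier` would answer the second objective, identifying which plants catch and reject defective material most effectively.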
Abstract—Operations over data represent the main functionality of web business applications. User-specific operations extend the set of common create, read, update, and delete (CRUD) operations. As user-specific operations require underlying algorithm definitions, they are not as generic as CRUD operations are. Because of this, user-specific operations do not have extensive model-driven development support, and the abstract models of existing model-driven tools do not support such algorithms by default. Without proper development support, developers use custom manual workarounds to develop user-specific operations. This paper proposes a new model-driven approach that supports user-specific operation definitions through visual algorithm modeling. The approach also supports automated code generation from the aforementioned algorithm model. The proposed approach supports a unified declaration of the same user-specific operation across different platforms. In this way, the same operations do not have different declarations for different platforms, and the portability of operations does not require extra implementation effort.

I. INTRODUCTION

Web business applications are data-intensive and deal with various operations over data. The create, read, update, and delete (CRUD) operations are common to all data-intensive applications, so all CRUD-related application features are suitable for code generation. Such a generic nature allows model-driven tools to have extensive CRUD development support. Model-driven tools shift the focus of application development from code writing to higher-level abstract modeling [1, 2]. The model represents a formal, complete, and consistent abstraction of an application, as stated in [3], from which code generation facilities write the target platform code. We consider a platform to be any set of concrete development tools in any programming language.

Business web applications may also provide a set of user-specific operations. User-specific operations are unique for each application and deal with the requirements that CRUD operations do not cover; calculating a car's fuel efficiency or an order's total price are examples. Such operations require an algorithm description, i.e. a self-contained systematic set of expressions such as declaring variables and assigning values. Because of this, user-specific operations are not as generic as CRUD operations are.

User-specific operations do not have the same model-driven development support as CRUD operations do. Consequently, the abstract application models do not define user-specific operations. Model-driven tools do not support user-specific algorithm modeling. Such modeling is a necessary development step and tools ought to support it. Without the support, developers use custom manual workarounds outside of the model-driven approach. The workarounds take place after the code generators produce the application core features. Such an approach makes the portability of user-specific operations a difficult process: the same operations have different definitions and implementations for different platforms.

In this paper, we address the problem of model-driven support for user-specific operations through several steps. In Section 2, we point out the lack of this support in the chosen representative model-driven and traditional tools. In Section 3, we propose a new model-driven approach that supports a platform-independent specification of a user-specific operation algorithm. The proposed specification takes the form of a customized flowchart diagram for algorithm description. Flowchart support shifts the focus onto algorithm concepts rather than the underlying programming language syntax. The flowchart algorithm modeling supports four algorithm steps - declaration of variables, assignment of values, if conditions and while loops. The proposed flowchart support is part of the OdinModel framework [4] for web business application development. The framework's code generation facilities support automated implementation of user-specific operations, which we illustrate through an example in Section 4. Finally, in Section 5, we reason about the benefits of the proposed solution and future work directions.

II. RELATED WORK

For our research, we have chosen representative tools that support user-specific operation implementation. To our knowledge, there are only two model-driven tools that support platform-independent algorithm specification by default. The first one is WebDSL, which supports algorithm definition through an internal domain-specific language (DSL) [5]. By manual DSL code writing, developers may define all the steps of an algorithm. Comparing WebDSL with our proposed approach, we may say that WebDSL locks developers into its native textual DSL. The OdinModel flowchart represents algorithms through a visual diagram similar to the Unified Modeling Language (UML). In this way, our approach
Results presented in this paper are part of the research conducted within the Grant No. III-44010, Ministry of Education, Science and Technological Development of the Republic of Serbia.

REFERENCES
[1] Cuadrado, J.S., Izquierdo, J.L.C., Molina, J.G.: Applying model-driven engineering in small software enterprises. In: Science of Computer Programming 89, pp. 176-198 (2014)
[2] Čalić, T., Dascalu, S., Egbert, D.: Tools for MDA software development: Evaluation criteria and set of desirable features. In: Information Technology: New Generations 5, pp. 44-50. IEEE (2008)
[3] Voelter, M., Benz, S., Dietrich, C., Engelmann, B., Helander, M., Kats, L.C., Visser, E., Wachsmuth, G.: DSL engineering: Designing, implementing and using domain-specific languages. dslbook.org (2013)
[4] Balać, V., Vidaković, M.: Extendable multiplatform approach to the development of the web business applications. In: Konjović, Z., Zdravković, M., Trajanović, M. (Eds.) ICIST 2016 Proceedings Vol. 1, pp. 22-27 (2016)
[5] Groenewegen, D.M., Hemel, Z., Kats, L., Visser, E.: WebDSL: a domain-specific language for dynamic web applications. In: Companion to the 23rd ACM SIGPLAN conference on Object-oriented programming systems languages and applications, pp. 779-780 (2008)
[6] Popovic, A., Lukovic, I., Dimitrieski, V., Djukic, V.: A DSL for modeling application-specific functionalities of business applications. Computer Languages, Systems & Structures 43, pp. 69-95 (2015)
[7] Flowgorithm [Online] Available: http://www.flowgorithm.org/ (Current March 2017)
[8] Larp [Online] Available: http://www.marcolavoie.ca/larp/en/default.htm (Current March 2017)
[9] Raptor [Online] Available: http://raptor.martincarlisle.com/ (Current March 2017)
[10] Visual Logic [Online] Available: http://www.visuallogic.org/ (Current March 2017)
[11] DRAKON Editor [Online] Available: http://drakon-editor.sourceforge.net/ (Current March 2017)
[12] Blockly [Online] Available: https://developers.google.com/blockly/ (Current March 2017)
[13] Cook, S.: Domain-specific modeling and model driven architecture. MDA Journal (2004)
[14] Selic, B.: The pragmatics of model-driven development. In: IEEE Software 20, no. 5, pp. 19-25 (2003)
[15] Sirius [Online] Available: https://eclipse.org/sirius/ (Current March 2017)
[16] Acceleo [Online] Available: https://eclipse.org/acceleo/ (Current March 2017)
Abstract—The increase in the number of internet users in recent years has led to a repartitioning of work between client and server applications. Client applications had to become richer and aware of the data model and states. Front-end developers now have to shape not only views but also states, transitions, communication with the server side, etc. Each of these requires mastering a different formalism, which can represent an additional burden for a developer. The main goal of this implementation is to provide a uniform and consistent manner of describing the graphical user interface (GUI) of a data-oriented web application at a higher abstraction level. This paper presents the implementation of a domain-specific language along with the code generator developed for it. GenAn DSL is implemented using the TextX meta-language. The project aims to enable users to write simple, platform-independent, model-based application descriptions that include a GUI specification. A considerable benefit of this approach is that it leads to a significant decrease in the number of lines of code necessary to describe an application, higher component reusability and a faster development process.

Keywords: DSL, GUI, code generator, TextX

I. INTRODUCTION

Since the early 2000s, the number of internet users has grown rapidly [1]. Users felt the need for web applications rich with interactions and diverse components, which required intensified communication with servers. This trend imposed shifting part of the processing, validation and business logic from server-side to client-side applications. The main goal was to diminish the load on the server side, which had become unable to handle such an amount of traffic. The newly acquired complexity brought additional responsibility for developers, who need to master different formalisms for describing different client-side application segments. Developers lack a consistent mechanism for describing an application at a high abstraction level.

The goal of this paper was to address the following problems:
- Minimizing the number of lines of code. This can be solved by specifying the model at a higher abstraction level.
- Increasing code/component reusability. The user should be able to specify a data model or GUI component once and reuse it later.
- Binding a component representation to its type – each data type should have its own GUI representation. Implementing this feature provides the necessary flexibility while keeping the language simple.

For this implementation, the Model-Driven Software Development (MDSD) approach was used. This approach suggests that the foundation of development should be a model instead of code. Domain-specific abstractions are used for formal modeling, as their use leads to improvements in productivity, quality and ease of maintenance. Domain-specific languages [2], with high-level abstractions strongly related to the applied domain, offer an increase in productivity and an improvement in software quality, and can partially solve this problem.

A code generator [3] is an automated process that accesses models, extracts data and transforms it into code ready to be interpreted or compiled into executable code. The generated code is executable, complete, of production quality, and requires neither additions nor adjustments. The meta-model (the modeling language with its concepts, semantics and rules) controls the generation. A domain framework [3][4] sets the basis for interpreting the software solution and the architecture for future development. Among other things, the domain framework prevents duplication of generated code and enables integration with existing code.

The Arpeggio parser [5] is a recursive descent parser with backtracking and memoization (a packrat parser). It was developed at the Faculty of Technical Sciences in Novi Sad by a team headed by professor Igor Dejanović. Arpeggio is available on GitHub under the MIT license. It supports grammar definition using PEG notation or the Python programming language. Its primary goal is to provide the ground for domain-specific language development – more specifically, the parsing infrastructure for TextX.

TextX [6] is a meta-language for domain-specific language development based on the Arpeggio parser. Like Arpeggio, it was developed at the Faculty of Technical Sciences in Novi Sad. TextX was built to aid textual language development, enabling the development of custom languages as well as providing support for already existing languages or particular data formats.

MEAN [7] is a full-stack JavaScript solution for creating fast and robust web applications using MongoDB, Express.js, AngularJS and Node.js.

The source code of GenAn can be obtained at [8]. GenAn is available under the terms of the MIT license.

The rest of the paper is structured as follows: in Section II related work is reviewed. Section III presents
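The packrat technique behind Arpeggio can be shown in a few lines of Python: recursive descent where every (rule, position) result is memoized, so backtracking never re-parses the same input twice. This is a sketch of the technique only, not Arpeggio's actual API; the toy grammar is illustrative.

```python
# A miniature packrat parser for the toy grammar:
#   Expr   <- Number ('+' Number)*
#   Number <- [0-9]+

def make_parser(text):
    memo = {}  # (rule name, position) -> parse result

    def rule(fn):
        # Memoize a parsing function so backtracking never repeats work.
        def wrapped(pos):
            key = (fn.__name__, pos)
            if key not in memo:
                memo[key] = fn(pos)
            return memo[key]
        wrapped.__name__ = fn.__name__
        return wrapped

    @rule
    def number(pos):
        end = pos
        while end < len(text) and text[end].isdigit():
            end += 1
        return (int(text[pos:end]), end) if end > pos else None

    @rule
    def expr(pos):
        result = number(pos)
        if result is None:
            return None
        total, pos = result
        while pos < len(text) and text[pos] == '+':
            nxt = number(pos + 1)
            if nxt is None:
                return None
            value, pos = nxt
            total += value
        return (total, pos)  # (evaluated value, characters consumed)

    return expr

print(make_parser("1+2+30")(0))  # (33, 6)
```

Memoization gives PEG parsers linear-time behavior at the cost of memory proportional to input size, which is why the technique suits DSL-sized inputs well.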
Figure 3. View concept definition

The Page concept represents an autonomous web application page and possesses a name, entity data, a title, data about common GUI components, sub-views and their layout. The Page definition is given in Figure 4.

Figure 4. Page concept definition

A Form is a GUI component dedicated to data manipulation. It is represented as a regular HTML form with additions to support the required technology. In addition, it contains data about the object to display and the set of actions available to the user. The Form definition is given in Figure 5.

Figure 5. Form concept definition

Menubar, sidebar and footer are common GUI components. Menubar and sidebar items are displayed as references (hyperlinks). If a menu item leads to a generated page, its name should be set with the prefix "@" (for example @user_page). A page is referenced by URL by setting the attribute ref (for example ref='http://google.com').

The Menubar is a graphical control element. It contains sub-menus filled with actions that the user can trigger, or links to pages to which the user can shift control. The Menubar meta-model is given in Figure 6.

Figure 6. Menubar concept definition

The Sidebar is a graphical control element that displays various forms or information on the left or right side of the application GUI. In business applications, it often shows hyperlinks – shortcuts to frequently used operations. The Sidebar definition is given in Figure 7.

Figure 7. Sidebar concept definition

The Footer is a GUI component positioned at the bottom of the page, beneath the main content, and often contains basic data about the application, such as the name of the company in charge of maintenance and development, the current date, the license, etc. Figure 8 shows the footer definition.

B. Code generator

Before code generation, syntax checks are performed. These checks are based on a mechanism supported by TextX known as object processors. Object processors allow concept-specific checks for every meta-model concept.

The GenAn code generator consists of two independent parts – a client and a server application generator – that communicate over the HTTP protocol [19] in compliance with the REST architectural style [20], which makes them independent and replaceable. The routes are generated automatically according to the data entity (object) descriptions in the model. At the moment, only CRUD and filter operations are supported.

The presented software version supports AngularJS [21] on the client side and Node.js [22] on the server side with a MongoDB [23] database (a MEAN stack application).

The GenAn code generator implements the visitor design pattern. From the model, TextX generates a Python class tree. Each element carries data about the manner in which it should be handled and translated to the target language. The visitor design pattern enables a uniform tree walk and reduces the load on the generator's code by shifting control to the particular meta-model classes. The generation process is supported by the Jinja2 template engine.

The first step in the generation process is the instantiation of the TextX interpreter. In this step, the TextX built-in types are instantiated. Fixed built-in types save the user's time (since the user does not have to define them in every model) and reduce the number of lines of code needed for the model description. In GenAn DSL, the built-in types are focused on predefined GUI components – views.

The installation process begins with the instantiation of the domain framework. The framework represents the basis of the AngularJS application, with configuration files and often-used components such as a generic module for communication with the server, a localization module, a GUI layouting module, a data transfer module, and basic CSS and font files.

Object generation starts with a check of foreign keys and relationship names (which should be unique). Afterwards, the model is transformed by adding a new (meta-)property that represents the relationship, along with the view necessary to display this relationship in the GUI.

The generation process continues with the conversion of queries to a format capable of being sent as an HTTP GET parameter. After that, the infrastructure for communication with the server side is provided (an AngularJS factory for CRUD and query operations in the reference implementation).

Page generation begins with retrieving references to the AngularJS factories for the objects represented on that page.
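The visitor-based generation step can be sketched in Python. The model classes and the emitted HTML below are simplified stand-ins: the real GenAn generator walks the Python class tree produced by TextX and renders output through Jinja2 templates.

```python
# Sketch of visitor-based code generation: each visit_* method localizes
# the translation of one model concept to the target language.

class Property:
    def __init__(self, name, type_):
        self.name, self.type = name, type_

    def accept(self, visitor):
        return visitor.visit_property(self)

class Object:
    def __init__(self, name, properties):
        self.name, self.properties = name, properties

    def accept(self, visitor):
        return visitor.visit_object(self)

class HtmlFormGenerator:
    """Emits an HTML form for an object; swapping the visitor swaps the target."""

    def visit_object(self, obj):
        fields = "\n".join(p.accept(self) for p in obj.properties)
        return f'<form name="{obj.name}_form">\n{fields}\n</form>'

    def visit_property(self, prop):
        # Binding a GUI representation to each data type (assumed mapping).
        input_type = {"text": "text", "email": "email"}.get(prop.type, "text")
        return f'  <input name="{prop.name}" type="{input_type}">'

user = Object("user", [Property("username", "text"), Property("email", "email")])
print(user.accept(HtmlFormGenerator()))  # prints the <form> with one <input> per property
```

Because the tree walk is uniform, supporting another platform means writing a new visitor rather than touching the model classes.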
Figure 11. GenAn description of the user object and user_form page

For this application, one object for the entity user was created, with the properties username (GenAn text type), e-mail (GenAn email type) and gender (GenAn combobox type). A description of user_form_page was also given. It represents a form for CRUD operations on one user.

Figure 12. GenAn description of the home page

The home page is the initial application page. It contains a menubar, a sidebar and a footer (standard GUI components described in Figure 12). The home page consists of a jumbo Welcome panel and two paragraphs. The generated GUI is given in Figures 13 and 14.

Figure 13. Generated form for the user object

V. CONCLUSION

The primary goal of this paper was to present the implementation of the domain-specific language GenAn and the code generator for this language. GenAn aims to provide a uniform and consistent manner of describing the graphical user interface of data-oriented web applications. Those applications, especially in the business domain, require very few design decisions, so the design aspects should be described in a simple and effortless way. GenAn DSL is created for developers as domain experts and provides fast scaffolding with support for detailed GUI description. The generator is pluggable [24] and allows the user to integrate custom code generators to support other platforms. The server- and client-side applications are generated from the same model but by different generators. That decision offers end users greater flexibility and the freedom to re-implement only one of the two code generators. The generators are implemented according to existing standards, are readable and efficient, and support validation, localization and query definition.

REFERENCES
[1] Internet Live Stats – Internet Users, http://www.internetlivestats.com/internet-users/, January 2017.
[2] M. Voelter, DSL Engineering: Designing, Implementing and Using Domain-Specific Languages, 2010-2013.
[3] S. Kelly, J.-P. Tolvanen, Domain-Specific Modeling: Enabling Full Code Generation, Wiley-IEEE Computer Society Press, 2008.
[4] G. Milosavljević, course materials for Rapid Software Development Methodologies, Faculty of Technical Sciences, Novi Sad, 2016.
[5] I. Dejanović, G. Milosavljević, R. Vaderna, Arpeggio: A flexible PEG parser for Python, 2015.
[6] Official TextX documentation, http://igordejanovic.net/textX/
[7] MEAN stack, http://mean.io/, January 2017.
[8] GenAn repository, https://github.com/theshammy/GenAn
[9] JHipster, https://jhipster.github.io/, January 2017.
[10] Yeoman, http://yeoman.io/, January 2017.
[11] JDL, https://jhipster.github.io/jdl/, January 2017.
[12] Sifu documentation, https://docs.codesifu.com/, January 2017.
[13] WebDSL, http://webdsl.org/, January 2017.
[14] D. Groenewegen, Z. Hemel, L. Kats, E. Visser, WebDSL: A Domain-Specific Language for Dynamic Web Applications, Delft University of Technology, 2008.
[15] Stratego/XT, http://strategoxt.org/, January 2017.
[16] SDF, http://metaborg.org/en/latest/source/langdev/meta/lang/sdf3.html, January 2017.
[17] Spoofax, http://metaborg.org/en/latest/, January 2017.
[18] TextX built-in objects, http://igordejanovic.net/textX/metamodel/#built-in-objects, January 2017.
[19] HTTP, https://tools.ietf.org/html/rfc2616, January 2017.
[20] REST, http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm, January 2017.
[21] AngularJS, https://angularjs.org/, January 2017.
[22] Node.js, https://nodejs.org/en/, January 2017.
[23] MongoDB, https://www.mongodb.com/, January 2017.
[24] L. Nikolić, B. Zoranović, I. Dejanović, G. Milosavljević, A framework for building component software in Python, ICIST 2017.
Abstract—In this paper we present MicroBuilder, a tool used for the specification of software architectures that follow REST microservice design principles. MicroBuilder comprises the MicroDSL and MicroGenerator modules. The MicroDSL module provides the MicroDSL domain-specific language used for the specification of microservice architectures. In this paper we present the MicroDSL meta-model, specified in Ecore, the concrete syntax of MicroDSL and examples of its usage. The MicroGenerator module is used to generate executable program code based on MicroDSL specifications.

I. INTRODUCTION

In the last few decades, due to the development of Internet-related technologies, Representational State Transfer (REST) web service software architectures have become more popular, owing to their simplicity, interoperability and scalability [1]. Software applications which follow this architectural style comprise a set of REST web services that use the Hypertext Transfer Protocol (HTTP) [2]. Each REST web service provides a certain set of functionalities exposed to different web clients through an Application Programming Interface (API) specification [3]. REST web services can be organized in the form of: (i) software units that can exist and run only within the core application to which they belong, or (ii) loosely coupled software units that can exist and run independently of the core application [4]. Therefore, there are two main approaches to the development of REST web service software architectures: (i) monolithic software applications and (ii) microservice software applications [5, 6].

Monolithic software applications are composed of software units that are not independent of the core application to which they belong [7]. The development of monolithic applications was the dominant programming style used by the majority of engineers over the past several decades. This development approach forced engineers to work in large teams, which resulted in software solutions that were hard to maintain and understand because of their size and architectural complexity [7]. Microservice software applications were introduced as suites of small, independent, single-purpose software units, called microservices [8]. Microservices have a small set of responsibilities and separate database storage. Thus, microservices are autonomous software units which can be written in different programming languages and still be part of the same software solution. Their functionalities are usually exposed in the form of a REST API specification, while inter-service communication is based on lightweight protocols such as HTTP [9]. In this way, engineers can independently build parts of an application as microservices and deploy them separately to web servers [7].

A large number of microservices per application leads to new challenges: (i) user request acceptance and routing, i.e. providing a unified access interface and a single entry point to the whole microservice ecosystem, (ii) microservice auto-discovery and registration, i.e. providing a single point for monitoring microservice instances and a registry of microservice names, hosts and ports, and (iii) load balancing, i.e. improving the distribution of workload across the microservices [10]. To address these challenges, redundant program code covering the same functionality needs to be written at different layers of the software architecture. This often leads to mistakes, as a developer introduces unintentional errors into the repetitive code constructs. Furthermore, the configuration of a microservice architecture is not a trivial task, and it requires complex and redundant work. Therefore, in order to alleviate and speed up this process, it would be beneficial for a developer to have a language with a concise set of concepts specific to the domain of REST microservice architecture development. Such a language should allow developers to have a single specification of a microservice without writing any boilerplate or redundant code. In Model-Driven Software Engineering (MDSE), a language tailored to a specific application domain is called a Domain-Specific Language (DSL). The main goal of the research presented in this paper is to provide such a language and use it to solve real-world problems.

In this paper, we present a model-driven software tool for REST microservice architecture specification and program code generation, named MicroBuilder. MicroBuilder comprises the following modules: (i) MicroDSL, the module that provides a DSL for the specification of REST microservice software architectures, and (ii) MicroGenerator, the module which comprises a set of code generators used to generate executable program code based on MicroDSL specifications. To develop the MicroBuilder tool we have used the Eclipse Modeling Framework (EMF) [11]. The MicroDSL concrete syntax is developed using the Xtext language, while the individual code
generators within the MicroGenerator module are developed using the Xtend language [12, 13].

Apart from the Introduction and Conclusion, the paper has four sections. In Section 2 we present the architecture of MicroBuilder. The abstract syntax of MicroDSL is introduced in Section 3. In the fourth section we present the concrete textual syntax of the language, alongside examples of its usage. In Section 5 we give an overview of the related work.

II. THE ARCHITECTURE OF MICROBUILDER

In this section we present the architecture of the MicroBuilder tool. An overview of the main modules and their inputs and outputs is given in Figure 1.

The MicroBuilder tool comprises two main modules: the MicroDSL module and the MicroGenerator module. The MicroDSL module provides a domain-specific language, named MicroDSL, used for microservice software architecture specification. The MicroDSL language provides a set of concepts related to the domain of REST microservice architecture development [7]. These concepts are presented in detail in the third section of this paper, where we describe the language meta-model. For the specification of MicroDSL concepts, a textual concrete syntax has been developed. Using the MicroDSL concrete syntax, the user is able to specify the whole microservice architecture in one place, using just concepts related to the REST microservice architecture development domain. A microservice architecture specification is then used as input to the MicroGenerator module. The MicroGenerator module provides a set of code generators which are used to generate executable program code for the target execution platform. Currently, we have developed a set of code generators for the Java programming language. The generated program code uses the Spring, Spring Cloud and NetflixOSS frameworks [14, 15, 16]. These are all open-source frameworks developed by the Netflix, Amazon and Pivotal companies, which were early adopters of microservice architecture in large-scale cloud software systems [17]. The aforementioned frameworks provide a set of tools such as: (i) Zuul, for user request routing and filtering, (ii) Eureka, for microservice auto-discovery and registry, (iii) Ribbon, for resilient and intelligent inter-process communication and load balancing, and (iv) Hystrix, for latency isolation and fault tolerance among microservices. These tools are utilized to resolve some of the aforementioned challenges caused by a large number of microservices per application (cf. the Introduction). The remaining challenges, concerning redundant coding and the non-trivial configuration of microservice architectures, can be resolved by using our MicroBuilder tool together with these tools. This allows developers to specify a microservice architecture and generate executable program code.

The generated program code consists of: (i) a Netflix Zuul microservice, (ii) a Netflix Eureka microservice, and (iii) user-defined microservices with basic REST interfaces for insert, update and delete operations over the MongoDB database [18]. In this way, a user is able to specify each microservice in one place, and the necessary Java code will be generated in all application layers where it is required.

Furthermore, if a user decides to use other frameworks for REST microservice architecture development, there is no need to change the MicroDSL specification. Only a new code generator for the chosen framework needs to be developed and added to the MicroBuilder tool as a plugin.

III. MICRODSL ABSTRACT SYNTAX

In this section we present the abstract syntax of the MicroDSL language. The abstract syntax is specified in the form of a meta-model that conforms to the Ecore meta-meta-model [19]. The developed meta-model is presented in Figure 2. In the rest of this section, we present each of the MicroDSL concepts, with the names of the corresponding meta-model classes and attributes written in italics inside parentheses.

The main language concept is the microservice architecture specification (McServiceArchitecture), which comprises all other language concepts. Such a specification is identified by its name (name) and may also have a description (mcsDescription). Each microservice architecture specification comprises zero or more microservices (McService). Each microservice is described by its name (name) and its working group (mcsGorupId). The working group is used as the microservice's unique identifier within the microservice architecture specification. For each microservice, an HTTP port (mcsPort) is specified, which will be used by the microservice to receive user requests. Also, at this level, it can be specified whether the microservice is to be registered within the microservice auto-discovery registry or not (mcAutoDiscovery). Each microservice comprises a set of microservice resources (McsResources), which may be one of the following: (i) proxy filters (McsProxyFilter), used for modeling custom user request filters, or (ii) microservice entities (McsEntity), used for modeling microservice business entities. For a proxy filter, the user may specify its type (mcFilterType), whose values are predefined in the filter type enumeration (FilterType). The proxy filter type distinguishes: (i) pre, executed before the user request is routed, (ii) routing, executed at the moment the user request is accepted, (iii) post, executed after the user request has been routed, and (iv) error, executed if an error occurs while handling the user request. Further, it is possible to define a logical expression that determines whether a proxy filter should be applied to a user request
or not (mcShouldFilter). Also, the priority of proxy filter application can be specified (mcFilterOrder). The microservice entity is defined as an aggregation of different resources (McEntityResource) that can be reached from the base URI (mcBaseURI). It can be specified whether a microservice entity should be persisted within database storage or not (mcIsPersistable). Entity resources can be classified as REST methods (McRESTMethod) or microservice attributes (McServiceAttribute). REST methods are described by the base URL, which represents a unique identifier for the REST API (mcMethodURL). The URL is used by web clients to access REST method resources. The different types of REST methods are predefined in the method type enumeration (RESTMethodType), while the implementation of method resources is specified within the method body (mcMethodBody). For each REST method, both the method return type and the method arguments can be defined. These are modeled using the RESTMethodReturnType and RESTMethodParameters aggregation relationships, respectively.

Microservice attributes (McServiceAttribute) can be used to describe microservice business entity fields or REST method parameters. When they are used to describe microservice business entity fields, the following may be specified: (i) whether the field should be used as a unique database identifier (mcIsUnique), or (ii) whether a REST API search operation for the entity field should be added to the generated code (mcGenerateFindBy). Further, when microservice attributes are used to describe a REST method parameter, the name and the attribute data type are specified. The microservice attribute data type (McDataType) is described by the data type multiplicity (mcMultiplicity). The user is able to specify both simple (McDataTypeSimple) and complex (McDataTypeComplex) data types. Simple data types are used for the specification of predefined data types such as numbers, date, boolean and string. All predefined data types are specified in the data type enumeration (DataTypes). Within simple data types, the user is also able to specify enumerations (mcIsEnum), alongside enumeration literals (mcEnumLiterals). Complex data types are used to describe user-defined data types, that
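To make the containment relations among these concepts concrete, a fragment of the meta-model can be transcribed as plain Python classes. The class and attribute names follow the paper; the rendering as Python dataclasses is purely illustrative (the actual meta-model is an Ecore model), and the sample values at the bottom are invented.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

class FilterType(Enum):
    PRE = "pre"          # runs before the request is routed
    ROUTING = "routing"  # runs when the request is accepted
    POST = "post"        # runs after the request was routed
    ERROR = "error"      # runs if an error occurred during handling

@dataclass
class McsProxyFilter:
    mcFilterType: FilterType
    mcShouldFilter: str = "true"   # logical expression
    mcFilterOrder: int = 0

@dataclass
class McServiceAttribute:
    name: str
    mcIsUnique: bool = False       # unique database identifier?
    mcGenerateFindBy: bool = False # generate a REST search operation?

@dataclass
class McsEntity:
    mcBaseURI: str
    mcIsPersistable: bool = True
    attributes: List[McServiceAttribute] = field(default_factory=list)

@dataclass
class McService:
    name: str
    mcsGorupId: str                # working group, unique within the architecture
    mcsPort: int
    mcAutoDiscovery: bool = True
    resources: List[object] = field(default_factory=list)  # filters and entities

@dataclass
class McServiceArchitecture:
    name: str
    mcsDescription: Optional[str] = None
    microservices: List[McService] = field(default_factory=list)

# Hypothetical instance: one microservice with one persisted entity.
arch = McServiceArchitecture("shop")
user_svc = McService("user", "com.example.user", 8081)
user_svc.resources.append(
    McsEntity("/users", attributes=[McServiceAttribute("username", mcIsUnique=True)])
)
arch.microservices.append(user_svc)
```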
TABLE I. THE COMPARISON OF NUMBER OF LINES FOR MICRODSL AND MANUALLY WRITTEN JAVA CODE

Microservice name  | Lines of code in the MicroDSL specification | Manually written lines of code
User microservice  | 28                                          | 406
Total              | 98                                          | 1802
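The savings implied by Table I can be quantified with a quick calculation: each MicroDSL line stands in for roughly 14 to 18 manually written Java lines.

```python
# Reduction factors implied by Table I.
user_dsl, user_java = 28, 406
total_dsl, total_java = 98, 1802

print(round(user_java / user_dsl, 1))    # 14.5  (User microservice)
print(round(total_java / total_dsl, 1))  # 18.4  (Total)
```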
integration of Web 2.0 functionalities, implemented as REST services. The authors present a REST meta-model for specifying these REST services at a conceptual level. In [24], the authors present a multi-layered meta-model for REST applications, discuss its connection to REST compliance, and show an implementation of their approach based on the proposed meta-model and method. In [25], the authors discuss the challenges of composing REST web services and propose a formal model for describing individual web services and automating the composition.

CONCLUSION

In this paper we have presented a model-driven tool for the specification of REST microservice architectures, named MicroBuilder. Our goal was to automate and ease the process of microservice software architecture specification and configuration. In order to achieve this goal, we have developed the MicroDSL domain-specific language used for microservice software architecture modeling. First, we developed the MicroDSL meta-model, specified using Ecore. The MicroDSL meta-model represents the abstract syntax of the language. Then, we developed the textual syntax, which is used as a visual representation of the MicroDSL modeling concepts. Using the MicroDSL concrete syntax, users are able to specify the whole microservice architecture in one place. We have also developed a set of code generators that are part of the MicroBuilder tool. These code generators are used to generate executable program code based on a MicroDSL specification.

In our future research we plan to extend the MicroDSL meta-model with concepts that will enable the specification of microservice architecture load balancing and inter-service communication patterns. We also plan to develop a graphical concrete syntax that will be used for the specification of inter-service communication patterns. The graphical concrete syntax will not be an alternative to the existing textual syntax; we plan to use both in parallel in order to determine which approach is more suitable for our users. A detailed analysis of our approach will be carried out and presented in a case study.

ACKNOWLEDGMENT

The research presented in this paper was supported by the Ministry of Education, Science and Technological Development of the Republic of Serbia, Grant III-44010.

REFERENCES
[1] C. Pautasso. RESTful web services: principles, patterns, emerging technologies. In Web Services Foundations, pp. 31-51, Springer New York, 2014.
[2] S. Mumbaikar, P. Padiya. Web services based on SOAP and REST principles. International Journal of Scientific and Research Publications, 3(5), May 2013.
[3] S. Vinoski. RESTful web services development checklist. IEEE Internet Computing, 12(6), November 2008.
[4] C. Pautasso, O. Zimmermann, F. Leymann. RESTful web services vs. "big" web services: making the right architectural decision. In Proceedings of the 17th International Conference on World Wide Web, pp. 805-814, ACM, April 2008.
[5] M. Villamizar, O. Garcés, L. Ochoa, H. Castro, L. Salamanca, M. Verano, M. Lang. Infrastructure cost comparison of running web applications in the cloud using AWS Lambda and monolithic and microservice architectures. In Cluster, Cloud and Grid Computing (CCGrid), 2016 16th IEEE/ACM International Symposium on, pp. 179-182, IEEE, May 2016.
[6] H. Knoche. Sustaining runtime performance while incrementally modernizing transactional monolithic software towards microservices. In Proceedings of the 7th ACM/SPEC International Conference on Performance Engineering, pp. 121-124, ACM, March 2016.
[7] D. Namiot, M. Sneps-Sneppe. On micro-services architecture. International Journal of Open Information Technologies, 2(9), pp. 24-27, 2014.
[8] N. Dragoni, S. Giallorenzo, A. L. Lafuente, M. Mazzara, F. Montesi, R. Mustafin, L. Safina. Microservices: yesterday, today, and tomorrow. arXiv preprint arXiv:1606.04036, 2016.
[9] J. I. Fernández Villamor, C. A. Iglesias Fernandez, M. Garijo Ayestaran. Microservices: lightweight service descriptions for REST architectural style. 2nd International Conference on Agents and Artificial Intelligence, Valencia, Spain, 2010.
[10] A. Balalaie, A. Heydarnoori, P. Jamshidi. Migrating to cloud-native architectures using microservices: an experience report. In European Conference on Service-Oriented and Cloud Computing, pp. 201-215, Springer International Publishing, September 2015.
[11] "Eclipse" [Online]. Available: http://projects.eclipse.org/projects/modeling [Accessed: 22-March-2017].
[12] M. Eysholdt, H. Behrens. Xtext: implement your language faster than the quick and dirty way. In Proceedings of the ACM International Conference Companion on Object Oriented Programming Systems Languages and Applications Companion, OOPSLA '10, pp. 307-309, ACM, New York, NY, USA, 2010.
[13] "Xtend" [Online]. Available: http://www.eclipse.org/xtend/ [Accessed: 22-May-2017].
[14] "Spring" [Online]. Available: https://spring.io/ [Accessed: 22-March-2017].
[15] "Spring Cloud" [Online]. Available: http://projects.spring.io/spring-cloud/ [Accessed: 22-March-2017].
[16] "NetflixOSS" [Online]. Available: https://cloud.spring.io/spring-cloud-netflix/ [Accessed: 22-May-2017].
[17] F. Montesi, J. Weber. Circuit breakers, discovery, and API gateways in microservices. arXiv preprint arXiv:1609.05830, 2016.
[18] "MongoDB" [Online]. Available: https://www.mongodb.com/ [Accessed: 22-March-2017].
[19] "Ecore" [Online]. Available: https://github.com/occiware/ecore [Accessed: 22-March-2017].
[20] S. Pérez, F. Durao, S. Meliá, P. Dolog, O. Díaz. RESTful, resource-oriented architectures: a model-driven approach. In International Conference on Web Information Systems Engineering, pp. 282-294, Springer Berlin Heidelberg, December 2010.
[21] H. Ed-Douibi, J. L. C. Izquierdo, A. Gómez, M. Tisi, J. Cabot. EMF-REST: generation of RESTful APIs from models. In Proceedings of the 31st Annual ACM Symposium on Applied Computing, pp. 1446-1453, ACM, April 2016.
[22] M. Laitkorpi, P. Selonen, T. Systa. Towards a model-driven process for designing RESTful web services. In Web Services, 2009. ICWS 2009. IEEE International Conference on, pp. 173-180, IEEE, July 2009.
[23] F. Valverde, O. Pastor. Dealing with REST services in model-driven web engineering methods. V Jornadas Científico-Técnicas en Servicios Web y SOA, JSWEB, pp. 243-250, 2009.
[24] F. Haupt, D. Karastoyanova, F. Leymann, B. Schroth. A model-driven approach for REST compliant services. In Web Services (ICWS), 2014 IEEE International Conference on, pp. 129-136, IEEE, June 2014.
[25] H. Zhao, P. Doshi. Towards automated RESTful web service composition. In Web Services, 2009. ICWS 2009. IEEE International Conference on, pp. 189-196, IEEE, July 2009.
Abstract—To increase energy efficiency, heat pumps are becoming part of an integral solution for residential heating, ventilation, and air-conditioning. Depending on the electricity tariff, coefficient of performance (COP) and thermal load, the heat pump can also be combined with other heat sources (smart grid). All these new possibilities additionally complicate heat pump control. In On-Off control, special attention is paid to start-up/shut-down sequences, which repeat periodically during the entire life of the heat pump. For the design of the control logic for start-up/shut-down sequences, the paper uses a model-driven approach instead of the traditional method. The design is organized around a model which is developed iteratively and incrementally through several use cases. Harel's statecharts, implemented with the Matlab tool Stateflow, are used in all phases of development: from the specification of requirements up to implementation.

I. INTRODUCTION

Ever since the beginning of the application of heat pumps (HPs), one of the primary goals has been to increase their energy efficiency. Modern control theory offers new possibilities in this field [1-3]. The lower prices of sensors, actuators and computers allow practical realization of advanced control algorithms. Domestic HP systems have a more or less standard hardware configuration, but control algorithms, most commonly provided by programmable logic controllers, are the key factor which makes one system advantageous over the others.

In On-Off control of heat pumps, special attention is paid to start-up/shut-down (SU/SD) sequences. The realization of these sequences without errors is particularly critical during commissioning of the system. The scope and type of problems which arise at that point are a good indicator of the success of the design. Their solution may sometimes require only minor modifications, but sometimes it can lead all the way back to the design goals. The modifications in that case are expensive and demanding. The problems detected after a successful start-up can usually be solved through several iterations, and they do not affect the basic operation of the system. The start-up of the system is also important from the aspect of safety of both the employees and the equipment.

In HP systems with On-Off control, the SU/SD process repeats periodically during the entire life of the system. Optimization of these sequences, and not only their correct operation, may contribute to the energy efficiency of the whole system.

The control of SU/SD sequences is not an easy task, because the control is realized through parallel processes with a certain synchronization between them. The problem is even bigger when the existing processes should be modified, for instance, when we want to improve existing functionality or add a new one.

In order to get to know a system better, it is necessary to create its model. Simulation of such a system can give us answers to various questions. Different design solutions and different use cases can be tested. It is desirable for such a model to be the backbone through all phases of development, from the definition of requirements to the final realization. It enables an iterative and incremental development of the model. In each phase, starting from the easier ones, the model is corrected through several iterations. Each phase is tested by simulation. After that, a new increment, a new use case, is added to the model.

Depending on the complexity of such a model, engineers of different profiles can participate in its development. At least two profiles are necessary: a process engineer (thermodynamics specialist) and a software engineer (programmer). Mutual communication between domain specialists and software specialists is particularly important. The first phase in the development of a control system (CS) - the proper defining and understanding of requirements - is especially critical. On one side there is a domain specialist who uses 'natural language' to describe functionalities. A textual specification can be imprecise, incomplete and inconsistent. On the other side there is a software specialist who uses, for example, a language from the IEC 1131-3 family. Such a language does not have to be understandable to a domain specialist. Verification of requirements in that case is more difficult. Every misunderstanding of requirements leads to bigger problems in later phases of design.

The question arises which language to use in the development of such a model. The essential thing which defines a modelling methodology is the language for model description. What can be expressed in models is determined by the language in which they are stated. The language specifies what terms can be incorporated in a model (the vocabulary) and how these can be meaningfully combined (the grammar) [4].

The control logic of a heat pump is event-driven [5]. The CS responds to discrete events, such as mode switching by the customer, switching on/off by the thermostat, switching on/off by the pressure switch, mode switching by timetable, alarm occurrence, etc. The Statechart notation established by Harel [6] is used for describing discrete logic. Statecharts are based on the finite state machine
theory. The model consists of a set of states, a set of transitions between states, and a set of actions. Harel's statechart introduces three new mechanisms into the conventional state-transition diagram: hierarchy, concurrency and communication. Thanks to these mechanisms, statecharts can be used for modelling complex discrete-event systems in a compact and modular manner [7]. The Matlab tool Stateflow [8] is used for the realization of the statechart diagrams in this paper.

II. HEAT PUMP SYSTEM

The HP system which is considered in the paper is schematically shown in Fig. 1. A more detailed description of the operation of the installation can be found in [3]; the start-up/shut-down process is considered in this paper.

The start-up/shut-down of an HP system consists of a series of sequential steps performed one after another in a precisely defined order. The processes cover the following devices: BP, ESV, EMV, EXV, Comp. In each step, a new device is started up/shut down. The first step is known, as well as each next step up to the last one. The time that passes between two steps is variable and has finite values. The SU/SD process is shown by means of the activity diagram in Fig. 2.

… in case of occurrence of an alarm or if the user changes the desired temperature of the thermal buffer. Even in case of an interruption of the power supply, the system should continue its operation where it has stopped. In that case, the system should not execute the SU/SD process until the end. The flow of activities is changed after the last completed action. For example, if the signal ShutDown appeared after the action EMV_On, the shut-down process starts from that device. Similarly, the start-up may begin during the shut-down sequence, too.

After the power is supplied, the control system can be described by four parallel processes which are executed simultaneously (Fig. 3).
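The ordered SU/SD steps and the resume-after-interruption behaviour described above can be sketched as a small state machine. The device names follow the paper (BP, ESV, EMV, EXV, Comp); the class and method names are our own illustration, not the authors' Stateflow implementation:

```python
# Illustrative sketch (not the authors' model): devices are started in a
# fixed order and shut down in reverse, and a direction change takes
# effect after the last completed action.
SEQUENCE = ["BP", "ESV", "EMV", "EXV", "Comp"]

class SuSdProcess:
    def __init__(self):
        self.position = 0          # number of devices currently switched on
        self.start_up = False      # current value of the StartUp message

    def set_start_up(self, value: bool):
        """ControlLogic sends the one-way StartUp message (True/False)."""
        self.start_up = value

    def step(self):
        """Perform at most one SU/SD action; called once per control cycle."""
        if self.start_up and self.position < len(SEQUENCE):
            device = SEQUENCE[self.position]
            self.position += 1
            return device + "_On"
        if not self.start_up and self.position > 0:
            self.position -= 1
            return SEQUENCE[self.position] + "_Off"
        return None                # sequence finished, nothing to do

proc = SuSdProcess()
proc.set_start_up(True)
actions = [proc.step() for _ in range(3)]   # BP_On, ESV_On, EMV_On
proc.set_start_up(False)                    # ShutDown arrives mid-sequence
actions.append(proc.step())                 # shut-down resumes from EMV
```

As in the paper's example, a ShutDown received after EMV_On makes the shut-down begin from that same device (EMV_Off) rather than from the end of the chain.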
… the control strategy is determined according to the defined algorithm.

The communication with the user is performed by the module UserInterface. It covers the communication by means of the programmable graphic display or by means of the embedded web server.

The module SUSD is responsible for the start-up/shut-down of the system according to the activity diagram in Fig. 2. In the Power_On state (Fig. 3), the SUSD state communicates only with the ControlLogic state. There is no direct communication among the Alarms, UserInterface, and SUSD states. The messages are exchanged only through ControlLogic. All events in the outside world (sensors, user interface) are manifested through the one-way message (StartUp) sent by ControlLogic to the SUSD state. StartUp has two logic values: True, which denotes starting up, and False, which shuts down the system. This paper gives a more detailed presentation of the SUSD state, which is responsible for the realization of the start-up/shut-down sequences of the HP system.

III. START-UP/SHUT-DOWN SEQUENCES

The composite SUSD state consists of five orthogonal substates shown in Fig. 4.

… the On state (temporal logic operator after(TdOn)). This condition aims at preventing a too frequent shut-down of the pump;
- second, that the system is in the shut-down mode (~StartUp);
- third, that the signal for the shut-down of the buffer pump has been received (~BP_On).

Figure 5. First state in the chain

The self-loop transition in Fig. 5 designates that BP, after a certain period spent in the On state (Tdnext), if the system is in the StartUp mode, sends a signal for activation of the ESP pump (ESP_On=True).

The second design pattern is used for modelling a middle state from Fig. 4. These are the states which receive the excitation for activation from a previous internal state. They also send the activation message to the next internal state. For example, the structure of the EXV state is presented in Fig. 6.
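The shut-down conditions listed above for the BP state combine into a single transition guard. A minimal sketch follows — our own Python rendering of the statechart guard, with the temporal operator after(TdOn) approximated by an elapsed-time check:

```python
# Sketch of the first design pattern's shut-down guard for the BP state:
# the pump leaves On only when all conditions from the text hold.
def bp_may_shut_down(time_in_on, td_on, start_up, bp_on):
    """after(TdOn) and ~StartUp and ~BP_On, as in the statechart guard."""
    return (time_in_on >= td_on   # minimum dwell time: prevents frequent stops
            and not start_up      # system is in the shut-down mode
            and not bp_on)        # shut-down signal for the buffer pump received

# The dwell-time term is what makes the start-up and shut-down
# conditions asymmetric, as noted for the first design pattern.
ok = bp_may_shut_down(time_in_on=12, td_on=10, start_up=False, bp_on=False)
too_soon = bp_may_shut_down(time_in_on=5, td_on=10, start_up=False, bp_on=False)
```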
The third design pattern is used for the state where the condition ~StartUp is sufficient for its shut-down (OnOff). For all other devices, it is only one of two conditions. As in the first pattern, there is an asymmetry between the conditions for start-up and shut-down. In the HP system, this pattern is usually used for the Comp state (Fig. 7). The compressor is the last device to be started in the start-up sequence and the first to be shut down in the shut-down sequence.
1. INTRODUCTION

Extensible software is widespread in Python, which is evident from the fact that some of the most popular projects offer extensibility through plug-ins. This is made possible by the Python ecosystem's rich support for extensibility, but there is no standard way of doing it. Large projects such as Django, Sphinx and Flask [11][12][13] all offer extensibility. What is common among these projects, though, is that their plug-in mechanisms are built and developed exclusively to support their parent project [10]. For this reason the Python community is denied a more general, yet pythonic [6], solution.

On the other side there are component software frameworks such as iPOPO and Cohorte [14][15] that are heavily influenced by Java and OSGi [17]. They promise production-ready solutions and distributed plug-in support, but setting up new projects can be cumbersome, especially for smaller projects. Their design is closer to Java than to Python; component declaration is mainly done by decorators that are highly similar to Java annotations in both syntax and semantics. This approach poorly handles separation of concerns, since component declaration is done in the same file as its implementation.

Our goal is to build a general-purpose, pythonic solution for building component software in Python, named puzzle_box. The framework must cover all core concepts of CBSD, while still being simple and intuitive. The Python ecosystem has good support for CBSD; popular Python libraries Setuptools, virtualenv, pkg_resources and pip [17][18][19] provide the necessary toolset to achieve this goal.

Figure 1. Example setup.py used in puzzle_box framework

2. RELATED WORK

2.1. iPOPO

iPOPO is a Python-based Service-Oriented Component Model (SOCM) based on the Pelix platform. In iPOPO, components are declared as classes with certain decorators and are contained in bundles. These decorators include:

@ComponentFactory: Marks this class as a component factory. This factory is used to instantiate objects of the decorated class during runtime.
@Property: Contextual properties which can serve as keywords for component discovery.
@Provides: Marks this class as a service provider for the passed service.
@Requires: Name of the service required by this component. The service will be injected in the class field with the passed name.
@Instantiate: Instantiates the component as soon as the bundle is active.

Additionally, iPOPO offers method decorators, such as @Validate and @Invalidate, which let the platform call the decorated method when the component becomes valid or invalid. Running an iPOPO application is done by starting the Pelix platform and then installing and starting all the required bundles.

2.2. Cohorte

COHORTE is a Python SOCM that relies on iPOPO for component declaration, meaning that the component declaration is identical. COHORTE first requires the user to download its runtime platform and add it to the $COHORTE_HOME environment variable. Minimal configuration requires creating an autorun_conf.js file, which contains a JSON object with the name of the application and the name, factory and language of each component. The application is started by running the generated run script
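The decorator-driven declaration style described above can be imitated in a few lines of self-contained Python. This is our own stand-in for illustration, not iPOPO's actual API (the real decorators live in pelix.ipopo.decorators); it shows how the declaration metadata and the implementation end up in the same class and file:

```python
# Self-contained imitation of decorator-based component declaration in
# the iPOPO style; these decorators are stand-ins, not the Pelix API.
def component_factory(name):
    def wrap(cls):
        cls._factory_name = name          # declaration metadata...
        return cls
    return wrap

def provides(spec):
    def wrap(cls):
        cls._provides = spec
        return cls
    return wrap

@component_factory("greeter-factory")
@provides("greeting.service")
class Greeter:                            # ...lives next to the implementation
    def greet(self, who):
        return "hello " + who

g = Greeter()
```

The declaration (_factory_name, _provides) and the behaviour (greet) share one class in one file, which is exactly the separation-of-concerns issue the text raises.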
… aggregated into groups to define component types. A single component can declare any number of entry points, even those in different groups. An example is shown in Fig. 1. The component in question declares two entry point groups: "visualiser" and "puzzle_box_events". The first contains the "node_painter" endpoint, which exposes the on_start function from the pretty_visualiser module, while the latter contains "on_start" and "on_install" endpoints in a similar manner.

4. COMPONENT PLATFORM

The component platform serves as the component run time context and should provide the following:
1. component assembly and life cycle management
2. service registration and discovery
3. model of communication and infrastructure.

The component platform is represented by the Platform class of our framework. It acts as a facade that uses virtualenv, pip and pkg_resources combined with custom code to create a run time context for components. Starting the platform is done by instantiating the Platform class; each instance represents a different context and will be isolated from the others. Platform holds references to components so that they can register, discover services and pass messages between themselves through it. A reference to the platform is injected as a parameter into registered service callables. Platform relies on virtual environments created by virtualenv to ensure that only component packages and their dependencies are available during run time, and thus requires a virtual environment to be activated before starting.

4.1. Component assembly and life cycle management

The component life cycle is split into two main phases [7][8]:
1. Design phase – In this phase, components are constructed and designed. They are stateless and cannot be executed, but can be stored in a repository to be retrieved for later use.
2. Deployment phase – In this phase, components are created and initialized with a state and thus ready for execution. Before their instantiation, components must be retrieved from a repository.

Components in the design phase are represented by Python packages. They carry the component's design in the source code and the component descriptor in the setup.py file. Packages can be stored in the PyPI package repository or the file system, both of which can serve as the component repository. The pip package manager is used for retrieving packages (components) from a repository. By installing a package, the component's source code is loaded into the virtual environment so it can be deployed by the platform. Components in the deployment phase are represented by the Plugin class, which encapsulates the component's metadata (name, type, version etc.), state and services. Services are meant to be used for direct communication, by invoking them as functions in the traditional sense. To make sure each component can only be installed once, Plugin objects are singletons. During run time, components go through states similar to the OSGi bundle life cycle states:
1. INSTALLED – Component's package is installed and is available in the virtual environment.
2. STARTED – Component is operational and its services are available.
3. STOPPED – Component's package is available in the virtual environment, but is not operational and its services are unavailable.
4. UNINSTALLED – Component's package is removed from the virtual environment.

State change handlers can be exposed in the "puzzle_box_events" entry point group. Each entry point of this group corresponds to a single state: on_init, on_start, on_stop and on_uninstall. The component will then emit an event that notifies other components about the state change. Handlers for state change events coming from other components are registered to the platform during runtime, by calling the register_event_handler function.

Component deployment phases are shown in Fig. 2:
1. A Python package (distribution) is retrieved from a repository.
2. The package's modules are loaded into the virtual environment with pip install.
3. pkg_resources loads entry points for the package modules.
4. A new Plugin object is created with status INSTALLED and added to the Platform's context. The previously loaded entry points are registered as its services.

4.2. Service registration and discovery

Component registration is done once a package is loaded into the virtual environment. Packages are installed in two ways:
1. By calling the install command of the pip package manager. The required package can be located in the file system or in the PyPI package repository.
2. By instantiating the platform with the folder_name argument. The platform will then periodically poll the file system for changes and automatically install or uninstall packages as they are added or removed, respectively. Packages are installed by internally calling pip install and dynamically loading their modules.

Upon registration, components are given INSTALLED status after their on_init functions are called, and are ready to be started.

Components can be discovered by calling the platform's find_plugin method, which takes two optional parameters: component name and component type. If a component name is given, find_plugin returns a reference to the component with the given name to the client, or a list of components of the required type if a component type is given.

4.3. Model of communication and infrastructure

The platform supports two communication models: direct and indirect. Direct communication involves one component retrieving a reference to another component and directly calling its services. In this case the client component can communicate with exactly one component at a time (one-to-one communication). Indirect communication involves one component emitting an event that carries a payload, by calling the platform's broadcast function. Custom events can also be created alongside the framework's built-in events. Other components can
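The registration, discovery and broadcast behaviour described in this section can be condensed into a toy platform. The method names find_plugin, broadcast and register_event_handler follow the text; everything else (including the example plugin names beyond pretty_visualiser) is our simplification, not the actual puzzle_box code:

```python
# Toy sketch of the Platform facade: holds plugin references, supports
# discovery by name or type, and indirect communication via broadcast.
class Plugin:
    def __init__(self, name, ptype):
        self.name, self.ptype, self.status = name, ptype, "INSTALLED"

class Platform:
    def __init__(self):
        self._plugins = []
        self._handlers = []

    def register(self, plugin):
        self._plugins.append(plugin)

    def find_plugin(self, name=None, ptype=None):
        if name is not None:               # one-to-one: a single reference
            return next(p for p in self._plugins if p.name == name)
        return [p for p in self._plugins if p.ptype == ptype]

    def register_event_handler(self, handler):
        self._handlers.append(handler)

    def broadcast(self, event, payload):   # one-to-many, indirect coupling
        for handler in self._handlers:
            handler(event, payload)

platform = Platform()
platform.register(Plugin("pretty_visualiser", "visualiser"))
platform.register(Plugin("csv_source", "source"))     # hypothetical plugin
seen = []
platform.register_event_handler(lambda e, p: seen.append((e, p)))
platform.broadcast("on_start", {"who": "csv_source"})
```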
5. CASE STUDY

In this section we present how the puzzle_box framework can be used to build a simple pluggable application. The application consists of an extensible core and two plugin types: source, used to load data, and visualizer, used to represent the loaded data. Both component types provide their services through an interface. The component diagram of this application is shown in Fig. 3.
Abstract—In the last few decades, there have been a lot of different approaches to information system (IS) design and development. They are based on different data models. The selection of a data model for conceptual design depends on the problem domain, the knowledge and the personal preferences of an information system designer. The approaches based on the Extended Entity-Relationship (EER) data model are broadly accepted throughout the community of IS designers. In recent years the eXtensible Markup Language (XML) has spread to many application fields and it is being widely adopted as a standard language for data representation, exchange and storage. As the number of possible usages of XML grows, so does the need for easy management of XML data sources and their integration. The XML documents represent real-world aspects defined by a conceptual schema expressed by means of a selected data model. Having different model representations is an advantage for the flexibility they offer to the user in selecting the right representation based on their requirements. On the other side, model-based heterogeneity raises challenges for schema interoperability. In the paper we use a model-driven approach to address the problem of interoperability between an EER database schema and an XML schema specification. The presented transformation of a schema expressed by the concepts of the EER model into an equivalent schema expressed by means of the XML data model is integrated into the Multi-Paradigm Information System Modeling Tool, which provides a model-driven approach to information system development and evolution.

I. INTRODUCTION

The emergence of large, complex and integrated information systems that are interoperable in a highly changeable environment increases the interest in design and implementation approaches that successfully address system complexity and evolution. Model-driven system engineering (MDSE) and Model Driven Software Development (MDSD), applied in the context of enterprise information system design, implementation, maintenance and evolution, involve the automatic generation of information system artifacts from structure and functional models. MDSE and MDSD address IS complexity through abstraction and increase the importance and power of models. A model-driven approach allows the developer to work at a higher level of abstraction, with concepts and structures that are closer to the end-users and their perception. Models are used to specify, simulate, test, verify and generate code for the application to be built.

In our previous research we have developed a Multi-Paradigm Information System Modeling Tool (MIST) that provides a model-driven approach to information system development and evolution. MIST aims to provide an evolutive and incremental approach to IS development, which is based on the form type data model (FT) and the Extended Entity-Relationship model (EER). The approach is purely platform independent and strictly differentiates between the specification of a system and its implementation on a particular platform. A detailed description of the FT data model and the FT approach may be found in [1]. A detailed description of the EER data model, the domain specific language (DSL) aimed at EER model specification (EERDSL) and the EER approach implemented in the MIST tool may be found in [2].

Conceptual design is a decisive step for the successful design and development of an IS. Conceptual models can capture the concepts and requirements that are close to users' perception, and in a way they can bridge the gap between end-users and designers. In the past decades, the EER data model, aimed at conceptual database design, has been broadly accepted throughout the community of IS designers. On the other side, changes in the demands for data processing have caused the emergence of new data storage, retrieval and processing mechanisms. A lot of data sources store unstructured or semi-structured data. Storing and processing such data are beyond the capabilities of traditional relational database management systems (RDBMS). NoSQL database systems have built-in mechanisms to save data in various formats such as: XML, JavaScript Object Notation (JSON) or Binary JSON (BSON). There are several types of NoSQL databases. In the paper we present the extension of our MIST aimed at supporting the document-oriented NoSQL database type.

Document-oriented databases store data as a collection of XML documents. XML is being widely adopted as a standard language for data representation, exchange and storage. The main reasons behind this adoption are its auto-descriptive structure and its textual and non-proprietary format. Consequently, XML documents can be created both by humans and by software. XML Schema is used to describe the structure and the content of XML data. XML Schema was designed as an interchange language and it is used for specifying and validating XML documents. It is not designed as a conceptual modeling language. Therefore, designers still use the EER data model to express a database schema. It allows a domain expert to model the problem domain
independently of the implementation (e.g. XML) and then create the corresponding XML schemas, which are used to describe the structure of XML documents.

That was the main motivation to extend MIST with new components designed to provide complete support for: generic XML schema generation from a conceptual database schema based on the EER data model, vendor-specific XQuery script generation from a generic XML schema, and execution of the generated XQuery code under a vendor-specific database management system (DBMS). The XML Generator currently supports creating XML schema scripts for the eXist and Sedna DBMSs.

Apart from the Introduction and Conclusion, the paper is organized in six sections. In Section 2, we describe the extended architecture of MIST. The XML schema meta-model is presented in Section 3. The transformation of a conceptual database schema based on the EER data model into a generic XML schema is presented in Section 4. A case study that illustrates the usage of the meta-models, the extended MIST components and the XML code generator is set up in Section 5, and related work is discussed in Section 6.

II. THE ARCHITECTURE OF MIST

In this section we present the architecture of MIST. Its global picture is depicted in Fig. 1. MIST comprises the following components: FTDSL, Synthesis, Business Application Generator, EERDSL, EER2Rel, EER2Class, R2FT, SQL Generator, Java Generator, EER2XML, and XML Generator.

The components from the upper part of Fig. 1 are already presented in our previously published papers [2, 5, 6, 7, 8, 9, 10]. In this paper we present the components positioned in the rectangle at the bottom of Fig. 1: EER2XML and XML Generator.

Figure 1. The architecture of MIST

The EER2XML component of MIST provides the model-to-model (M2M) transformation of an EER database model to a generic XML schema specification. The input into the EER2XML component is a conceptual database model which is expressed by the concepts of the EERDSL modeling language. The models being transformed conform to the EER meta-model [2] and the XML meta-model, respectively. The output of the EER2XML component, a generic XML schema specification, may be further used in the process of XQuery code generation. For this purpose, the XML Generator component is developed. It generates the XML Schema document and XQuery scripts for creating collections of appropriate XML documents. In order to enable implicit validation in a specific DBMS, database configuration files must be changed. Below is an example of how it can be done for the eXist database.

To enable implicit validation, the eXist database configuration must be changed by editing its configuration file. All XML Schemas that are used for validation must be registered with eXist using an OASIS catalog file, which is supported by our XML Generator. For the code generation we used the Xtend tool.

III. XML META-MODEL

In this section, we present the XML meta-model in more detail. This meta-model represents the abstract syntax of the DSL used for specifying data models at the conceptual level. The abstract syntax is implemented in the form of a meta-model that conforms to the Ecore meta-meta-model. The XML meta-model consists of approximately sixty components. The most important components of the XML meta-model are given in Fig. 2 and Fig. 3. In the rest of this section, we describe each of the most important components of the XML meta-model, with the corresponding meta-model class written in italics inside parentheses.

The meta-model of the XML schema basic concepts is depicted in Fig. 2. The schema element (SchemaElement) is the root element of each XML Schema document. It contains the definition of an XML Schema document and it may contain some attributes. Each schema element comprises zero or more elements. Elements can be grouped by their function: top-level elements (GlobalElement) and particles (LocalElement). Top-level elements are elements that appear at the top level of an XML Schema document, as a direct child of a schema element. Particles are elements that always appear as part of a complex type definition. As the parent element of the local elements, the sequence element (XMLSequence) is used. The sequence element defines an ordered list of elements. Each element may have a name attribute, which is the tag name that will appear in the XML document. Local elements, as well as sequence elements, can have minOccurs and maxOccurs attributes. A minOccurs value can be any number greater than or equal to zero, while maxOccurs can be given the same value as minOccurs, but no limit can also be set on the maximum number, using the value "unbounded".

The meta-model of the XML schema constraints is depicted in Fig. 3. Each element comprises one or more constraints inheriting the abstract concept constraint (Constraint), which may have a name attribute. At the top level of an XML schema specification, the following constraints may be specified for each element: (i) primary key constraint (Key); (ii) unique constraint (Unique); and (iii) foreign key constraint (KeyRef). Each constraint comprises one and only one constraint content (ConstraintContent). Constraint content contains at least one field element (Field) and one and only one selector element (Selector). The selector is modeled to specify an XPath expression that selects a set of elements for an identity constraint, while the field element specifies an XPath expression that specifies the value used to define an identity constraint. Unique and key references inherit the abstract class refer (Refer). This concept is specified in order to allow foreign key constraints to have a reference to both unique and primary key constraints (UniqueRefer and KeyRefer).
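A concrete XML Schema fragment corresponding to the Key/Selector/Field concepts above can be produced in a few lines of Python. This is a hedged sketch using the standard xml.etree module; it mirrors the meta-model concepts, not the actual Xtend-based generator:

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Build an xs:key constraint: exactly one selector (the constrained
# element set) and at least one field (the identity value), per Fig. 3.
def key_constraint(name, selector_xpath, field_xpaths):
    key = ET.Element(f"{{{XS}}}key", name=name)
    ET.SubElement(key, f"{{{XS}}}selector", xpath=selector_xpath)
    for xp in field_xpaths:                  # Field: 1..*, Selector: exactly 1
        ET.SubElement(key, f"{{{XS}}}field", xpath=xp)
    return key

k = key_constraint("MedicineKey", ".//Medicine", ["@IDMED"])
xml_text = ET.tostring(k, encoding="unicode")
```

The element and constraint names (MedicineKey, IDMED) anticipate the Pharmacy case study; any real generator would also have to emit the enclosing schema, sequence and complex type elements.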
IV. TRANSFORMATION OF EER DATABASE MODEL INTO A GENERIC XML SCHEMA

In this section we present the EER2XML transformation for transforming an EER database model into a generic XML schema. EER2XML is based on meta-models and is specified in the ATL transformation language (ATL). In this paper we present the conceptual mappings between the corresponding concepts of the different meta-models, without the ATL implementation details. Table I lists the corresponding concepts of the EER and XML meta-models. Based on these correspondences, concrete ATL transformation rules are specified.

TABLE I
MAPPING CONCEPTS

EER meta-model concept -> XML meta-model concepts
ERModel -> DataBase, XMLCollection, XMLSchema, Declaration, SchemaElement, GlobalElement, ComplexType, XMLSequence
Entity -> LocalElement, ComplexType, Key
Relationship -> LocalElement, ComplexType, KeyRef, KeyRefer, Key, Unique, ConstraintContent, Selector, Field
Attribute -> Attribute
Attribute domain -> AttrBuiltType
Key -> Key, ConstraintContent, Selector, Field
ISA -> LocalElement, ComplexType, Key, KeyRef, KeyRefer, ConstraintContent, Selector, Field
Categorization -> LocalElement, ComplexType, KeyRef, KeyRefer, ConstraintContent, Selector, Field
Categories -> LocalElement, ComplexType
Identification dependency (weak entity) -> LocalElement, ComplexType, Key, KeyRef, KeyRefer, ConstraintContent, Selector, Field
Identification dependency (regular entity) -> LocalElement, ComplexType, Key
Gerund -> LocalElement, ComplexType, KeyRef, KeyRefer, Key, Unique, ConstraintContent, Selector, Field
Domain -> AttrBuiltType

Most of the concepts may be transformed directly to XML schema concepts. Local elements are created from eight different concepts of the EER data model: Entity, Relationship, ISA, Categorization, Categories, Identification dependency (regular entity), Identification dependency (weak entity) and Gerund. Each regular entity concept is transformed into a local element concept. The set of constraints of the regular entity is transformed into a set of constraints that can be defined at the level of the XML data model. The primary key of the regular entity is transformed into the key concept, and the unique constraint is transformed into the unique concept. Attributes of the entities and relationships are transformed into the attribute concept of the XML data model. Based on the domain concept, XML data model built-in types are created.

V. CASE STUDY

In this section we introduce a case study of a Pharmacy IS, in order to illustrate the concepts, data models, transformation, and code generator described in the previous sections. We present a part of a Pharmacy database model, named Pharmacymodel. Due to the complexity of the example, we present in detail only a pharmacy, a pharmacy company, and a medicine. The Pharmacymodel EER data model instance consists of:
- three entities: Pharmcomp with attributes IDPHC and NAMEPHC; Medicine with attributes IDMED and NAMEMED; and Pharmacy with three attributes IDPHARM, NAMEPHARM and ADDRESSPHARM;
- one identification dependency ID1 with weak entity Medicine and regular entity Pharmcomp;
- one gerund Hascontract with regular entities Pharmacy and Pharmcomp in the relationship and attributes DATECONTR and EXPIRDATECONTR;
- two relationships: Subordinate, which comprises one regular entity Pharmcomp with two roles, company and subcompany; and Offers, with weak entity Medicine and gerund Hascontract in the relationship.

For the specification of the database model, we use the modeling concepts presented in [5]. In Fig. 4, we present our example using the graphical notation of EERDSL, while Fig. 5 presents the source and target models of the EER2XML transformation opened in the Eclipse "Sample Reflective Ecore Model Editor". In EERDSL, all concepts are represented with the widely used graphical notation presented by Thalheim [14].

It can be seen that the EER meta-model Pharmacymodel concept is mapped into the XML DB Pharmacymodel. Each of the three EER entities has been mapped into a local element with the same name. Here, we present the local elements. The local element Medicine contains one complex type, whose subtypes, IDMED and NAMEMED, are attributes created from the entity Medicine in the EER model. IDPHC represents a foreign key from the local element Pharmcomp. The foreign key is created because of the relationship between the entities Medicine and Pharmcomp, with maximum cardinalities one and many. Based on that foreign key, the key ref element fk1_medicine is created, which references the primary key of the Pharmcomp local element.

The Pharmcomp local element contains one complex type, whose subtypes, IDPHC and NAMEPHC, are attributes created from the entity Pharmcomp. It contains an attribute IDPHC_company, created because of the recursive relationship in which the same entity, Pharmcomp, participates as both parent and child. Also, because of the recursive relationship, the key ref element fk1_pharmcomp is created. The local element Pharmacy contains one complex type, whose subtypes, ADDRESSPHARM, IDPHARM and NAMEPHARM, are attributes created from the entity Pharmacy. The Hascontract local element contains one complex type, whose subtypes, EXPIREDATECONTR and DATECONTR, are the attributes created from the gerund HASCONTRACT. Also, it contains the attributes IDPHARM and IDPHC from the regular entities that participate in the relationship of
this gerund, Pharmacy and Pharmcomp. Based on that relationship, the key ref elements fk1_hascontract and fk2_hascontract are created, and they reference the primary keys of the Pharmcomp and Pharmacy local elements. The Offers local element contains one complex type, whose subtypes are the attributes that represent the primary key attributes of the entities that participate in the relationship Offers, Medicine and Hascontract. Based on this relationship, with maximum cardinalities many and many, two key ref elements are created, fk1_offers and fk2_offers.

In Fig. 6, we present how implicit validation is enabled for the eXist database using an OASIS catalog file, which contains all the XML schemas that are used for validation, including the one that is the result of our generator, pharmacymodel.xsd. The same is done for the Sedna database.
Figure 5. The source and target models of the EER2XML transformation in the Pharmacy IS example
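The concept correspondences of Table I can be restated as a simple lookup table. The sketch below only mirrors the table; the actual EER2XML rules are written in ATL, and the function name is ours:

```python
# Illustrative restatement of Table I: each EER meta-model concept maps to
# the XML meta-model concepts its instances are transformed into. Concept
# names are taken verbatim from the table.
CONSTRAINT_PARTS = ["ConstraintContent", "Selector", "Field"]

EER2XML = {
    "ERModel": ["DataBase", "XMLCollection", "XMLSchema", "Declaration",
                "SchemaElement", "GlobalElement", "ComplexType",
                "XMLSequence"],
    "Entity": ["LocalElement", "ComplexType", "Key"],
    "Relationship": ["LocalElement", "ComplexType", "KeyRef", "KeyRefer",
                     "Key", "Unique"] + CONSTRAINT_PARTS,
    "Attribute": ["Attribute"],
    "Attribute domain": ["AttrBuiltType"],
    "Key": ["Key"] + CONSTRAINT_PARTS,
    "ISA": ["LocalElement", "ComplexType", "Key", "KeyRef",
            "KeyRefer"] + CONSTRAINT_PARTS,
    "Categorization": ["LocalElement", "ComplexType", "KeyRef",
                       "KeyRefer"] + CONSTRAINT_PARTS,
    "Categories": ["LocalElement", "ComplexType"],
    "Identification dependency (weak entity)":
        ["LocalElement", "ComplexType", "Key", "KeyRef",
         "KeyRefer"] + CONSTRAINT_PARTS,
    "Identification dependency (regular entity)":
        ["LocalElement", "ComplexType", "Key"],
    "Gerund": ["LocalElement", "ComplexType", "KeyRef", "KeyRefer",
               "Key", "Unique"] + CONSTRAINT_PARTS,
    "Domain": ["AttrBuiltType"],
}

def xml_targets(eer_concept: str) -> list:
    """XML meta-model concepts generated for a single EER concept."""
    return EER2XML[eer_concept]
```

For instance, `xml_targets("Entity")` returns the three XML concepts an entity is mapped to, which is how the table is read row by row.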
Figure 6. eXist OASIS catalog file

VI. RELATED WORK

Several papers have dealt with the generation of an XML schema from a conceptual database schema. In Wang et al. [11], a DTD is obtained from data whose structure is described by a conceptual data model. This transformation preserves structural information from the conceptual schema as much as possible. A DTD, however, has fewer capabilities for representing constraints than XML Schema. In [12], an ER data model is first extracted from the relational data model and then transformed into an XML schema.

The paper [13] describes a recursive algorithm, ERD2XSD, which takes an arbitrary ER diagram as input and generates a corresponding XML schema as output. This algorithm does not cover all concepts of the ER data model, such as the gerund. One of the advantages of our XML Generator is that it provides the implementation of some special cases of database constraints, like the referential integrity constraint.

VII. CONCLUSION

During our previous research we developed the Multi-Paradigm Information System Modeling Tool (MIST). The tool provides specification of an IS database schema, business applications and their graphical user interface elements. However, designers widely use XML as a standard language for data representation, exchange and storage. Therefore, we provided MIST components supporting the XML approach. We have developed a meta-model of the XML schema language that conforms to the Ecore meta-model, an M2M transformation producing models that conform to the XML meta-model, and a code generator within MIST that is responsible for code generation based on the XML schema specification. The XML Generator creates XML schema scripts according to the syntax of the eXist and Sedna databases. It also provides validation of XML documents based on the generic XML schema under the specific databases. For the development of the new component in MIST we have used the EMF (Eclipse Modeling Framework).

One of the advantages of our XML Generator is that it provides the implementation of some special cases of DB constraints, for example the referential integrity constraint described in [3].

We also use our MIST tool extensively in courses on domain specific languages and model driven software development, as students may familiarize themselves with new concepts using well-known EER concepts.

Our future research will consider the extension of MIST to support different types of NoSQL databases and to provide the implementation of the inverse referential integrity constraint for some specific XML databases.

VIII. ACKNOWLEDGMENT

The research presented in this paper was supported by the Ministry of Education, Science and Technological Development of the Republic of Serbia, Grant OI-174023.

REFERENCES

[1] I. Luković, P. Mogin, J. Pavićević, S. Ristić, "An Approach to Developing Complex Database Schemas Using Form Types", Software: Practice and Experience, John Wiley & Sons Inc, Hoboken, USA, ISSN: 0038-0644, DOI: 10.1002/spe.820, Vol. 37, No. 15, pp. 1621-1656, 2007.
[2] V. Dimitrieski, M. Čeliković, S. Kordić, S. Ristić, A. Alargt, I. Luković, "Concepts and Evaluation of the Extended Entity-Relationship Approach to Database Design in a Multi-Paradigm Information System Modeling Tool", Computer Languages, Systems & Structures (COMLAN), Elsevier, ISSN: 1477-8424, DOI: 10.1016/j.cl.2015.08.011, 2015.
[3] J. Vidaković, I. Luković, S. Kordić, "Specification and Validation of the Referential Integrity Constraint in XML Databases", Proceedings of the 6th International Conference on Information Society and Technology, Kopaonik, 2016, pp. 197-213.
[4] M. Čeliković, V. Dimitrieski, S. Aleksić, S. Ristić, I. Luković, "A DSL for EER Data Model Specification", University of Zagreb, Faculty of Organization and Informatics, Varaždin, Croatia, 2014, pp. 290-297.
[5] M. Čeliković, I. Luković, S. Aleksić, V. Ivančević, "A MOF Based Metamodel and a Concrete DSL Syntax of IIS*Case PIM Concepts", Computer Science and Information Systems, Vol. 9, No. 3, 2012, pp. 1075-1103.
[6] S. Ristić, S. Aleksić, M. Čeliković, I. Luković, "Meta-Models in Support of Database Model Transformations", in: Information and Communication Technologies in Everyday Life: Opportunities and Challenges, (Ed.) Ali Al-Dahoud, Ubiquitous Computing and Communication (UbiCC) Research Publishing, 2014, ISBN 978-1-312-55980-6, pp. 45-62.
[7] I. Luković, V. Ivančević, M. Čeliković, S. Aleksić, "DSLs in Action with Model Based Approaches to Information System Development", in: Formal and Practical Aspects of Domain-Specific Languages: Recent Developments, (Ed.) Marjan Mernik, IGI Global, USA, 2013, ISBN: 978-1-4666-2092-6, DOI: 10.4018/978-1-4666-2092-6, pp. 502-532.
[8] S. Ristić, S. Aleksić, M. Čeliković, I. Luković, "Generic and Standard Database Constraint Meta-Models", Computer Science and Information Systems (ComSIS), Consortium of Faculties of Serbia and Montenegro, Belgrade, Serbia, DOI: 10.2298/CSIS140216037R, ISSN: 1820-0214, Vol. 11, No. 2, 2014, pp. 679-696.
[9] S. Ristić, S. Aleksić, M. Čeliković, V. Dimitrieski, I. Luković, "Database Reverse Engineering Based on Meta-Models", Central European Journal of Computer Science, Vol. 4, No. 3, 2014, pp. 150-159, DOI: 10.2478/s13537-014-0218-1.
[10] C. Kleiner, U. Lipeck, "Automatic Generation of XML DTDs from Conceptual Database Schemas", in: GI Jahrestagung (1), 2001, pp. 396-405.
[11] C. Wang, A. Lo, R. Alhajj, K. Barker, "Converting Legacy Relational Database into XML Database through Reverse Engineering", Proceedings of the International Conference on Enterprise Information Systems, Portugal, 2004.
[12] C. Liu, J. Li, "Designing Quality XML Schemas from E-R Diagrams", WAIM 2006, LNCS 4016, 2006, pp. 508-519.
[13] B. Thalheim, "Entity-Relationship Modeling: Foundations of Database Technology", Springer, 2000.
Abstract—A data quality assessment method specially customized to environmental measurement systems is presented in this paper. The methodology is based on mathematical data relations and may be used in any acquisition system where sufficient data redundancy or functional correlations between measured quantities exist. It relies on a quantitative comparison of the measured and predicted data, with the result expressed in the form of a continuous data quality grade. It may be customized for on-line use, with real-time data streams, for model data assimilation purposes, or for off-line applications and analysis. The system was tested both on testing data, where the capabilities of the framework were defined, and on real-world real-time data obtained from a system of cascading dams. Although the system provided fairly good results in real-time use, some pitfalls were identified, mostly concerning the interpretation of the data quality grade once it is obtained.

I. INTRODUCTION

One of the main tasks in hydroinformatic analysis is the proper representation and interpretation of the state of the system. One of the most common ways to quantitatively represent the state of the system is to use measured data obtained from gauging stations. Nevertheless, sometimes the measured data quality may be in question (Figure 1). Poor data quality leads to poor information, which further leads to unreliable decisions and various side effects.

Figure 1. Anomalies in data

Data quality aspects are numerous. Data quality may reflect completeness, accuracy, precision, adequacy, etc. The selection of the data quality aspects that should be considered is usually made on a case-to-case basis and may not be generalized in any way. Nevertheless, for hydroinformatic problems focused on hydraulic and hydrologic modeling or on data assimilation, data precision and accuracy play the crucial role. The aim of the data quality analysis presented in this paper is to detect and classify data values with respect to the presence of numeric errors.

Sometimes subtle errors in data may not be obvious, even if the data are represented graphically, as may be seen in Figure 1. In those cases, a more precise and preferably automatic system has to be used.

Since the quality of acquired data varies constantly because of its dynamic nature, data acquisition systems themselves must perform the data quality check, assess the data quality and, if possible, work on data quality improvement.

From the point of view of data analysis procedures and applications, the implementation may be suitable for offline analysis, real-time analysis, or near real-time analysis that comprises batch data analysis as the data arrive in the database.

The methodology and implementation strategy for efficient automatic data quality evaluation, suitable for real-time data acquisition systems, is presented in this paper. The system was developed specifically as a part of the hydroinformatic system used for managing cascaded hydropower plants (HIS Djerdap) [5], but it can be customized for any acquisition system where sufficient data redundancy or functional correlations between measured quantities exist.

Various authors have addressed the issue of measured data quality [3]. Most of them performed custom off-line data processing before running the simulation models [2], and just a few of them addressed the possibilities for simple on-line data validation [1]. There are also some developed methodologies for data quality assessment where, due to the complex nature of data acquisition and processing, several validation tests have to be performed on a single measured data value [3], but the robustness of the applied method prevented
the full-scale implementation suitable for on-line, real-time use. Some other authors [4] defined special preprocessing and post-processing procedures to improve data quality assessment.

In this paper, we introduce a data quality assessment framework especially suitable for real-time environmental data. The methodology is presently customized for hydroinformatic systems, but it may be implemented and adjusted for any kind of measurement system, as long as the data values may be computationally related to some environmental parameters, other measured values or some general rules and constraints.

II. METHODOLOGY

The methodology used in the presented data quality assessment framework is based on predicting the measured data values using predefined relations. The set of relations is determined specifically for each measured variable data stream. The selection of the relations used for the quality assessment of one data stream is based on several criteria; the nature of the measured variable, the availability of relation parameters, the availability of other measured data and the nature of the expected errors in the data are some of them.

In the process of data quality estimation, every relation is used to predict the measured data value and to map the measured data value to a data quality grade between 0 and 1.

Usually, more than one relation can be used, so more than one data quality grade is obtained. Therefore, some aggregation function (average, max, min, etc.) may be applied to provide the final data quality grade. After that, the quality grade may be used to accept the data, reject it, or use the data only for certain types of analysis, which should be determined on a case-to-case basis.

The methodology is divided into two successive stages: data validation and data quality assessment (Figure 2). The data validation stage comprises two basic steps: a check of the expected data format, and a check of the global value boundaries for the specific measured variable at the specific measuring micro-location under specific environmental circumstances.

If the measured data does not pass the data validation stage, due to extremely low data quality, the data is rejected at that level and does not pass to the next, data quality assessment stage. This stage is therefore used as a kind of filter that removes data of extremely low quality (usually due to measurement equipment malfunction) and therefore, in some cases, saves some amount of CPU time.

The second stage (Figures 2 and 3), data quality assessment, is used for a more granular data quality check and consists of a set of computational methods, based on predefined data relations (M1, M2, …, MN), that are used for measured data prediction and data quality grade assessment.

Figure 3. Data quality method procedure: 1 – prediction, 2 – mapping to data quality grade

Figure 4. Data quality method calibration diagram

The data quality computational method consists of two stages, as presented in Figure 4. First, the data value under consideration is predicted using a predefined relation. Second, a fuzzy set is determined to map the measured data value to the data quality grade. The parameters of the fuzzy set are determined using four boundaries that are set in the process of the method calibration (Figure 4):
1. Two inner boundaries that bound the data that should be considered accurate
2. Two outer boundaries that bound the data whose quality grades lie between 1 and 0.

Proper determination of these boundaries is crucial for the precision and accuracy of the data quality assessment, and therefore special attention should be paid during that process. It should be emphasized that data value uncertainty may be used to determine the boundaries and the fuzzy set parameters more reliably. Usually, historical data are used in the process of calibration, but the experience of the expert may be of great value in the absence of historical data.

To ensure that a data value has obtained a data quality grade before it is entered into the database system, a fallback system is provided. The fallback system encompasses several sequential data quality schemas that are used in situations when the envisaged data quality methods cannot be used for various reasons, such as missing data. The fallback system usually consists of several data quality assessment schemas, as presented in Figure 5.
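The two-stage computational method with its inner and outer boundaries can be sketched as a trapezoidal membership function. The offset-based parameterisation below is our assumption, not the paper's exact calibration procedure:

```python
def quality_grade(measured, predicted, inner, outer):
    """Map a measured value to a quality grade in [0, 1] around a
    predicted value. `inner` and `outer` are (low, high) offsets from the
    prediction: inside the inner band the grade is 1, outside the outer
    band it is 0, and between the bands it falls off linearly (a
    trapezoidal fuzzy set)."""
    lo_in, hi_in = predicted + inner[0], predicted + inner[1]
    lo_out, hi_out = predicted + outer[0], predicted + outer[1]
    if lo_in <= measured <= hi_in:
        return 1.0                                   # accurate band
    if lo_out < measured < lo_in:
        return (measured - lo_out) / (lo_in - lo_out)  # rising edge
    if hi_in < measured < hi_out:
        return (hi_out - measured) / (hi_out - hi_in)  # falling edge
    return 0.0                                       # outside outer band
```

For example, with a predicted water level of 70.0, inner offsets (-0.1, 0.1) and outer offsets (-0.5, 0.5), a measurement of 70.05 grades as 1.0 and one of 70.3 as 0.5.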
After the data quality grades are obtained, they are aggregated into a single representative value (Figure 4) in the range 0-1, using the aggregation module. Seven data quality assessment methods were developed, with the particular aim of covering most of the existing relations of environmental data:
1. Constant relation,
2. Periodical continuous relation,
3. Managed relation,
4. Seasonal constant relation,
5. AR relation,
6. Linear (regressive) relation,
7. Nonlinear relation.

The aim of the first relation (constant relation) is to predict the measured data value with a single constant value. That value is usually determined as the average of the historical dataset. The inner and outer boundaries are determined using the historical extreme values, supported by the data uncertainty. This relation is the most basic one and may be used for any measured variable.

Figure 5. Data quality system schema with fallback

The proposed methodology can be used for both continuous and discrete real-time data streams, as well as for off-line collected data. The system may be used fully automatically, where the expert is involved only in the system preparation stage, since he or she has to calibrate the data relations and estimate the parameters based on historical data and experience. The proposed methodology is also considered extremely flexible, since the relations and their parameters may be adjusted or updated at any later time.
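The constant relation (method 1 above) could be calibrated from a historical series as in this sketch; the exact rule for widening the extremes by the data uncertainty is our assumption:

```python
def calibrate_constant_relation(history, uncertainty=0.0):
    """Derive constant-relation parameters from a historical dataset: the
    predicted value is the historical average, and the inner/outer
    boundaries come from the historical extremes, widened by the data
    uncertainty (the symmetric widening is an assumed detail)."""
    prediction = sum(history) / len(history)
    lo, hi = min(history), max(history)
    inner = (lo, hi)                              # historical extremes
    outer = (lo - uncertainty, hi + uncertainty)  # extremes +/- uncertainty
    return prediction, inner, outer
```

The returned prediction and boundary pairs are exactly the inputs the fuzzy mapping stage needs for this most basic relation.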
The periodical continuous relation is usually used for periodic meteorological variables, like air temperature. The parameters of this relation have to be estimated using some kind of optimization algorithm - manual, semi-automatic or automatic. If temporal periodicity is present in the data, this method may be used as the basic one.

The managed relation is reserved for managed variables, like discharge through turbines, which is managed by the valves. For this kind of relation, an external variable that can be used as a management indicator has to be available. In the case of discharge through a turbine, an indicator of the valve opening may be used.

In some cases measured variables may have seasonal periodicity. In that case, the seasonal constant relation may be used: the measured variable is represented by a constant value during the different seasons (e.g. spring, summer, autumn, winter).

The autoregressive (AR) relation is used if the measured variable has strong autoregressive features, which is the case for most environmental variables. In general, the AR method is a special case of the linear method, and historical data is needed for calibration purposes.

The linear relation represents the method based on two variables that are linearly related. For this method, an additional variable is needed.

The last one, the nonlinear relation, represents the most general relation that may be formed between two or more variables.

III. CASE STUDY

To reduce the number of different constellations of methods in the data quality schema, several classes of variables are defined. One of the defined classes is the class of measured water levels in large rivers. One member of that class is the measured water level of the river Danube at the Ram gauging station, which is presented as the case study (Figure 6).

Figure 6. Position of gauging station near the town Ram on the bank of the Danube river

The schema for water levels on big rivers consists of three stages (Figure 5). The first stage consists of three prediction and data quality assessment methods, the second one is based on two methods (used when the additional measured data value for the linear relation is missing), and the third stage is based only on the base constant method.
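The staged fallback just described can be sketched as follows. Each method is a placeholder callable that returns a grade in [0, 1], or None when its inputs (e.g. the second variable of the linear relation) are missing; averaging is just one possible aggregation function:

```python
def grade_with_fallback(stages, datum):
    """Run data-quality stages in order; each stage is a list of method
    callables. The first stage whose methods all produce grades wins.
    The last stage (the base constant method) always succeeds, so a
    grade is guaranteed before the datum enters the database."""
    for methods in stages:
        grades = [m(datum) for m in methods]
        if all(g is not None for g in grades):
            return sum(grades) / len(grades)  # aggregate by averaging
    raise RuntimeError("no applicable data-quality stage")

# Hypothetical schema for a large-river water level: three methods,
# then two (linear relation input missing -> None), then the constant one.
stage1 = [lambda d: 0.9, lambda d: None]   # linear input unavailable
stage2 = [lambda d: 0.8, lambda d: 0.6]
stage3 = [lambda d: 1.0]                   # base constant method
```

Here `grade_with_fallback([stage1, stage2, stage3], 70.0)` skips the first stage because one of its methods cannot run and grades the datum with the second stage instead.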
Historical data are divided into two series. The first data series is used for the calibration of the data quality methods, and the second one is used for testing purposes (Figure 7).

Figure 7. Calibration and testing period of historical data

The methods are calibrated and the calibration results are presented in Figure 8.

one (Figure 9). Nevertheless, three subtle anomalies were detected and classified with a data quality grade in the class 0.66-1.

IV. CONCLUSION

The methodology used in the presented data quality assessment framework is based on predicting the measured data values using predefined relations. The set of relations is chosen for the specific measured variable data stream. The selection of the relations used for the quality assessment of one data stream is based on several criteria; the nature of the measured variable and the availability of relation parameters are some of them.

The methodology is implemented in two different applications. The first one is for offline data and is implemented in the MS Excel environment. That implementation is also used for manual data quality assessment and for the calibration of data relations. The second implementation is based on a server that takes the data from the database, computes the data quality and returns the data quality grade to the database as a meta-data value.

Nevertheless, this methodology has some pitfalls. First, data quality grades in continuous form may be ambiguous to interpret. That is why their interpretation should be defined on a case-to-case basis, with a specific focus on the data application and its further transformation. The major challenge in this field remains the issue of rare events that may be considered faults due to equipment malfunction. The fact is that rare events are usually the most valuable to be acquired, recorded and analyzed. That is why the role of the expert in such a system is crucial.

ACKNOWLEDGMENT

The authors are grateful to the Serbian Ministry of Education, Science and Technological Development for its financial support (Projects No. TR37010 and TR37013).
Abstract—In the field of Water Distribution Network (WDN) analysis, sectorisation is regarded as one of the main strategies for efficient management of WDNs. This paper presents an ICT-based solution which helps managers to design the optimal number and size of sectors within the WDN. The presented algorithm relies on Graph Theory to search for the Strongly Connected Components (SCCs) within the graph (water network), which are later topologically sorted and aggregated into sectors. The process of aggregation is driven by engineering criteria and relevant heuristics (e.g. sectors of approximately equal size with the smallest number of links connecting them). Interventions in the network are not implemented, in order to avoid negative effects on the network's hydraulics. This is important especially at the primary stages of sectorisation, in which preserving hydraulic performance and minimal investment are the main objectives. The methodology is illustrated on a real-size WDN. Preliminary results show that the new algorithm is efficient in terms of computational speed and has the potential to be used inside an optimization algorithm where multiple alternatives and scenarios have to be investigated. Also, due to its robustness, the algorithm can serve as an effective support tool for the decision making process in engineering practice.

I. INTRODUCTION

Sectorisation or decomposition of a water distribution network (WDN) into zones (sectors, clusters or District Metered Areas - DMAs) has become one of the main strategies for efficient management of WDNs. Traditionally, sectorisation has been used to better control water losses in the network by observing all inflows to and outflows from a sector. To accomplish this, network interventions such as the installation of isolation valves and flow metering devices are required. However, if not implemented carefully, such interventions can significantly worsen the network supply reliability, water quality, fire-flow supply and the system response in the case of accidental bursts and other failures. This is due to the fact that historically WDNs have developed as extremely looped systems in order to provide the aforementioned requirements, and sectorisation into DMAs can considerably affect their topology. The complexity of a real-life WDN results in many different alternatives in which network sectorisation can be done. Every WDN is unique in its topology and characteristics, so there is no unique procedure for performing its sectorisation, but rather a series of guidelines provided by the different water authorities and used in this process by practicing engineers. Sectorisation is usually governed by the criterion of having sectors of "manageable size" in terms of the number of consumers, links or network length, while the network's topology is often neglected. A widely used approach to sectorisation of a WDN in engineering practice is the "trial and error" method done by a local expert, which can lead to an arbitrary solution that can be far from a (near) optimal one. Having in mind the previously mentioned uniqueness of WDNs, there is a need for a practical tool that will enable greater involvement of engineering knowledge and of specific limitations and constraints. For practical application in a real-time decision making process, the algorithm has to be fast to execute and robust.

In recent years, different algorithms for automated sectorisation of the WDN into DMAs have been presented, as well as tools that can be used to support this process [1, 2]. The majority of the presented methodologies are based on Graph Theory algorithms [3, 4]. Others use the modularity index [5] or community structure metrics [6] to perform the division of the WDN. The sectorisation algorithm is usually coupled with an optimization algorithm [5, 7] in order to search within a broader solution space and ensure that a suboptimal feasible solution is identified. In the sectorisation process, different objective functions are used. For example, in [3] the reachability of every potential solution is minimized, in [5, 6] the modularity metric [8] is maximized, and in [7] the minimization of dissipated hydraulic power is adopted. The resilience index, which relates to the network's post-segmentation reliability [9], is often used as a main performance indicator [10], while in [6] fire-flow and water age metrics are added into consideration. However, despite all recent advancements, scope exists to further improve existing water network sectorisation algorithms, especially in terms of usability for practicing engineers [11].

The methodology presented here is named Water Network Sectorisation (WNS) and is based on the heuristic aggregation of Strongly Connected Components (SCCs) determined using Graph Theory algorithms. Interventions in the network, such as closure of valves or blockage of pipes, are not implemented at this point, to avoid negative effects on the network's hydraulics. The main goal of the WNS is to search for the optimal sectorisation suitable for water balance control with a minimal number of connecting links, implementing some engineering criteria and heuristics. The presented methodology can serve as a support tool for practicing engineers when designing a sectorisation solution that will have the least effect on the hydraulics of the system and minimal investment. Furthermore, it can be used as a good starting point for narrowing down the solution space subjected to the optimization algorithm in search for the optimal solution. Also, the possibility of hierarchical sectorisation should be investigated, as it would enable perception of the WDN in a range of resolutions (level of detail
aggregated nodes and two remaining nodes 1 and 10 (Fig. 2c). The most important property of the new aggregated digraph is acyclicity, indicating it is a digraph without cycles. Such a graph is referred to as a Directed Acyclic Graph (DAG) and, in terms of the water network, is very important because it clearly separates the source from the demand nodes and hence makes the sectorisation of the network easier.

B. Stage 3

Finally, the topological sorting of the DAG and its aggregation using a number of pre-specified engineering criteria (as explained below) is conducted, as shown in Fig. 3. The following engineering criteria are used here: size of the sector in terms of water demand, number of connections between the sectors, pipe diameters and pipe lengths.

Topological sorting of the DAG starts from the most downstream (sink) nodes. Sink nodes are the ones that have only inlet links, and propagation starts simultaneously from all these nodes. In the example discussed, SCC1 is the only sink node in the DAG, hence topological sorting will start from that node. Topological sorting continues in the direction opposite to the flow orientation, thus adjacent nodes upstream of SCC1 are candidates for propagation. A precondition for visiting an adjacent upstream node, and putting it in the set of signed (sorted) nodes, is that the candidate node has all of its downstream nodes already signed. This means that only nodes that have sorted (signed) adjacent downstream nodes are valid candidates for further upstream propagation and possible aggregation to the downstream component. Applying this precondition to the example DAG, the only valid candidate for propagation from SCC1 is SCC2, so it is signed and moved to the set of sorted nodes (Fig. 3a). Nodes 1 and 10 are source nodes (nodes without input links), so they are left for the final stage of the algorithm.

The next step is to check whether candidate node SCC2 could be aggregated with node SCC1. The primary criterion checked to determine whether aggregation is feasible is the allowable size of the sector (Qmax) in terms of its demand. The maximum size for a sector is defined as Qmax = Qtot/Nsect, where Qtot is the total input into the WDN and Nsect is the desired number of sectors, which has to be smaller than the number of identified SCCs. If the aggregated sector size is smaller than Qmax, the upstream node is joined to the downstream one. Furthermore, even if this criterion is not satisfied, nodes are aggregated if the change in the sector size is smaller than 10%. This is done to avoid the case where very small nodes are left unaggregated. Following the simple example (Fig. 3), if we assume that all demand nodes have the same water consumption, node SCC2 will not be aggregated to node SCC1.

At this point, the aggregation criteria have to be explained in more detail, as the example under consideration does not illustrate the complexity of real-world networks. Unlike the Fig. 3 example, where only a single aggregation option exists (to aggregate SCC2 to SCC1 or not), in a real-size network a set of nodes that are valid candidates for propagation and aggregation according to the aforementioned primary criterion of maximum sector size is likely to exist. After checking that primary condition for aggregation, ranking of the remaining candidates is done according to the following additional criteria:
• Criterion 1: The number of links between the SCCs is checked. A candidate node whose aggregation would reduce the maximum number of links between the SCCs is chosen. Following this idea, aggregation of SCCs is done in such a way that a minimum number of links between SCCs remains at the end of the algorithm run.
• Criterion 2: If multiple candidates exist after applying the above criterion, a node is chosen by selecting the upstream link with the smallest diameter. This way the algorithm’s propagation through the main lines is postponed to the later stages. If several pipes with the same diameter exist, the pipe with the minimum length is selected, in order to keep the sectors’ pipeline length minimal.
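As a concrete illustration, the primary size criterion and the two ranking criteria can be sketched as follows. This is a minimal sketch; the function and field names are our assumptions for illustration, not the authors' implementation:

```python
# Hypothetical sketch of the aggregation checks described above.

def max_sector_size(q_tot, n_sect):
    """Primary criterion threshold: Qmax = Qtot / Nsect."""
    return q_tot / n_sect

def can_aggregate(q_node, q_sector, q_max, tolerance=0.10):
    """Join the upstream node to the downstream sector if the merged
    demand stays below Qmax, or if the relative growth of the sector
    is below the 10% tolerance (so very small nodes do not remain
    unaggregated)."""
    merged = q_node + q_sector
    if merged <= q_max:
        return True
    return q_node / q_sector < tolerance

def rank_candidates(candidates):
    """Criteria 1 and 2: prefer the candidate removing the most
    inter-SCC links, then the smallest upstream-link diameter,
    then the shortest pipe.  Each candidate is a dict with keys
    'links_removed', 'diameter' (mm) and 'length' (m)."""
    return sorted(candidates,
                  key=lambda c: (-c['links_removed'],
                                 c['diameter'],
                                 c['length']))
```

The tuple sort key applies the criteria lexicographically, mirroring the order in which the paper ranks the remaining candidates.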
Going back to the simple example shown above, the next node feasible for propagation is node SCC3, simply because its adjacent downstream nodes SCC1 and SCC2 are already signed and topologically sorted. The next step is aggregation. Node SCC3 could be aggregated to SCC1 or SCC2. Given that SCC1 is already large enough and not feasible for further aggregation, node SCC3 is aggregated to SCC2 (Fig. 3b). The algorithm stops after the source nodes have been reached. As a result of the algorithm (Fig. 3c), there are 2 sectors – sector 1 made of nodes 0, 2, 3, 4, 5, 6 (SCC1), and sector 2 made of nodes 7, 8, 9, 11, 12 (SCC2 and SCC3) – and 2 source nodes (1 and 10). There are five links connecting them.
Figure 3. DAG Aggregation: a) Upstream propagation and topological sorting; b) Components aggregation; c) Result of the algorithm

A. Description

The WNS algorithm presented here is applied to a real-size WDN of a town in Serbia. This WDN supplies water to approximately 50,000 inhabitants, industry and public and commercial institutions. The water is pumped from wells into the Reservoir (Volume = 2 x 2500 m3), from where a pumping station delivers clean water to the network. The WDN is divided into 3 zones based on
elevation: Zone I with ground elevations below 100 m.a.s.l., Zone II with elevations between 100 and 150 m.a.s.l. and Zone III with elevations between 150 and 200 m.a.s.l.

The pumping station pumps water toward the town and tank Tank 1. It is equipped with 4 pumps and its operating regime depends on the water consumption and the water level in Tank 1. Three pumps normally work 24 hours, maintaining a pressure of 5 bars. Water is delivered through pipes with diameters of 600/500 mm (main lines) to Tank 1. From Tank 1 water is also pumped to Tank 2, which serves the consumers in Zone II and supplies a small pumping station that delivers water to the consumers in Zone III. The total length of the water distribution mains is 175 km, and the summary length of the pipes with diameters below 100 mm is about 110 km (64.5 %). The average daily water consumption in the WDN is 203 l/s. The average pressure in the network is 42.5 m.

B. Results and discussion

The implementation of the WNS algorithm requires only a single hydraulic simulation and the number of sectors to be specified. The results of the hydraulic simulation were used for determining flow directions in pipes, in order to create the DIGRAPH. The final result of the WNS algorithm application is presented in Fig. 4. As can be seen from this figure, the WDN is divided into 5 sectors (SCTs). A total of 14 links are identified as separation links between the 5 SCTs, which is a very small number, bearing in mind that the WDN hydraulic model contains 2272 pipes, 568 valves and 12 pumps. This indicates that the WDN water balance could be controlled with a relatively small number of measurement locations.

Figure 4. Results of WNS algorithm

The 2 mandatory discharge measuring locations are at the outlets from the pumping stations at the Reservoir and Tank 1 (Fig. 4 – locations A and B). The water levels in tanks Tank 1 and Tank 2 have to be continuously measured as well. Moreover, as will be explained, 3 of the 14 separation links between the SCTs can be safely closed without affecting the hydraulic performance of the network, hence leaving only 11 discharge measuring locations. It is worth mentioning that the algorithm is computationally very fast (execution lasts about 5 seconds on a PC with an Intel i5 CPU), which makes it suitable for use inside an optimization algorithm.

The center of the town is divided into 2 sectors (SCT 1 and SCT 2) that are connected via 4 pipes. The most important connection is a pipe with a 500 mm diameter (location C), with a maximum discharge that varies from about 100 to 153 l/s. The next two connection pipes have diameters of 200 and 80 mm (locations D and E, respectively), but with much smaller (almost negligible) discharges at the peak demand hour. In all three pipes the flow direction is from SCT 2 to SCT 1. The fourth pipe connecting these two sectors has a diameter of 200 mm (location F) and is in the vicinity of the aforementioned pipe with the 80 mm diameter. The closure of this pipe is suggested because the simulated discharge was below 0.1 l/s, so measuring it would not make sense and closing it would not affect the hydraulic performance of the network. Finally, three flow measuring locations (C, D and E) are proposed to control the flow from SCT 2 to SCT 1.

In order to complete the water balance in SCT 2, it is necessary to measure the water flow supplied to the sector from Tank 1 (pipe with diameter 350 mm – location G), and the water that is delivered from SCT 2 to SCT 5 through the pipes with diameters of 300 mm and 125 mm (locations H and I, respectively). It should also be noted that one pipe between these two sectors with an 80 mm diameter is to be closed as well, due to the negligible flow (location J). This completes the water balance in sectors SCT 2 and SCT 5 with 3 additional measuring locations (G, H and I).

Sector SCT 1 is supplied from SCT 2, and for the water balance control in SCT 1, it is necessary to measure the water delivered to SCT 3. This is done through two pipes with diameters of 200 and 80 mm (locations L and K, respectively), with one pipe with a 100 mm diameter also being closed. The discharge that goes to Tank 1 from SCT 1 has to be calculated as well (location N). There is no need to establish a new measurement point for discharge at this location, simply because the water level in Tank 1 and the discharges from it (locations B and G) are already measured. In this manner, the water balance is fully controlled for sectors SCT 1 and SCT 3 with 2 additional measuring locations. To fully control the water balance in sector SCT 4, supplied from Tank 1, one more discharge measuring location is needed at the outlet of the small pumping station at Tank 2 supplying consumers in Zone III (location O).

In summary, to control the water balance in the whole WDN it is necessary to establish 11 discharge measuring locations (see Fig. 4). The closure of 3 pipes with almost negligible discharges reduced the number of connecting links between the components from the starting 14 to 11. It is obvious that some of the proposed measurement locations are suitable only for water balance control and cannot be used for reducing the pressures in the network. Those are the locations on the water mains (600 and 500 mm) that deliver water to Tank 1. Other measurement locations could possibly be equipped with
valves, and used for reducing pressure in sectors and, consequently, water leakage.

IV. CONCLUSION

An efficient and simple algorithm for water network sectorisation (WNS) is presented in the paper. The algorithm uses the results of a 24-hour hydraulic simulation to determine the orientation of the pipes and to identify non-oriented pipes in which water flows in both directions during the simulation period. Once the oriented graph (DIGRAPH) is defined, SCCs are identified and aggregated into sectors according to the number of sectors set as an input.

The algorithm is tailored to the need of finding the sectorisation solution that will have the least effect on the hydraulics of the system with minimal investment. The WNS algorithm was tested and demonstrated on the real-life network of a town in Serbia. Based on the results obtained, it can be concluded that the algorithm can serve as a valuable tool for practising engineers dealing with water network sectorisation. Future work should be aimed at improving the algorithm by adding the possibility of testing the effects of potential network interventions. In this manner, different sectorisation solutions could be investigated through an optimization method and evaluated using performance indicators, allowing selection of a (sub)optimal one.

ACKNOWLEDGMENT

The authors express their gratitude to the Serbian Ministry of Education and Science for financial support through the project TR37010.

REFERENCES

[1] J. Deuerlein, “Decomposition Model of a General Water Supply Network Graph”, J. of Hydraulic Engineering, vol. 134, Issue 6, June 2008, pp. 822-832.
[2] L. Perelman and A. Ostfeld, “Water distribution systems simplifications through clustering”, J. of Water Resources Planning and Management, ASCE, 2011, doi:10.1061/(ASCE)WR.1943-5452.0000173.
[3] S. Alvisi and M. Franchini, “A heuristic procedure for the automatic creation of district metered areas in water distribution systems”, Urban Water Journal, 11:2, 2014, pp. 137-159.
[4] G. Ferrari, D. A. Savic and G. Becciu, “A graph theoretic approach and sound engineering principles for design of district metered areas”, J. of Water Resources Planning and Management, Vol. 140, Issue 12, December 2014.
[5] O. Giustolisi and L. Ridolfi, “New Modularity-Based Approach to Segmentation of Water Distribution Networks”, J. of Hydraulic Engineering, vol. 140, Issue 10, October 2014.
[6] K. Diao, Y. Zhou and W. Rauch, “Automated Creation of District Metered Area Boundaries in Water Distribution Systems”, J. of Water Resources Planning and Management, vol. 139, Issue 2, March 2013, pp. 184-190.
[7] A. Di Nardo, M. Di Natale, G. Santonastaso, V. Tzatchkov and V. Alcocer-Yamanaka, “Water Network Sectorization Based on Graph Theory and Energy Performance Indices”, J. of Water Resources Planning and Management, vol. 140, Issue 5, May 2014, pp. 620-629.
[8] M. E. J. Newman and M. Girvan, “Finding and Evaluating Community Structure in Networks”, Physical Review E, Vol. 69, Issue 2, Feb 2004, DOI 10.1103/PhysRevE.69.026113.
[9] E. Todini, “Looped Water Distribution Networks Design Using a Resilience Index Based Heuristic Approach”, Urban Water, Vol. 2, Issue 2, June 2000, pp. 115-122, DOI 10.1016/S1462-0758(00)00049-2.
[10] F. De Paola, N. Fontana, E. Galdiero, M. Giugni, D. Savic and G. Sorgenti degli Uberti, “Automatic Multi-Objective Sectorization of a Water Distribution Network”, 16th Conference on Water Distribution System Analysis, WDSA 2014.
[11] Ž. Vasilić, M. Stanić, D. Prodanović and Z. Kapelan, “Network Sectorisation Through Aggregation of Strong Connected Components”, 18th Conference on Water Distribution System Analysis, WDSA 2016, Cartagena de Indias, Colombia, 24-28 July 2016.
[12] R. Tarjan, “Depth-First Search and Linear Graph Algorithms”, SIAM Journal on Computing, Vol. 1, No. 2, June 1972.
[13] R. Sedgewick, Algorithms in C++, Part 5: Graph Algorithms, Addison Wesley, 2002.
Abstract— In this paper, a novel data assimilation method for improved management of an open channel hydraulic system is presented. As opposed to standard data assimilation methods, which either require linearization of the numerical simulation model or time-consuming data analysis steps, the proposed method involves the use of the “full” free surface flow equations and a simple data correction step based on proportional-integral-derivative (PID controller) theory. The novel method is verified in the case of the river Danube and the reservoir management of the Iron Gate I and Iron Gate II power plants. As described in this paper, the presented data assimilation methodology is particularly suitable for the operational management of systems with high inertia, such as reservoirs on the Danube River.

I. INTRODUCTION

Water resource management requires accurate and timely knowledge of the states of available water resources. In operational management, for a given forecast of input variables (inflows and meteorological data), a complex nonlinear 1D simulation model of river flow and power plant operation is used to optimize the energy production while satisfying the given constraints (water levels and needs of other water users). The imperative is to use an up-to-date simulation model, hence it has to be constantly updated with current river states (levels) and known flows (in power plants).

Data assimilation is the technique to keep the simulation model up to date. It is the process where, for each time step, the “most probable” state of the system is computed based on real-time measured variables (with their intrinsic uncertainty assessed by data quality control) and the results of the simulation model (assimilated to the previous time stamp). During assimilation, only system state variables are updated, while calibrated model parameters are kept constant.

Some well-known data assimilation methods are the Kalman Filter, particle tracking and data-variation [1]. To apply the most used one, the Kalman filter, the river flow equations require linearization [2], which opens new problems and uncertainties. The Kalman Filter can also be used with nonlinear models (known as the Extended Kalman Filter), but in each time step a time-consuming calculation of uncertainty is needed (mostly done using the Monte-Carlo method), which prevents the usage of this method in every-day operation. In some papers [3] the Extended Kalman Filter is compared with the particle tracking method, with the conclusion that both methods are applicable to nonlinear river models, but since both use the Monte-Carlo method in each time step, they are too slow for regular hourly assimilation purposes.

In this paper we present a feasible data assimilation technique for the real-time update of a model's hydraulic states based on PID (proportional-integral-derivative) controller theory. At every computational step, a controller with proportional and integral terms (PI controller) updates the water levels by tuning the water discharge along the river. Such assimilation is possible since the total volume of the Danube section upstream of HPP Iron Gate 1 is large and will filter out the PI controller's inability to adapt to the nonlinearity of the system.

Two case studies showed that the model is applicable to long and large rivers with high inertia, such as the Danube River's reach from Novi Sad to downstream of Iron Gate II. The proposed method is fast enough to be used on a regular daily basis, improving the optimization of hydropower production on the Danube river.

II. METHODS

The 1D model of the Danube river, between Novi Sad and the Iron Gate 1 and 2 HPPs, is formulated in the FEQ (Full EQuation Model) solver [4] and calibrated based on historical hydrologic data series. The level and flow rate at Iron Gate 1, as the downstream condition, are considered the quantities with the least uncertainty, while input flowrates are assumed to have a high margin of error. Data assimilation is applied at several characteristic sections along the river, where water levels are measured at hydrologic stations. Using a PI controller simulation in the FEQ model, the measured water levels are applied to correct the model's state (water level). The applied methodology assumes that the simulation model in FEQ is well calibrated, so the state changes induced by assimilation are small.

In the proposed method, dummy tributaries are added at the locations where measurements of water levels take place. At every computational time step, the discharge through the additional (dummy) river reach is corrected. This correction is actually the sum of the proportional and integral terms of a PI controller. Both terms are calculated using the differences between the measured (accurate) values of the water level and the values calculated using the hydraulic model. To calculate the increment (correction) of discharge through the dummy reach, the following formula is used:
ΔQ(t) = kp·e(t) + ki·∫₀ᵗ e(t) dt        (1)
where e(t) is the error in water levels (the difference between measured and predicted levels), kp is the coefficient of the proportional term and ki is the coefficient of the integral term. The procedure of correction is depicted in Fig. 1.

Figure 1. A block diagram of the PI controller applied to correct the system's state in the FEQ model. Hm is the measured water level (“set point”), and Hupt is the water level calculated (updated) using the FEQ model.

Figure 2. Portion of the Danube river between the Iron Gate 1 and Iron Gate 2 hydropower plants. The green line represents the dummy reach added to correct discharges and water levels.

III. RESULTS & DISCUSSION

The capability of the proposed method was tested in two case studies [5]. The application of the assimilation method to the 1D model of the river Danube between the Iron Gate 1 and Iron Gate 2 hydropower plants, during nonstationary flow through the turbines at both ends of the river reach, was analyzed in the first case study. The correction of model states during flood wave propagation on a portion of the Danube river between Novi Sad and Iron Gate 1 was analysed in the second case study.

In each case study, three simulations were done. The first simulation was run to generate model states that were considered accurate (“true”) states (levels and discharges). Random errors in input flowrates were added to the second simulation data: white noise was introduced to simulate “false” flowrate predictions at hydropower plants and hydrological stations. Using the same (false) input data, the assimilation technique was tested by running the third simulation, to conclude whether the proposed method is capable of correcting the false states obtained from the second simulation.

At the locations of both the Iron Gate 1 and Iron Gate 2 hydropower plants, the same real hydrographs were used as boundary conditions (Fig. 3). These hydrographs represent predicted discharges through the turbines of the hydropower plants. Zero discharge was used as the boundary condition at the location of the Gogos hydropower plant.

The method was assessed through the comparison of the computed water levels during the three simulations at the location where water levels are assumed to be measured (Fig. 4). By visual evaluation of the results, it can be concluded that the proposed method is capable of correcting water levels in conditions of highly variable flow rates induced by hydropower plant operation.
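The generation of the “false” inputs for the second simulation can be sketched as below. This is a hedged illustration: Gaussian white noise and the function name are our assumptions, not the study's code.

```python
import random

def perturb_flows(flows, sigma, seed=0):
    """Add white noise to a 'true' hydrograph to produce the 'false'
    input flowrates used in the second verification run (Gaussian
    noise assumed here; the distribution is not stated in the text)."""
    rng = random.Random(seed)  # seeded for reproducible experiments
    return [q + rng.gauss(0.0, sigma) for q in flows]
```

Re-running with the same seed reproduces identical “false” inputs, so the third (assimilated) simulation can be driven by exactly the same perturbed data as the second.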
Figure 3. The discharges through hydropower plants
The tuning of discharges, based on PI controller theory, was applied at every computational time step of the hydraulic model in both case studies (Δt = 15 min). Both coefficients of equation (1), the proportional and the integral gain, were held constant during the assimilation process. The proportional gain (kp) was set to 1000 m2/s and the integral gain (ki) was set to 1 m2/s2.
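With these gains, one correction step of equation (1) can be sketched as follows. This is an illustrative discretization, not the FEQ implementation; the rectangular integration rule is our assumption:

```python
def pi_discharge_correction(h_measured, h_modelled, integral_prev, dt,
                            kp=1000.0, ki=1.0):
    """One PI correction step per equation (1):
    e(t) = Hm - H_model;  dQ = kp*e + ki*integral(e dt).
    Returns the discharge increment dQ (m3/s) and the updated
    integral of the error (rectangular rule assumed)."""
    e = h_measured - h_modelled        # water level error (m)
    integral = integral_prev + e * dt  # accumulate error over time
    dq = kp * e + ki * integral        # kp in m2/s, ki in m2/s2
    return dq, integral
```

For example, a 5 cm level error at Δt = 900 s yields dQ = 1000·0.05 + 1·45 = 95 m3/s added through the dummy reach at that step.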
A satellite image of the river reach under consideration in the first case study is shown in Fig. 2. There are two downstream ends, the first at the location of the Iron Gate 2 hydropower plant, and the second at the location of the Gogos hydropower plant. The upstream end of the reach corresponds to the location of the Iron Gate 1 power plant. It is assumed that “accurate” level measurements are available near the town of Kladovo. At the same location, a dummy reach was added to the 1D model as a tributary of the Danube in order to apply the proposed method.
Figure 4. Water levels calculated at location near the town of Kladovo
Figure 5. Portion of the Danube river between Novi Sad and the Iron Gate 1 hydropower plant. Green lines represent dummy reaches added to correct discharges and water levels.

Figure 8. The comparison between “Corrected” (yellow line), “True” (dashed line) and “Predicted” (red line) water levels at the location of the Zemun hydrological station.
IV. CONCLUSION

The main goal of the presented study was to present a novel assimilation method for nonlinear river flow simulation models. We demonstrated that PID controller theory can be applied as a robust computational technique to keep a simulation model up to date. The proposed method requires the 1D river flow model to be extended by adding additional (dummy) river reaches. The proportional and integral gains of the PID controller were used to control discharges through the dummy reaches and, consequently, through the whole computational domain. The method was verified in two case studies that deal with different flow scenarios for the Danube reach between Novi Sad and Iron Gate 2. It was shown that more than one dummy reach can be used without the reaches affecting each other.
Figure 9. The comparison between “Corrected” (yellow line), “True” (dashed line) and “Predicted” (red line) water levels at the location of the Pančevo hydrological station.

ACKNOWLEDGMENT
The authors are grateful to the Serbian Ministry of Education, Science and Technological Development for its financial support (Project No. TR37010).
REFERENCES
[1] C.J. Hutton, L.S. Vamvakeridou-Lyroudia, Z. Kapelan and D. A. Savic, “Real-time modelling and Data Assimilation techniques for improving the accuracy of model predictions”, scientific report.
[2] P.O. Malaterre, “PILOTE: Linear quadratic optimal controller for
irrigation canals”, Journal of irrigation and drainage engineering,
1998, 124(4), pp.187-194.
[3] N. Jean-Baptiste, P.O. Malaterre, C. Dorée and J. Sau, “Data
assimilation for real-time estimation of hydraulic states and
unmeasured perturbations in a 1D hydrodynamic model”,
Mathematics and Computers in Simulation, 2011, 81(10),
pp.2201-2214.
[4] D.D. Franz and C.S. Melching, Full equations (FEQ) model for
the solution of the full, dynamic equations of motion for one-
dimensional unsteady flow in open channels and through control
structures, US Department of the Interior, US Geological Survey,
1997.
[5] Institute Jaroslav Černi (2016): Hydroinformatic system “Djerdap”. Report made for HE Djerdap power plants.

Figure 10. The comparison between “Corrected” (yellow line), “True” (dashed line) and “Predicted” (red line) water levels at the location near the town of Smederevo.
Abstract— This paper addresses the issue of providing data quality exchange in data management platforms to ensure adequate data interpretation across different systems. Data quality metadata exchange is analyzed in data exchange in hydrology and hydropower monitoring, with emphasis on the exchange of data for further use in mathematical modeling. A service-oriented solution is proposed for data quality modelling and exchange, for seamless integration into existing data management platforms. Examples from the hydro-information system for the Drina river basin are used throughout the paper.

I. INTRODUCTION

The value of digital data as market goods is becoming increasingly important in the current economy. The value of data is closely related to the perception of data quality and its usability in digital environments. However, the quality of data is not generally well defined and can be perceived quite differently across various data management platforms. Therefore, the real value of data in a specific system may be lessened because of diverse interpretations of data quality, especially when data is used in mathematical modelling and data assimilation. Evidently, there is a need for a unified approach to data quality exchange and metadata interoperability between various information systems.

Water represents a resource of vital importance and one of the main factors of economic and social development. The expansion of information technologies has provided the opportunity to collect huge amounts of data that are necessary for various analyses and decision-making in hydrology [1]. Integrated water resources management requires the collection of data from multiple data suppliers and its utilization by many different stakeholders to support the decision-making process. Sustainable business decisions are based on good data, which is the result of effective data quality management [2]. The level of data quality defines data usability. For that reason, information on data quality must be implemented within the data exchange process.

The data exchange process in multidisciplinary research is based on a many-to-many relation between data suppliers and data consumers. Improvement of metadata interoperability is required to meet the needs of the data interpretation process [3]. New technologies have opened the possibility of, and created the demand for, exchanging data in machine-automated and interoperable ways [4].

Data exchange is vital for water resource management and there are already numerous data management platforms and best practices available [5]. However, simulation models and decision support systems for water resource management are very sensitive regarding data quality, and usually this quality is application-specific. The standards for data exchange, e.g. WaterML 2.0, do not incorporate data quality metadata exchange or metadata interoperability [6]. Therefore, the usability of exchanged data in a consumer application may differ from that in the supplier application.

II. DATA QUALITY METADATA IN DATA MANAGEMENT PLATFORM FOR HYDROPOWER SYSTEMS

Nowadays, there are various data management platforms for hydropower systems. Usually, Supervisory Control and Data Acquisition (SCADA) systems are used for critical processes in hydropower facilities. Due to the sensitivity of the control issue, data exchange is limited and unidirectional (SCADA is the data provider). However, other important data is available from other providers, such as monitoring networks and public services. Special data management platforms are used for centralizing data in hydropower systems, and these platforms are used in conjunction with mathematical models for decision support. Generally, these specialized platforms are known as hydro-information systems.

The concept of a hydro-information system has been developed recently and it connects measurement systems and simulation models for water resources planning and management [7]. Although hydro-information systems are not necessarily designed on service-oriented principles, everyday practice demonstrates that it is highly desirable for the software solution of such a system to be defined as an open service-oriented architecture, due to the diversity of measurement systems and simulation models. Such an approach would simplify the complexity of a solution and improve the re-usability of the software, along with the possibility of a dynamic integration of different components and simplification of their maintenance and development.
Because of the complexity of water resources exploitation, of both natural and artificial origin, hydro-information systems have been designed as platforms with broad extension possibilities, whose structure is frequently modified because of changes in the mode of exploitation. Regardless of a HIS system or the goals of its development, the most important part of such a system is its hydrologic component, because the natural water flow is the most important factor in water resources management. However, hydro-information systems represent potentially broader systems that also include electricity generation, irrigation, water supply and other artificial activities within a system. The use of complex hydrological models is common in hydro-information systems, as is the case in the HIS for the Drina river basin.

The “Drina” hydro-information system (HIS “Drina” for short) is a distributed hydro-information system designed according to a service-oriented architecture, created for decision-making support for the management of the waters in the river Drina basin. It consists of various data management services, databases, and numerical modules. An overview of the HIS “Drina” architecture is shown in Fig. 1.

The system is built on commercial technologies, with the use of a SQL Server database and the .NET Framework. There are various schemas defined in the Microsoft SQL Server central database, and GIS-related data is stored in an ArcHydro-based schema. All the services are proprietary, with open interface specifications, and are published as Web Services.

The full use of HIS “Drina” relies on the availability of data gathered from monitoring networks. The data required as simulation input are mostly meteorological measurements, while the data necessary for state updating procedures [8] also include hydrological measurements and power plants' production and other information important for the interpretation of the current state of the river basin. Acquisition of this data is mostly automated, done by data exchange with other information systems through specially designed services. Some data are measured manually and acquired through a manual entry application by staff. In total, over 200 measurement stations are included in the acquisition and represented with real-time and historic data. This data is readily available for use in user applications and numerical modules. Selected data can be monitored through the HIS “Drina” web portal, which provides data to all interested parties within the river basin.

Figure 2. Data quality metadata in HIS “Drina”

Data may be acquired with or without a quality measure. When no data quality is provided, a proprietary data quality mechanism is applied to acquired data before archiving. This is done through the use of a data receive and quality control layer. This layer uses a measured data quality assessment algorithm, and is based on a schema
215
7th International Conference on Information Society and Technology ICIST 2017
repository. Schemas are used for documenting data quality metadata, so that every measurement stored in the central database is referenced to the corresponding quality assessment method [9] used for obtaining its data quality measure. An example of a data quality schema is shown in Fig. 2.

The schema repository is managed through administrative tools and is updated when a new data provider is introduced into the acquisition process. Schemas may be provided for further use along with HIS data.

III. METHODOLOGY FOR DATA QUALITY METADATA EXCHANGE

Definition and delegation of roles, responsibilities, policies, and procedures within the process of acquisition, maintenance, dissemination, and disposition of data is known as data quality management. Data quality is a multidimensional function. Decision support systems must be based on data that are complete, correct, and timely. These data quality dimensions define the degree of data quality [10].

The proposed data quality exchange methodology relies on data quality metadata interoperability as a basis for data quality reusability across various data management systems. The data whose quality is being exchanged is represented as time series, and the quality of each time series is assessed using application-specific metrics. The result of the assessment is data with a data quality measure, which is usually exchanged without metadata on quality.

For use in this platform, an interoperable schema defines the application-specific metrics. The platform is designed to handle both continuous and discrete data streams, and is based on web services for metadata exchange. The data quality metadata is tailored for hydrological and hydropower monitoring, i.e. point-based real-time monitoring.

In the scenario where data is exchanged between a supplying and a consuming service, the issue of data quality exchange should be carefully addressed. The supplier and the consumer usually do not use the same data quality assessment algorithms, so it is crucial that metadata on data quality assessment is also exchanged. Ideally, the consumer has already been provided with metadata at the start of data exchange, through a formal procedure or service agreement. More often, the metadata is missing when the consumer receives the data, so it needs to be requested from the supplier. The data may also reference an unknown schema, which likewise needs to be resolved.

The web service for data quality exchange is used whenever there is a quality measure in the stream that is described by an unknown schema. The consumer service prompts the supplier service with a request, and the supplier service exchanges the metadata with the consumer. The provided metadata is stored in the consumer’s schema repository. Based on the exchanged metadata, the consumer then interprets the quality of the received data.

One of the main features of HIS “Drina” is the system for quality control of measured technical data. This quality control provides insight into data accuracy, and the data is further used in numerical simulations and decision making.

Measured technical data is formed by measuring hydrotechnical and other important parameters. Apart from the data value, it has attributes that include a record of the measuring conditions, the time of measurement, measuring equipment information, and other relevant information. One of the most important attributes is the information about data quality. This attribute is calculated and added to the data in the process of quality control of measured technical data.

Quality control of measured technical data has an important place in a HIS and can be implemented in two ways, depending on the interoperability of metadata from the data supplier:
• If data quality metadata is available, the HIS interprets the quality of the received data based on the exchanged metadata and stores the original data quality measure, per the proposed methodology.
• If data quality metadata is not available, HIS-specific quality control of measured technical data is executed after the acquisition process that
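The supplier/consumer metadata exchange described in this section can be sketched as follows. All class, method and schema names below are illustrative assumptions, not the actual HIS “Drina” service interfaces:

```python
# Sketch of the supplier/consumer data-quality metadata exchange.
# All names here are illustrative assumptions, not real HIS services.

class SchemaRepository:
    """Consumer-side store of known data-quality schemas."""
    def __init__(self):
        self._schemas = {}

    def knows(self, schema_id):
        return schema_id in self._schemas

    def store(self, schema_id, metadata):
        self._schemas[schema_id] = metadata

    def interpret(self, schema_id, quality_code):
        # Map a raw quality code to a label using the schema metadata.
        return self._schemas[schema_id]["labels"][quality_code]


class SupplierService:
    """Supplier side: answers metadata requests for its schemas."""
    def __init__(self, schemas):
        self._schemas = schemas

    def request_metadata(self, schema_id):
        return self._schemas[schema_id]


def receive_measurement(sample, repository, supplier):
    """On receipt, resolve unknown schemas before interpreting quality."""
    schema_id = sample["schema"]
    if not repository.knows(schema_id):
        # Unknown schema: prompt the supplier service and archive the
        # returned metadata in the consumer's schema repository.
        repository.store(schema_id, supplier.request_metadata(schema_id))
    return repository.interpret(schema_id, sample["quality"])


# Example: a sample arrives with a quality code whose schema the
# consumer has never seen; the metadata is fetched on demand.
supplier = SupplierService({
    "quality-schema-1": {"labels": {0: "good", 1: "suspect", 2: "bad"}}
})
repo = SchemaRepository()
sample = {"value": 132.4, "schema": "quality-schema-1", "quality": 1}
print(receive_measurement(sample, repo, supplier))  # suspect
```

Once the metadata is cached in the repository, subsequent samples with the same schema are interpreted without contacting the supplier, matching the exchange-once flow described above.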
Abstract— A procedure for modelling mean monthly flow time series using records of the Great Morava River (Serbia) is presented in this paper. The main assumption of the conducted research is that a time series of monthly flow rates represents a stochastic process comprised of deterministic, stochastic and random components. The first component can be further decomposed into a composite trend and two periodic components. The deterministic component of a monthly flow time-series is assessed by spectral analysis, whereas its stochastic component is modelled using cross-correlation transfer functions, artificial neural networks and polynomial regression. The results suggest that the deterministic component can be expressed solely as a function of time, whereas the stochastic component changes as a nonlinear function of climatic factors (rainfall and temperature). The results indicate a lower value of Kling-Gupta Efficiency in the case of transfer functions, whereas artificial neural networks and polynomial regression suggest a significantly better match between the observed and simulated values. It seems that transfer functions fail to capture high monthly flow rates, whereas the model based on polynomial regression reproduces high monthly flows much better because it is able to successfully capture a highly nonlinear relationship between the inputs and the output.

I. INTRODUCTION

There are a number of hydrological models for flow estimation which are based on long-term flow statistics [6] or which use the conventional mathematical approach based on dependence among inputs and the output at a few successive lags [1]. The former approach preserves the characteristics of the hydrologic process in the long run, while the latter models can reproduce the short-term hydrologic pattern. The motivation of this research is to introduce an approach that preserves both long-term and short-term characteristics of hydrologic time-series on two time scales (annual and monthly), using a disaggregated time series.

The underlying assumption is that the time series can be decomposed into deterministic, stochastic and random parts. The deterministic part consists of the composite trend, macro-periodic component and seasonal component. These components describe low-frequency and seasonal variability, while high-frequency variability is represented by the stochastic component. In the present research, the stochastic component is modelled by cross-correlation transfer functions (TFs) that describe linear relations among the flows and climatic data. For the same purpose, artificial neural networks (ANN) and polynomial regression (PLR) are used as techniques capable of modelling the complex relations between monthly flows and climatic inputs. In such cases, ANN and PLR serve as tools for examining the accuracy of the approach based on transfer functions, with the ultimate goal of deriving the most reliable model for the stochastic component of monthly flow time-series.

II. METHODOLOGY

The present research relies on the basic assumption that a monthly flow time series Q_t represents the sum of three components [7]:

Q_t = Det_t + Stoch_t + error_t    (1)
Q_t = Q_T + Q_P + Q_S(t) + Y(t) + e(t),  t = 1, 2, ..., N

where Q_T is the composite trend, Q_P is the long-term periodic component, Q_S is the seasonal component, Y(t) is the stochastic component and N denotes the sample size. The addend e(t) is the error term (random time series).

We model the deterministic component separately from the stochastic component. The composite trend and the macro-periodic component constitute the annual deterministic component, which is modelled on an annual time scale, whereas the seasonal component is modelled on a monthly time scale. After the annual deterministic component is modelled, it is downscaled to monthly intervals, so as to subtract it from the monthly flow time-series. The first- and second-order residuals are derived in this way. The latter contains a significant seasonal component, which is modelled further using the monthly flow time-series.

The composite trend is modelled using the Linear Moving Window (LMW) method [4]. Once the composite linear trend Q_T has been determined, it is downscaled from the annual to a monthly time scale. The first-order
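As a rough numerical illustration of the decomposition in Eq. (1), the sketch below derives first- and second-order residuals on synthetic monthly data. The ordinary least-squares trend and per-month means are generic stand-ins, not the paper's LMW and spectral procedures:

```python
# Illustrative sketch of Q_t = Q_T + Q_S(t) + residual on synthetic
# data; OLS trend and monthly means stand in for LMW/spectral steps.

import numpy as np

rng = np.random.default_rng(0)
n_years, months = 40, 12
t = np.arange(n_years * months)

# Synthetic monthly flows: linear trend + seasonal cycle + noise.
q = (300.0 + 0.05 * t
     + 80.0 * np.sin(2 * np.pi * t / months)
     + rng.normal(0, 20, t.size))

# Composite trend Q_T: a least-squares line (stand-in for LMW).
slope, intercept = np.polyfit(t, q, 1)
trend = slope * t + intercept

# Seasonal component Q_S: mean of the detrended series per month.
resid1 = q - trend                                   # first-order residuals
seasonal = np.array([resid1[m::months].mean() for m in range(months)])
resid2 = resid1 - np.tile(seasonal, n_years)         # second-order residuals

# Removing the seasonal component shrinks the residual variance.
print(resid2.std() < resid1.std())  # True
```

The second-order residuals are what remains for the stochastic component Y(t) plus the error term e(t) in the paper's framework.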
[Figure 1 here: panel titles “Composite trend” and “(b) Observed flow rate”; axis tick labels (years 1931-2012, flows 0-1200 m3/s) omitted.]
Figure 1. Observed monthly flow rates together with composite trend, macroperiodical component (a) and seasonal component (c) of the Great
Morava River at h.s. Ljubičevski Most, 1931-2012.
Figure 2. Observed and modelled monthly flow rates of the Great Morava River at h.s. Ljubičevski Most for the calibration period (1950-1990):
(a) TF, (b) ANN, (c) PLR.
Figure 3. Predictive power of proposed models using (a) TF, (b) ANN and (c) PLR for the validation phase (1991-2012).
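The Kling-Gupta Efficiency used above to compare the TF, ANN and PLR models can be computed from its standard definition, KGE = 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2), where r is the linear correlation, alpha the ratio of standard deviations (sim/obs) and beta the ratio of means. The data below is illustrative, not the Great Morava record:

```python
# Sketch of the standard Kling-Gupta Efficiency; illustrative data.

import numpy as np

def kge(obs, sim):
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]          # correlation term
    alpha = sim.std() / obs.std()            # variability ratio
    beta = sim.mean() / obs.mean()           # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

obs = np.array([120.0, 340.0, 560.0, 410.0, 230.0, 180.0])
print(round(kge(obs, obs), 3))         # 1.0  (a perfect model)
print(kge(obs, obs * 0.5) < 1.0)       # True (biased simulation scores lower)
```

A KGE of 1 indicates a perfect match; lower values reflect loss of correlation, variability, or bias, which is consistent with the paper's observation that TFs score below ANN and PLR.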
ACKNOWLEDGMENT
This study was supported by the Ministry of Education,
Science and Technological Development of the Republic
of Serbia as a part of the project TR 37013 "System
development to support optimal sustainability of the high
dams in Serbia, 2010-2017".
REFERENCES
[1] S. Abudu, J.P. King, and A.S. Bawazir, “Forecasting monthly
streamflow of spring-summer runoff season in Rio Grande
headwaters basin using stochastic hybrid modeling approach”,
Journal of Hydrologic Engineering, vol. 16, pp. 384-390, 2011.
[2] G.E.P. Box, G.M. Jenkins, and G.C. Reinsel. Time Series
Analysis, Forecasting and Control. Fourth Edition. John Wiley &
Sons, INC., Publication. USA. 2008.
[3] M. Khandelwal, and T.N. Singh. “Prediction of macerals contents
of Indian coals from proximate and ultimate analyses using
artificial neural networks” Fuel. vol. 89, pp.1101-1109. 2010.
[4] M. Stojković, S. Kostić, S. Prohaska. J. Plavšić, and V. Tripković
“A New Approach for Trend Assessment of Annual Streamflows:
a Case Study of Hydropower Plants in Serbia”. Water Resource
Management. vol. 31, pp. 1089-1103, 2017.
[5] S. Kostić, M. Stojković, S. Prohaska, and N. Vasović. “Modeling
of river flow rate as a function of rainfall and temperature using
response surface methodology based on historical time series”.
Journal of Hydroinformatics. vol. 18. 2016.
[6] M. Stojković, J. Plavšić, S. Prohaska. “Annual and seasonal
discharge prediction in the middle Danube River basin based on a
modified TIPS methodology”. Journal of Hydrology and
Hydromechanics. vol. 65, pp. 165–174, 2017.
[7] M. Stojković, S. Kostić, J. Plavšić, and S. Prohaska., “A joint
stochastic-deterministic approach for long-term and short-term
modelling of monthly flow rates”. Journal of Hydrology. vol. 544,
pp. 555–566, 2017.
integrate measured data, historical data, prognostic data, analytical tools, and expert knowledge.

Our main goal was to develop a hydro-information system as an EDSS component that covers different aspects of data management while maintaining its adaptive potential. By finding a balance between commercial and open-source key features, our solution was planned according to the structure of the monitoring system, but it also focused on data flow, its analysis, storage, and presentation.

II. MATERIALS AND METHODS

A. Data storage and calculations

A hydro-information system (Fig. 1), as a heterogeneous data-integration platform, can benefit from server-client architecture, owing to the development of web-based technologies and their increased standardization. Database servers are common building blocks in the development of hydrological data applications, and they often rely on either structured (SQL) or non-structured (NoSQL, “not only SQL”) database architecture [10] [14]. Starting from acquired raw time series data, stored predominantly in a Microsoft SQL Server database [15], our first step was to develop a more flexible data solution for the elements of the monitoring system. Time series and monitoring instrumentation data are characterized by high spatial and temporal variability, and no fixed model can be determined in advance for all possible measurements and types of temporal dynamics. Therefore, NoSQL was a more natural choice than SQL. NoSQL databases are also often preferred when building hydrological data-centric systems [16] [17] [18] [19]. Effective access to stored data, along with flexible scaling as new measurement data are added [20], were also among the reasons for our NoSQL solution. The monitoring system was also subject to changes over the decades of monitoring practice, meaning a flexible database model should reflect the temporal nature of the monitoring system.

Our solution was built upon a two-fold storage server architecture, with the central component incorporating the monitoring system organizational data and the auxiliary component storing the time-series data. Time series are kept consistent with the monitoring system storage through a synchronization model. This model is based on the premise of storing series metadata (i.e. the corresponding monitoring instruments and measurement units) in the central database component, while the raw data is stored in a relational database [15] indexed by a unique identifier. This organization enables straightforward and fast retrieval of series data using simple database queries, while each series remains connected to its origin in the monitoring system. The central component was developed independently, from the ground up, based upon a custom data model which primarily enables storage of multiple temporal varieties of the same monitoring system component as a unique database record. Additionally, the central component enables users to define custom data records, such as new time series or complex measurement units, since no rigid model is enforced.

The calculations server is associated with the two-fold storage server, and its primary role is both on-demand and automatic calculation of derived time series (i.e. aggregation or composition) and of the statistical models used in real-time data monitoring and data quality assurance.

B. Web API and user experience components

Uniform access to all stored content, with standardized methods for data retrieval and administration, was achieved by using a web application programming interface (API). Its main role is to expose a variety of data access and update methods, which are available to any client application. It communicates directly with both database components by means of a request-response mechanism. More specifically, the central database component (NoSQL) directly offers its data access methods to the API, while time-series data is retrieved through a web service. This service is also
Figure 1. Overview of the hydro-information system structural components. Software technology symbols denote their usage in different parts of the
system. ETL stands for Extraction Transformation Loading process of data acquisition.
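The two-fold storage synchronization model described above can be sketched as follows, with SQLite standing in for the relational time-series store and a plain dictionary for the central (NoSQL) component. All identifiers and field names are illustrative assumptions:

```python
# Sketch of the two-fold storage model: series metadata in the central
# component, raw samples in a relational table keyed by a series id.
# SQLite and the dict are stand-ins; names are illustrative.

import sqlite3

# Central component: flexible per-series metadata records.
central = {
    "ts-001": {"instrument": "limnigraph L-12", "unit": "m"},
}

# Auxiliary component: raw time-series data in a relational table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE samples (series_id TEXT, t TEXT, value REAL)")
db.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [("ts-001", "2017-01-01T00:00", 1.82),
     ("ts-001", "2017-01-01T01:00", 1.85)],
)

def fetch_series(series_id):
    """Join metadata and raw data through the shared unique identifier."""
    meta = central[series_id]
    rows = db.execute(
        "SELECT t, value FROM samples WHERE series_id = ? ORDER BY t",
        (series_id,),
    ).fetchall()
    return {"meta": meta, "data": rows}

series = fetch_series("ts-001")
print(series["meta"]["unit"], len(series["data"]))  # m 2
```

The unique identifier is the only coupling between the two stores, which mirrors the paper's point that each series stays connected to its origin in the monitoring system while the raw data remains in a simple, fast relational layout.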
interconnected with the calculations server, so time-series data responses can be either raw or transformed through one of the series derivation methods.

Client applications were designed with respect to end-user access rights, which can range from data overview and administration to complex data analysis, quality assurance, and real-time monitoring. Broadly, these applications can be classified as database administration tools, data presentation and historical overview tools, and tools for time-series data analysis. Although separate, they share the same underlying organization, primarily the API-related communication methods and a focus on the monitoring system and time-series data. Each application offers rich data visualizations, from the usual data tables and time-series plots to complex monitoring area maps and power station schematics. The client applications are all web-based, developed with contemporary web technologies (HTML5, CSS3 and JavaScript), and in particular using MVVM (Model-View-ViewModel) web frameworks [21] [22]. The MVVM software design pattern enables development of the graphical user interface independently of the application logic and data manipulation methods [23]. This approach greatly facilitates the overall development process by decoupling its main constituent parts, which results in efficient code reuse and parallel development [24] [25]. All components of an MVVM user interface stay synchronized with the underlying data models, meaning that all user actions result in immediate data view updates. This feature was especially important for the timeline mechanism, which enables on-demand insight into historical configurations of the monitoring and instrumentation system and its accompanying time-series data.

III. RESULTS AND DISCUSSION

A. Hydro-information system performance

Our hydro-information system was successfully developed, and its initial release versions are now fully operational in a real-life industry environment. The entire system comprises several servers, whose primary role is to store the database components and host the client application code, and a suite of client applications easily accessible through common web browsers.

With respect to the decisions we made during development, we could confirm the advantages of web-based technologies, which include easy application life-cycle management and an enhanced user experience, providing functional and aesthetically engaging graphical user interfaces suited for comfortable day-to-day work. Our system fulfills the main functional features of a general hydro-information system [6]. Data is organized and displayed through meaningful and easily navigable categories, using diverse data visualization techniques: from simple tabular and graphical display of time series data, through their combinations, to complex area maps and power station schematics. The integrative aspect of the hydro-information system is reflected in the network of different pieces of information. For example, it is possible to display a detailed area map with all the monitoring locations and obtain information about the measurement equipment together with a plot of the time series data. The cell-based panel display offers custom data view design (i.e. an area map in one cell and the corresponding data diagrams in adjacent ones). Finally, because of the temporal database access, users can shift the entire data view back to any point in the past by using the timeline mechanics. This feature enables a much broader outlook on the distribution and organization of the monitoring system and its dynamics and development through time.

Besides detailed historical data management, one of the keynote features of our hydro-information system is its ability to forecast future data by providing tools for creating integrated data models. Each time series of interest for successful hydro-electric management can be associated with an appropriate data model, while the model itself can be easily visualized along with the historical or real-time data. Models can also be extended to an arbitrary future time and used as forecasts. Time-series modeling provides a basis for quality assurance of both historical and real-time data. An early warning system uses confidence-interval quality control and automatically issues warnings whenever new data falls outside the control interval.

The custom central database component provides a flexible data model of the monitoring system. Users can define new data types according to their possible usage. Structural changes of the monitoring system (i.e. adding new instruments) always result in the most up-to-date database configuration, while its historical states remain available. Finally, the time-series data is always in synchronization with the monitoring system, offering both raw instrument measurement values and derived values.

B. Role in decision-making and future work

A reliable and efficient data management platform is a cornerstone of any EDSS application [15]. By offering an accessible and visually engaging solution, we have created a solid base and a variety of functional components which can be further upgraded and extended with new features. Decision makers gain both general and specific insight into the environmental and technical conditions driving the production of electrical energy. Constant availability of accurate data offers integrated knowledge to a broad range of interested users, workers, researchers, industry experts and practicing engineers, which is one of the most important features of data management platforms for hydrological data [19] [20]. Our hydro-information system bridges the gap between complex, abundant, temporally dependent data and the overall understanding of a complex socio-economic-environmental process such as the production of electrical energy.

Owing to system-wide extensibility, future work on our hydro-information system will focus on adding more EDSS features, mainly by introducing additional data sources, adding new decision-making tools [1] [3] [5], and enhancing and integrating available hydrological and hydraulic modeling frameworks [6] [26] [27] [28].

REFERENCES

[1] A.P. Simpson, Decision Making in Energy: Advancing Technical, Environmental, and Economic Perspectives, PhD Thesis, Stanford University, 2010.
[2] D.J. Power, Decision Support Systems: Concepts and Resources for Managers, 1st ed. Westport, Connecticut: Quorum Books, 2002.
[3] J. Mysiak, C. Giupponi, and P. Rosato, “Towards the development of a decision support system for water resource management”, Environ. Mod. & Software, vol. 20(2), pp. 203-214, 2005.
[4] M. Matthies, C. Guipponi, and B. Ostendorf, “Environmental Decision Support Systems: Current Issues, Methods and Tools”, Environ. Mod. & Software, vol. 22(2), pp. 123-127, 2007.
[5] Y. Liu, H. Gupta, E. Springer, and T. Wagner, “Linking Science with Environmental Decision Making: Experiences from an Integrated Modeling Approach to Supporting Sustainable Water Resources Management”, Environ. Mod. & Software, vol. 23(7), pp. 846-858, 2008.
[6] D. Divac, N. Grujović, N. Milivojević, Z. Stojanović, and Z. Simić, “Hydro-Information Systems and Management of Hydropower Resources in Serbia”, J. Ser. Soc. Comp. Mech., vol. 3(1), pp. 1-37, 2009.
[7] VistaDataVision, http://www.vistadatavision.com/, 2017.
[8] AquaticInformatics, http://aquaticinformatics.com/, 2017.
[9] Bgeo, http://www.bgeo.es/, 2017.
[10] L.G. Conner, D.P. Ames, and R.A. Gill, “HydroServer Lite as an open source solution for archiving and sharing environmental data for independent university labs”, Ecol. Inf., vol. 18, pp. 171-177, 2013.
[11] U. Cortés, M. Sànchez-Marrè, L. Ceccaroni, I. Roda, and M. Poch, “Artificial intelligence and environmental decision support systems”, Appl. Intelligence, vol. 13(1), pp. 77-91, 2000.
[12] J. J. Harou et al., “An open-source model platform for water management that links models to a generic user-interface and data-manager”, iEMSs Fifth Biennial Meeting, Ottawa, Canada, 2010.
[13] N. Milivojević, N. Grujović, D. Divac, V. Milivojević and R. Martać, “Information System for Dam Safety Management”, ICIST, vol. 1, pp. 56-60, 2014.
[14] D. P. Ames et al., “HydroDesktop: Web services-based software for hydrologic data discovery, download, visualization, and analysis”, Environ. Mod. & Software, vol. 37, pp. 146-156, 2012.
[15] Microsoft SQL Server, https://www.microsoft.com/en-us/sql-server/, 2016.
[16] M. P. McGuire, C. M.C. Roberge, and J. Lian, “Hydrocloud: a Cloud-Based System for Hydrologic Data Integration and Analysis”, IEEE International Conference on Computing for Geospatial Research and Application, pp. 9-16, 2014.
[17] C. Vitolo, Y. Elkhatib, D. Reusser, C. J. MacLeod, and W. Buytaert, “Web Technologies for Environmental Big Data”, Environ. Mod. & Software, vol. 63, pp. 185-198, 2015.
[18] A. Tello, R. Freire, M. Espinoza, and V. Saquicela, “Integration and Massive Storage of Hydro-Meteorological Data Combining Big Data & Semantic Web Technologies”, Maskana, vol. 6(Supl.), pp. 165-172, 2015.
[19] Y. Ma et al., “Remote Sensing Big Data Computing: Challenges and Opportunities”, Fut. Gen. Comp. Sys., vol. 51, pp. 47-60, 2015.
[20] A. Oussous, F. Z. Benjelloun, A. A. Lahcen, and S. Belfkih, “Comparison and Classification of NoSQL Databases for Big Data”, Int. Journ. Database Theo. App., vol. 6(4), 2013.
[21] AngularJS, https://angularjs.org/, 2016.
[22] ASP.NET, https://www.asp.net/, 2016.
[23] A. Syromiatnikov, and D. Weyns, “A Journey through the Land of Model-View Design patterns”, IEEE/IFIP Conference on Software Architecture (WICSA), pp. 21-30, 2014.
[24] M. Fowler, Analysis Patterns: Reusable Object Models, 1st ed. Addison-Wesley Professional, 1997.
[25] J. Vlissides, R. Helm, R. Johnson, and E. Gamma, Design Patterns: Elements of Reusable Object-oriented Software, 1st ed. Reading: Addison-Wesley, 1995.
[26] B. Stojanović, M. Milivojević, N. Milivojević, and D. Antonijević, “A Self-tuning System for Dam Behavior Modeling Based on Evolving Artificial Neural Networks”, Adv. Eng. Software, vol. 97, pp. 85-95, 2016.
[27] N. Milivojević, Z. Simić, A. Orlić, V. Milivojević, and B. Stojanović, “Parameter Estimation and Validation of the Proposed SWAT Based Rainfall-Runoff Model – Methods and Outcomes”, J. Ser. Soc. Comp. Mech., vol. 3(1), pp. 86-110, 2009.
[28] S. Kostić, M. Stojković, and S. Prohaska, “Hydrological Flow Rate Estimation Using Artificial Neural Networks: Model Development and Potential Applications”, Appl. Mat. Comp., vol. 291, pp. 373-385, 2016.
Abstract—For a long time, analytical models have been applied in the modeling, prediction and monitoring of water losses and seepage in hydrotechnical objects (HO). Statistical models based on multiple linear regression (MLR) have been shown to be more or less successful for modeling processes in many domains, but have not been a common choice for modeling the above-mentioned processes. The rapid improvement of sensor and data acquisition technologies enables the development of statistical models in basic and advanced MLR forms, such as hierarchical regression, stepwise regression (SWR), robust regression, and ridge regression. This paper presents a framework for the application of advanced statistical tools in the modeling and evaluation of water losses and seepage in hydrotechnical objects, with a focus on stepwise regression. The developed framework provides a platform for software generation of adequate regression models of seepage and water losses, considering both the number and the type of regressors.

I. INTRODUCTION

The creation of mathematical models for water losses and seepage processes in hydraulic objects (HO) is of great importance for HO functionality and safety. In the last few decades, analytical models describing the above-mentioned processes have been developed. The use of statistical models has not been significant, although they are characterized by simplicity of formulation, speed of execution and the ability to capture any type of correlation between independent and response variables. To the best of our knowledge, statistical models in multiple linear regression (MLR) form have not been a common approach for modeling water losses and seepage processes in HO.

Based on a review of the current literature, we found that there are few studies in this domain of research. In the field of modeling water losses and seepage processes in HO, a number of models based on experimental methods with resulting analytical solutions have been shown to be more or less successful [1, 2]. In [1], an experimental method is used for modeling water losses in a hydraulic tunnel: an analytical model was established describing the relationship between water losses and the main influences upon which they depend, such as internal water pressure in the tunnel, groundwater pressure in the area above the tunnel, and tunnel lining temperature. In [2], analytical solutions based on conformal mapping of complex variable methods are derived for two-dimensional, steady seepage into an underwater circular tunnel.

Models based on numerical methods, such as the finite element method, have been developed as well. In [3], an interlaced algorithm based on the finite element method was suggested to solve the coupled processes of non-steady seepage flow and non-linear deformation for a concrete-faced rockfill dam. In [4], stress-strain analysis, underground water flow influence analysis and filtration analysis were performed for a hydrotechnical tunnel in the excavation phase.

Recently, numerical and statistical methods have been enriched with various heuristics from the artificial intelligence (AI) domain, creating hybrid models that combine their advantages. In [5], phreatic line detection, a major challenge in seepage problems, is accomplished with the Natural Element Method (NEM) and a Genetic Algorithm (GA).

Although AI models achieve high performance, mathematical models in MLR form are still very useful. The problem of determining the appropriate number and type of regressors in MLR models is still present, and this open question preoccupies many researchers. In light of the above, we posed the following research question: Is it possible, at a sufficiently high quality level, to automate the procedure for selecting the number and type of regressors that describe the water losses and seepage processes in HO such as dams, hydraulic tunnels, etc.?

In general, the modern AI approach provides certain hybrid solutions for this problem. In [6], in the domain of the structural behavior of concrete dams, the authors developed a method for model optimization in terms of model complexity and accuracy (regularization) [7], based on a hybrid GA and MLR approach.

In this paper, for the purpose of solving the regularization problem in MLR models of seepage and water losses processes in HO, we focus on the enhanced statistical technique of stepwise regression, mainly due to the high execution speed it offers.

II. THEORETICAL BACKGROUND

The following sections provide a brief outline of the theoretical base employed for the statistical modeling of water losses and seepage in hydrotechnical objects.
7th International Conference on Information Society and Technology ICIST 2017
Let X denote the set of all independent variables, and let X_(k) denote the set of all independent variables except x_k. Then the squared semipartial correlation coefficient is expressed as follows in (7):

$$sr_k^2 = R^2 - R_{(k)}^2 \qquad (7)$$

Therefore, in order to obtain the unique contribution of the predictor x_k to the coefficient of determination of the regression model, R^2, it is first necessary to regress the dependent variable y on all independent variables (X), and then to regress y on all variables except x_k (X_(k)). The difference between the obtained R^2 values is the squared semipartial correlation coefficient. Here N is the number of measurements and K is the number of regressors. The meaning of the semipartial correlation coefficient can be expressed in different ways; one way is through the Ballantine chart in Fig. 1. Stepwise regression is explained in detail in [11, 12].
III. MODELING METHODOLOGY
Development of statistical mathematical models of real objects and systems is a very demanding research and engineering activity that involves the use of a subtle mathematical apparatus and a large number of decisions based on both theoretical knowledge and empirical evidence. Although it is impossible to develop turnkey solutions, it is possible to define critical steps and an indicative algorithm to generate adequate regression models.
A. Proposed algorithm
For the statistical modeling of water losses and seepage in hydrotechnical objects, the algorithm given in Fig. 2 is proposed. Key features of the proposed algorithm are described in the following sections.
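To make (7) concrete, the following sketch computes a squared semipartial correlation on synthetic data; the helper `r_squared` and all data are illustrative, with NumPy's least-squares routine standing in for the regression step:

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on the columns of X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

# Synthetic data: y depends strongly on x1 and weakly on x2
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 * x1 + 0.5 * x2 + rng.normal(scale=0.3, size=n)

X_full = np.column_stack([x1, x2])   # regression on all variables
X_without_x2 = x1.reshape(-1, 1)     # regression without x_k (here x2)

# Squared semipartial correlation of x2: difference of the two R^2 values, as in (7)
sr2 = r_squared(X_full, y) - r_squared(X_without_x2, y)
print(round(sr2, 4))
```

The difference isolates the unique contribution of x_2 after the overlap with x_1 has been credited to the larger model, which is exactly what the Ballantine chart depicts.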
domains of applied statistics and computer science. This refers to the theoretical knowledge and laws, as well as the experience of researchers and the results achieved in previous periods. Without a well-grounded theoretical platform, software tools can easily miss the target.
C. Handling missing data and outliers
Regression models are very sensitive to the occurrence of outliers and of missing data, so the process of anomaly detection in the data, or among the predictors, is of crucial importance.
For dealing with missing data, different strategies can be applied: linear interpolation, spline interpolation, regression interpolation, replacing missing data with mean values, moving averages, etc.
The problem of outlier detection and dealing with outliers is far more complex. For this purpose we recommend the use of the following packages developed in the R programming language environment [13]:
• zoo – provides various methods for replacing missing data, including cubic spline interpolation, linear interpolation, etc.
• tsoutliers – provides time series analysis and outlier detection methods based on an iterative outlier detection and adjustment procedure that obtains joint estimates of model parameters and outlier effects [14]. Five types of outliers are considered: innovational outliers, additive outliers, level shifts, temporary changes and seasonal level shifts.
• forecast – provides methods and tools for analyzing univariate time series.
• AnomalyDetection – provides time series outlier detection on the basis of the Seasonal Hybrid extreme studentized deviate (ESD) algorithm, based on [15].
An example of outlier detection for a time series of water losses in a hydrotechnical object is given in Fig. 3.

Figure 3. Potential outliers detected with the use of R package tsoutliers

Using the tso function of the tsoutliers package, four outliers were detected (red points). The plot also displays the outlier effects, the original time series, and the suggested adjusted time series (red, grey, and blue, respectively).
D. Preparation for modeling and modeling
Generating high quality models involves proper selection of the predictors, the dependent variable, and the ranges of their values (Fig. 2 - step 3.1). In other words, the success of the methodology depends upon defining the multidimensional space (hyperspace) in which the model should be created. Correlation analysis (Fig. 2 - step 3.2) is of great importance in this matter, because it provides a strong foundation for the proper selection of input and output variables, and their regressors. Regression models are very sensitive to multicollinearity and singularity, so steps 3.2 and 3.3 are necessary to avoid the pitfalls of model overfitting (models with high accuracy, but a small ability to generalize).
Furthermore, this complex issue can be supported by elements of descriptive statistics and the probability distribution laws of the physical values upon which water losses and seepage depend (Fig. 2 - step 3.a). Since most of the variables are functions of time, it is useful to use the tools of time series analysis (Fig. 2 - step 3.b). In this way trends, cyclical changes and seasonal changes can be observed, which should facilitate the modeling process.
It is also useful to support the correlation analysis with the paradigms of dimension reduction. In this context, the factor analysis technique of Principal Component Analysis is very effective (Fig. 2 – step 3.c). Its results provide an overview of the supporting, influential factors that underlie the processes.
Often the methodological steps in the third stage (Fig. 2 – steps 3.1, 3.2, 3.3, 3.a, 3.b, 3.c) are implemented in several iterative steps.
E. Iterative generation and selection of models
As previously mentioned, a key issue in creating a model in the form of multiple regression is the selection of appropriate regressors, considering both the number and type of these regressors. To solve this problem, according to the proposed methodology, stepwise regression techniques were engaged. These techniques (Fig. 2 - step 4.1), based on partial correlation coefficients, perform automated entering and removing of regressors, thus creating optimal regression models. At this stage, we defined the following critical steps:
• Generating a regression model using the stepwise regression algorithm in multiple variants that are associated with different criteria for the exclusion or inclusion of regressors from or into the regression model during the iterative procedure (stepwise, forward, backward, best subsets).
• Analysis of the resulting, reduced set of regressors.
• Analysis of regressor importance. To the extent that a predictor is important in the model, leaving it out of the model should produce a substantial increase in the residual sum of squares; to the extent that a predictor is not important, leaving it out should produce only a minor increase in the residual sum of squares. A series of models is computed, excluding each predictor in each successive model; the residual sum of squares is recorded for each model, 1/p is added to each (where p is the total number of predictors), and then the ratio of each subtotal to the grand total is determined (these ratios will sum to 1.00) [16].
• Selection of an acceptable model from the obtained variants. The choice is based on criteria of accuracy (adjusted R^2, root mean squared error, etc.), complexity, and the type of candidate regressors (Akaike information criterion, F-statistics, etc.)
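The entering step of such a procedure can be sketched with a minimal forward-selection example on synthetic data. The gain threshold, variable names and stopping rule below are illustrative assumptions, not the exact criteria used in the paper (which relies on partial correlation coefficients and F-statistics):

```python
import numpy as np

def r_squared(cols, y):
    """R^2 of an OLS fit of y on the given list of regressor columns (with intercept)."""
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

def forward_stepwise(candidates, y, min_gain=0.01):
    """Greedily enter the regressor that most increases R^2; stop below min_gain."""
    selected, remaining, best_r2 = [], dict(candidates), 0.0
    while remaining:
        gains = {name: r_squared([candidates[s] for s in selected] + [col], y) - best_r2
                 for name, col in remaining.items()}
        name = max(gains, key=gains.get)
        if gains[name] < min_gain:
            break
        selected.append(name)
        best_r2 += gains[name]
        del remaining[name]
    return selected, best_r2

rng = np.random.default_rng(1)
n = 1000
pressure = rng.normal(size=n)      # hypothetical regressor names
temperature = rng.normal(size=n)
noise = rng.normal(size=n)         # irrelevant candidate, should be rejected
losses = 3.0 * pressure - 1.0 * temperature + rng.normal(scale=0.5, size=n)

selected, r2 = forward_stepwise(
    {"pressure": pressure, "temperature": temperature, "noise": noise}, losses)
print(selected, round(r2, 3))
```

A full stepwise variant would additionally re-test already-entered regressors for removal after each entry; the backward and best-subsets variants differ only in the direction and exhaustiveness of the search.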
Abstract—A thermal numerical analysis of large infrastructural objects such as dams, bridges, tunnels, buildings, etc., requires details about the structure geometry, loadings, boundary conditions and carefully determined material parameters. The material parameters obtained by expert opinion or experimental identification can differ from the parameters of the specific real structure. This can give inadequate results and a significant difference between the measured data and the computed results. To overcome this problem, it is necessary to develop and prescribe a methodology for material parameter calibration. In this paper, one possible approach is applied to the example of a gravity dam. The proposed methodology consists of: 1) reduction of the huge dam model to substructures (lamellas) with the best quality of measured data, 2) sensitivity analysis of the material parameters, 3) calibration using intelligent methods (artificial neural networks, genetic algorithms, etc.) and 4) verification by comparing the numerical analysis of the whole structure, using the calibrated parameters, with the available measured data.

I. INTRODUCTION
A numerical analysis of advanced large infrastructural objects such as dams, bridges, buildings, etc., is always a challenging job, because the material parameters of the specific model can differ from the literature parameters or the parameters obtained from an experimental identification (if it is possible to make an experiment). In many cases, there is a lack of measurement data, or the data are not reliable for many reasons. Also, the parameters can vary in different zones of the structure, so it is necessary to prescribe and develop a methodology for efficient material parameter identification and calibration [1, 2].

II. RESEARCH QUESTIONS
The health analysis of dams includes (besides the structural and the filtration analysis) the analysis of the influence of thermal changes on overall dam safety. For a correct thermal analysis, the accurate determination of representative material parameters has to be provided [2]. Using input parameters obtained by expert opinion, literature data or experimental investigation in the thermal numerical analysis, a significant difference between the computed and the measured data can be noticed. Usually, the common reason is the difference between the real material characteristics of the specific structure and the parameters determined by the previously listed approaches.
The questions about possible solutions of this problem arise with the complexity of the FEM model, because of the limited possibility to run the huge number of repeated analyses necessary for the parameter calibration procedure, due to the huge computer time needed for these processes.

III. METHODOLOGY
A. Thermal Analysis Background and Input Variables
The numerical thermal analysis is based on the Fourier equation of heat transfer [3, 4]:

$$\rho c_p \frac{dT}{dt} - k\left(\frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} + \frac{\partial^2 T}{\partial z^2}\right) - q = 0 \qquad (1)$$

where T is the temperature, k is the conduction coefficient, ρ is the density of the material, c_p is the specific heat and q = q_n + q_h + q_r is the flux. The flux can be further given as the heat source q_n = q_n(x, y, z, t), the convection q_h = h(T_o - T_s) and the radiation q_r = h_r(T_r - T_s), where T_s is the surface temperature, T_o is the environment temperature, h is the convection coefficient, h_r is the radiation coefficient and T_r is the temperature of the radiation source. It can be noticed that there are five material parameters, proposed by theory, which could be considered for the calibration: k - the conduction coefficient, c_p - the specific heat, h_ac - the convection coefficient between the concrete and air, h_wc - the convection coefficient between the concrete and water, and h_r - the radiation coefficient.

B. The Calibration Parameters Selection
In the literature, there are various ranges for the material parameters, but the best choice would be a set of parameters obtained from measurements at the specific location. The second choice would be an overview of the parameters from the project documentation for similar objects. In Table I the selected ranges of the calibration parameters are listed. Based on expert opinion, the radiation coefficient can be treated as a nonsensitive parameter, while the conduction coefficient, the specific heat and the convection coefficients have to be calibrated. The conduction coefficient k of concrete is assumed to be an isotropic parameter. The Djerdap dam model is not divided into separate zones, so the conduction coefficient is taken to be the same for the whole construction.
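The transient conduction part of equation (1) can be illustrated with a minimal 1-D explicit finite-difference sketch. The paper itself uses FEM (PAK-T [3]); the parameter values below are merely concrete-like placeholders, not the calibrated ones:

```python
import numpy as np

# Hypothetical concrete-like parameters (illustrative, not the calibrated values)
k = 2.0                  # conduction coefficient [W/(m*K)]
rho = 2400.0             # density [kg/m^3]
cp = 900.0               # specific heat [J/(kg*K)]
alpha = k / (rho * cp)   # thermal diffusivity [m^2/s]

L, nx = 1.0, 51                 # 1 m thick slab discretized with 51 nodes
dx = L / (nx - 1)
dt = 0.4 * dx**2 / alpha        # explicit scheme: Fourier number 0.4 < 0.5 (stable)
T = np.full(nx, 10.0)           # initial temperature field [deg C]
T[0], T[-1] = 20.0, 5.0         # prescribed (measured) boundary temperatures

# March rho*cp*dT/dt = k*d2T/dx2 (zero source term q) forward in time
for _ in range(20000):
    T[1:-1] += alpha * dt / dx**2 * (T[2:] - 2.0 * T[1:-1] + T[:-2])

# After long times the field approaches the linear steady-state profile,
# so the mid-thickness node tends to (20 + 5) / 2 = 12.5 deg C
print(round(T[nx // 2], 3))
```

In the calibration loop of the paper, such a forward thermal solve is the expensive step that is repeated for each trial parameter set, which is exactly why the authors replace it with trained surrogates.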
Figure 4. Heat transfer mechanism [8, 9]
$$T_{dv} = T_{sr} = \frac{1}{D}\int_0^D T(y)\,dy = \frac{1}{D}\int_0^D \left[c + (T_s - c)e^{-\alpha y}\right]dy = c + (T_s - c)\,\frac{1 - e^{-\alpha D}}{\alpha D} \qquad (6)$$

where α is the exponential decay coefficient of the water temperature with depth.
By this relation, the temperature of the downstream water is given as a function of the depth D, which also has to be calibrated as an input parameter. Due to the small level of the downstream water and the significant mixing, the water temperature is assumed to be constant over depth. In the case of a very low level of downstream water, the parameter D is

Figure 5. Determination of boundary conditions

The output of the sensitivity analysis is the sensitivity of the FEM model to the parameter change. If the model gives similar results independently of the parameter change in some prescribed range, that parameter can be observed as
nonsensitive, and the calibration procedure is not necessary (the parameter can be fixed).
Recently, intelligent methods based on artificial neural networks and genetic algorithms have become very popular. The strong nonlinear correlation between the known and the unknown parameters is mapped by these methods instead of by FEM analysis, but a few FEM simulations are still necessary for training the neural network. The methodology is prescribed in 3 steps: 1) construction and training of the neural network, 2) testing of the neural network, 3) calibration conducted by an evolution algorithm and the verified neural network [10].

Figure 6. Calibration Algorithm

A. The optimization problem
The calibration consists of two analyses, A and B. Firstly, the temperature field is computed and compared with the measured data in one lamella with the best quality of measured data. The thermal analysis should be done for various parameters in the given range. The boundary conditions are prescribed by measured temperatures on:
- the surfaces of the dam which are in contact with water on the upstream side,
- the surfaces which are in contact with air,
- the foundation and
- the galleries.
The calibration is successful if the difference between the measured and the computed temperatures is up to 7%. After this step, the verification of the obtained parameters is done by a thermal analysis of the whole dam model with different boundary conditions. If the difference between the measured and the computed temperature field is significant, the process of calibration has to be repeated.

B. The step A – Real initial temperature field
The determination of the initial temperature field [2] is necessary for the thermal analysis of the dam. For this analysis, at the beginning, a constant temperature at all nodes is prescribed. This is usually the average yearly temperature at the location for the previous year. The next step is a nonstationary analysis for the chosen period (a year or a few years). The third step is the adoption of the computed temperature field as the initial state for the next thermal analysis for the same period of time. This step is repeated until the temperature field no longer varies.

C. The step B – Nonstationary thermal analysis for the desired period

V. CONCLUSION
For the whole process of the calibration, the mentioned six parameters should be used for both the separate dam segment models and the whole dam model. Firstly, the calibration analysis will be conducted independently for the specific segment models with the best quality of measured data, but with common material parameters. The next step of the calibration process is the iterative procedure for the chosen segments simultaneously. The sensitivity analysis provides the possibility to reduce the number of parameters which need calibration. The calibration procedure includes the application of intelligent methods instead of FEM analysis to ensure efficiency and accuracy. The obtained material parameters will be applied to the whole dam as the verification step of the calibration process.

ACKNOWLEDGMENT
The research presented in this paper is supported by the Jaroslav Černi Institute for the Development of Water Resources, Belgrade, Serbia, the WIMB Tempus project and the projects of the Ministry of Science of the Republic of Serbia TR32036 and III41007.

REFERENCES
[1] Hariri-Ardebili, M. A., Mirzabozorg, H., Ghaemian, M., & Amini, R. (2011). Calibration of 3D FE model of DEZ high arch dam in thermal and static conditions using instruments and site observation. 6th International Conference on Dam Engineering (pp. 121-135). Lisbon, Portugal.
[2] Mirzabozorg, H., Hariri-Ardebili, M., Shirkhan, M., & Seyed-Kolbadi, S. (2014). Mathematical modeling and numerical analysis of thermal distribution in arch dams considering solar radiation effect. The Scientific World Journal, 597393, 15 pages.
[3] M. Kojić, R. Slavković, M. Živković, N. Grujović, PAK-T: Program for Heat Transfer Analysis, Faculty of Mechanical Engineering, University of Kragujevac, Kragujevac, 1999.
[4] M. Kojić, R. Slavković, M. Živković, N. Grujović, Metod konačnih elemenata – Linearna analiza (Finite Element Method – Linear Analysis), Faculty of Mechanical Engineering, University of Kragujevac, Kragujevac, 2010.
[5] Engineering Software Laboratory at the Faculty of Engineering, University of Kragujevac.
[6] Bofang, Z. (1997). Prediction of water temperature in deep reservoirs. Dam Engineering, vol. 8, no. 1, pp. 13–25.
[7] Yu, H., Li, S., Liu, Y. and Chen, C. (2011). Study on temperature distribution due to freezing and thawing at the Fengman Concrete Gravity Dam. Thermal Sci., 15 (1), S27-S32.
[8] Jin, F., Chen, Z., Wang, J., & Yang, J. (2011). Practical procedure for predicting non-uniform temperature on the exposed face of arch dams. Applied Thermal Engineering, 30(14-15), 2146-2156.
[9] Žvanut, P., Turk, G., & Kružanowski, A. (2015). Effects of changing surrounding conditions on the thermal analysis of the Moste concrete dam. Journal of Performance of Constructed Facilities, 30(3), 1-9.
[10] Yu, Y., Zhang, B., & Yuan, H. (2007). An intelligent displacement back-analysis method for earth–rockfill dams. Computers and Geotechnics, 34(6), 423-434.
This paper introduces a model for adaptive e-business continuity management (e-BCM) in the insurance sector, which is adjustable to changes in the business environment of the organization. The research is focused on improvements to the establishment of effective business continuity management in insurance companies that use modern e-business technologies: the Internet, mobile computing, electronic services, virtual infrastructure, etc. The proposed model is the result of both theoretical and practical work over the last four years. It defines a general framework for the establishment and continuous improvement of e-BCM in insurance companies and includes methods for defining the key components of a business continuity management system: business impact analysis, business continuity risk assessment, and business continuity plan. The results may have broad practical application in insurance companies around the world that use modern e-business technologies.

I. MOTIVATION
The subject of this research work is the development of a model for adaptive e-business continuity management. The research is focused on insurance companies that use modern e-business technologies (the Internet, mobile computing, electronic services, virtual infrastructure, etc.), for which reason it uses the concept of e-Business Continuity Management (e-BCM). Since e-business technologies are rapidly changing and developing, the proposed model should provide adaptive e-business continuity management, taking into account changes in the environment in which the insurance company operates. The model is developed in accordance with the ISO 22301 [1] standard, which is considered the main international standard for business continuity management and which defines the theoretical basis for establishing business continuity in an arbitrary organization.
The motive of the research is to define a general framework for the establishment and continuous improvement of e-BCM in insurance companies. In accordance with the ISO 22301 standard, the model will include methods for defining the most important components of a business continuity management system (BCMS): business impact analysis, business continuity risk assessment and business continuity plan.

II. RESEARCH QUESTIONS
The primary goal of the research is to explore possibilities for improving business continuity management in the insurance sector through the development of a specific model that will facilitate the establishment of e-BCM and, thanks to its adaptive component, enable successful e-BCM. Similarly to the definition of business continuity [1], we have defined e-business continuity as the capability of an organization to continue the delivery of e-services, at acceptable predefined levels of reliability and availability, following a disruptive incident.
E-business is a result of the application of modern information and communication technologies (ICT): Internet technology, mobile technology, cloud computing and next generation networks, as well as the new concepts developed on the basis of these technologies: the Internet of things, big data and ubiquitous computing [2]. Apart from numerous benefits [3], modern e-business technologies create new business risks, in particular to business continuity [4][5][6] and information security, taking into account the amount, speed of creation and availability of information in modern business. The greater an organization's dependence on modern e-business technologies, the greater its vulnerability to incidents [7]. The usage of modern e-business technologies creates business and information systems that are complex in themselves and are considered to be inherently risky [5]. One of the objectives of business continuity under these circumstances becomes the ability to adapt to such emerging risks, arising from the constant development of modern technologies, in order to protect the inherently risky systems [5].
Several different models and frameworks that address various aspects of business continuity management are described in the literature, but there is a lack of a specific model that defines practical steps and procedures for establishing business continuity in insurance companies that use modern e-business technologies: the Internet, mobile computing, electronic services, virtual infrastructure, and the like.
The following common issues have been identified in existing business continuity models and frameworks: 1) Focus is placed solely on the definition and planning of the business continuity program [8] or exclusively on one aspect of business continuity management [9][10]; 2) Specific steps are missing in the definition of business continuity management [10][11][12][13][14], and, more specifically, the implementation of the business impact analysis and BCM risk assessment lacks practical steps [7][15]; 3) The approach is excessively specific to a particular industry or line of business [14][16]; and 4) The approach is too complicated to be implemented in an arbitrary organization [9][10]. In addition, there is no framework specifically tailored to insurance companies.
In our work we have attempted to identify solutions to all these issues and develop a general framework that addresses all aspects of BCM and that has broad practical
application in insurance companies around the world that use modern e-business technologies.

III. METHODOLOGY
The research methodology used to define the model for adaptive e-BCM consists of the following general and specific scientific methods, grouped into five stages: 1) Systematized collection, review and analysis of BCM literature and existing BCM models and frameworks; 2) Initial definition of the model for adaptive e-BCM in accordance with changes in the environment in which the organization operates; 3) Implementation of the proposed model in one insurance company; 4) Evaluation of the model by means of questionnaires and interviews with representatives of the insurance company that implemented the model; and 5) Final model adjustments as a result of a positive case study.
This is still work in progress, and we are currently at stage four – the model evaluation stage.

IV. MODEL FOR ADAPTIVE E-BCM IN INSURANCE SECTOR
The model for adaptive e-BCM was initially defined theoretically, based on the ISO 22301 standard and a review of the BCM literature. In later research stages, the model has been developed and advanced through practical work. It has been successfully implemented in the Insurance Stock Company "Milenijum Osiguranje" in Belgrade, Serbia, which has 200 employees.
The model for adaptive e-BCM establishes, maintains and continually improves the effectiveness of an insurance company's business continuity management system (BCMS), which shields the insurance company from potential incidents (see Fig. 1).
The model consists of twelve parameters and four methods. Model parameters (see Fig. 1) reflect the basic characteristics of the business environment in which the insurance company operates, from the business continuity perspective. The model's methods (see Fig. 2) define the key components of the BCMS: 1) Business impact analysis (BIA) method; 2) e-BCM risk assessment method; 3) Method for development of the business continuity plan (BCP); and 4) Method for continual BCMS improvement.
Adaptive business continuity management is essential in response to the emerging risks which organizations and their information systems face due to the rapid development of e-business technologies. The model has to be adapted and applied every time there is a change in any of the model's parameters, which enables an e-BCM that is equally flexible to both internal and external changes in the business environment of the organization. The method for continual BCMS improvement defines corrective measures for improving the BCMS. Adaptability to changes in the business environment of the organization, together with corrective measures for BCMS improvement, allows adaptive e-BCM.

A. Model Parameters
Model parameters (defined in Table 1) address the main characteristics of e-business and its underlying technologies: ubiquity, global reach, universal standards, richness, interactivity, information density, personalization/customization, and usage of social networks [17]. Parameter values are specifically tailored for insurance companies. For example, the Disruption timescale (P6) is set in hours and days, as a consequence of the nature, volume, and complexity of business operations.

B. Model Methods
Each model method is precisely defined and contains a set of specific objectives and procedures. Model components – methods and associated procedures – are shown in Fig. 2, together with their related model parameters and important flow dependencies.
Figure 2. Model components (model parameters in circles; model methods in dotted-line rectangles; procedures, as part of methods, in rounded-corner rectangles) and important flow dependencies
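The adaptive rule described in Section IV (re-apply the model whenever any of its parameters changes) can be sketched conceptually; the class, method and parameter names below are illustrative only, not part of the model's specification:

```python
# Conceptual sketch only: re-run the BCMS methods whenever a model parameter
# changes. Method and parameter names are illustrative, not the paper's API.
METHODS = ["business_impact_analysis", "risk_assessment",
           "business_continuity_plan", "continual_improvement"]

class AdaptiveBCM:
    def __init__(self, parameters):
        self.parameters = dict(parameters)
        self.log = []                      # records every method execution

    def set_parameter(self, name, value):
        """An internal or external change triggers re-application of the model."""
        if self.parameters.get(name) != value:
            self.parameters[name] = value
            self.reapply()

    def reapply(self):
        for method in METHODS:
            self.log.append(method)        # placeholder for executing the method

bcm = AdaptiveBCM({"disruption_timescale_hours": 24})
bcm.set_parameter("disruption_timescale_hours", 4)   # environment changed: re-run
bcm.set_parameter("disruption_timescale_hours", 4)   # unchanged: nothing happens
print(bcm.log)
```

The point of the sketch is the guard in set_parameter: only a genuine change in the business environment triggers another pass through the four methods, mirroring the flow dependencies of Fig. 2.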
happen to their business and should determine how quickly they can recover, as well as how quickly their business can come to a point of no return. The main goal of BCM is to mitigate losses and to prepare to respond effectively and efficiently in case of a major disruptive incident.
Our research is focused on improvements to establishing effective BCM in insurance companies that use modern e-business technologies. We believe that adaptive BCM is essential in response to the emerging risks that insurance companies and their information systems face due to the rapid development of e-business technologies.
The proposed model for adaptive e-BCM in the insurance sector is the result of our theoretical and practical work over the last four years, including the implementation of the model in the Insurance Stock Company "Milenijum Osiguranje". The model defines a general framework for the establishment and continuous improvement of e-BCM and includes methods for defining the most important components of a business continuity management system. Final adjustments will be made to the model as a result of the ongoing evaluation.
We believe that the proposed model can be applied in any insurance company that uses modern e-business technologies. We will continue this research with additional model implementations focusing on the insurance and financial sectors.

ACKNOWLEDGMENT
We thank our colleagues from "Milenijum Osiguranje" who participated in the project and provided insight and expertise that greatly assisted the model implementation and our research.

REFERENCES
[1] ISO 22301:2012 Societal security - Business continuity management systems - Requirements. International Organization for Standardization, 2012.
[2] B. Radenković, M. Despotović Zrakić, Z. Bogdanović, D. Barać, and A. Labus, Elektronsko poslovanje (E-business). Belgrade: Faculty of Organizational Sciences, 2015.
[3] D. Chaffey, E-Business and E-Commerce Management - Strategy, Implementation and Practice. 2009.
[4] F. Arduini and V. Morabito, "Business continuity and the banking industry," Commun. ACM, vol. 53, no. 3, pp. 121–125, 2010.
[5] J. Bergström, R. van Winsen, and E. Henriqson, "On the rationale of resilience in the domain of safety: A literature review," Reliab. Eng. Syst. Saf., vol. 141, pp. 131–141, 2015.
[6] M. Smith and P.-A. Shields, "Strategies for IT and communications," in The Definitive Handbook of Business Continuity Management, A. Hiles, Ed. John Wiley & Sons, Ltd, 2007, pp. 205–235.
[7] J. Cook, "A Six-Stage Business Continuity and Disaster Recovery Planning Cycle," SAM Adv. Manag. J., vol. 80, no. 3, pp. 23–68, 2015.
[8] F. Gibb and S. Buchanan, "A framework for business continuity management," Int. J. Inf. Manage., vol. 26, no. 2, pp. 128–141, 2006.
[9] S. A. Torabi, H. Rezaei Soufi, and N. Sahebjamnia, "A new framework for business impact analysis in business continuity management (with a case study)," Saf. Sci., vol. 68, pp. 309–323, 2014.
[10] K.-M. Bryson, H. Millar, A. Joseph, and A. Mobolurin, "Using formal MS/OR modeling to support disaster recovery planning," Eur. J. Oper. Res., vol. 141, no. 3, pp. 679–688, 2002.
[11] B. Herbane, D. Elliott, and E. M. Swartz, "Business Continuity Management: Time for a strategic role?," Long Range Plann., vol. 37, no. 5, pp. 435–457, 2004.
[12] R. L. Tammineedi, "Business Continuity Management: A Standards-Based Approach," Inf. Secur. J. A Glob. Perspect., vol. 19, no. 1, pp. 36–50, 2010.
[13] J. Lindström, S. Samuelsson, and A. Hägerfors, "Business continuity planning methodology," Disaster Prev. Manag., vol. 19, no. 2, pp. 243–255, 2010.
[14] H. Baba, T. Watanabe, M. Nagaishi, and H. Matsumoto, "Area Business Continuity Management, a New Opportunity for Building Economic Resilience," Procedia Econ. Financ., vol. 18, pp. 296–303, 2014.
[15] M. Savage, "Business continuity planning," Disaster Prev. Manag., vol. 19, no. 5, pp. 254–261, 2002.
[16] M. F. Blos, S. L. Hoeflich, and P. E. Miyagi, "A general supply chain continuity management framework," Procedia Comput. Sci., vol. 55, pp. 1160–1164, 2015.
[17] K. C. Laudon and C. Guercio Traver, E-commerce: Business, Technology, Society. 2012.
[18] J. Watters, Disaster Recovery, Crisis Response, and Business Continuity: A Management Desk Reference. Apress, 2013.
[19] B. Zawada and G. Marbais, "Implementing ISO 22301," Avalution Consulting, 2015.
[20] J. D. Nosworthy, "A Practical Risk Analysis Approach: Managing BCM Risk," Comput. Secur., vol. 19, no. 7, pp. 596–614, 2000.
[21] T. Aven, "Risk assessment and risk management: Review of recent advances on their foundation," Eur. J. Oper. Res., vol. 253, no. 1, pp. 1–13, 2016.
[22] S.-C. Cha, P.-W. Juo, L.-T. Liu, and W.-N. Chen, "Duplicate Work Reduction in Business Continuity and Risk Management Processes," Res. Conf. Pap., vol. 9, no. Apr, pp. 25–40, 2010.
[23] "Business Continuity is Vital to Success," GCInfotech, 2014. [Online]. Available: http://gcinfotech.com/business-continuity-is-vital-to-its-success/.
[24] D. Faertes, "Reliability of supply chains and business continuity management," Procedia Comput. Sci., vol. 55, pp. 1400–1409, 2015.
[25] S. M. Hawkins, D. C. Yen, and D. C. Chou, "Disaster recovery planning: a strategy for data security," Inf. Manag. Comput. Secur., vol. 8, no. 5, pp. 222–230, 2000.
7th International Conference on Information Society and Technology ICIST 2017
Abstract— The past decade has shown testbed-based evaluation approaches to be highly important in networking research, primarily due to their emphasis on practicality. Initially, testbeds were built for a specific project or for studying a specific technology. In the context of the New Internet, networking and computational testbeds are of crucial importance for testing computational tools and new technologies and for describing experiments, new platforms and environments. This paper addresses networking testbed enhancements in terms of orchestration, control and virtualization capabilities. It reports on a real-world evaluation of a federated testbed infrastructure through the development of an ontology-based automatic code generator for experiments. The main idea is to enable experimenters to define a semantic description of the experiment through a high-level specification and to generate software code that is directly deployable on the testbed federation. As a prerequisite for semantic-driven automatic code generation, an ontology framework that gives a semantic description of a particular experiment domain is developed, and its design is discussed in the paper.

Keywords: ontology, semantics, automatic code generation, 5G, networking testbed, federated testbeds

I. INTRODUCTION

A consequence of the increasing complexity of federated testbeds for 5G applications is the demand for reducing the complexity and time of designing and running experiments. At the same time, emerging networking technologies, such as software defined networking (SDN) and network function virtualization (NFV), together with the diversity of radio access technologies, such as LTE, Bluetooth and Wi-Fi, and the growing trend towards their simultaneous use, significantly steepen the learning curve for wireless networking experiments. A promising approach to the problem is automatic generation of target-specific code directly from a high-level experiment description [1,2,3].

Different approaches to testbed federation management have been proposed by researchers, including the Future Internet Research and Experimentation (FIRE) initiative [4], the Slice Based Federation Architecture (SFA), FITeagle, Open Baton, the jFed Experimenter Client, OMF 6 with the Federated Resource Control Protocol, and OML [4,5,6,7]. Resources from different testbeds in the federation that are needed for an experiment can be accessed by the experimenter through a single access point, the FITeagle framework, in the FIRE environment. After authentication, the user is able to discover, reserve, acquire and monitor selected resources over the testbeds in the federation. Interoperability with other technologies is provided by FITeagle, which implements the standard SFA interface. After the allocated resources are successfully provisioned, FITeagle returns a manifest containing names and other information about the resources. Using these names, the experimenter can build an OEDL script and/or a shell script, which is then used to execute the experiment. Even though these scripting languages are designed with ease of use in mind, they are domain-specific languages with a steep learning curve. Our research question is: “How can experimenters be helped to more easily develop experiment code for federated testbed premises?”

With the goal of scaling the experimentation process, both in time and in the number of experimenters, we propose the use of semantic-based automatic code generation. To the best of our knowledge, there is no previous work focusing on semantic-based automatic experiment code generation. Writing domain-specific code is a knowledge-intensive process that requires programming as well as domain knowledge. The knowledge required for automatic code generation is heterogeneous, and includes an understanding of the radio-related features, knowledge of the software and hardware modules available in the testbeds used, and practical knowledge of the domain-specific language. We envisage a process that starts with the semantic description of the experiment components using an ontology-driven user interface. Based on these inputs and the semantic ontology framework, the experiment source code can then be generated.

In this paper, ontologies are proposed to represent this knowledge in a formal manner and to access it during code generation as needed. The proposed framework is focused on experiments to be conducted on the testbed and considers the network-sensing domain. It should be noted that the approach is general, in the sense that it could be used for different domains and for experiments conducted over any complex heterogeneous system or federation of such systems.

Section II of this paper gives background on networking testbeds and related work on the use of ontologies within testbeds. Section III gives an overview of the automatic code generation process and positions OFSecGENE within it. The main part of the paper is Section IV, where we introduce and discuss OFSecGENE. Section V concludes the paper.
topology. The experimenter imports the semantic description of the experiment topology into the existing domain knowledge residing in the Experiment Flow Editor (EFE).

Experiment flow creation. The EFE provides the experimenter with an intuitive graphical user interface that guides him through the process of creating the experiment flow. The EFE interacts with the semantic knowledge of the networking domain. We envisage that the EFE should contain experiment components that the user can use to construct an experiment by performing simple drag/drop and connect actions on these components. Each component should describe an experiment flow task at a high level of conceptualization, such as: a specific metric (for example, throughput), a transfer (the selection of one of the existing topology nodes and one interface that exists on the selected node), the start/end of the experiment, etc.

Automatic code generation. For each flow component, the automatic code generator should automatically generate code for the targeted testbed execution environment. The generator should produce commands that follow the experiment flow. It generates OEDL experiment code and shell script code. For a shell script execution environment (such as VNF-capable testbeds), the output should be provided in the form of shell script files for each of the network nodes. The scripts can be included in the corresponding VNFD packages, in which case they will be executed when the VNFD is deployed. For OMF-capable testbeds, a file with OEDL code should be generated.

Experiment execution. The experiment is executed by running the generated script file.

The described steps are shown in Figure 1, which also positions OFSecGENE within the process.

Figure 1. Processes of experimenting using the automatic code generator

IV. ONTOLOGIES FRAMEWORK FOR SEMANTIC-DRIVEN CODE GENERATION

OFSecGENE includes a set of domain and system ontologies for semantic descriptions of network sensing and of the experiment execution flow.

OMN. We envisage that everything in the experimenting environment is considered an accessible resource. We adopted the OMN Resource ontology with the goal of knowledge reuse and interoperability. OMN provides the concepts and methods to describe resources present in Future Internet platforms [18][25]. OMN defines a formal information model for federated infrastructure management. The corresponding data models are developed in order to enable communication among the various components of the software architecture. This ontology also holds information on how resources are connected together in a federated infrastructure. Further, we identified the OMN Monitoring ontologies as useful for the scenario of automatic code generation in the wireless networking domain. This set of ontologies covers the various monitoring services provided for federation administrators, experimenters, and testbed federation services. The most relevant are the OMN Monitoring Data Ontology and the OMN Monitoring Tool Ontology.

CoordSS ontologies. The CoordSS ontologies framework has been evaluated and used for experimenting over testbed federations and for coordination in heterogeneous communication networks [16]. The CoordSS framework represents a base for coordination protocols based on semantic technologies and ontologies. It comprises the following ontologies: the command line ontology (CLOnt), the wireless ontology (WDSO), the spectrum sensing capabilities ontology (SSCO), the Meta System Architecture ontology (MSAO), and the Resource Description Framework ontology (RDFO). They specify semantic descriptions of the radio spectrum, coordination, frequency selection, dynamic spectrum access, the command line, wireless aspects, and the spectrum sensing capabilities of the supporting software.

For the automatic code generator, CLOnt, MSAO and RDFO are needed. MSAO is an ontology for describing computational systems; it gives descriptions of the entity, element and attribute concepts. RDFO is the most generic ontology in the framework: it can be used to describe any concept, and it describes the value and property concepts.

Network Sensing Ontology. The fundamental concepts of the network sensing domain are specified in the Network Sensing Ontology (NS). Network sensing measurements of bandwidth in IP networks are considered in practice in the context of one possible implementation, represented by the software tool iPerf [26]. iPerf takes command line arguments, receives data from nodes and writes results to the database. The NS ontology semantically describes the iPerf software module that implements throughput
measuring. iPerf reports the bandwidth during the test process. NS uses CLOnt for the semantic description of the concepts used on the command line, because iPerf receives its arguments via the command line.

Client ontology. In the experiment environment, the client application is responsible for communication between the server and the testbed nodes. The experiment client is a console application with the following arguments: the URI of the agent, the URI of the channel, the URL of the server, and the subscribe, publish, register and read-messages operations. The client application is based on the experiment Client Ontology (CO), which contains statements describing the client application with its parameters and command line arguments. The client parameters are:

a) server – the server that is responsible for sending and receiving client messages;

b) agent – the user of the client; it can be a human user or another software component;

c) channel – the communication channel on which agents send and receive messages;

d) message – the data that are sent from one agent to another via the given channel.

create experiment code that can be executed on any OMF-capable testbed. An example is given in Figures 2 and 3: Figure 2 gives an example experiment with a network with two WiFi interfaces as seen in the SOEDL ontology, while Figure 3 gives the corresponding OEDL code.

V. CONCLUSION

In this paper, we propose an ontology system that deals with the heterogeneity of testbeds and of the technologies they cover, and with automatic experiment code generation. These two issues are important for experimentation with networking systems. The main impact of the presented ontologies framework is to facilitate experimentation through the flexibility of semantic automatic code generation. The presented framework is the prerequisite for automatic code generation of testbed experiments. The OMF framework, with the OEDL language and different shell scripts, could be used as the target platform for a practical implementation of the automatic code generator.
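To make the code-generation step discussed above more concrete, the following minimal Python sketch turns a flow description into per-node shell scripts for an iPerf throughput test. The structure of the `flow` dictionary and the node names are our illustrative assumptions, not the OFSecGENE format; only standard iPerf options (`-s`, `-c`, `-p`, `-t`) are used.

```python
# Illustrative sketch of the code-generation step: a (hypothetical) flow
# description, as it might be distilled from the NS/CLOnt ontologies, is
# turned into one shell script per network node.
flow = {
    "metric": "throughput",
    "server": {"node": "node1", "port": 5001},
    "client": {"node": "node2", "target": "node1", "duration_s": 30},
}

def generate_shell_scripts(flow):
    """Return a {node_name: shell_script} mapping for an iPerf test."""
    srv, cli = flow["server"], flow["client"]
    return {
        # iPerf server side: listen on the agreed port
        srv["node"]: "#!/bin/sh\niperf -s -p %d\n" % srv["port"],
        # iPerf client side: send towards the server for the given duration
        cli["node"]: "#!/bin/sh\niperf -c %s -p %d -t %d\n"
                     % (cli["target"], srv["port"], cli["duration_s"]),
    }

if __name__ == "__main__":
    for node, script in sorted(generate_shell_scripts(flow).items()):
        print("### %s ###" % node)
        print(script)
```

A real generator would, of course, derive the dictionary from the ontology statements rather than hard-code it, and would emit OEDL instead of shell scripts for OMF-capable testbeds.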
Abstract—In this paper, the influence of discontinuous transmission (DTX) on the mean emission power of a base station in a GSM system is analyzed. Calculated results are obtained by applying Erlang formulas to Markov-modelled systems. These results are compared to the results obtained by simulation. The specific characteristics of signal transmission in GSM systems must be considered when the influence of DTX on base station power is estimated. It is shown that considering only DTX, without considering the characteristics of signal transmission on the different frequency carriers, leads to underestimated values of base station power.

I. INTRODUCTION

Power saving is an important demand in life, science and technology today, and this goal is also evident in modern mobile telecommunications. There are many ways in which the emission power of base stations is saved in mobile communications, i.e. many factors that influence base station power. Some of these are: the traffic load in the base stations (power is not transmitted in idle traffic channels), the structure of the traffic in the cell (the relation of external traffic to intra-cell traffic), the adjustment of emission power (power control) according to the distance between the base station and the mobile stations, the users’ distribution in the area of the base station cell, the implementation of lower-rate codecs (for example, the half-rate codec), and so on, [1]-[5]. Our aim was to analyze the influence of the discontinuous transmission (DTX) function on the base station output power.

II. DTX FUNCTION IN GSM SYSTEMS

During a conversation, there are periods when users are active (speaking) and periods when they are inactive (not speaking). It is usually supposed in the literature that the period of voice activity is pDTX = 40-45% of the conversation duration. The value pDTX = 43% is obtained as the solution of the Brady model, which is presented in [6], [7]. More precisely, we calculated the state probabilities for the model in Fig. 2 of [7] using matrix calculation. We obtained a doubletalk probability of 4.8%, a mute probability (both users silent) of 18.6%, and pDTX = 43.1% for the probability that user A (user B) is speaking, alone or in a doubletalk phase.

The aim of the DTX function in mobile systems is to stop sending the mobile signal when there is no voice activity. Switching the signal off or on is based on the voice activity detection (VAD) function. When VAD finds that there is no voice activity, the signal is not sent over the traffic channel. In this way channel bandwidth is reduced (channel capacity is saved, and signals from more traffic channels can be transmitted using the available capacity). It seems, at a glance, that 55-60% of the bandwidth can be saved using DTX and that, as a consequence, the same percentage of base station emission power can be saved.

GSM systems are based on the implementation of time division multiplexing (TDMA) and frequency division multiplexing (FDMA). There are 8 TDMA channels on each FDMA frequency (frequency carrier). The first frequency carrier is specific: it consists of signalling channels and traffic channels. The minimum number of signalling channels is 2 (when the number of frequency carriers is 1 or 2). In this case, there are 6 traffic channels on the first carrier. When the number of frequency carriers increases, the number of signalling channels also increases (usually, there are 3 signalling channels when there are 3 or 4 frequency carriers, as presented in [8]). In our analysis we supposed, without significant influence on the accuracy of the results, that there are 2 signalling channels also when there are 3 or 4 frequency carriers. This means that we analyze systems with

N = 6 + (fc - 1) · 8,   fc = 1, 2, 3, 4        (1)

traffic channels, where fc is the number of frequency carriers.

There is no power control on the 8 channels of the first carrier – the emission power has its maximum value on these 8 channels. This means that the power on these 8 channels of the first carrier does not depend on the distance between the base station and the mobile station, and that the signal is always transmitted, independent of the VAD function. Traffic channels on the first carrier are seized first: channels on the following carriers (if available) are seized only when all traffic channels on the first carrier are busy.

When we consider the presented characteristics of GSM systems, it is intuitively obvious that the contribution of DTX to the base station power saving depends strongly on the offered traffic and on the number of available frequency carriers. It is also obvious that the percentage of power saved when DTX is implemented, compared to the situation without the DTX function, is always less than
the expected 55-60%. The aim of this paper is to determine the level of emission power saving in GSM systems as a function of the offered traffic.

III. METHODOLOGY OF ANALYSIS: CALCULATION AND SIMULATION

The analysis in this paper is based on Markov models, which were first developed for switching systems. Erlang formulas, as the solution for such systems, give the important characteristics of the system. Among these characteristics, the probabilities of the number of busy channels (the state probabilities) are the most important for us. Starting from the state probabilities, the base station emission power can be determined very closely.

The probabilities of the number of busy channels are given by the well-known formula for Markov models and Erlang formulas from the queueing literature, [9], [10]:

p_i = (A^i / i!) / (Σ_{l=0}^{N} A^l / l!),   i = 0, 1, ..., N        (2)

As already explained, the emission power for all 6 traffic channels on the first carrier, whether they are busy or not, has its maximum value (Pchmax). On the other carriers there is no emission power when a channel is idle or when there is no voice activity in a busy channel. According to these considerations, the base station emission power for the traffic channels can be expressed as

P = 6 · Pchmax        (3)

when there is only one frequency carrier, and as

P = Pchmax · [6 + pDTX · Σ_{i=7}^{6+(fc-1)·8} (i - 6) · p_i]        (4)

when there are fc = 2, 3, 4, ... frequency carriers.

A more accurate value of the emission power is obtained by simulation. The reason is that the Erlang formulas provide only the number of busy channels; it is not possible to distinguish precisely, with these formulas, which channels are busy and which are idle. Because of that, when we consider by calculation a situation where the number of busy channels is greater than 6, we suppose that the 6 channels on the first carrier are always busy. But in the real traffic process there are time intervals, after the release of a channel on the first carrier, when there are fewer than 6 busy traffic channels on this first carrier. As we emphasized, the power needed for the traffic channels on the first carrier remains unchanged when such a channel is released. When simulation is used for the analysis, it is possible to model precisely which channel is busy and which is idle. That is why, in a simulation trial, we can have (on average) more busy traffic channels on the frequency carriers with power control (the second, third, fourth, etc. carriers) than when the results are calculated.

The simulation is based on the roulette (Monte Carlo) method, adjusted for telephone traffic calculations, [11]. The simulation program is upgraded to include the value of the base station emission power and the probability of voice activity in a connection, similarly to [1], [3], [5], [12], [13], where different problems in mobile telecommunication systems are analyzed. When the simulation is performed, the state probabilities are determined on the basis of the simulated traffic process and, at the end of the simulation, the base station power is calculated according to (3) or (4).

IV. POWER OF THE BASE STATION WITH DTX FUNCTION

In our analysis, we supposed that the base station power in a traffic channel is maximal whenever the signal is transmitted. This means that we have not considered base station power control as a function of the distance between the base station and the mobile station. By making this approximation, we isolated the DTX function as the single component influencing the base station power.

Further, we suppose that the number of frequency carriers is not large (at most 4). About 70% of base stations have only 2 frequency carriers, and 99% of them have fewer than 7 frequency carriers, [14]. This is one more reason why we can assume that there are always 2 signalling channels on the first carrier.

Figs. 1-4 present the mean value of the base station emission power (P) as a function of the offered traffic (A), separately for GSM systems from one frequency carrier (6 traffic channels, Fig. 1) to four frequency carriers (30 traffic channels, Fig. 4). The values of P are expressed as multiples of the maximum power needed for one traffic channel without the DTX function. Two nearly parallel characteristics are presented in each figure: the first obtained by numerical calculation (curve “BS w/ DTX calculation”) and the second obtained by simulation (curve “BS w/ DTX simulation”). These characteristics are situated between two other characteristics, which are also presented. The upper one is determined for a GSM system without DTX on the active channels, where there is emission power on all channels of the first carrier regardless of their activity (curve “BS w/o DTX”). The lower one is determined for a system with DTX implemented on all active channels, including the channels on the first carrier (curve “DTX”). The system modelled by this lower characteristic is the one usually assumed in the literature when the power saving caused by DTX is considered. Unfortunately, such an analysis underestimates the real mean and instantaneous base station power in the traffic channels of GSM systems. The underestimation is greater when the offered traffic is smaller (when the offered traffic is very small, for example during the night, the traffic analysis may not be necessary, because the base station can go to sleep mode and the users in its area can be served by the neighbouring cells). The exception to the described shape and position of the characteristics is the case of one frequency carrier (Fig. 1), where three characteristics coincide: the characteristic for the system without DTX, and the simulation and calculation characteristics for the real system with DTX.
Figure 1. Base station emission power as the function of offered traffic when there is one frequency carrier (6 channels): the results of
simulation (“BS w/ DTX simulation”) and calculation (“BS w/ DTX calculation”) compared to the system without DTX (“BS w/o DTX”) and to the
system with DTX on all carriers (“DTX”)
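The calculated curves of this kind can be reproduced directly from Eqs. (1)-(4). The following Python sketch is our illustrative translation of the formulas (function and variable names are ours): it computes the state probabilities of Eq. (2) and the mean traffic-channel power of Eqs. (3)-(4), in multiples of Pchmax.

```python
from math import factorial

def erlang_state_probs(A, N):
    """Truncated Poisson (Erlang) distribution, Eq. (2):
    p_i = (A^i / i!) / sum_{l=0}^{N} A^l / l!"""
    weights = [A**i / factorial(i) for i in range(N + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def mean_bs_power(A, fc, p_dtx=0.431, p_ch_max=1.0):
    """Mean base station emission power in traffic channels, Eqs. (3)-(4)."""
    N = 6 + (fc - 1) * 8          # number of traffic channels, Eq. (1)
    if fc == 1:
        return 6 * p_ch_max       # Eq. (3): first carrier always at maximum
    p = erlang_state_probs(A, N)
    # Eq. (4): the 6 first-carrier channels count at full power; each of the
    # (i - 6) additional busy channels in state i radiates only during the
    # voice-activity fraction p_DTX.
    return p_ch_max * (6 + p_dtx * sum((i - 6) * p[i] for i in range(7, N + 1)))

if __name__ == "__main__":
    for A in (5, 10, 20, 30):
        print(A, round(mean_bs_power(A, fc=2), 3))
```

For fc = 1 the result is the constant 6 · Pchmax, consistent with Fig. 1, where the characteristics coincide.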
Figure 2. Base station emission power as the function of offered traffic when there are two frequency carriers (14 channels): the results of
simulation (“BS w/ DTX simulation”) and calculation (“BS w/ DTX calculation”) compared to the system without DTX (“BS w/o DTX”) and to the
system with DTX on all carriers (“DTX”)
Figure 3. Base station emission power as the function of offered traffic when there are three frequency carriers (22 channels): the results of
simulation (“BS w/ DTX simulation”) and calculation (“BS w/ DTX calculation”) compared to the system without DTX (“BS w/o DTX”) and to the
system with DTX on all carriers (“DTX”)
Figure 4. Base station emission power as the function of offered traffic when there are four frequency carriers (30 channels): the results of
simulation (“BS w/ DTX simulation”) and calculation (“BS w/ DTX calculation”) compared to the system without DTX (“BS w/o DTX”) and to the
system with DTX on all carriers (“DTX”)
Figure 5. Relative decrease of base station power as the function of offered traffic for different number of frequency carriers
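The channel-level traffic simulation described in Section III can be sketched as follows. This is an illustrative re-implementation of the described process (Poisson arrivals, lowest-numbered-channel seizure so the first carrier fills first, per-channel release, DTX weighting on the higher carriers), not the authors’ program; the run length and random seed are our choices.

```python
import heapq
import random

def simulate_mean_power(A, fc, p_dtx=0.431, n_calls=50_000, seed=1):
    """Monte Carlo estimate of the mean traffic-channel power (in multiples
    of Pchmax).  Calls arrive as a Poisson stream with offered traffic A
    erlang and unit mean holding time; each call seizes the lowest-numbered
    free channel.  The 6 first-carrier traffic channels radiate at maximum
    power whether busy or not; a busy channel on carriers 2..fc radiates
    only during the voice-activity fraction p_dtx."""
    rng = random.Random(seed)
    N = 6 + (fc - 1) * 8                 # traffic channels, Eq. (1)
    free = list(range(N))                # min-heap: lowest-numbered first
    heapq.heapify(free)
    releases = []                        # heap of (release_time, channel)
    area = 0.0                           # time-integral of the power
    t = t_prev = 0.0
    extra = 0                            # busy channels beyond the first carrier

    def advance(to):
        """Process releases up to time `to`, accumulating power * time."""
        nonlocal t_prev, extra, area
        while releases and releases[0][0] <= to:
            end, ch = heapq.heappop(releases)
            area += (6 + p_dtx * extra) * (end - t_prev)
            t_prev = end
            heapq.heappush(free, ch)
            if ch >= 6:
                extra -= 1
        area += (6 + p_dtx * extra) * (to - t_prev)
        t_prev = to

    for _ in range(n_calls):
        t += rng.expovariate(A)          # next call arrival
        advance(t)
        if free:                         # a fully busy system blocks the call
            ch = heapq.heappop(free)
            if ch >= 6:
                extra += 1
            heapq.heappush(releases, (t + rng.expovariate(1.0), ch))
    return area / t
```

Setting `p_dtx = 1.0` makes every busy channel radiate continuously, which approximates the system without DTX, so the relative decrease ΔP/P of the kind plotted in Fig. 5 can be estimated by comparing two runs with the same seed.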
The calculated values of the base station emission power in Figs. 1-4 are obtained using formulas (3) and (4); the state probabilities, i.e. the probabilities of the number of busy channels, are obtained using formula (2).

Comparing the curves “BS w/ DTX calculation” and “BS w/ DTX simulation”, it can be noticed that the power values for the first of these two curves are never greater than the values for the second curve. The reason, as already explained in Section III, is that the Erlang formulas do not provide the data on which channels are busy. If the simulation is realized such that only the number of busy channels is considered, the results of simulation and calculation nearly match each other, but, unfortunately, these results are slightly underestimated compared to the real situation.

Fig. 5 presents the relative decrease of the base station power as a function of the offered traffic when the DTX function is implemented, compared to the power in the systems without DTX. The curves for 1, 2, 3 and 4 frequency carriers are presented. The relative decrease is greater when the number of carriers is greater. As expected, there is no power decrease if there is only one frequency carrier.

V. CONCLUSION

In this paper it is presented how the DTX function contributes to the decrease of base station emission power in GSM systems. Power is not sent when traffic channels are in the idle state or when there is no voice activity in busy traffic channels. However, the traffic channels on the first frequency carrier are an exception: the signal is always sent in these channels regardless of the channel state (whether a channel is busy or not, and whether there is voice activity in a busy channel or not). That is why the effect of the DTX function is smaller than would be expected from its implementation in GSM systems alone.

The relative contribution of the DTX function to the base station emission power saving is greater when the offered traffic is greater and when the system has more frequency carriers. When the offered traffic is small, or the system
consists of only one frequency carrier, DTX does not contribute to a base station power decrease.

The results of this paper are important when base stations are planned. They are also important for perceiving the possibilities of reducing base station emission power, thus contributing to power saving and, as a consequence, to environmental protection.

ACKNOWLEDGMENT

The research presented in this paper was carried out within the project TR32051, financed by the Ministry of Education, Science and Technological Development of the Republic of Serbia.

REFERENCES

[1] M. Mileusnić, P. Jovanović, M. Popović, A. Lebl, D. Mitić, Ž. Markov, “Influence of Intra-cell Traffic on the Output Power of Base Station in GSM”, Radioengineering, vol. 23, no. 2, June 2014, pp. 601-608.
[2] P. Jovanović, M. Mileusnić, A. Lebl, D. Mitić, Ž. Markov, “Calculation of the Mean Output Power of Base Transceiver Station in GSM”, Automatika, vol. 55, no. 2, June 2014, pp. 182-187.
[3] M. Mileusnić, M. Popović, A. Lebl, D. Mitić, Ž. Markov, “Influence of Users’ Density on the Mean Base Station Output Power”, Elektronika ir Elektrotechnika, vol. 20, no. 9, November 2014, pp. 74-79.
[4] D. Mitić, A. Lebl, Ž. Markov, “Influence of Traffic Model on the Calculation of BTS Output Power in GSM Network”, AEÜ - International Journal of Electronics and Communications, vol. 69, no. 5, May 2015, pp. 836-840.
[5] A. Lebl, D. Mitić, M. Popović, Ž. Markov, M. Mileusnić, V. Matić, “Influence of mobile users’ density distribution on the CDMA base station power”, Journal of Electrical Engineering, vol. 67, no. 6, December 2016, pp. 390-398.
[6] P. T. Brady, “A Model for Generating On-Off Speech Patterns in Two-Way Conversation”, Bell System Technical Journal, 1969, pp. 2445-2472.
[7] H.-H. Choi, J.-R. Lee, D.-H. Cho, “Hybrid Power Saving Mechanism for VoIP Services with Silence Suppression in IEEE 802.16e Systems”, IEEE Communications Letters, vol. 11, no. 5, 2007, pp. 455-457.
[8] E. M. M. Winands, J. Wieland, B. Sanders, “Dynamic Half-rate Connections in GSM”, AEÜ - International Journal of Electronics and Communications, vol. 60, no. 7, July 2006, pp. 504-512.
[9] L. Kleinrock, Queueing Systems, Wiley-Interscience, John Wiley & Sons, 1975.
[10] V. B. Iversen, Teletraffic Engineering Handbook, ITU-D SG 2/16 & ITC, Draft 2001-06-20, Technical University of Denmark, June 2001.
[11] M. C. Jeruchim, P. Balaban, K. S. Shanmugan, Simulation of Communication Systems: Modeling, Methodology and Techniques, 2nd ed., Kluwer Academic Publishers, 2002.
[12] M. Mileusnić, T. Šuh, A. Lebl, D. Mitić, Ž. Markov, “Use of Computer Simulation in Estimation of GSM Base Station Output Power”, Acta Polytechnica Hungarica, vol. 11, no. 6, August 2014, pp. 129-142.
[13] D. Mitić, A. Lebl, M. Mileusnić, B. Trenkić, Ž. Markov, “Traffic Simulation of GSM Cells with Half-Rate Connection Realization Possibility”, Journal of Electrical Engineering, vol. 67, no. 2, April 2016, pp. 95-102.
[14] D. Colombi, B. Thors, T. Persson, N. Wirén, L.-E. Larsson, M. Jonsson, C. Törmevik, “Downlink power distributions for 2G and 3G mobile communication networks”, Radiation Protection Dosimetry, vol. 157, no. 4, 2013, pp. 477-487.
Abstract—This paper discusses the emergence of Next Generation technologies and frameworks in Cyber Physical Systems (CPS) and the Internet-of-Things (IoT) that hold the potential to revolutionize collaborative manufacturing, engineering and educational activities. It also outlines the foundational building blocks for the design and implementation of such IoT based cyber physical systems for manufacturing contexts. The components of an IoT based CPS framework for PCB assembly are also discussed. The role of Virtual Reality based Digital Mockup models and Genetic Algorithms for planning and simulation activities is also highlighted.

I. INTRODUCTION

With the growing interest in IoT and CPS technologies for engineering and manufacturing applications, there is a need to propose and discuss frameworks and building blocks to support such Next Generation collaborative practices based on these technologies and principles. Collaboration between geographically distributed organizations using cyber resources and networking technologies is not new. The need for engineering partner organizations to share information and data to accomplish target tasks has been well recognized to enable such partnerships to become agile [4, 5, 7]. Agility can be described as the ability to adapt and meet customer requirements even when these requirements or demands change. With today's emphasis on global collaboration, there is a need for collaborating enterprises to be agile and adapt to changing customer demands and requirements.

The notion of temporary partnerships involving organizations with diverse capabilities and resources can be described as a Virtual Enterprise (VE). Such VE based collaborations can benefit from the emergence of IoT and CPS technologies. In this context, the role of IoT and CPS based technologies and principles needs to be studied. IoT based concepts and technologies can facilitate the implementation of CPS oriented practices in VE oriented contexts. A brief overview of these inter-related concepts and technologies is provided prior to a discussion of the foundational functional blocks of such integrated systems.

The term 'Internet of Things' (IoT) broadly refers to a collaborative network linking physical components or devices with sensors, software and other equipment with each other and with other software or physical components [2-5]. While such data and information exchange based practices are not new, the term 'IoT' has captured the attention of customers, engineers and managers globally with its predominant emphasis on seamless data exchange with very accessible interfaces. Although there has been only a limited set of implementations, the potential of IoT based practices to provide data and information through 'apps' on cell phones and tablets has created phenomenally high exposure for the term 'IoT' while simultaneously encouraging high expectations of its impact in both technical and non-technical environments [5]. As an example, IoT based practices can provide up to date status of assembly operations in a factory in Texas to a manufacturing manager in Australia through an 'app' on her or his phone. While the 'things' or sensors collect and share any data from a cyber or physical resource, the importance of the following relationships between data, information and knowledge needs to be recognized: data from sensors can be viewed as 'raw' attributes; such data can be refined and processed into information; information can eventually become the basis for generating knowledge for decision making.

While most conceptual discussions revolve around providing data or information, the long term value is in using such data and information to develop a knowledge based understanding which can be used for real time decision making, along with predictive analytics based on the collection of large data from a network of interconnected 'things' in such IoT based frameworks. In general, such IoT based collaborations cannot function by themselves and will need to be integrated with care to support engineering collaborations; the components that need to be linked by IoT technologies can be viewed as CPS
components that seek to integrate cyber and physical resources at various engineering sites; such integrated IoT and CPS based collaborations need to ensure sharing of data, information and knowledge so that customer requirements are met while at the same time being prepared for changing market needs.

A brief overview of Cyber-Physical Systems (CPS) is necessary. CPS can be viewed as being composed of cyber and physical system components collaborating to provide a service or product [1, 7-10]. 'Cyber' can refer to a software entity embedded in a server, thin client, or smart device. The term 'Cyber Physical System' is rooted in the field of 'embedded systems' and can be viewed as a more technology appropriate term that emphasizes the emergence of cyber technologies which hold the potential to integrate distributed cyber physical activities in various fields including manufacturing, education [7, 12], healthcare, and transportation.

We propose the term 'IoT based Cyber-Physical Systems' to denote a system composed of collaborating software and physical entities that share data, information and knowledge to achieve a function (which can be technical, service or social in nature). In a process engineering context, such cyber-physical systems can be viewed as an emerging trend where software tools play a dominant role in interfacing, controlling or interacting with humans or physical devices in a variety of activities; these activities can include monitoring, manufacturing, education and tele-medicine. In today's network-oriented technology context, such software tools and resources can interact with physical components through local area networks or through the ubiquitous World Wide Web (or the Internet). With the advent of the Next Generation Internet(s), the potential of adopting cyber-physical technologies and frameworks for use in a range of educational and engineering domains has increased phenomenally.

II. DESIGNING THE FOUNDATIONAL BUILDING BLOCKS OF IOT BASED COLLABORATIONS

In the top down perspective, the design of such IoT based collaborations can begin with modeling the functional relationships for a given set of engineering life cycle activities among distributed partners using available cyber physical resources. Consider the following context: a team of engineering organizations wants to share CPS resources to assemble a product. This group of partners is geographically distributed and wants to accomplish a target set of manufacturing activities. These partners have a mix of cyber physical (C-P) resources including manufacturing equipment at different remote sites. At the outset, there is a need to 'design' the collaborative interactions that need to be supported; the IoT components within the CPS framework can provide the data and information exchange. The assumption is that these C-P resources are linked through some networking mechanism which allows them to communicate and collaborate.

The domain of Printed Circuit Board (PCB) assembly will be used to illustrate the components coming into play to support such IoT based interactions. The life cycle activities for our discussion can be conceptualized (and limited for purposes of discussion scope) into design/design analysis, assembly planning, simulation and assembly. Currently, there has been very little emphasis on developing a 'process' to design, build and integrate IoT based C-P components. It is important to recognize and identify foundational building blocks for such IoT based collaborations, which have to address multiple elements for a successful implementation. These elements include the following:

1. Developing a process to design the collaborative interactions: this is needed as it is the basis for depicting the collaborations functionally while identifying where the data and information exchanges need to occur
2. Identifying the generic functional components needed in a framework for such a collaboration
3. Implementing such a framework taking into consideration interoperability issues, network capabilities and integration needs

III. CREATION OF INFORMATION INTENSIVE PROCESS MODELS

While the IoT literature has focused on the benefits as well as on the conceptual elements needed to support IoT based collaborations, there has been a lack of emphasis on how to design such IoT based approaches and frameworks in various domains; currently, there is a need to develop methods to 'design' such collaborative approaches involving multiple engineering partners. We propose the use of process models that are information-intensive and scalable in terms of modeling different levels of collaborative abstractions. This design methodology provides a structured foundation to propose, modify, and finalize a given set of CPS interactions. As these interactions form the core essence of the cyber-physical collaborations that will employ both cyber (software) and physical resources, it can be viewed as a blueprint for analyzing the data and information exchange among the various CPS components, which can be distributed and integrated using advanced cyber-networking. In two current projects that involve the creation of advanced cyber-physical systems in the areas of collaborative manufacturing and distributed surgical training, we have explored the role of an information-intensive process modeling approach and modeling language [Lu 15, Cecil 16]. This modeling approach uses an evolving modeling language (eEML, or engineering Enterprise Modeling Language) as well as a functional modeling methodology (based on the IDEF-0 language) that enables the capture of the functional relationships for a given CPS context using the various cyber and physical components [Cecil 11, 13]. The entire CPS lifecycle can be modeled hierarchically as a set of functional units (responsible for a given cyber or physical function) along with the
associated information attributes (influencing criteria, performing agents, and decision outputs) for these units, along with the functional dependencies among the collaborating components. This was demonstrated as part of a US Ignite collaborative manufacturing project where the focus was on the creation of an advanced, cyber-physical system for the rapid assembly of micro-devices [17]. This next generation smart manufacturing initiative involved the design of a collaborative system that employs cyber and physical components that communicate and collaborate through a cloud-based framework from geographically distributed locations [4, 11]; the lifecycle of this CPS for smart manufacturing includes design analysis, assembly planning, simulation/analysis of assembly plans, and finally physical assembly of target micro devices.

IV. COMPONENTS OF IOT BASED CPS FRAMEWORKS

The essential components of such IoT based frameworks include the following:

Design and design analysis components: These are modules which can be used to study the customer's design inputs and gain a better understanding of the requirements; they can integrate with the planning components, whose function can be to develop a meta plan as well as plans to address the customer's design requirements.

Planning components: These are the essential modules which are needed to support assembly or manufacturing planning based on customer design requirements; the outcomes of such planning components serve as inputs to 3D Virtual Reality based assembly analysis simulators. These planning modules will depend on algorithmic problem solving to address various issues such as generating assembly plans and sequences, path plans based on obstacle avoidance, etc.

Semantic analysis components: A semantic understanding of potential engineering partner capabilities is necessary to identify partners who can become members of a rapidly forming virtual enterprise (VE). Semantic modeling based capabilities are needed in software components to discover services, compare capabilities and identify partners who can work together in data intensive IoT based manufacturing activities; semantic interoperability is a major issue in today's global manufacturing context. Such semantic analysis components will enable software modules in different organizations to be used effectively.

Simulation components: These modules can be created using 3D Virtual Reality based tools; they can also be described as Digital Mockups or Virtual Prototypes which can be used to support cross functional analysis of proposed assembly or manufacturing plans at various levels of abstraction (machine, assembly, factory or other levels).

Oracular components: These are needed to keep track of the status of all CPS interactions during the accomplishment of a target engineering life cycle; this includes informing all components in the CPS framework of the work in progress (WIP), problems encountered at a particular point in time, and knowing what phase of the collaborative life cycle is being carried out (this can be design, planning, manufacturing or any other activity during the collaborative life cycle).

Interface components: These can be cyber or physical interfaces necessary to interact and integrate across platforms, systems and programming languages.

Data collectors or sensors (or 'things'): These are the essential lower level elements that transmit the raw data; examples of these data collectors are cameras, RFID tags/readers, etc., which can be independent or part of a work cell, machine or some other group in a factory or remote partner site.

Coordinating components: These components can be viewed as software modules that manage, supervise and coordinate necessary interactions to ensure the collaborative process achieves the overall objectives of the partnerships.

Physical resource components: These include machines, robots, conveyors, work cells, etc.

Networking components: These provide the vehicle for all data, information and knowledge exchange.

In advanced manufacturing contexts, there can be a myriad of other components including scheduling, service and other components required for conducting manufacturing activities.

V. AN IOT BASED CPS FRAMEWORK FOR PCB ASSEMBLY

An IoT based framework has been developed for the field of PCB assembly as part of a pilot initiative; the main components in this framework include various planning, simulation and physical assembly components. Figure 1 illustrates some of the key components of this framework; the contract electronics packaging industry is an extremely dynamic manufacturing environment with high variety and low volume products [18]. In such environments, having a CPS based framework with sensors on the shop floor functioning as 'things' in an IoT context can enable organizations to work together sharing software and physical resources.

The role of the Digital Mockup or Virtual Prototype based simulators (shown as VP simulators in figure 1) assumes significance; for the surface mount assembly context, such Virtual Reality based digital mockups of assembly environments (which can be accessed through a cloud) would facilitate investigating the key factors that impact the process parameters. Another key benefit of using Virtual Prototypes is in training operators in the SMT process and in creating an efficient assembly sequence. Training workers in assembly processes offline in a VR environment helps to improve efficiency and productivity by avoiding shut downs. To demonstrate the use of such Digital Mockups in an IoT context, Virtual Reality based digital mockups were created using the Unity 3D engine along with C# and Java scripts (figures 3, 4 and 5); based on customer designs and with the help of assembly sequencing approaches, various candidate
assembly and path plans can be compared and studied before the most feasible plan can be used for physical assembly. These Digital Mockups can be at the assembly level (figure 3) or at the factory level (figure 4).

[Figure 1. Key components of the IoT based CPS framework for PCB assembly, including Cloud Computing, Semantic Analysis Tools, DE Simulation and VP Simulator modules]
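As an illustration of how the oracular and coordinating components described in section IV might interact, the following is a minimal sketch; all class and method names here are hypothetical and are not taken from the authors' implementation. The oracular component records the status of CPS interactions and informs every registered framework component of work in progress and problems.

```python
from dataclasses import dataclass

@dataclass
class StatusEvent:
    phase: str        # e.g. "design", "planning", "simulation", "assembly"
    detail: str       # WIP status or problem encountered

class OracularComponent:
    """Keeps track of CPS interaction status and informs registered components."""
    def __init__(self):
        self._subscribers = []   # callbacks of other framework components
        self.log = []            # history of all reported events

    def register(self, callback):
        self._subscribers.append(callback)

    def report(self, event):
        self.log.append(event)              # record WIP / problems over the life cycle
        for notify in self._subscribers:    # inform every registered component
            notify(event)

# Usage: a (hypothetical) planning module subscribing to status updates
oracle = OracularComponent()
received = []
oracle.register(received.append)
oracle.report(StatusEvent("assembly", "feeder jam at placement station 2"))
print(received[0].phase)
```

In a distributed deployment the callback list would be replaced by networked notifications (e.g. over the cloud-based framework), but the publish/subscribe shape of the interaction stays the same.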
Role of Genetic Algorithms (GA): GAs have been used in a variety of manufacturing and engineering contexts to determine assembly sequences. After beginning with random sequences (where each sequence is composed of a set of feeder and destination positions), genetic operators can be used to generate new child sequences. Operators such as cross-over and mutation can be used to produce new candidate child sequences which are compared based on a fitness function; in assembly planning contexts, such fitness functions can be the travelling distance for a given assembly sequence; this would include the distance travelled by a robot assembly head in accomplishing a given assembly sequence. New operators such as the Advanced Cross-Over Operator (ACX) have been proposed to generate child sequences. An example of this operator is shown in figure 2; the inputs to this operator are 4 candidate parent sequences which are used to generate new assembly (or child) sequences. In each iteration, the sequences with the least travelling distance are selected; the new child sequences in each iteration need to be better than the parent assembly sequences. The iterations are repeated until there are no substantial improvements (or reductions) in the travelling distance of new sequences.
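The GA loop described above can be sketched as follows. This is an illustrative implementation using a standard order crossover and a Manhattan-distance fitness, not the authors' ACX operator, and the board coordinates are invented for the example.

```python
import random

def travel_distance(seq, coords):
    """Fitness: total Manhattan travel of the assembly head over the sequence."""
    return sum(abs(coords[a][0] - coords[b][0]) + abs(coords[a][1] - coords[b][1])
               for a, b in zip(seq, seq[1:]))

def order_crossover(p1, p2):
    """Keep a random slice of parent 1; fill remaining positions in parent 2's order."""
    i, j = sorted(random.sample(range(len(p1)), 2))
    middle = p1[i:j]
    rest = [g for g in p2 if g not in middle]
    return rest[:i] + middle + rest[i:]

def mutate(seq, rate=0.1):
    """Occasionally swap two positions to keep diversity in the population."""
    seq = list(seq)
    if random.random() < rate:
        a, b = random.sample(range(len(seq)), 2)
        seq[a], seq[b] = seq[b], seq[a]
    return seq

def evolve(coords, pop_size=30, generations=100):
    genes = list(coords)
    pop = [random.sample(genes, len(genes)) for _ in range(pop_size)]  # random start
    for _ in range(generations):
        pop.sort(key=lambda s: travel_distance(s, coords))  # least distance first
        parents = pop[:pop_size // 2]                       # selection
        children = [mutate(order_crossover(*random.sample(parents, 2)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return min(pop, key=lambda s: travel_distance(s, coords))

random.seed(1)
# Hypothetical feeder/destination coordinates on a board, in mm
coords = {k: (random.uniform(0, 100), random.uniform(0, 100)) for k in range(10)}
best = evolve(coords)
print(travel_distance(best, coords))
```

Because parents are carried over unchanged each generation, the best sequence never gets worse, matching the requirement that child sequences must improve on (or at least match) their parents before the loop terminates.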
… of Digital Mockup environments using the Unity 3D engine and other software tools. The Unity based environments allow users to explore and interact with the Vive™ Virtual Reality system; the emergence of low cost digital prototyping is an important development in the adoption of cyber physical practices. Genetic algorithms will continue to play a key role as part of planning and sequencing components for changing customer requirements and manufacturing constraints.

Two implementations of IoT based frameworks for collaborative manufacturing [17] and cyber training of surgeons [16] using Next Generation networks have been completed. The design of such interactions was achieved using eEML as well as the IDEF-0 modeling languages [5, 16, 17]. The data and information exchange was supported using Next Generation networking as part of the GENI initiative (GENI stands for Global Environment for Network Innovations) [13]. In the EU, a similar initiative called FIRE [14] is underway to develop next generation Internet technologies for various applications. Software Defined Networking (SDN) principles are supported in these next generation networks. The adoption of high gigabit networks for a range of applications as part of the US Ignite initiative is also underway [15].

VII. CONCLUSION

This paper outlined the functional components of IoT based cyber physical systems for advanced manufacturing. Some of the key components included design analysis, planning, simulation, semantic analysis, interface and other modules which are needed to support collaborations among Virtual Enterprise partners in both manufacturing and other domains. An overview of an IoT framework for PCB assembly was also provided along with a discussion of Genetic Algorithms and their role in generating manufacturing assembly sequences; the role of Digital Mockups in such cyber physical environments was also discussed.

ACKNOWLEDGMENT

This material is based upon work supported by the US National Science Foundation (under Grant Numbers 1257803, 1359297, 1447237 and 1547156). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

REFERENCES

[1] Liu, Y., Peng, Y., Wang, B., Yao, S., and Liu, Z. (2017). Review on Cyber-Physical Systems. IEEE/CAA Journal of Automatica Sinica, 4(1), 27-38.
[2] Xu, L. D., He, W., and Li, S. (2014). Internet of Things in Industries: A Survey. IEEE Transactions on Industrial Informatics, 10(4), 2233-2243.
[3] Uckelmann, D., Harrison, M., and Michahelles, F. (2011). An architectural approach towards the future internet of things. In Architecting the Internet of Things (pp. 1-24). Springer Berlin Heidelberg.
[4] Lu, Y., and Cecil, J. (2016). An Internet of Things (IoT)-based collaborative framework for advanced manufacturing. The International Journal of Advanced Manufacturing Technology, 84(5), 1141-1152.
[5] Cecil, J. (2017). Internet of Things (IoT) based Cyber Physical Frameworks for Advanced Manufacturing and Medicine. In Internet of Things (IoT) & Data Analytics Handbook, John Wiley & Sons, pp. 545-559.
[6] Cecil, J., and Jones, J. (2014). An Advanced Virtual Environment for Micro Assembly. International Journal of Advanced Manufacturing Technology, 72(1), pp. 47-56.
[7] Cecil, J., and Chandler, D. (2014). Cyber Physical Systems and Technologies for Next Generation e-Learning Activities. In Innovations 2014, W. Aung et al. (eds), iNEER, Potomac, MD, USA.
[8] CPS 2013, Cyber Physical Systems. http://en.wikipedia.org/wiki/Cyber-physical_system#Examples (accessed Dec 2013).
[9] Cecil, J., and Gobinath, N. (2010). A Cyber Physical Test Bed for Collaborative Micro Assembly Engineering. Proceedings of the IEEE 2010 International Symposium on Collaborative Technologies and Systems (CTS), Chicago, June, pp. 430-439.
[10] Kanduri, A., and Rahmani, A. (2013). A Multicore Approach to Model-Based Analysis and Design of Cyber-Physical Systems. ISOCC.
[11] Xu, X. (2012). From cloud computing to cloud manufacturing. Robotics and Computer-Integrated Manufacturing, 28(1), February 2012, pp. 75-86.
[12] Cruz, L. (2011). How Cloud Computing is Revolutionizing Education. Aug 22 2011, Cisco's Technology News Site, http://newsroom.cisco.com/feature-content?articleId=460910 (accessed March 2013).
[13] GENI initiative, GENI, www.geni.net (accessed Dec 2013).
[14] FIRE initiative, http://cordis.europa.eu/fp7/ict/fire/ (accessed Dec 2013).
[15] US Ignite, What is US Ignite, http://us-ignite.org/ (accessed Dec 2013).
[16] Cecil, J., Gupta, A., Pirela-Cruz, M., and Ramanathan, P. (2017). A Distributed Collaborative Simulation Environment for Orthopedic Surgical Training. Proceedings of the 2017 Annual IEEE International Systems Conference (SysCon), April 24-27, Montreal, Canada.
[17] Cecil, J., Gunda, R., and Cecil-Xavier, A. (2017). A Next Generation Collaborative System for Micro Devices Assembly. Proceedings of the 2017 Annual IEEE International Systems Conference (SysCon), April 24-27, Montreal, Canada.
[18] Waurzyniak, P. (2007). Lean Manufacturing for Contract Manufacturer. International Journal of Advanced Manufacturing Technology, 31, pp. 846-856.
Abstract—The reduction of energy losses, particularly non-technical losses, in the distribution network has become one of the priority business goals in companies engaged in distribution of electricity. Accordingly, the Electric Power Industry of Serbia uses distribution network losses as one of the key elements that indicate the degree of quality of performing its business activities. In this paper, we propose a system for automation of electricity theft detection and monitoring. The proposed system uses analysis of existing data already owned by the electric distribution company and does not require expensive equipment. A common model, based on elements of the Common Information Model, is implemented in order to provide means for the exchange of information between different information systems within the Electric Power Industry of Serbia. The proposed system is already implemented and used in production in different parts of the Electric Power Industry of Serbia.

I. INTRODUCTION

In electricity supply to final consumers, losses refer to the amounts of electricity injected into the transmission and distribution grids that are not paid for by users [1]. Total losses have two components: technical and non-technical. Technical losses occur naturally and consist mainly of power dissipation in electricity system components such as transmission and distribution lines, transformers, and measurement systems. Non-technical losses are caused by actions external to the electricity system and represent primarily the amount of electricity consumed by consumers that the distribution company is not able to identify and charge. Such non-technical losses include: consumption of electricity by illegally connected consumers, electric meter or electricity line tampering, and errors in accounting and record-keeping. There is no electricity system without some amount of non-technical losses. In many systems the amount of theft is small (1–2%) in terms of the electricity generated [2]. But the financial loss is high due to the large amount of electricity distributed. According to publicly available data, in the USA non-technical losses are estimated at between 0.5% and 3.5% of the whole amount of produced electric energy. That seems like a small amount until USA electricity revenues of more than $300 billion are considered. Therefore, between $1 and $10 billion worth of electricity represents non-technical losses.

According to publicly available data, Serbia produces nearly 31 billion kWh yearly, but at the same time loses up to 19.5% of the whole amount of produced electric energy [3]. In regular business activities, the Electric Power Industry of Serbia (EPS) incurs significant technical and non-technical losses of electrical energy. According to reports, EPS total losses are estimated at around 100 million Euros, of which more than 60 million Euros are non-technical losses caused by theft. These financial losses result in a lack of profits and a shortage of funds for investment in electricity system capacity and improvement [4]. Financial losses are not the only problem; maybe an even bigger challenge is the problem of protecting the electric distribution infrastructure, ensuring the continuity of services, and the necessity to expand generating capacity to cope with the electricity losses. Therefore, the reduction of energy losses in the distribution network has become one of the priority business goals for EPS.

II. RELATED WORK

There are many types of methods and techniques by which dishonest consumers illegally use electric energy [5, 6]. The electricity thieving methods can be divided into two major categories: meter tampering, where an imposter manipulates the internal structure of the metering system, and line tampering, where an intruder bypasses the electricity meter connection. These electricity thieving methods are common and easy to use by electricity consumers in Serbia.

To prevent such activity or significantly limit the practice, EPS uses increasingly sophisticated techniques to secure its own infrastructure [5]. However, site inspection controls showed that the thieves themselves also employ increasingly advanced techniques for illegal energy consumption. Apart from securing infrastructure, there are methods that allow detecting and locating a specific dishonest energy consumer without the need to perform a site inspection. Methods for electricity theft detection are often based on data analysis and elimination (exclusion of suspicion), which leads to narrowing down the area of search.

Detection of electricity theft is an extremely challenging problem in today's power engineering and a utility's everyday operations. Almost every plan to reduce non-technical losses requires permanent monitoring of all parameters of electricity supply (current, voltage, power
factor, etc.). Large scale application of advanced metering infrastructure (AMI) and different remote management and monitoring devices provides powerful tools to reduce total losses and increase collection rates. These tools are often combined with specialized equipment and methods for detecting electricity theft [7-10].

All proposed techniques for reducing losses are very expensive and require huge investments from EPS in the distribution network. For example, at the moment only a very small percentage of customers is covered by AMI. There are plans to change that, not at once but over a period of several years. In the meantime, the authors of this paper propose a system that will automate electricity theft detection and monitoring. This system will use a method based on calculating the balance difference between the total readings from meters and the control (balance) meter [5]. This method uses analysis of existing data already owned by EPS, allows detecting and locating electricity theft, and does not require additional expensive equipment. The balance difference is the basic and most important component indicating that theft of energy took place.

The data that is necessary for calculating the balance difference is stored in different independent information systems throughout the EPS electricity distribution company. In order to ensure access to the necessary information, we have defined a common model for the exchange of information between different information systems within EPS. For these reasons, the proposed system for electricity theft detection and monitoring is built upon the IEC 61968 standard and the Common Information Model (CIM) for exchanging information about the electric network [11]. These standards target different distributed software applications that support operations and management of the electrical distribution network. They enable inter-application integration between such software applications. The idea is to provide the standard for integration of legacy software systems as well as systems that are new or will be introduced in the future within an electrical utility company. These standards are broadly applied to enterprise integration and related information exchanges between systems in electric distribution companies like EPS.

III. SYSTEM FOR AUTOMATION OF ELECTRICITY THEFT DETECTION AND MONITORING

As already mentioned, the proposed system for electricity theft detection and monitoring is based on calculating the balance difference between the total readings from meters and the control (balance) meter.

A. Calculating the balance difference

The balance difference method is based on calculating the mismatch between the total readings of meters and the control (balance) meter. This method is convincing when it comes to narrowing down the circle of suspects and checking a specific consumer's meter readings.

In order to calculate the balance difference, we need to obtain data from different enterprise information systems (Fig. 1):

1. A Supervisory Control and Data Acquisition (SCADA) system or some other specialized measurement systems are used as balance meters in transformer substations. According to EPS standards, these measurement systems usually provide 15-minute measurements of different electricity distribution system parameters.
2. Consumed energy from individual meters is provided by the Consumer Information System (CIS) or Advanced Metering Infrastructure (AMI) where present. According to EPS standards, consumed energy is available in the form of monthly readings from individual meters in the case of CIS, or as 15-minute readings from meters in areas covered by AMI.
3. The Geographic Information System (GIS) provides info about network topology and consumers that are connected to a specific transformer substation.
4. The Technical Information System (TIS) provides data about the installed equipment of the electric power supply network.

Figure 1. Calculating non-technical losses using data from different enterprise information systems
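To make the balance-difference idea concrete, the sketch below aggregates hypothetical 15-minute balance-meter readings (as SCADA would supply) and monthly consumer totals (as CIS would supply) to a common monthly period before computing the mismatch; all figures and variable names are invented for illustration and do not come from the EPS systems.

```python
# Hypothetical one-month data: SCADA supplies 15-minute energy readings (kWh)
# from the substation balance meter; CIS supplies one monthly total per consumer.
scada_15min_kwh = [1.2] * (30 * 24 * 4)     # 30 days x 96 readings per day
cis_monthly_kwh = {"consumer_a": 1700.0, "consumer_b": 1600.0}

# Both sources must be aggregated to the same period before comparison.
p_control_meter = sum(scada_15min_kwh)       # monthly balance-meter total
p_consumed = sum(cis_monthly_kwh.values())   # monthly metered consumption

# Total (technical + non-technical) losses for the substation over the month
p_losses = p_control_meter - p_consumed
print(round(p_losses, 1))
```

Subtracting an estimate of technical losses from `p_losses` then leaves the non-technical component that points to possible theft, as developed in the equations that follow.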
In general, the energy mismatch is calculated using (1), where P_losses represents the total amount of losses, both technical and non-technical, for a specific transformer substation; P_control meter represents the reading from the balance meter in that substation; and P_consumed energy, given by (2), represents the sum of the energy consumed by all metered consumers connected to it.

P_losses = (P_control meter − P_consumed energy) [W]   (1)

P_consumed energy = Σ P_consumer meter [W]   (2)

Technical losses are inevitable during the normal operation of the electricity distribution system. In some cases it is sufficient to provide only a rough estimation of technical losses based on empirical data. If the parameters of the electricity distribution system are known (the network topology and the order and attachment points of all consumers from GIS, and the line resistances between the attachment points from TIS), we can provide a finer estimation of technical losses with a certain precision. The remaining part of the calculated mismatch (3) represents non-technical losses and is usually a result of electricity theft.

P_non-technical losses = P_losses − P_technical losses [W]   (3)

B. Common data model

As mentioned earlier, the proposed system for electricity theft detection and monitoring uses data obtained from different enterprise information systems. These systems usually operate completely independently, which implies that the data generated by a single system is available only to the users of that particular system. Therefore, in order to make the calculation of the balance difference successful, it is necessary to provide access to all the data that is needed. If the processing of this information is performed manually, the probability that errors or violations of data consistency will occur is increased.

As mentioned before, the proposed system is based upon CIM as a common model for exchanging information between different systems. CIM is highly complicated: it covers a broad set of details regarding the distribution network, equipment and their parts, as well as other aspects of the electric distribution company. Due to its abstract nature, the CIM model may be implemented in many ways and does not have to be implemented in its entirety. Therefore, for the purpose of the proposed system, we have implemented only those parts of the CIM model that are relevant for calculating the balance difference and detecting losses (Fig. 2) [11]:

• Substation – a collection of equipment, for purposes other than generation or utilization, through which electric energy in bulk is passed for switching or for modifying its characteristics.
• PowerTransformer – an electrical device consisting of two or more coupled windings, with or without a magnetic core, for introducing mutual coupling between electric circuits.
• ControlMeter – a balance metering device installed in substations or on feeders.
• Measurement – represents any measured or calculated quantity.
• Feeder – a substation feeder.
• Line – equipment beyond a substation belonging to a power transmission line.
• ACLineSegment – a wire or combination of wires, with consistent electrical characteristics, building a single electrical system, used to carry alternating current between points in the power system.
• WireInfo – wire data that can be specified per line segment phase, or for the line segment as a whole in case all of its phases have the same wire characteristics.
• Consumer – a generic user of energy.
• UsagePoint – a logical or physical point in the network to which readings or events may be attributed; used at the place where a physical or virtual meter may be located.
• Meter – a physical asset that performs the metering role of the usage point; used for measuring consumption and detecting events.
• MeterReading – values obtained from the meter.

Figure 2. Common data model based on IEC 61968 standard and Common Information Model (CIM)
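The balance calculation from equations (1)–(3) can be sketched in a few lines. This is our illustration only; the input values and the technical-loss estimate are made up, and the function is not the paper's implementation:

```javascript
// Sketch of equations (1)-(3) for one transformer substation.
// All values are energies over the same period; numbers are
// illustrative only.
function nonTechnicalLosses(controlMeterReading, consumerReadings, technicalLosses) {
  // (2) Pconsumed = sum of all consumer meter readings
  const consumed = consumerReadings.reduce((sum, p) => sum + p, 0);
  // (1) Plosses = Pcontrol - Pconsumed
  const losses = controlMeterReading - consumed;
  // (3) the part not explained by technical losses is the theft indicator
  return losses - technicalLosses;
}

console.log(nonTechnicalLosses(1000, [300, 250, 310], 60)); // 80
```

A positive result that clearly exceeds the uncertainty of the technical-loss estimate is what flags a substation for further inspection.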
IV. IMPLEMENTATION OF THE SYSTEM

The proposed system is already implemented in production in two different EPS organizational units:

A. EPS organizational unit Niš

In this organizational unit the proposed system has been working for more than seven years. This is the oldest implementation of the proposed system, and over the years it has gone through various transformations. The system is semi-automatic and is based on an offline process for importing flat files with measurements from balance meters installed in transformer substations. The rest of the system uses information integration on the database level. The system covers 300 transformer substations.

B. EPS organizational unit Velika Plana

The EPS organizational unit Velika Plana has used the proposed system for almost a year. The implemented system covers 100 transformer substations. The system is automatic and is based on information integration on the database level and using web …

Thanks to the implemented common model, the proposed system can work in different EPS organizational units with quite different business processes and information systems whose data is integrated for electricity theft detection and monitoring.

After several years of exploitation, we can highlight the following advantages of using the proposed system:

• Detection of missing and erroneous data (wrong network topology, missing meter readings, wrong meter readings, etc.) – the proposed system is sensitive to errors in the data and can be used to validate and refine the existing data on the distribution network.
• Detecting and locating electricity theft – as already mentioned, the balance difference is the basic and most important indicator that theft of energy took place. The proposed system can be used to localize transformer substations, or substation feeders, with high levels of non-technical losses.
• Narrowing down the area of search for electricity theft and planning field controls or installing control meters – after energy theft is detected and localized, field controls and control meters can be used to prove the theft.
• Planning AMI and advanced equipment installations in the most critical areas – EPS can focus on covering with AMI and advanced equipment those parts of the distribution network where energy theft is detected and localized.
• Monitoring the effects and the return on investment of the applied techniques for reducing electricity losses.

CONCLUSION

Every company that deals with the distribution of electric energy pays great attention to the problem of electric energy losses, especially non-technical losses. Electric energy losses are one of the key factors that indicate the degree of economy and quality of business in the area of electricity distribution.

In this paper, we propose a system for the automation of electricity theft detection and monitoring. The proposed system is based on calculating the balance difference between the total delivered electrical energy and the sum of the electric energy registered by the meters installed at consumers. The balance difference is the basic and most important indicator that theft of energy took place. The method is based on the analysis of data already existing within the company and does not require expensive equipment.

For the purpose of the proposed system, we have implemented a common data model that contains only those elements of the CIM model that are relevant for calculating the balance difference and detecting losses. The implemented data model is used for the exchange of information between different information systems within EPS.

The proposed system is already implemented and used in production in different parts of the Electric Power Industry of Serbia.

REFERENCES

[1] World Bank Group, "Reducing technical and non-technical losses in the power sector", Washington, DC, 2009, http://documents.worldbank.org/curated/en/829751468326689826/Reducing-technical-and-non-technical-losses-in-the-power-sector
[2] T. B. Smith, "Electricity theft: A comparative analysis", Energy Policy, vol. 32, no. 18, pp. 2067-2076, 2004, DOI: 10.1016/S0301-4215(03)00182-4
[3] S. Marković, "Criminal analysis of electricity theft and its social consequences", International Scientific Conference Archibald Reiss Days, vol. 3, pp. 83-90, Belgrade, 2015, ISBN 978-86-7020-321-1
[4] A. Pavić and J. Trupinić, "Electrical Energy Losses in the Distribution Network", Energija, vol. 56, no. 2, pp. 185-215, 2007
[5] R. Czechowski and A. M. Kosek, "The Most Frequent Energy Theft Techniques and Hazards in Present Power Energy Consumption", Joint Workshop on Cyber-Physical Security and Resilience in Smart Grids (CPSR-SG2016), 2016.
[6] Z. Hussain, "Methods and Techniques of Electricity Thieving in Pakistan", Journal of Power and Energy Engineering, vol. 4, no. 9, pp. 1-10, 2016.
[7] A. A. Cardenas, S. Amin, G. Schwartz, R. Dong, and S. Sastry, "A game theory model for electricity theft detection and privacy-aware control in AMI systems", in Proc. IEEE 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1830-1837, 2012.
[8] D. Mashima and A. A. Cardenas, "Evaluating electricity theft detectors in smart grid networks", in Research in Attacks, Intrusions, and Defenses, Springer, pp. 210-229, 2012.
[9] S. Depuru, L. Wang, and V. Devabhaktuni, "Support vector machine based data classification for detection of electricity theft", in Proc. 2011 IEEE/PES Power Systems Conference and Exposition (PSCE), pp. 1-8, 2011.
[10] S. McLaughlin, D. Podkuiko, and P. McDaniel, "Energy theft in the advanced metering infrastructure", in Proc. 4th International Conference on Critical Information Infrastructures Security, Springer, pp. 176-187, 2010.
[11] IEC 61968, "Common Information Model (CIM) / Distribution Management", http://www.iec.ch/smartgrid/standards/
Abstract—Large-scale and complex systems that require significant hardware resources are typically designed to utilize distributed computing. This paper presents the architecture of a lightweight multi-agent middleware aimed at simplifying distributed computing operations. The middleware is implemented in the JavaScript programming language, specifically on the Node.js framework. It is designed to support simple load balancing, avoid a single point of failure, and enable the execution of computationally intensive tasks. Furthermore, by supporting standardized agent communication, our agents can transparently interact with agents in existing, third-party multi-agent solutions.

Keywords—Multi-agent platform, Software agents, Mobile agents, Distributed architecture, JavaScript

I. INTRODUCTION

Modern applications and systems have high computational complexity and require significant hardware resources to run efficiently. Applications that require the most computational power utilize distributed computing, where machines and devices are grouped and managed in clusters in order to distribute the workload. The main challenges involved in the design and implementation of distributed systems are hardware and software heterogeneity, resource access and availability, scalability, fault tolerance, concurrency, security, etc. [1, 2]. The problem addressed in this paper is the development of a distributed system based on the agent paradigm, which can easily distribute agents to a wide variety of devices.

The basic motivation for our work is to simplify distributed computing operations where the underlying distributed system consists of heterogeneous software and hardware. We do this by developing a multi-agent middleware in the JavaScript programming language. JavaScript was chosen because it is the language of browsers, which means that agents written for our middleware can easily be executed in every major browser. This also means that our agents can be executed on any device that can open a Web browser, without any previous installation or configuration.

On the back-end, our system uses Node.js [3] application servers, and because of this design choice we have simplified the implementation of one of the most important mechanisms in the agent paradigm: agent migration. Our agents can easily migrate between servers and between a browser and a server. This paper focuses on the server side of the whole system, which is designed to enable the execution of computationally intensive tasks. Furthermore, our system supports the standardized agent communication defined by the Foundation for Intelligent Physical Agents (FIPA) [4], expressed through the Agent Communication Language (ACL). By creating a FIPA ACL-compliant agent middleware, we have enabled communication between our agents and agents from different systems that support the same standard.

The rest of the paper is organized as follows. First, we give a brief overview of related work in Section II. Our solution is presented in Section III. In Section IV we present some performance tests and comparisons. We discuss the current state of our work and our goals for the future in Section V. Finally, in Section VI, we draw our conclusions.

II. RELATED WORK

The use of the Web, JavaScript, and HTML as an agent platform is not very common. The Radigost [5, 6] system uses the Web and JavaScript as an implementation platform for multi-agent systems. It has many interesting features, such as support for standard agent communication and a yellow-pages service for agent directories. However, it does not support running agents outside the browser. Because of that, Radigost has been merged with a JavaEE-based agent framework into one system named Siebog [7, 8]. Siebog allows the execution of JavaScript agents on the server side with the use of adapters, but only one-way mobility is currently supported: agents can only migrate from a browser to a server.

The framework described in this paper follows up on previous research and development done in Siebog. Because the server side of Siebog is based on JavaEE technologies, it cannot be hosted on small devices, such as those used in Internet of Things (IoT) applications. Taking that, the agent migration limits, and the agent distribution mentioned above into consideration, we have decided to create a lightweight Node.js-based middleware. This middleware, conveniently named SiebogJS, currently does not support all the functionality that Siebog does: the use of enterprise technologies and industry-standard servers enables Siebog to support automatic load balancing, agent state persistence and replication, and fault tolerance.

Alongside Radigost, there currently exist a few more Web-based, more specifically HTML5-based, agent middlewares. The one most similar to ours is described in [9]–[11]. The …
…will process received messages fast, multiple agents could be executed inside one process. But because we cannot guarantee that this will be the case, and also because we wanted to enable the execution of computationally intensive tasks, every agent had to be executed inside a new Node.js child process. The drawbacks of this design choice are discussed in the next section. Because agents are executed in a new process, the agent manager had to handle inter-process communication. Furthermore, there is no direct connection from the message manager to the agent, so all agent messages first go to the agent manager and are then passed on to the agents.

Agent migration is a very important mechanism in the agent paradigm, and it is one of the main reasons we chose JavaScript in the first place. Our system offers this mechanism to agents, and the agent itself can initiate the migration process. In the current solution, the agent's code gets uploaded to the master host, and when the upload finishes it is distributed to all other hosts. Because every agent center has the code of every agent, during migration it is only necessary to send the agent's state to the target agent center.

Fig. 2 shows that some layers are configurable. The configuration manager uses configuration data read from JSON files to set up the appropriate functionality.

REST and WebSocket (WS) services provide the endpoints of our system to which client applications can connect. These services provide endpoints for creating and removing agents, sending messages to agents, etc. The only difference between the endpoints they provide is that the WS service has one additional endpoint where the client can obtain the agent activity log. This endpoint exposure enables our system to support Web, mobile, or desktop-based client applications.

B. Programming agents

From a developer's point of view, system functionality is encapsulated in the reusable Agent class. Every agent has a unique identifier represented by the AID class (Listing 1). The AID consists of the agent's runtime name, the id of the host on which the agent will be executed, and the agent class. The value property of the AID class is just its string representation.

function AID(name, host, agentClass) {
    this.name = name;
    this.host = host;
    this.agentClass = agentClass;
    this.value = name + divider + host +
        divider + (agentClass.value || agentClass);
}

Listing 1: AID class

Besides a unique identifier, every agent has a type, which is represented by the AgentClass class (Listing 2).

function AgentClass(name, module) {
    this.name = name;
    this.module = module;
    this.value = module + divider + name;
}

Listing 2: AgentClass class

The AgentClass consists of the type name and the module name; its value property is just its string representation.

The agent also has two other properties that are injected into it upon its creation: the agent manager and the message manager objects. The agent manager object provides methods for getting information about the agent types currently available in the system, as well as information about running agents. Besides that, the agent manager object exposes the migration method, which an agent can invoke to start the migration process. The message manager, on the other side, exposes methods for sending messages.

In order to create distributed applications, developers only need to create their own specific agents as subclasses of the Agent class. A specific subclass can override the following methods:

• handleMessage(ACLMessage) – automatically called whenever the agent center receives a message addressed to that agent. The agent receives the ACL message as a parameter.
• postConstruct() and preDestroy() – automatically called after the agent is created and before the agent is removed from the system, respectively. Initialization can be done within the postConstruct method, while used resources can be released in the preDestroy method.
• postMigration(ACLMessage) – called instead of postConstruct when the migration process finishes. The ACL message is the message sent via the migrate method. The migrate(hostAlias, ACLMessage) method initiates the migration process and can be invoked by the agent itself. It accepts two parameters: the alias of the host to which the agent should move, and an optional ACL message that can be used to pass additional information.

C. Example of an agent

Listing 3 gives a simple agent example. This agent overrides the handleMessage, postConstruct, and preDestroy methods. In the last two methods, the agent just logs to the console that it has started and that it will be destroyed. In the handleMessage method, the agent asks the agent manager for the identifier of an agent of the Ping type, and if there is at least one Ping agent running in the system, the agent sends it a message using the message manager's post method.

function TestAgent() {
    var self = this;
    Agent.call(self);

    self.handleMessage = function (aclmsg) {
        console.info('Agent with aid: ' + self.aid.value +
            ' received a message: ' + JSON.stringify(aclmsg));

        self.agentManager.getAgent('siebogjs.pingpong@ping',
            false, function (pingAid) {
                if (!pingAid)
                    return;

                var msg = ACLMessage(self.aid.value,
                    ACLPerformatives.REQUEST, [pingAid]);
                msg.content = 'Ping';
                self.messageManager.post(msg);
            });
    };

    self.postConstruct = function () {
        console.info('Created agent with aid: ' + self.aid.value);
    };

    self.preDestroy = function () {
        console.info('Destroying agent with aid: ' + self.aid.value);
    };
}

Listing 3: TestAgent class

IV. EVALUATION

During development, we performed a couple of performance tests and compared the results of Siebog and our solution. The test environment consisted of a laptop computer with a quad-core Intel i7 CPU with hyper-threading, 16 GB of RAM, and an SSD. Siebog was executed on a WildFly 8 [14] server, while our solution was executed on a Node.js v6.10.0 application server.

Two performance tests were performed. First, we tested the limit on the number of running agents and also measured the time needed to create a certain number of agents. The time as a function of the number of created agents is depicted in Fig. 3.

…4000 agents, while their creation and execution took about 10 minutes. On the other hand, Siebog can execute more than 15K agents; for example, the creation and execution of 4K agents took about 20 seconds. It is important to mention that when both systems were executing, for example, 2K agents, the laptop remained usable in the case of Siebog, but the same cannot be said for our solution.

In the second test, we measured the time that two agents took to exchange a certain number of messages. The results of these measurements are depicted in Fig. 4.

Fig. 4: Message exchange between two agents

The x-axis uses a logarithmic scale because we measured the exchange of messages in the range from 1K to 1M. Here we can see that our solution performs much better than Siebog: for example, the exchange of 1M messages took our system about 2 minutes, while Siebog took about an hour and a half. The slow message exchange process is a known Siebog issue, and it could and should be optimized in the future.
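The message-exchange test can be illustrated with a much-simplified, single-process sketch. This is our illustration only: the real middleware routes ACL messages between Node.js child processes through the agent manager, while the snippet below only shows the ping-pong pattern the benchmark measures:

```javascript
// Simplified single-process sketch of the second test: two "agents"
// exchange a fixed number of messages through direct method calls.
// The real middleware delivers ACL messages between child processes
// via the agent manager; this only illustrates the exchange pattern.
function runExchange(totalMessages) {
  let delivered = 0;
  const ping = {
    handleMessage() {
      delivered += 1;
      if (delivered < totalMessages) pong.handleMessage();
    }
  };
  const pong = {
    handleMessage() {
      delivered += 1;
      if (delivered < totalMessages) ping.handleMessage();
    }
  };
  ping.handleMessage(); // start the exchange
  return delivered;
}

console.log(runExchange(1000)); // 1000
```

A real benchmark at 1M messages would schedule each delivery on the event loop (e.g. with setImmediate) instead of recursing directly, to avoid exhausting the call stack.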
…round-robin load balancing, but we would like to add load balancing that depends on the load of the agent center.

Our system currently does not support any security mechanisms. This is a big issue, because we support agent upload, so malicious code could easily be executed on our servers. Since Node.js does not have built-in security mechanisms, we will need to implement our own. One of our primary goals is to prevent security problems while agents execute their code, and we will do this by implementing two security mechanisms: static code analysis and run-time security based on policies. This will enable us to detect potentially problematic code and to prevent an agent from executing it. Furthermore, Node.js depends on multiple modules that are used by the code executed on it, and the way a module is required and used could pose a security threat. Besides that, JavaScript has an eval function that can be used to parse and evaluate a string as an expression. This can also pose a security threat, so it should be detected and rejected by the static code analysis.

So far we have only talked about how to improve our current solution, but in order to complete the whole system we also need to implement the client side. Only with both the server-side and client-side parts completed can we fully support the agent distribution mentioned in Section I. The client side will be very similar to Radigost; for example, it would probably use WebSockets to communicate with the server and WebWorkers to run agents.

VI. CONCLUSION

In this paper, we have presented our server-side, JavaScript-based agent middleware. We have also presented our solution's test results and compared them to Siebog's. Those tests have shown the strengths and weaknesses of our design. In the previous section, we suggested some techniques that can be used to solve the main performance problems. We have also discussed our future goals involving security and more advanced clustering techniques, as well as the future development of the middleware's client-side part. In order to support the distribution of agents without previous installation or configuration, the client-side part must be implemented. The server-side part can run on multiple platforms, but it requires installation.

As mentioned before, Web- and JavaScript-based agent middlewares are rare, and there is a lot of room for future research and improvements.

ACKNOWLEDGMENT

Results presented in this paper are part of the research conducted within Grant No. III-47003, Ministry of Education, Science and Technological Development of the Republic of Serbia.

REFERENCES

[1] Andrew S. Tanenbaum and Maarten van Steen. Distributed Systems: Principles and Paradigms. Pearson Prentice Hall, Upper Saddle River, NJ, USA, 2nd edition, 2007.
[2] George Coulouris, Jean Dollimore, Tim Kindberg, and Gordon Blair. Distributed Systems: Concepts and Design. Addison-Wesley, USA, 5th edition, 2011.
[3] Node.js Foundation. Node.js. https://nodejs.org/en/.
[4] Foundation for Intelligent Physical Agents. http://www.fipa.org/.
[5] Dejan Mitrović, Mirjana Ivanović, Zoran Budimac, and Milan Vidaković. Radigost: Interoperable web-based multi-agent platform. Journal of Systems and Software, 90:167–178, 2014.
[6] Dejan Mitrović, Mirjana Ivanović, and Costin Bădică. Delivering the multiagent technology to end-users through the web. In Proceedings of the 4th International Conference on Web Intelligence, Mining and Semantics (WIMS14), page 54. ACM, 2014.
[7] Dejan Mitrović, Mirjana Ivanović, Milan Vidaković, and Zoran Budimac. Extensible Java EE-based agent framework in clustered environments. In German Conference on Multiagent System Technologies, pages 202–215. Springer, 2014.
[8] Milan Vidaković, Mirjana Ivanović, Dejan Mitrović, and Zoran Budimac. Extensible Java EE-based agent framework – past, present, future. In Multiagent Systems and Applications, pages 55–88. Springer, 2013.
[9] Jari-Pekka Voutilainen, Anna-Liisa Mattila, Kari Systä, and Tommi Mikkonen. HTML5-based mobile agents for the Web-of-Things. Informatica, 40(1):43, 2016.
[10] Laura Järvenpää, Markku Lintinen, Anna-Liisa Mattila, Tommi Mikkonen, Kari Systä, and Jari-Pekka Voutilainen. Mobile agents for the Internet of Things. In 2013 17th International Conference on System Theory, Control and Computing (ICSTCC), pages 763–767. IEEE, 2013.
[11] Kari Systä, Tommi Mikkonen, and Laura Järvenpää. HTML5 agents: mobile agents for the web. In International Conference on Web Information Systems and Technologies, pages 53–67. Springer, 2013.
[12] Marius Feldmann. An approach for using the web as a mobile agent infrastructure. In Proceedings of the International Multiconference on ISSN, volume 1896, page 7094, 2007.
[13] Pieter Hintjens. ØMQ – the guide. http://zguide.zeromq.org/page:all.
[14] Red Hat. WildFly homepage. http://wildfly.org/.
Abstract—This paper analyzes the efficiency of image watermarking based on SVD decomposition and its resistance to the presence of impulsive noise. The first part of the paper describes the SVD decomposition, the SVDU algorithm for watermarking, and the MDB algorithm for the detection and removal of impulse noise. The effect of the watermark insertion level on the visual characteristics of the watermarked image is analyzed in the second part of the paper. The data shown in tables and graphs, as well as the images before and after watermarking, indicate a large degree of visual image degradation.

Key words: Watermarking, SVD decomposition, filtering

I. INTRODUCTION

Due to the rapid growth of modern communications, the exchange of information and multimedia content has sharply increased. In such an environment, copyright protection and proof of ownership are a major problem, especially in the distribution of digital content (images, video, audio, ...). In order to resolve the problem of copyright protection of digital images, the principle of inserting visible or invisible information into the host image is used. The insertion of hidden information into a digital image with the aim of proving ownership is called digital watermarking [1]. In the process of distributing digital content, the content degrades as a result of the impact of different disturbances. In order to ensure successful copyright protection using a digital watermark, the watermark must be resistant to potential noise. The basic properties of a digital watermark are imperceptibility, strength, capacity, irreversibility, and the ability to provide proof of ownership. Digital watermarks can be inserted into the host image using several different techniques; depending on the domain of insertion, these can be divided into two categories: a) inserting the watermark in the frequency domain, and b) inserting the watermark in the spatial domain. Inserting a watermark in the frequency domain is more robust and more resistant to interference, but the insertion is more complex. Watermarking in the spatial domain is less complex and easier to implement. Many methods have been proposed for watermarking, based on the DCT (Discrete Cosine Transform) [2, 3], the DWT (Discrete Wavelet Transform) [4], the SVD (Singular Value Decomposition) [5-8], the Schur decomposition (SD) [9], and many combinations of the above transformations. The most common algorithms for watermarking are based on the SVD transformation. The SVD transformation decomposes the matrix A_{M×N} into three matrices U_{M×M}, D_{N×N} and V_{N×N}, where U and V are orthogonal matrices and D is a diagonal matrix.

In [5], a watermarking algorithm is presented in which the watermark is inserted into the D matrix. For the extraction process, besides the watermarked image, the presence of the original watermark is also required. In that case it is possible to extract a false watermark, depending on the original watermark. Many authors have developed algorithms based on the SVD transformation that extract the watermark without the presence of the original watermark [6, 7]. These operate on small blocks, where the number of blocks is equal to the number of pixels of the watermark [8]. Insertion implies a change of the values of the diagonal matrix D depending on the value of the watermark pixel (0 or 1). Usually in such algorithms the watermarking process inserts the pixels of the digital watermark into the singular values of the diagonal matrix, which is a disadvantage, because extraction of the digital watermark using other orthogonal matrices may lead to a false positive detection of the watermark. In the watermarking algorithm based on the Schur decomposition (A = U × D × U', where U is unitary and D is upper triangular), changes are applied to individual elements of the matrix U, for which statistical analysis has shown that they differ very little from one another [9]. In the practical implementation of that algorithm, the watermarking coefficient should be large for successful watermark extraction; however, increasing the watermark insertion coefficient leads to visible degradation of the image. The statistical analysis for selecting the elements in which the watermark will be inserted, applied to algorithms with the Schur decomposition, is also applied in algorithms with the SVD transformation. In [10] a new algorithm (SVDU) is proposed, in which a different approach is applied for inserting the digital watermark into an image: the insertion modifies elements of the orthogonal matrix U instead of the singular values. In [10] a detailed analysis of the algorithm and its robustness to some specific attacks (superimposed impulse and Gaussian noise, median filtering, cropping, JPEG compression, etc.) is carried out. The results for superimposed impulse noise relate to percentages of infected pixels of 2–10%. In this paper we present the results of watermark extraction, after watermarking with the SVDU algorithm and degradation of the watermarked image with impulse noise (salt and pepper) in the range p = 2–50%. In order to increase the quality of the watermarked image and of the extracted watermark image, the MDB algorithm [11] is applied.
7th International Conference on Information Society and Technology ICIST 2017
algorithm is implemented in two steps: a) locating the infected pixels and b) filtering blocks of dimensions Mp × Np.
In order to compare the results of this study with the results from [10], the Normalized Correlation (NC) coefficient and the Mean Squared Error (MSE) are applied as measures of the quality of the watermark. The results are presented graphically.
The paper is organized as follows: Section 2 describes the SVD decomposition and the SVDU algorithm for watermarking. Section 3 presents the results and their analysis. The conclusion is given in Section 4.

II. ALGORITHMS

A. SVDU Algorithm
The SVD decomposition applied to the image A_M×N returns three matrices:

A_M×N = U × D × V^T   (1)

where U is the left orthogonal matrix, V is the right orthogonal matrix, D is the diagonal matrix with the singular values, and M and N are the image dimensions. In [2], inserting the watermark W_N×N is accomplished by inserting the watermark into the D matrix (D' = D + α × W), where α denotes the insertion coefficient. In [10] it was suggested to divide the image into blocks of dimensions Mb × Nb, where Mb = M / Mw and Nb = N / Nw, with Mw and Nw denoting the dimensions of the watermark.
Watermarking based on the SVD algorithm is done by inserting one bit of the watermark W_i,j into the block U_i,j; hence the algorithm is called the SVDU algorithm. The SVDU watermarking algorithm relies on two important characteristics of the orthogonal matrix U, which refer to the relations between the elements of its first column: a) all elements have the same sign, and b) their values differ only slightly [9]. The fact that the values of the elements U_2,1 and U_3,1 differ only slightly is used for inserting one bit of the watermark by modifying the values of these two elements. The selection of the two elements to be modified is based on the statistical dependency analysis presented in [10], where it is shown that the strongest correlation is between the elements U_2,1 and U_3,1 (NC = 0.9969 for the image Lena [10]), and that it is optimal to use blocks of dimensions 4 × 4. As an example, eq. 2 shows an original block of the image Lena (A) and its U matrix (after applying the SVD decomposition), in which this characteristic can be observed.
The SVDU watermarking algorithm, presented in [10], performs the following steps:
Step 1: Divide the image into blocks H_i,j of dimensions Mb × Nb = 4 × 4.
Step 2: Perform the SVD decomposition over every block H_i,j to obtain the unitary matrices U_i,j, as in eq. 3:

H_i,j = U_i,j × D_i,j × V_i,j^T   (3)

Step 3: Modify the elements U_2,1 and U_3,1 of the matrix U_i,j of each block to obtain the modified block U_i,j' in accordance with the information in the binary watermark W_i,j.
Watermarking is accomplished in accordance with the rules shown in eq. 4 and eq. 5. The binary watermark bit w_i,j is inserted by modifying the elements U_2,1 and U_3,1 in the first column, after which the modified matrix U_i,j' is obtained:

if w_i,j = 1:  u'_2,1 = sign(u_2,1) × (U_avg + T/2),  u'_3,1 = sign(u_3,1) × (U_avg - T/2)   (4)

if w_i,j = 0:  u'_2,1 = sign(u_2,1) × (U_avg - T/2),  u'_3,1 = sign(u_3,1) × (U_avg + T/2)   (5)

where sign(x) denotes the sign of the element x, U_avg = (|u_2,1| + |u_3,1|) / 2, and |x| denotes the absolute value of x.
Step 4: The blocks with the watermark are obtained in accordance with eq. 6:

H'_i,j = U'_i,j × D_i,j × V_i,j^T   (6)

Step 5: Repeat steps 2 – 4 until all bits of the watermark are inserted and the watermarked image Aw is obtained.
Extraction of the watermark is performed in the following steps:
Step 1: The watermarked image Aw is divided into blocks H_i,j' of dimensions 4 × 4.
Step 2: The SVD decomposition is applied to each block H_i,j' to obtain the matrix U_i,j'.
Step 3: The watermark bit W_i,j' is obtained from the elements U_2,1 and U_3,1, using the relation presented in eq. 7:
presented in [11]. Noise is added to the watermarked image with percentage p to obtain the noisy watermarked image Aws. The MDB algorithm is applied to the watermarked image with added noise Aws to obtain the filtered image Awr.
The MDB algorithm for detection and elimination of impulse noise is realized in the following steps:
Step 1: From the image Aws, blocks are created:
    FOR i = 1 : M - Mp
      FOR j = 1 : N - Np
        FOR k = 1 : Mp
          FOR l = 1 : Np
            B(k, l) = Aws(i + k, j + l);
          END
        END
Step 2: Sort the elements:
        [Pmin, Pmed, Pmax] = SORT(B);
Step 3: Calculate the mean value Pmean:
        Pmean = MEAN(B);
Step 4: Detect the pixels with impulse noise:
        IF (Pmin < B(X, Y) & B(X, Y) < Pmax & Pmin > 0 & Pmax < 255)
          /* pixel is without noise */
        ELSE
          /* pixel is with noise */
          IF ((B(2, 2) == Pmin & Pmin == 0) | (B(2, 2) == Pmax & Pmax == 255))
            B(X, Y) = Pmed;
          ELSE
            Pmed = Pmean;
          END
        END
      END
    END
Step 5: Create the reconstructed (filtered) watermarked image Awr.
In step 2, the SORT function sorts all elements of the block in ascending order by columns. As a result, the elements of the secondary diagonal hold the sorted order statistics (Pmin = B3,3, Pmed = B2,2 and Pmax = B1,1), where Pmin denotes the minimal, Pmed the medium and Pmax the maximum value of the window elements.

III. EXPERIMENTAL RESULTS AND ANALYSIS

A. Experiment
For the purpose of testing the SVDU watermarking algorithm, the following experiment was performed:
Step 1: The watermarked image Aw is obtained from the original image A, divided into blocks Mb × Nb = 4 × 4, on which the SVD decomposition is applied and one bit of the binary watermark (dimensions Mw × Nw = 128 × 128) is inserted into each block with insertion coefficient T = 0.04.
Step 2: Noise with different percentage p is added to the watermarked image Aw (obtaining Aws).
Step 3: From the image with added noise Aws, the watermark Wes is extracted.
Step 4: The MDB algorithm is applied to the watermarked image with added noise Aws.
Step 5: From the filtered watermarked image Awr, the watermark Wer is extracted.
As quality measures for the filtered image and the extracted watermark, MSE and NC were applied:

MSE = ( Σ_i,j (x_i,j - y_i,j)^2 ) / (M × N)   (8)

NC = ( Σ_i,j x_i,j y_i,j ) / sqrt( Σ_i,j x_i,j^2 × Σ_i,j y_i,j^2 ),  i = 1..M, j = 1..N   (9)

where x_i,j denotes a pixel of the original image, y_i,j denotes a pixel of the processed image, and M × N are the image dimensions.
In the experiment, the image Lena (512 × 512), figure 1.a, was used as the host image, and the image presented in figure 1.b (128 × 128) was used as the watermark. The watermarking coefficient was fixed at T = 0.04. The impulse noise was varied over p = {2, 10, 15, 20, 25, 30, 35, 40, 45, 50} %.

Figure 1. Images used in the experiment: a) Lena, b) watermark

B. Results
The result of applying the SVDU watermarking algorithm is shown in figure 2.

Figure 2. Image appearance: a) after watermarking, b) watermark extracted after insertion

Figure 3 shows the appearance of the watermarked image after adding noise with p = {10, 30, 50} % and after applying the MDB algorithm. Figure 4 shows the appearance of the watermark extracted from the image with added noise, and of the watermark extracted from the noisy watermarked image after applying the MDB algorithm.
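The MDB detection-and-filtering idea can be sketched compactly in Python. This is an illustrative reading of the pseudocode, assuming 3 × 3 windows and 8-bit pixel values, not the exact code of [11]: a pixel is kept if it lies strictly inside the window range and the window extremes are not saturated; otherwise it is replaced by the window median, falling back to the window mean when the median itself is saturated.

```python
import numpy as np

def mdb_filter(img):
    """Sketch of the MDB impulse-noise filter described above,
    assuming Mp x Np = 3 x 3 windows over an 8-bit grayscale image."""
    out = img.astype(float).copy()
    m, n = img.shape
    for i in range(1, m - 1):
        for j in range(1, n - 1):
            block = img[i - 1:i + 2, j - 1:j + 2].astype(float)
            p_min, p_med, p_max = block.min(), float(np.median(block)), block.max()
            center = img[i, j]
            # Pixel is treated as noise-free if it lies strictly inside the
            # window range and the window extremes are not saturated (0 / 255).
            if p_min < center < p_max and p_min > 0 and p_max < 255:
                continue
            # If the median itself is a saturated extreme, fall back to the mean.
            if p_med in (0.0, 255.0):
                p_med = float(block.mean())
            out[i, j] = p_med
    return out
```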
Figure 3. Appearance of the watermarked image Lena with added noise: a) p = 10 %, c) p = 30 %, e) p = 50 %, and after applying the MDB algorithm: b) p = 10 %, d) p = 30 %, f) p = 50 %.

Figure 4. Appearance of the extracted watermark from the image with added noise: a) p = 10 %, c) p = 30 %, e) p = 50 %, and of the watermark extracted from the image after applying the MDB algorithm with: b) p = 10 %, d) p = 30 % and f) p = 50 %.

[Plot: MSE (range 0.00 - 0.45) versus noise percentage p = 0 - 50 %, for the image with added noise and for the image with added noise after filtering.]

[Plot: NC (range 0.60 - 1.00) versus noise percentage p = 0 - 50 %, for the image with added noise and for the image with added noise after filtering.]
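The two quality measures used in this experiment, MSE (eq. 8) and NC (eq. 9), can be computed directly. A minimal sketch follows; the standard square-rooted energy normalization is assumed for NC:

```python
import numpy as np

def mse(x, y):
    """Mean squared error between two images (eq. 8)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return ((x - y) ** 2).sum() / x.size

def nc(x, y):
    """Normalized correlation between two images (eq. 9), assuming the
    standard form with square-rooted energy terms in the denominator."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())
```

NC is scale-invariant (identical up-to-scale images give NC = 1), which is why it is the usual measure for comparing an extracted watermark against the original.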
Abstract—Clinical decision support systems (CDSS) are designed to assist physicians and other health professionals with numerous clinical tasks, such as establishing a diagnosis or determining the appropriate course of therapy. While much research has been conducted in the field, with various degrees of success, little emphasis has been placed on unified knowledge representation with uncertainty and learning capabilities of a diagnostic system. Non-axiomatic logic (NAL), part of an Artificial General Intelligence project designed to realize a general-purpose logic, provides a consistent format for representing, reasoning and learning with knowledge that has diverse degrees of uncertainty. This paper reviews the methods used in the design of CDSS, and proposes a CDSS framework based on NAL.

I. INTRODUCTION

Clinical decision support systems (CDSS) are systems designed to influence clinical decision-making tasks, done by physicians and other health professionals, in order to improve the quality of health care [1].
An important and non-trivial problem of CDSS is the diagnostic process: helping the physicians in determining which questions to ask, or deciding which tests to order and procedures to perform. To model such complex tasks, the system would need to be able to resolve inconsistencies, to infer new conclusions based on its current knowledge, and to ask questions and generate hypotheses in order to explain its reasoning. In other words, a CDSS needs to have human-like reasoning capabilities.
Current research in the field of clinical systems is largely focused on the application of artificial neural networks (ANNs), due to their proven high accuracy [7, 8, 9]. Research has shown that ANN diagnostic predictions are usually as good as, or better than, those of physicians. However, the knowledge stored in ANNs is very hard to interpret for medical professionals and the reasoning behind the computed diagnostic solutions is hard to explain; that is, they don't provide support for the complete diagnostic process [1].
Another approach to diagnostic support are knowledge-based expert systems [10, 11, 12]. Knowledge-based expert systems have an intrinsic quality of being able to provide decision explanations. One of the first and most widely known knowledge-based expert systems was MYCIN, developed at Stanford University in the 1970s. MYCIN's knowledge base consisted of if-then rules with certainty factors attached to consequent diagnoses. For inference, a backward chaining reasoning strategy was employed. In a test conducted at Stanford, MYCIN outperformed members of the Stanford medical school faculty. However, MYCIN was never actually used in practice and after 1979, due to lack of maintenance, its knowledge base became out of date [13].
A disadvantage of rule-based systems¹, such as MYCIN, is that they become increasingly unmanageable when the number of rules increases, and they often contain several thousand rules or more. Furthermore, "if-then" rule systems usually have difficulty dealing with uncertainty, due to the fact that they don't employ logic with sound theoretical support for representation of different kinds of uncertainty.
It is our belief that the aforementioned issues can be solved with the use of techniques of Artificial General Intelligence (AGI). AGI is a field of artificial intelligence whose research is focused on the goal of creating a general-purpose intelligent system. In contrast to traditional knowledge-based systems, AGI systems are designed with a general-purpose logic that is able to reason across multiple domains and deal with various kinds of uncertainty. A well-suited AGI project for reasoning in medicine is the Non-Axiomatic Reasoning System (NARS) [5, 6]. Among other things, NARS is scalable, supports learning, and provides support for the complete problem-solving process. A case study in medical diagnosis of NARS and its core logic, the Non-Axiomatic Logic (NAL), was conducted in 2011. That case study showed that NAL can be successfully applied in the medical domain [2]. NAL has also been used for analysis of reasoning applicable to Percutaneous Coronary Interventions [3]. In the domain of decision support, NAL has been used as a core logic for a conceptual framework with reasoning capabilities for crisis response decision-support systems [4]. The framework case study was in the domain of urban firefighting. Results of the study showed that their framework provided a promising approach for intelligent decision-support systems in crisis management.
In this paper we propose a conceptual framework for CDSS, with support for the complete diagnostic process, based on the use of text mining and AGI techniques. To the best of our knowledge, there haven't been any significant attempts at solving the aforementioned issues present in rule-based or machine-learning-based CDSS with an AGI approach, that is, with the use of a general-purpose intelligent system with reasoning and learning capabilities.
This paper is organized as follows. Section 2 contains a brief overview of NAL. Our conceptual framework is presented in Section 3. Section 4 presents an example NAL case study in medical diagnosis. Section 5 concludes the paper and provides directions for future work.

II. NON-AXIOMATIC LOGIC

NAL is the core logic of NARS, an AGI project. It is designed to be a uniform logical foundation for AGI, as well as an abstract description of the human thought process. It is constructed around the notion of insufficient
¹ When a knowledge-based system is implemented with production rules, it is called a rule-based system [10, 16].
knowledge and resources. That notion defines that the meaning of any term and the truth-value of any statement are determined by the evidence available to the system. By design, NAL can consistently represent and reason with different types of uncertainty.
NAL is organized into 9 levels. Each level extends the previous ones by introducing new concepts, grammar, and/or inference rules. A practical system which uses NAL doesn't need to implement all 9 levels to function; instead it can use only the levels needed for its specific reasoning tasks. Briefly, each level includes the following:
• NAL-1: Inference rules for inheritance.
• NAL-2: Similarity, instance, and property copulas.
• NAL-3: Compound terms.
• NAL-4: Arbitrary relations among terms.
• NAL-5: Higher-order statements.
• NAL-6: Variables.
• NAL-7: The concept of time.
• NAL-8: Support for operations, i.e. procedural statements.
• NAL-9: Self-control and self-monitoring.
NAL is a term logic; its sentences are given in the form "S r P (f, c)", where S is the subject, P is the predicate and r is the copula which represents the relation between the two terms. The most basic relation in NAL is inheritance, symbolized as "→". A statement S→P can be read as "S is a type of P"; for example, coldPatient→patient means that a coldPatient is a type of patient.
The truth-value (f, c) is a pair of real numbers, in the range [0, 1], defined by the evidence available to the system. The first number f is called frequency and its value represents the ratio between the positive evidence w+ and the total evidence w. The second number c is the confidence of the system, which describes how stable the frequency will be in light of new evidence. Confidence is defined as w/(w+k), where k is the constant amount of future evidence (k is usually set to the value of 1). For example, in the statement cold-patient→[cough] <0.79, 0.73> the value of 0.79 (frequency) can be read as a 79% chance of cough being a symptom of cold, and the value of 0.73 (confidence) means that the system is 73% confident that the value of the frequency won't be changed in the foreseeable future by new evidence.
The system provides various kinds of inference rules, such as deduction, induction, and abduction. Those inference rules can derive new knowledge from existing knowledge. Each of the rules has its truth-value function that calculates the truth-value of the conclusion according to the evidential support provided by the premises. NAL's knowledge base doesn't need to be consistent at all times, and its revision rule is used to handle inconsistencies. The revision rule unifies the same knowledge contained in two beliefs derived from disjoint evidential sets. Also, if the amount of statements becomes too large, the composition rule can be used to combine them and enable more efficient reasoning. A full description of the logic can be found in [5, 6].
In a practical scenario, the truth-value of the knowledge initially given to the system can be determined by the user. Additionally, if the knowledge comes from a statistical data source, then the evidence can be extracted from the given data, which in turn determines the truth-values. For example, a physician can, based on his experience, know that a certain disease presents with a certain symptom in 80% of the cases. That expert knowledge can be directly inputted into the system. On the other hand, if statistical data is available for a disease and its symptoms, the frequency and confidence values can be directly calculated by the given formulas.
It is important to note that NAL is not better than mathematical or probabilistic logic in the sense that it will provide more accurate inferences than them; the creator of NAL stated: "Whenever the knowledge/resource demands of a traditional model can be satisfied, it usually works better than NAL. When those demands cannot be satisfied, NAL works better than illegally applying a traditional model" [6].

III. PROPOSED FRAMEWORK

In order for a CDSS to properly support the complete diagnostic process, it needs to be able to gather information, recognize patterns, ask questions, make decisions under uncertainty, and so on; that is, it needs to have human-like reasoning capabilities [16, 17]. A system with human-like reasoning capabilities needs to be constructed around several key concepts, such as [4, 5, 6]:
• its inference engine needs to be constructed around a solid theoretical foundation that supports uncertain and inconsistent knowledge,
• the knowledge base must be scalable,
• it should provide real-time response,
• the system should support learning, and
• it should provide support for external processes (e.g. it should be able to pass relevant data from its knowledge base to an ANN and use the results that the ANN generates in its own reasoning).
The aforementioned concepts are usually attributed to AGI systems, and they can be used as a foundation for the construction of a CDSS. Use of AGI systems could decrease the cost and improve the reusability of the system in different domains. Therefore, we propose a framework that uses text mining techniques for knowledge-base maintenance/creation and a general-purpose logic for its reasoning with learning capabilities.
Our proposed framework is given in figure 1. The framework was designed with NAL as the core logic of the reasoning engine. The framework consists of three main components: the text mining subsystem, the knowledge base, and the reasoning engine.
Text mining in medicine is not a new concept. It has thus far been applied for named entity recognition, de-identification, etc. [14, 15]. In the aspect of CDSS, text mining can be applied to medical textbooks and electronic health records. Medical textbooks are thoroughly vetted sources of medical knowledge written by medical experts. With the use of text mining, the knowledge in medical textbooks can be translated into the knowledge base of a CDSS, which in turn can decrease
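The frequency and confidence definitions above (f = w+/w, c = w/(w+k)) and the revision rule can be sketched directly from those formulas. The function names below are illustrative, not taken from the NARS codebase; revision is modeled, per the description above, as pooling the evidence of two beliefs from disjoint evidential sets.

```python
K = 1.0  # constant amount of future evidence (k), usually set to 1

def truth_from_evidence(w_plus, w, k=K):
    """Compute (frequency, confidence) from positive and total evidence."""
    return w_plus / w, w / (w + k)

def evidence_from_truth(f, c, k=K):
    """Invert the definitions: recover (w+, w) from (f, c)."""
    w = k * c / (1.0 - c)
    return f * w, w

def revise(t1, t2, k=K):
    """Revision rule sketch: merge two beliefs about the same statement,
    derived from disjoint evidential sets, by adding their evidence."""
    w1p, w1 = evidence_from_truth(*t1, k)
    w2p, w2 = evidence_from_truth(*t2, k)
    return truth_from_evidence(w1p + w2p, w1 + w2, k)

# Example: 4 positive observations out of 5 give f = 0.8, c = 5/6.
f, c = truth_from_evidence(4, 5)
# Revising with an independent belief of equal strength keeps the
# frequency but raises the confidence, as more total evidence accrues.
f2, c2 = revise((f, c), (f, c))
```

This illustrates the physician example above: "symptom in 80% of cases" maps to f = 0.8, while c grows with the amount of supporting data.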
Abstract—Pulse diagnosis, the main diagnosis method in traditional Chinese medicine, is a non-invasive and convenient way to check health status. However, it takes many years to master pulse diagnosis and, currently, effective features of lung cancer patients' radial artery signal have not been established in clinical medicine. In our work, we analyze the characteristics of the single periods of the pulse signal and extract twelve features that are used to classify the pulse signals of healthy individuals and lung cancer patients using different classifiers. The results achieved by the proposed features are found to be up to 90.63% accurate.

Index Terms—radial artery pulse feature extraction; pulse signal classification; lung cancer detection

I. INTRODUCTION

Lung cancer is one of the major public health problems worldwide [1]. Pulse diagnosis is a safe and auxiliary way to detect lung cancer in clinical medicine [2]. Since pulse diagnosis heavily depends on physicians' experience, it takes a very long time and a lot of energy to master the technique of pulse diagnosis [3]. With the development of sensor technology, signal processing, data analysis techniques, and artificial intelligence, pulse-based diagnosis can be implemented by processing the radial artery pulse signal [4]-[5]. To popularize pulse diagnosis in clinical medicine, researchers have been trying to acquire and process the radial artery pulse signal using sensor technology, machine learning, etc. [6]-[7]. Yong Jun et al. [8] designed a sensor from a resonator to acquire the radial artery pulse signal based on the reflection coefficient; the sensor can be embedded in a wearable communication device. According to their experimental results, the acquired signal reflects useful artery information and meets the requirements of signal analysis for clinical purposes. The authors in [9] analyzed three types of radial artery sensors: pressure, photoelectric, and ultrasonic sensors, according to physical quantity, relationships, and sensitivity factors. They concluded that the pressure sensor is more suitable for disease diagnosis, as it can capture changes in the elastic property and thickness of the vessel wall. According to pulse diagnosis theory [10], lung cancer patients' vessel wall is different from that of healthy individuals; thus, the pulse signal acquired from the pressure sensor in our experiment carries useful information for detecting lung cancer.
Machine learning is a popular and effective way to analyze the radial artery pulse signal and has been widely applied to it [13]-[15]. In [11], the author extracted spectral features from the radial artery pulse signal and adopted a support vector machine to classify individuals' pre-meal and post-meal signals. In D. Rangaprakash's experiment [12], the pulse signal was taken from the volunteers before and after exercise. Seven spatial features extracted from the pulse wave are used to classify the signals into these two groups using a Recursive Cluster Elimination based Support Vector Machine. Their work proved that the features obtained from the radial artery pulse signal can be used to detect the physiological state of individuals.
In [13], the authors utilize a convolutional neural network to classify twelve pulse waveforms: long pulse, feeble pulse, stringy pulse, thready pulse, deep pulse, rapid pulse, hesitant pulse, soft pulse, short pulse, slippery pulse, flood pulse, and faint pulse. Half of the dataset, with 200 samples, was selected to train the classifier without feature extraction, and the classification accuracy was found to be 93.49%. However, these pulse waveform classification results cannot be applied in clinical medicine directly.
Wai Hei Chow defined the Doppler parameters to be the disease-sensitive features, and applied a Support Vector Machine to distinguish between acute appendicitis patients and healthy individuals [14]. Shujie Gong designed a wrist-pulse sensing and analyzing system to recognize cirrhosis patients with an accuracy of 87.09% [15]. Different from their work, our research aims to identify lung cancer patients among healthy individuals.
Currently, in radial artery signal processing research, only a few works are directly related to the detection of a specific disease. To the best of our knowledge, no related work uses the radial artery pulse signal to identify lung cancer patients. This work analyzes the radial artery pulse signal to standardize the features for identification of lung cancer patients' radial artery pulse signals.
The rest of our paper is organized as follows: Section II briefly explains the data acquisition and preprocessing. The proposed features for lung cancer detection are presented in Section III. Section IV discusses the experiment results, and Section V concludes the paper.

II. DATA ACQUISITION AND PREPROCESSING

The dataset is acquired from the Shandong Academy of Chinese Medicine. Fig. 1(a) shows the pulse signal acquisition device. The sampling rate is 800 Hz. The raw continuous signal is collected from the Cun, Guan and Chi wrist locations of 20 healthy individuals and 16 lung cancer patients. Different levels of pressure are imposed on the radial artery by moving the robotic arm when acquiring the radial pulse signal. An example of the raw continuous signal is shown in Fig. 1(b), and Fig. 2(a) is the enlarged local signal.
Figure 3. (a) Examples of the invalid periods; (b) examples of the valid periods
divided into 5 parts (a1, a2, a3, b1, b2) in the time domain, as shown in Fig. 4.
The features are extracted from these different parts. Then we apply Principal Component Analysis (PCA) to find twelve features, which are summarized in Table 1. The twelve features are used to train the classifiers and identify the lung cancer patients' signals.
The feature pressure stands for the different pressure imposed on the volunteers' radial artery. When changing the imposed pressure, the baseline of the signal changes as well; thus, we calculate the baseline's average value (ave_single) as a feature. The period's length (len_period) is given in seconds, and the average values of a1, a2, a3, b1, and b2 are also extracted as features. A lung cancer patient's cardiac rapid ejection period is longer than that of a healthy individual, as shown in Fig. 4. Accordingly, we extract the index of the maximum amplitude (pmax_index). In addition, a lung cancer patient's heart slow-firing blood period differs in shape from that of a healthy individual. The feature min_psub is the distance between the peaks in the heart slow-firing blood period. Since the radial artery signal is acquired from three probes on Cun, Guan and Chi, the location is also taken as a feature.

Figure 4. The single-period radial artery signals: (a) signal of a healthy individual; (b) signal of a lung cancer patient

TABLE 1. DEFINITIONS OF FEATURES
pressure: depth of probe descent
ave_single: average value of the corresponding baseline
len_period: length of the period
pmax_index: index of the maximum
num_09: number of points that reach at least 0.9 of the maximum-minimum range
ave_a1: average value of the signal in the a1 clip
ave_a2: average value of the signal in the a2 clip
ave_a3: average value of the signal in the a3 clip
ave_b1: average value of the signal in the b1 clip
ave_b2: average value of the signal in the b2 clip
min_psub: distance between the peaks in the heart slow-firing blood period
location: the acquiring location within Cun, Guan, Chi

IV. CLASSIFICATION

The training set is made of the signals of 2 healthy individuals and 2 lung cancer patients; the rest of the data is the testing set. After extraction of the aforementioned features, we use the following classifiers to classify the periods of healthy individuals and lung cancer patients: Linear SVM (support vector machine), Coarse Gaussian SVM, Fine KNN (k nearest neighbors), Cosine KNN, Subspace discriminant, Subspace KNN, Simple Tree and Quadratic discriminant. Each individual's classification results are counted to calculate the lung cancer case rate. In this experiment, an individual for whom lung cancer periods account for 50% or more is diagnosed as a lung cancer patient. The diagnosis results are shown in Table 2.
As shown in Table 2, the support vector machine with linear kernel achieves 78.13% accuracy, whereas the SVM with Gaussian kernel and coarse distinctions is found to be 71.88% accurate. Compared with SVM, k-nearest neighbors achieves better accuracy, above 80%. Fine KNN sets finely detailed distinctions between classes with 1 neighbor. Cosine KNN uses medium distinctions between classes with a cosine distance metric, and the number of neighbors is set to 10. Subspace KNN adopts subspaces with nearest-neighbor learners. Subspace discriminant uses subspaces with discriminant learners. Quadratic discriminant is the classifier that achieves the highest accuracy of 90.63%.

TABLE 2. DIAGNOSIS RESULT
Linear SVM: 78.13%
Coarse Gaussian SVM: 71.88%
Fine KNN: 84.38%
Cosine KNN: 87.50%
Subspace discriminant: 75.00%
Subspace KNN: 87.25%
Simple tree: 65.63%
Quadratic discriminant: 90.63%
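A few of the single-period features defined in Table 1 can be sketched as follows. This is an illustrative reading, assuming a 1-D period sampled at 800 Hz; the paper's exact part boundaries (a1-b2) and num_09 threshold convention are not reproduced, so num_09 is interpreted here as the number of samples reaching at least 0.9 of the max-min range.

```python
import numpy as np

FS = 800  # sampling rate in Hz, as stated in Section II

def period_features(period, fs=FS):
    """Sketch of a few Table 1 features for one single-period signal."""
    period = np.asarray(period, float)
    len_period = period.size / fs        # length of the period, in seconds
    pmax_index = int(period.argmax())    # index of the maximum amplitude
    # num_09: samples with (value - min) >= 0.9 * (max - min), an assumed
    # interpretation of the Table 1 definition.
    span = period.max() - period.min()
    num_09 = int(((period - period.min()) >= 0.9 * span).sum())
    ave_single = float(period.mean())    # baseline average value
    return {"len_period": len_period, "pmax_index": pmax_index,
            "num_09": num_09, "ave_single": ave_single}
```

Feature vectors built this way (one per valid period, plus pressure and location labels) are what the classifiers in Table 2 would consume.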
mare.simeonov@gmail.com
nikola.korunovic@masfak.ni.ac.rs
miroslav.trajanovic@masfak.ni.ac.rs
manfred.zehn@tu-berlin.de
mitkovic@gmail.com
Abstract—Selfdynamisable internal fixator (SIF) is a type of medical device used in internal fixation of long bones. Occasional SIF failure requires an extra surgery, which causes additional trauma for the patient. To minimize the risk of failure, a methodology was established for optimization of the SIF structure and position. It is based on sensitivity studies or structural optimization procedures, performed using the finite element method (FEM). Automatic creation of FEM models, for any combination of values of dimensional and positional parameters, is enabled through the creation of a robust and flexible CAD model and the establishment of bi-directional associativity between the CAD and FEM models. This paper focuses on the results of the sensitivity study, performed using the defined methodology.

I. INTRODUCTION

Fractures of long bones typically occur as a consequence of traffic accidents or extreme sports injuries and usually require stabilization by surgery. The selfdynamisable internal fixator (SIF), invented by prof. Mitković, is a type of medical device used in internal fixation of long bones [1]. The structure of the SIF is modular (Fig. 1). One of the modules, the trochanteric unit with bar, is produced in four different sizes. The number and mutual position of clamps can also be changed, depending on the application.

Figure 1. SIF, configured for subtrochanteric femoral fracture treatment

The selfdynamisable internal fixator is designed to allow for a change of the axial position of the fixated bone parts, which means that the bone parts can get closer over time. In this way, there is a greater probability of healing success, so an extra surgery, which represents additional discomfort to the patient, is usually avoided [2]. Occasionally, however, a failure occurs in SIF components during the healing process. In those cases an extra surgery is still required. It is highly desirable to minimize the number of such occurrences.

Failure of SIF components is related to their stress state during the healing process. For each specific patient and type of femoral fracture there exists an optimal configuration and position of SIF components that minimizes component stresses and thus the possibility of failure. In that manner, a research goal was set which aimed to establish a methodology for optimization of the SIF configuration and position for any subject-specific case of femoral fracture. At the same time, the whole optimization process was foreseen to be as robust and fast as possible so that it may be used in everyday medical practice.

II. METHODOLOGY

The defined methodology for optimization of the SIF configuration and position consists of four steps, explained below.

1. In the first step, the subject-specific CAD model of the femur is created. The creation process consists of two stages: the creation of a polygonal model of the femur and the creation of a CAD model of the femur. The polygonal model is obtained as a result of a pre-defined procedure, which starts with CT scanning of the patient's lower extremities. The CAD model is then created from the medical images, in a process that includes the extraction of the zone of interest, point cloud creation, triangulation, surface fitting and creation of solid features. In order to obtain the CAD model of the femur, it is necessary to remove the segments of the polygonal model that do not actually belong to the femoral bone (parts of the vascular system visible due to insertion of contrast media, or surrounding bones). Then the interior of the model is cleaned, whereby the excessive geometry that mainly originates from the trabecular bone structure is removed. After the polygonal model is healed and, if required, its smoothness is increased, a closed NURBS surface that represents the outer envelope of the model can be created using surface fitting procedures. By filling the surface with solid geometry, the CAD model of the femur is created [3], [4].
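The removal of mesh segments that do not belong to the femur can be automated by keeping only the largest connected component of the triangulation. A minimal sketch of that idea, assuming the polygonal model is given as a plain list of vertex-index triangles (the function and data layout are illustrative, not the authors' CATIA workflow):

```python
from collections import Counter


def largest_component(triangles):
    """Keep only the triangles of the largest connected mesh component.

    `triangles` is a list of (v0, v1, v2) vertex-index tuples; connectivity
    is through shared vertices. This mimics the cleanup step in which
    vascular structures and surrounding bones are removed from the scan.
    """
    parent = {}

    def find(v):
        while parent.setdefault(v, v) != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for v0, v1, v2 in triangles:
        union(v0, v1)
        union(v1, v2)

    # Count triangles per component and keep the biggest one (the femur).
    sizes = Counter(find(t[0]) for t in triangles)
    femur_root = max(sizes, key=sizes.get)
    return [t for t in triangles if find(t[0]) == femur_root]
```

In practice the same idea is applied twice: once to discard disconnected outside debris, and once (on the inverted selection) to clean the trabecular geometry inside the envelope.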
7th International Conference on Information Society and Technology ICIST 2017
Abstract—Conjugate to Speech Science are Physiology, Anatomy, Linguistics, Physics, Computer Science, Psychology and so on, which provide tools and methodologies that can decipher the key elements of phonological activity in depth. Additionally, sonic perception provides, in synaesthetic terms, an acoustic augmentation due to the complex activity of tuned voicing with articulation-related physiological and neuronal dipoles. All these features affect the oral and aural communication channel and enhance the understanding of human communication in terms of Disordered Voicing, Vocal Affections, Music Experience and Human Attitudes.

Keywords—Phonation, Singing, Oral and Aural Communication, Neurophysiology, Anatomy, Synaesthesia.

I. INTRODUCTION

Human communication relies heavily on how we perceive the acoustic signal. In the multimedia world we live in, in sensory terms, 89% of input is ascribed to vision, 10% to hearing, and 1% to all the other senses. However, in the practice of everyday communication, people utilize hearing, while assigned to some other tasks, at a rate significantly higher than the 10% that is credited to sonic stimuli. To give some examples, a listener receives a multitude of aural signals that trigger the sensory mechanisms for hearing when talking on his phone, when listening to music in his car, or when harking at the discussions of his professional environment. Although he cannot see the sources of phonation, he may well imagine how the person singing or talking may look in anatomical and neurophysiological terms.

Literally, synaesthetic perception promotes a union of senses: it is a neurological oddity in which a sensation in one of the senses involuntarily and automatically triggers a parallel sensory or cognitive pathway [1]. It is a focal point of recent research in acoustics, relating chromatic reconstruction with musical impression.

Even further, as far as sensory augmentation and substitution are concerned, in the last few years scientists have been arguing that humans work with a lot more than five senses, that they have some tens of emotions, and that cerebral activity continuously shapes impressions of the world we live in [2].

This research explores the major factors that enable this synaesthetic augmentation. Neurophysiological studies provide complementary intelligence about the way that acoustic evidence indicates how speech signal variability relates with multimodal, kinetic and predominantly sparkling bodily sensations [3][4].

Music seems to be the most prominent triggering mechanism in terms of form, harmony and emotional arousal. It seems to offer a global substrate for the interconnection of oral and aural communication. Nevertheless, it is evident that speech communication plays a significant role as well, when the multitude of spoken languages (some 7,200 worldwide) is taken into account [4].

Given that this research is very new, its line of investigation needs solid foundations. For this reason, it relies heavily on well-attested findings from the diachrony of music research. For instance, in 1993 James [5] stated: "Music contains in its essence a mystery: everyone agrees that it communicates, but how?"

Music cognition has therefore emerged over the last 40 years as an interdisciplinary field that brings together scientific fields like cognitive psychology, artificial intelligence, neuroscience and linguistics [6]. As the focus of popular music shifts from instrumental to vocal renditions, cognitive processes may be described more easily in neurophysiological terms, although contemporary music is relentlessly genre-crossing, combining elements of free jazz, rock, pop, dance music, ballads, etc. Under this prism, we may understand the purpose of music as an interaction with an intellectual system that aims to "engage, inspire, and enthrall a group of musicians into doing music that they are excited about, so that that excitement is passed on to the audience" [7].

Therefore, although in neurophysiological terms speech communication is the medium for delivering the message, when music is involved our intellect forms more complex dialects, as has been discovered thus far in studies of the Brain-Computer Interface performance of post-deaf subjects rehabilitated with Cochlear Implants [8]. So, apart from the lexical semantics of human communication, the "new music" revolution, as it is abundantly received via music for films, radio emissions and televised live concerts, contributes many other sensations to the everyday sonic synaesthetic potential of the human oral-aural communication channel [9].
Figure 2. Singer V.'s (female) perceived characteristics

lifted, and listeners augment their synaesthetic awareness to more or less the same extent on a trans-national basis. Nonetheless, it is the first time that a synaesthetic association is attempted between the sounds we hear and the neurophysiological sketch-up of the source, as far as bodily construction is involved.

III. METHODOLOGY AND RESULTS

As previously stated, some 10 semi-professional singers were used, both male and female, singing various songs. Their mother tongue is Greek. Six very characteristic recordings were chosen, some in English and some in Greek.

The evaluators tried to figure out the anatomical, neurophysiological and temperamental build-up of the singers. They were even asked to describe how they imagine the upper-part facial characteristics that depend on the nasal cavity with the nostrils. Indeed, this part plays a significant role in giving personified "images" of speech perceptual characteristics based on the articulatory and phonological characteristics [4].

The survey was conducted in January 2017. The evaluators were a pool of 32 students aged 20-25. Students participated anonymously, voluntarily and individually. They submitted their anonymously processed reports electronically.

The role of their tutor was to give initial instructions and ensure testing conditions. Testing was administered via the Aristotle University of Thessaloniki IT Center services.

The students that participated in the survey augmented their music perception by participating in a serious game for musical intelligence. They expanded their personal experience from singing and listening to songs with deduced reasoning and created a virtual profile of the singer. All the test-bed songs were recorded in a studio. They range from pop-style ballads up to some kind of hard rock renditions. This paper is the first step in a research effort that is bound to reveal a wealth of new bodily and psychokinetic features that affect the singing potential.

IV. RESULTS AND DISCUSSION

In order to improve the transparency of the results, we will discuss them for each singer individually.

Singer V. (female)

The estimated age of singer V. was 31.4 years (SD = 2.9), which is higher than the actual age (28). Students estimated that the singer was 167.7 cm tall (SD = 4.7), which is close to the actual height (169 cm), but highly overestimated her weight (M = 73.3; SD = 7.1), which was actually 51 kg. An independent-samples t-test was conducted to examine whether there was a significant gender difference between students' estimations of the singer's age, height and weight. The test revealed no statistically significant differences between male and female students (p = .544, p = .450 and p = .560 respectively).

The t-test was also conducted to examine whether there was a significant gender difference between the young evaluators in their estimation of the perceived singer's hair, eyes and mouth. A difference was observed between male and female students in perceived singer hair length (t = 1.08, df = 11.4, p = .304), with males (M = 1.95, SD = 0.22) perceiving longer hair than females (M = 1.80, SD = 0.42). In total, 53% of students correctly perceived brown hair color and 81% correctly perceived brown eye color. Most of the students imagined the singer having a medium-size mouth (53%) with normal lips (59%).
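The fractional degrees of freedom reported above (e.g. df = 11.4) indicate a Welch-type independent-samples t-test, which does not assume equal group variances. A minimal sketch of that statistic (illustrative only; the paper does not state which software computed the tests):

```python
import math


def welch_t(a, b):
    """Welch's independent-samples t-test: returns (t, df).

    Unlike Student's t-test, Welch's version does not assume equal
    variances, which is why it yields fractional degrees of freedom
    such as df = 11.4.
    """
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    # Unbiased sample variances.
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (ma - mb) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df
```

For two identical samples the statistic is 0 and the degrees of freedom reduce to na + nb − 2, the classical pooled value.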
Figure 3. Singer G.'s (male) perceived characteristics

Figure 4. Singer S.'s (female) perceived characteristics

When students were asked to determine V.'s mood,
59% perceived it correctly (calm/relaxed), but the rest were not far off, as they described it as melancholic and pleasant. The results are presented in Figure 2, left.

V. has a calm, demanding, sweet-strenuous (loud) and tender voice. V.'s voice and volume as perceived by students are presented in Figure 2, center and right.

A multiple linear regression was calculated to predict the singer's voice profile (voice, style, volume) based on student gender, but no significant regression equation was found (p = .410, p = .319 and p = .642 respectively).

Singer G. (male)

The estimated age of singer G. was 27.5 years (SD = 3.2), which is higher than the actual age (21). Students estimated that G. was 172.2 cm tall (SD = 4.1), which underestimated the actual height (183 cm). Similarly, they underestimated his weight (M = 77.5; SD = 4.1), which was actually 90 kg. The t-test revealed no statistically significant differences between male and female students' estimations of the singer's age, height and weight (p = .604, p = .855 and p = .216 respectively).

Students mostly perceived the singer with long hair (90%), and 87% imagined brown. A statistically significant difference was revealed between male and female students in perceived singer eye color (t = -2.50, df = 20, p = .021). Males (M = 2.52, SD = 0.87) perceived green eye color significantly more often than did females (M = 3.00, SD = 0.00). Each female student perceived G. with brown eyes, which was correct. Most of the students imagined that the singer has a big mouth (55%) with normal lips (78%).

When students were asked to determine G.'s mood, the results were colorful, as presented in Figure 3, left. This is due to the fact that G. has a very creative character.

G. has a special voice quality (timbre). His voice and volume as perceived by students are presented in Figure 3, center and right.

A multiple linear regression was calculated to predict the singer's voice profile (voice, style, volume) based on student gender, but no significant regression equation was found (p = .408, p = .399 and p = .317 respectively).

Singer S. (female)

The estimated age of singer S. was 29.5 years (SD = 2.9), which is much higher than the actual age (20). Students estimated that the singer was 169.1 cm tall (SD = 4.7), which is very close to the actual height (165 cm), but extremely overestimated her weight (M = 78.0; SD = 5.1), which was actually 42 kg. An independent-samples t-test was conducted to examine whether there was a significant gender difference between students' estimations of S.'s age, height and weight. The test revealed no statistically significant differences between male and female students (p = .975, p = .988 and p = .854 respectively).

Students mostly perceived the singer with long hair (91%), but only 34% correctly imagined brown (more imagined black). Most students perceived the singer with brown eyes (91%). Most of the students imagined the singer having a medium-size mouth with normal lips (53%).

When students were asked to determine S.'s mood, 44% perceived it incorrectly as melancholic. S. is passionate, but students did not perceive it. The results are presented in Figure 4, left.

S. has a passionate and sweet voice. S.'s voice and volume as perceived by students are presented in Figure 4, center and right.

A multiple linear regression was calculated to predict S.'s voice profile (voice and volume) based on student gender, but no significant regression equation was found (p = .186 and p = .230 respectively). However, when the same linear regression was calculated to predict S.'s voice style based on student gender, a significant

Figure 5. Singer A.'s (male) perceived characteristics
Figure 6. Singer T.'s (female) perceived characteristics

regression equation was found [F(1, 29) = 8.5; p = .007; R² = .228]. The results show that student gender [β = -1.1; t(29) = -2.9; p = .007] significantly predicted perceived singing style.
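With a single binary predictor such as student gender, the regression reported above reduces to ordinary least squares with one dummy variable, for which F = t². A minimal sketch under that assumption (the 0/1 coding of gender and the variable names are illustrative, not taken from the paper):

```python
import math


def ols_binary(x, y):
    """OLS of y on a single predictor x; returns (slope, t, F).

    With x coded 0/1 for gender, the slope is the difference between
    group means, and the F-statistic of the one-predictor model equals
    t**2, matching reports like F(1, 29) = 8.5.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual variance with n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)
    t = slope / math.sqrt(s2 / sxx)
    return slope, t, t * t  # F for one predictor equals t squared
```

For two groups of three ratings with means 2 and 4, for example, the slope is exactly the mean difference 2.0.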
Singer A. (male)

The estimated age of singer A. was 29.7 years (SD = 2.6), which is a bit lower than the actual age (34). Students estimated that the singer was 174.6 cm tall (SD = 4.6), which matches the actual height (175 cm), but underestimated his weight (M = 79.0; SD = 5.5), which was actually 85 kg. An independent-samples t-test was conducted to examine whether there was a significant gender difference between students' estimations of the singer's age, height and weight. The test revealed no statistically significant differences between male and female students (p = .714, p = .112 and p = .440 respectively).

Students mostly perceived A. with short hair (59%), and 44% correctly imagined brown. A difference was observed between male and female students in perceived singer eye color (t = 0.90, df = 11.8, p = .387), with males (M = 2.86, SD = 0.48) correctly perceiving brown eye color more often than females (M = 2.60, SD = 0.84). Students mostly imagined A. with a big mouth and normal lips (53%).

When students were asked to determine A.'s mood, most incorrectly perceived him as wiry (59%), while 16% perceived him as serious. This may be a consequence of the fact that A. has a very strong personality, so students wrongly perceived him as nervous. The results are presented in Figure 5, left.

A. has a raw and rough voice. His voice and volume as perceived by students are presented in Figure 5, center and right.

A multiple linear regression was calculated to predict the singer's voice profile (voice, style, volume) based on student gender, but no significant regression equation was found (p = .738, p = .440 and p = .093 respectively).

Singer T. (female)

The estimated age of singer T. was 28.2 years (SD = 3.5), which is almost the actual age (27). Students estimated that the singer was 169.5 cm tall (SD = 4.1), which overestimated the actual height (160 cm), and highly overestimated her weight (M = 78.3; SD = 6.5), which was actually 50 kg. An independent-samples t-test was conducted to examine whether there was a significant gender difference between students' estimations of the singer's age, height and weight. The test revealed no statistically significant differences between male and female students (p = .898, p = .428 and p = .673 respectively).

Most students perceived T. with long hair (78%), and 50% correctly imagined brown color. About 50% of the students incorrectly perceived T. with blue/green eyes, as she has brown eyes. Most of the students imagined the singer having a medium-size mouth (38%) with normal lips (69%).

When students were asked to determine T.'s mood, only 12% perceived it correctly (lively). Most of the students incorrectly perceived her mood as relaxed/melancholic (69%). The results are presented in Figure 6, left.

T. has an assertive, energetic and dynamic personality, with a thin, sinewy, shrilling voice. T.'s voice and volume as perceived by students are presented in Figure 6, center and right.

A multiple linear regression was calculated to predict the singer's voice and style based on student gender, but no significant regression equation was found (p = .359 and p = .490 respectively). However, when the same linear regression was calculated to predict T.'s voice volume based on student gender, a significant regression equation was found [F(1, 29) = 7.2; p = .012; R² = .199]. The results

Figure 7. Singer Y.'s (female) perceived characteristics
show that student gender [β = 0.79; t(29) = 2.7; p = .012] significantly predicted perceived singing volume.

Singer Y. (female)

The estimated age of singer Y. was 28.9 years (SD = 3.3), which is excellent, as it is actually 28. Students estimated that the singer was 170.1 cm tall (SD = 4.8), which is indeed the actual height (170 cm), and slightly overestimated her weight (M = 79.0; SD = 6.5), which is actually 75 kg. An independent-samples t-test was conducted to examine whether there was a significant gender difference between the students' estimations of the singer's age, height and weight. The test revealed no statistically significant differences between male and female students (p = .237, p = .091 and p = .577 respectively).

A difference was observed between male and female students in perceived singer hair length (t = 1.13, df = 14.4, p = .277), with males (M = 1.81, SD = 0.40) correctly perceiving long hair more often than females (M = 1.60, SD = 0.52). Only 37% of students correctly imagined brown hair color. Most of the students correctly perceived the singer with brown eyes (84%). About half of the students imagined the singer having a medium-size mouth with normal lips.

When students were asked to determine Y.'s mood, only 19% perceived it correctly (serious/wiry). Most of the students incorrectly imagined her as melancholic and relaxed, probably because she has a very sweet voice. The results are presented in Figure 7, left.

Y. has an assertive, energetic and dynamic personality, and at the same time a sweet, bass voice. Y.'s voice and volume as perceived by students are presented in Figure 7, center and right.

A multiple linear regression was calculated to predict the singer's voice profile (voice, style, volume) based on student gender, but no significant regression equation was found (p = .854, p = .929 and p = .229 respectively).

V. CONCLUSION

As a rough estimate, around 75% of evaluators may decode the singer's voice and connect it with the actual neurophysiological characteristics of his/her bodily construction and personality. Some characteristics, like the color of the eyes, are rather irrelevant to the singing capacity, but the evaluators prove to be good gamblers, obviously extrapolating data from their ethnological habitat.

Generally speaking, some parameters may be strongly dependent on ethnological, linguistic and cultural biases. For instance, in Serbia singers would have an increased percentage of blue eyes, while in Greece most of them have brown. Knowing the nationality of the singer strongly biases such a prediction.

In any case, the performer exhibits an exaggerated, augmented behavior and attempts to surpass the limits of his repertoire. Being well trained, he attempts to deliver a stable musical and artistic rendition. The performer tries to adapt himself to the singing role's prerequisites. Although some neurological factors, like mood and emotion, may be imitated, the singing voice nonetheless reveals a common denominator linked with bodily construction and the personal, cultural and psychological ego (Figure 8). These parameters can be more or less predicted.

Figure 8. Singers V. and T. in their everyday communication (a. and c.) and in their artistic transformation (b. and d.)

REFERENCES

[1] M. Bedny, A. Pascual-Leone, S. Dravida, and R. Saxe, "A sensitive period for language in the visual cortex: Distinct patterns of plasticity in congenitally versus late blind adults", Brain Lang., 2012, 122(3), pp. 162-170.
[2] S.J. Blakemore and S. Choudhury, "Development of the adolescent brain: implications for executive function and social cognition", J Child Psychol Psychiatry, 2006, 47(3-4), pp. 296-312.
[3] O. Bai, V. Rathi, P. Lin, D. Huang, H. Battapady, D. Y. Fei, L. Schneider, E. Houdayer, X. Chen and M. Hallett, "Prediction of human voluntary movement before it occurs", Clinical Neurophysiology, 2011, 122(2), pp. 364-372.
[4] D. Politis and M. Tsalighopoulos, "Oral and Aural Communication Interconnection: The Substrate for Global Musicality", in D. Politis, M. Tsalighopoulos, I. Iglezakis (Eds), Digital Tools for Computer Music Production and Distribution, IGI Global, Hershey, PA, USA, 2016, pp. 1-24.
[5] J. James, The Music of the Spheres - Music, Science and the Natural Order of the Universe, Grove Press, USA, 1993.
[6] D. Temperley, The Cognition of Basic Musical Structures, The MIT Press, Boston, MA, USA, 2001.
[7] J. Zorn, "The Game Pieces", in C. Cox and D. Warner (Eds), Audio Culture - Readings in Modern Music, The Continuum International Publishing Group, New York, NY, 2007.
[8] D. Politis, M. Tsalighopoulos, and G. Kyriafinis, "Dialectic & Reconstructive Musicality: Stressing the Brain-Computer Interface", in Proceedings of the 2014 International Conference on Interactive Mobile Communication Technologies and Learning (IMCL2014), 13-14 November 2014, Thessaloniki, Greece.
[9] D. Deutsch, The Psychology of Music, 2nd Ed., Academic Press, USA, 1999.
[10] J. Sundberg, The Science of the Singing Voice, Northern Illinois University Press, USA, 1987.
[11] J. Iwarsson, M. Thomasson and J. Sundberg, "Lung Volume and Phonation: A Methodological Study", Logopedics Phoniatrics Vocology, 1996, 21(1), pp. 13-20.
[12] L. Raphael, G. Borden, and K. Harris, Speech Science Primer - Physiology, Acoustics and Perception of Speech, Lippincott, Williams & Wilkins, USA, 2007.
Abstract—Due to its position and anatomy, the human mandible is sensitive to trauma. Fractures of the mandible differ by type, origin (e.g. injuries, tumors) and localization (e.g. the collum, the ramus, the angle of the mandible, the body of the mandible and the alveolar part). Various types of implants are used for the fixation of human bone fractures. Anatomically correct and geometrically accurate personalized implants are necessary in order to improve the quality and duration of the intervention, and the postoperative recovery of patients. The aim of this research is to create a geometrical model of a personalized plate-type implant whose geometry and topology fully match the shape of the patient's bone. The side of the implant that is in contact with the periosteum, the outer layer of the mandible, is aligned with the shape of the mandible's outer surface near the fracture. The obtained model can be used for production of plate implants, and/or for simulation of orthodontic interventions.

I. INTRODUCTION

The human mandible (lat. mandibula) is the most prominent and the only movable facial skeleton bone, and the one most sensitive to trauma. Fractures of the mandible are frequent, difficult and complicated, incurred as a result of illness and injury. In treating fractures, doctors bear a great responsibility to re-establish the normal function of the bone tissue and return the aesthetic appearance of the injured to its previous state. Treatment of mandible fractures depends on the patient's age, the type of fracture (single, double, open, closed), the loss of bone tissue, the anatomical localization (the body of the mandible, ramus ...), the clinical experience of the surgeon and the patient's general condition [1].

Maxillofacial surgeons apply internal and external fixation for the treatment of fractures of the mandible. Internal fixation is a surgical technique used to support the treatment of a bone fracture using screws, pins and plate implants within the human body. External fixation is used to stabilize bone fragments outside of the human body; with this type of fixation, only pins and screws are placed within the human body.

Attempts to successfully treat bone fractures have existed since ancient times. Advances in the understanding of anatomy and histology influenced the development and progress of various techniques of bone fracture treatment. The main principles that must be met during surgical interventions for fixing broken bones are fixing the broken fragments in anatomical position, stable functional fixation, atraumatic surgical technique and active function.

Treatments of mandible fractures are changing from decade to decade, but there are still controversies regarding the optimal treatment. Rigid internal fixation can be achieved by a variety of plating systems. Types of plates include: mandible plates 2.0, locking plates 2.0, (locking) reconstruction plates, dynamic compression plates and universal fracture plates [2]. The mentioned types of plate differ in shape, length, thickness, the use of screws (standard, locking head, Bi-Cortical), as well as in biomechanical stability.

Mandibular condylar fractures are among the most common injuries of the facial skeleton. In the literature, an injury of the human condylar neck is usually treated with a single mini dynamic compression plate or with double plates. The use of a single plate for fixation of fractures of the condylar neck is the most common, although authors point to complications concerning fracture plate or screw loosening [3, 4].

The analysis of retrospective studies on the treatment of patients with fractures of the condylar neck with single and double plates indicates that the use of a double plate provides greater stability and fewer complications than a single plate [5].

By comparing three different fixation techniques for the mandible condyle, the authors confirmed that the technique using two plates provides a stable and functional fixation of the human mandible condyle [6]: "This is primarily due to the ability to neutralize the functional stress imposed on the neck of the mandibular condyle through the arrangement of the 2 plates."

Very often the use of standard pre-identified implants can lead to intra-operative and post-operative complications in the patient. Complications occur as a result of differences between the size and shape of the patient's bone and the implant plate. A poorly positioned implant plate causes an inadequate transfer of load during the healing process of the bones, which makes the treatment more difficult. Another disadvantage of this method of fixation is the duration of the intervention: the maxillofacial surgeon devotes a lot of time during the intervention to shaping and bending a standard plate to adjust it to the patient.

In order to lessen the above-mentioned complications, the authors suggest the application of personalized implants. Personalized implants allow for proper
stabilization of broken fragments, and adapt better to oblique fractures. The geometry and shape of the personalized implant are adapted to the anatomy and morphology of a certain patient [7].

According to statistics, about 36% of mandible fractures occur in the condylar region, which represents the highest incidence of all mandibular fractures (according to Dingman and Natvig). For this reason, the main focus of this research is creating implants for fixation of the human mandible condylar process, which varies in shape and size and needs to be adjusted to the patient's specific needs. The quality of the geometric model of the implant depends on the 3D surface model of the mandible. The side of the implant that is in contact with the periosteum, the outer layer of the mandible, is aligned with the shape of the mandible's outer surface near the fracture. In this way, the geometry of the implant corresponds to the patient's bone. The surface model of the mandible was created in the CAD software CATIA using reverse engineering techniques. The geometrical accuracy of the obtained surface model of the plate implant was tested by the application of deviation analysis in the CATIA software.
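Deviation analysis of this kind essentially measures, for each point of the implant's contact surface, its distance to the bone surface. A minimal brute-force sketch of the idea, assuming both surfaces are sampled as 3D point sets (this stands in for the CATIA deviation analysis, whose internals the paper does not describe):

```python
import math


def surface_deviation(implant_pts, bone_pts):
    """For each implant contact point, distance to the nearest bone point.

    Returns (max_deviation, mean_deviation). A brute-force stand-in for a
    CAD deviation analysis: small values mean the plate closely follows
    the outer surface of the mandible.
    """
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    deviations = [min(dist(p, q) for q in bone_pts) for p in implant_pts]
    return max(deviations), sum(deviations) / len(deviations)
```

A production tool would use a spatial index (k-d tree) and true point-to-triangle distances instead of this quadratic point-to-point search, but the reported quantities are the same.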
II. DESIGN OF CUSTOMIZED PLATE-TYPE IMPLANTS FOR MANDIBLE FIXATION

The steps used to create the volumetric geometrical model of the plate implant are [8]:
1) CT scan of the specific patient's mandible
2) Creating a 3D surface model of the mandible
3) Creating a model of the fracture
4) Selecting the locations on the mandible where the implant will be placed
5) Adjusting the geometry of the plate according to the requirements of the surgeon
6) Creating a customized 3D model of the plate
7) Analysis and optimization of the shape and dimensions of the plate.

Geometry analysis of the human mandible was performed on CT scans. A 64-slice CT scanner (MSCT, Aquilion 64, Toshiba, Japan) was used, and a standard recording protocol was applied: voltage of 120 kVp, tube current of 150 mA, rotation time of 0.5 s, and slice thickness of 0.5 mm. The mandible sample came from a Serbian adult male aged 45 years. The raw data, coordinates of points of the scanned tissue, were imported into appropriate CAD software for reverse modeling; in this research, the CAD software CATIA V5 R21 was used. The scanned mandible was without fracture, and a model of a fracture defined by the standard classification [2] was added to the mandible model in a later stage of the reconstruction process.

A. Creating the 3D surface model of the human mandible

The 3D surface model of the human mandible is created in CATIA, using the Method of Anatomical Features (MAF) [9]. The steps used to create the surface model of the mandible are [10]:
1) Creation of the anatomical model
2) Preparatory processes (importing and sorting out the cloud of points, creating a polygonal model)
3) Definition of the Referential Geometrical Entities (RGE) of the human mandible
4) Definition of the anatomical points and creation of the spline curves
5) Creation of the surface model of the human mandible.

Creation of the anatomical model

The anatomical model "describes" the relations between anatomical landmarks (which can be taken from the medical literature) and their position on the polygonal model [8].

Preparatory processes

The preparatory processes include the following: CT scanning of the human mandible, preprocessing of the raw data (scans), their transformation into STL (STereoLithography) format, and importing the scanned models in STL format into the CATIA application. At the end of the preparatory processes, the polygonal geometrical model of the human mandible is created.

Definition of the Referential Geometrical Entities (RGE) of the human mandible

MAF is based on Referential Geometrical Entities (RGEs). RGEs (lines, planes, curves, points, axes) are defined on the polygonal model, according to the bone geometry and morphology [11].

Every bone in the human skeletal system has distinctive features that describe it. On the human mandible, the following characteristic anatomical points can be extracted and defined: Mental Foramen, Gnathion, Gonion, Condylion and Mandibulare cut [12]. In the same paper, the authors stated that the configuration of the mandible can be accurately perceived by means of ten (10) basic central and bilateral morphometric parameters. Morphometric parameters are measurable dimensions, determined on the basis of the position of the characteristic anatomical points. The definition of the mandibular anatomical landmarks and morphometric parameters was performed on a polygonal model of the human mandible.

The first step in the definition of the geometrical model is the creation of the coordinate system which is used for identification of the RGEs on the polygonal model. The origin of the coordinate system was positioned at the middle point of the distance between the most lateral points on the two condyles. The constructed planes of the coordinate system of the mandible are: mediosagittal, mandibular and coronal. A detailed description of the constructed planes of the coordinate system is given in [13].

Definition of the anatomical points and creation of the spline curves

The spline curves (geometrical entities) were defined following the bone geometry and its specific morphology. Spline curves were created by the cross-section of the adequate planes and polygonal models. Based on the
7th International Conference on Information Society and Technology ICIST 2017
plane. The process of creating the 3D volume model of the personalized plate implant is shown in Fig. 6.

Figure 6. Process of creating the 3D volume model of the personalized plate implant

Figure 8. Maximum deviations of the surface model of the personalized plate implant from the input human mandible model

The presented method has enabled the creation of a 3D personalized plate implant for fracture fixation of the human condyles, which can be used for further production. Figure 9 shows the printed model of the personalized implant plate.

ACKNOWLEDGMENT
The paper presents results of the project III 41017 Virtual Human Osteoarticular System and its Application in Preclinical and Clinical Practice, sponsored by the Ministry of Education, Science and Technological Development of the Republic of Serbia for the period 2011-2017.
Abstract—The dynamic nature and structural heterogeneity of proteins are essential for their functions in living organisms. More and more researchers are paying attention to learning the structure of proteins, so obtaining the protein image has become a meaningful problem. During cryo-EM image acquisition, the radiation dose is limited for several reasons. Electron-beam-induced radiation damage is one of the most important causes of artifacts and, sometimes, of the misinterpretation of micrographs. In view of the above reasons, it is difficult to obtain the complete image through physical methods.
Aiming at the problem above, we propose a feature-learning-based contextual spectrum inpainting method, which inpaints the missing region of an image through inpainting the image spectrum. Based on the common Deep Convolutional Generative Adversarial Net and simplified Context Encoders, we customize an encoding-decoding generator to minimize the deviation of the generated image spectrum with regard to its latent ground truth. The encoder hierarchically learns the contextual visual features, and the decoder recovers the missing parts of the spectrum by recombining the chosen feature maps. The L2 loss function calibrates the error generated by the decoder between original images and recovered images. The recovery of corrupt image spectra is presented as an evaluation standard to demonstrate the superiority of our method over some widely used inpainting strategies, and experiments on the face images from the Olivetti Face database [1][2] were performed to validate the generalization of our method for face image completion.

Index Terms—Spectrum inpainting, Convolutional neural network, Encoder-decoder, L2 loss

† Corresponding author

I. INTRODUCTION
The dynamic nature and structural heterogeneity of proteins are essential for their functions in living organisms, which has been attracting more and more attention from researchers and has become one of the active and challenging research topics. Therefore, the method of obtaining the protein image turns out to be a meaningful problem. Although the radiation dose is limited, electron-beam-induced radiation damage is one of the most important causes of artifacts and, sometimes, the cause of misinterpretation of micrographs. In view of the above reasons, obtaining the complete image through physical methods is rather challenging. Aiming at the aforementioned problems, we use a feature-learning-based contextual spectrum inpainting method which inpaints the missing region of an image through inpainting the image spectrum to obtain the complete image.
The main issue is to recover the corruption of the image to alleviate the electron-beam-induced damage. Given the corrupt image spectrum that is sampled at a 5-degree stride, we design an encoding-decoding neural network based on the Deep Convolutional Generative Adversarial Net and Context Encoders to recover the image spectrum. The encoder hierarchically extracts the contextual feature maps from the corrupt images, and the decoder fabricates the missing pixels with the aforementioned hierarchical features to weave the missing regions.
The contributions of our paper are:
1) We propose a dedicated framework for spectrum recovery with regard to 5-degree rotating sampling, which is demonstrated to be effective in spectrum inpainting.
2) We simplify the cumbersome architecture of Context Encoders, and specialize a lightweight architecture to satisfy the requirements of biological image recovery.
3) We validate our framework on the task of spectrum inpainting, and test the generalization of our framework on the Olivetti Face database [1][2].
This paper is organized as follows. Section II presents related work concerning image inpainting and convolutional neural networks, Section III proposes our network architecture and implementation, Section IV shows the experiments and the superiority of our method, and Section V concludes the paper.

II. RELATED WORK
The convolutional neural network (CNN) has attracted researchers' attention in recent decades in image classification, object detection, and image retrieval [3]. More and more deeply formulated CNN architectures have tended to reach the accuracy limit, especially in image classification in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) competitions [4], such as AlexNet [5], VGGNet [6], GoogLeNet [7], and ResNet [8], which could exceed the human-level performance of 5.1% top-5 test error [9]. These breakthrough CNN architectures automatically extract latent features from a large-volume image database, approximately reorganize these latent features and estimate the likelihood of which category the image belongs to, and their feature extraction layers can be employed as preprocessing in stylization transfer.
Some breakthrough methods based on CNNs have been introduced for inpainting largely corrupt images. In particular, Context Encoders [10] provided an innovative architecture for directly predicting missing pixels in the central area of a corrupt image, which combined the reconstruction loss and the adversarial loss derived from the state-of-the-art Generative Adversarial Network (GAN) [11] and the Deep Convolutional Generative Adversarial Network (DCGAN) [12] when calibrating the fabrication of the missing area. Even though the prediction from Context Encoders was encouraging and adequate to recover the corrupt central field, a major drawback could not be ignored: the cumbersome architecture may prolong the time consumption in the training and test stages when inpainting specific images that maintain sufficient contextual features. Our method simplifies Context Encoders to satisfy the requirements of spectrum inpainting.

III. METHODOLOGY
In this section, we present our formalized method for spectrum inpainting, which utilizes a convolutional neural network and simplified Context Encoders to complete the missing pixels. We first review the structure of Context Encoders in subsection III-A, and we further exhibit our simplified network architecture for predicting the specified missing area in the spectrum and the loss function in subsections III-B and III-C, respectively. At last, we give some implementation details regarding our network architecture in subsection III-D.

A. Context Encoders
Context Encoders have been proved to be very effective in semantic inpainting tasks, as well as in CNN pre-training for classification, detection and segmentation tasks [10]. Derived from the Generative Adversarial Net (GAN) [11] and the Deep Convolutional Generative Adversarial Network (DCGAN) [12], they show encouraging results in filling in the regions that are "dropped out". First, Context Encoders take as input a corrupt image, propagate the input image forward, hierarchically extract the semantic features by convolutional layers, normalize the inputs or feature maps, and finally reshape the feature maps into a feature bottleneck layer for generating the recovered images in the next step. In the second half of the pipeline, the latent feature representation vector is deconvolved to be moderately up-sampled, namely, expanded to become a full-size image, where the corrupt regions are filled with pixels inferred from the surroundings.
The vanilla Context Encoders are trained to predict the pixel distribution for filling the corrupt regions. The loss function accounts for the deviation between the ground truth content and the prediction. Aiming at reducing this deviation, the reconstruction (L2) loss function is employed to capture the overall differences between the original region and the prediction, while the adversarial loss, derived from the discriminator in the GAN architecture, tries to confuse the discriminator, i.e. make the prediction look real. The joint loss function linearly combines the reconstruction loss and the adversarial loss, balancing the proportion of these two losses with a given weight λ. The joint loss iteratively calibrates the Context Encoders to predict accurately while confusing the discriminator, making the prediction more realistic.

B. Network Architecture
Inspired by Context Encoders and the Deep Convolutional Generative Adversarial Network, we design a generating sketch, defined as the encoding-decoding generator, to recover the corrupt images, which fills the missing pixels with the latent features learned from the residual images. We devise an encoding part that takes as input the corrupt images with missing regions and hierarchically extracts the feature maps. In the decoding part, the latent hierarchically-extracted features are properly integrated to fill the missing regions, i.e. the generating sketch reconstructs the corrupt images. Figure 1 shows the overall architecture of our generating sketch as well as the training strategy.
1) Encoder: The encoder takes as input the mini-batches of the corrupt images and propagates forward the mini-batches of the corrupt images or the feature maps that are filtered by the convolutional layers. The ith convolutional layer convolves the mini-batches with x × i × 64 trainable 4 × 4-sized convolutional kernels, in which the stride and the pad of the convolution are 2 and 1, respectively; i.e. the width and the height of the corrupt image are halved when passing each convolutional layer. The batch normalization layers normalize the mini-batches after they are convolved by each convolutional layer. The corrupt image is compressed to a bottleneck vector after being convolved by 5 convolutional layers. Figure 2 shows the compression of the size of the images and the augmentation of the feature maps via the encoder.
2) Decoder: The decoder executes a reverse deconvolution of the bottleneck vector, namely, the bottleneck vector is moderately up-sampled by the deconvolutional layers and is expanded to mini-batches of recovered images that have the same size as the corrupt ones. The deconvolutional layers take as input the bottleneck vector or the mini-batches of the feature maps and propagate them forward to make an expansion, in which the deconvolution scans the feature maps with a fractional stride [13], or fills the surroundings of each pixel with zero-padding [14]. The ith deconvolutional layer deconvolves the mini-batches with 2 × (5 − i) × 64 trainable 4 × 4-sized deconvolutional kernels, in which the stride is 2 and the pad is 1; i.e. the width and the height are doubled when passing each deconvolutional layer. Batch normalization layers also normalize the mini-batches behind the deconvolutional layers. The mini-batches of recovered images are generated in which the missing pixels are inferred by the decoder from the corrupt images. Figure 3 shows the expansion of the bottleneck vector and how the decoder works.

C. Loss function
We first present the L2 loss function as the reconstruction loss, which captures the overall differences between a reconstructed image and its ground truth, as defined in equation (1).
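Equation (1) is not legible in this extraction; as a hedged sketch of the reconstruction loss described above, a per-pixel L2 loss restricted to the missing region can be written as follows, where the binary mask and the averaging over masked pixels are our assumptions, not details stated by the paper:

```python
import numpy as np

def l2_reconstruction_loss(ground_truth, prediction, mask):
    """Mean squared error over the corrupt (masked) region only.

    mask is 1 where pixels are missing and 0 elsewhere; only the
    masked pixels contribute to the reconstruction loss.
    """
    diff = mask * (ground_truth - prediction)
    return (diff ** 2).sum() / mask.sum()

# Toy example with two missing pixels.
gt = np.array([[1.0, 2.0], [3.0, 4.0]])
pred = np.array([[1.0, 0.0], [3.0, 2.0]])
m = np.array([[0.0, 1.0], [0.0, 1.0]])
loss = l2_reconstruction_loss(gt, pred, m)  # ((2-0)^2 + (4-2)^2) / 2 = 4.0
```

Dividing by `mask.sum()` keeps the loss scale independent of how large the corrupt region is; an implementation could equally average over all pixels, which only rescales the λ trade-off with the adversarial term.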
Fig. 1. The pipeline of our method. The encoder takes the corrupt images as input, and the decoder reorganizes the latent feature representation to generate the recoveries.

D. Implementation Details
Derived from the open source code from [10], we implement our model with Theano [15] and Lasagne [16]. The number of convolutional layers is adjusted corresponding to the size of the input images; the layers convolve the input images with 5 × 5-sized trainable filter kernels, a 2-pixel stride and a 1-pixel padding around the images or feature maps, and are followed by batch normalization layers. The number of bottlenecks is set at 2560 and the non-linearity activation functions are set to ReLU [5]. The mini-batch size is set at 100, and it can be elastically adjusted corresponding to the scalable training dataset.

Fig. 2. Our encoder. The encoder convolves the residual image to extract the latent feature representations.
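The halving and doubling arithmetic described for the encoder and decoder in Section III-B (4 × 4 kernels, stride 2, pad 1) can be checked with a short sketch; the 64 × 64 input size used below is an assumption for illustration, not a figure stated by the paper:

```python
def conv_out(size, kernel=4, stride=2, pad=1):
    # Standard convolution output-size formula.
    return (size + 2 * pad - kernel) // stride + 1

def deconv_out(size, kernel=4, stride=2, pad=1):
    # Transposed-convolution (deconvolution) output-size formula.
    return (size - 1) * stride - 2 * pad + kernel

# Assuming a 64x64 input image:
sizes = [64]
for _ in range(5):                 # 5 convolutional layers in the encoder
    sizes.append(conv_out(sizes[-1]))
# sizes == [64, 32, 16, 8, 4, 2]: each layer halves width and height.

restored = sizes[-1]
for _ in range(5):                 # matching deconvolutional layers
    restored = deconv_out(restored)
# restored == 64: each deconvolution doubles width and height.
```

With kernel 4, stride 2 and pad 1, the two formulas are exact inverses, which is why the decoder recovers images of the same size as the encoder's input.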
Fig. 4. Original image spectrum (ground truth) (a), sampled image spectrum (b), and inpainted image spectrum by our method (c), Field of Experts (d) and the Navier-Stokes based method (e)
TABLE I. Comparison of correlation, NRMSE, and PSNR among our method, Field of Experts, and the NS-based method on image spectra
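The paper does not spell out its exact formulas for the three metrics in Table I; as a hedged sketch, common definitions of correlation, NRMSE (normalized by the dynamic range) and PSNR can be computed as follows:

```python
import numpy as np

def correlation(a, b):
    # Pearson correlation between flattened images.
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

def nrmse(truth, recovered):
    # Root-mean-square error normalized by the ground-truth range.
    rmse = np.sqrt(np.mean((truth - recovered) ** 2))
    return rmse / (truth.max() - truth.min())

def psnr(truth, recovered, peak=255.0):
    # Peak signal-to-noise ratio in dB (undefined for identical images).
    mse = np.mean((truth - recovered) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

# Toy 2x2 images differing in one pixel.
truth = np.array([[0.0, 255.0], [0.0, 255.0]])
recovered = np.array([[0.0, 255.0], [0.0, 250.0]])
```

Other normalizations of NRMSE (e.g. by the mean intensity) are also in use, so these definitions should be treated as one plausible reading of the table.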
Fig. 5. Original Olivetti face images (a), sampled face images (b), and inpainted face images by our method (c), Field of Experts (d) and the Navier-Stokes based method (e)
TABLE II. Comparison of correlation, NRMSE, and PSNR among our method, Field of Experts, and the NS-based method on the Olivetti Faces database
Table I shows the correlation, NRMSE and PSNR between the recovered images and their ground truth. The correlation is the principal evaluation in 3D reconstruction during cryo-EM image acquisition, which is employed to demonstrate the superiority of our method. As shown in Table I, our method achieves the strongest correlation between the inpainted image and its ground truth, and it also performs well in NRMSE, which demonstrates that our method outperforms these traditional methods in spectrum inpainting.
Figure 5 shows some results of image inpainting. As we can see, the image spectra inpainted by our method recover the principal characteristics, such as the central high-frequency features, after fine-tuning on the Olivetti Face images, and alleviate the blur compared with the other traditional methods. Our method also quantitatively achieves good recoveries on face inpainting, as demonstrated in Table II.

V. CONCLUSION
We propose a model derived from Context Encoders [10] and DCGAN [12] to inpaint corrupt images that are rotating-sampled at a 5-degree interval. The convolutional encoder takes as input the uncorrupt context and hierarchically extracts the contextual features, and the missing area is filled with generated pixels associated with those contextual features. We use the L2 loss function to capture the overall similarity between the uncorrupt ground truth and the corrupt context, which can generally calibrate the fabrication of the missing pixels. The experimental results show that our model outperforms common inpainting methods, such as Field of Experts and the Navier-Stokes based method, and achieves feasible results qualitatively and quantitatively on image spectra and the Olivetti Face database.
In the future, we plan to investigate a novel texture inpainting method to enhance the textural details of the inpainted images, since the corruption strongly devastates almost all of the high-frequency details, which our decoder cannot recover. A universal inpainting method should also be aggregated into our pipeline to improve the generalization of our method when a common corruption (such as central corruption in an image) occurs.

ACKNOWLEDGEMENTS
This research is sponsored by the National Natural Science Foundation of China (No. 61571049, 61601033, 61401029), the Fundamental Research Funds for the Central Universities (No. 2016NT14), the China Postdoctoral Science Foundation Funded Project (No. 2016M590337), and by SRF for ROCS, SEM.

REFERENCES
[1] F. S. Samaria and A. C. Harter, "Parameterisation of a stochastic model for human face identification," in Applications of Computer Vision, 1994., Proceedings of the Second IEEE Workshop on. IEEE, 1994, pp. 138–142.
[2] F. S. Samaria, "Face recognition using hidden Markov models," Ph.D. dissertation, University of Cambridge, 1994.
[3] Y. Guo, Y. Liu, A. Oerlemans, S. Lao, S. Wu, and M. S. Lew, "Deep learning for visual understanding: A review," Neurocomputing, vol. 187, pp. 27–48, 2016.
[4] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein et al., "Imagenet large scale visual recognition challenge," International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.
[5] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[6] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[7] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
[8] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[9] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1026–1034.
[10] D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros, "Context encoders: Feature learning by inpainting," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2536–2544.
[11] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
[12] A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," arXiv preprint arXiv:1511.06434, 2015.
[13] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.
[14] V. Dumoulin and F. Visin, "A guide to convolution arithmetic for deep learning," arXiv preprint arXiv:1603.07285, 2016.
[15] Theano Development Team, "Theano: A Python framework for fast computation of mathematical expressions," arXiv e-prints, vol. abs/1605.02688, May 2016. [Online]. Available: http://arxiv.org/abs/1605.02688
[16] E. Battenberg, S. Dieleman, D. Nouri, E. Olson, A. van den Oord, C. Raffel, J. Schlüter, and S. K. Sønderby, "Welcome to lasagne," http://lasagne.readthedocs.io/en/latest/index.html, 2015.
[17] S. Roth and M. J. Black, "Fields of experts: A framework for learning image priors," in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 2. IEEE, 2005, pp. 860–867.
[18] M. Bertalmio, A. L. Bertozzi, and G. Sapiro, "Navier-stokes, fluid dynamics, and image and video inpainting," in Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, vol. 1. IEEE, 2001, pp. I–I.
Abstract—This paper presents a methodology for evaluation of driving performance based on speeding, acceleration, lane control and safety distance. All these variables are measured in a motion-based driving simulator. We report on a user study in which we obtained the proposed variables for 29 drivers. These results will enable us to propose a general evaluation score of driving performance, which can be used for profiling driver behavior.

I. INTRODUCTION
The number of vehicles in traffic increases every year and, consequently, so does the number of traffic-related accidents. Despite extensive prevention activities for traffic safety, more than 26,000 people died on European roads in 2015, and there was no improvement compared to 2013 and 2014 [1]. The motivation for this study is to evaluate driving behavior by examining several key driving features, such as safety distance, compliance with speed limits and traffic rules, and accelerations. Our aim is to obtain an objective evaluation score of driving performance, which can be used for driver behavior profiling (i.e. safe and unsafe behavior).
The aim of this research is to classify drivers according to their overall driving behavior, considering speeding, winding, acceleration and keeping a safety distance. We want to define classes of drivers with exact boundaries within which they are considered safe or unsafe.

II. RELATED WORK
Researchers all over the world have tried to identify the causes of traffic accidents, to understand the driver's behavior that led to them, and to categorize the critical driving profiles that most commonly contribute to them. The majority of this research has been based on examination of past events and accident records, or on self-evaluation questionnaires. Based on accidents, it was shown that human factors contribute to 95% of traffic accidents, and driving behavior was identified as the most important one [2, 3].
Bonsall et al. [4] explored the choices of values for safety-related parameters in simulation models. This study includes extensive research on safety-related parameters and a comparison between different sources. The parameters are desired speed and headway, reaction times, rates of acceleration and deceleration, and other parameters not significant for our research.
The use of driving simulators seems much more appropriate than a real vehicle, particularly from a safety point of view, as simulators also enable the simulation of various dangerous driving conditions without exposing the driver to real physical risk. Other important advantages are controllability, reproducibility and standardization [5]. Driving simulators enable controlled and robust acquisition of various driving performance indicators, such as lane deviation, reaction time and lateral acceleration [6-8]. These kinds of indicators usually focus on operational-level skills of drivers [9].

III. EXPERIMENTAL DESIGN
We conducted a between-subject driving simulator study consisting of two driving scenarios. Participants were given no specific instructions on how to drive in each condition, and they were given no explanation of what the goal of the experiment was. Our aim was for participants to drive normally and according to their typical driving behavior.
The first scenario contained a road with different technical properties (e.g. curves with different radii, different speed limits, altitude changes and changes between urban and countryside surroundings) and no traffic. We measured speeding (i.e. current speed and speed violation), lateral shift (i.e. position of the vehicle in the lane), acceleration and deceleration.
The second driving simulator scenario included a motorway with varying traffic intensity. This scenario was designed to assess the safety distance (i.e. distance to the leading vehicle).
The study was performed in a NERVteh [10] motion-based driving simulator, consisting of a 4DOF motion platform and SCANeR™studio DT simulation software [11]. The user study included 29 participants. They all started with a test drive to get acquainted with the simulator and then executed both conditions twice. Consequently, we collected a result set of 58 trials.

IV. MEASURED VARIABLES
A. Speeding
Speeding is defined as the average normalized exceeded speed. The normalized exceeded speed is defined as

V(t) = \begin{cases} \frac{v(t)}{v_{limit}(t)} - 1, & v(t) \geq v_{limit}(t) \\ 0, & v(t) < v_{limit}(t) \end{cases}    (1)

where v(t) is the current speed and v_limit(t) the current speed limit. The value of V represents the percentage of the speed excess. The average of the value V takes into account the duration and also the intensity of all the speed excesses.
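The time average of equation (1) can be sketched directly from the sampled speed trace; the assumption below that the trace is uniformly sampled (so a plain mean approximates the time average) is ours, not stated by the paper:

```python
def speeding_score(speeds, limits):
    """Average normalized exceeded speed V from eq. (1), assuming the
    speed trace is uniformly sampled in time."""
    excess = [
        v / v_lim - 1.0 if v >= v_lim else 0.0
        for v, v_lim in zip(speeds, limits)
    ]
    return sum(excess) / len(excess)

# A driver exceeding a 50 km/h limit by 10% for half of the samples
# scores 0.05, i.e. an average speed excess of 5%.
score = speeding_score([55, 55, 45, 40], [50, 50, 50, 50])
```

Because compliant samples contribute 0 rather than being dropped, both the duration and the intensity of violations shape the score, as the definition above requires.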
Figure 1: Distribution of the percentage of speed violation

A(t) = \begin{cases} \frac{|a(t)|}{a_{max}} - 1, & |a(t)| \geq a_{max} \\ 0, & |a(t)| < a_{max} \end{cases}    (3)

S(t) = \begin{cases} 1 - \frac{T(t)}{T_{MIN}}, & T(t) \leq T_{MIN} \\ 0, & T(t) > T_{MIN} \end{cases}    (4)
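Equations (3) and (4) can be evaluated the same way as the speeding score; the threshold values a_max and T_MIN used below are illustrative assumptions, since the paper's chosen values are not given in this extraction:

```python
def accel_score(a, a_max):
    # Eq. (3): normalized excess of |a(t)| over the threshold a_max.
    return abs(a) / a_max - 1.0 if abs(a) >= a_max else 0.0

def safety_distance_score(headway, t_min):
    # Eq. (4): how far the time headway T(t) falls below the minimum T_MIN.
    return 1.0 - headway / t_min if headway <= t_min else 0.0

# Illustrative thresholds (not from the paper): a_max = 3 m/s^2, T_MIN = 2 s.
a_val = accel_score(-4.5, 3.0)           # hard braking at 4.5 m/s^2 -> 0.5
s_val = safety_distance_score(1.0, 2.0)  # 1 s headway -> 0.5
```

Both scores are zero while the driver stays within the threshold and grow with the severity of the violation, mirroring the structure of equation (1).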
TABLE I. SCORING SYSTEM FOR SAFE DRIVER BEHAVIOR PROFILING

Score              5          4          3          2         1
Lane shift         0 – 0.5    0.5 – 2.5  2.5 – 5.5  5.5 – 10  10 – ∞
Safety distance    0 – 0.3    0.3 – 1.5  1.5 – 5    5 – 10    10 – ∞
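Mapping a measured average to the 5-to-1 score of Table I reduces to binning against the interval edges; a minimal sketch, with the treatment of values that fall exactly on a boundary (assigned to the lower score) being our assumption:

```python
import bisect

def score_from_bins(value, boundaries):
    """Map a measured value to a 5..1 score. `boundaries` are the upper
    edges of the bins for scores 5, 4, 3 and 2; anything above the last
    edge scores 1."""
    return 5 - bisect.bisect_right(boundaries, value)

LANE_SHIFT_BINS = [0.5, 2.5, 5.5, 10]      # upper edges from Table I
SAFETY_DISTANCE_BINS = [0.3, 1.5, 5, 10]   # upper edges from Table I

lane_score = score_from_bins(3.0, LANE_SHIFT_BINS)        # 2.5–5.5 -> 3
dist_score = score_from_bins(12.0, SAFETY_DISTANCE_BINS)  # above 10 -> 1
```

Keeping only the upper edges in a sorted list lets `bisect_right` do the lookup in logarithmic time and avoids duplicating the interval bounds.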
Figure 7: Distribution of acceleration score

Figure 8: Distribution of safety distance

We conclude that the data collected in a driving simulator is very useful for driver classification. The results show realistic distributions among the four observed values. These results enabled us to propose five classes of driver behavior, from very safe to very dangerous. With such a scoring system, we can also create a profile of each individual driver. Such a profile gives us valuable insight into his or her driving behavior and offers an opportunity to reconsider his or her driving habits in real-life situations.
In this study we considered the average values, calculated from the total number of violations in the driving session. Therefore, this metric is linear. However, braking distance and collision severity do not increase linearly with the driving speed. There is a quadratic relation between these variables and the driving speed, since there is a quadratic dependency of kinetic energy on velocity. Future profiling should therefore penalize higher speeds more heavily.

ACKNOWLEDGMENT
This research was supported by the Slovenian Research Agency. The authors would like to thank NERVteh for providing the measurement equipment.

REFERENCES
[1] Road safety in the European Union (2016). The European Commission. Available at: https://ec.europa.eu/transport/road_safety/sites/roadsafety/files/vademecum_2016.pdf
[2] B. Sabey and H. Taylor, "The known risks we run: The highway," in Societal Risk Assessment: How Safe is Safe Enough?, R. C. Schwing and W. A. Albers, Eds. General Motors Research Laboratories, 1980.
[3] T. E. Boyce and E. S. Geller, "An instrumented vehicle assessment of problem behavior and driving style," Accident Analysis & Prevention, vol. 34, no. 1, pp. 51–64, Jan. 2002.
[4] P. Bonsall, R. Liu, and W. Young, "Modelling safety-related driving behaviour—impact of parameter values," Transportation Research Part A: Policy and Practice, vol. 39, no. 5, pp. 425–444, June 2005.
[5] J. de Winter, P. van Leeuwen, and R. Happee, "Advantages and disadvantages of driving simulators: A discussion," in Proceedings of Measuring Behavior 2012, A. Spink, F. Grieco, O. Krips, L. Loijens, L. Noldus, and P. Zimmerman, Eds. Utrecht, The Netherlands, 2012, pp. 47–50.
[6] G. Reymond, A. Kemeny, J. Droulez, and A. Berthoz, "Role of lateral acceleration in curve driving: Driver model and experiments on a real vehicle and a driving simulator," Human Factors, vol. 43, no. 3, pp. 483–495, 2001.
[7] P. Feenstra, R. van der Horst, B. Grácio, and M. Wentink, "Effect of simulator motion cuing on steering control performance: Driving simulator study," Transportation Research Record: Journal of the Transportation Research Board, vol. 2185, pp. 48–54, Dec. 2010.
[8] H. Cai, Y. Lin, and R. R. Mourant, "Study on driver emotion in driver-vehicle-environment systems using multiple networked driving simulators," in Proceedings of the Driving Simulation Conference North America (DSC'07), 2007.
[9] K. Kircher and C. Ahlstrom, "The impact of tunnel design and lighting on the performance of attentive and visually distracted drivers," Accident Analysis & Prevention, vol. 47, pp. 153–161, Jul. 2012.
[10] NERVteh, motion-based driving simulator (2016), http://nerv-teh.com/.
[11] OKTAL, SCANeR™studio DT, version 2.3 (2016), http://www.oktal.fr/en/automotive/range-of-simulators/software.
A. Reference measurement
As a reference measurement for detecting changes in cognitive load, we used the tactile Detection Response Task (TDRT) [3]. With this method, a small vibromotor is placed on the driver's collarbone and presents tactile (vibration) stimuli every 2-5 seconds. The driver then responds to the stimuli by pressing a button, placed on the right-hand index finger, against the steering wheel. Response times are then processed as indicators of cognitive load.

B. Cognitive task
For inducing cognitive load, we used the Delayed Digit Recall task (a modified n-back task) as a cognitive task [2]. This task requires the driver to vocally repeat auditorily presented single-digit numbers with a certain time delay, as presented in Table 1. We used three levels of difficulty: the 1-back, 2-back, and 3-back task.

Table 1: Consecutive order of presented numbers and required answers for the 2-back task

Presented number:  1   3   8   5   3   9   4   5
Driver's answer:   --  --  1   3   8   5   3   9

C. Experiment design
The experiment was performed in a NERVteh driving simulator [1], using countryside scenery, where the driver drove on a single-lane two-way road. The dependent variable was the DRT response time; the independent variable was the cognitive load, presented in three difficulty levels.

The experiment consisted of 5 trials, each lasting approximately 90 seconds. In the first trial, participants did not perform any task; this trial was used as a reference measurement. In the second trial, participants had to drive in the simulator and respond to DRT stimuli. In the remaining three trials, they had to drive, respond to DRT stimuli, and perform a cognitive task that differed in difficulty in each of the trials.

IV. RESULTS
A linear model analysis (between-subjects effects test) for the n-back task showed significant differences in response times to DRT stimuli (p<0.05) between the different levels of cognitive load. Post hoc tests showed statistically significant differences between trials without the cognitive task and trials with the 1-back and 3-back task (p<0.01) (Figure 2).

The between-subjects effects test also showed statistically significant differences in heart rate (p<0.01). A post hoc Bonferroni test showed significant differences in heart rate between the rest-mode trial and all active trials, and between trials without the cognitive task and all trials with a cognitive task (p<0.01). Significant differences were also found for the trial with the highest cognitive load (3-back task) compared to the trials with lower levels of cognitive load (p<0.05) (Figure 3).
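The answer scheme of the n-back task is mechanical: the required answer at position i is the digit presented n positions earlier. A minimal sketch of this scheme (an illustration, not the experiment software):

```python
def n_back_answers(presented, n):
    """Expected answers for an n-back digit recall task: the answer
    at position i is the digit presented n steps earlier; the first
    n positions have no required answer."""
    return [presented[i - n] if i >= n else None
            for i in range(len(presented))]

# The 2-back sequence from Table 1
print(n_back_answers([1, 3, 8, 5, 3, 9, 4, 5], 2))
# [None, None, 1, 3, 8, 5, 3, 9]
```

Raising n increases how many digits must be held in working memory at once, which is what graduates the induced cognitive load across the 1-back, 2-back, and 3-back trials.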
Figure 3. Mean heart rate for each trial.
Figure 5. Mean skin temperature for each trial.
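The Bonferroni correction used in the post hoc tests divides the significance threshold by the number of pairwise comparisons performed. A sketch of the adjusted threshold (illustrative; the exact comparison set in the study may differ):

```python
from itertools import combinations

def bonferroni_alpha(alpha, n_conditions):
    """Per-comparison significance threshold when all pairwise
    comparisons among n_conditions are tested (Bonferroni correction)."""
    n_comparisons = len(list(combinations(range(n_conditions), 2)))
    return alpha / n_comparisons

# 5 trials -> 10 pairwise comparisons -> per-comparison threshold 0.005
print(bonferroni_alpha(0.05, 5))  # 0.005
```

A pairwise difference is then declared significant only if its raw p-value falls below this stricter threshold, keeping the family-wise error rate at the nominal alpha.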
sends data only when a change occurs. This means that the data samples were different for each trial, which may have also influenced the results.

CONCLUSION
The results showed significant differences in the observed parameters, indicating that biometric data such as GSR, heart rate and skin temperature obtained with a low-cost device can be used as indicators of increased cognitive load.

However, the post hoc tests revealed that these indicators could not differentiate between different levels of cognitive load, as no significant differences were found among the trials with different levels of cognitive load. The same was true for the DRT results, raising the question of whether cognitive load was properly induced. Another significant influencing factor that must be considered is the relatively small number of participants in the study, as well as the limited capability of the device for appropriate collection and representation of the observed biometric data. In our future work, we are going to repeat the study with a professional wristband for capturing biometric data (e.g. Empatica [14]) and compare the results with this study.

ACKNOWLEDGMENT
This research was supported by the Slovenian Research Agency. The authors would like to thank NERVteh for providing the measurement equipment.

REFERENCES
[1] NERVteh Simulation Technologies. Available at: http://nerv-teh.com/, accessed December 2016.
[2] Mehler, B., Reimer, B., Dusek, J. A. MIT AgeLab delayed digit recall task (n-back). Cambridge, MA: Massachusetts Institute of Technology, June 2011.
[3] Road vehicles -- Transport information and control systems -- Detection-Response Task (DRT) for assessing attentional effects of cognitive load in driving. ISO/DIS 17488. Accessed September 2015.
[4] Microsoft Band SDK. Available at: https://developer.microsoftband.com/bandSDK, accessed December 2016.
[5] Mehler, B., Reimer, B., Coughlin, J., Dusek, J. "Impact of Incremental Increases in Cognitive Workload on Physiological Arousal and Performance in Young Adult Drivers." Transportation Research Record: Journal of the Transportation Research Board 2138 (December 1, 2009): 6-12. doi:10.3141/2138-02.
[6] Borghini, G., et al. "Measuring neurophysiological signals in aircraft pilots and car drivers for the assessment of mental workload, fatigue and drowsiness." Neuroscience & Biobehavioral Reviews 44 (2014): 58-75.
[7] Jorna, P. G. A. M. "Spectral analysis of heart rate and psychological state: A review of its validity as a workload index." Biological Psychology 34.2 (1992): 237-257.
[8] Reimer, B., et al. "An on-road assessment of the impact of cognitive workload on physiological arousal in young adult drivers." Proceedings of the 1st International Conference on Automotive User Interfaces and Interactive Vehicular Applications. ACM, 2009.
[9] Shi, Y., Ruiz, N., Taib, R., Choi, E., Chen, F. "Galvanic Skin Response (GSR) as an Index of Cognitive Load." CHI '07 Extended Abstracts on Human Factors in Computing Systems, 2651-2656. New York, NY, USA: ACM, 2007.
[10] Palinko, O., et al. "Estimating cognitive load using remote eye tracking in a driving simulator." Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications. ACM, 2010.
[11] Melnicuk, V., Birrell, S., Konstantopoulos, P., Crundall, E., Jennings, P. "JLR Heart: Employing Wearable Technology in Non-Intrusive Driver State Monitoring. Preliminary Study." 2016 IEEE Intelligent Vehicles Symposium (IV), 55-60, 2016. doi:10.1109/IVS.2016.7535364.
[12] Microsoft Band SDK. Available at: https://developer.microsoftband.com/bandSDK, accessed December 2016.
[13] OnePlus 2. Available at: https://oneplus.net/si/2, accessed December 2016.
[14] Empatica. Available at: https://www.empatica.com/e4-wristband, accessed April 2017.
Abstract—Smart equipment can support feedback in sports training, rehabilitation, and the motor learning process. Smart equipment with integrated sensors can be used as a standalone system or complemented with body-attached wearable sensors. The first part of our work focuses on real-time biofeedback system design, particularly on application-specific sensor selection. The main goal of our research is to prepare the technical conditions to prove the efficiency and benefits of real-time biofeedback when used in selected motion-learning processes. The tests performed on two prototypes, a smart golf club and a smart ski, prove the appropriateness of the sensor selection and the feasibility of implementing the real-time biofeedback concept in golf and skiing practice. We are confident that the concept can be expanded for use in other sports and rehabilitation.

Index Terms — smart equipment, motor learning, sport, rehabilitation, biomechanical biofeedback

I. INTRODUCTION
Feedback is the most important variable for learning, except for the practice itself [1].

During practice, natural (inherent) feedback information is provided internally through the human sense organs. Augmented feedback is provided by an external source, traditionally by instructors and trainers. Modern technical equipment can help both the performer and the instructor. In many sports disciplines video recording is a classical method for providing additional feedback information for post analysis and terminal feedback. Modern technical equipment can provide more precise measurements of human kinetic and kinematic parameters and can considerably improve the quality of feedback information. Modern optical motion tracking systems use passive or active markers and several high-speed cameras. An alternative to optical tracking systems are inertial measurement unit (IMU) based systems, which use several wearable sensors attached to the human body. Both types of motion tracking systems are professional and expensive equipment that can be used not only for biomechanics research in sports, rehabilitation and ergonomics but also as an animation tool in the movie industry and in virtual reality tracking. Augmented feedback supported by technical equipment (sensors and actuators) is defined as biofeedback because a human is inside the feedback loop. The general architecture of a biofeedback system is presented in Figure 1.

To achieve widespread use of biofeedback applications, important feedback information concerning knowledge of performance should be provided with less complex and cheaper technical equipment. Miniature IMU sensors (accelerometers and gyroscopes) are integrated in every modern smartphone. Consequently, many motion activity applications for smartphones and wearables using only accelerometer data exist.

[Figure 1: sensor device → processing → biofeedback device → user.]
Figure 1. General architecture of a biofeedback system.

In many sport disciplines and rehabilitation scenarios various types of equipment are essential or even indispensable for performing the desired task (tennis rackets, baseball bats, golf clubs, alpine skis, stretching devices, crutches, etc.). In fact, the most relevant human actions are transferred through the equipment. All this equipment can be supported with different types of sensors, not only accelerometers, gyroscopes and magnetometers. For example, strain sensors are the perfect choice to detect and measure force, torque, and bending in different parts of the equipment. An appropriate sensor fusion algorithm can give precise information on the performer's actions and the equipment's reactions. Feedback from the sport equipment could therefore improve the performer's skills, especially if it is provided in real time, that is, without a significant delay.

Sport equipment companies have already started embedding digital technology into their products. Some examples of smart sport equipment which are already available on the global market are: smart shoes, smart tennis rackets, smart basketballs, smart baseball bats, and smart golf clubs [2]. In the near future many improvements in technology can be expected that could drive down prices and encourage a widespread adoption of smart sport equipment.

II. MOTIVATION
In the majority of research work in sport and rehabilitation, wearable sensors are used for monitoring and for post-processing signal and data analysis. The feedback information is given with a delay after the performed activity, which is defined as terminal feedback. The same is true for the majority of sport and healthcare applications already available on smartphones: post processing with presentation of some vital or important parameters. The concurrent feedback, which is given in real time within
the currently performed action, is very useful for motor learning, but is not frequently used.

The primary aim of our research is the development of technical equipment that would allow the implementation of real-time biomechanical biofeedback systems. We are convinced that such systems would allow a leap in research in this field. The important tasks in our research are the selection of appropriate sensors and the assurance of suitable conditions for sensor signal transmission and processing. As is evident from research papers, IMU sensors are the most often used sensors in golf and skiing [3]-[10]. The principles of sensor signal fusion are similar in the fields of sport training, rehabilitation, and biomedicine. The main focus of our current research is motor learning in sport with the help of feedback information provided by sensors integrated into sport equipment. The results of our research are easily implemented also in the field of rehabilitation equipment, as motor learning in rehabilitation is generally less demanding than that in sport.

III. SENSORS IN SMART SPORT EQUIPMENT
The integration of sensors into sport and rehabilitation equipment enables the acquisition of information about motion, static positions, and acting forces. Using sensor signal analysis and specific a priori knowledge, the correctness of the movement can be validated and appropriate feedback information can be communicated within the biofeedback loop. This procedure was tested in practical experiments in the field of training in golf and alpine skiing. For the validation of the above presented concept, we developed two original smart equipment prototypes: a smart golf club and a smart ski. Based on the first measurement results we have started with the development of real-time biofeedback system usage concepts and procedures. They include a systematic approach by defining useful short practice lessons, adapting the precision of the real-time biofeedback system to the capabilities of the user (amateur vs. professional), the choice of the correct feedback modality, and the appropriate amount of feedback information adapted to the limited perception capabilities of users during training.

IV. SMART SPORT EQUIPMENT PROTOTYPES
During our research we have developed two fully functional smart sport equipment prototypes: a smart golf club and a smart ski. We present their properties and field test results in the following subsections.

A. Smart golf club
The smart golf club prototype includes: (a) two strain gage sensors, which measure the golf club shaft bend, and (b) a 3-axis MEMS accelerometer and a 3-axis MEMS gyroscope, which measure the acceleration and angular speed of the golf club. The latter two sensors are a part of the independent Shimmer 3 IMU equipped with a Bluetooth communication interface. Strain gage sensor signals are acquired by the National Instruments cRIO professional measurement system with module 9237.

The Shimmer 3 device can reliably stream sensor data at sampling frequencies of up to 512 Hz, which was used in our experiments. The accelerometer's dynamic range is up to ±16 g0 and the gyroscope's dynamic range is up to ±2000 deg/s. The precision of both is 16 bits per sample. In the experiments the Shimmer 3 device is fixed to the club's shaft just below the grip, as seen in Figure 2.

Figure 2. Smart club prototype used in the field.

Sensor signals are synchronized and processed by the distributed LabVIEW™ application running on the laptop and the cRIO platform. After the streamed sensor signals are aligned by their impact samples, they are segmented into separate swings, each containing 1500 samples, with the impact sample at index 1000. At the sampling frequency of 500 Hz the duration of each swing is 3 s. In the graphs presented in this paper we plot only the swing signal samples between indexes 250 and 1000, or a 1.5 s time frame.

Figure 3 shows the player signatures produced by the two strain gage sensors orthogonally placed on the shaft of the golf club. Figure 3(a) shows the trajectories (signatures) of a perfectly performed straight swing of three different players. We see that their signatures are distinctively different. Figures 3(b) and 3(c) show the trajectories of different swing types of player 1 and player 3, respectively. It can be seen that the differences in the trajectories of the same swing type of different players are greater than the differences in the trajectories of the different swing types of the same player.

Figure 4 shows the sensor signals with marked points in time that correspond to the distinctive phases of the golf swing. Sensor signals are acquired by the two strain gage sensors (top graph), the 3-axis accelerometer (middle graph), and the 3-axis gyroscope (bottom graph). The trajectories show the high consistency of the swings, the repeatability, and the precision of the measuring system.
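The swing segmentation described for the smart golf club (windows of 1500 samples at 500 Hz, with the impact sample at index 1000) can be sketched as follows. The impact index is assumed to be already detected; the function is a simplified illustration, not the authors' LabVIEW implementation:

```python
def segment_swing(signal, impact_idx, pre=1000, total=1500):
    """Cut one swing window: `total` samples with the impact sample
    placed at index `pre` inside the window (3 s at 500 Hz)."""
    start = impact_idx - pre
    end = start + total
    if start < 0 or end > len(signal):
        raise ValueError("impact too close to the signal boundary")
    return signal[start:end]

# Toy signal of 5000 samples with an impact detected at sample 2600
sig = list(range(5000))
swing = segment_swing(sig, 2600)
print(len(swing), swing[1000])  # 1500 2600
```

Aligning every swing on its impact sample is what makes the per-swing trajectories directly comparable across players and swing types.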
[Figure 4: three stacked graphs of the strain gage signals (strain), accelerometer signals (acceleration [m/s2]), and gyroscope signals (angular speed [deg/s]) over time [ms], with the swing phases marked.]
Figure 4. Sensor signals and phases of the swing: (a) address, (b) takeaway, (c) backswing, (d) top of the backswing, (e) downswing, (f) impact.
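The IMU configuration described above implies a modest raw stream rate, which helps explain why Bluetooth streaming at 512 Hz is feasible. A back-of-the-envelope calculation (an illustration based on the stated specifications):

```python
def raw_stream_rate(fs_hz, channels, bits_per_sample):
    """Raw (uncompressed) data rate of a sensor stream in bit/s."""
    return fs_hz * channels * bits_per_sample

# Shimmer 3 as configured: 512 Hz, 3 accel + 3 gyro axes, 16 bits each
print(raw_stream_rate(512, 6, 16))  # 49152 bit/s, i.e. 48 kbit/s
```

At 48 kbit/s the six-channel IMU stream sits comfortably within the throughput of a Bluetooth serial link, leaving the high-rate strain-gage acquisition to the wired cRIO system.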
Figure 6. Skiing experiments with the smart ski prototype and the corresponding signals and calculated plots: (a), (b).
V. CONCLUSION
We have shown the most important results of experimenting with both presented prototypes: the smart golf club and the smart ski. With the smart golf club prototype we have shown that the differences in the trajectories of the same swing type of different players are greater than the differences in the trajectories of the different swing types of the same player. With the smart ski prototype we can precisely measure, in a timely manner, the action of the skier and the reaction of the skis and the terrain at the same time. The acquired information from the integrated sensors is used in the testing of ski performance and for ski technique improvement or learning. The developed application allows the ski expert to analyze the performance of the skier based on several measured and calculated parameters. The application is currently capable of recognizing the different phases of the carving technique and diagnosing typical errors with regard to the load distribution during the steering phase of the turn.

Based on the first measurement results we have started with the development of real-time biofeedback system usage concepts and procedures. They include a systematic approach by defining useful short practice lessons, adapting the precision of the real-time biofeedback system to the capabilities of the user (amateur vs. professional), the choice of the correct feedback modality, and the appropriate amount of feedback information adapted to the limited perception capabilities of users during training.

REFERENCES
[1] Bilodeau, E. A., Bilodeau, I. M., & Alluisi, E. A. (1969). Principles of skill acquisition. Academic Press.
[2] Lightman, K. (2016). Silicon gets sporty. IEEE Spectrum, 53(3), 48-53. doi:10.1109/MSPEC.2016.7420400.
[3] Nam, C. N. K., Kang, H. J., & Suh, Y. S. (2014). Golf swing motion tracking using inertial sensors and a stereo camera. IEEE Transactions on Instrumentation and Measurement, 63(4), 943-952.
[4] Betzler, N. F., Monk, S. A., Wallace, E. S., & Otto, S. R. (2012). Effects of golf shaft stiffness on strain, clubhead presentation and wrist kinematics. Sports Biomechanics, 11(2), 223-238.
[5] Ueda, M., Negoro, H., Kurihara, Y., & Watanabe, K. (2013). Measurement of angular motion in golf swing by a local sensor at the grip end of a golf club. IEEE Transactions on Human-Machine Systems, 43(4), 398-404. doi:10.1109/TSMC.2013.2266896.
[6] Michahelles, F., & Schiele, B. (2005). Sensing and monitoring professional skiers. IEEE Pervasive Computing, 4(3), 40-45.
[7] Kirby, R. (2009). Development of a real-time performance measurement and feedback system for alpine skiers. Sports Technology, 2(1-2), 43-52.
[8] Nakazato, K., Scheiber, P., & Müller, E. (2011). A comparison of ground reaction forces determined by portable force-plate and pressure-insole systems in alpine skiing. Journal of Sports Science & Medicine, 10(4), 754-762.
[9] Nemec, B., Petrič, T., Babič, J., & Supej, M. (2014). Estimation of alpine skier posture using machine learning techniques. Sensors, 14(10), 18898-18914.
[10] Yu, G., Jang, Y. J., Kim, J., Kim, J. H., Kim, H. Y., Kim, K., & Panday, S. B. (2016). Potential of IMU sensors in performance analysis of professional alpine skiers. Sensors, 16(4), 463.
Abstract—The spatial density of wireless sensor nodes, wearables, and other mobile devices is showing explosive growth. Without new solutions or concepts, the existing wireless sensor networks will become a bottleneck. Healthcare and sport wearable sensors come in great varieties regarding their power, data rate, required quality of service, and intended use. Another issue is the number of nodes communicating in one domain. We list the most used wireless technologies with their properties. We present different biofeedback application scenarios in healthcare and sports and match them to the most appropriate existing wireless technology that is expected to sustain scalability in the number of nodes or increased data rates for the expected application lifetime.

Index Terms—Wireless sensor networks, Wearable sensors, Internet of Things, Biofeedback systems, Healthcare, Sport

I. INTRODUCTION
Sensor technology and sensor networks are rapidly penetrating various applications in healthcare, recreation, and sports. The latter two fields are identified as among the most rapidly growing areas of personal and consumer mobile applications. Although many manufacturers have started developing such products, they seem to be slowed down by sensor network architectural standard deficiencies.

Biofeedback systems are important in motor learning in rehabilitation, in sports, and also in other fields of medicine where patients in the loop use the given feedback information to influence their physiological processes. Another area of use is tele-care and health monitoring, where the same technologies can be used. A general architecture of the biofeedback system is presented in Figure 1.

[Figure 1: sensor(s) → processing device → actuator(s), with the user closing the loop.]
Figure 1. General architecture of the biofeedback system.

In a biofeedback system the user's parameters are measured by sensors. Sensor signals and data are transferred to the processing unit for analysis. The results are communicated to the actuator that provides augmented feedback to the user. The feedback is given through one of the human senses, most commonly through the modalities of vision, audition, and taction.

The selection of the most suitable set of sensors, actuators, wireless network technologies, and services for each biofeedback application is a demanding task. It requires deep knowledge of the properties, capabilities, demands, shortcomings, and benefits of each of the elements of such systems.

The remainder of the paper is organized as follows. Section II presents the motivation for this paper. The various versions and constraints of biofeedback systems in healthcare and sport are given in Section III. Section IV studies the properties of sensors and actuators, and Section V presents the properties of wireless technologies used in healthcare and sport. Section VI explains three different application scenarios for wireless sensor based biofeedback systems. Section VII concludes the paper.

II. MOTIVATION
The main research motivation of this paper is the identification and selection of the most appropriate wireless communication technologies for various biofeedback applications. As expected, there is no one-size-fits-all solution to the above challenge. The sources of the data are sensors that are very heterogeneous in many aspects. They produce data rates from a few bytes per minute for measuring a patient's temperature to a few Mbit/s for a high-resolution and high-speed camera used in sport. Sensors can be used for anything from measuring the physiological processes of a human (heart rate, glucose levels, blood saturation, etc.) to measuring the performance of an athlete or sport equipment (highly dynamic movement, high-frequency vibrations, bending, strain, etc.). They require different levels of quality of service in terms of reliability, from very reliable medical devices that monitor vital body functions to pedometers where reliability is not an issue. They can produce synchronized or asynchronous data streams.

The same heterogeneity is expressed in the variety of sensor network technologies: from technologies that cover the body area (BAN) to technologies that cover the metropolitan area (MAN), from technologies with bitrates of a few kbit/s to technologies of a few hundred Mbit/s, from frequencies of 400 MHz to frequencies of 60 GHz, etc.

Another important problem is the growing spatial density of wireless sensor nodes. Depending on the sensor network technology, this can represent a smaller or bigger problem. For example, WLAN technologies perform well
Figure 4. Some of the communication possibilities for biofeedback systems used in healthcare and sports.
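Matching a scenario to one of the communication possibilities in Figure 4 amounts to comparing the aggregate bit rate of all sensors in the topology against a technology's capacity. The capacities below are rough, assumed orders of magnitude for illustration only, not figures from this paper:

```python
# Assumed, order-of-magnitude capacities in kbit/s (illustrative only)
TECH_CAPACITY_KBIT = {
    "BAN class": 1_000,
    "WLAN class": 100_000,
}

def fitting_technology(sensor_rates_kbit, capacities=TECH_CAPACITY_KBIT):
    """Return the first technology class whose capacity covers the
    aggregate sensor bit rate (star topology), plus that aggregate."""
    total = sum(sensor_rates_kbit)
    for name, capacity in capacities.items():
        if total <= capacity:
            return name, total
    return None, total

# A demanding scenario: two high-dynamic IMUs plus a video stream
print(fitting_technology([500, 500, 12_000]))  # ('WLAN class', 13000)
```

Once the aggregate exceeds roughly 10 Mbit/s, as in the video-feedback scenario discussed in the text, only LAN-class technologies remain viable.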
the open space version a relay over the gateway or mobile device is possible, provided a high enough bit rate is available. An example of usage in healthcare is giving voice feedback to a person rehabilitating their gait. An example of usage in sport is giving auditory feedback to a gymnast practicing on a vaulting horse.

C. Application scenario 3
A biofeedback application using signals from a combination of highly dynamic IMUs, sport equipment sensors, and actuators with video feedback can be very demanding. Bit rates can easily exceed 10 Mbit/s. Even in the star topology, only LAN technologies can satisfy the demands of the personal version of the biofeedback system. Confined space versions are limited to different ranges, depending on the LAN technology used. Open space versions are not viable. An example of usage in sport is giving real-time high-definition video feedback projected onto a skier's goggles in a personal version of the system. Even more demanding scenarios are possible when using biofeedback applications in group sports.

VII. CONCLUSION
Biofeedback systems are important in motor learning in rehabilitation, in sports, and also in other fields of medicine. Given the heterogeneity of sensors, actuators, and wireless technologies, countless scenarios of their use in biofeedback systems in healthcare and sport are possible. We matched some scenarios to the most appropriate existing wireless technology that is expected to sustain scalability in the number of nodes or increased data rates for the expected application lifetime. It is important to accept that none of the existing wireless technologies can satisfy all possible demands of the different application scenarios. We discussed possible steps towards a solution in terms of a compromise between range and bit rate. In our future work we plan to include other parameters such as reliability, quality of service, and sensor device spatial density.

REFERENCES
[1] Cavallari, R., Martelli, F., Rosini, R., Buratti, C., & Verdone, R. (2014). A survey on wireless body area networks: technologies and design challenges. IEEE Communications Surveys & Tutorials, 16(3), 1635-1657.
[2] Chen, M., Gonzalez, S., Vasilakos, A., Cao, H., & Leung, V. C. (2011). Body area networks: A survey. Mobile Networks and Applications, 16(2), 171-193.
[3] Cao, H., Leung, V., Chow, C., & Chan, H. (2009). Enabling technologies for wireless body area networks: A survey and outlook. IEEE Communications Magazine, 47(12), 84-93.
[4] Pyattaev, A., Johnsson, K., Andreev, S., & Koucheryavy, Y. (2015). Communication challenges in high-density deployments of wearable wireless devices. IEEE Wireless Communications, 22(1), 12-18.
[5] Adame, T., Bel, A., Bellalta, B., Barcelo, J., & Oliver, M. (2014). IEEE 802.11ah: the WiFi approach for M2M communications. IEEE Wireless Communications, 21(6), 144-152.
[6] Movassaghi, S., Abolhasan, M., Lipman, J., Smith, D., & Jamalipour, A. (2014). Wireless body area networks: A survey. IEEE Communications Surveys & Tutorials, 16(3), 1658-1686.
[7] Ayub, K., & Zagurskis, V. (2015). Technology implications of UWB on wireless sensor network: A detailed survey. International Journal of Communication Networks and Information Security, 7(3), 147.
[8] Kwak, K. S., Ullah, S., & Ullah, N. (2010, November). An overview of IEEE 802.15.6 standard. In 2010 3rd International Symposium on Applied Sciences in Biomedical and Communication Technologies (ISABEL 2010) (pp. 1-6). IEEE.
[9] Deslise, J. J. (2015). What's the difference between IEEE 802.11af and 802.11ah?. Microwaves & RF, 54, 69-72.
[10] Baños-Gonzalez, V., Afaqui, M. S., Lopez-Aguilera, E., & Garcia-Villegas, E. (2016). IEEE 802.11ah: A technology to face the IoT challenge. Sensors, 16(11), 1960.
[11] Hämäläinen, M., Paso, T., Mucchi, L., Girod-Genet, M., Farserotu, J., Tanaka, H., ... & Ismail, L. N. (2015, March). ETSI TC SmartBAN: Overview of the wireless body area network standard. In 2015 9th International Symposium on Medical Information and Communication Technology (ISMICT) (pp. 1-5).
adjukic@rcub.bg.ac.rs
jelena.maric1989@yahoo.com
tamaraoradic@gmail.com
Abstract—Recent studies indicate that every outdoor for sport and recreation and could play an important part
physical activity has positive effect on public health and in the sphere of public health [7].
quality of life of every individual [6]. The growth in wireless In present time, use of Information and Communication
subscriptions, which has reached over 6 billion wireless Technologies have become integral part of everyday life
subscribers in the world, shows that the use of Information in any major city. Today, use of mobile apps is shaping
and Technologies (ICTs), more precisely mobile applications the way we perceive the world around us and can be of
intended for improving health and well being (mobile health great significance in overlapping virtual and real
applications-M-health apps) has become an integral part of
surroundings while constantly increasing available
everyday urban city life [16]. The aim of this research is to
information and possibility for interactive development.
establish the connection between the use of emerging M-
Abstract — The focus of this research is on a special type of mobile applications, more particularly on the effect of mobile health apps (M-health) on outdoor physical activity, public health and the well-being of citizens. The important research questions that guided this study are: Does the use of M-health apps have a positive influence on increasing outdoor sport and recreation activities? How could open public spaces be redesigned so that people use them more frequently for recreational activities? What are the possible ways of upgrading M-health apps to be more user friendly? The methodology is based on an extensive literature review and a critical analysis of existing theoretical research. The quantitative and qualitative data presented in this paper were obtained through interviews and a questionnaire conducted with 300 selected stakeholders from Belgrade. The results show the general level of physical activity and the recreation preferences and habits of citizens in Belgrade, with special focus on the manner, degree and frequency of use of both M-health applications and outdoor spaces. The special contribution of this paper lies in presenting the positive effects of M-health apps on increasing the outdoor physical activity of users in Belgrade, and in providing specific guidelines for improving M-health apps and open public space design according to users' suggestions.

I. INTRODUCTION

According to the World Health Organization, inadequate physical activity is an important public health issue, especially in densely populated urban areas such as Belgrade. A large number of researchers have reached the conclusion that any physical activity in the open air has a positive effect on the health and well-being of individuals, significantly reducing stress and obesity and thereby improving the overall quality of urban life [1,2,3,4,5,6]. Well-maintained and well-equipped open public spaces can be used together with mobile health apps to support outdoor physical activity, public health and the well-being of citizens. M-health apps are defined as a mixture of smart mobile technologies, medical sensors and ICTs [7,8,9]; M-health combines e-health and smartphone technologies [10].

II. MOTIVATION

The main motivation for this research was to identify possible ways of increasing the outdoor physical activity and well-being of citizens in Belgrade with the help of ICTs and mobile health apps. For the purpose of this paper, the background research focused on the analysis of existing studies and an extensive literature review. Recreation in open space is a crucial component of any society's quality of living; recent studies indicate that every form of recreation in open public space has a positive influence on the health and well-being of the individual, as well as on the general quality of life [11]. The open space model provides places for activity, recreation, social engagement and enjoyment. It contributes to the social and environmental health of the community, providing sustainable surroundings [12]. At the same time, the interest of healthcare organizations and experts in promoting disease prevention and healthy behavior among citizens is rising [13,14], which represents a major turn in the implementation of M-health applications for promoting health and healthy living.

Mobile health uses technology, such as smartphones, to observe and improve the quality of public health. According to recent research [15], approximately one in five smartphone users uses at least one M-health application (app) to support health-related goals, and 38% of health app users have downloaded at least one app for physical activity (PA).
7th International Conference on Information Society and Technology ICIST 2017
Nowadays M-health apps serve a wide variety of purposes, such as helping with smoking cessation, weight loss programs, monitoring of diet and physical activity, medical treatments, and disease management [16]. The main advantages of using mobile phones and similar technologies for monitoring health are that such devices are personal, intelligent, connected, and always with people [17]. The main goal of this research, however, is to analyze the effect of M-health apps on the level of outdoor physical activity of users in Belgrade, and to examine how both outdoor spaces and M-health apps could be improved. The majority of these applications are activated by phone motion; in other words, the movement of the user yields the following information: the length of the route in kilometers, the height difference, the calories burned, and a variety of daily, weekly, monthly and annual reports. An application may also warn users about insufficient physical activity during the day. Some operating systems, such as iOS used by Apple devices, have a version of such an application incorporated in each mobile device.

The usage of mobile health applications has not yet been the subject of research in our region. This paper therefore starts from the assumption that the usage of M-health applications in our region increases with the increased usage of smart technologies and devices; in other words, that the use of M-health applications can have a positive influence on the physical activity of application users on the territory of Belgrade.

With current technological advances, smartphones offer a wide variety of opportunities for the expansion and growth of M-health applications, which can be used to record PA and to motivate people to get more involved in PA [18]. This research focused on self-monitoring applications for physical activity (PA), particularly those that are free and accessible for users on the territory of Belgrade.

III. RESEARCH QUESTIONS

The first step in this research was to identify three main research questions: Does the use of M-health apps have a positive influence on increasing outdoor sport and recreation activities? How could open public spaces be redesigned so that people use them more frequently for recreational activities? What are the possible ways of upgrading M-health apps to be more user friendly?

According to the International Telecommunication Union, 95% of the global population is covered by mobile service. Due to the expansion of smartphones and of 3G and 4G networks, the use of applications is constantly increasing [19]. Mobile devices have reached more people in many developing countries than power grids, road systems or water works, offering the possibility to improve healthcare systems and a tremendous opportunity for developing countries and communities to advance. It is also worth mentioning that governments are expressing interest in M-health apps as a complementary strategy for strengthening health systems and achieving the health-related Millennium Development Goals (MDGs) in low- and middle-income countries. A study published in the Journal of Medical Internet Research shows that 58% of surveyed mobile phone users have downloaded at least one mobile health app onto their device, and that exercise and nutrition were the most likely areas of the mobile health apps downloaded. Judging by recent studies and trends in the use of ICTs and mobile apps in everyday life, M-health apps are expanding and are mainly used by citizens in urban areas [20]. Previous research by different authors [21] stated that mobile technologies cannot physically carry drugs, doctors, and equipment between locations, but they can carry and process information in many forms. The crux of M-health apps is the exchange of information, which can ease data collection in the field of healthcare [22].

IV. METHODOLOGY

In this research, special attention was paid to the methodological process, which consisted of both quantitative and qualitative analysis and was based on a participative and collaborative approach of interviews and a questionnaire (involvement of users). Methods such as content analysis and expert observation were also used during the research. Having in mind the general aim of the research, the methodological process was divided into several steps: 1. extensive literature review; 2. selection of stakeholders; 3. developing and conducting the survey; 4. collecting and presenting the results. The extensive literature review consisted of content research on current trends in M-health app development and outdoor recreational activities, presented in the first part of the paper. Furthermore, the background research served as a knowledge base for developing the survey.

A. Selection of stakeholders

In total, more than 300 people contributed to this research. Stakeholders participated in interviews and/or the questionnaire. The criteria for choosing the participants were: different age, gender, education, habits, level of physical activity, etc. Stakeholders were chosen at selected open public spaces in Belgrade commonly used for sport and recreational activities. We identified 6 key locations: 1. Sava riverfront, 2. Danube riverfront, 3. Ada Ciganlija, 4. Kalemegdan park, 5. Tašmajdan park and 6. Olimp (Figure 1).

Figure 1. Position of the 6 key locations used for conducting the survey
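The kind of route metrics these self-monitoring apps derive from phone motion (route length in kilometers, height difference) can be sketched from raw GPS samples. The following is an illustrative reconstruction with an invented sample track, not the implementation of any particular app:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two GPS fixes, in kilometers."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def route_metrics(track):
    """track: list of (lat, lon, elevation_m) samples in time order."""
    dist = sum(haversine_km(a[0], a[1], b[0], b[1])
               for a, b in zip(track, track[1:]))
    climb = sum(max(0.0, b[2] - a[2]) for a, b in zip(track, track[1:]))
    return dist, climb  # km travelled, metres of ascent

# Invented three-point walk near the Sava riverfront
track = [(44.8125, 20.4489, 72.0),
         (44.8160, 20.4455, 75.0),
         (44.8201, 20.4410, 74.0)]
km, up = route_metrics(track)
```

Calorie estimates and the daily/weekly reports mentioned above would be layered on top of such distance and ascent figures, typically combined with user weight and sampling timestamps.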
V. RESULTS
The results gathered from the participants regarding general data (Figure 2) showed that the majority of the participants used to be actively involved in sports (52%), while 74% of all users said that they are still engaged in physical activity. Almost 25% of the participants claim that they exercise 2-3 times a week, mostly for approximately 20 to 30 minutes (46%).

Figure 2. Results of the General part of the research

Figure 3. Pie charts - Results of the Special part of the research
ICIST 2017 Proceedings Vol.2
Abstract — In this paper we discuss the communication standards for Internet of Things applications, with a focus on Smart City deployment. Different communication technologies can be used for applications, depending on the types of devices in the smart environment and their available resources and limitations. The selection criteria for an appropriate communication standard also depend on the specific smart environment application. Promising technologies for both indoor and outdoor smart environments are considered. The communication standards are compared by considering various parameters such as throughput, operating frequency bands, nominal range and energy efficiency. Additionally, we analyze which standard is the most suitable for each Smart City application.

Keywords — communication standards, comparative analysis, Internet of Things, Smart City.

I. INTRODUCTION

The Internet of Things (IoT) has been defined from different perspectives, and hence numerous definitions of IoT exist in the literature. The reason for this stems from the fact that the name is syntactically composed of two terms, Internet and things. The first term pushes towards a network-oriented vision of IoT, while the second tends to move the focus to things (objects). When these terms are put together, IoT semantically means a world-wide network of interconnected objects that are uniquely addressable, based on standard communication protocols. The unique identification of objects and the representation of exchanged information are important issues. This brings in the third perspective of IoT, the semantic perspective: routing is based upon the information itself, so as to find sources which can deliver the requested content without an a priori location for the content. The main concepts of these three visions of IoT are presented in [1]. Semantically oriented IoT visions have already been proposed in many research papers such as [2]. Also, according to the IP for Smart Objects (IPSO) alliance, the IP stack is a light-weight protocol that already connects a large number of devices and has all the qualities to make IoT a reality.

The investigation in this paper is based on the IoT perspective of things. The focus is on how to integrate IoT objects into a common framework using current communication standards. The purpose of IoT communication technologies is to connect heterogeneous objects together to deliver a specific smart service. The network-oriented perspective is briefly mentioned only to describe the incorporation of the communication protocol used by the proposed technology into the IP architecture.

Standardization is the most important prerequisite for global deployment of a specific technology. The use of standardized information technology has various benefits for companies, individuals and users. The main motivation of this work is to identify implementation challenges for using different communication standards for IoT applications and to propose the most promising technologies for the standardization of specific Smart City applications. Without a comparative understanding of the standards, it is quite difficult both to choose the most appropriate communication technology and then to design it into an application.

In this paper a comparative analysis of communication standards for Smart City deployment is made. Wireless standards such as IEEE 802.11ah WiFi, Bluetooth low energy and IEEE 802.15.4 ZigBee PRO, for both indoor and outdoor smart environments, are considered. The comparison is based on technology-specific parameters including range, topology, security, data throughput, latency, robustness to interference and power consumption. The main goal of this paper is to analyze and present potential technologies which can be applied in IoT applications and to propose possible directions for standardization. So far many authors have proposed proprietary solutions, while defining a unique solution for a specific IoT application should enable global deployment of IoT and Machine-to-Machine (M2M) communications.

The rest of the paper is organized as follows. In section II, the wireless technology-specific parameters relevant for further analysis are defined. Section III describes the method for the communication standards comparison. The main part of the paper is section IV, where the results of the comparative analysis are presented and the appropriate communication technology for each specific Smart City application is proposed. Based on the obtained results, conclusions are drawn in section V.

II. WIRELESS STANDARD PARAMETERS

Parameters of wireless connectivity are mutually dependent, so it is important to consider all of them when making a choice of wireless standard. Every application has different requirements, but it is essential for the overall analysis to go through the main technology-specific parameters.

A. Range

Range is invariably the first parameter considered when choosing a wireless standard.
The range given in product specifications is always bigger than the real one, because as soon as a radio is brought inside a building, fading, interference and reflections reduce the range significantly [3]. Range is closely tied to throughput and output power. Wireless performance degrades as the distance between transmitter and receiver increases, but the decrease is not linear. Generally, the connection maintains a link that supports a data rate close to the maximum the transmitter and receiver can support, until it reaches a point where the rate starts to decrease. There is no strict definition of where range stops. In most cases it is considered to be the point where throughput falls by 5-10%.

What underlies the decrease in throughput is the fact that data start to get lost in the background noise. As transmitter and receiver move further apart, the strength of the signal arriving at the receiver gets smaller, until it becomes impossible to detect the signal against the background noise. Radio signal interference also results in corruption of individual bits within the data stream.

B. Throughput

Throughput is the rate of successful data delivery over a communication channel. It falls as the range increases and the Bit Error Rate (BER) rises. To get the highest throughput, wireless standards attempt to cram more than one bit of information into each bit transmitted. Bluetooth and WiFi maximize throughput by trying to select the most complex coding scheme that the link quality will support. As the link quality declines, they automatically step down to a less aggressive coding scheme until they achieve a reliable link [4]. It is fairly obvious that the more data crammed into each bit, the more susceptible it will be to noise, so the highest data rates will typically have the lowest range. The important thing regarding range and throughput is to choose a wireless standard that works well within the envelope of performance required for the specific application.

C. Interference

If two radios within range of each other both transmit at the same time and at the same frequency, the signals arriving at the respective receivers interfere and are likely to be corrupted. Radios using reliable data links will not get an acknowledgement of reception and will therefore attempt to retransmit the data. Different radio standards use different techniques to try to ensure that they do not clash with each other on the retransmission, but even if the next transmissions do not overlap and are successful, the throughput will still be decreased as a result of the retransmission. In a situation where this starts to occur, it will probably happen repeatedly.

Each standard defines a spectral mask for its transmission, which is a set of limits on the output power within and around each transmit channel [3]. These are defined to meet the requirements of global regulatory regimes as well as the practical performance of low-cost radios. In an ideal world, a radio's transmissions would reduce to zero outside the extent of its channel. Real radio implementations are not that perfect: they provide a peak output at the centre of the channel, and the power then falls off gradually on either side. As well as the issue of not interfering with a radio on an adjacent channel, there is generally a tighter requirement at the two ends of the band, where spectrum is allocated to other applications. To help protect them from each other, most standards incorporate guard bands at either end of the allowed spectrum which, although they lie within the allowed license-free band, are set aside to ensure that transmissions do not extend beyond the band edge.

D. Topology

The manner in which connections are allowed between devices is called the network topology. The first, most straightforward and still most widely used is simple cable replacement, which is also what that particular topology is called. All of the wireless standards can perform it, differing predominantly in their throughput.

Next comes the point-to-multipoint topology, or piconet, where one device acts as a master or central device, connecting to several other devices. This is also known as a star network. Different standards allow differing numbers of concurrent live connections in a piconet; the maximum number of connections is determined by the size of the local address field within each standard [4]. Although topologically identical, the client-server topology is considered a distinct form of piconet or star-network topology, as it implies a central unit that acts as an access point to provide connectivity to another network. A limitation of the piconet is that a slave is a slave: it can only communicate with its master device. If slave nodes are given the ability to talk directly to each other, the topology develops into a cluster network. Cluster networks have the advantage that some of the workload can be removed from the master, as transactions no longer need to pass through it. However, the master needs to set up the routing tables in the cluster nodes in the first place, which generally increases its complexity.

The next level of complexity comes with the scatternet topology. This is an extension of a piconet which allows a slave or peripheral device to be simultaneously connected to two or potentially more networks [3]. It has the limitation that although a device may be part of multiple networks, it generally only functions within one at a time: it is not able to act as a bridge between the two networks. That is the reason for deploying a tree or hierarchical network. The difference is that the backbone nodes of a tree network have routing capabilities, whereas the connecting nodes in a scatternet may only share data. This means that any node connected to the backbone can send data to any other node within the tree.

At the top end of complexity is the mesh network, where individual nodes may be able to talk directly to each other if they are within wireless range, or through a selection of alternative routes. Whereas the tree network uses a single backbone for all inter-cluster routing, mesh networks allow multiple routes throughout the network. This adds redundancy to the network, allowing traffic to be maintained even if some of the nodes or links between nodes fail.

E. Security

Devices connecting through a wireless link need to be sure that they have connected to the correct device. As anyone with a suitable receiver can listen in to the wireless conversation, they also need to encrypt their data, so that an eavesdropper who intercepts it cannot decode it. These two processes are known as authentication and encryption. Each of the wireless standards has a credible approach to security.
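The qualitative behaviour described under A. Range, with received signal strength sinking toward the noise floor as distance grows, can be sketched with the free-space path-loss (Friis) formula. The transmit power, frequency and receiver sensitivity below are assumed example values, not parameters of any of the standards compared here:

```python
import math

def fspl_db(distance_km, freq_mhz):
    """Free-space path loss in dB: 32.44 + 20*log10(d_km) + 20*log10(f_MHz)."""
    return 32.44 + 20 * math.log10(distance_km) + 20 * math.log10(freq_mhz)

def received_dbm(tx_dbm, distance_km, freq_mhz):
    """Received power, ignoring antenna gains, fading and reflections."""
    return tx_dbm - fspl_db(distance_km, freq_mhz)

# Assumed example link: 0 dBm transmitter at 900 MHz, -95 dBm receiver sensitivity
sensitivity_dbm = -95.0
for d_km in (0.1, 0.5, 1.0, 2.0):
    margin = received_dbm(0.0, d_km, 900.0) - sensitivity_dbm
    print(f"{d_km:4.1f} km: link margin {margin:6.1f} dB")
```

In this free-space example the link still has a few dB of margin at 1 km but not at 2 km; indoors, fading and reflections consume that margin much earlier, which is why specified ranges are optimistic.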
F. Power consumption

Many wireless products are designed to be mobile, which implies that they will run on batteries. Because of that, minimizing the power consumption of the wireless link is critical.

A low-power radio enables data to be sent efficiently over the air. It includes power management schemes to ensure that the radio can remain in deep sleep mode whenever possible. Power consumption goes down as the range and throughput decrease. There is a complex relationship between coding schemes and energy efficiency. As more bits are put into each packet, the processing requirements increase, but so does the BER for the link; so although a higher coding rate may be technically more efficient, as it requires fewer packets, this may be offset by the need for a higher transmit power to get a reliable transmission [3].

Most low-power applications are centred around sensors, which only need to communicate their state intermittently. Rather than on the energy per bit of transmitted information, the battery life depends on the duty cycle and on how quickly the radio can move from its sleep state to perform a wireless transaction and then return to sleep. For many radios the current consumption in receive mode is greater than in transmit mode, so if a transmission needs to be acknowledged, it is important that this is done as quickly as possible, so that the receiver does not sit around burning current. There are also times associated with the device waking up, stabilising and performing its measurement before it can transmit [4].

G. Latency

A wireless connection has a finite delay between data being presented at the transmitting end and being received at the other. Latency, which is the measure of this delay, is different for each wireless standard. It is also implementation dependent and may vary for different hardware. Most IoT devices use data services instead of voice, so the IoT applications considered here are not delay sensitive. This technology-specific parameter plays an important role in the analysis of real-time IoT applications.

III. COMPARISON METHOD

Wireless standards have more in common than is generally realised. They have evolved from a few basic radio configurations, distinguishing themselves with protocol stacks that adapt these radios to particular connection topologies. The focus in this comparison method is on commonalities, explaining where and why the communication standards differ and how these differences affect their use.

The features that are relevant for choosing a radio, including range, topology, security, throughput, latency and power consumption, are described in the previous section. These are the core differences that should be considered as the first step in choosing a wireless standard for a given IoT application. After that comes the essential part of the analysis, explaining how to move from a standard to a product and the way in which the wireless standards can be used in applications. This looks at the practical side of the parameters mentioned above, including topologies, methods for making connections and securing them, managing power consumption, transmitting different types of data and coexisting with other radios. The approach also covers the tools and techniques that are needed, along with a discussion of the system design and architecture necessary to embed wireless connectivity into a wide range of devices. Finally, the major opportunities for communication technologies including IEEE 802.11ah WiFi, Bluetooth low energy and IEEE 802.15.4 ZigBee PRO are highlighted, exploring the key IoT applications where each is expected to make its biggest impact.

IV. RESULTS

In this section, the results obtained by the described comparison method are presented. The aim was to compare wireless standards designed for IoT applications at the physical layer. The technology-specific parameters have been examined and their influence on wireless technology performance analyzed.

IEEE 802.11ah is a new wireless networking protocol that operates in the sub-gigahertz license-exempt band (900 MHz), compared to conventional WiFi networks operating in the 2.4 GHz and 5 GHz bands [5]. The consequence of using a lower frequency band is extended range: this standard enables single-hop communication over distances of up to 1000 m, and Relay Access Points are used to extend connectivity to Access Points that are two hops away [6]. The Bluetooth low energy and ZigBee PRO standards usually operate in the unlicensed Industrial, Scientific and Medical (ISM) 2.4 GHz band. The usage of a higher frequency band decreases the coverage range of these two technologies (maximum 100 m).

As more devices use unlicensed frequency bands, each communication technology has to deal with an increased interference level. The ZigBee PRO standard uses the direct sequence spread spectrum (DSSS) technique, which can accommodate data rates up to 250 Kbps. The DSSS technique works on the principle that much of the noise within a frequency band will come from narrowband transmissions. Rather than trying to avoid these, DSSS uses a spreading function to transform its signal across a wider frequency range within the spectrum. At the receiving end, the reverse transformation takes place in a correlator, reproducing the original signal. Any noise arriving at the receiver goes through the same reverse transformation, with the result that it is reduced from being of similar amplitude to the received signal to an order of magnitude or more lower. The classic Bluetooth standard takes a different approach, using the frequency hopping (FH) technique: the radios jump from channel to channel in synchronization with each other 1600 times every second. The reasoning behind this is that if a Bluetooth radio encounters interference and a packet is lost, the packet will be repeated at a different frequency, where there is a good chance that there is no interference. Bluetooth low energy has implemented a modified frequency hopping scheme known as adaptive FH, which enables data rates from 125 Kbps to 1 Mbps [7]. Adaptive FH works by scanning the spectrum, looking for channels which are already in use; the radios then modify their hopping sequence to avoid those channels. The IEEE 802.11ah standard uses orthogonal frequency division multiplexing (OFDM). OFDM is a method of digital signal modulation in which a single data stream is split across several separate narrowband channels at different carrier frequencies.
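As a toy illustration of the two spread-spectrum ideas above, DSSS spreading with a correlator and adaptive-FH channel pruning, the following sketch uses an invented 8-chip sequence and channel list; it is a conceptual model only, not the actual coding used by ZigBee PRO or Bluetooth low energy:

```python
# Toy DSSS: each data bit is spread into an 8-chip sequence; the correlator
# recovers the bit even if a couple of chips are corrupted by narrowband noise.
CHIPS = [1, -1, 1, 1, -1, 1, -1, -1]  # invented spreading sequence

def dsss_spread(bits):
    return [c if b else -c for b in bits for c in CHIPS]

def dsss_despread(chips):
    bits = []
    for i in range(0, len(chips), len(CHIPS)):
        corr = sum(r * c for r, c in zip(chips[i:i + len(CHIPS)], CHIPS))
        bits.append(1 if corr > 0 else 0)  # sign of the correlation decides the bit
    return bits

tx = dsss_spread([1, 0, 1])
tx[2] = -tx[2]   # corrupt one chip in the first bit, as per the noise argument
tx[9] = -tx[9]   # and one chip in the second bit
recovered = dsss_despread(tx)  # correlation still recovers [1, 0, 1]

# Toy adaptive FH: drop channels observed to be busy from the hop sequence.
def adaptive_hop_map(all_channels, busy):
    return [ch for ch in all_channels if ch not in busy]

hop_map = adaptive_hop_map(list(range(37)), busy={3, 17, 21})
```

The correlation step shows why spreading buys robustness: a flipped chip only reduces the correlation sum, while the sign, and hence the decoded bit, survives.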
OFDM has high spectral efficiency and the ability to cope with narrowband co-channel interference and with frequency-selective fading due to multipath propagation. Using OFDM with an appropriate modulation and coding scheme, the IEEE 802.11ah standard provides data rates from 650 Kbps to 78 Mbps. These techniques differ, but they are all good at decreasing the interference level in the communication system.

Although the Bluetooth low energy and ZigBee PRO standards achieve much lower data rates than IEEE 802.11ah, this is enough for most of the IoT applications proposed to be implemented using these two standards. The IEEE 802.11ah standard has a tree network topology, where an access point allows multiple clients to connect to a separate network. Bluetooth low energy has a scatternet network topology, inherited from the classic Bluetooth standard. The ZigBee PRO standard has the most flexible network topology, a mesh, which provides redundancy and reliable wireless links. From the security point of view, the analyzed wireless standards have a similar level of security, so they are not competitive with regard to that criterion. The highest data rates, provided by IEEE 802.11ah, have as a consequence large power consumption compared to the other two standards, and require a power supply. ZigBee PRO provides a long battery lifetime due to its low duty cycle, so it is the most energy-efficient wireless technology. Bluetooth low energy has an improved power management algorithm compared to the classic Bluetooth standard. This technology also achieves the lowest latency, due to the short distance between the devices. The IEEE 802.11ah standard has improved latency compared to the previous IEEE 802.11 standards. ZigBee PRO has the biggest latency, approximately 100 ms per hop, but this latency is not critical for most IoT applications [8]. Table I gives a point-wise comparison of the analyzed standards.

TABLE I: TECHNOLOGY SPECIFIC PARAMETERS

Parameter               | WiFi 802.11ah | Bluetooth low energy | ZigBee PRO
Frequency range         | 900 MHz       | 2.4 GHz              | 900 MHz, 2.4 GHz
Coverage                | 1 km          | 100 m                | 10-100 m
Throughput              | 0.65-78 Mbps  | 0.125-1 Mbps         | 20-250 Kbps
Interference robustness | OFDM          | adaptive FH          | DSSS
Topology                | tree          | scatternet           | mesh
Security                | WPA2          | 128 bit AES          | 128 bit AES
Power consumption       | 1 mW-1 W      | 0.01-0.5 W           | 1-100 mW
Latency                 | 47 ms         | 6 ms                 | 108 ms

After the technology-specific parameter comparison, the most appropriate Smart City IoT applications are proposed for each analyzed wireless standard. The most suitable IoT applications for IEEE 802.11ah are those which require long range and a high number of IoT devices. Relay Access Points can be spread all around the city to detect movement and dynamically turn on lights (Smart Street Lighting), or to detect traffic jams and damaged roadways and dynamically propose re-routing to end users (Vehicle Traffic Monitoring and Smart Parking).

The Bluetooth low energy standard is appropriate for IoT applications where devices are at short distance and constantly measure physical parameters. Sensors can be put on the human body to measure blood pressure and body temperature (Smart Healthcare), or they can measure the speed and burned calories of a runner (Sport and Fitness Applications). Using this technology, a customer with a specific shopping profile in a mall can be discovered and an advertisement sent to him according to that profile (Smart Advertising).

ZigBee PRO offers full wireless mesh low-power networking capable of supporting more than 64000 devices on a single network. With the mentioned properties, this standard is the most suitable for IoT applications which require periodic measurements of the physical environment, such as water or gas metering (Smart Metering). Sensors can measure the humidity of the ground in city parks or air pollution and trigger actions in response to environmental changes (Environment Monitoring). They can be put in bins and detect when the rubbish reaches a certain level (Waste Management). Wireless sensor networks using ZigBee technology can be spread around the city to measure noise levels during parts of the day and produce the urban noise maps which are quite popular nowadays.

V. CONCLUSION

This paper presents the results of a comparative analysis of communication standards for Smart City applications. The IEEE 802.11ah WiFi, Bluetooth low energy and IEEE 802.15.4 ZigBee PRO standards are considered in the analysis. The technology-specific parameters at the physical layer are compared, and it is determined which wireless standard can be applied for each specific IoT application deployment. Improvements of the IEEE 802.11ah standard in terms of throughput, delay performance and network coverage range make this standard ideal for large areas with a high number of IoT devices. High energy efficiency and a mesh network topology define the IEEE 802.15.4 ZigBee PRO standard as the primary solution for IoT applications which require environment monitoring. With its low latency for exchanging data, Bluetooth low energy has become a promising technology for short-range real-time IoT applications.

REFERENCES

[1] D. Bandyopadhyay and J. Sen, "Internet of Things: Applications and challenges in technology and standardization," Wireless Personal Communications, vol. 58, pp. 49–69, May 2011.
[2] M. Amadeo, C. Campolo, A. Iera, and A. Molinaro, "Named data networking for IoT," 2014 European Conference on Networks and Communications (EuCNC), vol. 1, pp. 1–5, June 2014.
[3] N. Hunn, "Essentials of Short-Range Wireless," 1st ed. New York: Cambridge University Press, 2010.
[4] C. X. Mavromoustakis, G. Mastorakis, and J. M. Batalla, "Internet of Things (IoT) in 5G Mobile Technologies," Modelling and Optimization in Science and Technologies Book Series, vol. 8, Switzerland: Springer International Publishing, 2016.
[5] S. Sundaram, "A Quantitative Analysis of 802.11ah Wireless Standard," International Journal of Latest Research in Engineering and Technology (IJLRET), vol. 2, pp. 26–29, February 2016.
[6] N. Ahmed, H. Rahman, and Md. I. Hussain, "A comparison of 802.11ah and 802.15.4 for IoT," Information and Communications Technology Express, vol. 2, pp. 100–102, September 2016.
[7] K. Cho, C. Jung, J. Kim, Y. Yoon, and K. Han, "Modelling and analysis of performance based on Bluetooth Low Energy," 2015 IEEE Latin-American Conference on Communications (LATINCOM), vol. 1, pp. 1–6, November 2015.
[8] M. Franceschinis, C. Pastrone, M. A. Spirito, and C. Borean, "On the performance of ZigBee PRO and ZigBee IP in IEEE 802.15.4 networks," 2013 IEEE International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), vol. 1, pp. 83–88, October 2013.
7th International Conference on Information Society and Technology ICIST 2017
source tools, and cloud platforms such as the Dydra platform for prototyping the solution.

III. TRAFFACCS IMPLEMENTATION

A. Design
Drawing on the wealth of sensors in smartphones, we created an application for smartphone devices based on the Android operating system. The TraffAccs application is aimed at recording data on traffic accidents in a fast and efficient way, as presented in Figure 1.
After starting the application, the user needs to log in to his/her Twitter account (step 1). After that, the user takes a picture of the incident itself using the smartphone camera (step 2) and enters details of the accident (type of accident, e.g. with death, with injuries, with material damage; step 3). The exact date and time of the accident are recorded automatically when the user takes the photo. By pressing the 'Submit' button (step 4), the data will be stored on the server in RDF triple form. The application will inform the user about the uploading of the information to the central data store.

B. Implementation
For the purpose of testing the application, we used the Dydra database, a powerful graph database in the cloud (https://dydra.com/), hosted on Amazon Web Services (http://aws.amazon.com/). The Android application TraffAccs was developed with the Punya framework (http://punya.mit.edu/project).
Both tools (Punya as well as Dydra) are results of experts from MIT CSAIL (Computer Science and Artificial Intelligence Laboratory, http://www.csail.mit.edu/), with the aim of helping researchers in the field of semantic technologies develop Android Linked Data applications in a quick and easy way.
The Dydra back-end infrastructure is based on core messaging technologies such as AMQP (http://www.rabbitmq.com/) and Redis (http://redis.io/). Currently, Dydra is offered as a commercial database via the company Datagraph, Inc. The proprietary distributed SPARQL query engine, called SPOCQ, is built in Lisp and compiles SPARQL queries to native x86 machine code. The distributed RDF storage system is built in C, relying on Raptor (http://librdf.org/raptor/) at the interfaces.
The Punya framework is a derivative of MIT's App Inventor platform for developing Android applications. Thanks to its block-based programming approach, application development in Punya is very fast, which helps researchers focus on solving the problem. Part of the source code behind the block-based coding is presented in Figure 2.

IV. TRAFFACCS VALIDATION

Our research on state-of-the-art applications for emergency notification has shown that there are not many applications in the domain that utilize graph databases and Linked Data technologies.

A. Linked Data Approach
The term Linked Data here refers to a set of best practices for publishing and connecting structured data on the Web. These best practices have been adopted by an increasing number of data providers over the past years, leading to the creation of a global data space that contains many billions of assertions: the Linked Open Data (LOD) cloud (http://lod-cloud.net/). The cloud has grown from 12 datasets in 2007 to 1,139 in January 2017.
Linked Data enables datasets to be linked together through references to common concepts. HTTP URIs are used to identify any entity or concept (people, locations, products, events, etc. in the LOD cloud), so that data consumers are provided with more information when accessing data, including relationships, i.e. links to other related URIs. The standard for the representation of the information that describes those entities and concepts, and that is returned by dereferencing the URIs, is RDF (see https://www.w3.org/RDF/).
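The RDF triple form in which a TraffAccs report is stored can be sketched as follows. To keep the example dependency-free, the triples are built as plain (subject, predicate, object) tuples and serialized as N-Triples by hand; the namespace and property names are illustrative assumptions, not the actual schema used by the application.

```python
# Sketch of an accident report as RDF triples (N-Triples serialization).
# The http://example.org/traffacc/ namespace and the property names are
# hypothetical; the real application defines its own vocabulary.

EX = "http://example.org/traffacc/"   # hypothetical namespace
report = EX + "report/1"              # one accident report resource

def literal(value):
    """Quote a plain literal for N-Triples output."""
    return '"%s"' % value

triples = [
    (report, "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "<%sAccidentReport>" % EX),
    (report, EX + "accidentType", literal("material damage")),
    (report, EX + "latitude",  literal("44.7866")),
    (report, EX + "longitude", literal("20.4489")),
    (report, EX + "timestamp", literal("2017-03-01T10:15:00")),
    (report, EX + "photo",     literal("accident_1.jpg")),
]

def to_ntriples(triples):
    """Serialize the triples; roughly what would be uploaded to a
    SPARQL-capable triple store such as Dydra."""
    lines = []
    for s, p, o in triples:
        obj = o if o.startswith(("<", '"')) else "<%s>" % o
        lines.append("<%s> <%s> %s ." % (s, p, obj))
    return "\n".join(lines)

print(to_ntriples(triples))
```

Each line is one subject-predicate-object statement, which is exactly the shape the LinkedDataForm component groups on the client side before upload.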
V. CONCLUSION

Smart technologies revolutionize the way cities locate, mitigate, and prevent safety issues. The current prototype demonstrates the use of the TraffAccs application in the data collection process only. The presented results have a dual role: on the one hand, to introduce a citizen-centric application for reporting traffic accidents, and on the other, to demonstrate the wide range of possibilities of the Punya framework for building cloud services.
Although in this very early phase, it is obvious that the solution has advantages over classical traffic accident notification channels, because it provides the means to collect a lot of information in a very simple and efficient way
TABLE I. COMPONENTS USED TO BUILD THE TRAFFACCS APPLICATION

Nonvisible components

Camera: Media component that takes a picture using the device's camera. After the picture is taken, the name of the file on the phone containing the picture is available as an argument to the AfterPicture event.

LocationSensor: Sensor component providing location information, including longitude, latitude, altitude (if supported by the device), and address. It can also perform geocoding, converting an address (not necessarily the current one) to a latitude (with the LatitudeFromAddress method) and a longitude (with the LongitudeFromAddress method). The location sensor should be enabled (property Enabled set to True), and the device should allow location sensing through wireless networks or GPS satellites (if outdoors). Location information might not be available immediately when an app starts; the user will have to wait a short time for a location provider to be found and used, or wait for the OnLocationChanged event.

LinkedData: Linked Data component.

Twitter: Social component that enables communication with Twitter. Once a user has logged into their Twitter account (and the authorization has been confirmed as successful by the IsAuthorized event), many more operations are available:
• Searching Twitter for tweets or labels (SearchTwitter)
• Sending a tweet (Tweet)
• Sending a tweet with an image (TweetWithImage)
• Directing a message to a specific user (DirectMessage)
• Receiving the most recent messages directed to the logged-in user (RequestDirectMessages)
• Following a specific user (Follow)
• Ceasing to follow a specific user (StopFollowing)
• Getting a list of users following the logged-in user (RequestFollowers)
• Getting the most recent messages of users followed by the logged-in user (RequestFriendTimeline)
• Getting the most recent mentions of the logged-in user (RequestMentions)

Clock: Sensor component that provides the instant in time using the internal clock on the phone. It can fire a timer at regularly set intervals and perform time calculations, manipulations, and conversions. Methods to convert an instant to text are also available. Acceptable patterns are the empty string, MM/DD/YYYY HH:mm:ss a, or MMM d, yyyy HH:mm. The empty string will provide the default format, which is "MMM d, yyyy HH:mm:ss a" for FormatDateTime and "MMM d, yyyy" for FormatDate.

Visible components – User Interface (UI)

Button: Buttons are components that users touch to perform some action in the app. Buttons detect when users tap them. Many aspects of a button's appearance can be changed, and the Enabled property controls whether a button can be tapped. In TraffAccs the following buttons are used: Twitter, Camera, Submit, and Location.

TextBox: Users enter text in a text box component. The initial or user-entered text value of a text box component is in the Text property. If Text is blank, the Hint property can provide the user with a suggestion of what to type; the Hint appears as faint text in the box. In TraffAccs, a TextBox is used for inserting details.

Label: Labels are components used to show text. A label displays the text specified by the Text property. Other properties, all of which can be set in the Designer or Blocks Editor, control the appearance and placement of the text. In TraffAccs, labels are used for presenting the date and time, the location and the caption.

LinkedDataForm: A form for grouping data properties. The LinkedDataForm is the subject, the grouped data are the predicates, and their values are the objects of RDF triples. In TraffAccs, a LinkedDataForm is used (see Figure 4).
jovan.eps@gmail.com, jelena.luzija@gmail.com
Abstract — This paper explores the relationship between the performance of different generations of MCU solutions for classical computing processing, as well as FPU/DSP processing of input information. Finally, it questions how fast software-controlled I/O ports (GPIO) are. By using a battery of standardized computer tests and custom GPIO and FPU/DSP tests, the report gives clear and measurable results. The results of the measurements carried out in the study clearly show that it is possible to unload the network resources necessary for transmitting large quantities of information if part of the complex pre-processing of data for Big Data systems is performed on the MCU platforms themselves, within IoT sensor devices.

Keywords: microcontrollers (MCU), Internet of Things (IoT), SoC development systems, cyber-physical systems, Big Data, sensor systems, sensor networks, RISC CPU architecture, floating-point unit (FPU), digital signal processing (DSP), ARM, PIC32, Xtensa LX106, ESP32, NodeMCU IoT development boards, Arduino development boards, GPIO.

I. INTRODUCTION

In early 2016, IoT in its scope and complexity exceeded the Internet PC and mobile world. According to the latest estimates, the world now uses about ten billion devices and computers, while IoT devices number more than a hundred billion units [1]. If we add diversity, whereby PCs (excluding servers) come in two different forms and mobile devices in dozens, IoT devices exist in millions of forms and shapes and lead in this area as well. The exponential increase in the number of IoT/sensor systems inevitably leads to the need for higher-volume transmission and processing of information. The existing network infrastructure and the Internet connections to Big Data systems become the bottleneck of the IoT-Big Data architecture. This has led to the need for processing to be executed locally, from within the IoT/sensor networks. Intel and other leading manufacturers have embraced this idea under different names: local, on-premises, before/pre-processing and the like.
Since 2014, the emergence of a new generation of high-performance 32-bit RISC microcontrollers based on the ARM Cortex-M4/M7, MIPS32 and Tensilica Xtensa LX106 architectures, adding FPU/DSP cores and support for network and wireless technology, has brought great progress in the field of transfer functions with SoC/FPGA systems on these solutions. Previous works [2] [3] [4] show that computer systems based on low-power multi-core SoC CPUs with HMP can take on the role of pre-processing very fast local sensor network data prior to sending it to a Big Data system, and the new advances in MCU performance extend this possibility. The central question in this research was whether this new generation of MCUs has sufficient processing power to take on part of the pre-processing (IoT gateway) role.

II. CYBER-PHYSICAL SYSTEMS (CPS) AND INTELLIGENT DEVICES

The current CPS concept can be described as the integration of sensor networks, embedded with microcontrollers and SoC systems with an Internet connection (IoT) on the local side, with a cloud computing infrastructure that processes, analyzes, manages and stores data as part of a Big Data system on the centralized distant/global side, with newly developed challenges.
The development of this concept has evolved in the past several years, from early proposals like the ARM-IBM ARM mbed platform (2014) to a new dimension where AI-enabled intelligent devices bring "enhanced intelligence to IoT applications, meeting the demands of analytics, learning, VR and AR applications in robotics, machine vision and IoT gateways" [5].
Currently, the main driver of development in this direction is the need for intelligent systems for autonomous vehicle steering. This application of the concept requires robust MCU/SoC systems with high performance, capable of real-time independent work with a high degree of reliability. Given that a contemporary car already has about a hundred microcontrollers built in for current needs, and that in the near future vehicles will need to communicate not only within their own system but also with other moving cars over a shared link, it becomes clear why it is necessary to process as much information as possible locally. High-performance MCUs will be expected to independently process all their peripheral sensor subsystems and synthesize only the important data, which shall be shared with other vehicles and Big Data systems. In this way, the global network will be relieved of excessive traffic and amounts of data.
The need to enable vehicle steering systems to 'see' their surroundings through a combination of high-resolution optical sensors and LIDAR systems can be compared to the living world, where eyes, as the sense of vision, led to the development of animal species in the early period of evolution. Just as the eyes required a more complex nervous system, the new optical sensors in sensor systems require the ability to process a volume of information larger by an order of magnitude. Consequently, the existing architecture of sensor systems becomes increasingly complex on a daily basis, giving a new shape to the IoT, which currently consists of sensor systems passively collecting information and forwarding it via the Internet to cloud-centralized Big Data hubs for processing and analytics.
Just as in the animal kingdom the emergence of eyes pushed the ganglion nervous system with multiple nodes to develop into more complex centralized systems, including bigger brains, it is certain that the further development of intelligent sensor systems will be directed towards a higher capacity for autonomous processing by so-called peripheral intelligence. At the same time, this is the only way to give adequate performance to future autonomous vehicles.
The existing IoT-Big Data architecture involves data transfer over an asynchronous packet network such as the Internet, with central data processing performed later in a remote Big Data centre. It is still impossible to achieve an adequate response speed in cases where real-time control is necessary alongside monitoring a large volume of sensor information, and this practice will be even less feasible in the future.

[Figure 2: An expanded concept of an autonomous AI-enabled IoT CPS system (based on the new ARM platform for AI, 2016): Sensors and Actuators connect the Physical System/World to Recognition, Memory/Experience, Analysis/Processing, Learning and Decision stages.]

Figure 1 shows the elaborated concept of introducing artificial intelligence to embedded microcontroller systems, as well as their connection through sensors and actuators with the real physical system. The need for greater autonomy of processing capacity in local sensor systems and actuators is therefore inevitable. High-performance microcontrollers intended for work under critical real-time conditions will be expected to individually process all their peripheral sensory subsystems, synthesizing only a small set of data to be shared with other vehicles and Big Data systems, either asynchronously or in real time.
This approach will relieve the global network resources of excessive traffic, and Big Data systems of needless preprocessing and compression of excessive quantities of information. By relieving the systems of the need to collect more remote system data, which then needs to be preprocessed, the microprocessors will preserve the capacities necessary for faster and more demanding analytics of the retrieved data, more valuable in meaning.

III. NEW GENERATION OF MCUS

The IoT connected device boom on the market in 2014 was followed by the introduction of low-cost ESP8266 WiFi SoC modules based on the 32-bit RISC Tensilica Xtensa LX106 MCU, with support for a floating-point unit (FPU) and digital signal processing (DSP), a base clock of 80-160 MHz, and 64 KB + 96 KB of RAM. The novelty was a fully on-chip integrated 2.4 GHz radio transceiver needed for IEEE 802.11n WiFi. With a price of just a few USD, modules like the one shown in Figure 2 were making their way into all available IoT device applications.
This device marks a distinctive line that separates the new IoT-ready MCUs from other 32-bit MCUs, by requiring them to have more than 100 MIPS, an integrated FPU + DSP and, most importantly, built-in Ethernet or WiFi support. The rest of the ESP's specifications are shown in Table I, and Figure 3 presents its typical IoT application.
In the very same year (2014), the market saw the emergence of new 32-bit generations of IoT-ready microcontrollers across a range of architectures and manufacturers, beginning with the ARM Cortex-M4 series, based on the ARMv7-M architecture and introducing an optional single-precision FPU and Ethernet support, and the Microchip PIC32MZ high-performance MCU series with similar options based on the MIPS32 RISC architecture. This first series had a sufficient level of performance for processing medium-speed sensors and hosting local web servers, delivering between 100 and 200 MIPS. The emergence of the second generation of microcontrollers at the end of 2015 and during 2016, led by the ARM Cortex-M7, the dual-core ESP32 and the faster PIC32MZ (MIPS32 Warrior) microcontrollers, has strengthened these solutions' position. A shared feature of all of these is built-in FPU+DSP units, LAN support, WiFi (+BT) network connectivity and significantly more RAM and flash ROM, necessary for more complex programs and multimedia content. On average, their performance exceeds 200 MIPS (32-bit) on Dhrystone 2.1 benchmark tests and 2 MFLOPS (IEEE 754 64-bit floating point) on Linpack tests.
TABLE I. SUMMARY TABLE OF MICROCONTROLLER GENERATIONS

Columns: network interface; Dhrystone v2.1 (VAX MIPS, int32); Linpack DP (MFLOPS, IEEE 754 64-bit); bit-banging GPIO toggle rate (C); ring-oscillator rate (2-instruction asm loop); hardware arithmetic; clock / RAM; CPU/MCU core (architecture); chip; platform.

SoC CPUs (2012 onwards):
- Raspberry Pi 1 B+: LAN 100 Mbps; 875 DMIPS; 42 MFLOPS; 8.7 MHz; 22 MHz; FPU; 700 MHz / 512 MB; ARM1176 (v6) 32-bit; Broadcom BCM2835
- Raspberry Pi 2: LAN 100 Mbps; 2019.41 DMIPS; 170.92 MFLOPS; 16.92 MHz; 42.16 MHz; FPU + NEON SIMD; 900 MHz / 1 GB; ARM Cortex-A7 (v7-A) 32-bit, 4 cores; Broadcom BCM2836
- Raspberry Pi 3: LAN 100 Mbps; 3039.87 DMIPS; 180.14 MFLOPS; 19.3 MHz; 55.8-57 MHz; FPU + NEON SIMD + DSP; 1.2 GHz / 1 GB; ARM Cortex-A53 (v8-A) 64-bit, 4 cores; Broadcom BCM2837
- Odroid XU4: LAN 1 Gbps; 8302.97 DMIPS; 885.03 MFLOPS; 9.9 MHz; 4-22.5 MHz; FPU + NEON SIMD + DSP; 2 GHz / 2 GB; ARM Cortex-A15/A7 (v7-A) 32-bit, 8 cores; Samsung Exynos 5422

MCUs:
- Arduino Uno / Nano V3 (1996/7): 10.15 (20) DMIPS; 0.0943 MFLOPS; 1.5894 MHz; 3.974 MHz; 2-cycle multiplier; 16 MHz / 2 KB; AVR 8-bit RISC; ATmega328P
- Atmel Arduino (2004-2007...): 51.17 DMIPS; 0.618 MFLOPS; 1.998 MHz; 4.422 MHz; HW int32 MUL/DIV, 1-cycle; 84 MHz / 96 KB; ARM Cortex-M3 32-bit
- ChipKit PIC32MX: 76.28 (83) DMIPS; 0.369 MFLOPS; 2.5 MHz; 4.000 MHz; HW 32x32-bit MUL/DIV, 1-cycle; 40-50 MHz / 32 KB; MIPS32 M4K 32-bit; PIC32MX250F128xx
- Embedded Pi: 91.75 DMIPS; 0.552 MFLOPS; 723.4 kHz; 1.5911 MHz; HW int32 MUL/DIV, 1-cycle; 72 MHz / 20 KB; ARM Cortex-M3 (ARMv7-M) 32-bit; STM32F103RB
- NodeMCU 1.0: 113.40 DMIPS; 1.207 MFLOPS; 920 kHz @ 160 MHz; 2.294 MHz @ 160 MHz; FPU + DSP; 80-160 MHz / 64 KB + 96 KB; Tensilica Xtensa LX106 32-bit; ESP8266EX
- NodeMCU-32 (ESP32s): 176 (224.8)-411.11 DMIPS (1-2 cores); 2.805 MFLOPS (1 core); 1.846 MHz @ 240 MHz; 4.543 MHz @ 240 MHz; FPU + DSP; 160-240 MHz / 520 KB; Tensilica Xtensa LX106 32-bit, 2 cores
- IoT MCU (MIPS32 Warrior): LAN; 200 MHz /
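Normalizing the Dhrystone results in Table I by clock frequency gives a per-MHz efficiency figure that makes cores with very different clock rates comparable. The sketch below uses values readable from Table I; treat them as measurements of the specific boards tested, not general properties of the architectures.

```python
# Per-clock Dhrystone efficiency (DMIPS/MHz) derived from Table I.
# Values: (Dhrystone v2.1 VAX MIPS, nominal clock in MHz) per board.

table = {
    "ATmega328P (AVR 8-bit)":   (10.15,  16),
    "STM32F103RB (Cortex-M3)":  (91.75,  72),
    "PIC32MX250 (MIPS32 M4K)":  (76.28,  50),
    "ESP8266EX (Xtensa LX106)": (113.40, 160),
}

for name, (dmips, mhz) in table.items():
    # dividing by the clock removes the frequency advantage and
    # leaves the per-cycle throughput of the core itself
    print("%-28s %.2f DMIPS/MHz" % (name, dmips / mhz))
```

The per-MHz view shows, for instance, that the Cortex-M3 and MIPS32 M4K cores do noticeably more work per cycle than the 8-bit AVR, independently of their higher clocks.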
IV. COMPARATIVE TEST AND INDICATORS

A. Synthetic test results
Figures 4 and 5 show the performance of the tested systems in the two most significant synthetic tests: Dhrystone 2.1 for integer (int32) operations and the Linpack test package for floating-point operations (double precision) [7] [8].

[Figure 4: Dhrystone 2.1 results (VAX MIPS, logarithmic scale) for the tested SoC CPUs and MCUs, with an exponential trend line fitted to the MCU values: y = 12.077e^(0.48x), R² = 0.9299.]

[Figure 5: Linpack DP results (kFLOPS, logarithmic scale) for the tested SoC CPUs and MCUs, with an exponential trend line fitted to the MCU values: y = 76.675e^(0.594x), R² = 0.88.]
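The exponential trend lines in Figures 4 and 5 are least-squares fits of log(performance) against the ordinal position of each MCU in the chart. The sketch below reproduces the method for the MCU Dhrystone values readable from Figure 4; the exact coefficients depend on the ordering of the systems, so the fitted numbers are approximate.

```python
# Fit y = a * e^(b*x) to the MCU Dhrystone values by ordinary least
# squares on ln(y), with x the ordinal position in the chart.
import math

mcu_dmips = [10.15, 51.17, 76.28, 91.75, 113.4, 176.0, 319.0, 744.64, 763.05]
xs = list(range(1, len(mcu_dmips) + 1))
ys = [math.log(v) for v in mcu_dmips]

# least-squares line ln(y) = ln(a) + b*x
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
    sum((x - mx) ** 2 for x in xs)
a = math.exp(my - b * mx)
print("fit: y = %.3f * e^(%.3f x)" % (a, b))
```

With these nine values the fit lands close to the y = 12.077e^(0.48x) trend reported in Figure 4, which is what the "7 to 10 times per decade" growth claim later in the section rests on.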
Figure 6 shows how introducing a DP FPU unit in ARM Cortex-M7 microcontrollers improved performance by nearly 5.9 times in comparison to SP FPU performance.
On the other hand, the presented growth tendencies are only a roughly set framework, since the data are not annotated with the time of appearance of each microcontroller series. It must be borne in mind that this is a rough comparison of models dominant on the market, which necessarily differ in clock speed, RAM capacity and other features. The series appeared successively as follows: AVR 8-bit in 1996, Atmel SAM ARM Cortex-M3 in 2005, PIC32MX in 2007, and ST ARM Cortex-M3 in 2007. These were followed by a new generation with the ESP8266 (2014), ESP32 (2016/2017), PIC32MZ (2012-2014), and ST ARM Cortex-M7 (2015 and 2016). The time difference between the emergence of the AVR 8-bit and the ARM Cortex-M7 32-bit microcontrollers is 20 years. It can be noticed that int32 performance grows by roughly one order of magnitude, i.e. 7 to 10 times, per decade on average. The growth of performance between generations within the same architecture, such as ARMv7 ARM Cortex-M3 and M7, is between 7 and 10 times, whereas the somewhat similar example of the two generations of PIC32MX and PIC32MZ MIPS-based MCUs shows a slightly smaller difference, which in the tested case proved to be about 6 times. However, the fact that the time span between the first market appearances of the PIC32MX and MZ series was less than five years [9] should be taken into consideration.
This growth ratio is even more striking in the example of the difference between ARM Cortex-M4 MCU performance (which we were unable to test), at 1.25 MIPS/MHz, and the M7, which achieved over 3.4 MIPS/MHz in our test, where the time span between the market appearance of the first M4 and M7 models was shorter than 4 years. Proof that performance development and improvement do not stop is found in the accelerated 'H7' subversion of the M7 series, which will bring a doubling of RAM and L1 cache and will almost certainly double performance compared to the current state. In this way the MCU will reach the performance of the Cortex-R5 and A7 CPU series, at least in the area of int32 operations, with over 1.5 GIPS and about 50 MFLOPS.

B. Testing of GPIO port performances
One of the microcontrollers' key features is that they are able to directly monitor and control physical processes thanks to an architecture which includes built-in analog-to-digital (ADC) converters, comparators, counters and direct input ports on the one hand, and digital and digital-to-analog (DAC) outputs on the other. Almost all MCU architectures (the ESP32 MCU architecture in Figure 4) have this set of structural features to a smaller or larger extent, but their level of performance differs drastically and easily becomes a limiting factor for certain demanding applications. This analysis gives priority to the speed at which software applications can access and drive the general-purpose input/output (GPIO) ports.
In the bit-banging (I/O toggle rate) test, what matters is the speed at which an MCU architecture, using software code, can control the internal bus and set and clear the digital state of a given port. This figure differs from the hardware-implemented SPI/I2C protocols and the maximum port speeds frequently cited by manufacturers: it represents the really achievable performance of the software-hardware architecture. A simplified scheme of the test procedure can be seen in Figure 8, implying the use of an electric measuring device to capture the resulting clock rate and the shape of the curve.

[Figure 7: Bit-banging MCU GPIO port test setup.]
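The bit-banging result can be related to the CPU clock through a simple cycle budget: one full output period on the pin costs two port writes (set and clear) plus the loop overhead. The cycle counts in the sketch below are illustrative assumptions, not values measured in the paper; real toggle rates are usually lower because high-level GPIO APIs add many cycles per write.

```python
# Upper bound on the square-wave frequency a software loop can produce
# on a GPIO pin, given the CPU clock and a per-iteration cycle budget.
# Cycle counts here are illustrative assumptions for a hypothetical MCU.

def toggle_frequency_hz(clock_hz, cycles_per_write, loop_overhead_cycles):
    """One output period = set + clear (two port writes) plus the
    branch/overhead cycles of the loop itself."""
    cycles_per_period = 2 * cycles_per_write + loop_overhead_cycles
    return clock_hz / cycles_per_period

# e.g. a hypothetical 72 MHz Cortex-M3-class core with 2-cycle direct
# register writes and a 3-cycle loop branch:
print("%.1f kHz" % (toggle_frequency_hz(72e6, 2, 3) / 1e3))
```

Comparing this theoretical bound with the measured C-level toggle rates in Table I shows how much of the port bandwidth is consumed by the software layer between the loop and the port register.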
In both of these tests, a multiple increase in performance in the new generation of MCUs is clearly noticeable compared to the previous generations.

C. Identified trends
Beyond everything mentioned above, an increase in the speed of the new MCU generation is visible in the area of analog-to-digital conversion of input electric quantities, as well as in the almost mandatory presence of adequate high-speed DAC output modules. This largely frees the MCU solutions from the need for specialized ADC and DAC circuits, enabling direct reading of sensory input electric quantities as well as control of analog actuators alongside digital and pulse ones.
A faster bus, faster peripheral ports and hardware built-in serial SPI/I2C and similar protocols are an inevitable consequence of the increase in internal processing performance and of the sensory environment in which these MCU systems need to work.

V. CONCLUSION
The results of the measurements carried out show that the latest generation of high-performance MCU platforms, by themselves or as part of IoT sensor devices, have the computing capacity to pre-process complex data locally and to unload the network resources necessary for transmitting large quantities of information for Big Data processing.
If we consider that the nature of an MCU's work can be regarded as pure, somewhat parallelized and independent of serialized network exchanges, then it has great potential for massively parallelized heterogeneous computing in cyber-physical systems.
The latest generation of MCUs shows a promising increase in performance over previous generations; the rate and nature of this increase is exponential and beyond the rate proposed by Gordon Moore [10], yet at the same time the new MCUs preserve both low consumption and great energy efficiency, with inherited robustness and reliability, by using older and more mature production technologies. That trend can only accelerate in the near future, driven by increasing demand for more powerful yet low-cost MCU/SoC devices.
The new generation of MCUs can take on the role of small on-premises or web servers that process locally sensitive private information (of local importance), while relieving the network and Big Data analytics of the need to transfer and store great amounts of raw data. In that way, the security and confidentiality of locally collected sensitive raw information will be increased, while at the same time only a synthesized small subset of data that is of direct interest for the particular Big Data analytics is transferred.

ACKNOWLEDGMENT
This work was fully financed and supported by the authors themselves, without any institutional help or support. The authors would like to thank the commercial companies Insel elektronika d.o.o., Novi Sad, Serbia, for supporting development with rare electronic components, and Premier Farnell UK Limited, UK, for their help in the procurement of new MCU/SoC devices.

REFERENCES
[1] Intel Co., "Solution Blueprint Internet of Things (IoT): Improving Manufacturing Processes, Customer Experiences, and Energy Efficiency with the Internet of Things (IoT)", IoT Solutions World Congress 2016, Barcelona, Spain, 2016.
[2] J. Ivković, dissertation: "The Methods and Procedures for Accelerating Operations and Queries in Large Database Systems and Data Warehouse (Big Data Systems)", Zrenjanin: University of Novi Sad, Serbia, 2016, p. 211.
[3] J. Ivković, A. Veljović, B. Radulović and L. Paunović, "4.0 Industry – SoC/IoT Systems and Internet Intelligent Devices", in ITOP2016, Faculty of Technical Sciences Čačak, Čačak, Serbia, 2016.
[4] I. J. Petrick, "Getting to Industry 4.0 with the Internet of Things", vol. 1, Winter 2016, Intel Co., 2016.
[5] ARM Ltd, "Internet of Things", ARM Ltd., UK, 2017. [Online]. Available: http://www.arm.com/markets/internet-of-things. [Accessed 01 03 2017].
[6] Espressif Systems Inc., China, "ESP32 Datasheet v1.1", 01 03 2017. [Online]. Available: https://espressif.com/sites/default/files/documentation/esp32_datasheet_en.pdf. [Accessed 01 03 2017].
[7] J. Ivković, Dhrystone Benchmark C, version 2.1, for ARM Cortex-M7 MCU in C++ (ST Nucleo-144 STM32F746 and STM32F767), London, UK: ARM Ltd., The ARM mbed IoT Device Platform (developer.mbed.org), 2017.
[8] J. Ivković, Linpack port to ARM MCU from Arduino IDE and roylongbottom.org ARM-based source, London, UK: ARM Ltd., The ARM mbed IoT Device Platform (developer.mbed.org), 2016.
[9] Microchip Technology Inc., USA, "PIC32MZ family: PIC32MZ2048EFG144 Datasheet", Microchip Technology Inc., 11 07 2016. [Online]. Available: http://www.microchip.com/wwwproducts/en/PIC32MZ2048EFG144. [Accessed 20 02 2017].
[10] G. Moore, "Cramming more components onto integrated circuits", Electronics Magazine, pp. 114-117, 19 April 1965.
7th International Conference on Information Society and Technology ICIST 2017
Abstract—The Internet is gradually developing from a network of computers into a vast network that connects a large number of small intelligent things (the Internet of Things). One of the main problems we face with these things is that they possess very limited resources, so they are unable to perform complex tasks such as interactive communication with complex networks and real-time presentation of the collected data. On the other hand, we have a strongly growing technology, the Internet, which can distribute an enormous amount of data to anyone, anytime and anywhere (3A) without any problems. The ability to create a network of small, almost invisible sensor nodes that continuously collect information from the real world and distribute this information to a large number of clients around the world gives this technology unimagined applications. This paper explores the possibility of implementing a new technology, the WebSocket protocol, which should reduce the communication traffic between source and destination and enable real-time detection of changes, i.e. reduce the delay in data transfer and thus extend the lifetime of such applications. A critical analysis of the application of the WebSocket protocol, from the viewpoint of the number of exchanged packets and the size of the headers in those packets, is carried out and compared with other solutions, i.e. protocols for real-time communication.

I. INTRODUCTION

The Internet has become an integral part of our lives; there is no part of everyday human activity that is not in some way related to or dependent on it. Consequently, the requirements and challenges placed before this network grow every day, from the plain information age of Web 1.0 to modern Web 2.0 multimedia systems, which mainly require real-time operation as well as a full bi-directional connection between users. Most of the communication between Internet users today takes place via the standard HTTP protocol, used on top of the TCP and IP protocols. All communication is based on the standard client-server model, where the client (the active component) sends a request to the server (the passive component), which responds. It is clear that this model of communication cannot fulfill the classic real-time requirement, in which the server needs to send information about a change at the moment the change happens. Additionally, these communication protocols are very demanding in terms of the ratio of useful (payload) data to be sent, as a large amount of control data is sent along with it (large overhead). As a consequence, demand for high-bandwidth networks has appeared, as well as delays within these networks when large amounts of data are sent. In recent years, many technologies intended to satisfy real-time communication, such as Flash, Ajax or Comet, were developed [1]. However, all of these technologies failed to provide an adequate solution to the problem of real-time communication because they required the installation of additional browser add-ins (plug-ins), which from the viewpoint of the server was difficult to achieve. It was not until the arrival of HTML5 and the launch of a new protocol, called WebSocket, that these problems were mitigated and a full bidirectional real-time connection between the client and the server could be established [2].

The rapid development of Web technologies has enabled instantaneous collection and presentation of large amounts of data in real time, from different geographical locations. Cluster computing, Grid computing and Cloud computing enable this enormous amount of data to be collected, processed and presented to the clients. Much of this data is provided by large numbers of distributed sensors, such as Wireless Sensor Networks (WSN). A WSN is a distributed computer network consisting of a large number of autonomous wireless Sensor Nodes (SN), which can independently form a network infrastructure through which they collect, process and exchange data. They are arbitrarily distributed over a wide geographical area and are generally self-supplied from small batteries. As the dimensions of an SN, and thus its battery capacity, are very small, its life expectancy would be limited to only a few days if all its parts worked continuously. Since changing the batteries is in most cases practically unfeasible, efficient consumption of electricity is the decisive factor for a longer life expectancy of an SN, and thus for the lifetime of the application that runs on the WSN [3]. A further step in the integration of WSN is connecting them to the network of networks, the Internet. The ability to create a network of small, almost invisible SNs that continuously collect information from the real world and forward it to a Web server enables unimagined applications of this technology. It is no wonder that many scientists believe this technology will completely change our lives and make them more comfortable in the future [4].

Chapter 2 presents the basic characteristics of packet exchange between SNs in a WSN. Chapter 3 explains the three basic techniques currently used in Web communications to exchange data in real time. Chapter 4 gives a detailed description of a new technique that promises to enable a large number of small machines to collect and transmit data in real time without the need for large hardware and software resources. In Chapter 5, the cost-benefit analysis of
the WebSocket protocol applied in WSN is given and compared to the standard HTTP techniques. Chapter 6 concludes the paper.

II. CHARACTERISTICS OF PACKET EXCHANGE IN WSN

Packet transfer in a WSN differs significantly from packet transmission in standard wired and wireless networks. The limited resources of these networks are one of the main reasons the standard protocols of the TCP/IP suite are not directly applicable. As an example, the energy consumed to transfer only 1 bit of information over the communication channel equals the energy consumed to execute more than 1000 CPU instructions [5]. Consequently, if we can reduce the data sent through the channel by only 1-2%, we will save considerably more energy than in any other activity of the sensor node. Taking this into consideration, it is obvious that the greatest savings in power consumption can be achieved by using an energy-efficient communication protocol. However, there are other specifics that affect data transfer and which we must consider when choosing the protocol to be used in a WSN. These specifics primarily relate to:

a) A mechanism for traffic density control – in a WSN the highest traffic density occurs around the sink and its surrounding sensor nodes, as this is the place that collects all the data coming from the sensor nodes in the WSN (upstream). To solve this problem effectively, three mechanisms need to be developed: one for effective detection of congestion, one for congestion avoidance, and one for the control of traffic density in the WSN. All three mechanisms base their decisions on controlling the buffer size in sensor nodes and on controlling the frequency/speed of sending data (ACC, Active Congestion Control).

b) A mechanism for reliable packet transfer control – reliable transmission of data in a WSN can in most cases have a different meaning than in traditional wired connections, where correct transmission must be guaranteed for each packet. For some WSN applications it is sufficient to receive only one correct packet from a sensor node in a given region, as most nodes there will send the same information. This gives us greater freedom in the development of transport protocols for WSN. However, in some applications this is not the case: when transmitting system data, changes or modifications to the sensor nodes' programs, or multi-packet data (several packets of the same message), we have to ensure completely reliable transfer. To solve this problem, it is better to use the hop-by-hop (HBH) mechanism than the traditional end-to-end (ETE) mechanism, because it reduces the traffic density and therefore yields higher savings in electricity consumption. In addition, ACK and NACK mechanisms (confirmation and negative confirmation of receipt of valid messages) are at our disposal in this case; their combination can be an effective way to signal the loss of individual packets or the correctness of their receipt.

c) Simplicity of the initial transfer – most applications in WSN are considered reactive: sensor nodes are passive observers of events. They simply observe the environment expecting an event to happen, and react only to some environmental change or to a call from the parent sensor node. All these changes can be accommodated in only a few packets sent to the master sensor node. It is therefore necessary that the initial transmission upon connection be as short and as simple as possible.

d) A small number of resent packets – a very important characteristic which must be satisfied in order to avoid unnecessary use of electricity. If a packet must be resent, an effort should be made to ensure that the minimum number of sensor nodes is included in this process. In this sense, HBH transfer certainly has advantages over ETE, as only two sensor nodes are involved in the retransmission of a packet. Its disadvantage is that it cannot guarantee reliable transmission of a whole multi-part message from the transmitter to the receiver, which ETE fully guarantees.

e) A small header relative to the payload data – in most applications, communication inside a WSN is conducted by the HBH mechanism. Each sensor node, besides its basic function of detecting some event, needs to act as a router, forwarding the many packets passing through it. Since those packets usually contain a very small quantity of payload data, it is important to reduce their headers to a minimum, so that the entire packet size is reduced too [6].

f) All sensor nodes ought to have equal treatment in communication – since this is a network with a large number of sensors monitoring some region, there is always a possibility that some sensor nodes are neglected in communication. In that case, the information reaching the sink can be wrong because it is incomplete, which can lead to wrong decisions. That is why all the sensors in a WSN must be treated equally, so that timely and realistic information about changes in the monitored region can be provided.

g) Collaboration with the neighboring nodes – mutual communication between neighboring nodes, i.e. with the network layer, is preferred. If that communication exists, the routing protocol from the lower layer can inform the transport protocol about problems that have appeared in the communication (for instance, that packet loss occurred due to a route failure and not due to increased traffic) [6].

Observing all the specifics above, it would be ideal if all of them were fulfilled by the protocol applied in a WSN. Nevertheless, that is an ideal case which is hardly feasible, so we define three basic goals which must be fulfilled: a small header relative to the payload data, reduced electrical energy consumption at each sensor node, and easy implementation, i.e. minimal resources needed to realize the protocol.

III. TECHNIQUES FOR ACHIEVING A REAL-TIME CONNECTION

One of the most important factors in real-time communications is delay (latency). This factor is even more evident in multihop topologies, the primary topology in which WSN operate. The main role of the middle tier (a gateway or proxy server) in these topologies is to enable the exchange of information between different services and sensor nodes. The standard protocol implemented at the application level for the exchange of information on the Internet, the HTTP
protocol, is not suitable for use in WSN because of the number of packets exchanged when establishing and tearing down a connection and because of the size of these packets (large overhead). On one side, its use would cause very intensive communication between SNs, so their lifetime would be very short; on the other side, it would bring big delays in packet delivery. All of this effectively eliminates this protocol from use in real-time applications in WSN. Therefore, it was necessary to make some changes to the way the HTTP protocol is used, aimed at simplifying it and eliminating its previously exhibited poor performance. Thus, several techniques were developed, each of which brought something new to this field, but all of them only partially solved these problems. The most advanced techniques are HTTP polling, HTTP long polling and HTTP streaming.

HTTP polling (Fig. 1) is one of the earliest and simplest ways applied on the Web to enable real-time communication without making any special demands on either the client or the server side. The basic idea of this technique is that the client periodically polls the server via the standard HTTP port 80. Regardless of whether there is new data or not, the server always responds to the received request. The great advantages of this solution are that its implementation is very simple and that it does not require the fulfillment of any special conditions. However, it has serious disadvantages, reflected in the inability to predict the moment when the state (information) to be sent changes: there is some delay in sending data because it depends directly on the moment when the request arrived, not on when the change really happened. For this reason, many empty packets may be sent; a higher polling frequency can reduce the delay in detecting a change, but it produces much heavier traffic and therefore higher server load and greater consumption. This technique also does not solve the problem of increased overhead (the packet size is large in relation to the payload data being sent), which is characteristic of the HTTP protocol. In general, we can say that HTTP polling gives good results when monitoring synchronous events whose period of change is known in advance, but with asynchronous events it gives very poor results [7].

Figure 1. HTTP polling

In order to provide constant communication between the client and the server, a new technique called HTTP long polling (Fig. 2) was suggested. The working principle of this technique is that the client establishes a connection to the server and keeps that connection open continuously until new data is detected or until a predefined time expires. Upon receipt of new information or expiration of the specified time, the connection is immediately opened again, i.e. the client sends a new request for new data and again waits for the defined period or until a new change is detected. In this way, the bad characteristic of the previous technique, the delay in detecting an event, is avoided. The basic problem of this technique is that the server cannot easily meet an increased number of simultaneous demands from different clients. Indeed, if many clients made simultaneous HTTP requests to one server, it had to memorize all these requests, which significantly burdened both its memory and its processing capacity. Globally, the problem of a large overhead per round of communication was partially solved here, because the number of packets exchanged in the communication decreased. However, the problem still remained at the level of individual packets [8].

Figure 2. HTTP long polling

Another attempt to provide real-time communication is the HTTP streaming technique (Fig. 3). It is the best solution among those that use the HTTP protocol in full. However, it is still a one-way (half-duplex) communication.

Figure 3. HTTP streaming

Here the client sends only one
request to the server, and that request remains continuously in force, i.e. the connection is always open. The basic problem here is that the server never closes the connection until the complete transfer of data is finished. If more clients need to be served, the server must open a separate new connection for every new client. This means that if multiple clients simultaneously send connection requests, the requests have to be buffered until the previous client-server exchange finishes and the connection is released for a new one. This buffering of packets requesting a connection to the server was performed by inter-network devices such as proxy servers or firewalls, which further hampered their work because they had to keep the messages until a new connection was opened. This is a serious problem because some clients experienced large delays in communication, which led to a drop in the performance of real-time applications. The technique proved to be a good solution for applications that need to change the displayed data quickly and constantly (one-way communication), such as various information displays or tracking the current location of a moving object. When it comes to interactive applications with multiple users, or when mutual communication is needed, this technique faces the same problems as HTTP long polling and does not give satisfactory results [7, 8].

IV. WEBSOCKET PROTOCOL

In the previous chapter, we explained a few basic techniques that were a prerequisite for the development of a large number of solutions for real-time applications on the Web, such as Ajax, JSON, Comet, WebRTC and others. However, none of these techniques solved the main problem of communication with a WSN: a large number of packets required for the establishment and termination of connections and, consequently, a large number of bytes transferred in headers, i.e. high overhead. This is especially true of the very demanding HTTP protocol, which is the basic protocol in almost all the proposed solutions. With this in mind, it is clear that none of them is an acceptable solution for use in WSN. As the need to connect WSN to the Web steadily increased, it was necessary to develop new solutions. One of the techniques that emerged in recent years and that offers satisfactory solutions to the problems mentioned in Chapter 2 is the WebSocket protocol.

The WebSocket protocol enables a full two-way, full-duplex connection based on client-server technology (Fig. 4). It was developed to give Web applications a mechanism for full two-way communication with a server while complying with the minimum requirements of an HTTP connection. The basic idea is to establish a socket connection between the client and the server using the HTTP protocol only at the beginning, when the connection is established. After that, communication switches to the WebSocket protocol, which allows independent and simultaneous sending of packets in both directions over the previously formed socket connection. This means that both the client and the server can independently initiate the transfer of packets whenever they need to. This avoids delays in the delivery of detected changes, allows complete full-duplex connections, and creates the conditions for classic real-time communication between the client and the server. Therefore, this model does not belong to the traditional request-response model of the classical HTTP protocol, because the client and the server both, in this case, act as clients in the sense that each can independently initiate communication. Either participant may also disconnect, whenever it wants. Unlike the TCP protocol, the WebSocket protocol belongs to the group of message-based protocols. This means that it places an appropriate frame around each piece of data to be sent, which simplifies the communication between the client and the server: it is not necessary to send an additional message indicating that the data transfer is completed. In addition, no complex server/client program code, for example in JavaScript, is required to form or read a message that is sent or received.

Figure 4. WebSocket communication

As the WebSocket protocol has developed intensively in recent years, its frame specification has changed often. In the beginning it was quite simple: in protocol version HyBi00 the frame is delimited with two bytes, the initial byte 0x00 and the final byte 0xFF. However, in order to eliminate the many disadvantages of such a simply conceived frame, primarily regarding the security of the transmitted data, it underwent changes, so that the size of the header, which was initially only two bytes, increased to a minimum of six bytes. Fig. 5 shows one typical frame of the WebSocket protocol version HyBi10 [9, 10]. It consists of the following fields:

1. FIN (1 bit) - indicates the last frame in a single message,
2. RSV1, RSV2, RSV3 (1 bit each) - bits reserved for future protocol functions,
3. Opcode (4 bits) - defines the type of frame that is sent,
4. Mask (1 bit) - indicates whether the frame is protected (masked),
5. Payload_len (7 bits, 7 + 16 bits or 7 + 64 bits) - the length of the useful (payload) data,
6. Masking-key (32 bits) - the value that protects the payload data (the payload data is XOR-ed with it),
7. Payload data - the size of this field depends on the value given in the payload_len field.

Observing the structure of the WebSocket frame, we note that it is quite simplified and that its overhead is reduced to a minimum. The minimum size of the header is thus only six bytes.

Figure 5. WebSocket packet
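The field layout above maps directly onto a few lines of code. The following Python sketch is ours, not from the paper: it builds a masked client-to-server frame in the HyBi10 style (later standardized as RFC 6455) and demonstrates the six-byte minimum overhead for short payloads. The function name, the fixed masking key and the sample payload are illustrative assumptions.

```python
import struct

def build_client_frame(payload: bytes, opcode: int = 0x2,
                       mask_key: bytes = b"\x01\x02\x03\x04") -> bytes:
    """Build one masked client-to-server frame: FIN=1, RSV1-3=0,
    4-bit opcode, Mask=1, payload_len, masking-key, masked payload.
    NOTE: a real client must use a fresh random masking key per frame;
    the fixed key here only keeps the example deterministic."""
    frame = bytearray([0x80 | opcode])          # FIN bit + opcode
    n = len(payload)
    if n < 126:                                  # 7-bit payload_len
        frame.append(0x80 | n)                   # Mask bit + length
    elif n < (1 << 16):                          # 7 + 16-bit form
        frame.append(0x80 | 126)
        frame += struct.pack(">H", n)
    else:                                        # 7 + 64-bit form
        frame.append(0x80 | 127)
        frame += struct.pack(">Q", n)
    frame += mask_key                            # 32-bit masking-key
    frame += bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return bytes(frame)

reading = b"21.5 C"                              # a short sensor reading
frame = build_client_frame(reading)
# 2 header bytes + 4 masking-key bytes: the six-byte minimum noted above
assert len(frame) - len(reading) == 6
```

For the short readings typical of an SN, the frame overhead therefore stays at six bytes, which is the figure used in the comparison with the ~500 B HTTP headers in the analysis that follows.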
V. COST-BENEFIT ANALYSIS OF PROTOCOLS

To test the performance of the techniques summarized above, which provide the possibility of connecting to the SNs in a WSN, we analyzed the number of packets sent and the number of header bytes sent for all the mentioned protocols. We assumed that each SN communicates directly with the gateway device that is connected to the Web through standard protocols from the TCP/IP suite. We also adopted an average header size of 500 B for the HTTP protocol, 2 B for WebSocket HyBi00 and 6 B for WebSocket HyBi10; the defined period of an open HTTP connection with long polling is 300 s; the period of an open connection with HTTP streaming equals the testing period of 3600 s; and the sensed value changes with a period of 60 s. The analysis was conducted for only one hop when sending packets, i.e. only for packets sent between the SNs and the gateway device. It is more than obvious that increasing the number of hops produces a linear increase in the number of packets sent, so the benefits of the WebSocket protocol come to the fore even more.

Figure 6. The number of packets sent for different numbers of sensor nodes

Fig. 6 presents graphs showing the number of packets as a function of the number of SNs for all protocols. Looking at the graphs shown in Fig. 6, we can conclude the following:
1. The number of packets in all protocols increases linearly with the number of SNs,
2. The WebSocket HyBi10 and WebSocket HyBi00 protocols have exactly the same number of packets,
3. The total numbers of packets sent by the WebSocket protocols and by HTTP streaming differ only slightly,
4. The maximum number of packets is sent by the HTTP polling protocol,
5. As the number of SNs increases, the difference in the total packets sent between the protocols increases.

In general, we can say that the WebSocket protocol is always at an advantage compared to protocols that use the standard HTTP protocol, and that only the HTTP streaming protocol can roughly be compared to the WebSocket protocol.

Figure 7. The quantity of header bytes depending on the number of sensor nodes

Fig. 7 displays the quantity of bytes sent in the header of each packet as a function of the number of SNs for all protocols. From the displayed graphs we can draw the following conclusions:
1. For all protocols, as the number of sensor nodes grows, the number of bytes in the headers grows, i.e. the overhead increases,
2. The protocols of the WebSocket family have a much lower overhead compared to protocols that use the HTTP protocol, and therefore the consumption of electricity is decreased,
3. The biggest overhead is present in the HTTP polling protocol, as expected,
4. The difference in the size of the overhead increases dramatically in favor of the WebSocket protocol as the number of sensor nodes increases.

One of the most important design requirements related to SN operation deals with energy consumption. In order to increase the lifetime of an SN, the sensor node should operate with the minimum possible energy consumption. The battery, as the main source of power supply in a WSN, can only store a limited amount of energy. This requires power-aware computation/communication component technology, low-energy signaling and networking, and power-aware communication software. Lifetime refers to the period for which an SN is capable of sensing and transmitting the sensed data to the base station(s). In WSNs, thousands of nodes are powered with a limited battery power budget; as a result, lifetime analysis becomes an important aspect of efficient usage of the available energy. In our work, the CPU of the SN is based on the low-power microcontroller MSP430 F123 and its communication part on the RF modulator CC 2420. The MSP430 runs in two operating modes: Active, with 300 μA at 1 MHz and a 3 V power supply, and Low Power Mode 3, with a power consumption of 0.7 μA. In transmitting mode the RF modulator consumes 17.4 mA, in receiving mode 19.7 mA, and in sleep mode 1 μA. The data transfer rate is 128 kbps and the packet length is 64 bytes, so each transmitted packet keeps the radio on for 4 ms; every avoided packet therefore directly saves radio-on time. Fig. 8 presents the dependence of power consumption on time (hours) for all four protocols. As we can see from the graphs in Fig. 8, the WebSocket protocol gives the best results. This means that sensor nodes using this protocol have the longest life expectancy, i.e. WSN applications which use the WebSocket protocol will
almost double the longer lifetime of the WSN applications these two opposites so that it can support a two-way
which use HTTP pooling protocol. communication with the Web with very limited resources
of each SN. On the one side it is fully compatible with the
HTTP protocol, and on the other, it has adapted to the
7000
HTTP pooling limited resources of the SN. In addition to enabling a fully
6000
HTTP long pooling
Streaming
independent bi-directional communication, reducing the
WebSocket number of packets in the communication and reducing the
Power consumption (mA)
5000
overhead in most of exchanged packets, it also reduced
4000
the delay in providing information. These facts enabled
3000 the real time mode applications in WSN. By reducing the
2000 intensity of traffic, the power consumption of each SNs is
1000
also reduced, which enables a significantly longer life of
the applications in WSN. As the WebSocket protocol is
0
still very young protocol, further research in this area as
-1000
-100 0 100 200 300 400 500 600 700 800
well as future solutions could qualify it as a preferred
Time (hours) solution for most real time applications in WSN.
Figure 8. The power consumption REFERENCES
Power saving is crucial issue in battery powered SNs [1]
which affects the lifetime of applications in a WSN. Since sending and receiving packets is one of the largest consumers of energy, it is obvious that if we want to achieve the longest life expectancy of the sensor nodes, and therefore of the applications, communication must be reduced to a minimum. From the analyses performed in this paper and the graphs shown in Fig. 6, Fig. 7 and Fig. 8, it can be concluded that the WebSocket protocol yields the smallest number of packets and the lowest overhead. Bearing in mind the characteristics of packet transmission in a WSN (see Subsection 2), it is not difficult to conclude that the WebSocket protocol is a very good solution for implementation on sensor nodes in the context of WSN real-time applications.

VI. CONCLUSION

In recent years there has been much debate about the choice of an appropriate technology able to display, in real time, large amounts of data collected from various sources to a large number of clients in different locations. As the Internet became the main medium for presenting this type of data using protocols from the TCP/IP suite, it is logical to keep using these protocols in order to remain compatible with the wide range of devices that support them. On the other hand, the primary objective when developing applications for a WSN is efficient energy consumption and the use of less complex protocols, because of the limited resources of the sensor nodes. The WebSocket protocol analyzed in this paper is designed precisely to reconcile
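The overhead ranking reported above can be illustrated with a rough back-of-the-envelope comparison of per-message framing costs. The message sizes, host name and header contents below are hypothetical illustrations, not the paper's measured values:

```python
# Rough per-message overhead comparison (illustrative numbers only).
payload = b'{"temp": 21.5}'          # a small sensor reading, 14 bytes

# HTTP polling: every reading costs a full request/response header pair.
http_request = (
    b"GET /reading HTTP/1.1\r\n"
    b"Host: sink.example\r\n"        # hypothetical sink host
    b"Connection: keep-alive\r\n\r\n"
)
http_response_headers = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/json\r\n"
    b"Content-Length: 14\r\n\r\n"
)
http_overhead = len(http_request) + len(http_response_headers)

# WebSocket: after the one-time upgrade handshake, a small server-to-client
# frame needs only a 2-byte header (RFC 6455); client-to-server frames add
# a 4-byte masking key on top of that.
ws_overhead = 2

print(http_overhead, ws_overhead)
print(http_overhead // ws_overhead)  # HTTP costs tens of times more per message
```

The exact ratio depends on the headers used, but for payloads of a few bytes the fixed HTTP header cost dominates, which is consistent with the overhead ranking above.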
7th International Conference on Information Society and Technology ICIST 2017
Abstract— The Internet of Things, a recently emerged and fast-growing technology, is spreading into almost all areas of human activity. This raises questions that should be discussed from different perspectives in order to evaluate its role, importance and influence on the future development of humanity, technology and the economy. The use of IoT in some specific areas of human activity faces various challenges, especially where the information obtained by IoT is burdened with errors or where the system fails. This can affect decision making and result in wrong decisions, which in cases of time deficiency and emergency can lead to various kinds of damage. This paper aims to discuss IoT technology from the standpoint of complex systems theory and through some hypothetical scenarios.

I. INTRODUCTION

The Internet of Things (IoT), as a rapidly growing technology, affects almost every domain of human activity [1]. According to [2], there are more objects connected to the internet than there are people in the world. As such, it deserves broad analysis; being far from mature, it needs to be analyzed from different points of view, including the perspective of challenges and opportunities [3].

As a new technological possibility, the Internet of Things cannot be treated as a technology with enough accumulated experience for its overall and final valuation. Consequently, it should be analyzed and described in terms of possible scenarios.

IoT is also a rapidly developing technology [4]. The possibility of rapid development of any technology has at least two consequences:
- it implicitly admits that existing technologies are burdened with imperfections, and
- the existing technology will, after a while, become obsolete.

These consequences could repel potential investors. A very practical question that every investor asks before investing is: "why should I invest today in a technology which tomorrow will be improved and lower priced, if I do not have to?"

On the other side, if the level of sales of a certain technology does not cover all costs, including research and development as well as investment in new production capacities, the technology cannot develop as fast as it could. A lower level of sales has, again, at least two possible consequences:
- the technology cannot be introduced in various situations, which impoverishes the capacity for validating the technology through experience, and
- a poor experience base decelerates the development of the technology.

Considering this scenario broadly, a paradox appears: the rapid development of a technology can become the factor that slows its development. These consequences are valid under pure market conditions and do not include government subventions. Figure 1 illustrates a potential scenario of the mutual relation between technology development speed and sales rate.

Figure 1. Potential scenario of the mutual relation between technology development and sales rate

The described scenario could be called "pessimistic" and could lead to the abandonment of a technology in spite of the efforts invested and its theoretical potential to improve human life.

Another scenario of technology development is based on the level of accordance between the technology and investors' needs. Speaking of customers' needs, it is only possible to grade the level of accordance between the product's characteristics and the customer's perception of accordance with his need.

The customer also cares about his financial ability to buy the product and introduce it into his system. To model this situation it is necessary to introduce the basic economic principle that any investment is acceptable if the gains are bigger than the invested amount of money. In this case, if the market is large enough, the technology can develop rapidly. This case is illustrated by Figure 2.

For objects whose safety is the main goal (such as dams, bridges, mines, etc.), this economic principle, which says that any investment is acceptable if it provides a greater return, has to be replaced by the principle that any investment smaller than the assessed consequences of damage
which could be avoided with reasonable probability is acceptable.

Figure 2. Relation between the technology development rate and its meeting of investors' needs

The Internet of Things (IoT), or things connected via the internet [5], can be modeled at a high level of abstraction. In order to analyze some possibilities of IoT, we shall describe the model on which the following explication is based.

IoT is, above all, materialized by devices equipped with different abilities. The model of IoT can be formally described as follows:
- IoT devices discretize a certain space or object, i.e. IoT sensors gather information at discrete points;
- information is gathered at discrete points in time;
- one sensor can gather only one kind of information;
- at least two devices are necessary to form an IoT system;
- IoT devices must, at least, be able to gather a certain kind of information;
- the devices of which the IoT consists must be able to exchange information with each other; and
- the connections between devices shall be wireless.

The consequence of the above assumptions is a complex system of mutually connected devices which communicate with each other. But gathering information and communication between devices alone are not enough for a complete IoT system. The structure of the IoT can be divided into elements and described as follows:
- an information point (denoted by the capital letter "I") is the point at which information is gathered;
- a communication point (denoted by the capital letter "C") is equipped with a device for communication with other devices;
- a processing point (denoted by the capital letter "P") is equipped with devices for processing data;
- a decision point (denoted by the capital letter "D") is equipped with devices and/or software for decision making based on available data; and
- an intelligence point (denoted by the capital letter "J") is where decisions and data are checked before they are transferred to the outer world.

The described model structure and hierarchy is illustrated by the simplified model given in Figure 3. As depicted, the information point is situated in the observed object, and the intelligence units finally decide which form of information shall be delivered to the end user or which action shall be taken. Of course, one object could be equipped (discretized, approximated) with multiple information points gathering different kinds of information, as shown in Figure 4.

Figure 4. Simplified model and hierarchy of IoT in the case of multiple information points

Let us define the position of each information point by unique coordinates in space as:

I_i(x_i, y_i, z_i) … (1)

where I_i is the i-th information point and i = 1, 2, …, n.

The defined positions of the information points allow the determination of the distances and directions between them as:

l_i,j = √((x_j − x_i)² + (y_j − y_i)² + (z_j − z_i)²) … (2)

cos α_i,j = (x_j − x_i) / l_i,j … (3)
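Equations (1)-(3) can be checked with a short sketch. The coordinates of the two information points are hypothetical values chosen for illustration:

```python
import math

def distance(p, q):
    """Euclidean distance l_ij between two information points, eq. (2)."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(p, q)))

def direction_cosine(p, q):
    """cos(alpha_ij) of the direction from point i to point j along x, eq. (3)."""
    return (q[0] - p[0]) / distance(p, q)

i1 = (0.0, 0.0, 0.0)   # hypothetical information point I_1
i2 = (3.0, 4.0, 0.0)   # hypothetical information point I_2

print(distance(i1, i2))          # 5.0
print(direction_cosine(i1, i2))  # 0.6
```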
Possibility 1: The real risk is "High" and the perceived risk is "High". IoT is really necessary and its introduction for dam monitoring is justified. There is a need for IoT, and the level of investment must be measured against the potential damage.

Possibility 2: The real risk is "Low" and the perceived risk is "High". IoT is not really necessary, but the decision to introduce it is made. It could be stated that the investment was unjustified, but it could be compensated by the experience and knowledge obtained.

Possibility 3: The real risk is "Low" and the perceived risk is "Low". IoT is not justified at all.

Possibility 4: The real risk is "High" and the perceived risk is "Low". The introduction of IoT was omitted despite being necessary.

According to the described model, it is obvious that IoT will be introduced only if the perceived risk is high, independently of the level of real risk. Following this idea, two further steps shall be considered:
- the level of accordance between the investors' needs and the IoT capacities, and
- the ability of the investor to properly interpret the results obtained by the IoT system.

The first step is to analyze the investment in the coordinates of investors' needs and IoT capacity.

Figure 7. Accordance of IoT with investors' needs

Again there exist four possibilities, explicated as follows.

Possibility 1: The investors' needs are "High" and the capacities of IoT are "High". Investment in IoT is justified.

Possibility 2: The investors' needs are "Low" and the capacities of IoT are "High". Investment in IoT is not necessary.

Possibility 3: The investors' needs are "Low" and the capacities of IoT are "Low". Investment in IoT is not necessary.

Possibility 4: The investors' needs are "High" and the capacities of IoT are "Low". Investment in IoT is not justified.

Finally, and maybe of crucial importance, the assessment of the investor's capability to properly interpret the obtained results will be analyzed. For this analysis the same method is applied, illustrated in Figure 8.

Possibility 1: The investors' capacities are "High" and the results obtained by IoT require "High" expert knowledge. The risk of wrong interpretation exists but must be accepted.

Possibility 2: The investors' capacities are "High" and the results obtained by IoT require "Low" expert knowledge. The risk of wrong interpretation is small.

Possibility 3: The investors' capacities are "Low" and the results obtained by IoT require "Low" expert knowledge. The risk of wrong interpretation exists but is small, because low-level expertise can be obtained through training.

Possibility 4: The investors' capacities are "Low" and the results obtained by IoT require "High" expert knowledge. The risk of wrong interpretation is the greatest in this case.

This chain of analysis shall and must be considered from every perspective which contains a risk of misuse or wrong interpretation of results.

The described possibilities of discontinuity of results in space and time must be analyzed from the aspect of risk and their possible influence on conclusions about dam safety.

The accuracy of the data, including reliability analysis, must also be an indivisible part of the IoT analysis. Checking the accuracy of sensors can be resource-demanding in the engineering sense, but also in time and finance. The question can also concern the recording of data at a certain information point.

The described chain of decision making is illustrated in Figure 9.

Figure 9. Chain of decision making for IoT introduction in dam safety management

Bearing in mind that the described model of IoT is maximally simplified, it can be stated that its complexity in reality can only be much greater. Consequently, it can be considered a complex system. If we assume that absolutely reliable systems neither exist nor can be
constructed at an acceptable price, it is possible to state that IoT shall be carefully considered, especially in applications with a considerable presence of risk.

IV. CONCLUSION

IoT is a relatively new and promising technology in various domains. Its utilization could improve various aspects of life, engineering and science. Nevertheless, because of the immanent complexity of the observed phenomena as well as the complexity of IoT itself, detailed consideration is necessary before its introduction.

ACKNOWLEDGMENT

This paper was created within research carried out on project TR35030 of the Ministry of Education, Science and Technological Development of Serbia.

REFERENCES

[1] Bandyopadhyay, Debasis, and Jaydip Sen. "Internet of things: Applications and challenges in technology and standardization." Wireless Personal Communications 58.1 (2011): 49-69.
[2] Chen, Yen-Kuang. "Challenges and opportunities of internet of things." Design Automation Conference (ASP-DAC), 2012 17th Asia and South Pacific. IEEE, 2012.
[3] Covington, Michael J., and Rush Carskadden. "Threat implications of the internet of things." Cyber Conflict (CyCon), 2013 5th International Conference on. IEEE, 2013.
[4] Xia, Feng, et al. "Internet of things." International Journal of Communication Systems 25.9 (2012): 1101.
[5] Coetzee, Louis, and Johan Eksteen. "The Internet of Things - promise for the future? An introduction." IST-Africa Conference Proceedings, 2011. IEEE, 2011.
[6] Sun, Enji, Xingkai Zhang, and Zhongxue Li. "The internet of things (IOT) and cloud computing (CC) based tailings dam monitoring and pre-alarm system in mines." Safety Science 50.4 (2012): 811-815.
[7] Hartford, Desmond N. D., and Gregory B. Baecher. Risk and Uncertainty in Dam Safety. Thomas Telford, 2004.
[8] Martać, Rastko, et al. "Using internet of things in monitoring and management of dams in Serbia." Facta Universitatis, Series: Electronics and Energetics 29.3 (2015): 419-435.
[9] Milivojević, Nikola, et al. "Information system for dam safety management." 4th International Conference on Information Society and Technology - ICIST. Vol. 1. 2014.
[10] Sun, Enji, Xingkai Zhang, and Zhongxue Li. "The internet of things (IOT) and cloud computing (CC) based tailings dam monitoring and pre-alarm system in mines." Safety Science 50.4 (2012): 811-815.
Abstract— Big data is a relatively new approach and a fast-growing area of interest in economic, organizational, bioscience and technical sciences and practice. The market, as well as various objects, engineering structures and natural phenomena, emits huge amounts of information. That information is the only connection with the outer world and the only basis for rational decision making. However, decision making based on data obtained from the outer world rests on assumptions which seem very logical but are not discussed in the literature as thoroughly as they should be. Information obtained from the outer world represents a past state of the outer world, represents only a part of it, is erroneous and imperfect, and can be treated and interpreted in different ways. As such, even though it is the only and a necessary basis for decision making in enterprises, it can be the source of wrong decisions and suboptimal results on the market. This paper intends to discuss some advantages and challenges of Big Data related to the decision-making process.

I. INTRODUCTION

Big data is becoming a growing area of scientific research as well as of practical use in companies for decision making. Definitions of Big Data are, paradoxically, understood more qualitatively and subjectively than quantitatively. In the literature [1], Big Data is described as follows: "Big Data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze". Another description in the same literature is equally general: Big Data is "data that becomes large enough that it cannot be processed using conventional methods". These definitions simplify the nature of big data by reducing it to, and comparing it with, the ability of current technology to process it. Given today's market complexity, big data should be defined in a wider and more complex way. According to the literature [2], "Businesses are collecting more data than they know what to do with" and "to turn all this information into competitive gold, they'll need new skills and a new management style". This suggestion (to acquire new skills and a new management style) highlights the real role and importance of Big Data in business. Big Data can indeed have a big impact on management, but it is only an auxiliary tool.

Big Data, because of its complexity and volume, needs knowledge and logistics for appropriate management, analysis and usage. According to [3], "Every day, we create 2.5 quintillion bytes of data – so much that 90% of the data in the world today has been created in the last two years alone". What is more, this amount of data is expected to grow in the future. The resources for its management, however, are not homogeneously spread and are scarce. Another issue is the representativeness of the collected data. The data represent the past (information must appear before it is registered and collected), are erroneous (the accuracy of information is never perfect) and imperfect (it is possible to omit collecting some crucial information while collecting information of lower importance). These facts can affect the decision-making process if decisions are based on information which is not validated or reliable.

On the other side, the well-known basic economic principle says that an investment makes sense as long as the gains are bigger than the costs. From the investment aspect, Big Data shall therefore be analyzed through the amount of money invested relative to its contribution to the company's goals. Here, too, a problem can appear: the investment in Big Data infrastructure must be realized before its contributions can be assessed. Bearing in mind all the elements necessary for establishing a Big Data infrastructure and its influence on a company's business, introducing Big Data should be treated as a strategic decision [4]. The Big Data contribution to US GDP varies depending on the economic sector, while the position of a sector is defined by a "value potential index" and an "ease-of-capture index" [5].

Some problems in the data-management process have also been noticed: "When we define data, information and analytics as services we see that traditional measurement mechanisms, mainly time and cost driven, do not work well." [6]. A proposed solution to this problem is to put analytics and Big Data in the cloud. Big Data in the cloud may solve a certain class of problems, but another problem can appear: managers lose control over the data they use for their decisions. They also lose control over the quality of the data used, i.e. they may not possess adequate knowledge about its accuracy, promptness and relevancy. They can only believe that the data from the cloud are reliable and accurate. Yet another problem is that the managers of competing companies can use the same data. Bearing in mind the above, a logical question arises: how can a company gain competitive advantage when different companies use the same data and the same tools (possibly including the same knowledge) for decision making?

Errors in data are mentioned in only a few books and papers. "Large data sets from internet sources are prone to
errors and losses, hence unreliable. The sources of the data should be understood to minimize the errors caused while using multiple datasets. The properties and limits of the dataset should be understood before analysis to avoid or explain the bias in the interpretation of data (Boyd & Crawford, 2011)." [7]. The use of Big Data in medicine is especially sensitive to errors, and "users of medical big data must proceed with caution and recognize the data's considerable limitations and shortcomings. These include data errors, missing information, and lack of standardization, record fragmentation, software problems, and other flaws." [8].

Questions about the influence of Big Data, as a socio-technical phenomenon, on society as a whole, focusing on different issues and possible trends, have also appeared and been discussed [9]. The main question about Big Data can be concisely expressed as: "Will large-scale search data improve our lives, or is there a chance for its abuse; could it limit or reduce human wellbeing?"

Big Data in business is inseparable from its analytics: 97% of companies with revenues exceeding $100 million were found to use some form of business analytics [10], because Big Data can be considered a raw material for production and a new source of immense economic and social value [11]. Efforts to help organizations understand the opportunities of information and advanced analytics [12], as well as the development of frameworks for decision making based on complexity science [13], highlight the fact that the market is complex and can be approximated with Big Data. But it is necessary to note that this new market element (Big Data with its supporting infrastructure) does not simplify market complexity; on the contrary, it can be stated that market complexity increases once the new possibilities of Big Data utilization are included.

The possibilities mentioned above point out the advantages, disadvantages and challenges which shall be discussed in order to properly assess the role and importance of big data in the decision-making process.

II. METHODOLOGY

The research methodology in this paper is based on the analysis of decision making under market conditions. Decision making is an issue of great (maybe critical) importance for a company's success or failure. Decision making based on multiple criteria [14] or on a behavioural basis [15] was deeply investigated in the past, while the contemporary literature on decision making includes data-driven decision making. Too much information can distract rather than inform a person [16]. This problem is growing in companies, in spite of the fact that they have employed data scientists [17], because the rate of data growth has become very high.

For the purpose of this paper, decision making will be considered in its simplest form, in order to investigate the influence of erroneous data on the final decision. Decision making is treated as a choice between two possibilities. The choice is based on the available information and the knowledge of the decision maker. Formally, it can be expressed as:

C = C(K, D) … (1)

where:
- C is the choice between two options;
- K is the knowledge on which the decision is based; and
- D is the available data.

The possible interval of the choice varies depending on the error in knowledge and the errors in data. To express the total increment of function (1) we use the first derivative:

ΔC = (∂C/∂K)·ΔK + (∂C/∂D)·ΔD … (2)

where ΔK and ΔD represent the error in knowledge and the error in the available data, respectively.

From (1) and (2) it immediately follows that the choice belongs to the interval (C − |ΔC|, C + |ΔC|) with equal probability. At the same time, let us introduce the interval of successful decisions as S ∈ (a, b), where a is the lower limit at which success is acceptable and b is the maximum possible limit of success. Figure 1 illustrates the relation between the interval of success and the interval of our choice.

Figure 1. Relation between the interval of success and the interval of choice

Strictly speaking, the ideal situation is when the knowledge and the available data are perfect, i.e. free of errors. Because of different influences, this cannot be the case in the real world. An implicit assumption of the model in (1) and (2) is its completeness, i.e. that all relevant data are available. This case is also impossible in the real world. We will consider that the solution for the incompleteness of the available data is sought in the Big Data concept. But this approach only increases the overall complexity. Actually, the real world produces data on whose basis certain knowledge is established. New knowledge (and the actions based on it) creates a new real world, which creates a new amount of data, and finally the new data creates new knowledge. Figure 2 illustrates this situation.

Figure 2. Mutual influence between the real world, data and knowledge
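The interval reasoning in (1) and (2) can be sketched numerically. The choice function C(K, D), the error magnitudes and the success interval below are hypothetical illustrations, not values from the paper:

```python
def C(K, D):
    """Hypothetical smooth choice function of knowledge K and data D."""
    return 0.6 * K + 0.4 * D

# Nominal inputs and their assumed errors (all values are illustrative).
K, D = 1.0, 1.0
dK, dD = 0.1, 0.2
h = 1e-6

# Numerical partial derivatives of C (central differences).
dC_dK = (C(K + h, D) - C(K - h, D)) / (2 * h)
dC_dD = (C(K, D + h) - C(K, D - h)) / (2 * h)

# Total increment from eq. (2) and the resulting choice interval.
dC = dC_dK * dK + dC_dD * dD
lo, hi = C(K, D) - abs(dC), C(K, D) + abs(dC)

# Success interval S = (a, b): success is guaranteed only if the whole
# choice interval (C - |dC|, C + |dC|) falls inside S.
a, b = 0.9, 1.2
print((round(lo, 3), round(hi, 3)))  # (0.86, 1.14)
print(a < lo and hi < b)             # False: the errors can push C outside S
```

Even with a choice whose nominal value lies inside S, the error terms ΔK and ΔD widen the interval until part of it falls outside the success region, which is exactly the situation Figure 1 depicts.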
Let us assume that C is the right choice and that it is not possible to choose both options at the same time. Figure 3 illustrates that situation.

(market), and any of its changes need to be known for a certain level of decision quality. Further analysis shall include a more detailed insight into the problem of decision making related to the market.

Some assumptions have to be made before the formalization of errors and incompleteness of data in the decision-making process can be performed, as follows:

δ_i ~ N(0, 1) … (7)
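Assumption (7), that the measurement noise is standard normal, can be simulated with a short sketch. The true value and the threshold rule below are hypothetical stand-ins for the paper's choice function:

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

true_value = 1.0   # hypothetical true quantity behind the data
threshold = 0.0    # choose option A if the observed value exceeds it

# Draw noisy observations d_i = true_value + delta_i, delta_i ~ N(0, 1).
observations = [true_value + random.gauss(0.0, 1.0) for _ in range(10_000)]

# Fraction of observations that still lead to the "right" choice (A).
right = sum(1 for d in observations if d > threshold) / len(observations)
print(round(right, 2))  # close to 0.84, i.e. Phi(1): noise flips the choice in ~16% of cases
```

Even with unbiased N(0, 1) noise, a non-negligible fraction of the observations would lead the decision maker to the wrong option, which is the effect the model above tries to capture.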
∂D/∂t ≠ 0 … (10)
irreplaceability, it could not be abandoned. Preference (b) was rising until the 9th year, but it was then suddenly abandoned and replaced by preference (c), which had reached a higher place in the customers' value system.

Figure 8. Changes in customer preferences of the first and second order

The elementary question arising from the introduced breaks in customer preferences is: "Can breaks in preference change be predicted?" The answer depends on the model of the knowledge used. If a model of independence between knowledge and Big Data is used, it is possible to suppose that discontinuities could be included. If knowledge is constructed only on Big Data sets, the breaks cannot be predicted the first time, because such a case is not recorded in the past. Of course, by evolving a model of knowledge based on Big Data, it becomes possible in the future to predict that discontinuities will probably occur.

changed in a way they could not follow. Measured with the model proposed above, they fell from position C1 to position C3.

Blackberry, Apple iOS and Android succeeded and took a big share of the market (position C1), while Microsoft, even though it was third on the market, was still in the zone of position C3.

Further investigations shall include an analysis of the causes of the different success of different companies. Evidently, since Big Data alone cannot explain the difference between success and failure (where failure, from a company's point of view, can also mean insufficient market share), it cannot be considered a sufficient condition for success.

IV. CONCLUSION

The proposed model highlights the fact that possessing Big Data about the market and an adequate infrastructure is not automatically a guarantee of success. The quality of data, knowledge, the prediction of the changing world, and a management able to make the right decisions together form the necessary and sufficient condition for success, while Big Data can be only a necessary condition of limited importance.

ACKNOWLEDGMENT

This paper contains results from project TR35030 of the Ministry of Education, Science and Technological Development of the Republic of Serbia.
[16] Donhost, Michael J., and Vincent A. Anfara Jr. "Data-driven decision making." Middle School Journal 42.2 (2010): 56-63.
[17] Provost, Foster, and Tom Fawcett. "Data science and its relationship to big data and data-driven decision making." Big Data 1.1 (2013): 51-59.
[18] Doz, Yves L., and Mikko Kosonen. "Embedding strategic agility: A leadership agenda for accelerating business model renewal." Long Range Planning 43.2 (2010): 370-382.
[19] Butler, Margaret. "Android: Changing the mobile landscape." IEEE Pervasive Computing 10.1 (2011): 4-7.
Abstract— This paper presents a relational model of a database with the possibility of automatic linking to external information on social networks. Such an approach enables a further search for information through social networks until the needed level of detail is achieved. The analysis is related to the improvement of companies' marketing activities.

• Finding social network data using graph mining [4]
• Models for social networks and chat data mining [5]
• Mining newsgroups [6]

Marketing activities of an enterprise represent one of the most important factors of social network analysis. Numerous studies indicate a significant influence of social networks on an enterprise's marketing activities [7]-[10]. The possibility of analyzing buyers' attitudes, and of influencing them, represents a dominant factor in that sense. Bonchi (2011) [11] stresses the importance of identifying potential users for the purpose of implementing marketing activities. Also, research on the use of social networks in some companies indicates significant potential for improving customer services [12].

Analyses can be carried out from different starting bases, beginning with an analysis of capabilities, education and work experience, and then obtaining relevant information about persons, companies and so forth. The most important reason why the tables are significant for the analysis of these data, which is certainly the goal of this paper, is their capability of automatically connecting to social network pages and obtaining further, more
detailed information on the corresponding social networks themselves.

The most important hyperlinks for connecting with them are as follows:
• company's social network link
• person's social network link
• labor-capability social network link
• faculty's social network link
• work-experience group social network link
• link of the group on a social network

One of the possibilities for analyzing the data is to begin by recording cities as the basic element of the analysis, as explained in the Introduction. The relational model connecting the tables enables information about cities to be connected with more detailed data about the corresponding companies. On the basis of this model, a large amount of more detailed information about companies is then obtained by connecting to the companies' websites and their corresponding social networks. One of the forms for the analysis of this information is demonstrated in Fig. 2.

connecting with pages on their social networks as the ultimate result. Fig. 5 demonstrates the form with one of the numerous possibilities of analyzing information in this manner.
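A minimal sketch of such a relational model might look like the following. The table and column names, the city and company rows, and the link URL are hypothetical illustrations, since the paper does not give a concrete schema:

```python
import sqlite3

# In-memory database; "city", "company" and "social_network_link" are
# hypothetical names illustrating the paper's idea of linking relational
# rows to external social-network pages.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE city (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE company (
    id                  INTEGER PRIMARY KEY,
    name                TEXT NOT NULL,
    city_id             INTEGER REFERENCES city(id),
    website             TEXT,
    social_network_link TEXT   -- hyperlink to the company's social page
);
INSERT INTO city VALUES (1, 'Belgrade');
INSERT INTO company VALUES
    (1, 'Example d.o.o.', 1, 'https://example.com',
     'https://social.example/company/example');
""")

# Starting from a city, join through to the companies' external links,
# which can then be followed for further analysis.
rows = con.execute("""
    SELECT city.name, company.name, company.social_network_link
    FROM city JOIN company ON company.city_id = city.id
    WHERE city.name = 'Belgrade'
""").fetchall()
print(rows)
# [('Belgrade', 'Example d.o.o.', 'https://social.example/company/example')]
```

The same join pattern extends to the other hyperlink tables listed above (persons, faculties, groups), each carrying its own social-network link column.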
Figure 8. Analysis of information beginning with capabilities

As we can conclude, the discussed relational model of the formed database enables a large number of different approaches in the analysis of information. Fig. 9 accounts for the analysis of data based on information about persons, education and companies, as one of the numerous other initial models for analyzing information.

Figure 9. One of the other possibilities of analyzing data

III. CONCLUSION
This paper demonstrates a methodological reference for the development of computer support which can be universally applicable in companies' business. The key

REFERENCES
[1] Y. Dong, Q. Ke, Y. Cai, B. Wu and B. Wang, "TeleDatA: data mining, social network analysis and statistics analysis system based on cloud computing in telecommunication industry", in Third International Workshop on Cloud Data Management, Conference on Information and Knowledge Management (CIKM), Glasgow, 2011, pp. 41-48.
[2] L. Yu, et al., "BC-PDM: data mining, social network analysis and text mining system based on cloud computing", in 18th International Conference on Knowledge Discovery and Data Mining (ACM SIGKDD), Beijing, 2012, pp. 1496-1499.
[3] B. Ch. Ju, "An Initial Exploration on Intelligent Contents to Social Network Research Via Fuzzy Based Data Mining", in Third World Congress on Software Engineering, Wuhan, 2012, pp. 151-154.
[4] Ch. Jiang, F. Coenen and M. Zito, "Finding frequent subgraphs in longitudinal social network data using a weighted graph mining approach", in 6th International Conference on Advanced Data Mining and Applications: Part I, Chongqing, 2010, pp. 405-416.
[5] V. H. Tuulos and H. Tirri, "Combining Topic Models and Social Networks for Chat Data Mining", in International Conference on Web Intelligence IEEE/WIC/ACM, Beijing, 2004. Washington: IEEE, 2004, pp. 206-213.
[6] R. Agrawal, S. Rajagopalan, R. Srikant and Y. Xu, "Mining newsgroups using networks arising from social behavior", in 12th International Conference on World Wide Web, Budapest, 2003, pp. 529-535.
[7] J. Wortman, "Viral marketing and the diffusion of trends on social networks", Technical Report MSCIS, University of Pennsylvania, 2008.
[8] D. Watts and J. Peretti, "Viral marketing for the real world", Harvard Business Review, vol. 85, 2007, pp. 22–23.
[9] J. Leskovec, L. A. Adamic and B. A. Huberman, "The dynamics of viral marketing", ACM Transactions on the Web (TWEB), vol. 1, no. 1, 2007, Article No. 5.
[10] S. Hill, F. Provost and C. Volinsky, "Network-based marketing: Identifying likely adopters via consumer networks", Statistical Science, vol. 21, no. 2, 2006, pp. 256–276.
[11] F. Bonchi, "Influence Propagation in Social Networks: A Data Mining Perspective", IEEE Intelligent Informatics Bulletin, vol. 12, no. 1, 2011, pp. 8-16.
[12] Th. V. Rocha, C. L. Stedefeldt Jansen, E. Lofti and R. R. Fraga, "An Exploratory Study on the use of Social Networks in Building Customer Relationships", Review of Business Management, vol. 15, no. 47, Apr./Jun. 2013, pp. 262-282.
[13] Person's social network webpage. Available: https://www.linkedin.com
Abstract—Scheduled tasks are a ubiquitous part of our daily lives, whether we need to generate reports for monthly data analytics or send newsletters to subscribed customers. Domain-Specific Languages (DSL) are a viable approach that promises to solve the problem of target platform diversity as well as to facilitate rapid application development and shorter time-to-market. This paper presents Kronos, a cross-platform DSL for scheduled tasking implemented using the textX meta-language. Tasks described using the Kronos DSL can be automatically created and started with the provided task-specific information.

Keywords: DSL, Scheduled tasks, Python

I. INTRODUCTION
Kronos [1] is a Domain-Specific Language (DSL) for the specification and execution of scheduled tasks. Domain-Specific Languages are languages focused on a particular domain [2, 3]. Conversely, General-Purpose Languages (GPL) are languages made to solve as many problems as possible regardless of the domain (examples of GPLs are Java, C and Python). Certain studies show that DSLs can increase productivity by up to 1000 percent [4, 5].

Scheduled tasks are tasks that are executed at a specific point in time and that can repeat at scheduled intervals without explicit human intervention. Big servers (clusters, cloud and distributed systems, etc.), personal computers and their operating systems have used this concept for a long time. Even mobile systems use some form of these tasks. This wide usage is based on the fact that people can write and schedule tasks that then run without their explicit presence (sending newsletters and periodic info, extracting analytics data, running diagnostic tools, etc.). The idea behind scheduled tasks is not new: one of the original tools was developed on UNIX operating systems by Ken Thompson.

DSLs for scheduled tasks can increase productivity and flatten the learning curve, especially where people would otherwise write these tasks in a general-purpose programming language, or where existing DSLs are not user-friendly. Tasks are pieces of program code, together with their meta-data, that need to be executed at some specific time. The main motivation for the development of the Kronos DSL was the ability to write task descriptions in a language that is easy to read, write and learn.

This paper is organized as follows. Section II presents the design and implementation of the Kronos Domain-Specific Language. Section III presents related work. Section IV summarizes conclusions and briefly proposes ideas for future work.

II. DESIGN AND IMPLEMENTATION
The Kronos DSL is developed with textX [9]. textX is a meta-language (i.e. a language for language definition) for DSL specification in Python, developed at the Chair of Informatics, Faculty of Technical Sciences. textX is based on the Arpeggio PEG parser [10]. From a single grammar description, textX dynamically builds a meta-model (in the form of Python classes) and a parser for your language. The parser will parse expressions of the defined language and automatically build a graph of Python objects (i.e. the model) corresponding to the meta-model. textX has the following additional features: 1) automatic linking - textual references to other objects in the language are resolved to proper Python references automatically; 2) model/object post-processing - callbacks (so-called processors) can be registered for models and individual classes, which enables model/object post-processing (validation, additional changes, etc.); 3) grammar modularization/imports - a grammar can be split into multiple files, and files/grammars can then be imported where needed; and 4) meta-model/model visualization - both the meta-model and parsed models can be visualized using the GraphViz software package [11].

Kronos is a DSL for scheduled tasks that is platform independent and currently allows creating and running tasks in the background. It can also call web services, even across different service versions, pass security keys for authentication, assign a priority to every task, and synchronize tasks that access shared data. In the future, Kronos can be extended with a generator for the Cron service or for various other code-based scheduled-task solutions, and with the ability to call operations that are not on the web. The solution is designed to be as close as possible to plain English and to eliminate unnecessary programming-language constructs. With this in mind, users can easily write their own tasks and leave everything else to Kronos.

A. Kronos meta-model and grammar
The grammar of the Kronos DSL, available at [12], is defined using textX. With the help of textX it is possible to create a meta-model of the Kronos language. Figure 2 describes the meta-model, shown as a UML-like class diagram generated by textX. Each class in the figure represents one textX rule, which is translated to a Python class at run-time. There is no code generation involved: textX works as an interpreter. Association links in the diagram represent textX match rule references or link rule references.
Match rule references have containment semantics, where the contained object (child) can exist only if its container (parent) object exists. Link rule references don't have this restriction.

The core concept of the Kronos DSL is the Kronos rule, which starts with the kronos: keyword and contains one or more Job rules that hold all the information needed for proper task execution. Only a few rules are mandatory; the rest are optional. The user must specify a short description (plain text, mainly for logging reasons), defined in the Description rule, the URL of the service to be executed (the URL rule), and the time interval for task execution (the Scheduled rule). Optionally, users can specify: 1) the priority of the task's execution over the other specified tasks (a value of 1 is assumed if not given), 2) a security key in the form of an authorization and/or authentication token (an empty string is assumed if not given), 3) the ability to sync tasks that access shared data, and 4) the service version.

Figure 1 presents the textX grammar of the Job rule and its usage in a task example, with the Kronos header.

represented via text: first, second, third, etc., or in ordinal representations: 1st, 2nd, 3rd, etc. Kronos is able to detect multiple Months, Days, and Ordinal numbers, but they must be separated with the "," symbol. Also, users can use the every keyword if they want to specify that tasks run every day, month or ordinal, respecting its place of usage. If the task runs the whole day, the Synchronized rule can be applied here too.
downloading files from the Internet and downloading email at regular intervals.

Google App Engine Cron Service [7] allows users to configure regularly scheduled tasks that operate at defined times or regular intervals. These cron jobs are automatically triggered by the App Engine Cron Service. For instance, you might use this to send out a report email on a daily basis, to update some cached data every 10 minutes, or to update some summary information once an hour. A cron job will invoke a URL, using an HTTP GET request, at a given time of day. An HTTP request invoked by cron is subject to the same limits as other HTTP requests, depending on the scaling type of the module. Free applications can have up to 20 scheduled tasks; paid applications can have up to 100.

Csharp Schtick [8] is a scheduled task runner built on Schyntax (a domain-specific language for defining event schedules in a terse, but readable, format).

APScheduler [13] is a Python library that lets you schedule your Python code to be executed later, either just once or periodically. You can add new jobs or remove old ones on the fly as you please. If you store your jobs in a database, they will also survive scheduler restarts and maintain their state: when the scheduler is restarted, it will run all the jobs it should have run while it was offline.

IV. CONCLUSION
Since, to the best of our knowledge, readily available cross-platform DSLs for scheduled tasks that are easy to learn and read do not exist, we made a contribution to that area with Kronos. The Kronos DSL uses a declarative, plain-English-like syntax to describe job schedule specifications.

A disadvantage of the current implementation is that every time a user adds a new task, Kronos needs to be stopped and started again in order for the new task to be applied. Although there is a possibility to add new tasks without restarting Kronos using an API (Python code), this is still not very user-friendly.

Currently, Kronos is able to run both Every and Selective tasks, but optimizations on sleeping and on calculating when to run are done only for Every tasks. This has the implication that Selective tasks will be checked every second until their run time comes.

Plans for further development of the Kronos DSL include: 1) better optimizations of the sleep time for Selective tasks, 2) the ability to add new tasks without stopping currently active tasks, 3) extending the grammar and implementing the ability for tasks to call local code instead of just web services, 4) synchronizing tasks that access shared data, 5) integrating a code generator that produces scripts for other similar tools that are more complex to use, 6) designing plugins for modern text editors like vim, emacs and Sublime, or implementing a GUI tool, and 7) making Kronos work in a distributed environment [14].

REFERENCES
[1] M. Simic, Kronos scheduled task DSL (2016), URL https://github.com/MilosSimic/Kronos, last accessed 5 April 2017.
[2] M. Mernik, J. Heering, and A. Sloane, ACM Computing Surveys (CSUR) 37, 316–344 (2005).
[3] M. Fowler, Domain-Specific Languages, Addison-Wesley Professional, 2010.
[4] S. Kelly and J.-P. Tolvanen, Domain-Specific Modeling: Enabling Full Code Generation, Wiley-IEEE Computer Society Pr, 2008.
[5] Metacase, Nokia case study, Tech. rep. (2007).
[6] Cron (2016), URL http://pubs.opengroup.org/onlinepubs/9699919799/utilities/crontab.html, last accessed 5 April 2017.
[7] Google, Google App Engine Cron Service (2016), URL https://cloud.google.com/appengine/docs/python/config/cron, last accessed 8 April 2016.
[8] C# Schtick (2016), URL https://github.com/schyntax/cs-schtick, last accessed 5 April 2017.
[9] I. Dejanović, textX (2016), URL https://github.com/igordejanovic/textX, last accessed 1 March 2017.
[10] I. Dejanović, G. Milosavljević, and R. Vaderna, Knowledge-Based Systems 95, 71–74 (2016), ISSN 0950-7051, URL http://www.sciencedirect.com/science/article/pii/S0950705115004761.
[11] Graphviz (2016), URL http://graphviz.org/, last accessed 7 April 2017.
[12] M. Simic, Kronos scheduled task DSL grammar (2016), URL https://github.com/MilosSimic/Kronos/tree/master/grammar, last accessed 2 April 2016.
[13] APScheduler, URL https://apscheduler.readthedocs.io/en/latest/, last accessed 4 April 2017.
[14] K. Ousterhout, P. Wendell, M. Zaharia, and I. Stoica, University of California, Berkeley, URL https://cs.stanford.edu/~matei/papers/2013/sosp_sparrow.pdf, last accessed 4 April 2017.
Abstract—This paper presents a software tool for generating e-courses using the accreditation document ontology defined for Serbian Higher Education. In this way, one can get e-courses whose structure and organization are compliant with the formal regulations of the higher education system in the Republic of Serbia. The tool facilitates creating e-courses for Canvas LMS since it populates information from accreditation documents programmatically. The verification of the tool has been carried out through a case study on the accreditation documents for the Software Engineering and Information Technologies study program from the Faculty of Technical Sciences, University of Novi Sad.

I. INTRODUCTION
The usage of Learning Management Systems (LMS) in higher education is increasing day by day. Such an education environment has made the teaching process easier in terms of course management, data availability, as well as communication among course participants [1]. Alongside the facilities that an LMS provides, some activities, such as creating and populating courses, must be done manually. In order to make the course creation process more effective, there is a need to automate some parts of this process by generating courses programmatically.

Although course generation is a widely researched problem, existing solutions mostly do not relate the proposed course model to official accreditation documents. In particular, this problem has not been tackled for Serbian higher education so far.

While creating a course, a teacher should obey the rules defined in the accreditation documents. The created course should have the structure and contain the information proposed by the state officials. Accreditation documents define the structure and content of the courses as proposed by the official state organization. Accordingly, e-courses inside LMSs should comply with those regulations. The main aim of this work is a tool for the automatic generation of e-courses that are in accordance with the accreditation documents.

In our previous work [2], accreditation documents in Serbian higher education have been formally described using the accreditation document ontology (ADO). ADO has been implemented as a component of the semantically driven document management system (SDDMS) described in [3]. Based on the techniques of the semantic web [4], ADO enables extending SDDMS with a new set of services that use the information stored within the accreditation documents.

The tool presented in this paper has been built as a semantic service on top of ADO. The service provides reasoning on accreditation documents represented within ADO and the generation of e-courses that are compliant with the data stored using ADO.

In addition to the compliance of a generated course with the accreditation documents, our research deals with the selection of an appropriate electronic format for an e-course. We have chosen to generate courses for Canvas LMS [5], mainly for two reasons. The Canvas course structure has significant expressivity, ensuring that most of the accreditation data can be represented within a Canvas course. In addition, Canvas LMS has been used at our university for a few years, meaning that we are familiar with its functionalities and course structure.

The rest of the paper is structured as follows. The next chapter presents other research in this field. Chapter three gives a description of Canvas LMS and its course format. Then, the ontology for accreditation documents is presented. Chapter five presents the software tool for generating courses in Canvas LMS. In the sixth chapter, a case study on a representative study program from the University of Novi Sad is presented. Finally, the last section gives the paper's summary and outlines plans for further research.

II. RELATED WORK
In this chapter, we present other research on the topic of generating courses.

In [6], the authors present a system for e-course generation based on a competency ontology. Their solution provides a deductive object-oriented knowledge base system for querying and reasoning on one side, as well as a planning system for generating courses. The focus is on personal evaluation planning, consisting of the initial state (current knowledge level), the goals (educational objectives) and the actions (available learning objects). The result of the system is course content in the form of a table consisting of direct links to learning objects such as web pages, PDF files, etc. However, there is no relation between the system and the roles and data defined in accreditation documents. In addition, a course is not generated in a format which can be imported directly into some LMS.

The system presented in [7] enables automatic course generation based on explicitly represented machine-readable instructional design templates. The authors have defined three inputs for course generation, i.e. an ontology of learning goals, learning resources, and instructional design templates. Instructional design templates provide an explicit and formal specification of the instructional strategy in a course. The generated course is given in an LMS-compliant format, but still the data structure and content are not conformant to accreditation documents.
III. CANVAS LMS
Canvas by Instructure is a new-generation open-source LMS with built-in cloud support [5]. It has been implemented using the Ruby on Rails web application framework. Canvas provides a wide range of e-learning functionalities and tools, such as managing curriculum, learning material, assessment, file sharing, scheduling, course organization, creating content collaboratively, information sharing, etc. Canvas courses are described by defining learning outcomes and a course structure represented through learning resource files organized into modules. The evaluation of the learning process is supported through specialized grading tools such as assignments, quizzes, and polls. Users have their own file repositories and can share experiences through discussions and a messaging system. Course organization and navigation can be customized, as well as student profiles. In addition to the predefined set of tools, Canvas may be extended by additional third-party applications and plug-ins.

A. Course format
Canvas LMS uses its own course format to represent the course entity. It is possible to import course data into the system from an archive file stored on disk. The proposed e-course generator generates files in the format recognized by Canvas LMS. The file is populated programmatically with the data fetched from the accreditation documents.

The Canvas course format is organized as a zip archive with the .imscc extension. The structure of this archive is presented in Figure 1. The archive contains XML and HTML files organized into folders, whereby the main imsmanifest XML file is located in the root folder. The course structure and settings are described in the imsmanifest XML file.

The root manifest tag contains three child tags: metadata, organizations, and resources. The metadata tag describes the course and the global meta-information of the imscc archive. The organizations tag defines course modules and the organization of their items through the course life cycle. Besides a title, an item may contain a file from the course resources. The resources tag stores metadata on all learning resources placed in the archive file hierarchy. Each learning resource is represented by a resource tag that specifies the resource identifier and resource type. In addition, a resource tag has a child tag file, which references a physical file within the course archive. Furthermore, different course sections can be described in more detail using XML configuration files stored within the course_settings folder. We are going to describe the most important configuration files from this folder.

The course_settings file contains course details and multiple settings such as course identification, file storage, grading scheme, license, visibility and course formats. Course content and structure are organized into modules which contain items of type header or attachment, respectively. All modules are grouped by the module tag in the module_meta XML file. Course evaluation methods are represented using learning outcome XML tags within the learning_outcomes file. The outcome of each evaluation method is described with the min/max points necessary to successfully master the course. Learning outcomes can be arranged by the learningOutcomeGroup XML tag. Announcements are defined within XML files stored in the announcement folder. Assignments are composed of HTML and XML settings files placed into folders whose names start with the prefix assignment. Additionally, an assignment group configuration file is located in the previously mentioned course_settings folder. Quizzes and assessment methodology are described within files placed in the quizzes and non_cc_assessments folders. The course home
page and other wiki pages are stored in the wiki_content folder.

The next section presents the ontology of accreditation documents that is used as a basis for the generation of e-courses in the Canvas LMS format.

IV. ONTOLOGY OF ACCREDITATION DOCUMENTS (ADO)
In the second chapter, we mentioned ADO and its role in SDDMS. Our software tool uses the knowledge stored within ADO to retrieve course data that will be exported in the Canvas LMS e-course format.

ADO represents a domain ontology for the semantic description of accreditation documents in Serbian higher education. The ontology has been designed as an extension of the generic document ontology presented in [9]. This generic ontology is based on the ISO 82045 family of standards. ADO allows advanced search and machine processing of accreditation data [2].

According to the guidance for creating accreditation documents in Serbian higher education [10], ADO consists of components that cover various topics such as the study program, teachers and teaching staff, the teaching process and so on. This paper's focus is set on the standards that classify data related to study programs, curriculum and teaching staff. The ontology classes and data properties that represent data from the standards are shown in Figure 2.

Figure 2. ADO semantic structure of the focused standards

The Curriculum Standard describes courses and their specifications such as title, type, semester, ECTS points, content, teachers, literature, course and evaluation methods, etc. The Teaching Staff Standard describes the teachers who are involved in the teaching process. Teaching staff are represented by their personal data, competences, publications and the courses that they teach. We have described these standards using the ontology classes and properties that can be seen in Figure 2.

Course generation requires mapping course information from the accreditation documents, represented by ADO, to the Canvas LMS course format. Basic information from ADO, like course title, type, code, etc., is bound to the courseTitle, courseType, courseCode, courseECTS, courseSemester, courseStatus, etc. Canvas course attributes. This information is stored in the course_settings.xml file. Course content and structure are described with the courseContent attributes, and their data are mapped to the module_meta.xml file. Methods for knowledge evaluation are defined through the mandatory or optional exam parts, which require a certain number of points for mastery. The combined data of the courseOutcome and knowledgeEvaluationMethods attributes produce the methods to determine the total score. The configuration of these course outcomes and knowledge evaluation methods is placed into the learning_outcomes.xml file. Other course information, such as teaching methods and goals, is bound to the courseGoal and courseMethods attributes, whose data are saved in the form of wiki pages under the wiki_content directory. All the mentioned XML configuration files are linked with the main imsmanifest.xml file and altogether represent the course imscc archive ready for import.

We must mention that not all data and relations among them can be successfully exported from our ADO knowledge base to a Canvas course. The reason is that a Canvas course is less expressive than our ADO model and it is not specifically designed to support all the data stored within the Serbian accreditation documents.

V. E-COURSE GENERATOR TOOL
In this chapter, we present a software tool that facilitates creating e-courses for Canvas LMS by supporting the automatic population of information from accreditation documents. The tool is implemented in the Java programming language as a console application. The tool components are shown in the UML component diagram in Figure 3.

The E-Course generator component uses the Jena Fuseki [11] server to provide semantic reasoning on ADO. These two components combined create a knowledge base on accreditation documents and their curriculum and teaching staff data. This knowledge base enables finding logical relations between teachers and courses from various accreditation documents. The tool retrieves the data stored in the ontology to generate a course that can be used in the Canvas LMS.

The E-Course generator component executes SPARQL queries over the Jena Fuseki server via a REST service. The obtained results contain, in XML form, the course data taken from the accreditation documents. In order to create a Canvas course, it is necessary to parse the retrieved results and extract the relevant data.

The result of the course generation process is a Canvas imscc archive which represents the Canvas course filled with the data retrieved from the accreditation documents. The course generation process is presented in Figure 4 in the form of a UML activity diagram. The generation process starts with defining the semantic conditions that will be used while searching for courses in the accreditation documents. After that, the generator uses the previously defined conditions to generate a SPARQL query. The results of the query executed on the Jena Fuseki server are stored in the results.xml file. Then, the tool parses the results file and creates course information, which is then transformed into records stored in the separate files that make up the imscc archive. Finally, the user can import the generated Canvas course archive into the Canvas LMS.
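The two core steps of such a generation pipeline (query results arriving in the W3C SPARQL XML results format, then extracted data packed into an imscc zip archive) can be sketched compactly. Note that the tool itself is implemented in Java; the standard-library Python sketch below only illustrates the flow, and the sample results document, variable names and manifest structure are simplified assumptions rather than the tool's actual formats.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

SR = {"sr": "http://www.w3.org/2005/sparql-results#"}

# A tiny stand-in for results.xml: the standard SPARQL XML results
# layout (sparql root, head with variables, results with bindings);
# the variable names are illustrative, not the tool's.
RESULTS_XML = """<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="courseTitle"/>
    <variable name="courseECTS"/>
  </head>
  <results>
    <result>
      <binding name="courseTitle"><literal>Databases</literal></binding>
      <binding name="courseECTS"><literal>6</literal></binding>
    </result>
  </results>
</sparql>"""

def parse_results(xml_text):
    """Extract one dict per <result>, keyed by binding name."""
    root = ET.fromstring(xml_text)
    return [
        {b.get("name"): b.find("sr:literal", SR).text
         for b in result.iterfind("sr:binding", SR)}
        for result in root.iterfind(".//sr:result", SR)
    ]

def build_imscc(course):
    """Pack course data into a minimal in-memory zip archive with an
    imsmanifest.xml at its root (a simplified manifest, not the full
    Canvas schema)."""
    manifest = ET.Element("manifest", identifier="course_manifest")
    meta = ET.SubElement(manifest, "metadata")
    ET.SubElement(meta, "title").text = course["courseTitle"]
    ET.SubElement(manifest, "organizations")
    ET.SubElement(manifest, "resources")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("imsmanifest.xml",
                    ET.tostring(manifest, encoding="unicode"))
    buf.seek(0)
    return buf

courses = parse_results(RESULTS_XML)
archive = build_imscc(courses[0])
with zipfile.ZipFile(archive) as zf:
    print(zf.namelist())  # ['imsmanifest.xml']
```

The real tool additionally writes the course_settings, module_meta and learning_outcomes configuration files described in Section III before the archive is imported into Canvas.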
Figure 3. Component diagram of the generating tool

VI. CASE STUDY
The proposed tool has been verified by generating twelve Canvas courses from the accreditation document for the Software Engineering and Information Technologies study program at the Faculty of Technical Sciences, University of Novi Sad.

The verification process consisted of two steps. First, we retrieved the data on courses via a SPARQL query. After that, the tool generated the selected courses in the Canvas LMS format.

The course search results are stored in the results.xml file, where the sparql tag is the root and contains the head and results tags. Variables within the head tag represent search attributes whose data are stored in the results tag. Every result tag within the results tag represents one course which

VII. CONCLUSION
The paper presents a software tool that facilitates creating e-courses for Canvas LMS. The tool automates the course creation process and populates the course with the data stored in the official accreditation documents defined for Serbian Higher Education. The tool relies on the accreditation document ontology, which represents the accreditation data. The ontology has been designed as an extension of a generic document ontology, based on the ISO 82045 family of standards, that enables different semantic services on a semantically driven document management system.

The main limitation of the proposed solution is the expressiveness of the Canvas e-course format, which does not enable representing all the data stored within the accreditation document ontology. This problem could be overcome if the Canvas LMS supported dynamic extensions of its course model. In this way, one could add additional information into the course, resulting in a course model which is more conformant to the accreditation data.

Besides the support for the dynamic extension of the course model, future work will be aimed at the integration of the proposed tool with the mentioned semantically driven
ACKNOWLEDGMENT
Results presented in this paper are part of the research
conducted within the Grant No. III-47003, Ministry of
Education, Science and Technological Development of the
Republic of Serbia.
REFERENCES
[1] Szabo M. (2002), "CMI Theory and Practice: Historical Roots of
Learning Management Systems", Proceedings of E-Learn: World
Conference on E-Learning in Corporate, Government, Healthcare,
and Higher Education 2002 (pp. 929-936), Chesapeake, Virginia.
[2] Nikolic N., Savic G., Molnar R., Gostojic S., Milosavljevic B.,
"Management of Accreditation Documents in Serbian Higher
Education using ontology based on ISO 82045 standards", 6.
International Conference on Information Science and Technology
(ICIST), Kopaonik, 28. Februar - 2. Mart, 2016, ISBN 978-86-
85525-18-6, pp. 148-153.
[3] Gostojić, S., Sladić, G., Milosavljević, B., Zarić, M., Konjović, Z.
(2015), “Semantic Driven Document and Workflow Management”,
International Conference on Applied Internet and Information
Technologies (ICAIIT 2014), October 24th, 2014, Zrenjanin,
Serbia, pp. 229-234.
[4] Berners-Lee, T., Hendler, J., Lassila, O. (2008), “The Semantic Web",
Scientific American.
[5] Canvas Guides by Instructure, https://guides.instructure.com/.
[6] Kontopoulos, E., Vrakas, D., Kokkoras, F., Bassiliades, N.,
Vlahavas, I., An ontology-based planning system for e-course
generation, Expert Syst. Appl. 35, no. 1-2, 2008.
[7] Savić G., Segedinac M., Konjović Z.: Automatic Generation of E-Courses Based on Explicit Representation of Instructional Design, Computer Science and Information Systems (ComSIS), 2012, Vol. 9, No. 2, pp. 839-869, ISSN 1820-0214.
[8] Savić G., Segedinac M., Konjović Z.: The Implementation of the IMS LD E-course Generator, E-Society Journal, 2012, Vol. 2, No. 1, pp. 121-131, ISSN 2217-3269.
[9] R. Molnar, S. Gostojić, G. Sladić, G. Savić, Z. Konjović,
“Enabling Customization of Document-CentricSystems Using
Document Management Ontology”, 5. International Conference on
Information Science and Technology (ICIST), Kopaonik, 8-11
Mart, 2015, pp. 267-271, ISBN 978-86-85525-16-2.
[10] Ministry of Education and Sport, “Akreditacija u visokom
obrazovanju (In Serbian)”, Ministry of Education and Sport of the
Republic of Serbia, [online] Available at:
http://www.ftn.uns.ac.rs/1379016505/knjiga-i-uputstva-za-
akreditaciju.
[11] Apache Jena, Triple store Fuseki,
https://jena.apache.org/index.html
Abstract— This paper proposes a new approach to linking existing learning management systems with ontology-based knowledge. The ontology is the central part of the system, as it contains knowledge represented as the rules used during the generation of teaching courses. Stream processing platforms are used as bridges between the LMS and the ontology: they query data from LMSs, process the data, and insert information about the available teaching materials into the ontology. With this approach, existing LMSs can be extended with the ability to generate teaching courses without modification.

I. MOTIVATION
The main goal of this research was to propose a plugin for existing learning management systems (such as Canvas LMS, which is used in this paper) that provides the functionality of generating educational courses by using an ontology and stream processing frameworks.
The proposed plugin comes with a UI that enables users to define their intended outcomes. The outcomes serve as the starting point in the process of course generation. The plugin enables users to obtain educational resources collected from learning management systems and arranged in the order that corresponds to the generated course.

II. RESEARCH QUESTIONS
In traditional educational settings, a user creating a new educational course had to combine educational resources manually. First, the user has to find appropriate educational resources that will lead to the achievement of the stated educational objectives. After that, he or she has to sort the educational resources into the appropriate order, thus forming a learning path. In such an approach, the appropriate educational resources are discovered based on their name, date of creation, content, amount of time required to learn them, etc. If it is a student who designs his or her educational course without a teacher's help, he or she proceeds through the learning process step by step, often without knowing whether the resources are going to lead to the intended outcomes or whether they have missing preconditions.
Educational resources can be mapped to the set of educational objectives, and using ontologies enables users to achieve such a mapping easily. The plugin that we propose in this paper enables educational resources to be organized automatically in accordance with the organization of the objectives, thus facilitating the building of appropriate learning paths. The plugin does not require educational resources to be stored in one particular LMS, but allows users to form learning paths that spread across multiple learning management systems.

III. METHODOLOGY
The research presented in this paper consists of developing a prototype plugin that enables automatic building of learning paths for stated educational outcomes. The plugin is composed of three main modules:
● The data creation and manipulation module,
● The ontology,
● The user interface.

A. Data creation and manipulation module
The first module is made of two components. The data creation component connects to the data stores of the existing learning management systems (since we are dealing with Canvas LMS in this paper, the prototype plugin connects to its PostgreSQL database), listens for the event of new files being created, and sends the relevant data to a Kafka topic. The data manipulation component is a Samza job that processes incoming messages from the mentioned Kafka topic. Processed messages are stored into the plugin's store (MongoDB is used in the current prototype implementation).

B. Ontology
Reference [1], "A vocabulary for exposing IEEE LOM metadata as Linked Open Data" from the University of Alcala, is used as the base for the ontology in this paper. The ontology maps IEEE LOM elements, a metadata standard for educational contents, to RDF ones based on Linked Data principles. The ontology is intended as a bridge between educational metadata and Linked Open Data (LOD) and represents the knowledge required for the creation of educational courses.
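The event-filtering step performed by the data manipulation component can be sketched as a plain function (this is an illustrative sketch, not the actual plugin code; the field names are hypothetical, not the real Canvas LMS schema):

```python
# A sketch of the data manipulation step: a message about a newly
# created LMS file is reduced to the fields the plugin stores.
# Field names here are hypothetical, not the real Canvas LMS schema.

def transform_file_event(message):
    """Keep only the data the plugin needs; drop unrelated events."""
    if message.get("event") != "file_created":
        return None  # exclude messages that are not file-creation events
    return {
        "file_id": message["id"],
        "name": message["display_name"],
        "course_id": message.get("course_id"),
        "annotated": False,  # newly arrived files start unannotated
    }

# In the prototype, logic of this kind would live in the Samza job's
# process() method, with the returned document written to MongoDB.
event = {"event": "file_created", "id": 7,
         "display_name": "intro.pdf", "course_id": 101}
doc = transform_file_event(event)
```

Separating the filtering logic from the transport (Kafka, MongoDB) in this way is what allows the same transformation to be reused when further LMSs are connected.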
In the ontology, the annotated educational resources are stored; therefore, by issuing SPARQL queries, the execution engine generates learning courses.

C. User interface
The user interface (UI) can be implemented totally independently of the rest of the pipeline; it only needs to be able to issue REST calls. In this paper, the UI is an Angular client application through which a user can select the starting and desired educational objectives and preview the educational resources proposed in the generated course. The UI also provides a page view for showing newly added resources and enables users to annotate the resources with the educational objectives.

IV. SOLUTION
Data creation in the prototype plugin is achieved by using Kafka Connect. Kafka Connect is a stream processing framework that provides the ability to connect existing systems to Apache Kafka. It provides two operations:
● Sink – storing data from Kafka into the system,
● Source – exporting data from the system to Kafka.
Since our prototype plugin supports only Canvas LMS, and Canvas LMS uses PostgreSQL for storing all information about users, files, events and activities, in this research we use only the Kafka Connect source for PostgreSQL, which already comes with Confluent kafka-connect-jdbc. Confluent kafka-connect-jdbc supplies sinks and sources for many widely used systems, and a user can easily implement their own based on the existing ones if needed.
Apache Kafka is used as the messaging platform. It is a distributed streaming platform that enables horizontal scaling across multiple servers. Messages that arrive in Apache Kafka topics represent the creation and upload of new files into the Canvas LMS. Each arrival of a message in Apache Kafka triggers the Apache Samza job's process method.
Apache Samza is in charge of manipulating the incoming data. Depending on the data created by Kafka Connect, an Apache Samza job can enrich, modify or exclude unnecessary data. Modified messages are sent to MongoDB. Added entities in MongoDB represent unannotated files. The files that are used in the creation of new educational courses are the ones that have been annotated with educational objectives.
The ontology, stored on an OpenLink Virtuoso server, holds the entire logic behind the creation of learning paths. The plugin uses the ontology because it stores all annotated materials within itself, and with a single SPARQL query the ontology produces the desired learning path.

A. Extension
To connect multiple LMSs to the plugin, a separate Kafka Connect job is necessary for each LMS. Each LMS stores incoming materials differently, using different databases; therefore, data extraction differs. Kafka Connect can be configured to extract data from different databases. As Kafka Connect only connects existing systems to Kafka, all data transformations need to be inside the Apache Samza job. The entire transformation logic can be implemented inside one Apache Samza job or divided into multiple jobs. Transformed data is stored in the same MongoDB collection regardless of the source LMS. Also, the system treats all data the same regardless of origin.

REFERENCES
[1] "IEEE Standard for Learning Object Metadata", Institute of Electrical and Electronics Engineers Standards Association, New York, vol. 1484, pp. 21, January 2002.
[2] C. Brooks, G. McCalla, "Towards flexible learning object metadata," IJCEELL, vol. 16, February 2006.
[3] E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, "Interlinking Educational Resources to Web of Data through IEEE LOM", Information Engineering Research Unit, Computer Science Department, University of Alcala, 28805, pp. 02, February 2015.
[4] E. Juergens, M. Felikas, M. Herrmannsdoerfer, F. Deissenboeck, R. Vaas, and K.-H. Prommer, "Feature profiling for evolving systems", Program Comprehension, IEEE 19th International Conference, June 2011.
[5] A. El Saddik, A. Ghavam, S. Fisher, R. Steinmetz, "Metadata for Smart Multimedia Learning Objects," IEEE Transl. J. Magn. Japan, vol. 2, pp. 740-741, August 1987 [Digests 9th Annual Conf. Magnetics Japan, p. 301, 1982].
and possibly not consistent. New proofs can appear at any time, can have any content, and can change the truth of any existing statement. Additionally, the system may not have sufficient resources (in terms of time, storage space, etc.) to consult its entire knowledge to solve the problem. Therefore, it is possible that the system does not apply the full set of rules for executing logic. The whole logic is organized into nine levels [9]:
NAL-1: Inference rules for inheritance.
NAL-2: Similarity, instance, and property copulas.
NAL-3: Compound terms.
NAL-4: Arbitrary relations among terms.
NAL-5: Higher-order statements.
NAL-6: Variables.
NAL-7: The concept of time.
NAL-8: Support for operations, i.e. procedural statements.
NAL-9: Self-control and self-monitoring.
Each level introduces additional grammar and execution rules, and extends the expressivity of the logic and the abilities of the systems that are based on it. The current version of Siebog implements the first four levels of NAL within DNARS [1].
DNARS is a reasoning system based on NAL and on the general principles of the development of non-axiomatic reasoning, in combination with contemporary approaches and standards for the processing of large amounts of data [1]. DNARS distributes reasoning among a cluster of computers in order to improve performance. It incorporates an efficient knowledge management system and a set of inference engines for answering questions and deriving new knowledge. DNARS also uses a graph-based representation of knowledge [20][21].
When defining the architecture of DNARS, special attention was dedicated to the organization of the knowledge base. The base is designed as a distributed, scalable architecture which consists of three levels. At the lowest level, all knowledge is fragmented and distributed to multiple computers in a cluster. So-called horizontal scaling is used: as the amount of knowledge increases, the system's performance is maintained by simply adding new computers to the cluster [22]. This approach has two main advantages:
● The knowledge base can contain and manage large amounts of data. By introducing appropriate rules for data distribution, fast finding and retrieval of the relevant statements is possible.
● Fault-tolerance is enabled because the data is replicated across the computer cluster, i.e. there is no single point of failure, and the system can continue to operate regardless of hardware or software faults.
The entire knowledge is then divided into one or more knowledge domains. Domains organize data from the lower level into logical categories. During runtime, the system may consult one or more domains. This organization also supports sharing data between clients. The knowledge that belongs to one client can be placed in one domain. However, more clients may work with the same domain (one or more), and thus can share knowledge and acquired experience.
Finally, the short-term memory is located on the third, highest level of abstraction. It contains the knowledge that is directly available to the rules for execution, and it is the starting point for reasoning in DNARS. The main task of the short-term memory is to serve as a module for runtime optimization. Its content should fit fully into the working memory of one machine, providing maximum performance [1].
The knowledge bases based on NAL statements can actually be represented using the so-called property graph. A property graph is a directed, multi-relational graph with any number of properties attached to vertices and edges [20][21]. That is, it is a graph which can include different types of edges (e.g. for representing inheritance and similarity), and in which each vertex or edge can have any number of key → value pairs attached to it.

III. SIEBOG AGENT MIDDLEWARE
The Siebog multiagent middleware combines modern principles of server and client system development into a single software framework for agents. On the server side [5][7], Siebog offers load-balancing of agents across cluster nodes, as well as resistance to hardware and software faults (fault-tolerance). On the client side [8], Siebog functions as a platform-independent system designed so that it can be executed on a number of devices, such as classical desktop computers, smartphones, tablets, smart TVs, etc. The server and client sides are integrated into a single framework that delivers cross-platform agent communication, heterogeneous mobility, as well as code sharing. Finally, Siebog is connected to DNARS in order to enable the development of intelligent multiagent systems with unique capabilities.
The Siebog multiagent platform has been extended to include support for reasoning based on DNARS [1]. Since annotations are the standard meta-programming constructs of Java EE, the same approach can be applied for defining server-side DNARS agents. A set of annotations for marking agent goals is offered to software developers. Annotation-based development of intelligent agents has also been used elsewhere. Currently, Siebog agents that rely on DNARS reasoning are programmed in terms of beliefs and actions. The three main annotations for belief and action management are @Beliefs, @BeliefAdd, and @BeliefUpdated. In the background, the system then automatically translates the signals sent from DNARS's event manager into calls of the appropriate methods. The integration of the client-side Siebog with DNARS was implemented in the same way as the integration with the server side of Siebog, i.e. through communication with the appropriate web services. In terms of support for intelligent agents, Siebog has left the traditional BDI architecture. Instead, it offers the possibility of developing intelligent multiagent systems which involve agents with innovative reasoning skills.
In this way, Siebog and DNARS represent a unique multiagent framework for the development and execution of intelligent agents in Web environments, and offer new and interesting ways for the practical application of agent technology.
The agent-oriented domain-specific language ALAS has been extended with new language constructs to support the new
distributed system for non-axiomatic reasoning. The original purpose of the ALAS constructs was to hide much of the complexity of the agent code but, on the other hand, to be sufficient to show how the agent works. In this way, the developer can focus on solving a specific problem [4]. The goal of the language is to allow mobility of agents, i.e. moving and executing agents on different platforms and different virtual machines. The idea is that all agents (client-side and server-side) can be created in the same programming language and then be transformed into the appropriate language of the target platform during migration. This paper describes one example of transformation of the ALAS code to Java code (Listing 2), because the server side of Siebog is implemented in Java. The process of upgrading ALAS to support the client side (JavaScript) is in progress, too. Reference [23] describes a previous version of ALAS which was implemented in Xtext, and the process of conversion of the ALAS code to JavaScript using the Google Web Toolkit (GWT).
The latest version of ALAS was developed with the support of the textX framework [15]. TextX is a meta-language and a tool for building domain-specific languages in Python. It is built on top of the Arpeggio PEG (Parsing Expression Grammar) parser [17], and takes away from language designers the burden of converting parse trees to abstract representations. From a single grammar description, textX constructs an Arpeggio parser and a meta-model at run-time. The meta-model contains all the information about the language and a set of Python classes inferred from the grammar rules. The parser will parse programs/models written in the new language and construct a Python object graph, a.k.a. the model conforming to the meta-model. Notable features of textX are: (1) construction of the meta-model and parser from a single description, (2) automatic resolving of model references, (3) meta-model and model visualization (using the GraphViz dot tool), (4) repetition expression modifiers, (5) grammar rule modifiers, (6) case sensitive/insensitive parsing (configurable per meta-model), (7) whitespace handling control (per meta-model and per grammar rule), (8) direct support for language code comments (comments are treated as whitespace), (9) object and model post-processing, (10) grammar modularization, and (11) extensive error reporting and debugging support.
The syntax of the language is defined using a textX grammar that consists of a set of rules [14]. Since textX is based on the Arpeggio parser, PEG notation for describing grammars is at the base of the ALAS grammar. The main advantage of PEG grammars compared to some other grammars, for example context-free grammars (CFG), is the use of the ordered choice operator, which enables uniformity of parsing. If the input text belongs to the language described by the given PEG, then there is only one valid tree that describes it, i.e. the grammar cannot be ambiguous. The extension of a language grammar file is .tx. There are several .tx files that together form the grammar of the ALAS language.
Besides the modification of existing properties and rules to accommodate textX, new rules were added to the language grammar that allow agents to support DNARS, as well as rules for programming flow control. Two basic rules that are used to support DNARS are DnarsBeliefsAnnotation and DnarsBeliefAUAnnotation. Listing 1 shows a part of the definition of these rules.

DnarsBeliefAUAnnotation: annotation = BeliefAU '{'
    (body=Body | body=GraphInclude)*
'}';

DnarsBeliefsAnnotation:
    'beliefs' '{' Judgement (',' Judgement)* ';' '}';

BeliefAU: ann_name = 'beliefadd' |
    ann_name = 'beliefupdated' belief = BeliefAUDeclaration;

BeliefAUDeclaration: '(' (belief=Judgement ',')? func_param=ID ')'

Judgement: term = Term copula = Copula term = Term truth = Truth?;

Term: term = AtomicTerm | term = CompoundTerm;

Copula: copula = Inherit | copula = Similar;

Truth: '(' number = NUMBER ',' number = NUMBER ')';

AtomicTerm: ID;

CompoundTerm: PrefixCompTerm | InfixCompTerm | ImgCompTerm;

PrefixCompTerm: '(' ct=CompTerm term+=AtomicTerm ')';

InfixCompTerm: '(' term=AtomicTerm ct+=CompTerm ')';

ImgCompTerm: '(' image = Image term = AtomicTerm Option1 | Option2 ')';

CompTerm: connector=Connector term=AtomicTerm;

Listing 1. A part of the ALAS grammar

With these rules the appropriate ALAS constructs are defined: beliefs, beliefadd and beliefupdated. In the Java code, these constructs will be converted into the annotations @Beliefs, @BeliefAdd and @BeliefUpdated, as can be seen in Listing 2.
A language model (an agent), like the language grammar, can be written in a plain text editor without depending on any development environment. By using the appropriate commands, the syntax can be checked from the command prompt (shell) to see whether the model corresponds to the grammar and whether there are syntax errors in the written agent. Besides the command prompt (shell), the check can also be performed by writing and running an appropriate Python script.
During the execution of the appropriate command, a parser, a parse tree and a meta-model of the language are created. The parser checks the validity of the grammar and the compatibility of the model (agent) with the meta-model (grammar). If the model corresponds to the meta-model, the console prints messages that the model and the meta-model are OK; if there is an error, the console prints the error and its exact location.
After that, by typing automatically proposed commands, both the concrete syntax tree and the abstract syntax tree are generated; pictures representing them are placed in the corresponding directory. Creating the appropriate diagrams is supported by the GraphViz library [24].
One of the main reasons for using textX is the PEG parser on which it is based. Using this parser, it is easy to access any segment of the language model, which is very useful when it is necessary, for example, to transform a text model (ALAS code) into some other programming language.
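The ordered-choice behaviour discussed above can be sketched with a tiny hand-written matcher (a pure-Python toy, not textX or Arpeggio code): alternatives are tried left to right and the first match wins, so a given input can never yield two different results.

```python
# A toy illustration of PEG's ordered choice: alternatives are tried
# in order, and the FIRST successful alternative is taken, which is
# what makes PEG grammars unambiguous by construction.

def choice(*parsers):
    """Return a parser combining alternatives with PEG's '|'."""
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result  # first success wins; the rest are never tried
        return None
    return parse

def literal(s):
    """Return a parser matching the exact string s at position pos."""
    def parse(text, pos):
        if text.startswith(s, pos):
            return (s, pos + len(s))
        return None
    return parse

# 'beliefadd' is listed before 'belief' on purpose: with ordered
# choice, listing 'belief' first would match and leave 'add' behind.
keyword = choice(literal("beliefadd"),
                 literal("beliefupdated"),
                 literal("belief"))
```

For example, `keyword("beliefadd(x)", 0)` matches the whole keyword `beliefadd`, while `keyword("beliefs {", 0)` falls through to the `belief` alternative; the ordering of alternatives fully determines the outcome.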
beliefs {
    tiger -> cat (1.0, 0.9),
    cat -> animal (0.9, 0.6);
}

beliefadd(tiger->animal, belief) {
    log('tiger is a type of animal: {}', belief);
    graphinclude;
}

@Beliefs
public Statement[] initialBeliefs() {
    return new Statement[] {
        "tiger -> cat (1.0, 0.9)",
        "cat -> animal (0.9, 0.6)"
    };
}

@BeliefAdd(subj = "tiger", copula = "->", pred = "animal")
public void beliefAdd(List<Statement> belief) {
    LOG.info("tiger is a type of animal: {}", belief);
    graph().include(belief.toArray(new Statement[0]));
}

Listing 2. A part of the ALAS code and the appropriate Java code
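The kind of text-to-text rewriting that Listing 2 illustrates can be sketched in a few lines of Python. This is a deliberately simplified, hypothetical sketch: the real ALAS compiler operates on the textX model rather than on raw strings, and its generated Java differs in detail.

```python
import re

# Hypothetical sketch of an ALAS-to-Java rewrite for the 'beliefs'
# construct only; the actual compiler works on the parsed model.

def beliefs_to_java(alas_src):
    """Turn an ALAS 'beliefs { ...; }' block into a @Beliefs method."""
    match = re.search(r"beliefs\s*\{(.*?);\s*\}", alas_src, re.DOTALL)
    if match is None:
        raise ValueError("no beliefs block found")
    # Judgements end with a truth value '(f, c)', so split after ')'
    # rather than on every comma.
    parts = match.group(1).split("),")
    judgements = [p.strip() + ")" for p in parts[:-1]] + [parts[-1].strip()]
    body = ",\n".join('        "%s"' % j for j in judgements)
    return ("@Beliefs\n"
            "public Statement[] initialBeliefs() {\n"
            "    return new Statement[] {\n"
            + body + "\n    };\n}")

alas = "beliefs { tiger -> cat (1.0, 0.9), cat -> animal (0.9, 0.6); }"
java = beliefs_to_java(alas)
```

The same pattern (locate a construct, extract its parts, emit target-language text) would apply per construct; a template engine such as Jinja2 [25] is the more usual way to emit the target code.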
Organizational Science Development internationalization and cooperation, Portorož, pp. 1037-1043, 2015.
[13] Xtext, http://www.eclipse.org/Xtext/documentation/, on December 10, 2016.
[14] TextX, http://www.igordejanovic.net/textX/, on December 4, 2016.
[15] I. Dejanović, R. Vaderna, G. Milosavljević and Ž. Vuković, "TextX: A Python tool for Domain-Specific Languages implementation", Knowledge-Based Systems, vol. 115, pp. 1-4, 2017.
[16] Arpeggio, http://www.igordejanovic.net/Arpeggio/, on December 4, 2016.
[17] I. Dejanović, G. Milosavljević and R. Vaderna, "Arpeggio: A flexible PEG parser for Python", Knowledge-Based Systems, vol. 95, pp. 71-74, 2016.
[18] R. Smith, "Aristotle's Logic", The Stanford Encyclopedia of Philosophy, URL https://plato.stanford.edu/entries/aristotle-logic/, on December 5, 2016.
[19] F. Sommers and G. Englebretsen, "An invitation to formal reasoning: the logic of terms", Ashgate Publishing Ltd, 2000.
[20] I. Robinson, J. Webber and E. Eifrem, "Graph databases", O'Reilly Media, Inc., 2013.
[21] M. A. Rodriguez and J. Shinavier, "Exposing multi-relational networks to single relational network analysis algorithms", Journal of Informetrics, vol. 4, pp. 29-41, 2010.
[22] M. Michael, J. E. Moreira, D. Shiloach and R. W. Wisniewski, "Scale-up x scale-out: A case study using Nutch/Lucene", IEEE International Parallel and Distributed Processing Symposium, pp. 1-8, 2007.
[23] D. Sredojević, M. Vidaković, D. Okanović, D. Mitrović and M. Ivanović, "Conversion of the agent-oriented domain-specific language ALAS into JavaScript", 5th Symposium on Computer Languages, Implementations and Tools, AIP Conference Proceedings, vol. 1738, Rhodes, Greece, 2015.
[24] Graphviz, http://www.graphviz.org/, on December 10, 2016.
[25] Jinja2 - The Python Template Engine, http://jinja.pocoo.org/, on December 10, 2016.
Abstract—Although web browsers were initially designed for displaying simple documents, they have gradually evolved into complex pieces of software. Modern browsers are expected to provide support for complex web applications, which often compete with native applications. By understanding how the browser displays content, developers can implement efficient architectural solutions and more easily overcome obstacles inherited from the browsers' early design. They can efficiently utilize hardware resources and provide a user experience comparable to that of native applications. The purpose of this paper is, therefore, to present concisely the architecture of modern browsers, and to suggest some of the most effective methods for optimizing web applications, with a focus on networking, rendering and JavaScript.

I. INTRODUCTION
When introduced in the early nineties, the web browser was designed for displaying simple web documents. However, along with the evolution of the Web that followed, the browser also evolved into a complex piece of software. Instead of merely displaying web content, modern browsers are capable of providing an environment for running complex applications with rich user interfaces, with many transitions and animations, multimedia content and interactions with the user.
Despite using relatively complex content generation and delivery models, and already being platform-independent, easily accessible and customizable, web applications are also very often expected to provide performance and user experience similar to native applications. This is, however, hard to achieve, especially because web applications are often subjected to inefficient utilization of client-side hardware resources, which is partly also due to the browser's design inherited from the early design of the web. However, many of these inefficiencies can be avoided by exploiting the knowledge of how the browser handles documents.
Since many of the browser vendors are unwilling to disclose detailed information on the components of their browsers, and the implemented mechanisms are subject to frequent changes, it is difficult to find topical literature. The purpose of this paper is, therefore,
● to present concisely the architecture of modern browsers, and
● to suggest optimizations of the common aspects of web applications that enable relatively straightforward performance improvements.
We especially focus on the improvements that affect user experience, which has become one of the key competitive advantages in the software market. As they represent the sources where most of the optimizations can be carried out, we focus on improving the application parts that rely on the browser's network, rendering and JavaScript components.

II. THE BROWSER
A typical modern web browser consists of the following components [1]:
● user interface, which includes all the visible elements of the browser, such as the address bar and navigation buttons, except the main window showing the web page content;
● browser engine, which coordinates the user interface and rendering;
● rendering engine, which renders the content. If it is written in HTML (HyperText Markup Language), the engine parses, interprets and displays the content; otherwise it uses one of the third-party plug-ins (e.g. in the case of Macromedia Flash);
● user interface backend, responsible for rendering the content using visual components provided by the underlying operating system (buttons, form elements, etc.). As a consequence, web pages can look different on different operating systems even if they are displayed using the same browser;
● networking component, which is responsible for the transfer of content from and to the web server;
● JavaScript engine, which runs the client-side scripts written in the JavaScript language;
● local data storage, which stores the client-side application data using web cookies or various other technologies, such as localStorage and IndexedDB.
The interaction between the components is illustrated in Figure 1.

III. THE OPTIMIZATIONS
A. Network
Timely transmission of content from the server is one of the major factors that affect user experience. If transfer times are too long, the application may appear unresponsive. The two most problematic limitations that may affect application responsiveness are the limited network bandwidth and the limited number of connections the browser can maintain with the same source. To alleviate these problems, we can optimize:
● the size of the resources (using compression and minimization),
● the number of HTTP (HyperText Transfer Protocol) requests, and
the resource loading strategy (using separate transferred separately from the web server. If many HTTP
application domains and preloading). requests are needed to display a web page, some of them
will have to wait as the browser limits the number of
simultaneous connections.
One way to reduce the number of image transfer
requests is using the so-called CSS sprites. This method
includes combining multiple images into a single one and
transferring it as such to the browser. Then, by using the
CSS (Cascading Style Sheets) language, the individual
images of the composite image are displayed within the
web page. In this way, we achieve the same effect as if
transferring the images separately but save a lot of time
because a single HTTP request to the server is required.
An example of CSS sprites can be found in [3].
Many web applications use cookies for a variety of
purposes, such as for identifying and analyzing users etc.
Once created, a cookie is sent to the server within each
HTTP request to a particular domain. Although it does not
contain a lot of data, it is ineffective to send a cookie
Figure 1: Browser components and their interaction when not needed, for example when requesting static
content, such as images and scripts. To avoid sending
1) Reducing the size of resources cookies, such content can be served from another domain
HTTP compression can be used in order to make the which does not use cookies. On the other hand, loading
connection between a server and a browser optimally resources that can block the displaying of a web page
utilized. The compression works by compressing the data the server sends to the browser with a lossless compression algorithm that both the browser and the server understand. In this way, the amount of data transferred is reduced and, consequently, the data throughput is increased.

When the browser requests a resource from the server, it advertises the compression methods it supports in the Accept-Encoding field of the HTTP request (for example: "Accept-Encoding: gzip, deflate"). If the server supports one of the advertised methods, it compresses the contents with the appropriate algorithm and informs the browser about the compressed content using the Content-Encoding field of the HTTP response (for example: "Content-Encoding: gzip") [2].

The data throughput can be further optimized by removing redundant content from the documents, which does not affect the functionality of the web application. One example is removing the redundancy in JavaScript documents by stripping whitespace, comments and unused code, and using short names for variables and functions. This process, known as minification, is similar to lossless compression with one fundamental difference: it does not require the reverse process. Several tools for minifying JavaScript code exist, including UglifyJS, YUI Compressor, Google Closure Compiler and JSMin. The disadvantage of minification is poor code readability, which can, however, be improved by browser tools known as prettifiers.

Web applications often include a large number of images. If their size is not optimized, the transfer also includes unnecessary data, which consequently prolongs the page loading time. It is therefore recommended to use lossless or even lossy image compression methods. Lossy image compression can be more effective than the lossless type; however, it has a greater impact on image quality.

2) Optimizing the HTTP requests
As the images are stored separately from the HTML documents within which they are displayed, they are also

from another domain (e.g. CSS and some JavaScript documents) can have quite the opposite effect to the one intended. This is due to the fact that transferring resources from another domain requires a new DNS (Domain Name System) query, TCP (Transmission Control Protocol) connection and TLS (Transport Layer Security) handshake (when using the HTTPS (HTTP Secure) protocol), which increases web page display times.

3) Resource loading strategies
If there is a high probability that a user will need or access particular resources of the application, such resources can be loaded in advance in a process known as preloading. Preloading commonly used content enables a more fluent interaction with the web application. When needed, the preloaded content is, namely, already available to the browser and can therefore be displayed almost instantly, which significantly improves the user experience.

The types of preloading include link prefetching, DNS prefetching and prerendering. The selected preloading type can be indicated by the corresponding value of the rel attribute ("prefetch", "dns-prefetch" or "prerender") of the link element referring to the HTML-external resource [4].

B. Rendering
When the browser receives the requested content, it passes it to the rendering engine. In popular open source rendering engines, such as Gecko and WebKit, the rendering is performed as follows [1]:
1. HTML and CSS documents are transformed into a document object model (DOM);
2. The dimensions and the layout of the visual elements are computed;
3. The render tree containing a visual representation of the document is constructed;
4. The elements of the render tree are painted – displayed on the screen.
Figure 2 presents the process of displaying a document using the example of the WebKit rendering engine.
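As a rough illustration of the lossless compression step described in section 1, the following sketch uses Python's standard gzip module, the same DEFLATE-based family of algorithms a server applies before answering with "Content-Encoding: gzip"; the HTML payload is made up purely for the example:

```python
import gzip

# A repetitive HTML body, standing in for a typical server response
# (hypothetical content, used only to illustrate the compression ratio).
body = ("<div class='item'>Lorem ipsum dolor sit amet</div>\n" * 200).encode("utf-8")

# What a server would do before answering a request that advertised
# "Accept-Encoding: gzip": compress the payload losslessly.
compressed = gzip.compress(body)

# The browser reverses the process after reading "Content-Encoding: gzip".
restored = gzip.decompress(compressed)

assert restored == body             # lossless: the round trip is exact
assert len(compressed) < len(body)  # fewer bytes travel over the network
print(len(body), "->", len(compressed))
```

Unlike minification, the compressed form is fully reversible, which is exactly the "reverse process" distinction drawn above.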
7th International Conference on Information Society and Technology ICIST 2017
executing them in the main thread usually results in an unresponsive user interface.
One solution is to use the so-called web workers. A shared worker can be used by any script within the domain, while a dedicated worker can only be used by the script that created the worker [5].
The code intended to be executed in a dedicated worker thread must be in a separate file, and the worker is created from the main script using the Worker constructor:

var worker = new Worker('worker.js');

To send some data from the main thread to a web worker, the postMessage method is invoked:

worker.postMessage('My value');

The code that listens for the messages sent from the worker back to the main thread is the following:

worker.addEventListener('message', function(e) {
  console.log('Worker says: ', e.data);
}, false);

There are two ways to terminate the web worker: the terminate method can be called in the main script, or the close method can be called within the worker itself.
Applications written in JavaScript can be very complex and can use many thousands of objects. Creating an object requires a certain amount of memory, which is released when the object is no longer in use and no longer referenced from other objects. Releasing the memory can mainly be done in two ways: manually or automatically. In the first one, a programmer manually removes objects from the memory. However, as this requires a lot of effort and discipline, most high-level languages rely on so-called garbage collectors, which automatically remove the waste objects from the memory. In this case, the execution environment determines which objects are suitable for removal, which may be a processor-demanding and, consequently, time-consuming task.
Applications with a large quantity of waste objects have a typical sawtooth-shaped memory usage over time, as shown in Figure 3. The sudden drops in memory usage correspond to the operation of the garbage collector. This can result in an increased load on the main thread and, consequently, in an unresponsive user interface.
When using plenty of objects of the same type, the problem can be solved with an object pool. Instead of constantly creating new objects, the existing ones that are no longer in use can be reused. As the object pool contains a constant number of objects, the memory usage is relatively constant over time (Figure 3). The implementation of the object pool is described in [6].
It should be noted that using an object pool does not actually decrease memory consumption; it only reduces the time and the resources spent on unnecessary removal and creation of objects. Using object pools can also have some side effects, such as slower script startup due to the initial pool initialization, and inefficient memory usage when the application does not require all the objects in the pool.

Figure 3. Memory usage over time. Top line: new objects are created when needed and are removed from the memory after use by the garbage collector. Bottom line: an object pool with a constant number of reusable objects is used.

IV. DISCUSSION AND CONCLUSION
The environment in which web applications are executed is constantly evolving. Some of the techniques mentioned in this paper will, perhaps, be obsolete in the near future. On the other hand, new challenges will emerge and new solutions will have to be developed. It is essential that developers understand the environment in which their applications are executed and utilize the full potential it offers.
The effect of the optimizations on application performance depends on many factors, including the content of the application, the quality of its design and code before the optimization, as well as the testing conditions. By implementing the presented optimizations in our sample Single Page Application, we were able to improve our application so that the performance metrics were within the margins established by the RAIL model [7]: all responses to user actions were below 100 ms, the rendering rate was above 60 Hz, and all the content was loaded in less than one second.
Our further work in this field will investigate how the improvements of application performance reflect in improvements in the actual user experience.

REFERENCES
[1] T. Garsiel and P. Irish, "How Browsers Work: Behind the scenes of modern web browsers", http://www.html5rocks.com/en/tutorials/internals/howbrowserswork/, accessed 7.4.2017
[2] R. Fielding et al., "Hypertext Transfer Protocol -- HTTP/1.1", June 1999, https://tools.ietf.org/html/rfc2616, accessed 7.4.2017
[3] w3schools.com, "CSS Image Sprites", https://www.w3schools.com/css/css_image_sprites.asp, accessed 7.4.2017
[4] Wikipedia, "Link prefetching", https://en.wikipedia.org/wiki/Link_prefetching, accessed 7.4.2017
[5] Mozilla, "Using Web Workers", https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers, accessed 7.4.2017
[6] SourceMaking.com, "Object Pool", https://sourcemaking.com/design_patterns/object_pool, accessed 7.4.2017
[7] M. Kearney, "Measure Performance with the RAIL Model", https://developers.google.com/web/fundamentals/performance/rail, accessed 7.4.2017
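The object pool pattern discussed above can be sketched as follows; the Particle class, the pool size and the reset logic are hypothetical choices for illustration, not the implementation from [6]:

```python
class Particle:
    """A reusable object; its fields are reset instead of reallocated."""
    def __init__(self):
        self.x = self.y = 0.0

class ObjectPool:
    """Keeps a fixed number of instances alive so the garbage collector
    has no short-lived waste objects to clean up."""
    def __init__(self, factory, size):
        self._free = [factory() for _ in range(size)]  # one-time startup cost

    def acquire(self):
        # Hand out an existing instance instead of constructing a new one.
        return self._free.pop()

    def release(self, obj):
        # Reset the instance and return it to the pool for later reuse.
        obj.x = obj.y = 0.0
        self._free.append(obj)

pool = ObjectPool(Particle, size=3)
p = pool.acquire()
p.x = 42.0
pool.release(p)
assert pool.acquire() is p  # the very same instance is reused
```

Note the trade-offs described above are visible even in this sketch: the pool pays an up-front construction cost and holds all three instances even when none is in use.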
Abstract— This paper presents an overview of the integrated framework that provides Python code generation based on the visual specification provided by the Kroki mockup tool. Kroki enables end-users to create a specification of the developed information system by sketching a user interface and to execute a working prototype based on that specification within seconds. Working prototype evaluation is extremely useful for identifying potential architectural errors that are the result of miscommunication between the parties involved in development, so having a functional prototype available enables such flaws to be corrected on-site, before the project advances to the implementation phase. In its first iterations, the Kroki mockup tool provided adaptive frameworks that generated working Java applications based on the user specification. This paper presents an additional Kroki framework that is used to generate Python web applications based on the Django framework. This additional module is introduced in order to provide more diversity among generated prototypes and to foster the acceptance of modern web technologies such as HTML5, responsive web design and Twitter Bootstrap.

I. INTRODUCTION
The main motivation of the Kroki project [5][6] is to enhance the software requirements elicitation process by narrowing the communication gap between the stakeholders and the development team [3][4]. The main goal is to provide a participatory workflow which would allow customers and developers to develop a negotiated information system specification within a few collaborative sessions. In the research conducted by the Standish Group [2] that included over 350 US-based software companies, the data gathered from over 8000 development projects showed that the main reasons for project failure are: the lack of user involvement, requirements incompleteness and changing requirements. The results clearly state that improvement of requirements quality is crucial. Another study, documented in the Information System Manifesto by James Martin [1], which clearly displays the magnitude of the impact of specification flaws, reports that 64% of errors originate in the software analysis and design phase, despite the fact that the client has formally signed the specification. The same author notes that 95% of project cost is spent on correcting those faults. According to the Agile manifesto, the best evaluation is achieved if it is based on something that works. With all this in mind, the Kroki tool provides a simple workflow that enables all parties, regardless of their technical skills, to create a specification of an information system and generate a working prototype with the click of a button.
Implementation of the latest Django web framework allows the developed prototypes to be up to date with modern web technologies and gives users a more realistic insight into the look and feel of the final product. The main goal of this part of the research is to replace the old and somewhat outdated web prototyping framework with one that suits modern web-based business information system requirements. Since the problem of integrating a new framework into the Kroki tool has been addressed in our previous publications, the main focus of this research is on the Kroki Django module alone. In order to achieve this, the main body of knowledge and technological solutions has been gathered by implementing the previous Java frameworks, but some platform specificities and challenges that modern web technologies bring also arose and needed to be addressed.

II. RELATED WORK
In order to implement a Kroki module for prototype execution, existing solutions in the field of rapid web application prototyping based on user interface mockups were considered. Some of the notable solutions are summarized here, while a more comprehensive overview is reported in the full paper.
FileMaker [9] is a professional tool for business application development that has been available for over 30 years. The tool is able to generate a working business application based on the database model and user interface specification. The tool sports a very attractive design and a myriad of predefined application templates in order to make development as easy as possible. Unlike the proposed solution, FileMaker is a commercial tool and it is not intended to be used as a prototyping tool.
SOLoist [10] is a Java-based model-driven development framework for rapid business application development. All model constructs and application elements are compliant with the underlying UML profile, called OOIS UML. It is currently the only available solution based on a UML profile or domain-specific language that we have come across, besides the KROKI tool. The main difference between SOLoist and KROKI is that SOLoist doesn't fully embrace agile
development practices in order to foster end-user participation.
All of the considered solutions can be organized into two main groups. The first are tools for business application development, which aim at automating the development process by generating as much programming code as possible and leaving the custom logic to be programmed manually. The other tools are used for software prototyping and were mainly able to generate just a basic user interface skeleton without any functionality. The Django module of the Kroki tool is designed to provide a hybrid solution that is able to produce a functional three-tier Python web information system serving for hands-on evaluation of the final product in the early phases.

III. KROKI TOOL
The Kroki tool [5][6] provides a streamlined participatory workflow which encourages client involvement in the design phase of software development and tends to reduce the time and cost of the overall project. In order to provide a more realistic insight into the developed project's appearance and functionality, users must be presented with a prototype that matches their expectations and the final product as closely as possible. With that in mind, executable prototypes and their corresponding frameworks need to be updated regularly according to current industry trends and standards. The Kroki mockup editor can be seen in Figure 1, while the built-in UML editor is displayed in Figure 2.
With this in mind, a new adaptive module based on current state-of-the-art technologies has been developed, which further enhances the quality of the prototypes generated from the Kroki specification. The resulting platform for the generated applications is the Python programming language and the Django web framework, which are mature and affirmed technologies for web development. On top of that, modern responsive web design guidelines have been followed using the Twitter Bootstrap 3 [12] framework and custom responsive jQuery modules. The major benefits of using Django as an underlying engine are its wide acceptance, maturity and widespread community, which make this technology the current industry standard. Django provides built-in database management features, such as synchronization and migration tools and integrated object-relational mapping, which, coupled with the framework's extensible template engine and HTML forms framework, make it a great foundation for modern data-driven web applications.
Besides modernizing the Kroki web prototype frameworks, a step has been made towards a more flexible extension of the supported languages and platforms for prototype execution. In order to provide the transformation from Kroki-generated XML files to a target Django project, parser and adapter modules have been developed. This modularization greatly decouples the mockup specification modules from the execution frameworks and enables the development of custom generators if the need for additional platform and language support arises in the future.

IV. DJANGO FRAMEWORK
Django [11] is a Python web framework aimed at providing a simplified and streamlined experience in modern web application development. In order to provide this, the framework is based on the MVC (Model-View-Controller) design pattern and ORM (Object-Relational Mapping) techniques. All application entities are described in the models.py file, which serves as a basis for database generation. The mapping is executed by running the Django migrate task, which updates the database according to the contents of the models file. At any moment, an administration web dashboard is available that provides basic manipulation of the generated entities.
Django URL mapping is specified in the urls.py file, which maps URL patterns to specific Python methods, called views. URL patterns are provided as regular expressions and the framework maps the first matched pattern to a corresponding view, or sends a 404 HTTP error code if no match is found. When the matching view has been invoked, its task is to process the client request,
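The first-match URL resolution described above can be mimicked outside Django with plain regular expressions; the patterns and view names below are hypothetical stand-ins for entries a generated urls.py would contain, and the sketch returns status strings instead of real HTTP responses:

```python
import re

def book_list(request):            # a "view": processes the client request
    return "200 book list"

def book_detail(request, pk):
    return "200 book %s" % pk

# Ordered (pattern, view) pairs, as a generated urls.py would declare them.
urlpatterns = [
    (r"^books/$", book_list),
    (r"^books/(?P<pk>\d+)/$", book_detail),
]

def dispatch(path):
    # Django-style resolution: the first matching pattern wins;
    # if nothing matches, the framework answers with HTTP 404.
    for pattern, view in urlpatterns:
        m = re.match(pattern, path)
        if m:
            return view("request", **m.groupdict())
    return "404 Not Found"

assert dispatch("books/") == "200 book list"
assert dispatch("books/7/") == "200 book 7"
assert dispatch("authors/") == "404 Not Found"
```

The named group `(?P<pk>...)` mirrors how Django passes captured URL fragments to a view as keyword arguments.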
Abstract—Nowadays we have various languages, frameworks and tools for the specification and implementation of distributed systems. Nevertheless, developing a robust distributed system still poses a challenging task. Starting from the definition that a distributed system is a system where networked components communicate via passing messages, we wanted to explore and analyse languages for specifying the architecture of such a system (Architecture Description Languages - ADLs), as well as languages for specifying protocols for communication between the system's components. Our main focus was to find out what the main similarities and differences are between different ADLs, and between languages for protocol specification.

I. INTRODUCTION
Coulouris et al. in [1] define a distributed system as one in which hardware or software components located at networked computers communicate and coordinate their actions only by passing messages. This definition covers an entire spectrum of systems in which networked computers can be used. These networked computers may be spatially separated by any distance. They may be in the same room or even on separate continents.
In order to design a distributed system, we must first understand what the fundamental building blocks of such a system are. In [1], Coulouris et al. state that to understand distributed systems, we must provide answers to the following questions:
• What are the entities that are communicating in the distributed system?
• How do they communicate?
• What (potentially changing) roles and responsibilities do they have in the overall architecture?
• How are they mapped on to the physical distributed

allowing a greater decoupling between entities. Entities do not need to be aware of each other (space decoupling), or even exist at the same time (time decoupling). Examples of such techniques are: group communication, publish-subscribe, message queues, tuple spaces, etc.
How entities in a distributed system communicate is described by a protocol. Holzmann in [2] provides a full protocol definition:
• it defines a precise format for valid messages (a syntax),
• it defines the procedure rules for the data exchange (a grammar),
• it defines a vocabulary of valid messages that can be exchanged, with their meaning (a semantics).
A protocol is often most easily understood as a finite state machine (FSM) [2]. The FSM is a 5-tuple (X, I, O, N, M), where
• X is a finite set of states,
• I is a finite set of inputs,
• O is a finite set of outputs,
• N is a state transition function (N: X × I → X),
• M is an output function (M: X × I → O).
N and M define the behaviour of the FSM. For a given input, the output function will generate the output, and the state transition function will generate the new state of the FSM.
There are several other ways of representing protocols, such as Petri nets and FIFO nets. Their best attribute is simplicity, but as the size of the model increases, readability quickly becomes an issue.
The languages analyzed in this paper can be divided into two groups:
• Architecture description languages, and
• Languages for protocol specification and validation
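The FSM view of a protocol can be made concrete with a small sketch; the states, inputs and outputs below model a hypothetical stop-and-wait style sender and are not taken from any of the analysed languages:

```python
# A protocol FSM as the 5-tuple (X, I, O, N, M) from Holzmann's definition.
X = {"idle", "waiting"}                  # finite set of states
I = {"send", "ack", "timeout"}           # finite set of inputs
O = {"DATA", "-", "DATA(retransmit)"}    # finite set of outputs

# State transition function N: X x I -> X
N = {
    ("idle", "send"): "waiting",
    ("waiting", "ack"): "idle",
    ("waiting", "timeout"): "waiting",
}

# Output function M: X x I -> O
M = {
    ("idle", "send"): "DATA",
    ("waiting", "ack"): "-",
    ("waiting", "timeout"): "DATA(retransmit)",
}

def step(state, event):
    """For a given input, emit the output and move to the new state."""
    return N[(state, event)], M[(state, event)]

state = "idle"
state, out = step(state, "send")
assert (state, out) == ("waiting", "DATA")
state, out = step(state, "timeout")
assert (state, out) == ("waiting", "DATA(retransmit)")
state, out = step(state, "ack")
assert (state, out) == ("idle", "-")
```

Encoding N and M as explicit tables is what makes such specifications mechanically checkable, which is the property the validation languages compared below exploit.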
Language  | Application domain                     | Dynamic reconf. | Supports heterogeneous platforms | Debugging tools | Visualization tools
Conic     | General                                | Yes | Yes     | Yes | Yes
Durra     | Embedded systems                       | Yes | Yes     | Yes | No
Darwin    | General                                | Yes | Yes     | Yes | Yes
POLYLITH  | Large Ada distributed systems          | Yes | Partial | Yes | No
Reality   | Large Ada distributed systems          | Yes | No      | No  | Yes
Rapide    | General                                | Yes | No      | No  | No
DREMS ML  | Real-time distributed embedded systems | Yes | Yes     | Yes | Yes
TABLE I
COMPARISON OF ANALYSED ADLS
direct effect on the software's extensibility, modularity, fault-tolerance, portability, scalability, and many other features of a quality software solution.
As stated in [3], the definition of Architecture Description Languages is still vague, and no formal definition of such languages is provided by the community. However, an Architecture Description Language - ADL - can be defined as a language that provides features for modeling a software system's conceptual architecture, distinguished from the system's implementation.
In order to better understand ADLs, we analysed the following languages:
• Conic [4] - a language-based environment for building distributed systems. It provides a large set of tools for program configuration, compilation, debugging, etc. Conic consists of the Conic Module Programming Language - used for programming individual task modules, and the Conic Configuration Language - used for configuration description and hierarchical structuring of distributed components. Darwin is a successor of Conic, and it is based on π-calculus.
• POLYLITH [7] - an environment for rapid prototyping of distributed applications, with primitives that aid debugging and evaluation.
• Reality [8] - an executable design language for distributed Ada systems. Its main purpose is to enable rapid prototyping of large distributed systems.
• Rapide [9] - an object-oriented language for designing the architecture of not only distributed systems, but software systems in general.
• DREMS ML [10] - a "wide spectrum" architecture design language for distributed computing platforms, based on the Model-Driven Development approach. DREMS ML is part of a complex architecture DREMS (Distributed Real-Time Managed Systems), which provides a large set of tools for modeling, analysis, debugging, testing and maintenance of distributed systems.
A comparison of the analysed ADLs is shown in Table I. Although these languages are used in different domains, the principles behind them are quite similar. We noticed that their most common traits are:
• Component based model - a distributed system consists of a collection of modular and reusable components with clearly defined interfaces. Common characteristics of components can be abstracted into component types which can be used to create multiple component instances, thus enhancing the reusability of the components.
• Separation of structure from behaviour - both the structure and the behaviour of systems are described by separate languages. For example, in Conic, the structure of the system is described by the Conic Configuration Language, while the behavior of its components is described by the Conic Module Programming Language. In Durra, the definition of components and communication channels is separate from the definition of configurations, etc. This separation is really beneficial when considering deployed systems and evolution, as it allows us to focus on change at the component level rather than on some finer grain [6].
• Dynamic reconfiguration - mainly due to the separation of structure from behaviour; still one of the biggest challenges when it comes to distributed systems. Most of the languages from above have support for different platforms, though in some cases (POLYLITH) that support is only partial.
Led by the fact that these languages share many common traits, even though they are built for different purposes and different domains, we feel that a unifying language for developing distributed systems could be of great benefit. However, there are a few general UML-based solutions like the Object Management Group's SysML¹. SysML is a general-purpose graphical modeling language for specifying, analyzing, designing, and verifying complex systems that may include hardware, software, information, personnel, procedures, and facilities [11]. But, according to [12], a domain-specific approach would offer greater benefits than a general UML-based approach.

B. Protocol specification and validation
In order to understand what the common characteristics of languages for protocol specification are, we analysed the following languages:

¹ Object Management Group's SysML - http://www.omgsysml.org/what-is-sysml.htm
Language | Protocol layering | Protocol representation    | Message format | Data types
LOTOS    | Yes               | Finite state machine       | Built-in       | Abstract data types
Estelle  | Yes               | Finite state machine       | Built-in       | Built-in
RTAG     | Yes               | Attribute grammar notation | User-defined   | User-defined
Mace     | Yes               | Finite state machine       | Built-in       | Built-in
GAPAL    | Yes               | Finite state machine       | User-defined   | User-defined
TABLE II
COMPARISON OF ANALYSED LANGUAGES FOR PROTOCOL SPECIFICATION AND VALIDATION
aims is to determine the position of the integration of business process modeling with software engineering within teaching subjects at the bachelor study level.
Business process model mapping to software design is examined within the content of higher education subjects at the bachelor level of study (since this student population is the largest). There are two groups in the empirical analysis population, and for each there has been a selection of IT-related study programs (information technologies, computing, software engineering, business informatics) at technical and economy & IT faculties. This research includes a small population, as per the need for a preliminary status analysis.
• Serbian universities – the analysis includes a population of 4 faculties (the largest faculties related to the IT/business processes field in Serbia): University of Novi Sad (Faculty of Technical Sciences [26], Faculty of Economy [27]) and University of Belgrade (Electrotechnical Faculty [28], Faculty of Economy [29]).
• Worldwide universities – Technical University of Wien [30], Technical University of Berlin [31].
Results of the analysis show a partial (approx. 20% of analyzed websites) insufficiency of on-line availability of teaching subject content, which makes it impossible to conduct a deeper analysis. Other websites contain a short description of each subject's content. Different types of models are covered within various teaching subjects:
1. General-named subjects (such as Information systems development/modeling, Software development/design, Business Informatics, Software Engineering, Enterprise Information Systems, Systems design and simulation)
2. Particular (more precise) names of subjects (Model-driven software development, Technologies for business process management and workflow management, Enterprise Modeling and Business Engineering)
3. The most precise subjects, with particular method usage in their names (Structured-system analysis, Object-oriented modeling and design)
Teaching subjects of the first and second category include various models (often not particularly specified, or specified as a combination of methods).
Methods/tools that are specified within subject content descriptions include:
• Structured system analysis
• UML and object-oriented models
• Static and dynamic modeling
• Formal methods of modeling and specification (Petri Nets, predicate logic, propositional logic)
• BPMN, orchestration, choreography
• MDE (Model Driven Engineering), MDA (Model Driven Architecture).
Mapping business process models to software design models has not been specified in most subjects, except within very few:
• "Information systems 2", including BPMN, orchestration and choreography [28].
• "Technologies and platforms for business process management and workflow management" – models of business processes, tools for the implementation of business process management solutions [26].
• "Enterprise Modeling and Business Engineering" – "The connection between business processes and information technology as well as static and dynamic modeling are covered through various process and data modeling techniques" [30].
Within the previously presented subjects (those that include mapping of business process models to software design models), it has been shown that the short descriptions do not provide details on how the mapping is conducted. Only specific software tools are mentioned as planned to be used within the subject teaching process. The next section describes the previously proposed [7] methodology for mapping business process model elements to software design elements.

II. THE PROPOSED METHODOLOGY OF MODELS MAPPING
In this section, the previously proposed [7] methodology of mapping elements of the business process model (primitive processes and data stores) to elements of software design models (use case model – software functions, and conceptual data model) is described.
Business processes are mapped to a set of software functions (Figure 1) by using a mapping table, where each primitive business process is mapped to a set of software functions: software functions that support the primitive business process directly ("need to have software functions", i.e. "first priority software functions") and those of additional functionality ("nice to have software functions", i.e. "second priority software functions"). Each "directly supporting software function" is followed by the set of "precondition" software functions (those that enable the functionality of the specified first-priority software function).
Each data store from the business process model is decomposed by using two procedures:
a) Applying the rules of relational model normalization – by applying the 1st, 2nd, 3rd and other normal forms to each data store from the business process model, the result is a normalized conceptual or relational data model.
b) Applying decomposition based on syntax structure analysis to the entities of the conceptual data model – if each data store structure is described using a specific syntax (special types of parentheses: {}, [], //, <>) for grouping data items, then the decomposition can be led by the interpretation: {} = 1:M, [] = XOR, // = union, <> = strong data items cohesion.
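Procedure b) can be sketched as a simple classifier over a data store description; the "Order" data store and its item names are invented for illustration, and the handling of the // marker is an assumption, since the paper does not show a concrete example of that notation:

```python
import re

# Interpretation of the grouping symbols, as defined in procedure b).
MARKERS = [
    (r"\{(.+?)\}", "1:M"),               # {} -> repeating group (1:M)
    (r"\[(.+?)\]", "XOR"),               # [] -> exclusive choice
    (r"/(.+?)/", "union"),               # // -> union (assumed delimiter use)
    (r"<(.+?)>", "strong cohesion"),     # <> -> strongly cohesive data items
]

def decompose(data_store):
    """Classify each bracketed group of data items in a data-store
    description according to the syntax-analysis rules."""
    result = []
    for pattern, meaning in MARKERS:
        for group in re.findall(pattern, data_store):
            result.append((group.strip(), meaning))
    return result

# A hypothetical "Order" data store sketched in the bracket notation.
order = "order_id, date, {item_no, qty}, [cash|card], <street, city, zip>"
groups = decompose(order)

assert ("item_no, qty", "1:M") in groups
assert ("cash|card", "XOR") in groups
assert ("street, city, zip", "strong cohesion") in groups
```

Each classified group would then become a candidate entity (or attribute cluster) in the conceptual data model, with {} groups promoted to 1:M related entities.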
Figure 1. Mapping primitive business process to set of software functions (according to [7])
models in their curriculum, but their integration is not supported (or not described precisely enough in the subject descriptions on the university/faculty websites).

Regarding the third research objective, a more precise and more comprehensive approach to business-oriented software design is described. This approach enables setting development priorities for delivery (a clear distinction between "need to have", "nice to have" and "precondition" software functions). Including the proposed methodology in the curricula of IT-related universities could guide students towards more complete software design models based on the mapping of business process models.

Regarding the fourth research objective, it has been shown that students have the appropriate skills to do software design based on business process model elements. Empirical results show that students are able to directly map a primitive business process to software functions of first and second priority, while they are least successful with "precondition" software functions.

The problem of appropriately mapping business process model elements to software design elements is fundamental to the quality of software solutions within enterprise information systems. It is therefore of great importance to improve higher education in this field by applying research results that connect business process elements with software design elements. Furthermore, the software industry could be analyzed with respect to its application of modeling in general, and of the mapping of business process models to software design in particular. In this way, both software quality and software development process estimation could be improved on the basis of the proposed mapping.

REFERENCES
[1] J. Erickson, K. Lyytinen, K. Siau: "Agile Modeling, Agile Software Development and Extreme Programming: The State of Research", Journal of Database Management, 16(4), pp. 88-100, October-December 2005.
[2] B. Rumpe: "Agile Modeling with the UML", In: M. Wirsing, A. Knapp, S. Balsamo (Eds.): Radical Innovations of Software and Systems Engineering in the Future, Lecture Notes in Computer Science, Vol. 2941, Springer, Berlin, Heidelberg, 2004.
[3] H. Smith, P. Fingar: "Business Process Management: The Third Wave", Technical Excerpt Theory, Business Process Management Initiative, Meghan-Kiffer Press, 2003.
[4] D. Georgakopoulos, M. Hornick, A. Sheth: "An Overview of Workflow Management: From Process Modeling to Workflow Automation Infrastructure", Distributed and Parallel Databases Journal, 3, pp. 119-153, Kluwer Academic Publishers, 1995.
[5] Lj. Kazi, B. Radulovic, N. Chotaliya: "Business process reengineering impact to n-tier architecture of information system: teaching model", Conference "Reengineering of Business Processes in Education", Cacak, Serbia, 2011, Proceedings, pp. 218-225.
[6] Lj. Kazi, M. Ivkovic, B. Radulovic, M. Bhatt, N. Chotaliya: "The Role of Business Process Modeling in Information System Development with Disciplined Agile Delivery Approach", ICIST 2015 5th International Conference on Information Society and Technology, Proceedings, ISBN 978-86-85525-16-2, pp. 455-493.
[7] Lj. Kazi, B. Radulovic, D. Radosav, M. Bhatt, N. Grmusa, N. Stiklica: "Business Process Model and Elements of Software Design: the Mapping Approach", International Conference on Applied Internet and Information Technologies ICAIIT2012, Zrenjanin, ISBN 978-86-7672-166-5, pp. 17-20.
[8] H.-E. Eriksson, M. Penker: "Business Modeling with UML: Business Patterns at Work", John Wiley & Sons, 2000.
[9] S. Kent: "Model driven engineering", In: International Conference on Integrated Formal Methods, Springer, Berlin, Heidelberg, 2002, pp. 286-298.
[10] D. C. Schmidt: "Model-driven Engineering", IEEE Computer, February 2006, Vol. 39, No. 2, pp. 25-31.
[11] B. Rumpe: "Agile Modeling with the UML", In: M. Wirsing et al. (Eds.): RISSEF 2002, LNCS 2941, pp. 297-309, Springer-Verlag, Berlin, Heidelberg, 2004.
[12] R. S. Aguilar-Saven: "Business process modelling: Review and framework", International Journal of Production Economics, 90 (2004), pp. 129-149.
[13] N. P. Dalal, M. Kamath, W. J. Kolarik, E. Sivaraman: "Toward an Integrated Framework for Modeling Enterprise Processes", Communications of the ACM, March 2004, Vol. 47, No. 3.
[14] R. M. Dijkman, M. Dumas, C. Ouyang: "Semantics and analysis of business process models in BPMN", Information and Software Technology, 50(12), 2008, pp. 1281-1294.
[15] E. E. Jacobsen: "Concepts and Language Mechanisms in Software Modelling", PhD Dissertation, Faculty of Science and Engineering, University of Southern Denmark, 2000.
[16] Y.-G. Kim, S. T. March: "Comparing Data Modeling Formalisms", Communications of the ACM, June 1995, Vol. 38, No. 6, pp. 103-115.
[17] S. Biazzo: "Process mapping techniques and organizational analysis: lessons from sociotechnical system theory", Business Process Management Journal, Vol. 8, No. 1, 2002, pp. 42-52.
[18] J. Barjis: "The importance of business process modeling in software systems design", Science of Computer Programming, 71 (2008), pp. 73-87.
[19] C. Ouyang, M. Dumas, W. M. P. van der Aalst, A. H. M. ter Hofstede: "From Business Process Models to Process-oriented Software Systems: The BPMN to BPEL Way", QUT ePrints, 2006.
[20] K. Levi, A. Arsanjani: "A Goal-driven Approach to Enterprise Component Identification and Specification", Communications of the ACM, October 2002, Vol. 45, No. 10, pp. 45-52.
[21] K. Namiri, N. Stojanovic: "Pattern-Based Design and Validation of Business Process Compliance", In: R. Meersman, Z. Tari et al. (Eds.): OTM 2007, Part I, LNCS 4803, pp. 59-76, Springer-Verlag, Berlin, Heidelberg, 2007.
[22] W. M. N. Wan-Kadir, P. Loucopoulos: "Relating evolving business rules to software design", Journal of Systems Architecture, 50 (2004), pp. 367-382.
[23] M. P. Papazoglou, P. Traverso, S. Dustdar, F. Leymann: "Service-oriented computing: State of the art and research challenges", Computer, 40(11), 2007.
[24] T. Gardner: "UML Modeling of Automated Business Processes with a mapping to BPEL4WS", Proceedings of the First European Workshop on Object Orientation and Web Services, pp. 35-43.
[25] T. M. Cudnik: "Higher Education and Management Information Systems", 1977, https://eric.ed.gov/?id=ED140695
[26] University of Novi Sad, Faculty of Technical Sciences, http://www.ftn.uns.ac.rs/
[27] University of Novi Sad, Faculty of Economics, http://www.ef.uns.ac.rs/
[28] University of Belgrade, Electrotechnical Faculty, http://www.etf.bg.ac.rs/
[29] University of Belgrade, Faculty of Economics, http://www.ekof.bg.ac.rs/
[30] Technical University of Vienna, http://www.tuwien.ac.at/
[31] Technical University of Berlin, http://www.tu-berlin.de
Abstract — Hydroinformatics systems should encompass all data related to the hydro phenomena of a specific area of interest. Hydro phenomena do not recognize frontiers or different human interests; they only follow natural laws. As complex phenomena, hydro resources are affected by various natural forces that cannot be predicted, with correspondingly varied effects on their behavior. These impacts can result in unexpected variations in the availability of hydro resources. Because of their importance for life and the economy, humanity has developed different methods for researching and managing hydro resources. Collecting large amounts of information can facilitate hydro resources management; however, that information can be burdened by various errors. The problems are multiplied if erroneous information is processed with incomplete or imperfect models. This paper studies how the errors contained in information propagate through the models used for hydro resources management.

I. INTRODUCTION
Hydro resources are of essential importance for contemporary civilization [1] and deserve every attention in order to remain sustainable and manageable in the future [2]. Accordingly, the introduction of information technology into the management of hydro resources is understandable and justified [3]. At the same time, some problems in managing hydro resources can arise as a consequence of the rising complexity of the systems and of the knowledge related to them [4]. Hydro resources, as a complex natural phenomenon, are researched mostly by different kinds of measurement and treated by various models in order to explain and predict their behavior [5]. The whole process of collecting, classifying and processing data is intertwined with errors and uncertainties [6]. Those errors can affect conclusions and decisions, with possible suboptimal utilization of hydro resources as a consequence. This paper researches the influence of errors in the process of collecting, classifying and processing data, and describes their possible influence on decision making in the management of hydro resources.

Hydroinformatics, as a specific kind of informatics system, encompasses all measured and available data in digital form [7]. For its functioning, adequate hardware and software infrastructure is also necessary [8]. Last but not least, for the proper explication of data related to hydro phenomena, experts' knowledge is an unavoidable ingredient [9]. Maintaining the hydroinformatics infrastructure, which is a complex system in itself, also requires experts' knowledge. This dualism of experts' knowledge can be the source of various problems. In general, it is hard to expect that one person (regardless of intelligence and education) possesses adequate knowledge both in the domain of the researched phenomenon and in information technology. On the one side, experts in the researched phenomenon understand the topic but cannot master the informatics system, while, on the other side, experts in information technology do not possess knowledge about the researched phenomenon. If exceptions exist, they are very rare and should not be considered the rule. This situation is illustrated by Figure 1.

Figure 1. Necessary ingredients for a hydroinformatics system

Consolidating both necessary ingredients in order to form a hydroinformatics system is an issue requiring careful consideration. A balance between the possibilities of information technology (IT), the level of the experts' knowledge about the hydro phenomenon, and the needs of successful hydro resources management is a necessary condition.

The data needed to explain any hydro phenomenon are also subject to different treatment by the two kinds of experts [10]. Experts in hydro phenomena treat data as expressions of certain characteristics of the researched phenomenon, with concrete meanings, while IT experts treat data only as objects they must manipulate in databases. So, in the knowledge system of the phenomenon researcher, concrete data make sense only in a specific context, while in the knowledge system of the IT expert, data are treated as objects that must be properly stored, processed and displayed. These different approaches result in different kinds of hydroinformatics systems for treating hydro resources [11].
The structure of the data is also a crucial issue for understanding a hydro phenomenon [12]. Data are distributed in space and time in a manner that approximates and discretizes the researched hydro phenomenon. Moreover, data are not determined with absolute accuracy.

Finally, management itself must encompass all hydro resources and direct them towards predefined goals. To complete the picture of the complexity of hydro resources: goals are usually defined by the market, which rarely observes real capacities but is mostly led by customers' desires [13].

The hierarchy of hydro resources management can be arranged so that goals are at the top and hydro resources at the base of the management process. Management must have the abilities and skills to balance the capacities of hydro resources against the goals. This hierarchy, illustrated by Figure 2, forms a cycle:
- by virtue of the data and the hydroinformatics system, it is possible to improve knowledge about the hydro phenomenon (resource);
- with the help of that knowledge, hydro resources management can plan their utilization in relation to the goals; and
- realizing the goals influences the hydro resources, which accordingly produces new information.
[Figure: time-series chart spanning January–December, with values on a 0–4000 axis; the plotted data are not recoverable from the extraction]

[5] Stevović S., Milošević H., Stevović I., Hadrovic S.: "Sustainable management of water resources in Prokletije region", Industrija, 2014; 42(1): 47-61.
[6] Mowrer, H. Todd, and Russell G. Congalton (Eds.): Quantifying
Abstract — Besides cataloguing various kinds of publications, a library information system (LIS) can also catalogue supplementary materials, including datasets. This is especially important if the LIS is used at an academic institution, where the institution's librarians should also catalogue the scientific outputs of its researchers. This paper presents how datasets can be catalogued in library information systems based on the worldwide-used bibliographic format MARC 21.

I. INTRODUCTION
A library information system (LIS) catalogues various kinds of publications, including supplementary materials and datasets. Cataloguing research datasets is especially important if the LIS is used at an academic institution, where the institution's librarians also catalogue the scientific outputs of its researchers. Cataloguing datasets within a LIS enhances the reuse of scholarly data and the development of Open Science. A diverse set of stakeholders representing academia, industry, funding agencies, and scholarly publishers designed a set of principles referred to as the FAIR Data Principles, i.e., a guideline for those wishing to enhance the reusability of their data holdings. Implementing dataset cataloguing within a LIS is in accordance with those principles.

The MARC 21 formats are standards for the representation and communication of bibliographic and related information in machine-readable form. The formats are maintained by the Library of Congress in consultation with various user communities. There are many library information systems worldwide that are based on MARC formats, including BISIS, a LIS developed and used at the University of Novi Sad. Three versions of the BISIS LIS were based on the UNIMARC format; version 4 includes its own editor for creating bibliographic records in UNIMARC and MARC 21 [17, 18], and a mapping from UNIMARC to MARC 21 was defined [19]. The motivation for dataset cataloguing in the BISIS LIS is to improve the reusability of researchers' data.

Various aspects of cataloguing datasets are the main topic of a number of scientific publications. In paper [21], the authors present the creation of a "data catalog" that describes different datasets: big population-level studies as well as small datasets created by researchers. The authors of paper [23] propose an approach to describing security-related datasets by means of a Dataset Description Language (DDL) and a Dataset Description Sheet (DDS). Databib [22], an online catalog of research data repositories, is a tool and resource for locating repositories of research data; librarians and other information specialists identify and catalogue different data repositories using Dublin Core, Friend of a Friend and Databib terms. Although some approaches for cataloguing research datasets already exist, those approaches are not applicable to the specific needs of the University of Novi Sad. Academic LISs are very often interoperable with institutional repositories, virtual research environments and research information systems. Besides the BISIS library information system used by the University of Novi Sad libraries, the research management system at the University of Novi Sad (CRIS UNS) has also been developed on a MARC 21 compatible data model, in order to enable easy integration of the research management system and the academic LISs at the faculties of the University of Novi Sad. The research question is whether it is possible to catalogue research datasets in the MARC 21 format and use those MARC 21 records to describe research datasets for the sake of both library information systems and the CRIS UNS research management system. The authors of paper [20] explain the process and discuss the advantages of cataloging data and adding records for the data to their integrated library system based on MARC 21.

CRIS UNS is the main result of the DOSIRD UNS project (http://dosird.uns.ac.rs/). The CRIS UNS system is built on a CERIF compatible data model based on the MARC 21 format. MARC 21 is used for representing authority data as defined in CERIF (authors, institutions, projects, and events). Furthermore, MARC 21 is also used for representing data about published results. This approach provides consistent handling of authority and bibliographic data, and support for multilingual content. The thorough use of MARC 21 facilitates data exchange with CERIF- and MARC-based systems, including the BISIS LIS, comprehensive bibliographic reports, and the evaluation of research results. We would also like to use the MARC 21 format for cataloguing datasets.

II. DOSIRD UNS
The first faculties in Novi Sad were founded in 1954. The University of Novi Sad was founded on the 28th of June 1960 and represents an autonomous institution for education, science and arts. While the software infrastructure for the educational domain of the University of Novi Sad is well developed, there is a lack of software infrastructure for science and arts. The main goal of the DOSIRD UNS project is the development of software infrastructure for the research domain of the University of Novi Sad. The project was started in 2009.

The CRIS UNS (Current Research Information System at the University of Novi Sad) system has been under development since 2009 [1, 2] and is the main result of the
DOSIRD UNS project. It has been implemented with the intention of fulfilling all specific requirements prescribed by the rule books of the University of Novi Sad, the Provincial Secretariat for Science and Technological Development of the Autonomous Province of Vojvodina, and the Ministry of Education, Science and Technological Development of the Republic of Serbia. In addition, a substantial requirement to be fulfilled by the CRIS UNS system is its interoperability with systems possessing large databases of scientific research results, thus improving the visibility of the scientific results achieved by researchers from the University of Novi Sad and thereby raising the rating of the University.

The informational requirements imposed on this system are:
1. Access to the application via any modern web browser.
2. Researchers entering their references by themselves, without needing to be at home with any standard for describing references.
3. Interoperability of the system with diverse systems containing scientific content, such as CRIS systems, institutional repositories, library information systems, digital repositories, etc. [4, 10, 11, 12]
4. The capability to search the database containing scientific results [3, 9].
5. The capability to evaluate scientific research results following the rule book(s) prescribed by the Ministry of Education, Science and Technological Development of the Republic of Serbia [13, 14, 15].
6. Reporting for the faculties, the University, the Provincial Secretariat for Science and Technological Development of the Autonomous Province of Vojvodina, and the Ministry of Education, Science and Technological Development of the Republic of Serbia [16].

PHD UNS is the digital library of dissertations defended at the University of Novi Sad since its foundation. It has been developed since 2010 at the University of Novi Sad, Serbia, and the first version was put into operation in 2013. The digital library of dissertations is a web application implemented on the Java platform using a set of open-source libraries written in Java. Data about dissertations are stored in the MARC 21 format [7, 8]. The architecture of the digital library enables easy integration with a library information system, as well as with a research information system based on the CERIF standard [5]. The PHD UNS library stores the entire process of a dissertation's submission and defense. The digital library is integrated with the CRIS UNS system in order to create a unified central catalog of all scientific-research outputs published by the University.

III. MARC 21
The MARC 21 formats (www.loc.gov/marc) are standards for the representation and communication of bibliographic and related information in machine-readable form. The standard consists of five documents: the MARC 21 Format for Bibliographic Data, the MARC 21 Format for Authority Data, the MARC 21 Format for Holdings Data, the MARC 21 Format for Classification Data, and the MARC 21 Format for Community Information. The formats are maintained by the Library of Congress in consultation with various user communities. As already stated, there are many library information systems worldwide that are based on MARC formats.

A MARC 21 record starts with the leader, which is the first field; it has a fixed length of 24 character positions, where each character position specifies the meaning of the data (code) stored at that position. After the leader, the MARC 21 record lists control fields, which also have fixed lengths and defined character positions. After the list of control fields, the MARC 21 record has a list of data fields, which have a different structure: data fields consist of a maximum of two indicators and a limited set of subfields, which are the carriers of information.

The MARC 21 format for bibliographic data is designed for cataloging different types of material, such as textual material, computer files, cartographic and music material, as well as datasets. Establishing a link between bibliographic records describing different kinds of material is possible using the linking entry fields [6]. We suggest linking a scientific publication with its related dataset in that way.

IV. DATASET METADATA FORMATS
The research results presented in a publication can be based on a dataset. A dataset should be catalogued and made available to society for at least two reasons. The first is the openness of science, which accelerates the further development of a knowledge-based society: new research can be conducted using the same dataset that was created in some previously published research. The second reason is the transparency of science: each published piece of research should be repeatable and verifiable. If a research publication describes the methodology, dataset, and results of research, but the dataset is not available for downloading, then no one can repeat the same research to check the results. This is especially important taking into account that a lot of published research is funded by local governments or European Union funding programs. Besides preserving the dataset's digital files, the dataset should also be catalogued using some metadata format.

DCAT is an RDF vocabulary recommended by the W3C (https://www.w3.org/TR/vocab-dcat/). It is designed to facilitate interoperability between data catalogs published on the Web, and defines three main classes (Figure 1):
- dcat:Catalog represents the catalog;
- dcat:Dataset represents a dataset in a catalog;
- dcat:Distribution represents an accessible form of a dataset.
In addition, specific application profiles for DCAT are being developed that are relevant for the use of DCAT in certain domains:
- DCAT-AP is a profile based on DCAT for describing public sector datasets in Europe.
- GeoDCAT-AP is an extension of DCAT-AP for describing geospatial datasets, dataset series, and services.

CKAN (Comprehensive Knowledge Archive Network) is a web-based system for the storage and distribution of open data. CKAN makes data discoverable and presentable, and provides a web page with a rich collection of metadata for each dataset, making it a valuable and easily searchable resource. The central entity in the CKAN data model is the Dataset entity, which can be linked with the Resource entity (the actual data: files, APIs, etc.) and with a set of metadata. CKAN supports a variety of "core" metadata for a dataset, but this can be extended with an unlimited amount of arbitrary additional metadata in the form of "extra" key/value associations. Datasets can be grouped and linked to each other.
Figure 1. DCAT
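The three DCAT classes and the properties linking them can be sketched without an RDF library by representing the graph as a set of triples; the example catalog and dataset URIs below are invented for illustration.

```python
# Illustrative sketch (plain Python, no RDF library) of the three main
# DCAT classes: dcat:Catalog, dcat:Dataset, dcat:Distribution.
DCAT = "http://www.w3.org/ns/dcat#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

catalog = "http://example.org/catalog"
dataset = "http://example.org/dataset/1"
distribution = "http://example.org/dataset/1/csv"

triples = {
    (catalog, RDF_TYPE, DCAT + "Catalog"),
    (catalog, DCAT + "dataset", dataset),            # catalog lists the dataset
    (dataset, RDF_TYPE, DCAT + "Dataset"),
    (dataset, DCAT + "distribution", distribution),  # an accessible form
    (distribution, RDF_TYPE, DCAT + "Distribution"),
}


def datasets_in(catalog_uri, triples):
    """Return all dataset URIs listed in the given catalog."""
    return {o for s, p, o in triples
            if s == catalog_uri and p == DCAT + "dataset"}
```

In a real implementation the same triples would live in an RDF store and be queried with SPARQL; the set-comprehension here stands in for that query.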
dragan.ivanovic@uns.ac.rs, joel.azzopardi@um.edu.mt
Abstract — This paper presents the development of a recommendation system component and its integration into the PHD UNS digital library, a digital library consisting of PhD dissertations from the University of Novi Sad. This component is used to improve the PHD UNS user experience by providing a list of PhD dissertations recommended specifically for each user, and also by re-ranking search results based on personalized recommendation scores. The developed component has been put into operation at http://www.cris.uns.ac.rs/searchDissertations.jsf. Recommendation is performed on the basis of the user's download history. A sub-system that is able to capture users' explicit and implicit feedback has also been developed. This feedback will be used to evaluate whether the personalisation component improves PHD UNS users' experience.

I. INTRODUCTION
A recommendation system assists users in finding and retrieving relevant digital content from a potentially huge amount of information present within a data source. Given a large set of digital items and a user's requirements (represented within a user model), the recommendation system presents to the user a small set of the items that are well suited to his/her information need. The digital library of PhD dissertations of the University of Novi Sad (PHD UNS) stores more than 2,000 dissertations in PDF format. Each year this number grows by approximately 300 new PhD dissertations defended at the University of Novi Sad. Usage of this digital library is quite substantial; in fact, there are over 1,000 logged search query requests per month.

The motivation for this research is the improvement of the PHD UNS user experience by providing a personalized list of recommended PhD dissertations, and also by providing a personalized search service that re-ranks search results through the utilisation of personalized recommendation scores. Recommendation systems require the construction (and updating) of user models, and we used the PHD UNS download logs and the PhD dissertations' contents for this purpose.

This research has been implemented within two Short Term Scientific Missions (STSMs) funded by the KEYSTONE (IC1302) COST action. The objectives of the first STSM consisted of implementing and experimenting with different recommendation algorithms (both content-based and collaborative-based), the application of Latent Semantic Analysis (LSA) to both content and collaborative recommendation approaches, and the use of different similarity functions. The second STSM targeted the development of the actual recommendation component that uses the best recommendation approach, and the integration of this component into the operational PHD UNS digital library.

II. PHD UNS
The CRIS UNS system (http://cris.uns.ac.rs) is a research information system that has been developed at the University of Novi Sad and has been continuously maintained and updated since 2008 [1, 2, 3, 4]. It was originally developed within the project DOSIRD UNS (http://dosird.uns.ac.rs/). The digital library of Ph.D. dissertations of the University of Novi Sad (PHD UNS) stores the entire process of a dissertation's submission and defense. The digital library is integrated with the CRIS UNS system in order to create a unified central catalog of all scientific-research outputs published by the University [5, 6, 7, 8].

Table 1. Export of data

System | URL | Number of exported
NDLTD | http://search.ndltd.org/search.php?q=publisher%3A%22University+of+Novi+Sad%22 | 847
DART-Europe | http://www.dart-europe.eu/browse-results.php?institution=578 | 871
NaRDuS | http://nardus.mpn.gov.rs/ | 837
OATD | https://oatd.org/oatd/search?q=publisher%3A%28Novi%20AND%20Sad%29 | 848
OpenAIRE | https://www.openaire.eu/search/datasource?datasourceId=opendoar____::ae1d2c2d957a01dcb3f3b39685cdb4fa | 825

As at 2017, PHD UNS contains metadata concerning 4,877 different dissertations, 1,202 of which have a digital copy of the actual dissertation document stored in the PHD UNS system; 878 of these latter dissertations are published under an open-access license. In order to improve the accessibility of these dissertations, a web interface has been developed that allows users to search within the digital library [9]. Moreover, the dissertations' metadata has been exported to other systems using the OAI-PMH protocol [10, 11, 12] (Table 1).

The PHD UNS library also logs every download of the digital files associated with these Ph.D. dissertations. A logged message includes information such as the download date and time, the title, and other bibliographic information about the dissertation. Moreover, each log message includes information on how the user found the downloaded dissertation: whether it was through the search interface provided (www.cris.uns.ac.rs/searchDissertations.jsf) or from the web portal of one of the networks presented in the previous table. An example of a logged message is shown in Listing 1.
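A minimal sketch of collaborative recommendation from download histories follows. It illustrates the general approach (cosine similarity over binary download vectors), not the component actually deployed in PHD UNS; the user and dissertation identifiers are invented.

```python
# Illustrative sketch: item recommendation from download histories.
# Each user is represented by the set of dissertations they downloaded;
# we recommend dissertations downloaded by the most similar other users.
from math import sqrt


def cosine(a: set, b: set) -> float:
    """Cosine similarity between two binary download vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / (sqrt(len(a)) * sqrt(len(b)))


def recommend(user, histories, top_n=3):
    """Score items downloaded by similar users but not yet by `user`."""
    own = histories[user]
    scores = {}
    for other, items in histories.items():
        if other == user:
            continue
        sim = cosine(own, items)
        for item in items - own:
            scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical download histories: user -> downloaded dissertation ids.
histories = {
    "u1": {"d1", "d2", "d3"},
    "u2": {"d2", "d3", "d4"},
    "u3": {"d9"},
}
```

A content-based variant would build the same kind of vectors from the dissertations' texts instead of from download events, which is where the LSA experiments mentioned above come in.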
Abstract— In this paper, we are going to suggest how to necessary data would be available at one place and all
change traditional data storage method within performed operations would have the central influence.
telecommunication company. Before reengineering, data All created tables are properly connected within SQL
was collected on excel spreadsheets, and also in various Server Management Studio. Certain relational tables are
technical and sales applications. The key disadvantage of also created which have the function to join different
this type of data storage is inability to properly connect all tables. Reeingeering of the process of collecting data will
the data. Our goal is to create the database that is going to save the time, which is necessary for getting desired
gather and connect all data in one place. The relational information. It will have an indirect influence on the sales
model is created and it is filled up with real data about department’s performance as well.
business customers. The first version of this database
One of the first definitions of the reengineering discuss
contains seven excel tables. Excel table’s entities are
properly connected with mutual connections. By performing about the fact that reengineering is visible through the
queries on this database, the sales department can gather important information about business customers. We will change the process of collecting data by introducing a new database. Old Excel tables are replaced with the relational model. In this way we save the employees' time, because they have all the necessary information in one place. Employees can also perform queries on all data and select only the specific data that meet given criteria.

I. INTRODUCTION

The sales department in any telecommunications company that works with enterprise customers collects large amounts of data. As the telecommunications market grows rapidly, so does the amount of data collected. All these data should be stored in one place in order to simplify their further research and processing. While smaller amounts of data are processed, it is acceptable to store them in Excel tables. However, as the amount of data grows, it becomes necessary to find a more suitable way to store it. At the same time, relations between data are becoming more and more complicated. Cooperation with the customer involves several different departments, so when a key account manager wants to have all information about a customer, he has to gather it from several different departments.

The sales department in one telecommunications company is trying to overcome this problem by increasing the number of Excel tables for data storage. These tables categorize data according to type. Contract numbers and names of the companies are stored in one table; another table is used for collecting data about customers and their contacts. Over time, the number of those tables increases, and it is no longer possible to have all customer data in one place.

Therefore, it would be more efficient to reengineer traditional forms of storing data so they can be used more effectively. Reengineering of data collection will lead to the creation of a new, modern storage system and to a new way in which the business process functions in a company [1]. Radical changes of the process should increase the quality of performance in the form of faster execution and improvement of the service itself [2]. Reengineering of the process will provide many benefits, one of which is described in this paper: it enables the creation of consistent and high-quality end-to-end process flows, with opportunities for performance improvement and for re-use of existing processes and systems.

It is necessary to come up with a business strategy that increases the company's profits on one side and customer satisfaction on the other. The tools used for this purpose can be very complex. The simple example described in this paper shows how significant improvements in process flow can be achieved with low investments. Attention is directed to the end user at all times, because meeting specific requirements and users' needs is considered the road to success nowadays [3].

The paper describes an innovative relational model for data collection. Compared with traditional methods of collecting and storing data, this new model should bring improvements. The goal of creating the database is to gather and connect all the data in one place, in order to help the sales department achieve better results.

II. TRADITIONAL METHODS FOR COLLECTING DATA ABOUT BUSINESS CUSTOMERS

A. The structure of the collected data

In the beginning there was not a lot of information to be stored. Over time, the amount of collected data has increased significantly. More complex technical solutions implemented for key customers additionally increased the amount of data to be stored.

Currently, all data about business customers are kept in one Excel table. As the amount of data increases, this
7th International Conference on Information Society and Technology ICIST 2017
way of data storage is no longer sufficient. The data are categorized into technical, financial and sales data, and these categories are distributed across several different Excel tables. Different Excel tables are used by different departments (technical, sales, finance and marketing) in order to communicate with customers properly. The growing number of Excel tables makes data access more difficult, and data processing also becomes more complicated. If one piece of data changes, that change has to be made in every single table where the data appears. As more departments are involved, mistakes are often made and a lot of time is needed for data reconciliation.

In the mentioned telecommunications company the following customer data are collected: name of the company, VAT number, address, and contact persons for administrative and technical requests. It is also important to define all services the customer uses, according to the type of infrastructure. After that, it is necessary to define the bandwidth and the prices clients pay for each individual service. All contracts signed with customers should also be listed. The purpose of collecting all these data is further processing. Sometimes it is necessary to list all administrative contacts in order to inform them about changes in the billing system. It is also necessary to have the list of all contact persons for technical issues, in order to send them announcements about upcoming maintenance activities. The contract expiration date is an important piece of information, because the key account managers are obligated to initiate negotiations about contract renewal with the customer a few months before the expiration date. All this information is not available in one place, and each time a new list with the necessary data has to be made from scratch.

The management demands reports in different forms, which is becoming a more and more difficult task for all employees. These reports usually collect data from different departments and different Excel tables. Table I shows several examples of hindered data access.

TABLE I.
DATA STORAGE BEFORE THE REENGINEERING PROCESS

Type of the data: VAT number, company name
Usual data storage location: MPLS database used by the technical support department, the sales department and the finance department that issues invoices to the customers
Data requirement – inability to access them easily: If there is a change in the name of the company, the key account managers must provide this information immediately to the financial sector and also to the department for technical support, so the same information must be forwarded to two different sides at the same time. If there is a compensation deal with the customer, the marketing department must get this information too.

Type of the data: Bandwidth and the type of service
Usual data storage location: MPLS database used by the technical support department, the finance department and the sales department
Data requirement – inability to access them easily: Management often requires reports which include this information, but in very different forms: connected to all other contracts the customer has, connected to all other services in a particular location, etc.

Type of the data: Price
Usual data storage location: finance department, sales department
Data requirement – inability to access them easily: Management often makes various reports containing a list of customers with information about bandwidth and prices for the services.

Type of the data: Technical contact person
Usual data storage location: MPLS database, sales department
Data requirement – inability to access them easily: Management sometimes wants a list of all technical contacts for all customers.

Type of the data: Contract signing date and contract expiration date
Usual data storage location: finance department; sales department, where every key account manager keeps records about their own customers
Data requirement – inability to access them easily: Every month management wants to know which contracts are about to expire, so that income can be maintained by offering additional services. This information could also help in targeting specific customers for different actions.

B. The necessity to change traditional methods for collecting data about business customers

Data management and analysis have always been a challenge for every company. In the past, systems collected only the important data and discarded the rest, because there was no possibility of storing large amounts of data. Data considered unnecessary at that moment were discarded as useless. Despite the fact that technology advanced and new media for storing practically unlimited amounts of data were developed, the old business practice continued to exist: data that did not provide useful information at the moment were discarded, even though the new technological capabilities had been introduced. Data storage itself was not the problem; the real problem was data analysis, which kept becoming more and more difficult as data grew [4].

In order to overcome these problems, the relational model is created. Now it is possible to store all data about business customers, even the unstructured ones, and to perform different queries in order to select data according to various criteria.

Reengineering the data collection process and creating a unique database would bring multiple benefits to all employees in the business-to-business sales department. After a contract has been signed, all data about that particular customer would be entered into this database. Later, each key account manager would make changes for his customers. In this way, all current data about business customers would be in one place.

In order to demonstrate the usefulness of this database, it is filled with real data about business customers. These
data are owned by the operator's sales department. The purpose of the queries created on this database is to gather current information about business customers. This information should help the key account managers and the sector managers obtain accurate information about existing customers and the services they use.

Besides making data access easier, the test database should also enable data categorization. All this is done to improve customer satisfaction with the services they use; it saves time and contributes to more comfortable business activities.

In order to give a better view of all the advantages of this new database, a particular query will be made and its results will be presented.

III. THE NEW METHOD OF DATA COLLECTION

We are going to describe the experimental relational database which merges all the tables into one unique database. The idea is to enter data only once and to make it accessible to all employees in the desired format. The output data can be presented in different ways, depending on the current demands of the database users. Data integration saves employees' time, and the new database significantly increases the accuracy of the data returned by the different queries. All this improves the efficiency of the data collection process and reduces the possibility of error to a minimum. The simulation includes some key data related to the business customers. The key tables included in this simulation are joined in SQL Server Management Studio. The relationships between the tables are of type 1:N or N:M, depending on the relations between entities.

The first step in collecting data about a new customer is to enter all the necessary data into the table named Customers. This table contains: identification number, customer name, tax identification number, company address and the identification number of the type of client. The tax identification number uniquely identifies the customer. One customer with his tax number may sign several different contracts; for that reason it is necessary to introduce an artificial primary key that is different for each customer, the customer's identification number. This table contains the addresses of the company headquarters; a customer may have several different addresses at which different services are delivered.

A foreign key of the table Customers is, at the same time, the identification number of the type of the customer, which is the primary key of the table named Type of the customer. These two tables are connected with a 1:N relationship, which means that one customer can be categorized as only one type (private sector, foreign representative offices, embassies, etc.), while one customer type can have multiple associated customers. The table Type of the customer contains: the identification number of the type of the customer, which is also the primary key, the type of company, the size of the company, the company description and the existence of partnerships.

The table Customers is, through its primary key (the customer ID number), associated with the table named Contact person, for which this number is a foreign key. The connection between these two tables is of type 1:N: one customer can have several contact persons, but one contact person represents only one company. The table Contact person defines as its primary key the identification number of a contact person, an internally defined number that is unique for each contact person. This table contains: ID of a contact person, name and surname, contact address, e-mail, fixed and mobile phone. With similar logic, the other tables are defined and connected.

Using the relationships between these tables, various queries can be set up. These queries are applied to all data contained in the database, and the results include all data that meet the required criteria. Fig. 1 presents one simple request that combines these tables and produces a unique output, in the form of a table that contains the desired data.

This example shows all contact persons from the companies which contain the word "private" in their description. An additional requirement selects only the outputs where the field "Compensation" is empty, which means that there is no compensation deal with that particular customer. This list of all contact persons from private sector companies makes the data search easier. In addition, a refined search can be done: within the private sector, for example, only insurance companies can be selected. The resulting table contains the following columns: type of the company, description, name and surname, and mobile phone number. Besides these columns, more columns can be added, such as contract numbers and service specifications, which would make further operator actions easier.

SELECT TC.[Type of the company], TC.description,
       CP.[name and Surname], CP.Mobile_phone AS 'Mobile phone'
FROM Customers C, [Type of the customer] TC, Contact_person CP
WHERE C.ID_Type_of_the_customer = TC.ID_Type_of_the_Customer
  AND C.ID_customer = CP.ID_customer
  AND TC.Compensation IS NULL
  AND TC.[Type of the company] LIKE 'private%'

Type of the company | Description     | Name and Surname | Mobile phone
Private sector      | Recycling       | Đorđe Uskoković  | 060/817 10 11
Private sector      | protective gear | Vladimir Tustić  | 062/800 98 58
Private sector      | production      | Dragan Ilić      | 065-927-3018

Figure 1. Request „Private sector“

The suggested database can be used in different situations. For example, we can list all insurance companies with their contact persons. Sometimes we need that information quickly, in order to reach the customer before the competition. If we become aware of a new regulatory policy concerning insurance companies, we want to use this database to quickly get the contact persons of all companies that meet the desired criteria.
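The three tables and the query above can be sketched in a few lines. The following SQLite illustration uses simplified, hypothetical table and column names and invented sample rows, not the exact schema or data of the paper's SQL Server database; the query is the "private sector" request from Fig. 1, rewritten with explicit JOINs.

```python
import sqlite3

# Simplified sketch of the three tables described above (names are
# illustrative assumptions, not the paper's exact SQL Server schema).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE type_of_customer (
    id_type INTEGER PRIMARY KEY,
    type_of_company TEXT,      -- e.g. 'private sector', 'embassy'
    description TEXT,
    compensation TEXT          -- NULL when no compensation deal exists
);
CREATE TABLE customers (
    id_customer INTEGER PRIMARY KEY,   -- artificial primary key
    name TEXT,
    tax_id TEXT UNIQUE,                -- tax number, unique per customer
    address TEXT,
    id_type INTEGER REFERENCES type_of_customer(id_type)  -- 1:N
);
CREATE TABLE contact_person (          -- 1:N with customers
    id_contact INTEGER PRIMARY KEY,
    id_customer INTEGER REFERENCES customers(id_customer),
    name_surname TEXT,
    email TEXT,
    mobile_phone TEXT
);
""")

# Invented sample rows for illustration only.
con.execute("INSERT INTO type_of_customer VALUES (1, 'private sector', 'Recycling', NULL)")
con.execute("INSERT INTO type_of_customer VALUES (2, 'embassy', 'Diplomatic', 'yes')")
con.execute("INSERT INTO customers VALUES (1, 'EcoRec', 'RS100200300', 'Belgrade', 1)")
con.execute("INSERT INTO customers VALUES (2, 'Embassy X', 'RS400500600', 'Belgrade', 2)")
con.execute("INSERT INTO contact_person VALUES (1, 1, 'Đorđe Uskoković', 'dj@ecorec.example', '060/817 10 11')")
con.execute("INSERT INTO contact_person VALUES (2, 2, 'Ana Anić', 'ana@embassy.example', '061/111 11 11')")

# The Fig. 1 request: private-sector contacts with no compensation deal.
rows = con.execute("""
    SELECT t.type_of_company, t.description, p.name_surname, p.mobile_phone
    FROM customers c
    JOIN type_of_customer t ON c.id_type = t.id_type
    JOIN contact_person p ON p.id_customer = c.id_customer
    WHERE t.compensation IS NULL
      AND t.type_of_company LIKE 'private%'
""").fetchall()
print(rows)  # only the private-sector contact without a compensation deal
```

The explicit JOIN form is equivalent to the comma-separated FROM list in Fig. 1; filtering on the NULL Compensation field and the LIKE pattern reproduces the two selection criteria described in the text.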
One specific case concerned a situation in which it was requested that all insurance companies have a disaster recovery location. The features of this database were very important in that situation: in a matter of minutes all such customers could be listed, along with their contracts, services and even the prices they pay. Contact persons with their phone numbers could also be added. A unique list would be created and given to the key account managers, so they could call their customers and offer them a disaster recovery location in the operator's data center.

There are various other possibilities for the application of this database. For example, we could list all business customers whose contracts expire in March, April or May. Using those search results, the sales department can then plan communication with customers in order to renew contracts. In that way, the detailed search of customer information and services can always be up to date. The best solutions can be offered when it is necessary to review the whole situation at a specific location, or when a particular type of customer or a particular service is observed. Based on all this reliable information, strategic plans on how to adjust technical solutions for customers can be made. Going further, we can see the tendencies of price decrease and the increase in demand for links with greater bandwidth.

The purpose of reengineering the data storage is better usage of the same data. All the data can now be included in different queries, which could not be done before. The earlier situation, in which technical service data were kept in the technical department and were available only on request, meant that analysis could not reach all information about the services.

This database can be changed: additional tables and columns can be created if the activities of the department using it require so. Given that the telecommunications market is rapidly growing and changing, the flexibility of the suggested database is its main advantage. Development and adjustment of the database to the needs of the market can be the subject of further analyses.

IV. THE ADVANTAGES OF THE NEW SOLUTION

Previously, data were collected and entered into several tracking systems to make them accessible to everyone. Any modification of data meant a change in all the places where such data were kept; data entry errors and discord between the data wasted a lot of employees' time. The suggested unique database is easy to use because it keeps all the information in one place. Changes that are entered are automatically displayed in all subsequent outputs. Implementation of a centralized database will minimize the possibility of mistakes.

Through the new database, employees will always be able to obtain the required information about business customers. It will also save time and improve business processes. Reports prepared for management are now easy to produce and have a variety of features.

Expectations regarding the further development of the telecommunications market are relatively consistent. An expansion in demand for Internet services is expected, as well as an increase in investment in the development of telecommunications services within companies [5]. This expectation leads to great confusion among operators who are not flexible and not ready for quick changes. Therefore, it is necessary to prepare the operators for changes. A detailed analysis of all data will give information about the tendencies of further development.

The tendency of prices of Internet services to decrease [6] also points out the need to be proactive in the way telecommunication services are provided to end users. When customers are satisfied with the service and the way they are treated, they never break cooperation with the operator.

V. CONCLUSION

The rapid growth of the telecommunications market is very stressful, but it is also a big challenge for telecommunication operators. It is necessary to keep up with changes, which are fast and permanent. Every customer demand should be considered with expertise and an understanding of the big picture.

Once again, the emphasis is on keeping all related data in one place. However, personal satisfaction as well as the competency to fulfill the requirements should not be ignored. Analyses which point out market tendencies will contribute to strategic planning and the anticipation of customers' needs in the future. Filtering data from a single database is easy and quickly produces the desired results. The usage of these results will directly influence the sales activities of the sales department for business customers. It will shorten the time required for obtaining information about users, and an improvement of customer satisfaction due to a proactive approach is expected. Analysis of the existing data allows the prediction of future technical solutions for various projects and gives a clearer insight into the current situation of the services used by business customers. Future development of this database in accordance with the needs of the sales department is expected. Customizing it to the users' preferences and the employees' needs is a key priority for its further development.

REFERENCES

[1] M. Hammer and J. Champy, "Reengineering the Corporation", 2003.
[2] N. Gospić, G. Đukanović, A. Đurović, "Reinženjering procesa u e-komunikacijama", Univerzitet u Beogradu, Saobraćajni fakultet, 2015.
[3] S. Kujović and D. Dulović, "CRM in telecommunication company", INFOTEH-JAHORINA, Vol. 10, Ref. B-I-7, pp. 95-99, March 2011.
[4] J. Hurwitz, A. Nugent, F. Halper, M. Kaufman, "Big Data for Dummies", 2013.
[5] R. Pavlović, "Big Data i poslovna inteligencija", INFOTEH-JAHORINA, Vol. 13, March 2014.
[6] Ž. Kočović, "Forecasts of development of ICT in Serbia to 2016", INFOTECH 2016 ICT Conference & Exhibition.
Abstract—This paper describes the basics of the Invenio institutional repository, CRIS systems and their data models. The result of this research is a scheme for mapping data from Invenio to the CERIF standard.

I. INTRODUCTION

Nowadays, there is a trend towards knowledge dissemination, which consequently includes the preservation of publications and accompanying resources. Therefore, one of the most important tasks is how to preserve that data and make it accessible and interoperable. An Institutional Repository (IR) can solve this issue. In [1], an IR is described as an electronic system that captures, preserves and provides access to the digital work products of a community. The three main objectives of having an institutional repository are:

1. creating global visibility and open access for an institution's research output and scholarly materials;
2. collecting and archiving content in a "logically centralized" manner, even for physically distributed repositories;
3. storing and preserving other institutional digital assets, including unpublished or otherwise easily lost ("grey") literature.

At first, there were only a few implementations of IRs with a limited set of features, which later became useful and worldwide popular tools such as Digital Commons and ContentDM [2]. The availability of open-source web technologies contributed to the rapid development of IRs worldwide, particularly among academic and research institutions. It is thus not surprising that several open-source software platforms are available for developing IRs, such as Invenio [3], Greenstone (GS) [4], EPrints [5], DSpace [6], Fedora [7] and SobekCM [8]. Although IRs have been used for a long time, they still do not have a mutually agreed and standardized representation of their data. Consequently, this can cause difficulties in data exchange between diverse IR systems.

To overcome these problems in data exchange, one possible solution is to rely on a predefined standard outside the IR field. The Common European Research Information Format (CERIF) standard [9], which is the basis of Current Research Information Systems (CRISs), is used for data exchange in the scientific-research domain and can also be utilised in the IR domain. It is reasonable to assume that mappings of IR data to the CERIF model need to be proposed in order to solve the problem of data exchange among diverse IRs.

In this paper, a scheme for mapping data from the Invenio IR to the CERIF format is proposed. Specifically, the paper addresses the mapping of different kinds of publications, and of other resources related to them, from the Invenio to the CERIF model. This scheme can be used as a guideline supporting the exchange between Invenio repositories and CRIS systems. A further motivation for this work was to extend and improve the research from [10], [11], [12].

II. INVENIO IR

Invenio is one of the most popular IR solutions. It is used by CERN and by research institutes outside CERN, such as SLAC National Accelerator Laboratory, Fermilab, and the École Polytechnique Fédérale de Lausanne. Invenio is a free software suite enabling a client to run his own digital library or document repository on the web. The technology offered by the software covers all aspects of digital library management, from document ingestion through classification, indexing and curation, up to document dissemination.

Invenio is a completely open-source library management package that provides the tools for managing digital assets in an institutional repository. The software is typically used for open-access repositories of scholarly and/or published digital content, and as a digital library. It is free software licensed under the terms of the GNU General Public Licence (GPL). This provides a straightforward advantage for institutions with smaller budgets that have programmers on their staff; commercial support can also be obtained in case of interest. Prior to July 1, 2006 the package was named CDSware; it was then renamed CDS Invenio, and is now known simply as Invenio. The latest version of Invenio is 3.0.1 [13]. The package was recently chosen as the digital library software of some famous national universities around the world [14]. There are more than 60 registered institutions using the Invenio software, and the number of new users is increasing on a weekly basis.

The power of Invenio is demonstrated in the CERN repository, where the IR has managed over 1,000,000 bibliographic records in high-energy physics since 2002, covering articles, books, journals, photos, videos, datasets, and more.
Invenio runs on Unix-like systems and requires the Python/Flask web application server, a MySQL, PostgreSQL or SQLite database server, Elasticsearch for information retrieval, and Redis for caching. The IR also has great multi-language support [15].

In Invenio, all resources (documents) are organized into collections [Figure 1]. Collections can be regular and/or virtual; virtual collections are only important for the Invenio GUI. Collections can be customized in order to have different web interfaces, workflows and other features, and each collection can be customized in the general look and feel of its web pages. Linking rules are defined in order to implement relations between documents.

The basic entity in Invenio is the record [16], which contains metadata and may be associated with one or more documents (the digital content). A document can be stored in one or more revisions, and a revision in one or more formats. Each record contains a unique identifier and can store articles, books, theses, photos, videos, museum objects and more. MARC is the standard metadata schema used in Invenio, but other metadata sets can also be defined. Records can be submitted by an author or a librarian through custom, fully configurable web interfaces. Workflows can be customized to create the proper steps for submission, review, conversion of documents, approval, etc. Alternatively, metadata and files can be ingested using customized conversion scripts, harvested from OAI-PMH compatible repositories, or sent by e-mail. This IR can also serve as an OAI-PMH data provider, can export records in MARCXML and BibTeX format, and supports RSS feeds. MARCXML is the format natively used in Invenio.

Invenio is a modular framework of official and independent collaborative component packages. The list of all available official packages is described in the documentation [13]. To be more precise, all packages are located in two different GitHub organisations. Firstly, the official packages are located in the inveniosoftware collection [17]: a collection of base packages, core packages and additional feature packages that are maintained in a coherent, homogeneous way by the Invenio project team. Secondly, community contributions are situated in a collection of third-party packages extending Invenio functionalities [18]. They are maintained by contributing teams and may follow different practices; these packages may also incubate an experimental or unproven feature that can later mature into the main organisation. In the following, the authors address the modules that can be used to create and exchange data from the scientific-research domain, such as publications, books, monographs, etc. The most important are those modules which can be involved in the process of exchanging data between Invenio and CERIF-like systems.

Invenio has numerous ways to import or export record data. One of the most popular is Invenio-oaiserver, which provides the possibility to exchange information with other OAI-PMH compatible systems; CRIS UNS can export and import PhD theses [19] over the OAI-PMH protocol. Besides OAI-PMH, Invenio has implemented Dcxml for generating an XML representation in accordance with the Dublin Core format [20], which is also a supported format in CRIS UNS [10]. Invenio supports a REST API through Invenio-rest and Invenio-JSONSchemas; this could be useful for CRIS UNS, which is also REST compatible [21]. Invenio also implements a package which can exchange data with well-known repositories such as OpenAIRE [22]; data from this repository can be easily mapped to the CERIF format [23]. Evaluation metrics of publications can be stored in Invenio with the support of the DataCite and invenio-metrics packages, and these metrics are supported in the CRIS UNS model [24]. Information about authors can be easily mapped to CERIF [25], because Invenio has a package, invenio-orcid, for acquiring authors' data based on unique author ids [26].

Figure 1 - Invenio data model

III. CERIF MODEL

CERIF is a standard that describes a data model which can be used as a basis for exchanging data from the scientific-research domain. The CERIF standard describes the physical data model [27] and the exchange of XML messages between CRIS systems [28]. The best feature of CERIF is that it can be expanded and adapted to different needs. In practice, CERIF is often mapped to other standards that also represent data of the scientific-research domain, for example the CERIF/MARC21 mapping described in [29]. The authors of [30] recommend an extension of CERIF that incorporates a set of metadata required for storing theses and dissertations. Another example is [31], where the authors argue how CERIF can be used as a basis for the storage of bibliometric indicators.

Hereinafter, the main entities of the CERIF data model version 1.5 are presented:

Base entities - represent the core (basic) model entities. There are only three base entities: cfPerson, cfOrganizationUnit and cfProject.

Result entities - a group of entities which includes results from scientific research like
TABLE I.
GREENSTONE DOCUMENTS METADATA FIELDS

Invenio entities | CERIF entities | multiple | CERIF link entity | Used classification

bib24X(tag="245 $a", value) | cfResPubl; cfResPublTitle (cfTitle) | X | — | —

bib10X(tag="100 $a", value) | cfPers; cfPersName (cfFirstName/cfLastName/cfOtherName) | X | cfPers_ResPubl | scheme: Person Output Contributions; Classification: Author

bib26X(tag="260 $b", value) | cfOrgUnit; cfOrgUnitName(cfName) | | cfOrgUnit_ResPubl | scheme: Organisation Output Roles; Classification: Publisher

bib26X(tag="260 $a", value) | cfPAddr (cfCityTown) | | cfOrgUnit_PAddr | scheme: Organisation Contact Details; Classification: Postal Address

bib26X(tag="260 $c", value) | cfResPubl(cfResPublDate) | | — | —

bibdoc (creation_date, doctype, docname); bibrec_bibdoc | cfMedium (cfMediumCreationDate, cfMimeType); cfMediumTitle(cfTitle) | X | cfResPubl_Medium | scheme: Organisation Contact Details; Classification: Postal Address
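The mapping rows above can be sketched as a small lookup table driving a conversion routine. The following Python fragment is purely illustrative: the `MARC_TO_CERIF` dictionary, the flat `{tag: value}` record shape and the `map_record` helper are assumptions for the sketch, not code from Invenio or from any CERIF toolkit.

```python
# Hypothetical lookup table condensing the Table I rows:
# MARC field tag -> (CERIF entity, CERIF attribute path).
MARC_TO_CERIF = {
    "245 $a": ("cfResPubl", "cfResPublTitle.cfTitle"),   # title
    "100 $a": ("cfPers", "cfPersName"),                  # author
    "260 $b": ("cfOrgUnit", "cfOrgUnitName.cfName"),     # publisher
    "260 $a": ("cfPAddr", "cfCityTown"),                 # place of publication
    "260 $c": ("cfResPubl", "cfResPublDate"),            # publication date
}

def map_record(marc_fields):
    """Map the {tag: value} pairs of one bibrec to CERIF entity/attribute pairs."""
    cerif = {}
    for tag, value in marc_fields.items():
        if tag in MARC_TO_CERIF:
            entity, attribute = MARC_TO_CERIF[tag]
            cerif.setdefault(entity, {})[attribute] = value
    return cerif

# A record from the book collection, reduced to two MARC fields.
book = {"245 $a": "Game of Thrones", "100 $a": "Martin, George R.R."}
print(map_record(book))
```

A real converter would additionally emit the link entities (e.g. cfPers_ResPubl) and the classification scheme/term pairs from the last two table columns; the sketch only covers the entity/attribute part of the mapping.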
software architecture and its components was done. The functionality of the relevant modules, their interconnections, and the database were inspected more closely, based on the software documentation [13] and the source code. Moreover, the authors identified the available modules that support data exchange between the Invenio and other repositories, with special emphasis on CRIS systems.

Data in the Invenio are stored in records that are in accordance with the MARC 21 standard. The system provides data export in MARCXML [35], which could be beneficial for systems that are MARC 21 compatible [34]. Also, the Invenio has a service (Invenio-oaiserver) for exchanging data via the OAI-PMH protocol. Despite the fact that the CRIS UNS and some other systems support the mentioned standards [19], the vast majority of IRs and CERIF-compatible systems implement neither MARC 21 nor OAI-PMH. Thus, the Invenio data should be represented in accordance with the well-known CERIF standard for representing scientific research data.

All the Invenio records are represented with the bibrec entity (Figure 3). The bibrec is the primary element for representing the MARC 21 data. Concrete record values are actually stored in the auxiliary entities bib00X to bib99X, which keep the real MARC 21 data; for instance, the value of the MARC 21 field "245 $a", the title of a publication, will be stored in the bib24X table. The link between bibrec and a particular bibXYX is achieved with the entities bibrec_bibXYX. The Invenio records are hierarchically organized in collections. There are primary (regular) and secondary (virtual) collections [13], where the primary collections are the basic organizational types (Books, Theses, Articles, Preprints, Reports, Pictures, …) by which records are grouped together, while the secondary collections are used mostly for improving the web UI and search results. The collections are not directly connected to the records (Figure 3), as might be expected; the link between a particular record and a collection is established through MARC 21 field 980, subfield $a, of the bibliographic record. So, all records from one collection can be retrieved from the entity bib98X, where the field tag is "980 $a" and the field value holds the name of the collection, e.g. "Theses".

Table 1 presents a proposal for mapping specific Invenio record types to the adequate CERIF entities. The first column is related to the identified bibrec types from the default installation of the Invenio. The second column is reserved for the appropriate CERIF entity, which in this case is cfResPubl. The third column holds the CERIF link entity. The fourth and the fifth columns are used to provide different kinds of semantics for the cfResPubl by using the powerful semantic layer of the CERIF model. The CERIF predefined classifications and classification schemes are used to represent the primary collections of the Invenio.

In the following, it is presented how to map a record which is a member of the book collection; a part of the relevant mapping information is shown in Table 1. The record is represented via the appropriate MARC fields (entities from the Invenio model). The book title is described with the Invenio entity bib24X, whose field tag has the value "245 $a" and whose field value holds the actual name of the book, e.g. "Game of Thrones". The title value will be mapped to the CERIF entity cfResPublTitle, precisely to its attribute cfTitle. The author of the book is stored in the Invenio entity bib10X, whose field tag has the value "100 $a" and whose field value holds the actual name of the author, e.g. "Martin, George R.R.". The author is mapped to the CERIF entity cfPers, and its name is stored in the fields cfFirstName/cfLastName/cfOtherName of the entity cfPersName. The entity cfPers is linked to the cfResPubl via the entity cfPers_ResPubl. The cfPers_ResPubl needs to be classified with the CERIF semantic layer by using the scheme Person Output Contributions and the classification Author. This semantic organization is very useful when it is important to distinguish different person roles such as authors, editors and reviewers. Every book record usually has a book publisher, which is represented with the Invenio entity bib26X, whose field tag has the value "260 $b" and whose field value holds the actual name of the publisher, e.g. "Harper Voyager". The publisher data is mapped to the CERIF entity cfOrgUnit, and its name will be defined in cfOrgUnitName, where the concrete value will be stored in its attribute cfName. The entity cfOrgUnit is linked to
Figure 3

the cfResPubl via the entity cfOrgUnit_ResPubl. The cfOrgUnit_ResPubl needs to be classified with the CERIF semantic layer by using the scheme Organisation Output Roles and the classification Publisher.

The place of the publication is represented with the Invenio entity bib26X, whose field tag has the value "260 $a" and whose field value holds the real name of the place, e.g. "London". The data about the location will be mapped to the CERIF entity cfPAddr, and the concrete value will be stored in its attribute cfCityTown. The entity cfPAddr is linked to the cfOrgUnit via the entity cfOrgUnit_PAddr, providing the information about the place of the publication. The year of the publication is represented with the Invenio entity bib26X, whose field tag has the value "260 $c" and whose field value holds the actual year of publishing, e.g. "1998". The publication year will be stored in the CERIF entity cfResPubl, precisely in the attribute cfResPublDate. Some of the Invenio records have attached digital resources such as doc, xls, pdf or image files. The information about the digital resources in the Invenio is stored in the entity bibdoc, which is linked with the Invenio record via the entity bibrec_bibdoc. All the digital resources from the Invenio are represented as instances of the CERIF cfMedium entity, in accordance with the EuroCRIS suggestion [36], whereas the value of the bibrec_bibdoc is mapped to the CERIF cfResPubl_Medium entity. The attributes creation_date and doctype of the bibdoc entity are directly mapped to the attributes cfMediumCreationDate and cfMimeType of the cfMedium. The attribute docname of the bibdoc is mapped to the cfTitle attribute of the entity cfMediumTitle, which has a link with the cfMedium. The cfResPubl_Medium needs to be classified with the CERIF semantic layer by using the scheme Organisation Contact Details and the classification Postal Address.

V. CONCLUSION

The importance of institutional repositories and CRIS systems for scientific research data is enormous, and making data accessible between these systems is unavoidable. Therefore, this paper presents a mapping scheme for the Invenio data to the CERIF model, by which all the Invenio data becomes available to CERIF-like systems (CRISs, IRs that support CERIF, etc.).

The main contributions of this research are:
• a proposal for mapping data from the Invenio repository to the current CERIF 1.5 model;
• the potential use of an Import/Export plug-in for achieving full interoperability between these two systems.

ACKNOWLEDGMENT

Results presented in this paper are part of the research conducted within the Grant No. III-47003, Ministry of Science and Technological Development of the Republic of Serbia.

REFERENCES
[1] N. F. Foster and S. Gibbons, "Understanding faculty to improve content recruitment for institutional repositories," D-Lib Mag., vol. 11, no. 1, pp. 1–12, 2005.
[2] Á. Rocha, A. M. Correia, S. Costanzo, and L. P. Reis, New contributions in information systems and technologies, Volume 1, 2015.
[3] "Invenio." [Online]. Available: http://invenio-software.org/. [Accessed: 20-Dec-2014].
[4] "Welcome :: Greenstone Digital Library Software." [Online]. Available: http://www.greenstone.org/. [Accessed: 20-Dec-2014].
[5] "EPrints - Digital Repository Software." [Online]. Available: http://www.eprints.org/. [Accessed: 20-Dec-2014].
[6] "DSpace | DSpace is a turnkey institutional repository application." [Online]. Available: http://www.dspace.org/. [Accessed: 20-Dec-2014].
[7] "Fedora Repository | Fedora is a general-purpose, open-source digital object repository system." [Online]. Available: http://fedora-commons.org/. [Accessed: 20-Dec-2014].
[8] "SobekCM : Digital Content Management System." [Online]. Available: http://sobekrepository.org/.
[9] "Common European Research Information Format | CERIF," 2000. [Online]. Available: http://www.eurocris.org/. [Accessed: 18-Jan-2014].
[10] V. Penca and S. Nikolić, "Scheme for mapping published research results from DSpace to CERIF format," in 2nd International Conference on Information Society Technology and Management, 2012, pp. 170–175.
[11] V. Penca, S. Nikolić, and D. Ivanović, "Scheme for mapping scientific research data from EPrints to CERIF format," 2015.
[12] V. Penca, S. Nikolić, and D. Ivanović, "Mapping scheme from Greenstone to CERIF format," in Proceedings of the 6th International Conference on Information Society and Technology (ICIST 2016), Kopaonik, Republic of Serbia, 2016.
[13] CERN, "Invenio Documentation," 07-Dec-2016.
[14] "Current Invenio Users." [Online]. Available: http://inveniosoftware.org/showcase.
[15] S. R. Lihitkar and R. S. Lihitkar, "Open source software for developing digital library: Comparative study," DESIDOC J. Libr. Inf. Technol., vol. 32, no. 5, pp. 393–400, Sep. 2012.
[16] G. Pyrounakis, M. Nikolaidou, and M. Hatzopoulos, "Building digital collections using open source digital repository software: A comparative study," Int. J. Digit. Libr. Syst., vol. 4, no. 1, pp. 10–24, 2014.
[17] "Invenio Software Official Libraries." [Online]. Available: https://github.com/inveniosoftware.
[18] "Invenio Software Contributor Libraries." [Online]. Available: https://github.com/inveniosoftware-contrib.
[19] L. Ivanović, D. Ivanović, and D. Surla, "Integration of a research management system and an OAI-PMH compatible ETDs repository at the University of Novi Sad," Libr. Resources Tech. Serv., vol. 56, no. 2, pp. 104–112, 2012.
[20] "Dublin Core® Metadata Initiative (DCMI)," 2014. [Online]. Available: http://dublincore.org/. [Accessed: 11-Sep-2014].
[21] V. Penca, S. Nikolić, and D. Ivanović, "SRU/W service for CRIS UNS system," in 4th International Conference on Information Science and Technology (ICIST), Kopaonik, 2014, pp. 108–114.
[22] "OpenAIRE." [Online]. Available: https://www.openaire.eu/.
[23] "OpenAIRE Guidelines for CRIS Managers." [Online]. Available: http://eurocris.org/openaire-gl.
[24] R. Gartner, M. Cox, and K. Jeffery, "A CERIF-based schema for recording research impact," Electron. Libr., vol. 31, no. 4, pp. 465–482, 2013.
[25] T. Demeranville, "ORCID and CERIF," in euroCRIS Strategic Membership Meeting Autumn 2015, Barcelona, 2015.
[26] "ORCID Connecting Research and Researchers." [Online]. Available: https://orcid.org/.
[27] B. Jörg et al., CERIF 1.3 Full Data Model (FDM): Introduction and Specification, 2012.
[28] J. Dvořák and B. Jörg, "CERIF 1.5 XML - Data Exchange Format Specification," 2013, p. 16.
[29] D. Ivanović, D. Surla, and Z. Konjović, "CERIF compatible data model based on MARC 21 format," Electron. Libr., vol. 29, pp. 52–70, 2011.
[30] L. Ivanović, D. Ivanović, and D. Surla, "A data model of theses and dissertations compatible with CERIF, Dublin Core and EDT-MS," Online Inf. Rev., vol. 36, no. 4, pp. 548–567, 2012.
[31] S. Nikolić, V. Penca, and D. Ivanović, "Storing of bibliometric indicators in CERIF data model," Kopaonik, Republic of Serbia, 2013.
[32] D. Surla, D. Ivanović, and Z. Konjović, "Development of the software system CRIS UNS," in Proceedings of the 11th International Symposium on Intelligent Systems and Informatics (SISY), Subotica, 2013, pp. 111–116.
[33] "Current Research Information System of University of Novi Sad." [Online]. Available: http://www.cris.uns.ac.rs/. [Accessed: 18-Jan-2014].
[34] D. Ivanović, G. Milosavljević, B. Milosavljević, and D. Surla, "A CERIF-compatible research management system based on the MARC 21 format," Program Electron. Libr. Inf. Syst., vol. 44, no. 3, pp. 229–251, 2010.
[35] "MARCXML: The MARC 21 XML Schema." [Online]. Available: http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd. [Accessed: 18-Jan-2014].
[36] N. Houssos, B. Jörg, and B. Matthews, "A multi-level metadata approach for a Public Sector Information data infrastructure," presented at the CRIS2012 Conference, 2012.
Abstract— In recent years, apart from increasing efficiency, governments and business organizations have started to give priority to building sustainable and environmentally friendly systems. Recently, green business process modeling has also started to emerge as one of the approaches towards reducing the carbon footprint. Process reengineering is one of the pillars of green IT, so combining BPM, in our case the BPMN 2.0 notation, and Green IT is one of the ways towards achieving the set goals. We present a model of business processes which should contribute to better adapting business processes in the public sector, with identified steps and tasks for the implementation of Green IT recommendations.

Keywords: Business process model, Green IT, public sector

I. INTRODUCTION

Recently, green business process modeling has started to emerge as one of the approaches towards reducing the carbon footprint. Process reengineering is one of the pillars of green IT, so combining BPM, in our case Business Process Model and Notation (BPMN) 2.0, and Green IT is one of the ways towards achieving the set goals. Despite all of the above, no business process model for the implementation of regulations issued by the state for the introduction of "green" technologies (such as Green IT) can be found in the reference literature. Such a model would be of benefit to local governments for defining the expected steps and tasks in the process of implementing regulations on "green" technology, which would allow them to adjust and reengineer their business processes more easily.

II. RESEARCH QUESTIONS
How can business process modeling influence this goal, and what are the possible solutions?

III. METHODOLOGY

BPM (Business Process Management) aims to design business processes while maximizing the efficiency of the organization [3]. A review and analysis of reference works found that certain authors have been trying to connect BPM and Green IT. Thus, the authors of [4] define a combination of the two aforementioned approaches, called Green Business Process Management, and give a description of a framework for further research in this field. The authors of [5] compare conventional BPM with BPM customized for the application of "green" technologies, and define the overall architecture of BPM as well as key performance indicators. The paper [6] analyzed the possibility of connecting IT components with BPM with regard to more efficient use of energy; a conceptual model in ER notation for the integration of IT components, applications and business processes is shown. A research agenda on the topic of Green BPM is given in [7], which analyzes the different approaches and their impact on reducing energy consumption and the carbon footprint, as well as the importance of reengineering business processes. The authors of [8] also analyzed the possibility of reengineering business processes to implement "green" technology more effectively, and in their work pointed out a methodological approach that includes four phases.

In these works [3][4][5][6][7][8], business process models that could be applied to the implementation of Green IT recommendations are not present. In this study, we used a business process modeling technique in order to define the tasks and the necessary steps for the successful implementation of technologies that increase efficiency, in particular Green IT. The business process model is developed using the BPMN 2.0 notation.

levels of business processes in municipalities to those that are demanded by the Government. Municipalities are required to create their own action plan, describing a strategy to implement these recommendations and regulations. The action plan needs to be approved by the Government and after that implemented, followed by mandatory quarterly reports. The final report on the implemented techniques and the new performance values needs to be submitted to the Government. The Government performs analyses of compliance with the given recommendations and plans new steps to further increase the efficiency of municipalities. The model is given in Diagram 1.
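The approval loop described above (a municipality drafts an action plan, the Government approves it, implementation follows with quarterly reports, and a final compliance analysis leads to new recommendations) can be sketched as a toy state machine. This is our own illustration of the control flow only; the paper's actual artifact is a BPMN 2.0 diagram, not code, and the state and event names are assumptions.

```python
# Toy state-machine sketch of the municipality/Government approval loop
# described in the text (illustrative only; the paper's model is a BPMN 2.0
# diagram). State and event names are our own.

TRANSITIONS = {
    ("DRAFT_ACTION_PLAN", "submit"): "AWAITING_APPROVAL",
    ("AWAITING_APPROVAL", "approve"): "IMPLEMENTING",
    ("AWAITING_APPROVAL", "reject"): "DRAFT_ACTION_PLAN",
    ("IMPLEMENTING", "quarterly_report"): "IMPLEMENTING",
    ("IMPLEMENTING", "final_report"): "COMPLIANCE_ANALYSIS",
    ("COMPLIANCE_ANALYSIS", "new_recommendations"): "DRAFT_ACTION_PLAN",
}

def run(events, state="DRAFT_ACTION_PLAN"):
    """Replay a sequence of process events and return the resulting state."""
    for event in events:
        state = TRANSITIONS[(state, event)]
    return state

# One full cycle: plan approved, two quarterly reports, then the final
# report is submitted for the Government's compliance analysis.
final_state = run(["submit", "approve", "quarterly_report",
                   "quarterly_report", "final_report"])
```

The self-loop on quarterly reports and the edge back from compliance analysis to drafting capture the continuous, iterative character of the process that the conclusion of the paper emphasizes.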
particularly important for municipalities, because it enables them to see the identified tasks in the process of implementing recommendations for increased efficiency (in this case the implementation of the Green IT recommendations) and to adapt their business processes. The model describes the process of implementing the recommendations of the Green Grid organization, but thanks to its versatility it can, with minimal changes, be applied to other advisory and regulatory bodies.

V. STRATEGY

Information and communication technologies in the public sector, as well as in the whole economy, can in fact have significant positive but also negative effects in terms of environmental impact during their use. Therefore, a Green IT strategy aims to decrease the negative environmental impact of ICT, but also to increase the role of ICT in public sector services and processes. These objectives are in particular to be embedded in all processes relating to public procurement, so that the public procurement system adapts to the requirements of Green IT:
• use of the Total Cost of Ownership metric;
• setting minimum energy standards that each product must meet.

The issue of recycling and the reduction of electronic waste is an important issue of a Green IT strategy. The strategy should focus on establishing procedures for equipment that represents an excess in certain business segments.

When it comes to the processes of local government, the strategy should focus on the benefits of ICT for faster sharing of information, allowing viewing and modification of the same document, the introduction of paperless work systems, but also the use of audio, video and web conferencing to reduce travel.

The strategy should also affirm the requirement for adequate human resources with knowledge of ICT and Green IT, in order for the strategy to be implementable.

VI. ACTION PLAN FOR IMPLEMENTATION

After identifying the key stakeholders and developing a business process model for the implementation of Green IT technologies in the public sector, action plan specifications are imposed as the next step. The action plan is of great importance for the successful implementation of the Green IT recommendations; it defines various aspects relevant to the successful introduction of Green IT technologies. Different authors [9][10][11] distinguish different phases of the action plan. In principle, the phases can be divided into:
• The initial phase of defining the action plan, in which the purpose of the introduction of "green" technology is defined, the goals are set, and the criteria and the appropriate metrics are determined, all depending on the indicators selected as relevant to the process of monitoring performance using Green IT.
• The screening phase, capturing the initial situation using the metrics selected in the previous phase. The initial quantitative results need to be compared with the results after the introduction of Green IT.
• The fit-gap analysis phase, which determines the potential of the existing IT technology and equipment for the implementation of Green IT (fit), but also the missing components and technologies for a smooth implementation of Green IT (gap).
• The implementation phase, implementing the recommendations of the organization and their personalization through business process reengineering, more efficient use of the available resources, or the introduction of new technologies and resources.
• The verification phase, the verification of the implemented changes, adaptation and innovation, with measurement of the effects on the business organization.

In this case, the case of public administration, it can be said that information and communication technologies negatively affect the environment through pollution and the consumption of natural, non-renewable resources, but they also increase efficiency and productivity and enable the provision of services to participants in the process of cooperation which would not otherwise be available.

Because of this, the objective of the first phase of the proposed action plan is to provide a reduction of the negative impact of ICT and to revise the role of ICT in processes and services within the public sector, in order to achieve a rationalization of resources. In this stage the chosen metric is PUE (Power Usage Effectiveness), which is defined as the ratio of the total energy consumption to the energy consumption of the IT equipment. For proper usage of this metric it is important to be able to correctly measure the energy consumption of the IT equipment, so that it can be put in relation with the total energy consumption. Using PUE we can measure and compare data centre energy efficiency against set standards and also compare data centres between themselves.

As the criterion for efficiency we will use the scale developed by IBM for possible PUE values, according to the PUE values of their customers [9]. According to IBM, PUE can range from 1.3 to 3.5, where a lower value represents better efficiency. However, in practice [12] we can expect values from 1.5 to 3.0. The criterion for the screening phase should be set to 2.0, which in other words means that 50% of the energy consumption in the data centre is consumed by the IT equipment. During iterations of evaluation this criterion should be lowered. These initial values will be used for comparison with the results achieved after the implementation of Green IT. After each phase the set criterion should be revised and lowered if possible.
The fit-gap analysis aims to provide more responsible and efficient usage of the owned IT equipment. As the fit part, we recognized that most municipalities and government offices use too many small printers, while central printers are rarely used. This leads to higher energy consumption, so higher-quality printers should be used as central printers for departments. As a big gap, we identified the lack of knowledge needed to use modern IT equipment such as tablets, which should be one of the steps towards paperless government. The lack of knowledge to use this equipment also corresponds with the lack of tablets; thus, the procurement of tablets and the provision of the necessary training to use them will have a great positive impact on the Green IT implementation. It is also necessary to provide appropriate human resources with knowledge of ICT in order to fully implement this action plan.

VII. CONCLUSION

The developed business process model for the implementation of the Green IT recommendations is based on a real process, with the goal of introducing Green IT into government institutions. The implementation of Green IT is a continuous process and needs to be done appropriately in order to fully harvest its benefits in public administration. As the first step, we developed an action plan with several phases to help the implementation of Green IT. The criteria we used for public administration are based on research and on the standards in use in the private sector; these criteria can be re-evaluated to fit specific needs. The research could be used, with minor changes, in other sectors besides the public one. Considering that the implementation of Green IT is a continuous process, evaluation and verification should be part of each step, and in the end they should be able to provide a greater impact in preserving the environment.
Abstract— Transportation is an important factor in the development of the economy and the competitiveness of business. It influences the customer's satisfaction with a particular company. Aside from its importance as an indispensable service of the supply chain, transportation represents a substantial cost for the business and has an essential influence on environmental performance. To make transport more efficient and to reduce its costs it is necessary to obtain data about the different activities and factors of importance for transport. With this in mind, the creation and use of a goods flow database must be a platform to analyze transport activities, draw conclusions, and recommend and implement changes. There is a need to define a limited data set which will reflect the significant transport indicators of the current state and the changes realized over time. Such data will enable correction measures corresponding to the requirements of the strategic goals defined at the national and international level. The database should be available to certain beneficiary groups with defined levels of authorization rights. It is difficult, even in developed countries, to establish a positive intention of economic entities to engage in such information systems. Provincial institutions, local governments, and the state of Serbia are of great importance for a comprehensive collection of data on commodity and transport flows, which should serve to define a modern and rational transportation system that will enable the sustainable development of the economy in the territory of AP Vojvodina, and thus within the whole country, too. The paper proposes the limited data which are significant for following goods flows and which should be elements of an information system, as well as an organizational structure within the Republic of Serbia which will enable the collection of data, their analysis and dissemination.

Keywords: transport, supply chain, database, information system, goods flow

I. INTRODUCTION

Transportation is becoming an important factor in the comparative advantage of companies and in the prices of their products and services. It functions as an activity that physically connects the business to its supply chain partners, which are the suppliers and customers of goods. It influences the customer's satisfaction with a particular company. It is a service which cannot be avoided within the supply chain and which is not a small part of the prices of services or goods, and as such it is an important factor of business. The transportation cost sometimes determines whether a customer's transaction results in a profit or a loss for the business. A very important issue is the ecological influence of transport, which can change dramatically depending on the mode of transport and its efficiency. To make transport more efficient and to reduce its costs it is necessary to have data about the different factors related to transport. A goods flow database must be a platform to analyze transport activities, draw conclusions, and recommend and implement changes.

At the moment there is no database with data about transport in the Republic of Serbia, at either the national, provincial or local level. The consequence of this is that there are no data about the transport process and its status within the value chain of services and goods which would enable its analysis. Without such an approach there will be no improvement, no systematic approach, and no scientific analysis. The situation in the transport sector in Serbia must be improved, and for such a process there is a need to involve information and communication technology in the transport sector.

To design the database, the input and output data should be defined. Input collects raw data within the organization or from the environment. Processing is the process of forming data and bringing them into an understandable form. Output transfers the processed data to the clients who use them, or to activities that will benefit from them. An information system requires feedback, which is output that is returned to members of the organization to help them assess or correct the input form of a document.

There is a need to define a limited data set which will reflect the important issues of transport and its changes over time. This set should enable correction measures corresponding to the requirements of the strategic goals in the transport sector at the national and international level. The database should be available to certain beneficiary groups with defined levels of authorization rights.

The subject of the research presented in this paper are the commodity and transport flows in Vojvodina. The problem which is tackled is the lack of data about commodity and transport flows. The general goal the authors wanted to reach is the improvement of transport through Vojvodina and the creation of a sustainable system of commodity and transport flows.
The specific goal of this paper is to suggest the data groups which must be incorporated into the goods flow database.

II. LITERATURE OVERVIEW

There are not many articles tackling the topic of information systems for goods flow analysis. Some authors refer to goods flows as commodity flows. Authors from Australia [1] explain that data on commodity flows in Australia are poorly covered by official statistics, while analysts demand, for many strategic visions, a nationwide picture of commodity flows. The authors described the FreightInfo database, which uses combined assembly techniques: the development of secondary data from existing national data collections made by experts, the identification of major flow sources and destinations, and a program which enables field data collection from freight carriers, consignors and consignees, for all transport modes.

American authors' research [2] presents a methodology for freight flow estimation along corridors for international trade. They developed a methodology which disaggregates regional flows from the Federal Highway Administration's Freight Analysis Framework (FAF) to the state level and applied it to the State of Texas. They put the focus on congestion nodes on roadways and railroads and apply the methodology to estimate current and future freight demand.

Authors from Brazil present a methodology to estimate freight flows using secondary data within the urban sector. Low transport efficiency and traffic congestion, together with a lack of policies and problems in the decision-making process, are typical for the Brazilian context. The paper describes a new methodological approach for data processing in order to support decision-making in Brazil. The results achieved indicate the need for systematic data collection in order to improve future results [3, 4].

Researchers [5] collated available datasets and developed a freight flow model for a better understanding of freight movements between countries, which can be used for long-term planning efforts. A simple methodology helps to understand the links between a major district in one country and a similar district in another country. Existing trade data are the basis for supplying volumetric data for each country. The model can then generate commodity-level corridor flows between African countries, and between them and the rest of the world. The model enables differentiation between intra-country rural and metropolitan flows, using a gravity-based modeling approach as a first step towards developing a freight demand model for sub-Saharan Africa. Similar investigations have been done by other authors, e.g. [6].

The design and management of Large Logistics Networks (LLN) usually involve model-based analyses of the networks. The usefulness of such an analysis highly depends on the quality of the input data, which should capture the real circumstances as well as possible. The authors developed an advanced procedure model for a structured, goal- and task-oriented information and data acquisition for the model-based analyses of LLN. This procedure model differs from other approaches by focusing on information acquisition rather than solely on data acquisition, and by employing a consequent verification and validation concept [7, 8, 9].

Transport modeling in general, and freight transport modeling in particular, are important tools which explain the effects of investments and policies [10].

III. METHODOLOGY

One of the most productive ways to collect data and information on commodity and transport flows is making an information system for monitoring commodity and transport flows. If the whole economy, or a significant part of it, were involved in the generation of data in the information system as a partner, then specifying a methodology for collecting data to monitor commodity and transport flows would not be necessary. However, it is difficult, even in developed countries, to establish a positive intention of economic entities to engage in such information systems. Provincial institutions, local governments, and the state of Serbia are of great importance for a comprehensive collection of data on commodity and transport flows, which should serve to define the modern and rational transportation system that will enable the sustainable development of the economy in the territory of AP Vojvodina, and thus the whole country. The authors propose the structure of a database for commodity and transport flows which covers the mode of transport and especially intermodality issues. They define the way to collect data and the organizational structure which must be implemented by state institutions and which will ensure the collection of the right data at the requested time, its analysis, the drawing of conclusions and dissemination to beneficiaries and stakeholders.

The determination of a database depends on the goals of its configuration and on who is intended to use the data from it. The aim of the database for tracking commodity and transport flows in AP Vojvodina is to monitor the data relevant to the movement of goods within and through Vojvodina, according to various parameters.

The parameters are:
• type of goods,
• the volume of goods,
• hazardous substances,
• transport relations,
• type of transportation,
• intermodal units,
• special conditions of transport.

To define the structure of the database, a survey was performed among 40 important entities in the transport sector in Vojvodina. The stakeholders, all interested parties, and all those necessary for providing the so-called input data (in this case, all economic entities in Vojvodina) are those which:
• use raw materials for production, supplying them from sources in the country or from abroad, and
• place their products on markets, at home or abroad.

The data about such subjects should be entered in the
421
7th International Conference on Information Society and Technology ICIST 2017
Figure 1. Design of the subsystem for basic data about partners-participants in the database

A. Input data
The input subsystem collects raw data within the organization or from the environment.

The information system of commodity and transport costs would have the following subsystems:
• subsystem for basic data about information-system partners,
• subsystem for types of goods,
• subsystem for quantities of goods,
• subsystem for hazardous substances,
• subsystem for transport relations (hauls),
• subsystem for the mode of transport,
• subsystem for intermodal units,
• subsystem for special transport conditions.

Figure 2 shows the shape of the table designed to enter input data into the different subsystems.

B. Output data
The output subsystem transmits processed data to the people who use them or to the activities that benefit from them. The output data should include the results of the analysis of the data obtained from the providers of the input data, with the goal of obtaining a complete picture of the factors in the field of commodity and transport costs in AP Vojvodina. The output data are intended to be used by:
• the Republic administration,
• the provincial and local administrations,
• national, regional and local Chambers of Commerce,
• national, regional and local development agencies,
• universities and colleges,
• institutes,
• the Republic Institute for Statistics,
• the Town Planning Institute,
• the Institute for Nature Protection and the Institute for Work Protection,
• civil society organizations,
• all businesses,
• the diplomatic service,
• mediation organizations, such as consulting organizations,
• business incubators,
• science and technology parks,
• clusters.

The beneficiaries of the output data are expected to use them to respond to the findings and to work on improving the situation in the field of transport and cargo flows in AP Vojvodina.

The output information system should contain the following information:
Example of one output is shown in Figure 4: it presents the share of transport quantities by transport mode in the territory of Vojvodina.

C. Determination of users of the information system
The users of the information system can have unlimited or limited access to the data. Unrestricted access to information should be provided to:
• partners of the information system, provided that limiting access is not required for certain types of goods,
• the organs of state administration,
• the Republic Institute for Statistics,
• provincial authorities,
• local governments (access to certain data can be limited),
• scientific and educational institutions (access to certain data can be limited).

Other users of the information system should have limited access, and be charged an appropriate fee.
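The tiered access just described is essentially a small role-based policy: some user groups receive unrestricted access, everyone else limited, fee-based access. A minimal sketch of such a policy check (the role names and the fee flag are illustrative assumptions, and the per-dataset restrictions mentioned for local governments and scientific institutions are omitted):

```python
# User groups granted unrestricted access, per the list above.
# Role names are illustrative; real groups would come from a user registry.
UNRESTRICTED = {
    "information_system_partner",
    "state_administration",
    "statistics_institute",
    "provincial_authority",
    "local_government",
    "scientific_educational_institution",
}

def access_policy(role: str) -> dict:
    """Return the access level and fee status for a user role."""
    if role in UNRESTRICTED:
        return {"access": "unrestricted", "fee_required": False}
    # All other users: limited access, charged an appropriate fee.
    return {"access": "limited", "fee_required": True}
```

A real implementation would also model the per-dataset exceptions (e.g. goods types for which even partners' access is restricted), which this sketch deliberately leaves out.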
Figure 4. Projected output of transport distribution by modes of transport for the APV (quantities)

D. Defining the methodology for collecting data for monitoring
The Province has an analysis of the flow of goods in its territory, carried out by the Faculty of Technical Sciences in Novi Sad in 2012 and entitled "Macro analysis of commodity and transport costs in the Autonomous Province of Vojvodina". The Provincial Secretariat for Economy, Employment and Gender Equality had previously noted the lack of such an analysis and financed the production of the said study. The study is of strategic importance because it is the only existing document of this kind in the territory of Vojvodina. Flows of goods are constantly changing; they are a living process, and it is therefore necessary to monitor the changes continuously. Since Vojvodina at the moment has no established system for monitoring the flow of goods, it is necessary to set up such a system as soon as possible. The proposal is that the Secretariat for Economy and Tourism should set up a commodity and transport flows centre of Vojvodina, which could be called the Bureau for Transport Statistics.

Figure 5. Projected output of distribution by type of goods for the APV in % (export)
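The monitoring system proposed above ultimately needs a persistent store for the parameters tracked (type and volume of goods, hazardous substances, transport relations, mode, intermodal units, special conditions). A hedged sqlite sketch of such a store and one projected output, quantities per transport mode (all table and column names, and the sample rows, are illustrative assumptions, not the authors' design):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flow (
    id INTEGER PRIMARY KEY,
    goods_type TEXT,          -- type of goods
    volume_tonnes REAL,       -- volume of goods
    hazardous INTEGER,        -- hazardous substances flag (0/1)
    origin TEXT,              -- transport relation: origin
    destination TEXT,         -- transport relation: destination
    mode TEXT,                -- type of transportation
    intermodal_unit TEXT,     -- intermodal unit, if any
    special_conditions TEXT   -- special conditions of transport, if any
);
""")
# Invented example rows.
conn.execute(
    "INSERT INTO flow VALUES (1, 'grain', 850.0, 0, 'Novi Sad', 'Belgrade', "
    "'rail', NULL, NULL)")
conn.execute(
    "INSERT INTO flow VALUES (2, 'fuel', 120.0, 1, 'Pancevo', 'Novi Sad', "
    "'inland waterway', NULL, 'dangerous goods rules apply')")

# Total quantity per transport mode, as in the projected outputs.
by_mode = dict(conn.execute(
    "SELECT mode, SUM(volume_tonnes) FROM flow GROUP BY mode"))
```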
National and regional institutions, together with local governments, are of great importance for the collection of data. The Secretariat should set up a system in which all stakeholders are obligated to provide all the necessary data requested as input to the database.

IV. DISCUSSION AND CONCLUSION

The paper describes a survey which was performed to help define a pilot set of data that was used in the creation of the database of commodity flows, and to determine the input points and the output users of the data, together with a system for its dissemination. The paper proposes a structure within the Republic of Serbia which will enable the collection of the data and their analysis and dissemination. Such an approach will give a picture of the state of the art of the transport sector and enable political bodies to adopt measures which will improve the situation. Above all, it will give data to everyone included in the transport value chain, so that they can recognize their role, understand the situation and trends, see where the goods are coming from (their origin) and where they are going, understand where transport costs could be reduced, and see possibilities to expand their business in this sector. The aim of the authors is to raise awareness about the real need to establish such an information system for analysing goods flows in Vojvodina, and to finally start collecting the data and using the conclusions of the analysis.

REFERENCES
[1] N. Rockliffe, M. Wigan and H. Quinlan, "Developing Database of Nationwide Freight Flows for Australia", Transportation Research Record: Journal of the Transportation Research Board, Volume 1625, Washington, 2014, pp. 23-40.
[2] A. Bujanda, J. Villa and J. Williams, "Development of statewide freight flows assignment using the freight analysis framework (FAF3)", Journal of Behavioural Economics, Finance, Entrepreneurship, Accounting and Transport, 2 (2), 2014, pp. 47-57.
[3] L. Kelli de Oliveira and L. dos Santos Fontes Pereira, "An estimation of freight flow using secondary data: a case study in Belo Horizonte (Brazil)", City Logistics, Volume 18, Issue 2, 2014, pp. 24-35.
[4] J. H. Havenga, "Trade facilitation through logistics performance: The enabling role of national government", Journal of Transport and Supply Chain Management, Volume 5 (1), 2011, pp. 123-148.
[5] J. Havenga, Z. Simpson, D. King, E. J. Lanz, L. Goedhals-Gerber and A. De Bod, "Extending freight flow modelling to Sub-Saharan Africa to inform infrastructure investments – trade data issues", Journal of Transport and Supply Chain Management, Johannesburg, 2014, pp. 231-240.
[6] European Commission, Joint Research Centre, Institute for Prospective Technological Studies, Seville, Spain, 2008. [Online]. Available: http://energy.jrc.ec.europa.eu/transtools/index.html [accessed 14 June 2012].
[7] S. Kuhnt and S. Wenzel, "Information acquisition for modelling and simulation of logistics networks", Journal of Simulation, Volume 4, Issue 2, 2010, pp. 109-115.
[8] A. M. Law, Simulation Modeling and Analysis, 4th edn, McGraw-Hill, Boston, 2007.
[9] M. Rabe and B. Hellingrath, Handlungsanleitung Simulation in Produktion und Logistik, SCS Publishing House, Erlangen, 2001.
[10] M. Petri, A. Pratelli and G. Fusco, "Data mining and big freight transport database analysis and forecasting capabilities", Transactions on Maritime Science, 2016, pp. 99-110.
should be careful if other metrics, such as satisfaction, effectiveness and time measures, counter these results [8]. To accurately measure user experience we must also include other dimensions, such as perceived aesthetics and emotional response. For emotional response, there are at least three different methods of measurement: physiological reactivity, behavioural measures and affective reporting. Due to our limited resources, we decided to use affective reporting in the form of a self-reported survey of emotional state at a given moment. Different scales are used to measure different dimensions [9]. [10] introduced the Self-Assessment Manikin scale, which was designed to be easily applicable to children and non-native English speakers without any loss of information. The scale measures three dimensions: arousal, dominance and happiness.

Aesthetics is another complex topic to measure, but it is an integral part of user experience. Although aesthetics was already a popular topic of research, renewed interest arose at the end of the 20th century as a consequence of the study of usability [11]. The concept of aesthetics is primarily linked to visual appearance but may also include other characteristics like sound, smell, or taste. It is a multidimensional concept, as some studies identify separate properties that make up aesthetics, such as complexity, ambiguity, novelty, and familiarity. [12] showed that a lower number of properties (typicality and novelty) is enough to describe product aesthetics. Later, Lavie and Tractinsky [13] developed a scale with two distinct categories named expressive and classical aesthetics: "Whereas classical aesthetics refers to design characteristics such as orderliness, clarity, and symmetry, design based on expressive aesthetics are rather original, creative, fascinating, and refined" [14]. [15] developed a tool called AttrakDiff 2 to measure four distinct dimensions (hedonic stimulation, hedonic identity, pragmatic quality and attractiveness) and showed that while hedonic quality and pragmatic quality are independent, attractiveness is determined by both factors.

III. THE EXPERIMENT

The goal of the experiment was to compare the helpdesk application currently used by Customer Service agents (the Current System) with the prototype of the system designed on the basis of feedback on the current system (the Prototype). We prepared a set of hypotheses we wanted to confirm or reject (Table II), so that we could estimate whether the redesigned system really achieved the goal of reduced cognitive effort and better overall UX.

We conducted the experiment on ten individuals that form a representative sample. 50% of the participants were female; 50% have a high school education and the other half a university or Master's degree. Areas of education also differ, with participants ranging from general (gymnasium) through social (economics) to natural sciences (computer science, mathematics). Half of the participants have been actively using the Current System for more than 2 years, while all participants use it at least occasionally. Although we used a small sample (10 people), this is actually the total population actively using the system. Furthermore, the experiment was designed to focus on experienced users. The reasoning behind this is that the system was designed for knowledgeable users and as such must facilitate quick information retrieval, even at the expense of a steep learning curve for new users.

We conducted the experiment in a laboratory setting with a desktop personal computer in a dual-monitor setup. Participants typically use this setup in their work environment. The workplace was also equipped with headphones (regularly used for telephone communication with customers) and a web camera to capture the participants' faces. The session was filmed (combining video of the participant, the desktop of the computer, and the audio communication with the participant).

We gathered all the participants at the beginning of the experiment. We explained the first phase of the experiment without revealing its goal and dual-phase nature. Customer Care agents are accustomed to continuous education that covers new features of the system, but they perceived this experiment as an exam of their capabilities. We explained the nature of user experience testing, namely that we are measuring their system and not them individually. We also explained that we are not interested in the performance and metrics of particular participants, but in the statistical significance observed over the whole population. We explained what data would be gathered and how it would be processed, and defined the rules of communication (how to declare that a task is finished, that they should be role-playing, and similar).

When the experimenter started conducting the first phase of the experiment, he gave each individual a quick recap of what we had already explained. Each participant was given 5 minutes to set up the computer (install browser add-ons, open all the necessary windows, enter all the passwords so that they are remembered by the application, etc.). After this, participants were isolated in a room and communication was conducted only via phone.

We started the experiment by making a phone call to the participant. Participants used "soft phones" (a desktop application that they usually use to communicate with customers – 3CX Phone). We then began filming the experiment. The experimenter first administered a General Survey, followed by a Self-Assessment Manikin (SAM) survey. The experimenter encouraged the participant to select the images from the SAM questionnaire that they identified with.

The experimenter then informed the participant that they would begin the tasks. The tasks for phase 1 of the experiment are listed in Table I. Each task was a role-play between the experimenter and the participant. This means that the experimenter acted as a customer and requested

TABLE II. LIST OF HYPOTHESES TESTED DURING THE EXPERIMENT
H1: Prototype usage improved Customer Care agents' success rate.
H2: Prototype usage increased Customer Care agents' time spent on tasks.
H3: Prototype usage increased Customer Care agents' number of mouse clicks.
H4: Prototype usage increased Customer Care agents' number of mouse movements.
H5: Prototype usage reduced Customer Care agents' error rate.
H6: Prototype usage improves Customer Care agents' affective experience.
H7: Prototype usage improves the "subjective perception" of user experience.
All agents made fewer mouse movements while solving the tasks (the test for H4: Prototype usage increased Customer Care agents' number of mouse movements). The average number of mouse movements agents made while solving the nine tasks was M = 191,577.62, SD = 56,111.84 in the first measurement and M = 109,834.27, SD = 31,919.40 in the second measurement. When using the Prototype, agents made 81,743.35 (SE = 18,898.60) fewer mouse moves on the tasks. With a paired-samples t-test we determined that there was a statistically significant decrease in mouse movements when subjects used the Prototype compared to the Current System, t(9) = -4.33, p = .002, d = 1.37. There was a statistically significant difference between the means (p < .025); therefore, we can reject the null hypothesis and accept the alternative hypothesis H4.

Six agents decreased their error rate in the second measurement of the test for H5 (Prototype usage reduced Customer Care agents' error rate). The average error rate in the first measurement was M = 1.50, SD = 1.50. In the second measurement the average error rate was lower, M = 1.10, SD = 1.45. To investigate whether the difference in error rate was statistically significant, we used a nonparametric test (the Wilcoxon signed-rank test) and assumed that the distribution of error rates was symmetrical. The Wilcoxon signed-rank test determined that there was a statistically insignificant decrease in error rate when subjects used the Prototype compared to the Current System, p > .025. Due to the statistically insignificant change in error rate, we can neither confirm nor reject the null hypothesis for H5.

We tested hypothesis H6 (Prototype usage improves Customer Care agents' affective experience) with the Self-Assessment Manikin. With this tool, affective experience is measured along three dimensions: arousal, dominance and pleasure. The average level of all three dimensions slightly rose. In the first measurement the average level of arousal was 2.33, dominance 3.33 and pleasure 3.67. In the second measurement the average arousal level was 2.60, dominance 2.60, and pleasure 4.00. All changes were positive but statistically insignificant, which means that we can neither confirm nor reject the null hypothesis for H6.

We investigated four aspects of the subjective perception of the quality of user experience while testing H7 (Prototype usage improves the "subjective perception" of user experience): pragmatic quality (PQ), hedonic quality (HQ-I, HQ-S) and attractiveness (ATT). The average levels of all four dimensions were higher when using the Prototype.

The average PQ level rose from M = 29.30, SD = 1.95 to M = 34.10, SD = 4.36. The increase in the average level of pragmatic quality, 4.80 (SE = 1.79), was statistically significant, t(9) = 2.68, p = .024, d = 0.85.

The average HQ-I level rose from M = 29.00, SD = 4.19 to M = 35.80, SD = 5.43. The increase in the average level of HQ-I, 6.80 (SE = 2.72), was statistically insignificant, t(9) = 2.50, p > .025, but close to the borderline (p = .034).

The average HQ-S level rose from M = 27.40, SD = 1.73 to M = 35.40, SD = 1.41. The increase in the average level of HQ-S, 8.00 (SE = 2.34), was statistically significant, t(9) = 3.42, p = .008, d = 1.08.

The average ATT level rose from M = 30.70, SD = 1.27 to M = 36.70, SD = 1.73. The increase in the average level of ATT, 6.00 (SE = 2.66), was statistically insignificant, t(9) = 2.25, p > .025.

The changes on the attractiveness (ATT) and hedonic quality (identity) (HQ-I) scales were statistically insignificant. The changes on hedonic quality (stimulation) (HQ-S) and pragmatic quality (PQ) showed a significant improvement between the two phases, which means that we can accept hypothesis H7.

V. DISCUSSION AND FURTHER WORK

The comparison between the two phases showed an overall improvement in all parameters, both for individual subjects and as an average over the whole group. Some of the increases are statistically insignificant due to the small population sample. We measured usability along all three dimensions: efficiency, effectiveness, and satisfaction. All measures showed either a statistically significant improvement or an insignificant change. The average SUS score rose 15 points between the phases, from 60 to 75, and the rise is statistically significant. The score is above the industry average [7, 17]. We believe that the participants' increasing confidence throughout the study is also one of the reasons for the better overall SUS score achieved with the Prototype. In total, agents used 22% less time on the tasks, made 61% fewer mouse clicks and moved the mouse 42% less than when working with the Current System. We believe that this finding clearly shows that we made correct design decisions based on the analysis of the first phase of the experiment. This result is also the greatest success of the study. The central challenge of this work was to design a system that would reduce the cognitive effort required of the Customer Care agents, thus freeing them up to focus on the customers' issues. As the measure of cognitive effort is by definition linked to user interactions and time on task [7], we can rely on efficiency measures to provide an estimate of cognitive effort. These results clearly show that the features of the Prototype are exactly the features that played a major role in decreasing the cognitive effort.

The Self-Assessment Manikin tests showed insignificant changes in the agents' mood. Even though the tests were performed before and after all the tasks in both phases, the emotional state on all three dimensions remained similar. We believe that this is due to the small sample size. When analysing the AttrakDiff 2 scale, we were interested in the pragmatic quality of the product, hedonic stimulation and the attractiveness factors. In our case, we see a statistically significant rise in the average values when comparing the Current System to the Prototype. This finding is in line with the effectiveness and efficiency usability measures. Thus, we can conclude that agents found the Prototype easier to use than the Current System. We found a statistically insignificant increase in the attractiveness dimension between the two phases. We also found that pragmatic and hedonic qualities have an impact on attractiveness, which is in line with other studies such as [18].

We plan to further develop the Current System and upgrade it with the tools that were implemented in the Prototype. We also plan to conduct additional experiments when we complete these features, to determine whether the changes have the same effect as in this study. We also plan to implement features that were requested by agents while testing the Prototype (additional options for the dig tool, options to hide expired services, etc.). The next step in
developing this product is the implementation of some kind of natural-language-processing, text-based interface. This would allow the agent to search for information and conduct certain tasks by writing commands in natural language.

REFERENCES
[1] M. Hassenzahl and N. Tractinsky, "User experience – a research agenda," Behaviour & Information Technology, vol. 25, no. 2, pp. 91–97, 2006.
[2] J. Forlizzi and K. Battarbee, Eds., Understanding Experience in Interactive Systems. ACM, 2004.
[3] J. A. Bargas-Avila and K. Hornbæk, Eds., Old Wine in New Bottles or Novel Challenges: A Critical Analysis of Empirical Studies of User Experience. ACM, 2011.
[4] Ergonomic requirements for office work with visual display terminals (VDTs) – Part 11: Guidance on usability, ISO DIS 9241-11, 1998.
[5] Ergonomics of human-system interaction – Part 210: Human-centred design for interactive systems, ISO DIS 9241-210, 2010.
[6] Systems and software engineering – Software product Quality Requirements and Evaluation (SQuaRE) – Software product quality and system quality in use models, ISO/IEC FCD 25010, 2010.
[7] T. Tullis and B. Albert, Measuring the User Experience: Collecting, Analyzing and Presenting Usability Metrics. Amsterdam, Boston: Morgan Kaufmann, 2008.
[8] K. Hornbæk, "Current practice in measuring usability: Challenges to usability studies and research," International Journal of Human-Computer Studies, vol. 64, no. 2, pp. 79–102, 2006.
[9] A. Mehrabian, "A semantic space for nonverbal behavior," Journal of Consulting and Clinical Psychology, vol. 35, no. 2, p. 248, 1970.
[10] P. J. Lang, "Behavioral treatment and bio-behavioral assessment: Computer applications," 1980.
[11] N. Tractinsky, A. S. Katz, and D. Ikar, "What is beautiful is usable," Interacting with Computers, vol. 13, no. 2, pp. 127–145, 2000.
[12] P. Hekkert, D. Snelders, and P. C. W. Wieringen, "'Most advanced, yet acceptable': Typicality and novelty as joint predictors of aesthetic preference in industrial design," British Journal of Psychology, vol. 94, no. 1, pp. 111–124, 2003.
[13] T. Lavie and N. Tractinsky, "Assessing dimensions of perceived visual aesthetics of web sites," International Journal of Human-Computer Studies, vol. 60, no. 3, pp. 269–298, 2004.
[14] A. Sonderegger, J. Sauer, and J. Eichenberger, "Expressive and classical aesthetics: two distinct concepts with highly similar effect patterns in user–artefact interaction," Behaviour & Information Technology, vol. 33, no. 11, pp. 1180–1191, 2014.
[15] M. Hassenzahl, M. Burmester, and F. Koller, "AttrakDiff: Ein Fragebogen zur Messung wahrgenommener hedonischer und pragmatischer Qualität," in Mensch & Computer 2003. Springer, 2003, pp. 187–196.
[16] M. Hassenzahl and N. Sandweg, Eds., From Mental Effort to Perceived Usability: Transforming Experiences into Summary Assessments. ACM, 2004.
[17] A. Bangor, P. Kortum, and J. Miller, "Determining what individual SUS scores mean: Adding an adjective rating scale," Journal of Usability Studies, vol. 4, no. 3, pp. 114–123, 2009.
[18] M. Schrepp, T. Held, and B. Laugwitz, "The influence of hedonic quality on the attractiveness of user interfaces of business management software," Interacting with Computers, vol. 18, no. 5, pp. 1055–1069, 2006.
the question of whether the delays in the case of single-track train traffic are significantly greater than those of double-track traffic.

III. SIMULATION SOFTWARE OPENTRACK

The observed problem is solved by simulation. The simulation models are developed using the software package OpenTrack. OpenTrack is dynamic rail simulation software for continuous simulations. As such, it simulates the behaviour of all railway elements (infrastructure network, rolling stock, and timetable), as well as all the processes between them. It can easily be used for many different types of projects, including testing the stability of a new timetable, evaluating the benefits of different long-term infrastructure improvement programs, and analysing the impacts of different rolling stock.

A. Input data
OpenTrack manages input data in three modules: rolling stock (trains), infrastructure and timetable. The infrastructure data comprise information about tracks and operation control points, as well as all related information. Users enter input information into these modules and OpenTrack stores it in a database structure. Once the data have been entered into the program, they can be used in many different simulation projects.

In this paper the main infrastructure data are: the length of the block sections, the number of tracks in the stations and halts, and the locations of the signals on the network.

B. OpenTrack simulation process
During the simulation, OpenTrack calculates train movements under the constraints of the signalling system and the timetable. OpenTrack uses a mixed continuous/discrete simulation process that allows time-driven running of all the continuous and discrete processes (of both the vehicles and the safety systems) under the conditions of the integrated dispatching rules. Therefore, parameters including occupied track sections, signal switching times, and restrictive signal states all influence train performance.

OpenTrack is a dynamic rail simulation program. A simple way of describing dynamic rail simulation is that the program decides what routes the trains use while the program is running. Dynamic simulation enables users to run OpenTrack in a step-by-step process and monitor the results of each step in real time, so that users can identify problems and develop new, alternative solutions.

C. Output data
After a simulation run, OpenTrack offers a number of evaluations, and the software can analyse and display the resulting data in the form of diagrams, train graphs, occupation diagrams and statistics. OpenTrack allows users to present output data in many different formats, including various forms of graphs (e.g. time-space diagrams), tables, and images. Similarly, users can choose to model the entire network or only selected parts, depending on their needs.

IV. SIMULATION MODEL OF SUBURBAN PASSENGER TRAIN SERVICES

For the given problem, two models of a section of the railway line Batajnica - Ovča, from Vukov Spomenik to Ovča, were created.

The first model, of the single-track line, was created using current data with some approximations. For this model we defined routes through the station Krnjača so that all BG trains which operate from Batajnica to Ovča must pass over the first station track, while in the opposite direction BG trains must pass through the second station track.

The first model was tested in two different situations, which differ in only a single element. In the first variant of this model (V1) the real train speed from the field was used; in that case the train speed across Pančevački most was 20 km/h. The second variant of this model (V1a) was tested with the same elements as the previous one, with only one difference: the train speed across Pančevački most was increased to 50 km/h. The remaining data about train speeds on the sections, track lengths and the locations of signals, stations and halts are the same. The first model of the single-track line is shown in Figure 1.

Figure 1. Part of the first single-track model Batajnica - Ovča

The second model, of the double-track line, was created according to the reconstructed situation in the field (Figure 2). In this model, station Krnjača has two additional tracks, and station Ovča has three additional tracks.

In both models the part from Batajnica to Pančevački most has remained unchanged, according to the current situation, and because of that we only use the line part from Vukov Spomenik to Ovča in the simulation process. We also use the current timetable for the year 2016/2017 for both models.
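The two single-track variants differ only in the permitted speed across Pančevački most (20 km/h in V1 versus 50 km/h in V1a), and OpenTrack's minimum running time over a block is essentially the block length divided by the maximum block speed. A small sketch of that calculation (the section length used here is an invented illustrative value, not taken from the model):

```python
def block_running_time_s(length_m: float, speed_kmh: float) -> float:
    """Minimum running time over a block: length divided by maximum block speed."""
    return length_m / (speed_kmh / 3.6)  # km/h -> m/s

# Hypothetical 1.5 km block across Pančevački most.
LENGTH_M = 1500.0
t_v1 = block_running_time_s(LENGTH_M, 20.0)   # variant V1: 20 km/h
t_v1a = block_running_time_s(LENGTH_M, 50.0)  # variant V1a: 50 km/h
saving_s = t_v1 - t_v1a                        # time gained per crossing in V1a
```

Even this toy calculation shows why the bridge speed restriction matters: per crossing, the higher speed saves a few minutes that would otherwise propagate as delay on a single-track section.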
Figure 2. Part of the second model Batajnica - Ovča with the reconstructed double track

The total number of stations and halts in the first model is 6: Vukov Spomenik, Pančevački most, Krnjača most, Krnjača, Sebeš and Ovča. In model two the total number of
stations and halts are 5 because in earlier situation halt as route occupation, where conflicts can arise, what their
Krnjača most does not exist. Some dead end tracks were cause is and how to manage them.
not included in the model, and train speeds were used Route occupations and conflicts in period from 7 to 8
from current timetable. am for second model are shown on figure 4.
Sections with restricted speed were included in the
simulations, because some of them exist for a number of
years.
The rolling stock data concerns physical characteristics
of trains that are part of the traffic system under
consideration. The timetable data provides information
about planned train routes and their associated time
schedules. For both models, the 2016/2017 timetable was used; all BG trains, local trains for Vršac, and freight trains were included so that the models could be tested under more realistic conditions [6]. Defining different types of trains requires data about train engines. Both models use three types of engines: 711 for local trains for Vršac, 661 for freight trains, and 412 for BG trains.

Figure 4. Route occupations and conflicts for model two
V. SIMULATION RESULTS AND DISCUSSION

A. Simulation parameters and outputs

There are a number of output files which OpenTrack generates during a simulation. Some files contain global information about the whole simulation, while others contain information about a particular train, station, or track section. In this case, the purpose of the models was to answer the question: can BG trains operate without delays in both situations? For this comparison, three criteria were used: average delay per train, the number of trains with late arrivals and departures at stations and halts, and their total delays. All information was obtained from the files OT_Messages.txt and OT_Delay.delavg.

OT_Messages.txt is an output file which includes all warnings and normal messages that appear during the simulation. The warnings include late arrivals, departures, and stops at stations and signals for each train, together with the duration of these conflicts. From this file we can also see which course changed priority. The average train delays for all trains in a model can be read from OT_Delay.delavg.

During the simulation, OpenTrack also offers a train graph in real time, so we can easily compare the realized timetable with the planned one. The train graph for the railway section Vukov Spomenik - Ovča is shown in Figure 3.

B. Simulation results

Average delay per train was chosen as a criterion mainly because the main goal of this simulation was to determine the justification for the construction of the double track on the given route. Since the same timetable was used for all simulation models, by comparing the data for the same trains it can be concluded how much the delays vary from one variant to the other.

In regular operating conditions, when all trains run on time, stops in front of signals are not planned. When such events occur, they unnecessarily prolong travelling times and decrease traffic flow. Therefore, certain station buffer times are always planned and included in the timetable so that trains are not stopped or slowed down in front of the entrance signal.

During the simulation, OpenTrack calculates running times by dividing the block length by the maximum block speed, which results in shorter running times than those included in the timetable. There were unscheduled stops in front of signals in all models, but with an obvious difference: the number of stops in the first model was 108 for V1 and 30 for V1a, while in the second model there were only 10 stops.

The main results obtained by these simulation models refer to unscheduled stops caused by the traffic situation. Special attention was given to BG train delays on the Pančevački most - Ovča railway line. By analysing the obtained results for both models under the current 2016/2017 timetable, the justification of the construction of the double track was examined. If the number of freight and local passenger trains which run at the same time as BG trains increases, then the construction of the double track would be justified.

Part of the file OT_Delay.delavg, edited in Excel, is shown in Figure 5.
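The "average delay per train" criterion described above can be sketched in a few lines of Python. The record layout below is a hypothetical example, since the actual format of OT_Delay.delavg is not shown in the paper:

```python
# Illustrative sketch: computing average delay per train from simulation
# output records. The (train_id, delay_in_seconds) record layout below is
# a hypothetical stand-in for the real OT_Delay.delavg contents.

def average_delay_per_train(records):
    """records: iterable of (train_id, delay_in_seconds) tuples."""
    totals = {}
    counts = {}
    for train_id, delay in records:
        totals[train_id] = totals.get(train_id, 0.0) + delay
        counts[train_id] = counts.get(train_id, 0) + 1
    # Average delay for each train over all of its measured events
    return {t: totals[t] / counts[t] for t in totals}

# Hypothetical measurements for three trains
sample = [("BG-7101", 120), ("BG-7101", 60), ("BG-7103", 0), ("BG-7105", 300)]
print(average_delay_per_train(sample))
```

Comparing such per-train averages between the single-track and double-track variants is exactly the comparison the criterion enables.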
Abstract— In this paper, we present a prototype version of a new software solution that allows its users to simultaneously send the same message and set the same status on multiple social networks. We describe the microservice architecture of our solution, where communication with social networks through their Application Programming Interfaces (APIs) is handled by a set of dedicated services, one for each supported social network. Services in our microservice architecture may be used by one or more users, who send requests to the services in order to use their capabilities. In addition to the services for different social networks, there is also a service orchestrator whose task is to direct user requests to the appropriate service. At present, our software solution supports the following social network sites: VKontakte, Twitter, and LinkedIn. We also give an overview of a new textual domain specific language (DSL). This DSL is used in our software solution to specify commands for sending messages and posting statuses on social networks.

I. INTRODUCTION

In recent years, we have witnessed the growing popularity of social networks. A social network is a structure consisting of nodes (individuals or organizations), as well as links between these nodes [1]. In this paper, we use the term social network also to denote a social web site that allows its users to communicate with each other, express their opinions, and post media content. The primary purpose of social networks is communication between people, but social networks may also be used in marketing to promote web sites and various kinds of services (e.g., selling) [2], [3]. Because of their increasing influence on daily lives and their large number of users, social networks are becoming more interesting not just to business users, but to researchers as well.

These days most individuals and companies, small and large alike, use social networks to present themselves or their brand, because social networks provide them with a number of potential customers much higher than they would have otherwise [4], [5]. Individuals may use social networks to promote the services that they offer (e.g., beauty care services) and express their opinions in order to get more customers or followers, all of that being repeated for each social network where these individuals have accounts. Companies use social networks to promote themselves and their services and to get new employees and customers. Some companies have dedicated employees or even whole departments that are responsible for the company's image on social networks. These employees and departments (e.g., human resource managers/departments) are tasked with sending job offers on behalf of the company, posting advertisements, and promoting the company on all social networks where the company has an account. Posting the same status or sending the same message on each social network separately is redundant, often tedious work.

Our goal is to provide users of social networks with a single interface that would offer common social network features for all social networks where the users have accounts, such as sending messages, posting statuses, and changing personal information. To achieve this goal, we developed a system that supports working with multiple social networks. We designed the architecture of our system with the idea of having small services instead of a single monolithic system and, consequently, opted for a microservice architecture. This kind of architecture is more conducive to application development and upgrading, as individual services may be independently developed, configured, and tested. A set of dedicated services is in charge of communication with the social networks, one for each supported social network. Besides the services for social networks, we also created a service that redirects client requests to the appropriate service for a social network. We also created a textual domain specific language (DSL) that supports commands for sending messages and posting statuses.

In this paper, we present an application, a prototype version of a new software solution that provides the aforementioned capabilities for the following three social network sites: VKontakte1, Twitter2, and LinkedIn3. We also give a brief overview of the microservice architecture of our solution, where communication with social networks through their Application Programming Interfaces (APIs) is handled by a set of dedicated services, one for each supported social network. In addition to the services for different social networks, there is also a service orchestrator whose task is to direct user requests to the appropriate service for a particular social network. The created textual DSL is also presented in this paper.

The rest of the paper has the following structure. Related work is presented in Section II. In Section III, we describe the selected social networks, their APIs, and the data taken from the social networks. We present the architecture of our solution in Section IV. Section V contains a description of the textual DSL. An overview of the implemented software solution is given in Section VI.

1 More about VKontakte at: https://vk.com/
2 More about Twitter at: https://twitter.com
3 More about LinkedIn at: https://www.linkedin.com/
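The orchestrator's role, directing each user request to the dedicated service for the target social network, can be sketched as follows. The handler functions and the request shape are illustrative stand-ins, not the actual WCF service interfaces:

```python
# Minimal sketch of a service orchestrator: it performs no social-network
# work itself; it only routes each request to the dedicated per-network
# service. The handlers below stand in for the real VKontakte/Twitter/
# LinkedIn services of the paper's architecture.

def vk_service(action, payload):
    return f"VKontakte: {action} '{payload}'"

def twitter_service(action, payload):
    return f"Twitter: {action} '{payload}'"

def linkedin_service(action, payload):
    return f"LinkedIn: {action} '{payload}'"

SERVICES = {
    "vkontakte": vk_service,
    "twitter": twitter_service,
    "linkedin": linkedin_service,
}

def orchestrate(networks, action, payload):
    """Dispatch one request to every requested network's service."""
    results = []
    for name in networks:
        service = SERVICES.get(name.lower())
        if service is None:
            raise ValueError(f"unsupported social network: {name}")
        results.append(service(action, payload))
    return results

print(orchestrate(["twitter", "linkedin"], "status", "Hello!"))
```

The key design point is that adding a fourth network only requires registering one more service; the orchestrator and the other services are untouched.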
user being followed does not need to follow back. Because of this relationship, some authors consider Twitter not a social network but a social medium [17]. In this paper, we consider Twitter a social network.

Like VKontakte, Twitter also provides an API that allows users to programmatically read and write data. In order to use all the capabilities of the Twitter API [18], an application has to be registered at the Twitter site for developers. As is the case with the VKontakte API methods, almost all Twitter API methods require authentication. Twitter uses the OAuth 2.0 public protocol for authentication. By using OAuth, an access token is obtained. With respect to the way data are accessed, there are two types of APIs on Twitter: the REST (Representational State Transfer) API and the Streaming API. The result of both REST and Streaming API methods is an object in JSON format. As with VKontakte, we did not use the official API, i.e., the Twitter API, directly, but relied on a .NET library named TweetSharp [19] instead.

C. LinkedIn

LinkedIn is a business social network for people looking for work and for employers looking for new workers. LinkedIn allows its users to post CVs, share content, and join groups and sites.

As is the case with VKontakte and Twitter, LinkedIn provides an API that allows users to read and write data. In order to use these capabilities of the LinkedIn API [20], the application has to be registered at the LinkedIn site for developers. Also, almost all LinkedIn API methods require authentication. LinkedIn uses the OAuth 2.0 public protocol for authentication. There are two types of APIs: the REST API and the JavaScript API. Methods of the REST API return a response in the JSON format. In addition to the JSON format, the response may also be in the XML format, but it is necessary to request it explicitly.

LinkedIn has significantly limited the possibilities of its API since 2015 [21]. An extremely small number of functionalities have remained available, while additional functionalities may not be used unless special permissions are obtained first. In order to work with the LinkedIn API, we used a .NET library named LinkedInNet [22].

IV. SYSTEM ARCHITECTURE

In this section, we give a brief overview of the architecture of our solution. The main idea behind the architecture of our system was to divide the system into smaller parts - services. Communication with the social networks should be handled by a set of dedicated services, one for each social network. Each service should have its own database and should be independently developed, configured, and tested. Services may be used by one or more client applications, which in turn could be web or desktop applications. This method of building a system is known as the microservice7 architecture [23]. Services communicate over the network by using standard transport protocols, such as REST, AMQP (Advanced Message Queuing Protocol), and JMS (Java Message Service).

The approach of building applications as sets of smaller parts is not new. A similar philosophy may be found in Unix: "Write programs that do one thing and do it well. Write programs to work together."8 The microservice architecture approach is also very similar to the one named SOA (Service-Oriented Architecture) [24]. SOA is an architectural style that uses services in order to model the information contained in the system. SOA also promotes the use of loosely coupled services to ensure business flexibility. Both approaches are service-oriented, but SOA is a much broader concept, and for this reason microservices may be seen as a specialization of SOA. We opted for this approach because of its reusability, interoperability, flexibility, and cost effectiveness [24].

One of the contemporary challenges is to provide support for application upgrading in order to best respond to the demands and needs of application users - individuals or companies. By upgrading we mean adding support for other social networks besides those already supported, or providing some new functionality within the application. Upgrading an existing service or adding a new service should not affect the whole application, but only the service where the change is made. This is in sharp contrast to the monolithic architecture, in which any change to the system requires rebuilding and redeploying the whole application, not only the affected part. In the case of communication over multiple social networks, if changes are needed for one social network API (e.g., to change the library which works with it), we do not need to change the whole application, but only the service which uses that specific API. However, this approach also has some disadvantages, such as a large number of components and an increase in network traffic [26].

A. Microservice Architecture

For the implementation of our microservice architecture we used Windows Communication Foundation (WCF) [27], [28] within Visual Studio Enterprise Edition 2015. The basic components of a WCF application are the following three: the WCF service, the WCF service host, and the WCF service client.

Our system architecture (shown in Fig. 1) includes five services:
- three social network services (VKontakte, Twitter, and LinkedIn),
- one logging service, and
- one service orchestrator.

Communication with the services is done through the Hyper Text Transfer Protocol (HTTP), while the Simple Object Access Protocol (SOAP) is used for exchanging messages between the services. The client application is implemented as an ASP.NET web application. All the services, except the service orchestrator, have their own database. Communication with the database is done

7 The term "microservice" was discussed at a workshop in 2011 as an architectural style and adopted in 2012 [25].
8 Doug McIlroy on the Unix philosophy [33].
each of the selected social networks or, to be more precise, should be available through the official API. In order to send uniform messages, we have recognized the following concepts: first name, last name, birthday, and city. By using the new textual DSL, the user may specify the message "Hello {FirstName} {LastName}", where FirstName and LastName are the concepts that we recognized. The names of the concepts correspond to the fields of the class Person, which encapsulates information about friends from social networks. It is necessary to put each concept between curly brackets.

Moreover, our textual DSL supports detailed text commands for sending messages and posting statuses. In order to post a status, the user enters a command in which:
- typeOfCommand represents the command type ("status" for status posting and "msg" for message sending),
- socialNetworks represents a comma-separated list of social network names, and
- text represents the text of the desired status.

The command for sending messages is very similar, but it also includes the names of the friends to whom the message should be sent. If the user wants to notify all friends, then instead of their names the character "*" should be entered. For users who are not very familiar with textual commands, we have made these commands available through a graphical user interface. Through this graphical interface, users only enter the text of their message in the aforementioned format (more information about the usage of our solution is presented in Section VI). After the user chooses the friends from the list of all friends and enters the desired message, the text of the message is generated in the background, and the aforementioned concepts (e.g., FirstName, LastName, etc.) are replaced by the names of the recipients or by other attributes, such as city or birthday. Text generation is also done for the non-graphical command for message sending. To generate the text, we used a .NET library named SmartFormat.NET [32].

VI. AN OVERVIEW OF THE SOFTWARE SOLUTION

This section contains a short overview of the implemented software solution. The application is designed to support displaying friends from all social networks (VKontakte, Twitter, and LinkedIn), sending a message to selected friends, posting a status to selected social networks, getting information about friends, and changing personal information. In order to best respond to the demands of the modern business environment, the client application is implemented as a web application, using ASP.NET web technology.

To be able to use the main functionalities, it is necessary to log in to the system. Application credentials, such as the username and password, do not need to have any relation with the user's credentials on a particular social network.

Fig. 3 shows the graphical interface for sending messages. The text is entered in the format described in Section V. Alongside the text area, there is a list of concepts that may be used, such as FirstName and LastName.

Figure 3. The graphical interface for sending messages

Fig. 4 shows the graphical interface for posting a status. The user enters the text of the status and selects the social networks on which the status should be published.

Figure 4. The graphical interface for posting status

VII. CONCLUSION

The application that is the subject of this paper was developed for the needs of companies and individuals interested in making better use of the opportunities that social networks provide. The presented application is a prototype version of a software solution that allows its users to select friends from all social networks where the users have accounts and send them the same message. In order to post a status, users have to enter the text of the status and select the social networks on which they want to post it. The implemented application supports the following social network sites: VKontakte, Twitter, and LinkedIn. In this paper, we also give a short description of the microservice architecture used, where communication with social networks through their APIs is handled by a set of dedicated services, one for each supported social network. The system architecture, besides the services that work with social networks, includes two more services: a logging service and a service orchestrator.
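The concept substitution described in Sections V and VI can be sketched in Python. In the real system this is done by SmartFormat.NET over the C# class Person; the friend data below is made up for illustration:

```python
import re

# Sketch of message-template expansion: concepts in curly brackets
# ({FirstName}, {LastName}, ...) are replaced per recipient. The real
# implementation uses SmartFormat.NET; the sample friends are invented.

def expand(template, person):
    """Replace each {Concept} with the matching field of `person`."""
    def sub(match):
        concept = match.group(1)
        if concept not in person:
            raise KeyError(f"unknown concept: {concept}")
        return str(person[concept])
    return re.sub(r"\{(\w+)\}", sub, template)

friends = [
    {"FirstName": "Ana", "LastName": "Simić", "City": "Novi Sad"},
    {"FirstName": "Ivan", "LastName": "Ilić", "City": "Belgrade"},
]

for friend in friends:
    print(expand("Hello {FirstName} {LastName}", friend))
```

One template is written once and expanded per recipient, which is precisely what makes sending the same message to many friends on many networks non-tedious.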
Abstract—This paper presents a comparison of processing times for different implementations of the singular value decomposition (SVD) of matrices, performed on multicore central processing units (CPUs) and manycore graphics processing units (GPUs). The aim of the paper is to identify which of the considered computing platforms and programming environments leads to the fastest computation of the SVD. The results show that the MATLAB implementation, processed in parallel on the multicore CPU, outperforms all other considered approaches for computing the SVD. It can also be concluded that the GPU implementation of the SVD, which is part of the Nvidia CUDA cuSolver/cuBLAS library, does not efficiently use the computational resources available on GPUs. Therefore, it would be beneficial to develop a custom GPU implementation for computing the SVD, based on some of the more advanced algorithms for this purpose, which would better exploit the advantageous features of the GPU architecture. Plans for future work include the development of this implementation and its use for efficient latent semantic analysis of large term-document matrices.

I. INTRODUCTION

The singular value decomposition (SVD) factorizes a real or complex rectangular matrix into a product of three matrices – two orthogonal and one diagonal [9, 20, 22]. This method is widely used for problems in areas as diverse as linear algebra (e.g., solving homogeneous linear equations, low-rank matrix approximation), statistics (principal component analysis - PCA), natural language processing (latent semantic analysis - LSA), information retrieval (latent semantic indexing - LSI), signal processing (the Karhunen–Loève transform), and machine learning (recommender systems) [9, 13, 17, 18, 22]. The time required for computing the SVD of a matrix is a limiting factor in many of these practical applications, since the size of a typical problem instance that needs to be handled is currently of the order of gigabytes. Therefore, efficient computation of the SVD is of critical importance for its use in science and engineering.

The two main contemporary computing platforms available for performing numerical algorithms, such as computing the SVD of a matrix, are central processing units (CPUs) and graphics processing units (GPUs). CPUs are still built using the single instruction, single data (SISD) architecture, although they now feature multiple cores [18]. GPUs, on the other hand, have a single instruction, multiple data (SIMD) manycore and highly-parallel architecture [1, 3, 15]. Their computational resources became accessible for performing non-graphics general-purpose algorithms (a technique known as GPGPU – general-purpose computing on the GPU) only in the last decade. GPGPU has been a topic of very fast-growing research interest, with researchers showing significant speed-ups of different programs after porting them to GPUs [3, 5, 6, 7, 15]. Further, different programming approaches, such as using C/C++ (or CUDA C on the GPU) combined with BLAS/LAPACK (Basic Linear Algebra Subprograms/Linear Algebra PACKage) routines, or performing computations in MATLAB, can be used both on CPUs and GPUs [7, 8, 11, 12, 15, 16]. Therefore, it is compelling to perform a comparison of different implementations of the SVD processed on CPUs and GPUs, in order to establish the most efficient method for its computation. These results can then be used for developing new implementations of the SVD with improved performance. This can, in turn, help to extend the set of problem instances that can be efficiently handled in practice.

In this paper, we present a performance comparison of computing the SVD of matrices using different programming frameworks for both CPUs and GPUs. The main goal of the presented research is to identify the most suitable environment, out of the considered computational platforms and programming approaches, for performing the SVD. This evaluation is performed in terms of processing times for different implementations on CPUs and GPUs. Further, the research offers answers to the question of how efficiently different implementations of the same algorithm use the computational resources available on distinct architectures, such as those of central and graphics processing units.

The specific problem that motivated the research presented in this paper is the performance of latent semantic analysis of large term-document matrices, which mainly relies on fast computation of the SVD [22, 23]. Therefore, the presented research sheds light on the steps required to develop a high-performance GPU implementation of LSA.

The paper is organized as follows. In Section II we present the SVD and its mathematical foundations, as well as a short introduction to the use of the SVD for
latent semantic analysis. Implementations of the SVD on CPUs and GPUs are described in Section III. Experimental settings and results are presented in Section IV. The final section of the paper offers the main conclusions and some directions for further work.

II. THEORETICAL BACKGROUND

The singular value decomposition (SVD) is often used in situations where techniques such as Gaussian elimination or LU decomposition do not offer satisfactory answers [9, 20]. The SVD decomposes a real or complex rectangular matrix into a product of three matrices - two orthogonal and one diagonal. For a matrix A of size M × N, the SVD is the factorization of the following form:

A = UΣV^T, (1)

where U is a column-orthogonal unitary matrix of size M × M, Σ is a diagonal matrix of size M × N with elements s_ij = 0 if i ≠ j and s_ii ≥ 0 in descending order along the diagonal, and V^T is an N × N unitary matrix [9, 19, 20]. The elements on the diagonal of Σ are the singular values of A.

Therefore, using the SVD, a matrix A of size M × N is factorized as

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{M1} & a_{M2} & \cdots & a_{MN} \end{pmatrix} = U\Sigma V^T = \begin{pmatrix} u_{11} & u_{12} & \cdots & u_{1M} \\ u_{21} & u_{22} & \cdots & u_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ u_{M1} & u_{M2} & \cdots & u_{MM} \end{pmatrix} \begin{pmatrix} \sigma_{11} & 0 & \cdots & 0 \\ 0 & \sigma_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{MN} \end{pmatrix} \begin{pmatrix} v_{11} & v_{12} & \cdots & v_{1N} \\ v_{21} & v_{22} & \cdots & v_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ v_{N1} & v_{N2} & \cdots & v_{NN} \end{pmatrix}.$$

Depending on the values of M and N, the matrices U, Σ, and V^T can take various shapes [20].

Example 1. A given (2×3) rectangular matrix A, taken from [24],

$$A = \begin{pmatrix} 3 & 1 & 1 \\ -1 & 3 & 1 \end{pmatrix},$$

can be factorized by using the SVD as follows:

$$A = U\Sigma V^T = \begin{pmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} \end{pmatrix} \begin{pmatrix} \sqrt{12} & 0 & 0 \\ 0 & \sqrt{10} & 0 \end{pmatrix} \begin{pmatrix} \tfrac{1}{\sqrt{6}} & \tfrac{2}{\sqrt{6}} & \tfrac{1}{\sqrt{6}} \\ \tfrac{2}{\sqrt{5}} & -\tfrac{1}{\sqrt{5}} & 0 \\ \tfrac{1}{\sqrt{30}} & \tfrac{2}{\sqrt{30}} & -\tfrac{5}{\sqrt{30}} \end{pmatrix}.$$

A. The Reduced SVD

The reduced SVD (also known as latent semantic indexing - LSI in information retrieval, or latent semantic analysis - LSA in natural language processing) is a mathematical method that projects queries and documents into a space with latent semantic dimensions [22]. It is based on the assumption that there is some underlying latent semantic structure in the data that is corrupted by the wide variety of used words, and that this semantic structure can be discovered and enhanced by projecting the data (the term-document matrix and the queries) onto a lower-dimensional space using the SVD [22, 23, 24]. A term-document matrix describes the frequency of terms that occur in a collection of documents. In a term-document matrix, columns correspond to the documents in the collection and rows correspond to the terms.

For a term-document matrix A, we can perform an SVD using (1) as described in the previous section. Some of the computed singular values are very small in magnitude and, thus, negligible. Therefore, these values can be ignored and replaced by zeros. If we keep only K singular values, then Σ will contain all zeros, except for the first K entries along its diagonal. As such, we can reduce the matrix Σ into Σ_K, which is a K × K matrix containing only the K singular values that we keep. We can also reduce U and V^T into U_K and V_K^T, respectively [23]. Therefore, A can now be approximated by:

A_K = U_K Σ_K V_K^T.

The decomposition of A using the SVD and the reduced SVD, which is obtained by keeping K singular values, is presented in Fig. 1. The following example illustrates the case of computing the reduced SVD of a (5×5) square matrix when K = 3.

Example 2. A given (5×5) square matrix A, taken from [24],

$$A = \begin{pmatrix} 2 & 0 & 8 & 6 & 0 \\ 1 & 6 & 0 & 1 & 7 \\ 5 & 0 & 7 & 4 & 0 \\ 7 & 0 & 8 & 5 & 0 \\ 0 & 10 & 0 & 0 & 7 \end{pmatrix},$$

can be factorized by using the SVD in the following way
Figure 1. Decomposition of a rectangular matrix using the SVD and the reduced SVD.

$$U = \begin{pmatrix} 0.54 & 0.07 & 0.82 & 0.11 & 0.12 \\ 0.10 & 0.59 & 0.11 & 0.79 & 0.06 \\ 0.53 & 0.06 & 0.21 & 0.12 & 0.81 \\ 0.65 & 0.07 & 0.51 & 0.06 & 0.56 \\ 0.06 & 0.80 & 0.09 & 0.59 & 0.04 \end{pmatrix},$$

$$\Sigma = \begin{pmatrix} 17.92 & 0 & 0 & 0 & 0 \\ 0 & 15.17 & 0 & 0 & 0 \\ 0 & 0 & 3.56 & 0 & 0 \\ 0 & 0 & 0 & 1.98 & 0 \\ 0 & 0 & 0 & 0 & 0.35 \end{pmatrix},$$

$$V^T = \begin{pmatrix} 0.46 & 0.02 & 0.87 & 0 & 0.17 \\ 0.07 & 0.76 & 0.06 & 0.60 & 0.23 \\ 0.74 & 0.10 & 0.28 & 0.22 & 0.56 \\ 0.48 & 0.03 & 0.40 & 0.33 & 0.70 \\ 0.07 & 0.64 & 0.04 & 0.69 & 0.32 \end{pmatrix}.$$

When we set K = 3, i.e., if we consider only the first three singular values and try to reconstruct the original matrix, we get:

$$\Sigma_K = \begin{pmatrix} 17.92 & 0 & 0 \\ 0 & 15.17 & 0 \\ 0 & 0 & 3.56 \end{pmatrix},$$

$$\hat{A} = \begin{pmatrix} 2.29 & 0.66 & 9.33 & 1.25 & 3.09 \\ 1.77 & 6.76 & 0.90 & 5.50 & 2.13 \\ 4.86 & 0.96 & 8.01 & 0.38 & 0.97 \\ 6.62 & 1.23 & 9.58 & 0.24 & 0.71 \\ 1.14 & 9.19 & 0.33 & 7.19 & 3.13 \end{pmatrix}.$$

Documents are represented by row vectors in V, and document similarity is obtained by comparing rows in the matrix VΣ (note that documents are represented as row vectors, since we are working with V, not V^T). Words are represented by row vectors in U, and word similarity can be measured by computing row similarity in UΣ [22].

III. IMPLEMENTATION

On the CPU, we compute the singular value decomposition using two programming frameworks - C/C++ (with the Intel Math Kernel Library - MKL) and MATLAB. The MKL is a library of mathematical functions for Intel and other compatible CPUs, composed of BLAS and LAPACK routines, fast Fourier transform (FFT) algorithms, and a number of other mathematical operations [12]. MATLAB is an interactive numerical computing environment and programming language which can be used for tasks such as numerical computations, simulations, and data analysis and visualization [15].

On the GPU, we also use two different frameworks for performing the computation of the SVD. The first is the MATLAB Parallel Computing Toolbox and the second is Nvidia CUDA (Compute Unified Device Architecture), extended with the cuSolver programming package featuring the cuBLAS and cuSPARSE libraries [15, 16]. MATLAB's Parallel Computing Toolbox allows the use of multicore CPUs, GPUs, and clusters for parallel processing of computationally intensive algorithms in MATLAB [15]. It includes special high-level data types, parallel loops, and numerical algorithms which permit concurrent execution of programs. CUDA is a parallel programming architecture, framework, and language, developed by Nvidia, for the purposes of implementing and processing general-purpose algorithms on graphics processing units [16]. It supports a highly-parallel programming model, constructed around the high-throughput, high-latency GPU architecture.
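An SVD factorization such as the one in Example 1 can be checked by multiplying the factors back together. The short pure-Python sketch below does exactly that for the (2×3) matrix from [24]; it is only an illustration, since the paper's own computations use MKL, MATLAB, and cuSolver:

```python
from math import sqrt, isclose

# Check that A (2x3) equals U * Sigma * V^T for the Example 1 factors.

def matmul(X, Y):
    """Plain-Python matrix product of X (m x k) and Y (k x n)."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[3, 1, 1],
     [-1, 3, 1]]

U = [[1 / sqrt(2),  1 / sqrt(2)],
     [1 / sqrt(2), -1 / sqrt(2)]]

S = [[sqrt(12), 0, 0],
     [0, sqrt(10), 0]]

Vt = [[1 / sqrt(6),   2 / sqrt(6),   1 / sqrt(6)],
      [2 / sqrt(5),  -1 / sqrt(5),   0],
      [1 / sqrt(30),  2 / sqrt(30), -5 / sqrt(30)]]

product = matmul(matmul(U, S), Vt)
assert all(isclose(product[i][j], A[i][j], abs_tol=1e-9)
           for i in range(2) for j in range(3))
print("U * Sigma * V^T reproduces A")
```

The singular values sqrt(12) and sqrt(10) appear in descending order along the diagonal of Sigma, as required by the definition in (1).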
Figure 2. Processing times for different implementations of the SVD on the CPU and the GPU.
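Processing-time comparisons like the one in Fig. 2 rest on repeatedly timing the same routine and reporting a robust statistic. A generic harness of that kind might look as follows (illustrative only; the reported measurements were made with the MATLAB, MKL, and CUDA implementations, and the workload below is a stand-in):

```python
import time

# Generic timing harness: run a routine several times and report the
# best (minimum) wall-clock time, which filters out transient system noise.

def best_time(fn, repeats=5):
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)

# Stand-in workload: a small pure-Python matrix multiply.
def workload(n=60):
    M = [[(i + j) % 7 for j in range(n)] for i in range(n)]
    return [[sum(M[i][t] * M[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

elapsed = best_time(workload)
print(f"best of 5 runs: {elapsed:.4f} s")
```

Timing each SVD implementation with one shared harness keeps the comparison across CPU and GPU frameworks consistent.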
[14] C. Manning, P. Raghavan, H. Schutze, Introduction to Information Retrieval, 1st edition, Cambridge University Press, 2008.
[15] MathWorks, MATLAB, available from: http://www.mathworks.com/products/matlab/ [accessed March 10, 2017].
[16] Nvidia, Nvidia CUDA Programming Guide, version 8.0, January 2017, available from: http://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf [accessed March 10, 2017].
[17] J. Owens, M. Houston, D. Luebke, S. Green, J. Stone, J. Phillips, "GPU computing", Proceedings of the IEEE, vol. 96, no. 5, 2008, pp. 279–299.
[18] P. Pacheco, An Introduction to Parallel Programming, Elsevier, 2011.
[19] D. Poole, Linear Algebra - A Modern Introduction, 2nd edition, Brooks/Cole, Thomson, 2006.
[20] W. H. Press, S. A. Teukolsky, W. T. Vetterling, B. P. Flannery, Numerical Recipes: The Art of Scientific Computing, 3rd edition, Cambridge University Press, 2007.
[21] J. W. Suh, Y. Kim, Accelerating MATLAB with GPU Computing – A Primer with Examples, Morgan Kaufmann – Elsevier, 2014.
[22] L. Eldén, Matrix Methods in Data Mining and Pattern Recognition, Society for Industrial and Applied Mathematics, Philadelphia, 2007.
[23] A. Thomo, Latent Semantic Analysis Tutorial, available from: http://webhome.cs.uvic.ca/~thomo/svd.pdf [accessed March 12, 2017].
[24] K. Baker, Singular Value Decomposition Tutorial, available from: http://davetang.org/file/Singular_Value_Decomposition_Tutorial.pdf [accessed March 12, 2017].
Abstract—This paper investigates the use of reconfigurable memory for fast network packet processing. It is proposed that this is achieved by adding logic to the memory that allows direct access to the non byte- or word-aligned fields found in various packet header formats. The proposed packet header parsing hardware is made flexible by the use of FPGA re-configurability and is capable of providing single-cycle access to different-sized packet header fields placed in the on-chip memory. The paper elaborates how this solution results in much faster packet processing and a significant improvement of the overall network processing throughput.

I. INTRODUCTION

Considering the great expansion of packet-switched networks, followed by the huge increase of network traffic, it is expected that Internet throughput will increase to 1 Tb/s very soon, [1]. As technology advances, network connection links are gaining higher capacities, and consequently, networking hardware experiences many difficulties in satisfying, in a timely manner, the link requirements for bandwidth, throughput, speed, and delay.

Assuming that networking devices (like routers) remain the bottleneck for communication in networks, the design of fast network processing hardware is still an attractive and ongoing field of research. The most common network processing hardware that is optimized for packet data processing at wire rates is known as the network processor, [2]. Network processors are defined as chip-programmable devices, which are responsible for performing several network processing operations, including: header parsing, pattern matching, bit-field manipulation, table look-ups, packet modification, and data movement.

Currently there is a wide range of network processor architectures, [3]-[5], each using many processing engines (PE), hardware assistance (co-processors, functional units), various parallelization techniques, adjusted memory architectures and interconnection mechanisms. Over the last few years many companies have developed their own network processors, so many different network processor architectures have been applied. Furthermore, many novel approaches, such as the NetFPGA architecture, [6], or software routers, [7], are constantly emerging.

The most popular network processors in use today include one or many homo- or heterogeneous processing cores that operate in parallel. For example, Intel's IXP2800 processor, [8], consists of 16 identical multi-threaded general-purpose RISC processors organized as a pool of parallel homogeneous processing cores that can be easily programmed with great flexibility towards ever-changing services and protocols. Furthermore, EZchip has introduced the first network processor with 100 ARM cache-coherent programmable processor cores, [9], which is by far the largest 64-bit ARM processor yet announced. Along with the general-purpose ARM cores, this novel chip also includes a mesh core interconnect architecture that provides high bandwidth, low latency and high linear scalability.

The discussed network processors prove that most network processing is basically performed by general-purpose RISC-based processing cores as a cheaper but slower solution, combined with custom-tailored hardware that is more expensive but also more energy-efficient and faster. If network packet processing is analyzed on general-purpose processing cores, it can easily be concluded that a significant part of processor cycles is spent on packet header field access, especially when the packet header fields are non byte- or word-aligned. In such cases, some bit-wise logical and arithmetic operations are needed in order to extract the field's value from the packet header.

Network interfaces usually copy packets to a shared memory buffer that is available for further processing by the processor. This buffer may be upgraded with dedicated hardware that performs the field extraction operations directly on its output, before forwarding the fields to the processor. The general idea of this approach is to replace the bit-wise logical and arithmetic operations with special hardware logic that extracts the packet header fields from the on-chip memory and provides them to the processor. This logic is called the "Direct Field Access" (DFA) module in this paper. The result of using the DFA hardware should be single-cycle memory access to these non byte- or word-aligned packet header fields.

The DFA logic is simple to design for a specific packet header format; however, the complexity increases significantly when access to several different header formats is needed at once. Also, if the DFA is manufactured as an ASIC it cannot be reused for other header formats. Therefore, in order to overcome these problems, we investigate reconfigurable hardware platforms such as FPGA boards, [10]. These are a very suitable solution for the implementation of header access logic, especially because they offer a good compromise between performance, price and re-configurability.
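As an illustration of the bit-wise work the text describes, the following minimal Python sketch (not from the paper) shows the shift-and-mask sequence a general-purpose core must execute to read non byte-aligned IPv4 fields out of a 32-bit header word; big-endian bit numbering is assumed.

```python
# Sketch of the bit-wise extraction a CPU performs without DFA hardware:
# each non-aligned field costs a shift and/or a mask on the fetched word.

def ipv4_first_word_fields(word: int) -> dict:
    """Extract version (4 bits), headerLength (4 bits), typeofService
    (8 bits) and totalLength (16 bits) from the first 32-bit IPv4 word.
    The version field is assumed to occupy the most significant bits."""
    return {
        "version":       (word >> 28) & 0xF,
        "headerLength":  (word >> 24) & 0xF,
        "typeofService": (word >> 16) & 0xFF,
        "totalLength":   word & 0xFFFF,
    }

# 0x45 encodes version 4, header length 5; ToS 0; total length 60
fields = ipv4_first_word_fields(0x4500003C)
print(fields)  # {'version': 4, 'headerLength': 5, 'typeofService': 0, 'totalLength': 60}
```

Each dictionary entry above corresponds to one or two machine instructions; the DFA module proposed in this paper replaces that sequence with a single memory access.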
The rest of this paper is organized as follows: Section II gives an overview of various networking hardware and software solutions used to speed up network processing and discusses several proposals intended to simplify packet header parsing. Section III describes the proposed DFA logic and explains its ability to allow single-cycle memory access to non byte- or word-aligned packet header fields. Furthermore, this section presents more details about the operation of the proposed DFA logic and gives a description of its look-up table that is used for header field selection. After that, Section IV shows that the proposed DFA hardware module speeds up network packet processing by simplifying the access to packet header fields. Section V concludes the paper; the conclusion outlines the benefits of the proposed solution and the intended future work.

II. CURRENT STATE

Given the widespread use of network technologies, it is expected that packet parsing, [11], is the subject of a vast amount of research. With the ever increasing network link speeds, most research is focused on hardware acceleration for achieving suitable processing speeds [12]. This is usually done by combining application-specific coprocessors with general-purpose multiprocessor systems, or with reconfigurable FPGA platforms.

Network processors are an important part of various network equipment devices, including routers, switches, firewalls and IDS (intrusion detection systems). Their development started in the late 1990s when network devices were insufficient to handle complex network processing requirements, [3]. Generally, network processors are used to perform fast packet forwarding in the data plane, while the slow control plane packet processing is mostly handled by a general-purpose processor.

Network processor operation begins with the receipt of an input stream of data packets. After that, the headers of the received packets are inspected and their content is analyzed, parsed and modified. During the packet processing some additional hardware units may be involved in order to perform specialized tasks such as classification of packets, lookup and pattern matching, forwarding, queue management and traffic control, [4]. After the completion of all the required operations, the network processing is finished and the packet is sent out through the switching fabric to the appropriate outbound port of the networking device.

According to [13], packet processing can be sped up if the most time-consuming network processing operations are simplified and appropriate choices of the routing protocol functionalities are made. Therefore, many different approaches have been proposed, including the label concept to accelerate look-up operations and the utilization of special hardware functional units to perform some very slow operations, like CRC calculation. Additionally, several algorithms for faster table lookup have also been proposed in [14] and [15]. Moreover, the authors of [13] suggest an approach that avoids the slow table lookup operations during packet processing by using the source routing option of the IP protocol.

In general, network processing software is getting closer to the network processing hardware, as in [16], where part of the packet processing tasks, such as classification or security, is offloaded to application-specific coprocessors that are used and controlled by the software. In this way, the coprocessor hardware handles the heavy part of the packet processing, at the same time leaving the more complex and specific network traffic analyses to the general-purpose processor. As a result, a flexible network processing system with high throughput can be built. Some researchers also try to unify the view on the various network hardware systems, as well as their network offloading coprocessors, by developing a common abstraction layer for network software development, [17].

Other approaches make extensive use of FPGA technology for packet parsing, as it is very suitable for the implementation of pipeline architectures and thus ideal for achieving high-speed network stream processing, [18]. Additionally, reconfigurable FPGA boards can be used to design flexible multiprocessing systems that adjust themselves to the current packet traffic protocols and characteristics. This approach is taken in [19], where the authors propose the use of PP, a simple high-level language for describing packet parsing algorithms in an implementation-independent manner. Similarly, in [20], a special descriptive language, PX, is used to describe the kind of network processing that is needed in a system, and then a special tool generates the whole multiprocessor system as an RTL description. Afterwards, the system may be run on an FPGA platform, and may be dynamically reconfigured.

III. DESIGN OF DIRECT FIELD ACCESS LOGIC

The general idea of this paper is to use FPGA re-configurability for the purpose of flexible packet parsing hardware. This means that the appropriate parsing hardware can be extended and generated for various packet formats. The proposed DFA hardware would allow single-cycle access to header fields, and even though its circuit complexity decreases the operating frequency of the circuit, it could also mean decreased power consumption. Therefore it is a promising direction for further research in the field of packet parsing. The topic is focused on network processing hardware but it has implications for computer networks in general, because the acceleration brought by the proposed DFA module would cause a further increase of the overall network throughput.

The proposed DFA hardware would speed up packet processing by allowing the same access time for a packet header field as for any random memory word, even when the field is not byte- or word-aligned. In order to achieve single-cycle access, part of the memory address space is specially tailored to address the packet header fields. When such an address is input to the DFA module, it selects the corresponding word from memory, and afterwards, depending on the field, the word is processed in order to extract it. This may include shifting the word and/or modification of its bits. A scheme of the proposed logic is presented in Fig. 1.
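The access path just described can be modeled in software. In the sketch below the table entries are illustrative stand-ins patterned after the IPv4 first-word entries of Table I; the shift and mask values are assumptions (the paper's table stores only word-aligned offsets, with the extraction details left to the field logic).

```python
# Software model of the proposed DFA access path (illustrative only; the
# real module is hardware). Each LUT entry maps a field-specific address to
# the word-aligned offset plus the shift and mask needed to extract the field.

# (word_offset, shift, mask) per field address; bit positions assumed from
# the IPv4 layout in Fig. 2, with 32-bit memory words.
LUT = {
    0x0000: (0, 28, 0xF),     # IPv4 version
    0x0001: (0, 24, 0xF),     # IPv4 headerLength
    0x0002: (0, 16, 0xFF),    # IPv4 typeofService
    0x0004: (0, 0,  0xFFFF),  # IPv4 totalLength
}

def dfa_read(memory, base, field_addr):
    """Select the word via the LUT offset, then shift and mask: the work
    the DFA performs inside a single memory access."""
    word_offset, shift, mask = LUT[field_addr]
    return (memory[base + word_offset] >> shift) & mask

mem = {0: 0x4500003C}            # first IPv4 header word at base 0
print(dfa_read(mem, 0, 0x0001))  # headerLength -> 5
```

From the processor's point of view, reading a non-aligned field through such a module looks like an ordinary load from a dedicated address.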
IPv4 Header @0000100000001000
version 4
headerLength 4
typeofService 8
firstwordfirstHalf 16 // used for IP checksum
totalLength 16
identifier 16
flags 3
fragmentOffset 13
secondwordsecondHalf 16 // used for IP checksum
timetoLive 8
protocol 8
thirdwordfirstHalf 16 // used for IP checksum
headerChecksum 16

IPv6 Header @0000110000000000
version 4
trafficClass 8
flowLabel 20
payloadLength 16
nextHeader 8
hopLimit 8

Figure 2. Example of IPv4 and IPv6 header description

TABLE I. LOOK-UP TABLE IN DFA LOGIC

Memory address for specific field     Word-aligned field offset
0000h (IPv4 version)                  0000h (first word)
0001h (IPv4 headerLength)             0000h (first word)
0002h (IPv4 typeofService)            0000h (first word)
0003h (IPv4 firstwordfirstHalf)       0000h (first word)
0004h (IPv4 totalLength)              0000h (first word)
0005h (IPv4 identifier)               0001h (second word)
0006h (IPv4 flags)                    0001h (second word)
0007h (IPv4 fragmentOffset)           0001h (second word)
0008h (IPv4 secondwordsecondHalf)     0001h (second word)
0009h (IPv4 timetoLive)               0002h (third word)
000Ah (IPv4 protocol)                 0002h (third word)
000Bh (IPv4 thirdwordfirstHalf)       0002h (third word)
000Ch (IPv4 headerChecksum)           0002h (third word)
000Dh (IPv6 version)                  0000h (first word)
000Eh (IPv6 trafficClass)             0000h (first word)
000Fh (IPv6 flowLabel)                0000h (first word)
0010h (IPv6 payloadLength)            0001h (second word)
0011h (IPv6 nextHeader)               0001h (second word)
0012h (IPv6 hopLimit)                 0001h (second word)

The DFA logic is designed under the assumption that a packet with a specific header format is located in a fixed area of the memory. The packet header format should be presented in a textual format, as shown for the IPv4 and IPv6 protocols in Fig. 2. In the given header descriptions the first line defines the name of the header and its location in memory. Each following line contains the definition of a single field: the first column gives the field's name, whereas the second column gives its size in bits. The fields are defined in the order in which they appear in the header.

The packet header starting address that is given in the header description is placed in a specific base address unit in the DFA logic. In addition, the input memory address for a specific field is translated into a field offset by the look-up table (LUT), as given in Table I. The field offset represents a word-aligned offset from the starting packet header address, which points to the location where the given packet header field is placed. This means that if the length of a specific field is smaller than the memory word length, the closest word-aligned offset is selected and stored in the LUT.
last four bits of the shifted word and thus to extract the header length value. All the other fields are also retrieved by the use of shifting and the logical AND instruction.

The right side of Fig. 3 shows the equivalent program if DFA logic were used. The directly accessed fields are addressed in the assembler by using mnemonics starting with the letter 'h' followed by the number of the field as specified in the header description file. In this case, only one instruction is needed to read a header field directly into a register. As can be seen from Fig. 3, the number of instructions needed to load all fields from the IPv4 header is 24 when no DFA logic is used, whereas the number of instructions decreases to 14 when DFA is used. This amounts to around a 40% decrease in instruction count per IPv4 packet parsed.

Fig. 4 gives a comparison between MIPS assembler programs that process an IPv6 packet header with and without the DFA logic. The program consists of reading the version, traffic class, flow label, payload length, next header and hop limit from the packet header into registers. This assembler program is very similar to the one related to IPv4 header parsing and also consists of bit-wise logical and shifting operations. The number of instructions needed to load all fields from the IPv6 header is 13 when no DFA logic is used, whereas the number of instructions decreases to 7 when DFA is used. This amounts to around a 45% decrease in instruction count per IPv6 packet parsed. Depending on processor characteristics this may bring a significant improvement in packet processing performance.

MIPS                        MIPS with DFA

mov r0, #0
mov r1, #0                  --Set Base Register
mov r2, headerIPv6[r1]      la t0, headerIPv6

--Version                   --Version
and r3, r2, 15              mov r3, h13

--Traffic Class             --Traffic Class
srl r4, r2, 4               mov r4, h14
and r4, r4, 255

--Flow Label                --Flow Label
srl r5, r2, 12              mov r5, h15

mov r1, #4
mov r2, headerIPv6[r1]

--Payload Length            --Payload Length
and r6, r2, 65535           mov r6, h16

--Next Header               --Next Header
srl r7, r2, 16              mov r7, h17
and r7, r7, 255

--Hop Limit                 --Hop Limit
srl r8, r2, 24              mov r8, h18

Figure 4. Comparison of IPv6 packet header parsing with and without DFA

V. CONCLUSION

This paper proposes a DFA module that allows single-cycle memory access to non byte- or word-aligned fields in various packet header formats. This approach introduces a speed-up in packet processing in both general-purpose and application-specific processor architectures, as header field access is a very frequent operation in network processing. In fact, it was shown that a MIPS processor extended with DFA logic can provide 40/45% faster header parsing of IPv4/IPv6 packets, in comparison with a bare MIPS processor.

Future work includes a comparison of hardware complexities for various header formats and justification of the additional hardware against the performance improvement. It is obvious that further modifications would require extensions of the look-up table and the definition of novel field logic blocks in the DFA logic.

Additionally, the possibility to generate DFA modules for specific packet headers, and to reconfigure the system to start using them whenever there is a need for a new networking protocol, is very attractive. This approach makes use of FPGA re-configurability, which has proven itself as an ideal technology for achieving sufficient speed at low price.

REFERENCES

[1] B. Wheeler, "A new era of network processing," The Linley Group, white paper, 2013.
[2] M. Ahmadi, S. Wong, "Network processors: challenges and trends," in 17th Annual Workshop on Circuits, Systems and Signal Processing, Netherlands, 2006, pp. 222-232.
[3] R. Giladi, Network Processors - Architecture, Programming and Implementation, Ben-Gurion University of the Negev and EZchip Technologies Ltd., 2008.
[4] P. C. Lekkas, Network Processors: Architectures, Protocols and Platforms, McGraw-Hill Professional, 2003.
[5] M. Shorfuzzaman, R. Eskicioglu, P. Graham, "Architectures for network processors: key features, evaluation, and trends," in Communications in Computing, 2004, pp. 141-146.
[6] J. Naous, G. Gibb, S. Bolouki, N. McKeown, "NetFPGA: reusable router architecture for experimental research," in Sigcomm Presto Workshop, USA, 2008.
[7] M. Petracca, R. Birke, A. Bianco, "HERO: high speed enhanced routing operation in software routers NICs," in IEEE Telecommunication Networking Workshop on QoS in Multiservice IP Networks, Politecnico di Torino, 2008.
[8] Intel Corporation, Intel IXP2800 and IXP2850 Network Processors, Product Brief, 2005.
[9] B. Doud, "Accelerating the data plane with the Tile-Mx manycore processor," in Linley Data Center Conference, 2015.
[10] J. M. P. Cardoso, M. Hubner, Reconfigurable Computing: From FPGAs to Hardware/Software Codesign, Springer-Verlag New York, 2011.
[11] G. Gibb, G. Varghese, M. Horowitz, N. McKeown, "Design principles for packet parsers," in 2013 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), 2013, pp. 13-24.
[12] J. Kořenek, "Hardware acceleration in computer networks," in 2013 IEEE 16th International Symposium on Design and Diagnostics of Electronic Circuits and Systems (DDECS), 2013, pp. 11-11.
[13] S. Hauger, T. Wild, A. Mutter, et al., "Packet processing at 100 Gbps and beyond - challenges and perspectives," in 15th International Conference on High Performance Switching and Routing, Germany, 2009.
[14] P. Gupta, S. Lin, N. McKeown, "Routing lookups in hardware at memory access speeds," in IEEE Infocom '98, San Francisco, CA, pp. 1240-1247.
[15] W. Eatherton, G. Varghese, Z. Dittia, "Tree bitmap: hardware/software IP lookups with incremental updates," in Sigcomm Computer Communication Review, vol. 34, no. 2, 2004.
[16] L. Kekely, V. Puš, J. Kořenek, "Software Defined Monitoring of application protocols," in IEEE Infocom 2014 - IEEE Conference on Computer Communications, 2014, pp. 1725-1733.
[17] R. Bolla, R. Bruschi, C. Lombardo, F. Podda, "OpenFlow in the Small: a flexible and efficient network acceleration framework for multi-core systems," in IEEE Transactions on Network and Service Management, 2014, pp. 390-404.
[18] V. Puš, L. Kekely, J. Kořenek, "Design methodology of configurable high performance packet parser for FPGA," in 17th International Symposium on Design and Diagnostics of Electronic Circuits and Systems, 2014, pp. 189-194.
[19] M. Attig, G. Brebner, "400 Gb/s programmable packet parsing on a single FPGA," in 2011 Seventh ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), 2011, pp. 12-23.
[20] G. Brebner, W. Jiang, "High-speed packet processing using reconfigurable computing," in IEEE Micro, vol. 34, no. 1, pp. 8-18, 2014.
[21] D. A. Patterson, J. L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 5th ed., Elsevier, 2014.
Abstract - A unique data set was created by collecting data about sanitary inspection control on the territory of AP Vojvodina during a period of two years. The total number of multi-dimensional data records is 28,403. The Open Data with complete details is published on the web site of the Provincial Health Secretariat in the section Documents - Data, available for bulk download. The data comprise temporal, spatial and categorical components and as such are highly suitable for a variety of analyses by means of machine learning techniques, especially neural networks. In this paper, examples of linear regression and neural network applications to the analysis of the data are presented. The obtained results can be used for improving daily tasks like estimating inspection control workload, and the like.

With the adoption of the Law on Inspection Control (hereinafter referred to as the Law) in 2015, the Serbian Parliament opened the way for a paradigm shift in government inspection services. In fact, only with the adoption of this Law were the preconditions for a fundamental reform of governmental inspection services established. The adoption of the Law was preceded by analysis, study visits to countries that cultivate good inspection practice, and consultation of local and foreign experts in the process of passing the Law. The result was a Law that was positively evaluated not only by non-governmental organizations but also by the European Commission.

One of the mechanisms that create the preconditions for the lawful and proper operation of inspection managers and inspectors are proper information systems and software solutions that enable the transparent work of inspections, harmonization of inspection practices, timely access to data, case monitoring, risk evaluation, and corruption prevention. [1]

An important new aspect of the Law is the regulation of the transparency of governmental inspection services. The Law defines, but does not restrict, activities aimed at establishing transparency, e.g. publication of inspection control plans and reports, inspection checklists, best and worst rated subjects of inspections (white and black lists), and current regulation concerning inspections on public internet sites. The practice of publishing the required information is not widespread at the moment among inspections in Serbia, mostly due to the lack of technical capacity. This paper proposes publishing additional non-mandatory data extracted from the database and/or derived from the raw data. Furthermore, the conducted data analysis could be the base for resource planning in future inspection supervisions.

In 2011 the Provincial Government of Vojvodina enjoined the Provincial Department of Health to prepare the tender documents for the implementation of the software for sanitary control. The goal was a software system aimed at improvement of the entire inspection process. More precisely, in order to achieve this global goal, the software system should provide for:
- Automated creation of documents with electronic archive functionality.
- Efficient creation and maintenance of the following up-to-date electronic databases:
  o Registry of objects under sanitary control,
  o Registry of persons under sanitary control,
  o Registry of relevant regulations and legislation used in the control.
- Monitoring of sanitary and hygiene conditions in facilities under sanitary supervision.
- Significant improvement of the functions of planning inspection control, autonomous processing of reports, and monitoring of the inspection process (current outcome and legal assistance to the inspector from the start to the end of the control process).
- Assisting in establishing a uniform inspection methodology and procedures.
- Equal treatment for all subjects under supervision.
- Creating the conditions for comfortable, efficient, precise, legally safe operation of inspectors.
- Supervision of the work and expertise of inspectors.
- Internal licensing (periodic proficiency testing) for inspectors.
- Preventing illicit relations, corruption, and other adverse consequences.
- Raising the transparency of inspection by publishing information regarding regulations, practices, lessons about appeals, answers to frequently asked questions, educational materials and the like.

Good international practice in the organization of inspection is sublimated in the recommendations of the World Bank presented in the study "Good Practices for Business Inspections - Guidelines for Reformers, World Bank Group / Small and Medium Enterprise Department, 2006", which was the starting basis for the requirements specification. In 2012 a team from the Faculty of Technical Sciences in Novi Sad implemented the software for sanitary control with the requested features. The software has been in production since January 2013 on the territory of AP Vojvodina.
The software won an award from the Informatics Society of Serbia in 2014 in the field of innovative application of IT in the business environment.

The sanitary control process is implemented through a weekly work plan, which is designed using the module for planning. This module was enriched in 2015 with the addition of smart planning of inspection control based on a dynamic risk level computed by AI techniques. The system for smart planning is described in the paper "Intelligent planning of inspection control based on a dynamic level of risk" [2], an award-winning paper of the Infofest 2015 conference.

The methodology of inspection oversight has, from the very beginning of the system's use in 2012, relied upon checklists, which are also a guide to performing the inspection. The Provincial Secretariat for Health, Social Policy, and Demography created more than 120 different types of checklists for the most common types of facilities that are subject to sanitary surveillance. The checklists are in line with current legislation. Answers to their inquiries automatically determine the level of compliance of the supervised entities with the law. Software tools enable the search and processing of all parameters that are used in the checklist queries.

The software is an essential IT tool for the work of sanitary inspectors. All users of the system, including around 80 sanitary inspectors, passed individual and group training.

Serbia started the implementation of an Open Data Initiative in 2015. The Strategy for the Development of e-Government comprises the introduction of an open data policy. Therefore, as the first step towards that goal, with support from the World Bank and UNDP, the Government of Serbia conducted an Open Data Readiness Assessment (ODRA) [3] in June-November 2015, upon the request of the Ministry of Public Administration and Local Self-Government, Directorate of e-Government. The assessment concludes that Serbia is in a good position to move forward with an Open Data Initiative, although stronger senior management support and awareness are needed.

The Strategy for the Development of e-Government and the ODRA recommendations anticipate, as the next step of the open data initiative in Serbia, the development of an open data portal that would provide, via a metadata catalog, a single point of access to data of the Government institutions, agencies, and bodies for anyone to reuse. At the moment of writing this article, the national portal is in the final phase of development, available at http://data.gov.rs/sr/. So far, the portal holds data from six governmental institutions that have already published data sets as open data, including the Ministry of Education, Science, and Technological Development; the Ministry of Interior; the Public Procurement Office; and the Commissioner for Information of Public Importance and Personal Data Protection. This is the first data set regarding inspections released in Serbia. The data is available for bulk download on the official web site of the Secretariat for Health of the Autonomous Province of Vojvodina [4].

One successful example of making similar data (those for restaurant inspections) publicly available is an initiative from the last century. Since the mid-1990s, NYC has been posting its restaurant inspection data online to reach a wider audience [5]. Publishing the outcomes of food safety inspections of restaurants and other similar providers of food to the public in Open Data form is now widely accepted. On the web portal of the US City Open Data Census [6], more than 70 open data sets regarding restaurant inspections are available. Numerous and various applications rely upon those data, including a mobile app that helps users choose a nearby place to eat. Another example is HDScores, which gathers health inspection data from across the country and incorporates data from 530,000 establishments. The need to gather data from various systems led to the proposal of a new standard, the Local Inspector Value-Entry Specification (LIVES) [7].

I. DATA

The data used in this study were collected through checklists that are an integral part of the software. The initial checklists were replaced in 2016 with a set of checklists approved by the Ministry of Health of the Republic of Serbia and the Coordination Committee of the Ministry of Public Administration and Local Self-Government. The checklists are available on the website of the Ministry of Health as well as on the website of the Secretariat of Health.

The checklists used by the Provincial Sanitary Inspection are additionally customized for electronic processing, while their contents and questions are fully aligned with the official checklists. Formatted as Excel workbooks, the checklists contain several worksheets and enable the collection of data on the supervised entity, its organizational units, and the responsible and present representative. A special worksheet contains scored questions for an inspection check, which allow automatic calculation of the identified risk in the business operations of supervised subjects during the supervision. Data are collected in the process of field supervision. Sanitary inspectors fill in checklists during the assessment of compliance and security risk.

The data used in this paper have been collected by 65 sanitary inspectors in the period January 2013 to December 2016 on the territory of the Province of Vojvodina. The electronic archive spanning the period 2013-2016 stores over 45,000 filled checklists and over 80,000 cases formed by inspectors, containing more than 500,000 documents that are stored and available in the software for sanitary control. The total number of records in the data set is 28,403, collected in the period 1.1.2015 - 1.1.2017.

Table 1 presents the semantics and some quality indicators of the collected data.

The field DatumInspekcije is the date when the inspection took place. The fields Zapoceta and Zvrsena are the times when the inspection control began and ended, respectively.

The field Inspector is the identification of the inspector in charge. Inspectors have territorial jurisdiction, defined by the administrative boundaries of the administrative districts in
7th International Conference on Information Society and Technology ICIST 2017
Each administrative district has a department and a representative of the Sanitary Inspection.

The fields Mesto, Opstina, and Okrug (place, municipality, district) describe the spatial characteristics of the data set. The field Kontrolna lista is the document containing the checklist for the inspection control. The field VrstaPregleda indicates the type of inspection control. The field Usaglasenost is the percentage of compliance found in the inspection control. The field Kategorizacija is defined by the Ministry and represents the category of epidemiological risk. The field VrstaMin is a hierarchical categorization of business activity prescribed by the Serbian Ministry of Health. The field OJVrsta is an extended classification that precisely determines the affiliation of an object to a group of objects under sanitary control.
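The field semantics described above can be exercised with a short script. The following is a minimal pandas sketch; the column names follow the field descriptions above, but the exact schema and the values of the published data set are assumptions, and the records below are invented toy data:

```python
import pandas as pd

# Toy records mimicking the described checklist fields (values are invented).
records = pd.DataFrame({
    "DatumInspekcije": ["2015-03-02", "2015-03-02", "2015-03-03"],
    "Zapoceta": ["09:00", "11:30", "10:15"],   # inspection start time
    "Zavrsena": ["10:30", "13:00", "12:00"],   # inspection end time
    "Okrug": ["Juznobacki", "Juznobacki", "Sremski"],
    "OJVrsta": ["B301", "D101", "B301"],
    "Usaglasenost": [92.0, 78.5, 85.0],        # compliance percentage
})

# Duration of each inspection control in minutes, from start/end times.
start = pd.to_datetime(records["Zapoceta"], format="%H:%M")
end = pd.to_datetime(records["Zavrsena"], format="%H:%M")
records["TrajanjeMin"] = (end - start).dt.total_seconds() / 60

# Aggregates of the kind discussed later in the paper:
# mean compliance per district, mean control duration per object type.
by_district = records.groupby("Okrug")["Usaglasenost"].mean()
by_type = records.groupby("OJVrsta")["TrajanjeMin"].mean()
print(by_type)
```

On the real data set, the same two `groupby` aggregations reproduce the per-object-type control durations reported in Table II.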
TABLE I. SEMANTICS AND QUALITY INDICATORS OF THE COLLECTED DATA

TABLE II. INSPECTION START/END TIME DISTRIBUTION (TOP TEN VALUES)
The graph is produced with Python and the NumPy package for scientific computing.

B. Second example: Regression modelled by an artificial neural network
The same problem was also solved by applying an artificial neural network. The network is organized as a multi-layer perceptron with the following characteristics:

Input layer: 1 neuron
Hidden layer 1: 5 neurons, activation function TanH
Hidden layer 2: 5 neurons, activation function TanH
Output layer: 1 neuron, activation function ReLU
Training function: SGD (Stochastic Gradient Descent), learning rate: 0.2, decay: 0.000001, momentum: 0.7, loss function: mean_squared_error
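The paper does not list its training code; the hyper-parameter names suggest a Keras-style setup. Below is a self-contained NumPy sketch of the described 1-5-5-1 perceptron. The inputs and targets are invented stand-ins for the (standardized) inspection data, and `decay` is interpreted as the usual 1/(1 + decay·step) learning-rate schedule:

```python
import numpy as np

rng = np.random.default_rng(0)

# 1-5-5-1 multi-layer perceptron with TanH, TanH and ReLU activations,
# matching the architecture listed above.
W0 = rng.normal(0.0, 0.5, (1, 5))
W1 = rng.normal(0.0, 0.5, (5, 5))
W2 = rng.normal(0.0, 0.5, (5, 1))
b0, b1, b2 = np.zeros(5), np.zeros(5), np.zeros(1)

def forward(x):
    h1 = np.tanh(x @ W0 + b0)
    h2 = np.tanh(h1 @ W1 + b1)
    y = np.maximum(h2 @ W2 + b2, 0.0)   # ReLU output layer
    return h1, h2, y

# Invented toy regression problem (the real inputs/targets come from the
# inspection data set, which is not reproduced here).
X = rng.uniform(-1.0, 1.0, (64, 1))
Y = 0.5 * X**2 + 0.3

# SGD with momentum and learning-rate decay, MSE loss; the batch size used
# in the paper is not stated, so full-batch gradient steps are taken here.
lr0, decay, momentum = 0.2, 1e-6, 0.7
params = [W0, W1, W2, b0, b1, b2]
vel = [np.zeros_like(p) for p in params]

for step in range(5000):
    h1, h2, y = forward(X)
    dy = 2.0 * (y - Y) * (y > 0) / len(X)     # MSE gradient through ReLU
    dh2 = (dy @ W2.T) * (1.0 - h2**2)         # TanH derivative
    dh1 = (dh2 @ W1.T) * (1.0 - h1**2)
    grads = [X.T @ dh1, h1.T @ dh2, h2.T @ dy,
             dh1.sum(0), dh2.sum(0), dy.sum(0)]
    lr = lr0 / (1.0 + decay * step)           # Keras-style decay schedule
    for p, v, g in zip(params, vel, grads):
        v *= momentum
        v -= lr * g
        p += v                                # in-place parameter update

mse = float(np.mean((forward(X)[2] - Y) ** 2))
print(f"final training MSE: {mse:.5f}")
```

A higher-level framework such as Keras expresses the same configuration in a few lines; the explicit backpropagation is spelled out here only to make the listed hyper-parameters concrete.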
Figure 3 – Correlation of inspection controls with resident number
Figure 6 shows the network training error, while Figure 7 shows the correlation obtained with the ANN.
transformed in an analogous way: mean subtraction Y -= Y.mean(), and then normalization Y /= Y.std().

The network is organized as a multi-layer perceptron with the following characteristics:

Input layer: 1 neuron
Hidden layer: 50 neurons, activation function TanH
Output layer: 1 neuron, activation function TanH
Training function: SGD (Stochastic Gradient Descent), learning rate: 0.1, decay: 0.000001, momentum: 0.7, loss function: mean_squared_error

After the training, the change of the error is displayed in Figure 8.

Figure 8 - Training error

The results for the most important object categories are displayed in Table II.

TABLE II. RESULT OF THE NEURAL NETWORK TRAINING

Object type                                                               Inspection control duration (min)
B301 Klinika (clinic)                                                     134
B211 Opšta bolnica (general hospital)                                     119
E101 Hoteli i moteli sa restoranom (hotels and motels with a restaurant)  117
F101 Osnovno obrazovanje opšte (general primary education)                109
E222 Kuhinje kolektivne ishrane (communal catering kitchens)              106
B101 Dom zdravlja (health centre)                                         102
E204 Picerija (pizzeria)                                                   98
D101 Prodavnica (store)                                                    81

III. CONCLUSION

This paper presents the data set containing the data collected during two years of sanitary inspection control on the territory of the Province of Vojvodina. The data, available in Open Data format, were extracted from a database established in 2012, which relies on a flexible information system that supports changes of business processes and legislation and their smooth implementation. One deliberate decision in the design and implementation of the information system is that the inspector can type in some data without restrictions or control. Fortunately, this did not produce critical consequences in the functionality of core business processes; on the contrary, such data proved to be very useful for reporting and controlling purposes.

The paper also discusses the advantages of the open data approach and presents illustrative examples of AI and machine learning applications utilizing such data. The quality and amount of the data are essential for the utilization of efficient data mining methods and algorithms, while proper metadata and knowledge of the data collection processes are required for the analyst to understand the data better. In our example the quality of the data is high, which is warranted by the data provider/owner, while its utilization should be warranted by data openness and by content that is interesting for citizens, for a number of business organizations in different industries, and for numerous administrative entities.

The examples provided in this paper are only illustrations of applications that can be implemented by applying selected data analysis algorithms to the inspection data set, as well as to data obtained by its integration with other open data. New applications are a matter of the available open data and of users' creativity.

The data set presented in this paper also allows for defining a procedure aimed at quality assurance of algorithms for lexical analysis and correction of textual input data (such as manually filled address fields), of identity management algorithms, and of dynamic risk assessment based on previous experience and data. These are, in the authors' opinion, some up-and-coming directions for further research in the field addressed by this paper.

REFERENCES

[1] M. Stefanović, D. Milovanović, J. Stefanović, and I. Dragošan, "GUIDE for the application of the Law on Inspection Control," Inspection supervision - Serbia - Legislation, ISBN 978-86-919327-0-1, Nov. 2015.
[2] R. Bulatović and Đ. Obradović, "Inteligentno planiranje inspekcijskog nadzora bazirano na dinamičkom stepenu rizika" (Intelligent planning of inspection supervision based on a dynamic degree of risk), presented at INFOFEST, Miločer, Montenegro.
[3] A. Zijlstra, O. Petrov, and A. Ivic, "Open Data Readiness Assessment," UNDP, WB, MoPALSG, DEU, Republic of Serbia, Dec. 2015.
[4] "Sanitary Inspection | Open Data download." [Online]. Available: http://www.zdravstvo.vojvodina.gov.rs/index.php?option=com_docman&task=cat_view&gid=121&Itemid=91. [Accessed: 14-Apr-2017].
[5] N. Helbig, A. M. Cresswell, G. B. Burke, and L. Luna-Reyes, "The dynamics of opening government data, CS - Restaurant health inspection data in New York City (NYC)," Center for Technology in Government, Albany, 2012. [Online]. Available: http://www.ctg.albany.edu/publications/reports/opendata.
[6] "Restaurant Inspections — Datasets - US City Open Data Census." [Online]. Available: http://us-city.census.okfn.org/dataset/food-safety. [Accessed: 30-Jan-2017].
[7] "Local Inspector Value-Entry Specification - Yelp." [Online]. Available: https://www.yelp.com/healthscores. [Accessed: 30-Jan-2017].
[8] "Download Serbia census results 2011." [Online]. Available: http://popis2011.stat.rs/?page_id=2162&lang=en. [Accessed: 14-Apr-2017].