Knowledge Management for Business Intelligence Measurement
Ioannis Kazanidis
International Hellenic University
All content following this page was uploaded by Ioannis Kazanidis on 07 September 2018.
Sotirios Kontogiannis
Department of Mathematics,
University of Ioannina,
P.O. Box 1186, 45110 Ioannina, Greece
Email: skontog@cc.uoi.gr
Website: http://spooky.math.uoi.gr/~skontog
Giannoula Florou
Department of Accountancy,
Technological Educational Institute of East Macedonia and Thrace,
Agios Loukas, Kavala 65404, Greece
Email: gflorou@teiemt.gr
1 Introduction
With changes and advancements occurring every day, organisations must be prepared and
informed about possible trends and applications in order to achieve competitive advantage
(Elragal and Gendy, 2013).
E-business refers to any business that uses the internet and related technologies. It is
the conduct of business on the internet: not only buying and selling, but also servicing
customers and collaborating with business partners. It also includes the processes and
tools that allow an organisation to use internet-based technologies and infrastructure, both
internally and externally, to conduct day-to-day business operations. It applies to both
large and small businesses engaged in electronic commerce for buying, selling and
marketing, as well as in customer relations and management services.
Knowledge management (KM) is a systematic and integrative process of coordinating
organisation-wide activities of acquiring, creating, storing, sharing, diffusing, developing
and deploying knowledge by individuals and groups in the pursuit of major
organisational goals (Rastogi, 2000). The KM process involves the identification of
needed skills, sharing knowledge, creating new knowledge and cataloguing current
organisational knowledge (Mellor, 2001). KM has been shown to be a powerful
ingredient in the success of organisations (Davenport and Prusak, 1998; Desouza and
Awazu, 2006).
Luhn (1958) defined intelligence as: “the ability to apprehend the
interrelationships of presented facts in such a way as to guide action towards a desired
goal”.
Business intelligence (BI) was introduced by Dresner (a member of the Gartner Group)
in 1989 as a term that “describes a set of concepts and methods to improve business
decision making by using fact-based support systems” (Power, 2007).
BI consists of a dynamic and continuous set of processes and practices embedded in
individuals, as well as in groups and organisational structures (Sharma and Djiaw, 2011).
According to Adelman et al. (2002), BI is a term that encompasses a broad range of
analytical software and solutions for gathering, consolidating, analysing and providing
access to information in a way that is supposed to let an enterprise’s users make better
business decisions. BI enables organisations to comprehend, understand and profit from
experience (Green, 2007).
BI is an emerging discipline that aims at combining corporate data with textual
user-generated content (UGC) to let decision-makers analyse their business based on the
trends perceived in the environment. Despite the increasing diffusion of BI
applications, no specific and organic design methodology is available yet.
In this paper, we propose an iterative approach for designing and maintaining BI
applications that makes effective use of KM. The paper is complemented by a case study of
an e-business web shop, aimed at proving that the adoption of a structured approach has a
positive impact on project success.
The paper is structured as follows: Section 2 presents the literature review, analysing
existing work in the field, and Section 3 describes the approach used to conduct the study
and the techniques applied to the analysis of the corresponding data. Subsequently, the
results of applying the indices and metrics, hierarchical clustering and the product
classification algorithm (PCA) are presented in Section 4. The findings, which are
discovered for the products, and how these could lead
326 S. Valsamidis et al.
2 Literature review
BI outcomes are achieved through business processes that are implemented with tools
and information systems in order to empower the acquisition, integration, sharing and
dissemination of organisational knowledge (Bartlett, 1998; Sensiper, 1997).
BI is used to understand the capabilities with which the firm competes, the actions of
competitors and the implications of these actions (Negash, 2004). BI describes the result
of in-depth analysis of detailed business data, including database and application
technologies as well as analysis practices (Gangadharan and Swamy, 2004). BI is
technically much broader, potentially encompassing KM, enterprise resource planning
(ERP), decision-support systems and data mining (DM) (Gangadharan and Swamy, 2004).
In addition, BI is primarily aimed at supplying top management with relevant information in order to support strategic
decision making (Bucher et al., 2009). Moss and Atre (2003) described BI as a seamless
integration of operational front-office applications with operational back-office
applications. BI has also been defined as a solution that applies information technology
(IT) to retrieve heterogeneous and distributed resources in order to formulate usable
knowledge by employing analysis mechanisms (Vine, 2000). In addition, Gangadharan
and Swamy (2004) defined BI as an enterprise architecture for an integrated collection of
operational as well as decision support applications and databases, which provides the
business community easy access to their business data and allows them to make accurate
business decisions. A successful BI ties business and IT together to help enterprises
manage and integrate ongoing investments in BI, allocate BI resources, prioritise projects
and minimise the risk associated with BI implementations (Ranjan, 2008).
The proposed approach consists of five main stages, namely data storage, data
pre-processing, computation of indices and calculation of metrics, application of hierarchical clustering,
and application of the PCA and e-business usage assessment. The first three stages
are based on the framework described in detail in Kazanidis et al. (2009) and Valsamidis
et al. (2011, 2012a), and facilitate the extraction of useful information from the data
logged by a web server running a web application.
The main advantages of the proposed approach are that:
• it uses the log file for web usage analysis
• it proposes indices and metrics that are used for the first time in an e-business system
• it uses two algorithms for clustering and classification in a novel way
• it can be easily adapted to any e-business system
• it visualises the results in a user-friendly environment.
The five distinct stages of the approach are depicted in Figure 1. After data from the
e-business system, which runs on a web server, have been logged, they are pre-processed
so that they are ready for analysis. It is worth mentioning that the information recorded
by the module ensures a one-to-one mapping between each URI and the information stored
per product, in terms of pages.
Indices and metrics, first introduced by the authors, are used to measure the usage of the
e-business system.
specific web shop platform fields in the web shop database. More specifically, the
following fields are recorded:
1 remote_host: The IP address of the host from which the user connected to the web shop,
together with location information for that IP address obtained from a WHOIS request.
2 session_id: The session id created by the web system upon user authentication. If a
user does not identify themselves to the web shop, a pseudo-random session_id, based on
an MD5 hash of the user's remote IP address, is assigned.
3 user_name: The username of a logged-in user. If the user did not log in to the web store,
the username anonymous is used (the username can be used to extract gender or age
attributes from the e-shop's database by performing post-processing analysis).
4 module_id: The e-shop component module_id. The web shop modules identified are
product, new_product, offer_product, wish_product, price_reduction_product, news,
links, payment modules, shipping modules and pricing modules.
5 request_uri: The e-shop webpages that a session_id visits. Each page is identified by
a module_id field. For the product, new_product, offer_product and wish_product
modules, each product page is additionally identified by a product_id field and a cat_id
field. If it is an informatory page, such as an e-shop news or link page, it is identified by
a unique news_id or link_id number. If it belongs to a payment or shipping module, it is
identified by a payment type id or a ship type id, respectively.
6 request_uri_duration: The time, in seconds, that a user spends on a specific page.
7 session_duration: The total session duration.
8 user_order_id: The order_id of a completed e-shop cart transaction, with transaction
date included.
9 user_payment: The total amount of cart transaction cost.
10 user_wishlist_id: The id of a new product put into user’s wish list from the wish list
module.
11 user_payment_type: The cart enabled payment type selected by customer.
12 user_ship_type: The cart enabled shipping type selected by the customer.
13 user_red_id: The link of a product proposed by the user for a price reduction from
the reduction_product_list_module.
14 user_red_value: The product reduction value.
As a first step, these fields are recorded by an Apache module developed in the Perl
programming language.
The development of such a module has the following advantages:
• rapid storage of user information, since the module is executed directly from the server
API rather than by the e-business application or the database
• the data produced are independent of the specific formulations used by the web shop
system.
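To make the recorded fields concrete, they can be modelled as a single record type. This is an illustrative Python sketch, not the authors' Perl Apache module: the class name LogRecord, the helper resolve_session_id and the field types are our assumptions, while the MD5-of-remote-IP fallback for anonymous users follows the description of session_id above.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    """One logged request; attribute names follow the recorded fields 1-14."""
    remote_host: str              # client IP address (WHOIS location lookup not modelled)
    session_id: str
    user_name: str                # "anonymous" when the user has not logged in
    module_id: str                # e.g. "product", "news", "payment"
    request_uri: str
    request_uri_duration: int     # seconds spent on this page
    session_duration: int         # total session duration (unit assumed: seconds)
    user_order_id: Optional[str] = None
    user_payment: Optional[float] = None
    user_wishlist_id: Optional[str] = None
    user_payment_type: Optional[str] = None
    user_ship_type: Optional[str] = None
    user_red_id: Optional[str] = None
    user_red_value: Optional[float] = None


def resolve_session_id(auth_session_id: Optional[str], remote_ip: str) -> str:
    """Return the authenticated session id, or an MD5 hex digest of the IP otherwise."""
    if auth_session_id:
        return auth_session_id
    return hashlib.md5(remote_ip.encode("utf-8")).hexdigest()
```

For example, resolve_session_id(None, "192.0.2.10") yields the same 32-character hexadecimal id for every request from that address, so this kind of fallback groups anonymous requests per IP address rather than producing a truly random id.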
Metrics are used to facilitate the assessment of product usage. First, the indices
Sessions, Pages, Unique pages and Unique Pages per Product per Session (UPPS) are
computed with a Perl program. Then, the metrics Enrichment, Disappointment, Interest
and Homogeneity are calculated. Finally, the rates, the mean rate and the Score are
calculated from these metrics.
The number of sessions and the number of pages viewed by all users are counted for
the calculation of viewed by many users but there were also some other pages not so
popular. To refine the picture, we define another index, called Unique pages, which
measures the total number of unique pages per product viewed by all users. It counts
each page of the product only once, independently of how many times it was viewed
Knowledge management for business intelligence measurement 331
by the users. The Unique Pages per Product per Session (UPPS) index expresses the
number of unique pages per product visited in one session; it is used to measure product
activity objectively. Because some novice users may navigate within a product and visit
some of its pages more than once, UPPS eliminates duplicate page visits: repeated visits
by the same user to a page within a session are counted only once.
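The four indices can be computed in a single pass over the log. The following Python sketch is illustrative only (the authors used a Perl program). It assumes the log has already been reduced to hypothetical (session_id, product_id, page) tuples, one per page view, and it assumes UPPS is obtained by counting the distinct pages of a product in each session and summing these counts over all sessions; the text does not state how the per-session counts are aggregated.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical input format: one (session_id, product_id, page) tuple per page view.
PageView = Tuple[str, str, str]


def compute_indices(views: Iterable[PageView]) -> Dict[str, Dict[str, int]]:
    """Return Sessions, Pages, Unique pages and UPPS for each product."""
    sessions = defaultdict(set)        # product -> distinct session ids
    pages = defaultdict(int)           # product -> total page views
    unique_pages = defaultdict(set)    # product -> distinct pages across all sessions
    per_session = defaultdict(set)     # (product, session) -> distinct pages in that session

    for session_id, product_id, page in views:
        sessions[product_id].add(session_id)
        pages[product_id] += 1
        unique_pages[product_id].add(page)
        per_session[(product_id, session_id)].add(page)

    upps = defaultdict(int)
    for (product_id, _session_id), session_pages in per_session.items():
        # repeated visits to a page within one session count only once
        upps[product_id] += len(session_pages)

    return {
        p: {
            "Sessions": len(sessions[p]),
            "Pages": pages[p],
            "UniquePages": len(unique_pages[p]),
            "UPPS": upps[p],
        }
        for p in pages
    }
```

For instance, if session s1 views page a twice and page b once, and session s2 views page a once, the product gets Sessions = 2, Pages = 4, Unique pages = 2 and UPPS = 2 + 1 = 3.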
Enrichment is a metric proposed to express the ‘enrichment’ of each product in terms of
educational material. Enrichment is defined as the complement of the ratio of the unique
pages to the total number of product webpages, as proposed in Valsamidis et al.
(2010a).
Enrichment = 1 – (Unique Pages/Total Pages), (1)
where Unique Pages ≤ Total Pages.
Enrichment values are in the range [0, 1). Enrichment is 0 when users follow unique
paths in a product, while for a product with minimal unique pages it is close to 1. Since
it measures how many unique pages were viewed by the users, it shows how much of the
information included in each product is handed over to the end user; a high value
suggests that the product contains rich educational material.
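Equation (1) translates directly into code. In this hedged sketch the function name enrichment is ours, and Total Pages is read as the Pages index (total page views of the product); that reading is our assumption, chosen because it also keeps Disappointment in Equation (2) within its stated range (0, 1].

```python
def enrichment(unique_pages: int, total_pages: int) -> float:
    """Enrichment = 1 - Unique Pages / Total Pages (Eq. 1), in the range [0, 1)."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    # at least one page was viewed, and unique pages cannot exceed total pages
    if not 1 <= unique_pages <= total_pages:
        raise ValueError("expected 1 <= unique_pages <= total_pages")
    return 1.0 - unique_pages / total_pages
```

For example, a product with 50 page views spread over only 5 distinct pages gets an Enrichment of 0.9, whereas 50 views of 50 different pages give 0.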
Disappointment is a metric that combines the sessions and the pages viewed by users. It
measures users’ disappointment with the product, in the sense that a disappointed user
views only a few pages of the product before leaving it.
Disappointment = Sessions/Total Pages. (2)
In other words, the Disappointment metric reflects how quickly users stop viewing the
pages of a product. Disappointment values are in the range (0, 1]. Because of its negative
connotation, the Disappointment metric was replaced by a positively phrased metric,
Interest, defined as the complement of Disappointment.
Interest = 1 − Disappointment. (3)
Both disappointment and interest metrics were proposed in Valsamidis et al. (2010b).
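Equations (2) and (3) can be sketched in the same way. The function names are ours, and we assume that every session views at least one page of the product, so 1 ≤ Sessions ≤ Total Pages, which yields the stated range (0, 1] for Disappointment.

```python
def disappointment(sessions: int, total_pages: int) -> float:
    """Disappointment = Sessions / Total Pages (Eq. 2), in the range (0, 1]."""
    if not 1 <= sessions <= total_pages:
        raise ValueError("expected 1 <= sessions <= total_pages")
    return sessions / total_pages


def interest(sessions: int, total_pages: int) -> float:
    """Interest = 1 - Disappointment (Eq. 3)."""
    return 1.0 - disappointment(sessions, total_pages)
```

With 10 sessions and 40 page views, Disappointment is 0.25 and Interest is 0.75: on average each session viewed four pages before leaving.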
Homogeneity is another metric, defined as the ratio of the unique visited product pages
to the number of sessions that visited the product (Valsamidis et al., 2012b).
Homogeneity = Unique pages/Total Sessions. (4)
where Total Sessions per product ≥ Unique product pages.
Homogeneity values lie in the range [0, 1), where 0 means that no user followed a
unique path and 1 that every user followed unique paths. It is a product quality index and
characterises the percentage of product information discovered by each user participating
in a product. These metrics contribute to the assessment of product usage; once
computed, they allow us to rank products.
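Equation (4) and the ranking step can be sketched as follows. This section does not specify how the rates, mean rate and Score are combined, so the hypothetical helper rank_products simply orders products by a single chosen metric; both function names are ours.

```python
from typing import Dict, List


def homogeneity(unique_pages: int, total_sessions: int) -> float:
    """Homogeneity = Unique pages / Total Sessions (Eq. 4)."""
    if total_sessions <= 0:
        raise ValueError("total_sessions must be positive")
    if unique_pages > total_sessions:
        raise ValueError("Eq. 4 assumes Total Sessions >= Unique pages")
    return unique_pages / total_sessions


def rank_products(metrics: Dict[str, Dict[str, float]], key: str) -> List[str]:
    """Order product ids by one metric, highest first (ties keep input order)."""
    return sorted(metrics, key=lambda pid: metrics[pid][key], reverse=True)
```

As a usage example with made-up ids and values, rank_products({"P1": {"Homogeneity": 0.2}, "P2": {"Homogeneity": 0.5}, "P3": {"Homogeneity": 0.35}}, "Homogeneity") returns ["P2", "P3", "P1"].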
individual pieces of data are then combined systematically and classified on a higher
level iteratively until one output is produced. This final output is the overall classification
of the data. Depending on application-specific details, this output can be one of a set of
pre-defined outputs, one of a set of online learned outputs, or even a novel
classification that has not been seen before. Generally, such systems rely on relatively
simple individual units of the hierarchy, each of which performs a single, universal
classification function (Papadimitriou, 2007).
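The bottom-up merging described above can be illustrated with a naive agglomerative clustering routine. This is a generic sketch, not the authors' procedure: Euclidean distance and average linkage are our assumptions, since the criterion used in the study is not given here. In practice the points would be the product rows of the coded data table.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(points: List[Vector]) -> List[Tuple[frozenset, frozenset, float]]:
    """Average-linkage agglomerative clustering.

    Starts with one cluster per point and repeatedly merges the closest pair
    until a single cluster (the final output) remains. Returns the merge history.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    history = []

    def linkage(c1: frozenset, c2: frozenset) -> float:
        # mean pairwise distance between members of the two clusters
        total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        history.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return history
```

Each entry of the returned history records which two clusters were merged and at what distance, which is the information a dendrogram such as the one in Figure 3 displays.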
The data of our research are quantitative variables. We categorised the quantitative
variables in order to convert them to qualitative variables, dividing each into three value
classes. To keep the variables comparable, we use the same three categories for each
variable, defined by quartiles. The first category of each variable contains values less than
the first quartile (25% of all data), the second category contains values between the first
and third quartiles (50% of all data) and the third category contains values greater than the
third quartile (25% of all data). Thus, each variable takes one of three values, 1, 2 or 3
(small, medium and large, respectively), so all variables are measured on the same scale
and each variable has the same significance in the data table. The four main variables
(Sessions, Pages, Unique pages, UPPS) were used as active variables for the analysis. The
initial data matrix had 40 rows (products) and 4 columns containing the initial quantitative
variables. The new data table has 40 rows and 12 columns (4 variables × 3 classes).
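The quartile coding can be sketched in Python as follows. Assumptions: quartiles are computed with statistics.quantiles using its default (‘exclusive’) method, values exactly equal to a quartile fall into the medium class, and the column names such as Sessions_1 are ours. Each variable expands into three 0/1 indicator columns, so four variables give 12 columns.

```python
import statistics
from typing import Dict, List


def categorise(values: List[float]) -> List[int]:
    """Map each value to 1 (small), 2 (medium) or 3 (large) using quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    return [1 if v < q1 else 3 if v > q3 else 2 for v in values]


def disjunctive_table(data: Dict[str, List[float]]) -> Dict[str, List[int]]:
    """Expand each variable into three 0/1 indicator columns (name_1, name_2, name_3)."""
    table: Dict[str, List[int]] = {}
    for name, values in data.items():
        classes = categorise(values)
        for k in (1, 2, 3):
            table[f"{name}_{k}"] = [1 if c == k else 0 for c in classes]
    return table
```

For the values 1 to 8 the quartiles are 2.25 and 6.75, so categorise returns [1, 1, 2, 2, 2, 2, 3, 3]; in the resulting table exactly one of the three columns of each variable is 1 in every row.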
3.5 PCA
The proposed algorithm, called PCA (product classification algorithm), first classifies
e-business system products according to whether their product information material is
poor or rich. Next, among the products with adequate information material, it tries to
determine how often product information is added or updated by administrators (or by
users, based on the Homogeneity classification) or followed by users (i.e. how the updated
information is discovered by users). Finally, using the UPPS metric, it tries to identify
whether updates of product information can increase customers’ interest in the specific
product. The discovery schema of the PCA algorithm is depicted in Figure 2.
Accordingly, the proposed algorithm is based on Enrichment, Homogeneity and UPPS,
and consists of three corresponding stages.
In the first stage of the algorithm, the Enrichment metric is used to identify products
with poor or rich educational content (poor corresponds to a small Enrichment value and
rich to a high one). We place in an N-ordered table the set of N products with the highest
Enrichment values, where N ≤ the total number of e-business system products.
In the second stage, the algorithm classifies the previous set of N products using the
values of Enrichment and Homogeneity. The classification of e-business system
products is performed using four clusters, as shown in Figure 2. The higher the
Homogeneity value, the more frequent the product updates or the more dynamic the
product content, depending on the Enrichment value. The lower the Homogeneity value,
the more static the content of the e-business system or the poorer its content updates. The
classification of the products depends on the average Enrichment value of the N
e-business system products and on the average Homogeneity value of the high- and
low-Enrichment clusters, respectively.
The aim of the third stage of the algorithm is to identify whether the content can be
characterised as rich or poor, and whether it is static, frequently updated or dynamic. To
do this, we order each cluster’s products by their UPPS value.
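The three stages can be summarised in a short sketch. Note that PCA here is the authors’ product classification algorithm, not principal component analysis. Assumptions: the inputs are per-product Enrichment, Homogeneity and UPPS values, products equal to an average are placed on the ‘high’ side, and the cluster labels (high_E/high_H and so on) are hypothetical, because the mapping to clusters I to IV of Figure 2 is not given in the text.

```python
from statistics import mean
from typing import Dict, List


def classify_products(products: Dict[str, Dict[str, float]], n: int) -> Dict[str, List[str]]:
    """Sketch of the three PCA stages on per-product metrics.

    products maps a product id to {"Enrichment": .., "Homogeneity": .., "UPPS": ..}.
    """
    # Stage 1: keep the N products with the highest Enrichment.
    top = sorted(products, key=lambda p: products[p]["Enrichment"], reverse=True)[:n]

    # Stage 2: split on the average Enrichment of the N products, then split each
    # half on its own average Homogeneity, giving (up to) four clusters.
    avg_e = mean(products[p]["Enrichment"] for p in top)
    high_e = [p for p in top if products[p]["Enrichment"] >= avg_e]
    low_e = [p for p in top if products[p]["Enrichment"] < avg_e]

    clusters: Dict[str, List[str]] = {}
    for label, group in (("high_E", high_e), ("low_E", low_e)):
        if not group:
            continue
        avg_h = mean(products[p]["Homogeneity"] for p in group)
        clusters[f"{label}/high_H"] = [p for p in group if products[p]["Homogeneity"] >= avg_h]
        clusters[f"{label}/low_H"] = [p for p in group if products[p]["Homogeneity"] < avg_h]

    # Stage 3: order every cluster by UPPS, highest first.
    for label in clusters:
        clusters[label].sort(key=lambda p: products[p]["UPPS"], reverse=True)
    return clusters
```

After stage 3 the first and last products of each cluster are its high- and low-UPPS representatives, which mirrors the pairs of products reported per cluster in the results.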
4 Results
As can be seen in Figure 3, three groups of products can be identified. First, there is the
group (no. 73) characterised by large values of Sessions, Pages and UPPS. We
characterise this cluster as ‘large frequency Pages’.
Next, there is the group (no. 76) of products with medium and large values of Unique
pages. We characterise this cluster as ‘Rich Pages’.
Furthermore, there is the group no. 77 of products with low values of Sessions, Pages
and UPPS. This group is separated into three subgroups. Group no. 66 is characterised by
low values of Unique pages. Group no. 72 is characterised by low values of Pages and
medium values of Unique pages. Group no. 70 is characterised by medium values of
Pages and large values of Unique pages.
After the pre-processing stage, the PCA algorithm was applied. We initially ordered the
40 products according to the Enrichment metric. To test our algorithm, we picked the best
and worst e-business system products from the list of 40 products; these are shown in
Table 2. That is, we chose the best and worst cases from the point of view of product usage.
Table 2 Processed data for 12 products with average enrichment value of 0.899
As shown in Table 3, within each of the four classes the e-business system products are
ordered by their UPPS value. Thus, products PID105 and PID36 are the representatives
of high and low UPPS values for cluster I, PID132 and PID41 for cluster II, PID112 and
PID122 for cluster III, and PID66 and PID8 for cluster IV, respectively.
Table 4 presents these products together with the PCA evaluation feedback for each of
them.
5 Discussion
The indication that many pages within useful paths contribute to increased usage is fairly
obvious: the more and better the content on a site, the more users are likely to visit it, so
administrators should add useful and helpful pages to the site. However, the case is not
that simple. If customers are required to visit an essentially blank site every day and
contribute a comment, its usage will necessarily be high. Conversely, a very elaborate
website with rich content that is not required reading would be expected to see limited
usage. These issues need to be addressed adequately; here they are only mentioned, and
they have to be considered in future work.
The application of the two algorithms showed that there is a relationship between
e-business system usage and the corresponding product purchases. A higher UPPS score
leads to better product sales and therefore to an improved business outcome.
The fact that only 40 products in a single e-business system were investigated is a
limitation of the study, especially for data analysis techniques that demand large datasets.
However, this was unavoidable, since the e-business system of the case study had only
this number of active online products. In the future, we intend to apply the same approach
to other e-business systems.
We also plan to automate the whole procedure further: we are developing a plug-in tool
to automate the data pre-processing and measure calculation steps. This tool
will run periodically (weekly) and notify administrators of the results. We intend the
final tool to offer insights at two levels:
• online, with overall statistical information such as the number of visits per product
(pages and sessions), customer trends and activities during their visits, as well as
detailed information per customer (customer duration per product and activity,
customer preferences and activities for all products)
• offline, with the use of data mining techniques such as pre-processing, visualisation,
clustering, classification, regression and association, discovering hidden data
patterns.
6 Conclusions
The proposed iterative method uses existing tools and techniques in a novel way to
perform usage analysis of e-business systems. It uses the metrics Enrichment,
Homogeneity, Disappointment and Interest, and incorporates clustering and classification
algorithms.
It has the following advantages:
• It is independent of any specific e-business system, since it is based on the Apache log
files and not on the e-business system itself. Thus, it can easily be implemented for
any e-business system.
• It uses indices and metrics to facilitate the evaluation of each product.
• It offers useful information that helps a company determine which parts of its
website to improve.
It is worth mentioning that this approach may be applied after a long period of data
tracking. For example, the enrichment metric measures the number of visited pages in a
session divided by the number of available pages of a product. The higher the number of
pages in the product, the lower the fraction a user can see in one session. On the other
hand, administrators may update the product material with more information, so that the
ratio is not fully diminished.
Finally, it should be mentioned that the proposed approach may also be applied to
other web applications such as e-government, e-learning, e-banking, blogs, social
networks, etc. For example, in e-government applications, enrichment shows how much
information is handed over to the end user and homogeneity characterises the percentage
of information independently discovered by each user. Interest indicates whether users
are pleased with the material and its usefulness on a website. Furthermore, UPPS gives an
objective view of website usage.
References
Adelman, S., Moss, L. and Barbusinski, L. (2002) ‘I found several definitions of BI’, DM Review,
Retrieved 5 December, 2013, from http://www.dmreview.com/article_sub.cfm?articleId=5700
Anantatmula, V. and Kanungo, S. (2005) ‘Establishing and structuring criteria for measuring
knowledge management efforts’, Paper Presented at the 38th Hawaii International Conference
on System Sciences (HICSS-38), Big Island, HI.
Marwick, A.D. (2001) ‘Knowledge management technology’, IBM Systems Journal, Vol. 40,
No. 4, pp.814–829.
Massey, A.P. and Montoya-Weiss, M. (2002) ‘A performance environment perspective of
knowledge management’, Paper Presented at the 36th Hawaii International Conference on
System Sciences (HICSS-36), Big Island, HI.
McGurk, J. and Baron, A. (2012) ‘Knowledge management: time to focus on purpose and
motivation’, Strategic HR Review, Vol. 11, No. 6, pp.316–321.
McKinlay, A. (2005) ‘Knowledge management’, in Ackroyd, S., Batt, R. and Thompson, P. (Eds.):
The Oxford Handbook of Work and Organization, Oxford University Press, Oxford, UK,
pp.242–262.
Mellor, R.B. (2001) Knowledge Management and Information Systems: Strategies for Growing
Organizations, Palgrave Macmillan, New York, NY.
Moss, T.L. and Atre, S. (2003) Business Intelligence Roadmap: The Complete Project Lifecycle for
Decision Support Applications, Addison Wesley Longman, Reading, MA.
Negash, S. (2004) ‘Business intelligence’, Communications of the Association for Information
Systems, Vol. 13, pp.177–195.
Nonaka, I. (1994) ‘A dynamic theory of organizational knowledge creation’, Organization Science,
Vol. 5, No. 1, pp.14–27.
Nonaka, I. and Takeuchi, H. (1995) The Knowledge-Creating Company, Oxford University Press,
New York, NY.
Offsey, S. (1997) ‘Knowledge management: linking people to knowledge for bottom line results’,
Journal of Knowledge Management, Vol. 1, No. 2, pp.113–122.
Pandey, S.C. and Dutta, A. (2013) ‘Role of knowledge infrastructure capabilities in knowledge
management’, Journal of Knowledge Management, Vol. 17, No. 3, pp.435–453.
Papadimitriou, G. (2007) Data Analysis, Ed. Tipothito, Athens.
Power, D.J. (2007) A Brief History of Decision Support Systems. Retrieved 28 May, 2009 from
DSSResources.com: http://DSSResources.com/history/dsshistory.html.
Ranjan, J. (2008) ‘Business justification with business intelligence’, VINE: The Journal of
Information and Knowledge Management Systems, Vol. 38, No. 4, pp.461–475.
Rao, M. (2002) Knowledge Management Tools and Techniques: Practitioners and Experts
Evaluate KM Solutions, Elsevier, Amsterdam, The Netherlands.
Rastogi, P. (2000) ‘Knowledge management and intellectual capital: the new virtuous reality of
competitiveness’, Human Systems Management, Vol. 19, No. 1, pp.39–49.
Sanchez, R. (1996) Strategic Learning and Knowledge Management, John Wiley & Sons,
Chichester, UK.
Sensiper, S. (1997) AMS Knowledge Centers, Case N9-697-06, Harvard Business School Press,
Boston, MA.
Serenko, A. and Bontis, N. (2004) ‘Meta-review of knowledge management and intellectual capital
literature: citation impact and research productivity rankings’, Knowledge and Process
Management, Vol. 11, No. 3, pp.185–198.
Sharma, R.S. and Djiaw, V. (2011) ‘Realising the strategic impact of business intelligence tools’,
VINE: The Journal of Information and Knowledge Management Systems, Vol. 41, No. 2,
pp.113–131.
Turban, E., Aronson, J.E., Liang, T.P. and Sharda, R. (2007) Decision Support and Business
Intelligence Systems, 8th ed., Pearson Prentice Hall, New York, USA.
Valsamidis, S., Kontogiannis, S., Kazanidis, I. and Karakos, A. (2010a) ‘Homogeneity and
enrichment, two metrics for web applications assessment’, Proceedings of 14th Panhellenic
Conference on Informatics (PCI2010), Tripoli, Greece.
Valsamidis, S., Kazanidis, I., Kontogiannis, S. and Karakos, A. (2010b) ‘Automated suggestions
and course ranking through web mining’, Proceedings of 10th IEEE International Conference
on Advanced Learning Technologies (ICALT 2010), Sousse, Tunisia.
Valsamidis, S., Kontogiannis, S., Kazanidis, I. and Karakos, A. (2011) ‘E-Learning platform usage
analysis’, Interdisciplinary Journal of E-Learning and Learning Objects (IJELLO), Vol. 7,
pp.185–204.
Valsamidis, S., Kontogiannis, S., Kazanidis, I., Theodosiou, T. and Karakos, A. (2012a)
‘A clustering methodology of web log data for learning management systems’, Educational
Technology & Society, Vol. 15, No. 2, pp.154–167.
Valsamidis, S., Kontogiannis, S., Kazanidis, I. and Karakos, A. (2012b) ‘An approach for LMS
assessment’, International Journal of Technology Enhanced Learning (IJTEL), Vol. 4, No. 3,
pp.265–283.
Vine, D. (2000) Internet Business Intelligence: How to Build a Big Company System on a
Small Company Budget, CyberAge Books, Medford, NJ.
Von Krogh, G., Roos, J. and Kleine, D. (1998) Knowing in Firms: Understanding, Managing and
Measuring Knowledge, Sage, London, UK.