Knowledge management for business intelligence measurement in an e-business system

Article  in  International Journal of Electronic Business · January 2017


DOI: 10.1504/IJEB.2017.10007156


Int. J. Electronic Business, Vol. 13, No. 4, 2017 323

Knowledge management for business intelligence measurement in an e-business system

Stavros Valsamidis* and Ioannis Kazanidis


Department of Accountancy,
Technological Educational Institute of East Macedonia and Thrace,
Agios Loukas, Kavala 65404, Greece
Email: svalsam@teiemt.gr
Email: kazanidis@teiemt.gr
*Corresponding author

Sotirios Kontogiannis
Department of Mathematics,
University of Ioannina,
P.O. Box 1186, 54110 Ioannina, Greece
Email: skontog@cc.uoi.gr
Website: http://spooky.math.uoi.gr/~skontog

Giannoula Florou
Department of Accountancy,
Technological Educational Institute of East Macedonia and Thrace,
Agios Loukas, Kavala 65404, Greece
Email: gflorou@teiemt.gr

Abstract: Knowledge management (KM) can be defined as the set of activities
involved in discovering, capturing, sharing and applying knowledge to
enhance the strategic impact of knowledge. Business intelligence (BI) is
the process of enhancing data into information and then into knowledge, in the field
of business. E-Business refers to any business that uses the internet and related
technologies. This paper introduces the role of KM for BI, explains how KM is
applied, and shows the usefulness of a knowledge sharing system for BI in the
dynamic transformation of explicit and tacit knowledge in an e-business system.
Four indices and four metrics, applied for the first time in the field of
e-business, provide capabilities for examining data with web usage analysis in
an online store. Two techniques, the hierarchical ascendant classification
(HAC) and the product classification algorithm (PCA), are applied with the aid
of the aforementioned measures in this e-business system.

Keywords: knowledge management; business intelligence; e-business; indices
and metrics; HAC; hierarchical ascendant classification; PCA; product
classification algorithm.

Copyright © 2017 Inderscience Enterprises Ltd.


324 S. Valsamidis et al.

Reference to this paper should be made as follows: Valsamidis, S.,
Kazanidis, I., Kontogiannis, S. and Florou, G. (2017) ‘Knowledge management
for business intelligence measurement in an e-business system’, Int. J.
Electronic Business, Vol. 13, No. 4, pp.323–341.

Biographical notes: Stavros Valsamidis received a five-year Electrical
Engineering Diploma from the Department of Electrical Engineering, University
of Thessaloniki, Greece, an MSc in Computer Science from the University of
London, UK and his PhD from the Department of Electrical and Computer
Engineering, University of Thrace, Greece. He is an Associate Professor in the
Department of Accounting and Finance, Eastern Macedonia and Thrace
Institute of Technology, Greece. He has published more than 110 papers in
international journals and conferences. His research interests are in the areas of
database systems, data mining and web applications assessment.

Ioannis Kazanidis is an Adjunct Assistant Professor at Eastern Macedonia and
Thrace Institute of Technology, Greece. He has published more than 70 papers
in international journals and conferences. His research interests mainly focus
on information systems and particularly on the adoption of innovations in different
fields of study, including interaction design, educational technology, knowledge
management, augmented reality, and user experience design.

Sotirios Kontogiannis graduated from Democritus University of Thrace,
Department of Electrical and Computer Engineering. He received an MSc in
Software Engineering and a PhD in the research area of algorithms and network
protocols for distributed systems from the same department. He worked as a
software developer for more than 10 years in the private sector and participated
in SME research and development projects. He also worked as a contract
Assistant Professor at the Department of Business Administration, TEI of
Western Macedonia, for six years and as a contract lecturer at the Department
of Informatics & Telecommunications Engineering, University of Western
Macedonia. His research interests focus on the areas of distributed systems,
artificial intelligence, AI algorithms, sensor networks, middleware protocols
and computer networks. He is currently a scientific staff member and director
of the distributed micro-computers laboratory (http://kalipso.math.uoi.gr/
microlab), at the Applied Mathematics and Engineering research section of the
Department of Mathematics, University of Ioannina.

Giannoula Florou is a Professor at the Department of Accounting and Finance at
Eastern Macedonia and Thrace Institute of Technology, Greece. She received her
BSc in Mathematics from Aristotle University of Thessalonica (Greece). She held
a post-doc position at the Institut de Mathématiques Appliquées de Grenoble,
Laboratoire d’Informatique Fondamentale et d’Intelligence Artificielle, in
France, where she worked on camera and distortion model estimation
in image processing. She received her doctorate from the University of Macedonia
in Greece. She taught at Aristotle University of Thessalonica and at the
Technological Institution of Thessalonica until 2000. From 2001 until 2004,
she was an Associate Professor in the Department of International Economic
Relations and Development at Democritus University of Thrace. She has many
publications in journals and conference contributions on data analysis,
clustering and statistics.
Knowledge management for business intelligence measurement 325

1 Introduction

With changes and advancements occurring every day, organisations must be prepared for
and informed about possible trends and applications in order to achieve competitive
advantage (Elragal and Gendy, 2013).
E-Business refers to any business that uses the internet and related technologies.
E-Business is the conducting of business on the internet, not only buying and selling but
also servicing customers and collaborating with business partners. It also includes the
processes and tools that allow an organisation to use internet-based technologies and
infrastructure, both internally and externally, to conduct day-to-day business process
operations. It applies to both large and small businesses in electronic commerce for
buying, selling and marketing, as well as for customer relations and management services.
Knowledge management (KM) is a systematic and integrative process of coordinating
organisation-wide activities of acquiring, creating, storing, sharing, diffusing, developing
and deploying knowledge by individuals and groups in the pursuit of major
organisational goals (Rastogi, 2000). The KM process involves the identification of
needed skills, sharing knowledge, creating new knowledge and cataloguing current
organisational knowledge (Mellor, 2001). KM has been shown to be a powerful
ingredient in the success of organisations (Davenport and Prusak, 1998; Desouza and
Awazu, 2006).
Luhn (1958) defined intelligence as “the ability to apprehend the
interrelationships of presented facts in such a way as to guide action towards a desired
goal”.
Business intelligence (BI) was introduced by Dresner (a member of the Gartner
Group) in 1989, as a term that “describes a set of concepts and methods to
improve business decision making by using fact-based support systems” (Power, 2007).
BI consists of a dynamic and continuous set of processes and practices embedded in
individuals, as well as in groups and organisational structures (Sharma and Djiaw, 2011).
According to Adelman et al. (2002), BI is a term that encompasses a broad range of
analytical software and solutions for gathering, consolidating, analysing and providing
access to information in a way that is supposed to let an enterprise’s users make better
business decisions. BI enables comprehension, understanding and profiting from
experience (Green, 2007).
BI is an emerging discipline that aims at combining corporate data with textual
user-generated content (UGC) to let decision-makers analyse their business based on the
trends perceived from the environment. Despite the increasing diffusion of BI
applications, no specific and organic design methodology is available yet.
In this paper, we propose an iterative approach for designing and maintaining BI
applications that makes effective use of KM. The paper is completed by a case study in
e-business for a web shop, aimed at showing that the adoption of a structured approach
has a positive impact on project success.
The paper is structured as follows: Section 2 presents the literature review by
analysing existing work in the field and Section 3 describes the approach used to conduct
the study and the techniques applied to the analysis of the corresponding data.
Subsequently, the results after the application of the indices and metrics, and of both
hierarchical clustering and the product classification algorithm (PCA), are presented in
Section 4. The findings discovered for the products, and how these could lead
to BI, are discussed in Section 5. The paper then proceeds to Section 6, in which
conclusions are drawn and future work is identified.

2 Literature review

2.1 Knowledge management


KM can be defined as a systematic approach that provides efficient disciplines and
procedures to enable knowledge to grow and create value for organisations (Rao, 2002).
KM has become one of the most important trends in modern businesses across the
globe (Pandey and Dutta, 2013). A general goal of KM is to improve the systematic
handling of knowledge and potential knowledge within the organisation (Heisig, 2009).
Knowledge must be refreshed by the organisation, and therefore knowledge networks are
needed to ensure employees have opportunities to share knowledge (McGurk and Baron,
2012). In addition, Labedz et al. (2011) stated that KM processes that have been
integrated into work processes can be used to correct dysfunctional organisational
behaviour.
KM systems can be categorised as falling into the following groups: groupware,
document management systems, expert systems, semantic networks, relational and
object-oriented databases, simulation tools and artificial intelligence (Gupta and Sharma,
2004).
KM is crucial for maintaining and gaining competitive advantage, as it supports more
effective knowledge acquisition and transfer (Bollinger and Smith, 2001; McKinlay,
2005; Offsey, 1997).
Gummesson (2000) stated that the key task of the organisation is acquiring
institutional knowledge and knowledge of the social interaction processes. A KM system
can improve a firm’s operational processes (Arora, 2002). KM efforts focus on
organisational objectives such as improved performance, competitive advantage,
innovation, the sharing of lessons learned, integration and continuous improvement of the
organisation (Sanchez, 1996). A KM effort needs to convert internalised tacit knowledge
into explicit knowledge in order to share it, but the same effort must also permit
individuals to internalise and make meaningful any codified knowledge retrieved from
the KM effort (Serenko and Bontis, 2004).

2.2 Business intelligence


BI is the process of gathering information and of enhancing data into
information and then into knowledge in the field of business (Green, 2006). BI is a valuable
core competence and should be managed like traditional factors of labour, capital and raw
material (Von Krogh et al., 1998). Business data and information are the soil that grows
BI, which provides the capability to reason, plan, solve problems, think abstractly,
comprehend ideas and language, and learn from business data and information (Ranjan,
2008).
BI is an umbrella term that combines architectures, tools, databases, applications,
practices and methodologies (Turban et al., 2007). Organisations have failed to realise the
full potential of BI and KM tools to increase corporate performance (Anantatmula and
Kanungo, 2005; Lee et al., 2005; Massey and Montoya-Weiss, 2002).
BI outcomes are achieved through the business processes which are implemented
with tools and information systems in order to empower the acquisition, integration,
sharing and dissemination of organisational knowledge (Bartlett, 1998; Sensiper, 1997).
BI is used to understand the capabilities with which the firm competes, the actions of
competitors and the implications of these actions (Negash, 2004). BI describes the result
of in-depth analysis of detailed business data, including database and application
technologies as well as analysis practices (Gangadharan and Swamy, 2004). BI is
technically much broader, potentially encompassing KM, ERP, decision-support systems
and DM (Gangadharan and Swamy, 2004). In addition, BI is primarily aimed at
supplying top management with relevant information in order to support strategic
decision making (Bucher et al., 2009). Moss and Atre (2003) described BI as a seamless
integration of operational front-office applications with operational back-office
applications. BI is defined as the solution applying information technology (IT) to
retrieve heterogeneous and distributed resources in order to formulate any usable
knowledge by employing analysis mechanisms (Vine, 2000). In addition, Gangadharan
and Swamy (2004) defined BI as an enterprise architecture for an integrated collection of
operational as well as decision support applications and databases, which provides the
business community easy access to their business data and allows them to make accurate
business decisions. A successful BI ties business and IT together to help enterprises
manage and integrate ongoing investments in BI, allocate BI resources, prioritise projects
and minimise the risk associated with BI implementations (Ranjan, 2008).

2.3 The integration of BI and KM


KM is distinct from BI in many aspects. The majority of models used in the KM field,
such as the tacit and explicit knowledge framework for a dynamic human process of
justifying personal belief toward the truth (Nonaka, 1994; Nonaka and Takeuchi, 1995)
are typically non-technology oriented. In addition, KM deals with unstructured
information and tacit knowledge which BI fails to address (Marwick, 2001).
According to Stevan Dedijer (the father of BI), KM emerged in part from the thinking
of the ‘intelligence approach’ to business. Dedijer thinks that ‘Intelligence’ is more
descriptive than knowledge. “Knowledge is static, intelligence is dynamic” (Marren,
2004).
In addition, views on the integration of BI and KM are diverse, and the question of
whether KM should be viewed as a subset of BI or vice versa is still under debate in
these two well-established fields (Kasemsap, 2015). While both KM and BI are
influenced by the approaches of the research and practitioner communities, there appears
to be no single way of integrating KM and BI (Herschel and Jones, 2005). Several models
of integration of BI and KM have been reported in the literature (Herschel and
Jones, 2005). In addition, Malhotra (2004) has proposed general models of integration of
KM and BI for routine structured information processing and non-routine unstructured
sense making.

3 Proposed BI module approach

The proposed approach consists of five main stages, namely storing data, pre-processing
data, computation of indices and calculation of metrics, application of hierarchical
clustering, and application of the PCA and e-business usage assessment. The first three
stages are based on the framework described in detail in Kazanidis et al. (2009) and
Valsamidis et al. (2011, 2012a), and facilitate the extraction of useful information from
the data logged by a web server running a web application.
The main advantages of the proposed approach are that:
• it uses the log file for web usage analysis
• it proposes indices and metrics to be used for the first time in an e-business system
• it uses two algorithms for clustering and classification in a different way
• it can be easily adapted to any e-business system
• it visualises the results in a user-friendly environment.
The five distinct stages of the approach are depicted in Figure 1. After the logging of data
from the e-business system, which resides on a web server, the data are pre-processed in
order to be ready for data analysis. It is worth mentioning that the information recorded
by the module uses a one-to-one mapping between each URI and the per-product stored
information in terms of pages.

Figure 1 Stages of the e-business assessment approach

Indices and metrics that were first introduced by the authors are used for the
measurement of the usage of the e-business system.

3.1 Logging the data


This stage involves the logging of specific data from e-business systems. The data
recording module is embedded in the web server of the e-business platform and records
specific web shop platform fields in the web shop database. More specifically, the
following fields are recorded:
1 remote_host: The IP of the host from which the user connected to the web shop, as
well as location info for the connected IP address taken from a whois request.
2 session_id: Session id created at the web system upon user authentication. If a user
does not identify itself to the web shop, a random session_id is assigned to the user
based on an md5 hash of its remote IP address.
3 user_name: The username of a logged-in user. If the user did not log in to the web
store, then the anonymous username is used (the user name can be used to extract gender
or age attributes from the e-shop’s database by performing post-processing analysis).
4 module_id: The e-shop component module_id. The web shop modules identified are
product, new_product, offer_product, wish_product, price_reduction_product, news,
links, payment modules, shipping modules and pricing modules.
5 request_uri: The e-shop webpages that a session_id visits. Each page is identified by
a module_id field. For the product, new_product, offer_product and wish_product
modules, a product_id field and a cat_id field additionally identify each product page.
If it is an informational page such as an e-shop news or link page, then it is identified
by a unique news_id or link_id number. If it is a payment or shipping product module,
then it is identified by a payment type id or a ship type id accordingly.
6 request_uri_duration: The time in seconds that a user visits a specific page.
7 session_duration: The total session duration.
8 user_order_id: The order_id of a completed e-shop cart transaction, with transaction
date included.
9 user_payment: The total amount of cart transaction cost.
10 user_wishlist_id: The id of a new product put into user’s wish list from the wish list
module.
11 user_payment_type: The cart enabled payment type selected by the customer.
12 user_ship_type: The cart enabled shipping type selected by the customer.
13 user_red_id: The link of a product proposed by the user for a price reduction from
the reduction_product_list_module.
14 user_red_value: The product reduction value.
These fields are recorded, as a first step, with the use of an Apache module developed
in the Perl programming language.
The development of such a module has the following advantages:
• rapid storage of user information, since it is executed straight from the server API
and not by the e-business application, or database
• the produced data are independent of specific formulations used by the web shop
system.
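The logging stage can be illustrated with a minimal sketch of a single log record. It is written in Python for readability and is not the authors' Perl Apache module. The field names follow the list above, while the helper names (`LogRecord`, `anonymous_session_id`, `make_record`), the example URI layout and the use of a plain md5 digest of the IP address as the anonymous session id are assumptions made for illustration only.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    """One logged request; only a subset of the fields listed above."""
    remote_host: str
    session_id: str
    user_name: str
    module_id: str
    request_uri: str
    request_uri_duration: float  # seconds spent on the page


def anonymous_session_id(remote_ip: str) -> str:
    """Assumed rule: session id is the md5 hex digest of the remote IP."""
    return hashlib.md5(remote_ip.encode("utf-8")).hexdigest()


def make_record(remote_ip: str, module_id: str, request_uri: str,
                duration: float, session_id: Optional[str] = None,
                user_name: Optional[str] = None) -> LogRecord:
    """Build a record, falling back to anonymous values when the user
    has not authenticated with the web shop."""
    return LogRecord(
        remote_host=remote_ip,
        session_id=session_id or anonymous_session_id(remote_ip),
        user_name=user_name or "anonymous",
        module_id=module_id,
        request_uri=request_uri,
        request_uri_duration=duration,
    )
```

A call such as `make_record("192.0.2.10", "product", "/product?product_id=132&cat_id=4", 12.5)` would yield an anonymous record whose session id depends only on the IP address; the URI shown is hypothetical.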

3.2 Data pre-processing


The second stage is data pre-processing. The data of the log file contain noise such as
missing values and outliers. These values have to be pre-processed in order to prepare
them for data analysis. Specifically, in this stage the recorded data are filtered: outlier
detection is performed and extreme values are removed. This step is not performed by
the e-business system but in the web server, so it can be embedded into a variety of
e-business systems.
The produced log file, from the previous step, is filtered, so it includes only the
following fields:
• productID, which is the identification string of each product;
• sessionID, which is the identification string of each session;
• page Uniform Resource Locator (URL), which contains the requests of each page of
the platform that the user visited.
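The filtering described above can be sketched as follows. The text does not state which outlier rule is used, so this sketch assumes the common interquartile-range (IQR) rule applied to page duration; the function name `preprocess`, the dictionary keys and the factor `k=1.5` are likewise assumptions, not part of the original method.

```python
import statistics

REQUIRED = ("product_id", "session_id", "url", "duration")


def preprocess(records, k=1.5):
    """Drop incomplete records and duration outliers, then keep only
    the (productID, sessionID, URL) fields used by later stages."""
    # 1. Remove records with missing values.
    complete = [r for r in records
                if all(r.get(f) not in (None, "") for f in REQUIRED)]

    # 2. Remove extreme durations (assumed IQR rule).
    durations = [r["duration"] for r in complete]
    if len(durations) >= 4:
        q1, _, q3 = statistics.quantiles(durations, n=4)
        low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
        complete = [r for r in complete if low <= r["duration"] <= high]

    # 3. Project onto the three fields of the filtered log file.
    return [(r["product_id"], r["session_id"], r["url"]) for r in complete]
```

Any other outlier criterion could be substituted in step 2 without changing the shape of the output.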

3.3 Indices and metrics


Although the aforementioned fields contain information about the e-business process,
more metrics and rates (Table 1) are proposed in order to adequately facilitate the
assessment of product web usage.

Table 1 Attribute names and descriptions

Index/metric name                  Description of the index/metric
Sessions                           The total number of sessions per product viewed by users
Pages                              The number of pages per product viewed by users
Unique pages                       The number of unique pages per product viewed by users
Unique pages per ProductID         The number of unique pages per product viewed by users
per session (UPPS)                 per session
Homogeneity                        The homogeneity of products
Enrichment                         The enrichment of products
Disappointment                     The disappointment of users when they view pages of the
                                   products
Interest                           The complement of the disappointment

Metrics are used for the facilitation of the product usage assessment. First, the indices
Sessions, Pages, Unique pages, Unique Pages per ProductID per Session (UPPS) are
computed with the use of a Perl program. Then, the metrics Enrichment, Disappointment,
Interest and Homogeneity are calculated. Finally, the rates, mean rate and the Score are
calculated based on the previous metrics.
The number of sessions and the number of pages viewed by all users are counted for
the calculation of product activity. The page count alone, however, can be misleading:
some pages were viewed by many users, but there were also other pages that were not so
popular. To refine the situation, we define another index, called unique pages, which
measures the total number of unique pages visited per product by all users. It counts
each page of the product only once, independently of how many times it was viewed
by the users. The Unique Pages per Product per Session (UPPS) index expresses the
number of unique pages per product visited in one session; it is used for the calculation
of the product activity in an objective manner. Because some novice users may navigate
in a product and visit some pages of the product more than once, UPPS eliminates
duplicate page visits, since it counts the visits of the same user to a page within a
session only once.
Enrichment is a metric proposed in order to express the ‘enrichment’ of each
product in terms of information material. Enrichment is defined as the complement of the
ratio of the unique pages over the total number of product webpages, as proposed in
Valsamidis et al. (2010a).
Enrichment = 1 − (Unique Pages/Total Pages), (1)
where Unique Pages ≤ Total Pages.
Enrichment values are in the range [0, 1). When users follow unique paths in a
product the value is 0, while in a product with minimal unique pages it is close to 1.
Since it offers a measure of how many unique pages were viewed by the users, it shows how
much of the information included in each product is handed over to the end user, from
which it can be inferred that the product contains rich information material.
Disappointment is a metric which combines sessions and pages viewed by users. It
measures the disappointment of the users in the product, in the sense that a user who
views only a few pages of the product then leaves the product.
Disappointment = Sessions/Total Pages. (2)
In other words, the Disappointment metric reflects how quickly the users discontinue
viewing pages of the products. Disappointment values are in the range (0, 1]. Owing to
the negative nature of the Disappointment metric, it was replaced by another metric
with a positive connotation, Interest. The Interest metric is defined as the
complement of the Disappointment.
Interest = 1 − Disappointment. (3)
Both disappointment and interest metrics were proposed in Valsamidis et al. (2010b).
Homogeneity is another metric, which is defined as the ratio of unique visited
product pages to the number of sessions that visited the product (Valsamidis et al.,
2012b).
Homogeneity = Unique pages/Total Sessions, (4)
where Total Sessions per product ≥ Unique product pages.
The Homogeneity metric takes values in the range [0, 1), where 0 means that no user
followed a unique path and 1 that every user followed unique paths. It is a product
quality index and characterises the percentage of product information discovered by each
user participating in a product. The aforementioned metrics contribute to the assessment
of product usage and, once calculated, allow us to rank products.
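As an illustration of equations (1) to (4), the following Python sketch computes the four indices and four metrics per product from pre-processed (productID, sessionID, URL) rows. The function names are hypothetical, and the sketch assumes that UPPS is the number of distinct (session, page) pairs of a product, that is, unique pages per session summed over all sessions; the paper computes the indices with a Perl program that is not reproduced here.

```python
from collections import defaultdict


def metrics_from_counts(sessions, pages, unique_pages, upps=None):
    """Apply equations (1)-(4) to the raw indices of one product."""
    disappointment = sessions / pages                 # eq. (2)
    return {
        "sessions": sessions,
        "pages": pages,
        "unique_pages": unique_pages,
        "upps": upps,
        "enrichment": 1 - unique_pages / pages,       # eq. (1)
        "disappointment": disappointment,
        "interest": 1 - disappointment,               # eq. (3)
        "homogeneity": unique_pages / sessions,       # eq. (4)
    }


def product_metrics(rows):
    """rows: iterable of (product_id, session_id, url) page views."""
    sessions = defaultdict(set)       # distinct sessions per product
    unique_pages = defaultdict(set)   # distinct pages per product
    session_pages = defaultdict(set)  # distinct (session, page) pairs
    pages = defaultdict(int)          # all page views per product
    for pid, sid, url in rows:
        sessions[pid].add(sid)
        unique_pages[pid].add(url)
        session_pages[pid].add((sid, url))
        pages[pid] += 1
    return {
        pid: metrics_from_counts(len(sessions[pid]), pages[pid],
                                 len(unique_pages[pid]),
                                 upps=len(session_pages[pid]))
        for pid in pages
    }
```

For example, `metrics_from_counts(152, 230, 5)` uses the Sessions, Pages and Unique pages counts reported for product PID132 in Table 2 and gives an Enrichment of about 0.978 and a Homogeneity of about 0.033, in agreement with that table.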

3.4 Hierarchical classification


Hierarchical ascendant classification (HAC) is a data analysis method that
maps input data into defined subsumptive output categories. The classification occurs
first at a low level, on highly specific pieces of input data. The classifications of the
individual pieces of data are then combined systematically and classified at a higher
level, iteratively, until one output is produced. This final output is the overall
classification of the data. Depending on application-specific details, this output can be
one of a set of pre-defined outputs, one of a set of online learned outputs, or even a novel
classification that has not been seen before. Generally, such systems rely on relatively
simple individual units of the hierarchy that have only one universal function to do the
classification (Papadimitriou, 2007).
The data of our research are quantitative variables. We categorised the quantitative
variables in order to convert them to qualitative variables, dividing each of them into
three value classes defined by quartiles. For reasons of equivalence between variables, we
use the same three categories for each variable. The first category contains values
less than the first quartile (25% of all data), the second category contains values between
the first quartile and the third quartile (50% of all data) and the third category contains
values greater than the third quartile (25% of all data). So, each variable takes one of
three values, 1, 2 or 3 (small, medium and large, respectively), there are no differences
in variable measurements, and each variable has the same significance in the data table.
The four main variables (Sessions, Pages, Unique pages, UPPS) were used as active
variables for the analysis. The initial data matrix had 40 rows (products) and 4 columns
with the initial quantitative variables. The
new data table has 40 rows and 12 columns (4 variables × 3 classes).
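The quartile coding and the resulting indicator table can be sketched in Python as below. The function names are hypothetical, and the quartile estimator is the default (exclusive) method of `statistics.quantiles`; the paper does not state which estimator was used, so cut points may differ slightly from the authors' ones. The clustering step itself (chi-square metric, Ward criterion) is not covered by this sketch.

```python
import statistics


def quartile_codes(values):
    """Code each value as 1 (below Q1), 2 (Q1..Q3) or 3 (above Q3)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return [1 if v < q1 else 3 if v > q3 else 2 for v in values]


def indicator_row(codes):
    """Expand one product's codes into 0/1 columns, 3 per variable."""
    row = []
    for code in codes:
        row.extend(1 if code == k else 0 for k in (1, 2, 3))
    return row


def indicator_table(columns):
    """columns: dict mapping variable name -> list of values, one value
    per product. Returns one indicator row per product."""
    coded = [quartile_codes(vals) for vals in columns.values()]
    return [indicator_row(per_product) for per_product in zip(*coded)]
```

With four variables, each row has 12 columns and contains exactly four ones, which matches the 40 × 12 layout described above when 40 products are supplied.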

3.5 PCA
The algorithm we propose is called PCA. It first tries to classify e-business system
products according to whether their product information material is poor or rich.
Afterwards, for the e-business system products with adequate information material, it
tries to determine how often product information is added or updated by administrators
(or users based on homogeneity classification) or followed by users (the updated
information as it is discovered by users). Finally, using the UPPS metric, it tries to
identify whether updates of product information can increase the customers’ interest in
the specific product. The PCA discovery schema is depicted in Figure 2.
Accordingly, the proposed algorithm is based on Enrichment, Homogeneity
and UPPS and consists of the corresponding stages.
At the first stage of the algorithm, the Enrichment metric is used in order to
identify products with poor or rich information content (poor corresponds to a small
Enrichment value and rich to a high Enrichment value). We place in an ordered table the N
products with the highest Enrichment values, where N ≤ total number of e-business system
products.
At the second stage, the algorithm classifies the previous set of N products using the
values of Enrichment and Homogeneity. The classification of e-business system
products is performed using four clusters, as shown in Figure 2. The higher the
Homogeneity value, the more frequent the product updates or the more dynamic the
product content, depending on the Enrichment value. The lower the Homogeneity value, the
more static the content or the poorer the content updates. The
classification of the products depends on the average Enrichment value of the N
e-business system products and on the average Homogeneity value of the high and low
Enrichment clusters, respectively.

Figure 2 PCA discovery schema (see online version for colours)

The aim of the third stage of the algorithm is to identify whether the content can be
characterised as rich or poor, and whether it is static, frequently updated or dynamic.
To do this, we order each cluster’s products based on the value of the UPPS.
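The three stages can be sketched in Python as follows. This is one possible reading of the description above rather than the authors' implementation: the cluster labels, the function name `classify_products`, the dictionary keys and the handling of values exactly equal to an average (assigned to the "high" side) are all assumptions.

```python
from statistics import mean


def classify_products(products):
    """products: list of dicts with keys 'id', 'enrichment',
    'homogeneity' and 'upps'. Returns four clusters of product ids."""
    # Stage 1: order products by Enrichment, highest first.
    ranked = sorted(products, key=lambda p: p["enrichment"], reverse=True)

    # Stage 2a: split on the average Enrichment of the N products.
    avg_e = mean(p["enrichment"] for p in ranked)
    by_enrichment = {
        "high_enrichment": [p for p in ranked if p["enrichment"] >= avg_e],
        "low_enrichment": [p for p in ranked if p["enrichment"] < avg_e],
    }

    clusters = {}
    for e_label, members in by_enrichment.items():
        # Stage 2b: split each group on its own average Homogeneity.
        avg_h = mean(p["homogeneity"] for p in members) if members else 0.0
        high = [p for p in members if p["homogeneity"] >= avg_h]
        low = [p for p in members if p["homogeneity"] < avg_h]
        for h_label, subset in (("high_homogeneity", high),
                                ("low_homogeneity", low)):
            # Stage 3: order each cluster by UPPS, highest first.
            subset.sort(key=lambda p: p["upps"], reverse=True)
            clusters[(e_label, h_label)] = [p["id"] for p in subset]
    return clusters
```

Because each Enrichment group is split on its own average Homogeneity, a product with a given Homogeneity value can fall on different sides of the split depending on the group it belongs to.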

4 Results

4.1 Study population and context


After the application of the described method at the first two stages, the data are
organised as shown in Appendix 1. There are data for 40 products but because of the lack
of space there are results only for the 40 products. The dataset was collected from the
e-shop with URL http://www.tapanda.gr/.

4.2 Data pre-processing and calculation of the metrics and rates


The data are in ASCII form and are obtained from the Apache server log file. As
described in the second stage of the methodology, the produced log file is filtered and
pre-processed in order to include the following fields: ProductID, SessionID and page
uniform resource locator (URL). In the third step, the indices are computed, and the
metrics and the rates are calculated.

4.3 Hierarchical clustering results


We apply the HAC to detect the characteristics of each product group. We analyse the
data table using hierarchical cluster analysis with the chi-square metric, since the
variables are categorical (Papadimitriou, 2007). We use the Chic analysis software
(Markos et al., 2010) with the Ward criterion, and the results are shown in the tree in
Figure 3.

Figure 3 Dendrogram of classification

As can be seen in Figure 3, we spot three groups of products. First, we spot the group
(no. 73) which is characterised by large values of Sessions, Pages and UPPS.
We can characterise this cluster as ‘large frequency Pages’.
Next, we spot the group (no. 76) of products with medium and large values of Unique
pages. We can characterise this cluster as ‘Rich Pages’.
Furthermore, we spot the group (no. 77) of products with low values of Sessions, Pages
and UPPS. This group is separated into three subgroups. Group no. 66 is characterised by
low values of Unique pages. Group no. 72 is characterised by low values of Pages and
medium values of Unique pages. Group no. 70 is characterised by medium values of Pages
and large values of Unique pages.

4.4 Application of PCA


The usefulness of the proposed algorithm as a data mining tool should be evaluated. For
this evaluation, we relied on a dataset from a real e-business environment, namely an
e-shop. The data are in ASCII form and are obtained from the
Apache server log file.
After the pre-processing stage, the PCA algorithm was applied. We initially
ordered the 40 products according to the Enrichment metric. To test our algorithm, we
picked the best and worst e-business system products from the list of 40 products; they
are shown in Table 2. That is, the best and worst cases from the point of view of product usage.

Table 2 Processed data for 12 products with average enrichment value of 0.899

Product ID    Sessions    Pages    Unique pages    UPPS    Homogeneity    Enrichment
PID132        152         230      5               184     0.033          0.978
PID35         87          338      9               179     0.103          0.973
PID125        93          164      6               134     0.065          0.963
PID129        75          209      8               131     0.107          0.962
PID105        91          297      12              216     0.132          0.960
PID41         98          185      8               129     0.082          0.957
PID36         72          217      10              134     0.139          0.954
PID17         53          206      21              89      0.396          0.899
PID66         56          144      16              107     0.286          0.889
PID8          45          135      18              82      0.400          0.867
PID122        33          71       21              45      0.636          0.704
PID112        30          62       20              46      0.667          0.677

On the basis of the previous order by Enrichment of 12 e-business system products
(Table 2), we apply the PCA using an average Enrichment value of 0.89 and average
homogeneity value for the high enrichment cluster of 0.09 and for the low enrichment
cluster 0.45. The PCA classification produced four clusters, which are shown in Table 3.

Table 3 Clustering of the 12 products based on PCA

Product ID    Sessions    Pages    Unique pages    UPPS    Homogeneity    Enrichment

Enrichment class: High. Homogeneity cluster: Dynamic content or frequently updated, cluster I
PID105        91          297      12              216     0.132          0.960
PID35         87          338      9               179     0.103          0.973
PID36         72          217      10              134     0.139          0.954
PID129        75          209      8               131     0.107          0.962

Enrichment class: High. Homogeneity cluster: Static content with frequent updates, cluster II
PID132        152         230      5               184     0.033          0.978
PID125        93          164      6               134     0.065          0.963
PID41         98          185      8               129     0.082          0.957

Enrichment class: Low. Homogeneity cluster: Dynamic content with less updates, cluster III
PID112        30          62       20              46      0.667          0.677
PID122        33          71       21              45      0.636          0.704

Enrichment class: Low. Homogeneity cluster: Static content, cluster IV
PID66         56          144      16              107     0.286          0.889
PID17         53          206      21              89      0.396          0.898
PID8          45          135      18              82      0.400          0.867
As shown in Table 3, for each one of the four classes the e-business system products are
ordered based on the UPPS metric value. So, the products PID105 and PID36 are the
representatives of high and low UPPS values for cluster I, PID132 and PID41 for cluster
II, PID112 and PID122 for cluster III and PID66 and PID8 for cluster IV, respectively.
In Table 4, we present these products and PCA evaluation feedback for each one of
those products.
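
The two-level split behind Table 3 can be sketched as follows. This is only an illustration under stated assumptions: the cut-offs are recomputed here as plain arithmetic means of the Table 3 rows (the reported values 0.89, 0.09 and 0.45 are rounded), a product counts as ‘High’ or ‘dynamic’ when its value is strictly above the corresponding mean, and the names `Product`, `TABLE3` and `classify` are hypothetical.

```python
from statistics import mean
from typing import NamedTuple


class Product(NamedTuple):
    pid: str
    upps: int
    homogeneity: float
    enrichment: float


# Values copied from Table 3 (Product ID, UPPS, Homogeneity, Enrichment).
TABLE3 = [
    Product("PID105", 216, 0.132, 0.960), Product("PID35", 179, 0.103, 0.973),
    Product("PID36", 134, 0.139, 0.954), Product("PID129", 131, 0.107, 0.962),
    Product("PID132", 184, 0.033, 0.978), Product("PID125", 134, 0.065, 0.963),
    Product("PID41", 129, 0.082, 0.957), Product("PID112", 46, 0.667, 0.677),
    Product("PID122", 45, 0.636, 0.704), Product("PID66", 107, 0.286, 0.889),
    Product("PID17", 89, 0.396, 0.898), Product("PID8", 82, 0.400, 0.867),
]


def classify(products):
    """Split by mean Enrichment, then by mean Homogeneity within each class,
    and order each resulting cluster by UPPS (highest first)."""
    enrichment_cut = mean(p.enrichment for p in products)
    high = [p for p in products if p.enrichment > enrichment_cut]
    low = [p for p in products if p.enrichment <= enrichment_cut]

    clusters = {"I": [], "II": [], "III": [], "IV": []}
    for group, dynamic, static in ((high, "I", "II"), (low, "III", "IV")):
        if not group:  # nothing to split in an empty class
            continue
        homogeneity_cut = mean(p.homogeneity for p in group)
        for p in group:
            # Assumption: above-average Homogeneity marks dynamic content.
            key = dynamic if p.homogeneity > homogeneity_cut else static
            clusters[key].append(p)

    for members in clusters.values():
        members.sort(key=lambda p: p.upps, reverse=True)
    return {name: [p.pid for p in members] for name, members in clusters.items()}


for name, pids in classify(TABLE3).items():
    print(name, pids)
```

With the Table 3 values, this sketch yields the same four groups and the same within-cluster order as Table 3.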

Table 4 Clustering of the 12 products based on PCA

Cluster ID    Product ID    PCA evaluation
I             PID105        High activity e-Business system with updates followed by users
I             PID36         High activity e-Business system with frequent educator updates that
                            are not followed by users
II            PID132        High activity e-Business system with static content, frequently
                            updated and followed by users
II            PID41         High activity e-Business system with static content, frequently
                            updated but poorly followed by users
III           PID112        Garbage product or forum with updates; need for further evaluation
III           PID122        Abandoned product of dynamic content left to open view
IV            PID66         Product of poor static content that still contains information followed
                            by users (or forced to follow)
IV            PID8          Abandoned product of poor static content occasionally followed by
                            curious users

5 Discussion

The indication that many pages within useful paths contribute to increased usage is fairly
obvious: the more and better the content on a site, the more a user might visit it.
Administrators should therefore add useful and helpful pages to a site. However, the
case is not that simple. If a site is essentially blank but customers are required to
visit it every day and contribute a comment, then usage will necessarily be high. On the
other hand, if a very elaborate website has rich content that is not required reading,
limited usage of the site would be expected. These issues are only mentioned here and
have to be addressed adequately in future work.
The application of the two algorithms showed that there is a relationship between
e-business system usage and the corresponding product purchases. A higher UPPS
score leads to better product sales and therefore to an improved business outcome.
The fact that only 40 products in one e-business system were investigated is a
limitation of the study, especially for data analysis techniques that demand large
datasets. However, this was unavoidable, since the e-business system of the case study had
this number of active online products. In the future, we intend to apply the same approach
to other e-business systems.
We also plan to further automate the whole procedure; that is, we are developing a
plug-in tool to automate the data pre-processing and measure calculation steps. This tool
will run periodically (each week) and will report the results to the administrators. We
intend the final tool to offer insights at two levels:
• online, with total statistical information such as number of visits per product (pages
and sessions), customer trends and activities at their visits, as well as detailed
information per customer (customer duration per product and activity, customer
preferences and activities for all products)
• offline, with the use of data mining techniques such as pre-process, visualisation,
clustering, classification, regression and association, discovering hidden data
patterns.

6 Conclusions

The proposed iterative method uses existing tools and techniques in a novel way to
perform e-business systems usage analysis. The metrics enrichment, homogeneity,
disappointment and interest are used. It also incorporates clustering and classification
algorithms.
It has the following advantages:
• It is independent of any specific e-business system, since it is based on the Apache log
files and not on the e-business system itself. Thus, it can be easily implemented for
every e-business system.
• It uses indices and metrics in order to facilitate the evaluation of each product.
• It gives a company useful information for determining which parts of its
website to improve.
It is worth mentioning that this approach may be applied after a long period of data
tracking. For example, the enrichment metric measures the number of visited pages in a
session divided by the number of available pages of a product. The higher
the number of pages in the product, the lower the fraction a user can see in one session.
On the other hand, the administrators may update the product material with more
information, so the ratio is not fully diminished.
Finally, it should be mentioned that the proposed approach may also be applied to
other web applications such as e-government, e-learning, e-banking, blogs, social
networks, etc. For example, in e-government applications, enrichment shows how much
information is handed over to the end user and homogeneity characterises the percentage
of information independently discovered by each user. Interest indicates whether users
are pleased with the material and its usefulness on a website. Furthermore, UPPS gives an
objective view of website usage.

References
Adelman, S., Moss, L. and Barbusinski, L. (2002) ‘I found several definitions of BI’, DM Review,
Retrieved 5 December, 2013, from http://www.dmreview.com/article_sub.cfm?articleId=5700
Anantatmula, V. and Kanungo, S. (2005) ‘Establishing and structuring criteria for measuring
knowledge management efforts’, Paper Presented at the 38th Hawaii International Conference
on System Sciences (HICSS-38), Big Island, HI.
Arora, R. (2002) ‘Implementing KM – a balanced scorecard approach’, Journal of Knowledge
Management, Vol. 6, No. 3, pp.240–249.
Bartlett, C. (1998) McKinsey & Company: Managing Knowledge and Learning, Case 9-396-357,
Harvard Business School Press, Cambridge, MA.
Bollinger, A.S. and Smith, R.D. (2001) ‘Managing organizational knowledge as a strategic asset’,
Journal of Knowledge Management, Vol. 5, No. 1, pp.8–18.
Bucher, T., Gericke, A. and Sigg, S. (2009) ‘Process-centric business intelligence’, Business
Process Management Journal, Vol. 15, No. 3, pp.408–429.
Davenport, T.H. and Prusak, L. (1998) Working Knowledge: Managing What your Organization
Knows, Harvard Business School Press, Boston, MA.
Desouza, K.C. and Awazu, Y. (2006) ‘Knowledge management at SMEs: Five peculiarities’,
Journal of Knowledge Management, Vol. 10, No. 1, pp.32–43.
Elragal, A. and Gendy, N.E. (2013) ‘Trajectory data mining: Integrating semantics’, Journal of
Enterprise Information Management, Vol. 26, No. 5, pp.516–535.
Gangadharan, G.R. and Swamy, S.N. (2004) ‘Business intelligence systems: design and
implementation strategies’, Paper presented at the 26th International Conference on
Information Technology Interfaces, Cavtat, Croatia.
Green, A. (2006) ‘The starting block: enterprise (business) intelligence – evolving towards
knowledge valuation’, VINE: The Journal of Information and Knowledge Management
Systems, Vol. 36, No. 3, pp.267–277.
Green, A. (2007) ‘Business information – a natural path to business intelligence: knowing what to
capture’, VINE: The Journal of Information and Knowledge Management Systems, Vol. 37,
No. 1, pp.18–23.
Gummesson, E. (2000) Qualitative Methods in Management Research, Sage, Thousand Oaks, CA.
Gupta, J. and Sharma, S. (2004) Creating Knowledge Based Organization, Idea Group, Boston,
MA.
Heisig, P. (2009) ‘Harmonisation of knowledge management: comparing 160 KM frameworks
around the globe’, Journal of Knowledge Management, Vol. 13, No. 4, pp.4–31.
Herschel, R.T. and Jones, N.E. (2005) ‘Knowledge management and business intelligence: the
importance of integration’, Journal of Knowledge Management, Vol. 9, No. 4, pp.45–55.
Kasemsap, K. (2015) ‘The role of data mining for business intelligence in knowledge
management’, in Azevedo, A. and Santos, M. (Eds.): Integration of Data Mining in Business
Intelligence Systems, IGI Global, Hershey, PA, pp.12–33.
Kazanidis, I., Valsamidis, S., Theodosiou, T. and Kontogiannis, S. (2009) ‘Proposed framework for
data mining in e-learning: the case of open e-Class’, in Weghorn, H. and Isaias, P. (Eds):
Proceedings of Applied Computing, IADIS Press, Rome, Italy, pp.254–258.
Labedz, C., Cavaleri, S. and Berry, G. (2011) ‘Interactive knowledge management: putting
pragmatic policy planning in place’, Journal of Knowledge Management, Vol. 15, No. 4,
pp.551–567.
Lee, K.C., Lee, S. and Kang, I.W. (2005) ‘KMPI: measuring knowledge management
performance’, Information and Management, Vol. 42, No. 8, pp.469–482.
Luhn, H.P. (1958) ‘A business intelligence system’, IBM Journal, Vol. 2, No. 4, p.314.
Malhotra, Y. (2004) ‘Why knowledge management systems fail: enablers and constraints of
knowledge management in human enterprise’, in Koenig, E. and Srikantaiah, T.K. (Eds.):
Knowledge Management: Lessons Learned, Information Today, Medford, NJ, pp.87–112.
Markos, A., Menexes, G. and Papadimitriou, I. (2010) ‘The CHIC analysis software v1.0’, in
Locarek-Junge, H. and Weihs, C. (Eds.): Classification as a Tool for Research, Proceedings
of the 11th IFCS Conference, Springer, Berlin, pp.409–416.
Marren, P. (2004) ‘The father of business intelligence’, Journal of Business Strategy, Vol. 25,
No. 6, pp.5–7.
Marwick, A.D. (2001) ‘Knowledge management technology’, IBM Systems Journal, Vol. 40,
No. 4, pp.814–829.
Massey, A.P. and Montoya-Weiss, M. (2002) ‘A performance environment perspective of
knowledge management’, Paper Presented at the 36th Hawaii International Conference on
System Sciences (HICSS-36), Big Island, HI.
McGurk, J. and Baron, A. (2012) ‘Knowledge management: time to focus on purpose and
motivation’, Strategic HR Review, Vol. 11, No. 6, pp.316–321.
McKinlay, A. (2005) ‘Knowledge management’, in Ackroyd, S., Batt, R. and Thompson, P. (Eds.):
The Oxford Handbook of Work and Organization, Oxford University Press, Oxford, UK,
pp.242–262.
Mellor, R.B. (2001) Knowledge Management and Information Systems: Strategies for Growing
Organizations, Palgrave Macmillan, New York, NY.
Moss, T.L. and Atre, S. (2003) Business Intelligence Roadmap: The Complete Project Lifecycle for
Decision Support Applications, Addison Wesley Longman, Reading, MA.
Negash, S. (2004) ‘Business intelligence’, Communications of the Association for Information
Systems, Vol. 13, pp.177–195.
Nonaka, I. (1994) ‘A dynamic theory of organizational knowledge creation’, Organization Science,
Vol. 5, No. 1, pp.14–27.
Nonaka, I. and Takeuchi, H. (1995) The Knowledge-Creating Company, Oxford University Press,
Inc., New York.
Offsey, S. (1997) ‘Knowledge management: linking people to knowledge for bottom line results’,
Journal of Knowledge Management, Vol. 1, No. 2, pp.113–122.
Pandey, S.C. and Dutta, A. (2013) ‘Role of knowledge infrastructure capabilities in knowledge
management’, Journal of Knowledge Management, Vol. 17, No. 3, pp.435–453.
Papadimitriou, G. (2007) Data Analysis, Ed. Tipothito, Athens.
Power, D.J. (2007) A Brief History of Decision Support Systems. Retrieved 28 May, 2009 from
DSSResources.com: http://DSSResources.com/history/dsshistory.html.
Ranjan, J. (2008) ‘Business justification with business intelligence’, VINE: The Journal of
Information and Knowledge Management Systems, Vol. 38, No. 4, pp.461–475.
Rao, M. (2002) Knowledge Management Tools and Techniques: Practitioners and Experts
Evaluate KM Solutions, Elsevier, Amsterdam, The Netherlands.
Rastogi, P. (2000) ‘Knowledge management and intellectual capital: the new virtuous reality of
competitiveness’, Human Systems Management, Vol. 19, No. 1, pp.39–49.
Sanchez, R. (1996) Strategic Learning and Knowledge Management, John Wiley & Sons,
Chichester, UK.
Sensiper, S. (1997) AMS Knowledge Centers, Case N9-697-06, Harvard Business School Press,
Boston, MA.
Serenko, A. and Bontis, N. (2004) ‘Meta-review of knowledge management and intellectual capital
literature: citation impact and research productivity rankings’, Knowledge and Process
Management, Vol. 11, No. 3, pp.185–198.
Sharma, R.S. and Djiaw, V. (2011) ‘Realising the strategic impact of business intelligence tools’,
VINE: The Journal of Information and Knowledge Management Systems, Vol. 41, No. 2,
pp.113–131.
Turban, E., Aronson, J.E., Liang, T.P. and Sharda, R. (2007) Decision Support and Business
Intelligence Systems, 8th ed., Pearson Prentice Hall, New York, USA.
Valsamidis, S., Kontogiannis, S., Kazanidis, I. and Karakos, A. (2010a) ‘Homogeneity and
enrichment, two metrics for web applications assessment’, Proceedings of 14th Panhellenic
Conference on Informatics (PCI2010), Tripoli, Greece.
Valsamidis, S., Kazanidis, I., Kontogiannis, S. and Karakos, A. (2010b) ‘Automated suggestions
and course ranking through web mining’, Proceedings of 10th IEEE International Conference
on Advanced Learning Technologies ICALT 2010, Sousse, Tunisia.
Valsamidis, S., Kontogiannis, S., Kazanidis, I. and Karakos, A. (2011) ‘E-Learning platform usage
analysis’, Interdisciplinary Journal of E-Learning and Learning Objects (IJELLO), Vol. 7,
pp.185–204.
Valsamidis, S., Kontogiannis, S., Kazanidis, I., Theodosiou, T. and Karakos, A. (2012a)
‘A clustering methodology of web log data for learning management systems’, Educational
Technology & Society, Vol. 15, No. 2, pp.154–167.
Valsamidis, S., Kontogiannis, S., Kazanidis, I. and Karakos, A. (2012b) ‘An approach for LMS
assessment’, International Journal of Technology Enhanced Learning IJTEL, Vol. 4, No. 3,
pp.265–283.
Vine, D. (2000) Internet Business Intelligence: How to Build a Big Company System on a
Small Company Budget, CyberAge Books, Medford, NJ.
Von Krogh, G., Roos, J. and Kleine, D. (1998) Knowing in Firms: Understanding, Managing and
Measuring Knowledge, Sage, London, UK.

Appendix 1: E-business data for 40 products


Product ID Sessions Pages Unique_pages UPPS Enrichment Homogeneity Interest
PID105 94 299 12 218 0.960 0.128 0.686
PID35 89 339 9 182 0.973 0.101 0.737
PID132 158 235 8 198 0.966 0.051 0.328
PID36 76 219 8 134 0.963 0.105 0.653
PID129 78 211 7 132 0.967 0.090 0.630
PID125 96 166 9 136 0.946 0.094 0.422
PID41 101 188 9 132 0.952 0.089 0.463
PID66 59 148 9 109 0.939 0.153 0.601
PID17 55 221 12 92 0.946 0.218 0.751
PID111 35 144 9 81 0.938 0.257 0.757
PID8 45 137 9 84 0.934 0.200 0.672
PID11 52 109 8 84 0.927 0.154 0.523
PID98 32 118 10 63 0.915 0.313 0.729
PID99 38 119 12 65 0.899 0.316 0.681
PID62 24 95 11 55 0.884 0.458 0.747
PID61 33 76 11 65 0.855 0.333 0.566
PID44 51 84 11 77 0.869 0.216 0.393
PID26 51 94 9 73 0.904 0.176 0.457
PID14 47 124 8 62 0.935 0.170 0.621
PID34 38 115 6 59 0.948 0.158 0.670
PID133 26 82 8 55 0.902 0.308 0.683
PID115 19 75 9 43 0.880 0.474 0.747
PID9 27 106 12 43 0.887 0.444 0.745
PID112 30 66 7 48 0.894 0.233 0.545
PID120 39 82 5 48 0.939 0.128 0.524
PID122 34 73 8 46 0.890 0.235 0.534
PID64 22 49 9 42 0.816 0.409 0.551
PID80 14 40 8 35 0.800 0.571 0.650
PID50 23 48 8 39 0.833 0.348 0.521
PID60 23 44 6 41 0.864 0.261 0.477
PID10 17 63 9 29 0.857 0.529 0.730
PID114 28 44 5 36 0.886 0.179 0.364
PID21 12 27 9 26 0.667 0.750 0.556
PID96 21 32 6 33 0.813 0.286 0.344
PID23 30 39 6 33 0.846 0.200 0.231
PID130 13 32 6 22 0.813 0.462 0.594
PID15 12 26 8 21 0.692 0.667 0.538
PID134 25 29 5 29 0.828 0.200 0.138
PID49 15 24 5 22 0.792 0.333 0.375
PID67 18 24 5 25 0.792 0.278 0.250
