
ISBN: 978-81-930654-7-5

Proceedings of ICEEM-2016

SEMANTIC WEB-PAGE RECOMMENDER SYSTEM


P. Vinothini 1, T. Vetriselvi 2
1,2 Department of CSE, K. Ramakrishna College of Technology, Trichy.
E-mail: cse.vinodhini@gmail.com, vetriselvi09@gmail.com

Abstract - With the explosive growth of the Internet, a large number of users search online to satisfy their information needs.
Web Usage Mining plays an important role in discovering knowledge that represents online users' behaviour from the available
web log data. Satisfying online users' needs with a traditional web usage mining system is challenging because such a system is built
solely from the web usage data of online users. Web-page recommendation is used to capture the intuition of online users effectively.
To make a Web-page recommendation system capture users' intuition accurately, we propose two novel knowledge
representation models that provide semantic enhancement to the Web-page recommender system. The first model is a semantic
network of a website, which represents domain knowledge by domain terms, Web-pages and the relations between them. The Web usage
model generates frequent web access patterns by applying sequential pattern mining algorithms to the usage data from the web
server. The second model, the Conceptual Prediction Model (CPM), integrates the semantic knowledge with the web
usage model, resulting in a weighted semantic network of semantic web usage knowledge. CPM constructs the weighted semantic network
with the Frequently Viewed Terms as nodes, where each weight represents the probability of transition between adjacent terms, estimated
using Markov models.
Index terms: Web usage mining, semantic knowledge, conceptual prediction model, semantic network, domain terms.
By using this knowledge, when a user comes online the
next time, they predict the next Web-page(s) that the user is most
likely to visit, given the current Web-page and the previously
visited k Web-pages.
The performance of these approaches depends on
the size of the training usage dataset: the bigger the
training dataset, the higher the prediction accuracy.
The main drawback of these Web-page
recommendation approaches is that they are based solely on the Web
access sequences learnt from the Web usage data.
Therefore, if a user is visiting a new Web-page that is not in
the training usage data, these approaches cannot offer any
recommendations to this user. This problem is referred to as
the new-item problem.
Some studies show that semantic-enhanced approaches overcome this new-item problem [2],[3] by using a domain ontology.
Integrating domain knowledge with Web usage
knowledge improves the prediction accuracy of
recommender systems that use ontology-based Web mining
techniques [4][6]. Web usage mining enriched with
semantic information showed higher performance than
classic Web usage mining algorithms [5]-[6]. However,
the main issue in these approaches is the difficulty of
representing and acquiring the semantic domain
knowledge. A lot of research is ongoing on domain
ontologies.
A domain ontology is mostly used to
represent the semantics of a website. It can be
constructed manually by experts or automatically by
learning models, such as a Bayesian network or a
collocation map, for many different applications. Given
the very large size of Web data in today's websites,
building an ontology manually for a website is a challenging
task: it is time consuming and the result is less reusable.
According to Stumme, Hotho and Berendt, it is

1. INTRODUCTION:
Web mining is a major area of data mining
applications that discovers patterns from web data
in order to better understand the needs of web-based
applications. Web mining can be divided into three
types: web usage mining, web content
mining and web structure mining. Web Usage Mining
(WUM) is the process of discovering or extracting
patterns from users' access data on the web. The usage
data is collected from one or more Web
servers. Web usage mining is very useful in
understanding users' interests and their network
behaviours. A typical application of WUM is
the recommender system.
The main goal of a Web-page recommender
system is to effectively forecast the Web-page(s) that will
be visited next while a user navigates through the website.
A Web-page recommender system captures the
intuition of online users from their browsing patterns and
recommends pages to users in the form of links to
stories, books, or pages of interest. There are many
difficulties in developing an effective Web-page
recommender system, such as how to effectively learn
users' online behaviour and Web-page navigation
patterns from the available historical usage data, how to
discover this knowledge, and how to make online
recommendations based on the discovered
knowledge.
In order to efficiently represent Web access
sequences (WAS) from the Web usage data, some studies
have shown that approaches based on tree structures and
probabilistic models can be used [1]. These approaches
use the historical web usage data to construct a user
profile, which consists of links between the Web-pages that
the user is most interested in, based only on the usage data.

www.iirdem.org


IIRDEM 2016


impossible to manually discover the meaning of all Web-pages and their usage for a large-scale website [10].
Automatic construction of ontologies saves time and
discovers all possible concepts within a website and links
between them, and they are reusable. However, the
drawback of this automatic approach is the need to
design and implement the learning models which can
only be done by professionals at the beginning.
This paper presents a novel method to provide
better Web-page recommendation by integrating Web
usage and domain knowledge. Two new knowledge
representation models and a set of Web-page
recommendation strategies are proposed in this paper.
The first model is a semantic network that represents
domain knowledge, which can be constructed
automatically. As it is fully automated, it can be easily
integrated with the Web-page recommendation process.
The second model is a conceptual prediction model,
which is a navigation network of domain terms based on
the frequently viewed Web-pages. This represents the
integrated Web usage and domain knowledge which
supports Web-page prediction and it can also be
constructed automatically. The proposed
recommendation strategies predict the next pages with
probabilities for a given Web user based on his or her
current Web-page navigation state through these two
models. This new method has automated the knowledge
base construction and alleviated the new-item problem.
This method yields better performance compared with
the existing Web usage based Web-page recommendation
systems.
This paper is structured as follows: Section 2
discusses related work; Section 3 outlines the
architecture and the implementation of web
usage mining. Section 4 presents the first model, i.e. a
semantic network of domain terms. Section 5 presents the
second model, i.e. a conceptual prediction model that
integrates the semantic knowledge with the web-page
recommendation. For each of the models presented in
Sections 4 and 5, the corresponding queries that are used to
retrieve semantic information from the knowledge
models are presented. Section 6 presents a set of
recommendation strategies based on the queries to make
semantic-enhanced Web-page recommendations.

similar information from their usage data collected from
the web server. Then the online phase predicts which
cluster the current user may fall into from their active user
session and suggests the list of pages related to
the current session. This approach has several drawbacks,
mainly scalability and accuracy. SUGGEST 1.0 [21] was
proposed as a two-tier system composed of an offline
module, which analyses the Web server's access log file,
and an online classification module, which carries out the
second stage. Its main drawback was the asynchronous
cooperation between the two modules. In the next
version, SUGGEST 2.0, the two modules were merged to
perform the same operations in a completely online
fashion. This raises the problem of estimating the
update frequency of the knowledge base. Potential
limitations of SUGGEST 2.0 are: a) the memory
required to store Web server pages is quadratic in the
number of pages; b) it does not allow managing
websites made up of dynamically generated pages.
Bamshad Mobasher et al. [19] presented WebPersonalizer,
a system that provides dynamic recommendations as a
list of hypertext links to users. The method is based on
anonymous usage data combined with the Web site
structure. F. Masseglia et al. [20] proposed an integrated
system, WebTool, which is based on the extraction of
sequential patterns and association rules to dynamically customize
the hypertext organization. The current user's behaviour
is compared with previously induced sequential patterns
and navigational hints are provided to the user. In
traditional web recommendation systems [2], sequential
mining is used to discover the web access
patterns, in particular with tree structures and Markov models.
WAP-tree is a tree structure that holds
access sequences in a very compact form to enable access
pattern mining. PLWAP-Mine, proposed in [7],
uses the PLWAP tree structure to incrementally
update web sequential access patterns efficiently without
scanning the whole database, even when previously small
items become frequent. The position code features of the
PLWAP tree are used to mine these trees efficiently and
extract the current frequent patterns when the database is
updated. FOL-Mine is an efficient sequential pattern
mining algorithm proposed in [8]. It is based on the
concept of the WAP-tree but uses a special linked structure to
hold access sequences for processing, and it is reported to
outperform the existing WAP-tree mining methods.
The FOL-list holds the first-occurrence information of items
during the mining of patterns in the intermediate projected
databases, which manages suffix building very efficiently. The node
structure suggested in [14] is modified to process the
weighted support of sequences. Based on the study in [9],
weighted sequential pattern mining performs better than non-weighted sequential pattern mining (e.g. FOL-Mine) by
2. LITERATURE SURVEY:
Research on Web-page recommender systems that
combine web usage mining with semantic
knowledge is very limited. The related work
can be classified into the following two approaches:

2.1 Traditional Usage Based Approaches


Analog is one of the first Web Usage Mining systems. It
consists of two components: offline and online. In the offline
phase, it constructs session clusters that exhibit


giving weights to the items in the Web Access Sequence
Database (WASD). That work uses a modified form of the
structure in [8], enhanced to hold the weight
information of each item. This method needs only one
database scan to generate the weighted list structure.

effort in defining Web pages and objects in terms of
semantic information by using ontologies. In [2], the first
part covers how the content and the structure of the site
can be leveraged to transform raw usage data into
semantically enhanced transactions, which are then used
for semantic Web usage mining and personalization. The
second part presents a framework for more
systematically integrating full-fledged domain ontologies
in the personalization process. The system proposed in [12]
is domain-independent, is implemented as a Web
service, and uses both explicit and implicit feedback-collection methods to obtain information on users'
interests. A domain-based method makes inferences about
users' interests, and a taxonomy-based similarity method
is used to refine the item-user matching algorithm,
improving recommendation prediction.

2.2 Semantic-Enhanced Approaches


Many studies report that Web-page
recommendation can be made more accurate by integrating
web usage knowledge with domain knowledge.
In [11]-[12], a domain ontology of the website is used to
improve the recommendation process. In [11], Liang Wei
and Song Lei used an ontology, which includes concepts and
significant terms extracted from documents, to represent
a website's domain knowledge. They generate online
recommendations by semantically matching and
searching for frequent pages discovered by the Web
usage mining process. This approach showed higher
precision rates, coverage rates and matching rates. In
[6],[13], ontology reasoning is used to make
recommendations: Web access sequences are converted into
sequences of ontology instances. In these studies, the
Web usage mining algorithms find the frequent
navigation paths in terms of ontology instances rather
than normal Web-page sequences. The SWUM
(Semantically enriched Web Usage Mining) method
proposed in [14] incorporates semantic data and site
structure into the WebPUM method, which is based solely on
usage data. WebPUM represents usage data by means of an
adjacency matrix and induces the navigation patterns
using a graph partitioning technique; these are then
enriched with the semantic data of the website. The
semantic metadata extracted takes into account both the
semantics in a page contents and the semantic
relationship in the Web pages. The semantic similarity is
represented in terms of a semantic similarity matrix that
gives the similarity score between every pair of Web
pages. Thus, the semantic similarity matrix is combined
with the adjacency matrix in order to derive the
semantically enriched weight matrix, and the resulting
navigation patterns are fed into recommendation engine.
The drawback is that the system is suitable only for
statically generated Web-pages. In [15],
frequent sequential patterns enriched with semantic
information, expressed in terms of ontology
instances instead of Web-page sequences, are used for
recommending subsequent pages to the user. The
discovered semantic-rich sequential association rules
form the core knowledge of the recommendation engine
of the proposed model. The vision of a Semantic Web has
recently drawn attention from both academic and
industrial circles. The Semantic Web is incorporated into
Web personalization to improve
the results of Web mining by exploiting the new semantic
structures [2]. As a consequence, there is an increasing


3. ARCHITECTURE OF WEB-PAGE RECOMMENDER SYSTEM:
The recommendation system is implemented in two
components: offline and online. The offline component
builds the knowledge base by analysing historical
data, such as the server access log file or web logs
captured from the server. The online component then uses
this knowledge to capture the user's intuition and
recommend page views whenever the user
next comes online. Data collection, data
pre-processing, pattern discovery and pattern analysis
are the steps of web usage mining in the offline
phase.

3.1 Data Collection:


Data collection is the first step in web usage
mining. Web usage data are collected from three main
sources: Web servers, proxy servers and client-side
requests. Cooley and Mobasher [17] reported that a large
amount of information resides only in server log files and that
it is difficult to get the data from proxy servers and from
client-side browsing, so we use the server log files as the
primary data source. There are several types of log files. An IIS web log
consists of 17 attributes, each of which represents data in the records.
A fragment of an IIS web log:

3.2 Data Pre-processing:


Generally, data cleaning, user identification,
session identification and path completion are the steps involved
in pre-processing.

3.2.1. Data Cleaning:

The data cleaning task removes log entries
that are irrelevant or redundant. Two kinds
of irrelevant data need to be removed:


i. Files having suffixes such as .jpeg, .gif, .css, .cgi,
etc., which can be found in the cs_uri_stem field of the
IIS log.
ii. Error requests, which can be found in the sc_status
field.
Once pre-processing is done, data from multiple sources are
transformed into an acceptable form, which serves as
input to the various mining processes.
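The two cleaning rules above can be sketched as a simple filter. This is a minimal illustration, not the authors' implementation: the `LogEntry` type and the `c_ip` field name are hypothetical, the suffix list contains only the examples named in the text, and treating any status code of 400 or above as an error request is an assumption.

```python
from dataclasses import dataclass

# Suffixes named in the text; a real deployment would likely extend this list.
IRRELEVANT_SUFFIXES = (".jpeg", ".gif", ".css", ".cgi")


@dataclass
class LogEntry:
    """Hypothetical, reduced view of one IIS log record."""
    c_ip: str          # client IP address (field name assumed)
    cs_uri_stem: str   # requested resource
    sc_status: int     # HTTP status code returned by the server


def is_relevant(entry: LogEntry) -> bool:
    # Rule i: drop requests for images, style sheets, scripts, etc.
    if entry.cs_uri_stem.lower().endswith(IRRELEVANT_SUFFIXES):
        return False
    # Rule ii: drop error requests (assumption: status >= 400 means error).
    return entry.sc_status < 400


def clean(entries):
    """Return only the log entries that survive both cleaning rules."""
    return [e for e in entries if is_relevant(e)]
```

Keeping the suffix list and the status cut-off as explicit constants makes both assumptions easy to adjust for a particular server configuration.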

55%.

3.2.2. User Identification:


The user identification process distinguishes
the different users in the web access log file. A referrer-based method is used for this process. It is a complex task
due to the presence of resident caches and proxy servers.
The following heuristics [18] are used to identify
users:
1) Each IP address represents one user;
2) If the IP address is the same for several log entries, but the
agent field shows a change in browser or OS, then the
IP address represents a different user;
3) If all the above fields are the same, then the referrer
information is considered. If a requested page is not
directly reachable by a link
from any of the pages already visited, then there is
another user with the same IP address.

Figure 1: Architecture of Web Usage Mining Integrated with Semantic Knowledge

3.2.3. Session Identification


The aim of session identification is
to find the different user sessions in the web access
log file by dividing
the page accesses of every user into separate sessions.
Sessions can be identified using a
timeout mechanism and maximal forward reference. In
[18], the following rules are used to identify user sessions:
1) If there is a new user, there is a new
session;
2) If the referrer page is null within one user session,
there is a new session;
3) If the time between page requests
exceeds a limit, the user is starting a new session.
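The three session rules can be sketched as follows. This is an illustrative sketch only: the input layout (tuples of user, timestamp, URL and referrer, already sorted by user and time) and the 30-minute limit are assumptions, since the text does not state the timeout value.

```python
from datetime import datetime, timedelta

# Assumed timeout; the paper only says "a limit".
SESSION_TIMEOUT = timedelta(minutes=30)


def split_sessions(requests, timeout=SESSION_TIMEOUT):
    """Split page requests into sessions.

    `requests` is a list of (user_id, timestamp, url, referrer) tuples,
    sorted by user and then by time. Returns a list of URL lists.
    """
    sessions = []
    current = []
    last_user = None
    last_time = None
    for user, ts, url, referrer in requests:
        new_session = (
            user != last_user               # rule 1: a new user
            or referrer is None             # rule 2: null referrer
            or (ts - last_time) > timeout   # rule 3: time limit exceeded
        )
        if new_session and current:
            sessions.append(current)
            current = []
        current.append(url)
        last_user, last_time = user, ts
    if current:
        sessions.append(current)
    return sessions
```

The rules are checked in the order listed, so the time difference is only computed when the previous request belongs to the same user.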

3.3 Pattern Discovery:


Once user transactions have been identified, the
web logs are converted into relational databases and
sequential pattern mining is performed on the data to
discover Frequent Web Access Patterns (FWAP).
In this paper, we use the LL-Mine algorithm, which
is based on a modified form of the structure in [9], for sequential
pattern mining, as it is efficient compared with other
existing algorithms. It produces frequent web access
sequences in a linked-list data structure. It scans the
database and produces the frequent itemsets that satisfy
the weighted support. Usually, only the order of Web-pages is taken into consideration in sequential pattern
mining. To reflect the importance of a Web-page,
both the time spent on it by the user and the frequency of visits
are taken into account when the W_ASSIGN algorithm assigns a weight to the Web-page while generating web patterns.
The weighted support of an access sequence s is
given by [9]:
Weight_support(s) = g_support(s) × weight(s)
where
weight(s) is the average weight of the
items in the sequence, and
g_support(s) is the support of the sequence in the WASD.
The frequent patterns generated by this algorithm are
integrated with the semantic knowledge by
crawling all the URLs of these FWAP to collect domain
term sequences.
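As a worked illustration of the weighted support formula, the sketch below computes item weights as the harmonic mean of visit frequency and viewing time (as in W_ASSIGN) and multiplies the average item weight by the support of the sequence. Two points are assumptions rather than statements from the paper: support is taken as the fraction of sequences in the WASD that contain the pattern as an order-preserving subsequence, and frequency and time are used in whatever units the caller supplies.

```python
def harmonic_mean(a, b):
    """Harmonic mean of two non-negative numbers (0 if both are 0)."""
    return 0.0 if a + b == 0 else 2 * a * b / (a + b)


def is_subsequence(pattern, sequence):
    """True if `pattern` occurs in `sequence` in order, gaps allowed."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def g_support(pattern, wasd):
    """Assumed definition: fraction of access sequences containing the pattern."""
    return sum(is_subsequence(pattern, s) for s in wasd) / len(wasd)


def weight_support(pattern, wasd, item_weight):
    """Weight_support(s) = g_support(s) x weight(s), weight(s) = mean item weight."""
    avg_weight = sum(item_weight[p] for p in pattern) / len(pattern)
    return g_support(pattern, wasd) * avg_weight
```

For example, with three access sequences of which two contain the pattern, and an average item weight of 3.0, the weighted support is 2/3 × 3.0 = 2.0.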

3.2.4. Path Completion


Due to the presence of proxy servers and local
caches, some user accesses are not recorded in the
access log. Path completion acquires the
complete user access path by filling in the missing page
references. An incomplete access path is recognized by
checking whether the requested page is linked from the last
page. If it is not linked and that page is already in the user's
history, then the user evidently used the back button.
In this way the complete path is acquired. Web
log pre-processing helps in removing unwanted data
from the log file and reduces the original file size by 50-

represents concepts as domain terms and Web-pages, and
relations between the concepts. To construct the semantic
network, domain terms are collected from the Web-page
titles, and the relations between these
terms are extracted from two aspects: (i) the collocations of terms, determined by the co-occurrence relations of terms in
Web-page titles; and (ii) the associations between terms
and Web-pages.
To capture how these terms are
semantically related, the domain terms and co-occurrence
relations are weighted. Based on these relations, we can
estimate how closely Web-pages are associated with each
other semantically. To infer the semantics of Web-pages,
we can query the relations, including the relevant pages
and key terms for a given page and the pages for given
terms, thereby achieving semantic-enhanced Web-page
recommendations. This semantic network is referred
to as TermNetWP.
The procedure to automatically construct TermNetWP
is as follows:
1) Collect the titles of visited Web-pages.
2) Extract term sequences from the Web-page
titles.
3) Build the semantic network TermNetWP.
4) Implement an automatic construction of
TermNetWP.
To allow the Web-page recommender system to reuse and
share the domain term network, TermNetWP is
implemented in OWL. The input to this network is a term
sequence collection (TSC), in which each record consists
of:
1) The PageID of a Web-page d ∈ D;
2) A sequence of terms X = t1 t2 . . . tm ∈ TS, m > 0, extracted
from the title of the Web-page;
3) The URL of the Web-page.

TABLE 1: Algorithm W_ASSIGN

ALGORITHM: W_ASSIGN
Input:
  An access sequence database, WASD
  A support threshold
Output:
  Set of weighted access patterns
Method:
1. For each web access sequence s = p1, p2, ..., pn
     Set weight(pi) = 0;
     Let length = 0;
     Create linked list C, where each node contains an item name
     and its weight;
     Set weight to 0;
     For each occurrence of item pi,
       Increment freq(pi) and add Time(pi);
       Update the values in C;
     End for;
     Update the list of items in LIN with C;
     For each pi,
       Take the harmonic mean of freq(pi) and Time(pi);
       Assign it to weight(pi);
     End for
2. For each item pi in LIN, if it passes the
   support threshold, add the item to the frequent patterns
3. Call LL-Mine
4. Return
TABLE 2: Algorithm LL-Mine
Algorithm: LL-Mine
Parameters:
  Current frequent pattern, p
  List of first occurrences, L
  Absolute support threshold
Method:
1. For each weighted frequent item pi
   i. Generate the first occurrences list, L1:
        Initialize L1 with Weight_support = 0;
        Locate the first occurrences of the element p in
        projected databases D-p using L;
        Generate L1 with nodes holding seq-id and pos;
        Add the weight of the item at each occurrence;
        Update the header of the list L1 with Weight_support(pi);
   ii. If Weight_support(pi) > the absolute support threshold
        Add p.pi to F, the set of patterns;
        Add p.pi to the stack for suffix building;
        p = p.pi;
        Call LL-Mine(p, L1, absolute support threshold)
       End if
   iii. Delete the current L.
   End for
2. Return

3.5 Frequently Viewed Term Pattern (FVTP):

In this paper, we use a Web usage mining
technique, namely LL-Mine, to obtain the frequent Web
access patterns (FWAP). We integrate FWAP with
TermNetWP to obtain a set of frequently
viewed term patterns (FVTP), which is the semantic Web
usage knowledge of a website.
The frequent web access patterns are described as follows:
P = {P1, P2, . . . , Pn}: set of FWAP,
where Pi = di1 di2 . . . dim is a pattern showing a sequence of Web-pages,
n is the number of patterns, and
m is the number of Web-pages in the pattern.
The frequently viewed term patterns are denoted as
follows:
F = {f : f = ti1 ti2 . . . tim}: set of FVTP,
where each domain term pattern f is a sequence of domain
terms, in which each domain term tik is a domain term of page
dik in Pi.
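One way to read this definition is sketched below: every page in a frequent access pattern is replaced by one of its domain terms. Because a page title can carry several terms, the sketch enumerates every combination with a Cartesian product. That enumeration, the function name `build_fvtp` and the `page_terms` mapping are assumptions for illustration; the paper does not specify how multiple terms per page are handled.

```python
from itertools import product


def build_fvtp(fwap, page_terms):
    """Map frequent Web access patterns to domain term patterns.

    fwap:       list of page-ID sequences, e.g. [["d1", "d2"]]
    page_terms: dict mapping a page ID to the list of its domain terms
    Returns a list of term sequences, one per combination of page terms
    (assumed interpretation).
    """
    fvtp = []
    for pattern in fwap:
        term_choices = [page_terms[d] for d in pattern]
        for terms in product(*term_choices):
            fvtp.append(list(terms))
    return fvtp
```

If the number of combinations grows too large, a cheaper variant could keep only the most frequent term of each page; that would also be an assumption, not something stated in the text.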

3.4 Semantic Network Construction:

This section presents the first model, i.e. the
semantic network of a website and its schema, and
explains the queries used to infer terms and Web-pages.
A semantic network is a kind of knowledge map which

3.6 Conceptual Prediction Model (CPM)

i.e. (t, d) ∈ R, from Instance to WPage, which shows that a term
instance has one or more Web-pages; and the belongto-Instance relationship, which is the inverse of
hasWPage and shows that a Web-page belongs to one or
more term instances. An association class OutLink is
defined to specify the in-out relationship between two
terms. Class OutLink connects one term
instance (tx) to another term instance (ty), and defines the
corresponding connection weight (iWeight = wxy).

The conceptual prediction model (CPM)
automatically generates a weighted semantic network of
frequently viewed terms, in which each weight is the
probability of the transition between two adjacent terms
based on FVTP. The result is semantic Web usage
knowledge that is efficient for semantic-enhanced Web-page recommendation. This semantic network is referred
to as TermNavNet.
We present two Web-page recommendation
strategies, based on the semantic knowledge base of a
given website, through the semantic network of Web-pages (TermNetWP) and the weighted semantic network
of frequently viewed terms of Web-pages within the
given website (TermNavNet). These recommendations
are called semantic-enhanced Web-page
recommendations.

Class OutLink involves two object properties: (i) from-Instance defines one previous term instance, and (ii) to-Instance defines one next term instance. Class Instance
also has two object properties: (i) hasOutLink, which is
the inverse of the from-Instance relation, and (ii)
fromOutLink, which is the inverse of the to-Instance
relation.

Figure 2: Schema of TermNetWP

4. TermNetWP ALGORITHM:
4.1 Definitions of TermNetWP
The notations used in TermNetWP are summarized as follows:

$\mathrm{TERM}_{auto} = \{t_i : 1 \le i \le p\}$: set of domain terms extracted
from Web-page titles;
$D = \{d_j : 1 \le j \le q\}$: set of the Web-pages;
$X_j = t_1 t_2 t_3 \ldots t_n$, $t_k \in \mathrm{TERM}_{auto}$: sequence of domain terms, which
may be duplicated, present in each page $d_j$;
$t_i \in d_j$: denotes that $t_i$ is a domain term of $d_j$;
$tf(t, D)$: term frequency of $t$ over $D$;
$TS = \{X_j : 1 \le j \le q\}$: set of domain term sequences;
$(t_i, t_j)$, $t_i, t_j \in \mathrm{TERM}_{auto}$: a pair of terms;
$c(t_i, t_j)$: number of times that $t_i$ is followed by $t_j$ in
$TS$ with no term between them.
The semantic network of Web-pages, namely
TermNetWP, is defined as a 4-tuple:
$\mathrm{Net}_{auto} := \langle T, A, D, R \rangle$, where
$T = \{(\text{term}, \text{term frequency})\}$: set of domain terms and
corresponding occurrences,
$A = \{(t_x, t_y, w_{xy}) : w_{xy} = c(t_x, t_y) > 0\}$: set of associations
between $t_x$ and $t_y$ with weight $w_{xy}$,
$R = \{(t, d) : t \in d\}$: domain term $t$ is related to Web-page $d$ by its presence in the page title.
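The components T, A and R of this definition can be computed directly from the term sequence collection. The sketch below is a plain in-memory illustration under stated assumptions: it uses Python dictionaries rather than the OWL implementation described in the paper, the function name `build_termnetwp` is hypothetical, and the start and end links of the construction algorithm are left out.

```python
from collections import Counter, defaultdict


def build_termnetwp(tsc):
    """Build the T, A and R components of TermNetWP.

    tsc: iterable of (page_id, terms, url) records, where `terms`
         is the list of domain terms extracted from the page title.
    Returns (T, A, R):
      T: term -> number of occurrences over all titles
      A: (tx, ty) -> number of times tx is immediately followed by ty
      R: term -> set of page IDs whose title contains the term
    """
    T = Counter()
    A = Counter()
    R = defaultdict(set)
    for page_id, terms, _url in tsc:
        T.update(terms)
        A.update(zip(terms, terms[1:]))  # adjacent term pairs only
        for t in terms:
            R[t].add(page_id)
    return T, A, R
```

Only pairs with a positive count appear in `A`, which matches the condition that an association exists when its weight is greater than zero.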

4.3 Queries
Based on TermNetWP, we can query: (i) domain
terms for a given Web-page, and (ii) Web-pages mapped
to a given domain term.

4.3.1 Query about terms of a given Web-page:


$Query_{topic}(d) = (t_1, t_2, \ldots, t_s)$, where $d \in D$; $(t_i, d) \in R$,
$i = 1, \ldots, s$; and $tf(t_i, D) > tf(t_j, D)$ for $i < j$, $1 \le i, j \le s$.
Given a Web-page $d \in D$, the query $Query_{topic}(d)$ retrieves the
term instances that are associated with the WPage
instance $d$ via the belongto-Instance object
property. The terms are returned in descending order of
their number of occurrences in the domain.
The connection weight between a page and a domain
term is defined as:

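$$\omega(d_j, t) = \sum_{k=0}^{n} \big( c(t_k, t) + c(t, t_k) \big)$$
Here $c(t_a, t_b)$ is the number of times term $t_a$ is immediately followed by term $t_b$ in $TS$, and the $t_k$ are the domain terms of page $d_j$.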

4.2 Schema of TermNetWP:


In the schema of TermNetWP, class Instance
represents a domain term, i.e. t ∈ TERMauto, and has two data-type
properties, Name and iOccur, and one object property, hasWPage.
The iOccur property holds the number of occurrences of the
term in the set of Web-page titles. Class WPage
represents a Web-page, i.e. d ∈ D, with the properties Title,
PageID, URL and Keywords. The Keywords
property holds the terms in a Web-page title. These two
classes are related through the hasWPage relationship,


where $n = |\{t_k : t_k \in d\}|$ is the number of domain terms in
the title of page $d$.

4.3.2 Query about pages mapped to a given term:


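$Query_{page}(t) = (d_1, d_2, \ldots, d_s)$, where $(t, d_i) \in R$,
$i = 1, \ldots, s$; and $\omega(d_i, t) < \omega(d_j, t)$ for $i < j$, $1 \le i, j \le s$, with $\omega(d, t)$ the connection weight between page $d$ and term $t$.
Given a domain term $t \in \mathrm{TERM}_{auto}$, the query $Query_{page}(t)$ retrieves the
WPage instances (i.e. Web-pages) that are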


mapped to the term instance t via the
hasWPage object property. The returned pages are
sorted in ascending order of connection weights between
the Web-pages and domain term t to show the degree of
relevance to the term t.
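The two queries can be sketched over simple in-memory structures. The sketch assumes a mapping `R` from each term to the set of pages whose title contains it, a term-frequency table `tf`, and a table `weight` of precomputed page-term connection weights; these names and the dictionary representation are illustrative assumptions, since the paper implements the queries over an OWL model.

```python
def query_topic(d, R, tf):
    """Terms of page d, ordered by descending term frequency."""
    terms = [t for t, pages in R.items() if d in pages]
    return sorted(terms, key=lambda t: tf[t], reverse=True)


def query_page(t, R, weight):
    """Pages mapped to term t, in ascending order of connection weight.

    weight: dict mapping (page_id, term) to a precomputed connection weight.
    """
    return sorted(R.get(t, ()), key=lambda d: weight[(d, t)])
```

A term that is unknown to the network simply yields an empty result from `query_page`, which is convenient when the queries are chained in a recommendation pipeline.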

$\delta_{S,x}$: number of times domain term $t_x$ is the first item in a
domain term pattern $f$;
$\delta_{x,E}$: number of times a domain term pattern $f$ terminates at
domain term $t_x$;
$\delta_{x,y,z}$: number of times that $(t_x, t_y)$ is followed by $t_z$ in $F$ with
no term between them.
The probability of a transition is estimated by the ratio of
the number of times the corresponding sequence of states
(i.e. visited Web-page) was traversed and the number of
times the anchor state occurred. In our system, we take
into account first-order and second-order transition
probabilities.

TABLE 3: Algorithm for TermNetWP

Input: TSC (Term Sequence Collection)
Output: G (TermNetWP)
Process:
Let TSC = {(PageID, X = t1 t2 . . . tm, URL)}
Initialize G; let R = the root (start) node of G
Let E = the end node of G
For each PageID and each sequence X in TSC {
  Initialize a WPage object identified as PageID
  For each term ti ∈ X, i = 1 . . . m {
    If node ti is not found in G, then
      Initialize an Instance object I as a node of G
      Set I.Name = ti
    Else
      Set I = the Instance object named ti in G
    Increase I.iOccur by 1
    If (i == 1) then
      Initialize an OutLink R-ti if not found
      Increase R-ti.iWeight by 1
      Set R-ti.fromInstance = R
      Set R-ti.toInstance = I
    If (i > 1) then
      Get preI = the Instance object with name ti-1
      Initialize an OutLink ti-1-ti if not found
      Increase ti-1-ti.iWeight by 1
      Set ti-1-ti.toInstance = I
      Set ti-1-ti.fromInstance = preI
    If (i == m) then
      Initialize an OutLink ti-E if not found
      Increase ti-E.iWeight by 1
      Set ti-E.toInstance = E
      Set ti-E.fromInstance = I
    Set I.hasWPage = PageID
    Add term ti into PageID.Keywords
  }
}

Given a CPM having states {S, t1, . . . , tp, E}, where N is the
number of term patterns in F, the first-order transition
probabilities are estimated according to the following
expressions:
Transition from the starting state S to state tx:

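$$\rho_{S,x} = \frac{\delta_{S,x}}{\sum_{y=1}^{p} \delta_{S,y}} \qquad (1)$$

Transition from state $t_x$ to $t_y$:

$$\rho_{x,y} = \frac{\delta_{x,y}}{\delta_x} \qquad (2)$$

Transition from state $t_x$ to the final state E:

$$\rho_{x,E} = \frac{\delta_{x,E}}{\delta_x} \qquad (3)$$

The second-order transition probability, i.e. the
probability of the transition $(t_y, t_z)$ given that the previous
transition was $(t_x, t_y)$, is estimated as
follows:

$$\rho_{x,y,z} = \frac{\delta_{x,y,z}}{\delta_{x,y}} \qquad (4)$$

Here $\delta$ denotes an occurrence count in $F$ ($\delta_x$ for term $t_x$, $\delta_{x,y}$ for $t_x$ immediately followed by $t_y$, $\delta_{S,x}$ and $\delta_{x,E}$ for patterns that start or end at $t_x$, and $\delta_{x,y,z}$ for $(t_x, t_y)$ immediately followed by $t_z$), and $\rho$ denotes the corresponding transition probability.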
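The estimates in expressions (1) to (4) amount to counting and dividing, which the following sketch makes concrete. It is a minimal illustration assuming that FVTP is available as a list of term sequences; the function name `estimate_cpm` and the dictionary-based return values are hypothetical and do not reflect the OWL-based TermNavNet implementation.

```python
from collections import Counter


def estimate_cpm(fvtp):
    """Estimate first- and second-order transition probabilities.

    fvtp: list of domain term patterns (each a list of terms).
    Returns (p_start, p_first, p_end, p_second) as dictionaries.
    """
    occ = Counter()      # occurrences of each term
    pair = Counter()     # tx immediately followed by ty
    triple = Counter()   # (tx, ty) immediately followed by tz
    start = Counter()    # patterns starting at tx
    end = Counter()      # patterns ending at tx
    for f in fvtp:
        if not f:
            continue
        occ.update(f)
        pair.update(zip(f, f[1:]))
        triple.update(zip(f, f[1:], f[2:]))
        start[f[0]] += 1
        end[f[-1]] += 1

    total_start = sum(start.values())
    p_start = {x: c / total_start for x, c in start.items()}       # (1)
    p_first = {xy: c / occ[xy[0]] for xy, c in pair.items()}       # (2)
    p_end = {x: c / occ[x] for x, c in end.items()}                # (3)
    p_second = {xyz: c / pair[xyz[:2]] for xyz, c in triple.items()}  # (4)
    return p_start, p_first, p_end, p_second
```

With the patterns `a b c`, `a b` and `a c`, term `a` occurs three times and is followed by `b` twice, so the first-order probability of moving from `a` to `b` is 2/3; the pair `(a, b)` is followed by `c` once out of two occurrences, giving a second-order probability of 1/2.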

The conceptual prediction model is represented as a triple
$cpm := (N, \Omega, M)$, where
$N = \{(t_x, \delta_x)\}$: set of terms along with the
corresponding occurrence counts,
$\Omega = \{(t_x, t_y, \delta_{x,y}, \rho_{x,y})\}$: set of transitions from $t_x$ to $t_y$,
along with their transition weights ($\delta_{x,y}$) and first-order
transition probabilities ($\rho_{x,y}$),
$M = \{(t_x, t_y, t_z, \delta_{x,y,z}, \rho_{x,y,z})\}$: set of transitions from $(t_x, t_y)$
to $t_z$, along with their transition weights ($\delta_{x,y,z}$) and second-order transition probabilities ($\rho_{x,y,z}$). If $M$ is non-empty, the
CPM is considered a second-order conceptual prediction
model; otherwise it is a first-order conceptual prediction model.

5. TermNavNet ALGORITHM:
In Section 4, we presented TermNetWP, which
represents the semantics of the Web-pages within a website
efficiently but is not sufficient on its own for making effective
Web-page recommendations. To overcome
this issue, we integrate TermNetWP with Web
usage knowledge to obtain the semantic Web usage
knowledge.
The notations used to represent TermNavNet are
summarized as follows:

5.1 Schema of CPM


TermNavNet is automatically implemented in
OWL. The schema consists of the class cNode, which defines the
current state node, and the class cOutLink, which defines the association
from the current state node to a next state node with a
transition probability Prob (e.g. $\rho_{x,y}$), together with the relationship
properties inLink, outLink and LinkTo.

x: Number of occurrences of tx in F;
x, y: Number of times that tx followed by ty in F and there is no
term between them;

Recommendation strategy-1 uses TermNetWP and the first-order CPM:
Step 1 builds TermNetWP;
Step 2 generates FWAP using LL-Mine;
Step 3 builds FVTP;
Step 4 builds a 1st-order TermNavNet given FVTP;
Step 5 identifies a set of currently viewed terms {tk} using the query Querytopic(dk) on TermNetWP;
Step 6 infers the next viewed terms {tk+1} given each term in {tk} using the query RecTerm(tk) on the 1st-order TermNavNet;
Step 7 recommends the pages mapped to each term in {tk+1} using the query Querypage(tk+1) on TermNetWP.

Recommendation strategy-2 uses TermNetWP and the second-order CPM:
Step 1 builds TermNetWP;
Step 2 generates FWAP using LL-Mine;
Step 3 builds FVTP;
Step 4 builds a 2nd-order TermNavNet given FVTP;
Step 5 identifies a set of previously viewed terms {tk-1} and a set of currently viewed terms {tk} using the query Querytopic(d), d ∈ {dk-1, dk}, on TermNetWP;
Step 6 infers the next viewed terms {tk+1} given each pair {tk-1, tk} using the query RecTerm(tk-1, tk) on the 2nd-order TermNavNet;
Step 7 recommends the pages mapped to each term in {tk+1} using the query Querypage(tk+1) on TermNetWP.
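Steps 5 to 7 of strategy-1 can be sketched in Python as follows, assuming that TermNetWP has already been reduced to two lookup tables and the first-order TermNavNet to a dictionary of transition probabilities. The names `recommend_pages`, `page_terms`, `term_pages` and `first_order`, the `top_n` cut-off and the toy data are hypothetical; they stand in for the OWL queries and are not the authors' implementation. Keeping the highest probability per candidate term is likewise an assumption.

```python
def recommend_pages(current_page, page_terms, term_pages, first_order, top_n=2):
    """Sketch of strategy-1, steps 5-7, over plain dictionaries."""
    # Step 5: terms of the currently viewed page (stand-in for Querytopic).
    current_terms = page_terms.get(current_page, [])
    # Step 6: rank candidate next terms by transition probability
    # (stand-in for RecTerm); keep the best score seen for each term.
    scores = {}
    for term in current_terms:
        for nxt, prob in first_order.get(term, {}).items():
            scores[nxt] = max(scores.get(nxt, 0.0), prob)
    next_terms = sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
    # Step 7: pages mapped to the predicted terms (stand-in for Querypage).
    pages = []
    for term in next_terms:
        for page in term_pages.get(term, []):
            if page != current_page and page not in pages:
                pages.append(page)
    return pages


# Hypothetical toy data.
page_terms = {"d1": ["courses"], "d2": ["fees"], "d3": ["staff"]}
term_pages = {"courses": ["d1"], "fees": ["d2"], "staff": ["d3"]}
first_order = {"courses": {"fees": 0.7, "staff": 0.3}}
print(recommend_pages("d1", page_terms, term_pages, first_order))  # ['d2', 'd3']
```

Strategy-2 would follow the same shape, with the probability table keyed on (previous term, current term) pairs instead of single terms.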

Fig. 3. Schema of conceptual prediction model.

5.2 Automatic Construction of TermNavNet using CPM
TermNavNet can be constructed by applying the CPM schema to the FVTP using the algorithm in Table 4. A 1st-order or 2nd-order TermNavNet is obtained by using the 1st-order or 2nd-order CPM, respectively, so that the transition probability Prob is updated with the first-order or second-order probability formula.

TABLE 4: TermNavNet construction

Algorithm: Building TermNavNet
Input: F (FVTP)
Output: M (TermNavNet)
Process:
Initialize M
For each f = t1 t2 ... tm ∈ F
  For each ti ∈ f
    Initialize cNode objects with NodeName = ti, ti-1, ti+1 and Occur = 1 if they are not found in M
    Initialize a cOutLink object with Name = ti_ti+1 and Occur = 1 if it is not found in M
    Increase ti.Occur and ti_ti+1.Occur if they are found in M
    ti_ti+1.linkTo = ti+1
    ti.outLink = ti_ti+1
    ti.inLink = ti-1
    Update all objects into M
Update transition probabilities in the cOutLink objects
Return M
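The construction in Table 4 can be approximated in Python as shown below. This is a minimal first-order sketch under stated assumptions: the classes `CNode` and `COutLink` mirror the schema classes cNode and cOutLink but are plain dataclasses rather than OWL individuals, the function name `build_term_nav_net` is hypothetical, and counters start at zero and are incremented on every visit, which yields the same counts as the "initialize to 1, otherwise increase" rule in the table.

```python
from dataclasses import dataclass, field


@dataclass
class COutLink:
    name: str
    link_to: str       # name of the next-state node
    occur: int = 0
    prob: float = 0.0


@dataclass
class CNode:
    name: str
    occur: int = 0
    out_links: dict = field(default_factory=dict)  # next term -> COutLink
    in_links: set = field(default_factory=set)     # names of preceding terms


def build_term_nav_net(patterns):
    """First-order sketch of Table 4 over a list of term sequences."""
    net = {}
    for seq in patterns:
        for i, term in enumerate(seq):
            node = net.setdefault(term, CNode(term))
            node.occur += 1
            if i > 0:
                node.in_links.add(seq[i - 1])
            if i + 1 < len(seq):
                nxt = seq[i + 1]
                link = node.out_links.setdefault(
                    nxt, COutLink(f"{term}_{nxt}", nxt))
                link.occur += 1
    # Update transition probabilities: link count over source-node count.
    for node in net.values():
        for link in node.out_links.values():
            link.prob = link.occur / node.occur
    return net


# Hypothetical term sequences.
net = build_term_nav_net([["a", "b", "c"], ["a", "b"], ["a", "c"]])
for link in net["a"].out_links.values():
    print(link.name, link.occur, round(link.prob, 3))
```

A second-order variant would additionally key the links on the preceding term, so that each probability is conditioned on a pair of terms.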

5.3 Queries
RecTerm(tx, ty) is used to query the next viewed terms for a given previous viewed term prT and current viewed term curT by applying the second-order transition probability. If the first-order transition probability is used, the next viewed terms for a given current viewed term curT are queried using RecTerm(tx).

6. SEMANTIC-ENHANCED WEB-PAGE RECOMMENDATION STRATEGIES
Two Web-page recommendation strategies are proposed, depending on the order of the CPM: recommendations are made either for a given current Web-page (first order) or for a combination of the current and previous Web-pages (second order).

A Web-page recommendation rule, denoted as Rec, is defined as a set of recommended Web-pages generated by a Web-page recommendation strategy. A Web-page recommendation rule can be categorised as follows:
1) A recommendation rule is correct if the next Web-page accessed by the current user is present in Rec.
2) A recommendation rule is satisfied if the user's target page is reached through any of the Web-pages present in Rec.
3) A recommendation rule is empty if the next Web-page accessed by the user is not present in Rec.
In [16], Zhou stated that the performance of Web-page recommendation strategies is measured in terms of two metrics: precision and satisfaction.
Let Rc be the subset of Rec that consists of all correct recommendation rules. The Web-page recommendation precision is defined as:

$$\text{Precision} = \frac{|R_c|}{|\mathit{Rec}|} \tag{5}$$

Let Rs be the subset of Rec that consists of all satisfied recommendation rules. The satisfaction for Web-page recommendation is defined as:

$$\text{Satisfaction} = \frac{|R_s|}{|\mathit{Rec}|} \tag{6}$$
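The precision and satisfaction measures can be computed as in the following Python sketch. The input format is an assumption: each rule is paired with the pages the user actually visited afterwards, a rule counts as correct when the first of those pages was recommended, and as satisfied when any later page of the session was recommended. The function name `precision_and_satisfaction` and the sample data are hypothetical.

```python
def precision_and_satisfaction(rules):
    """rules: list of (recommended_pages, remaining_session_pages) pairs."""
    total = len(rules)
    if total == 0:
        return 0.0, 0.0
    # Correct: the very next page visited is among the recommendations.
    correct = sum(1 for rec, rest in rules if rest and rest[0] in rec)
    # Satisfied: some page visited later in the session was recommended.
    satisfied = sum(1 for rec, rest in rules if any(p in rec for p in rest))
    return correct / total, satisfied / total


# Hypothetical evaluation data.
rules = [
    ({"d2", "d3"}, ["d2", "d5"]),  # correct and satisfied
    ({"d4"}, ["d6", "d4"]),        # satisfied only
    ({"d7"}, ["d8", "d9"]),        # neither
    ({"d1"}, ["d1"]),              # correct and satisfied
]
print(precision_and_satisfaction(rules))  # (0.5, 0.75)
```

Under this reading every correct rule is also satisfied, so satisfaction is never lower than precision.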