
“An Intelligent Approach of Web Personalization Using Web Access Mining”

1. Introduction

The content on the Web is growing rapidly across many domains, and so is the
need to identify and retrieve exactly the content that users require.
Predicting user needs has therefore become essential for improving the
usability of a Web site. In brief, Web Personalization can be defined as any
action that adapts the information or services provided by a web site to an
individual user, or a set of users, based on knowledge acquired from their
navigational behavior, as recorded in the web site’s logs. This information is
often combined with the content and the structure of the web site as well as
the user’s interests and preferences. Using these sources of information as
input to pattern discovery techniques, the proposed system molds the provided
content to the needs of each visitor of the website. The personalization
process can result in the dynamic generation of suggestions, the creation of
pages according to the needs of the user, and the highlighting of existing
hyperlinks that the user is likely to need. Most of the earlier research
efforts in Web Personalization deal with Web Usage Mining [1].

Retrieving the most relevant information from the Web is difficult because of
the huge number of documents available in various formats. Users have to go
through a long list of snippets to choose the relevant one, which is a
time-consuming process, and user satisfaction becomes secondary. One approach
to satisfying the requirements of the user is to personalize the information
available on the Web, called Web Personalization. Web Personalization is the
process that adapts the information or services provided by a Web site to the
needs of each specific user or set of users, taking advantage of the knowledge
gained from the users. Web Personalization can be a solution to the
information overload problem, as its objective is to provide users with what
they really want or need, without their having to ask or search for it
explicitly. It is a multidisciplinary area for putting together data and
producing personalized

1 Shri Ram Institute of Science & Technology, Jabalpur



output for individual users or groups of users. This approach helps the
researchers to improve the efficiency of Information Retrieval (IR) systems.

Web Mining is the mining of data on the World Wide Web, and it provides the
processing needed to personalize these Web data. The Web data may be of the
following kinds:
• Content of the Web pages (actual Web content)
• Inter-page structure
• Usage data, which describes how the web pages are accessed by users
• User profile, which includes information collected about users
(cookies/session data)

With personalization, the content of web pages is modified to better fit user
needs. This may involve actually creating web pages that are unique per user,
or using the desires of a user to determine which web documents to retrieve.
Personalization can be targeted at a group of customers with specific
interests, based on the users’ visits to a website. Personalization also
includes techniques such as the use of cookies, the use of databases, and
machine learning strategies. Personalization can be viewed as a type of
Clustering, Classification, or even Prediction [3][4].
Web mining [5] falls into three basic categories:
1. Web Content Mining
2. Web Structure Mining
3. Web Usage Mining

Personalization work deals mostly with Web Usage Mining [1]. Pure usage-based
personalization, however, presents certain shortcomings, such as when there is
insufficient usage data available to extract patterns, or when the web site’s
content changes and new pages are added but are not yet included in the web
logs. Users’ visits usually aim at finding information concerning a particular
subject, thus the underlying content semantics should be a dominant factor in
the process of web personalization. There have been a number of research
studies that integrate the web site’s content in order to enhance the Web
Personalization process [2].

2. Motivation

Considering the amount of data and the variety of users on the World Wide Web,
keyword-based search results may not provide the relevant information to the
user, as each user’s intention is different and may not be reflected in the
keywords they use. For these reasons, web personalization has attracted many
researchers to look for mechanisms that understand the user better and provide
the most relevant information to the user. A user may not have time to fill in
data describing his/her interests, likes, dislikes, background, educational
qualification, etc. (the explicit method). Many web mining researchers have
worked on this challenge and provided techniques for automatic
personalization. A well-known example is Amazon, where the user need not give
his/her details and the system fetches the relevant information for the user.
Today, the internet has become part and parcel of our lives, and every day
millions of people use it for various purposes, mostly for information. The
user is often unhappy with the amount of information provided, as it needs
further filtering, which is very time consuming, and expects the system to
understand his/her thoughts. Understanding the user is not as simple as it
sounds, and web personalization is one step towards that goal.

3. Problem Statement


With the explosive growth of the information available on the Web, gathering
useful information from the web has become a challenging issue for users. Web
users expect more intelligent systems to gather useful information from the
huge Web to meet their information needs. User profiles are created to
describe the user’s background knowledge; they represent the concept models
possessed by users when gathering web information. A user profile is a
collection of personal data associated with a specific user. Traditionally,
Web Usage Mining (WUM) is used for web personalization. Web personalization
according to the user profile provides an intelligent web application in which
the user does not need to waste time navigating the system. Normally, web logs
are stored on the server in the form of files, and the web master restructures
the website according to web usage patterns.
Restructuring a web site according to the access behavior of each individual
user is a challenging task. The major objectives of the system are:
1. Developing a web site.
2. Developing a complete framework for tracking usage data.
3. Restructuring the web site according to the user profile to provide a
customized feel.
4. Evaluating the proposed system against traditional WUM.

Application of Web Personalization

Web personalization has recently been gaining great momentum in research and
in various commercial web applications. One interesting application of
personalization on the Web is recommender systems [6][7]. Recommender systems
are used to provide users with a richer experience and to make the selection
process easier. In web personalization, recommendation engines recommend
objects in the form of pages, products, advertisements, etc., depending upon
the type and taste of the user. Nowadays countless new products are advertised
in the media every day. Hence, various business strategies have been developed
to retain existing customers as well as to attract new ones. Web
personalization recommendation is being used by various e-business
applications. Examples of such recommendation systems are Amazon.com,
barnesnoble.com, Ebay.com, FAIRWIS, LIBRA, CDNOW, etc. Recommendation systems
that recommend web pages include MEMOIR, Phoaks, GAB, Fab, Alexa.com,
Quickstep, R2P and SOAP. Recommendation has also been used for music, movies,
videos and other services: CDNOW, Moviefinder.com, MovieLens, Firefly and
Morse are among the systems that suggest interesting movies and songs to
users.

Recommender systems are also used in the news reading domain. Tapestry,
GroupLens, PHOAKS, WebMate, Alipes, Personal View Agent (PVA) and Lotus Notes
are among the systems that suggest interesting news to readers. Another
interesting application of personalization is in adaptive hypermedia systems.
For example, WebWatcher helps its users by modifying the pages that they
browse. Web personalization has also been used for e-learning.

4. Literature Survey
4.1 Web Personalization
The content on the Web is growing rapidly across many fields, and so is the
need to identify and retrieve exactly the content that users require.
Predicting user needs has therefore become essential for improving the
usability of a Web site. In brief, Web Personalization can be defined as any
action that adapts the information or services provided by a web site to an
individual user, or a set of users, based on knowledge acquired from their
navigational behavior, as recorded in the web site’s logs [4]. This
information is often combined with the content and the structure of the web
site as well as the user’s interests and preferences. Using these sources of
information as input to pattern discovery techniques, the system molds the
provided content to the needs of each visitor of the web site. The
personalization process can result in the dynamic generation of suggestions,
the creation of pages according to the needs of the user, and the highlighting
of existing hyperlinks that the user is likely to need. Most of the earlier
research efforts in Web Personalization deal with Web Usage Mining [10].
Pure usage-based personalization, however, presents certain shortcomings,
such as when there is insufficient usage data available to extract patterns,
or when the web site’s content changes and new pages are added but are not yet
included in the web logs. Users’ visits usually aim at finding information
concerning a particular subject, thus the underlying content semantics should
be a dominant factor in the process of web personalization. There have been a
number of research studies that integrate the web site’s content in order to
enhance the Web Personalization process [4]. Most of these efforts
characterize web content by extracting features from the web pages. Usually
these features are keywords, which are subsequently used to retrieve similarly
characterized content based on the requirements of the user. When Web
Personalization approaches are combined with the Semantic Web, they yield more
effective search responses and higher user satisfaction.
Web Personalization architecture is as follows:


Fig.1 Web personalization architecture

The above architecture uses the Web site’s structure, the Web logs created by
observing the user’s navigational behavior, and the User Profiles created
according to the user’s preferences, along with the Web site’s content, to
analyze and extract the patterns that match the user’s information needs.
This analysis creates a recommendation model which is presented to the user.
The phases of web personalization [7][12][13] are:
1. Collection of Web data / user profiling
2. Preprocessing of Web data
3. Log analysis and web usage mining
4. Decision making / final recommendation phase

4.2 Web Log Mining

The Web contains a huge number of web sites. A web site usually contains a
great amount of information distributed over hundreds of pages. Without proper
guidance, a visitor often wanders aimlessly without visiting important pages,
loses interest and leaves the site sooner than expected. This consideration is
the basis of the great interest in Web Usage Mining in both the academic and
the industrial world. On an online shopping or e-commerce site, a user who
does not find the required pages will simply switch to another web site. To
avoid this, we have to find the user’s frequent access patterns from his/her
browsing history. Web Usage Mining generates user access patterns from the web
log records, which are stored on the web server or application server. Web
Usage Mining is defined as the process of applying data mining techniques to
the discovery of usage patterns from Web log data in order to identify Web
users’ behavior [8]. It is the type of Web mining activity that involves the
automatic discovery of user access patterns from one or more Web servers [11].

The process of WUM [8] includes three phases, shown in Fig. 2:

• data preprocessing,
• pattern discovery and
• pattern analysis.

Fig.2 Web Usage Mining Process

Data Preprocessing
The purpose of data preprocessing is to extract useful data from the raw web
log and then transform these data into the form necessary for pattern
discovery. Because of the large amount of irrelevant information in the web
log, the original log cannot be used directly in the web log mining procedure.
Hence, in the data preprocessing phase, raw Web logs need to be cleaned,
analyzed and converted for the next step. The data recorded in server logs,
such as the user IP address, browser, viewing time, etc., are available to
identify users and sessions. However, because some page views may be cached by
the user’s browser or by a proxy server, the data collected in server logs are
not entirely reliable. This problem can be partly solved by using other kinds
of usage information such as cookies.
The data cleaning algorithm is as follows [10]:
Input: log_table
Output: refine_log_table
Begin
  Read records in log_table
  For each record in log_table
    Read the status code and URL fields
    If status code = 200 then
      If the URL suffix is in {*.gif, *.jpg, *.css, *.ico} then
        Discard the record
      Else
        Save the record's fields in refine_log_table
      End if
    End if
  Next record
End
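
The cleaning step above can be sketched in Python as follows. This is only a
minimal illustration under stated assumptions, not the implementation of [10]:
each log record is assumed to be a dict with the hypothetical keys `status` and
`url`, and the function name `clean_log` is chosen here for illustration.

```python
from urllib.parse import urlsplit

# Static-resource suffixes named in the cleaning algorithm.
IGNORED_SUFFIXES = (".gif", ".jpg", ".css", ".ico")


def clean_log(log_table):
    """Return only successful page requests from a list of log records.

    Each record is assumed to be a dict with 'status' and 'url' keys.
    """
    refined = []
    for record in log_table:
        if record["status"] != 200:
            continue  # skip failed or redirected requests
        # Compare the path only, so a query string does not hide the suffix.
        path = urlsplit(record["url"]).path.lower()
        if path.endswith(IGNORED_SUFFIXES):
            continue  # skip images, stylesheets and icons
        refined.append(record)
    return refined
```

Filtering on the URL path rather than the full URL is a design choice of this
sketch; the pseudocode itself only names the suffixes to be removed.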
The user identification algorithm is as follows [10]:
Input: refine_log_table
Output: identification of users
Begin
  Read records in refine_log_table
  For each record in refine_log_table do
    If current IP is not in ListOfIP then
      Add the current IP to ListOfIP
      Mark the whole record as a new user and assign a new userID
    Else
      Assign the existing userID for that IP
    End if
  Next record
End
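
A possible Python rendering of the user identification step is given below. It
is a sketch that assumes each cleaned record is a dict with a hypothetical `ip`
key; the function name `identify_users` and the numbering of user IDs from 1
are assumptions made here. Because it relies on the IP address alone, users
behind a shared proxy would receive the same ID.

```python
def identify_users(refined_log):
    """Assign a user ID to each record, one ID per distinct IP address."""
    user_ids = {}  # maps IP address -> assigned user ID
    labelled = []
    for record in refined_log:
        ip = record["ip"]
        if ip not in user_ids:
            # First time this IP is seen: treat it as a new user.
            user_ids[ip] = len(user_ids) + 1
        # Copy the record so the input list is left unchanged.
        labelled.append({**record, "user_id": user_ids[ip]})
    return labelled
```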

Pattern Discovery


Pattern discovery "draws upon methods and algorithms developed from several
fields such as statistics, data mining, machine learning and pattern recognition"
[9].

Several methods and techniques have already been developed for this step, as
summarized below:

• Statistical Analysis, such as frequency analysis, mean, median, etc.
• Clustering of users helps to discover groups of users with similar
navigation patterns (to provide personalized Web content).
• Classification is the technique of mapping a data item into one of several
predefined classes.
• Association Rules discover correlations among pages accessed together by a
client.
• Sequential Patterns extract frequently occurring inter-session patterns, in
which the presence of a set of items is followed by another item in time
order.
• Dependency Modeling determines whether there are any significant
dependencies among the variables in the Web domain.
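
As an illustration of the association rule technique listed above, the
following Python sketch counts how often pairs of pages occur in the same
session and reports support and confidence for each pair. It is a simplified
pairwise count, not a full Apriori implementation; the session format (a list
of page names per session), the function name `page_pair_rules` and the
default `min_support` value are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def page_pair_rules(sessions, min_support=0.5):
    """Return {(a, b): (support, confidence of a -> b)} for page pairs."""
    n = len(sessions)
    page_count = Counter()  # number of sessions containing each page
    pair_count = Counter()  # number of sessions containing each page pair
    for pages in sessions:
        unique = sorted(set(pages))  # ignore repeat visits within a session
        page_count.update(unique)
        pair_count.update(combinations(unique, 2))
    rules = {}
    for (a, b), count in pair_count.items():
        support = count / n
        if support >= min_support:
            rules[(a, b)] = (support, count / page_count[a])
            rules[(b, a)] = (support, count / page_count[b])
    return rules
```

Here support is the fraction of sessions containing both pages, and the
confidence of a -> b is the fraction of sessions containing a that also
contain b.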

Pattern Analysis

Pattern Analysis is the final stage of WUM (Web Usage Mining), which involves
the validation and interpretation of the mined patterns.

• Validation: eliminating the irrelevant rules or patterns and extracting the
interesting rules or patterns from the output of the pattern discovery
process.
• Interpretation: the output of mining algorithms is mainly in mathematical
form and is not suitable for direct human interpretation, so it must be
translated into a form that people can understand.

4.3 Web Logs


On the World Wide Web (WWW), logs of HTTP traffic are recorded continuously
by most origin web servers as well as by intermediate proxies. Web server logs
are plain text (ASCII) files that are independent of the server platform. A
Web log file [10] records activity information when a Web user submits a
request to a Web server. The main source of raw data is the web access log,
which we shall refer to as the log file. A sample entry is:
66.249.65.107 - - [08/Oct/2007:04:54:20 -0400] "GET /support.html HTTP/1.1" 200 11179 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

This entry contains the following information:

• Remote IP address or domain name: an IPv4 address is a 32-bit host address
defined by the Internet Protocol.
• Authuser: the username, if the server requires user authentication.
• Entering and exiting date and time.
• Modes of request: the GET, POST or HEAD method of HTTP.
• Status: the HTTP status code returned to the client, e.g., 200 is “OK” and
404 is “Not Found”.
• Bytes: the content length of the document transferred.
• Remote log and agent log.
• Remote URL.
• “Request”: the request line exactly as it came from the client.
• Requested URL.
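
To show how such an entry can be split into the fields listed above, here is a
small Python sketch. It assumes the log uses the Combined Log Format written by
common web servers such as Apache; the group names in the pattern and the
function name `parse_log_line` are chosen here for illustration and do not
come from the cited work.

```python
import re

# Pattern for one line of the Combined Log Format (an assumption of this sketch).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # Servers write "-" when no body was sent; treat that as zero bytes.
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    return fields
```

Lines that do not match the pattern return None, so a caller can skip or
count malformed entries during preprocessing.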

5. Proposed Work
5.1 Proposed Model

Web personalization is the process of customizing a Web site to the needs of a
specific user, taking into account the knowledge acquired from the analysis of
the user’s navigational behavior in correlation with other information
collected in the form of structure, content, and user profile data.
The proposed web personalization process can be divided into the following
phases:
1. Construction of web site link structure
2. Construction of user profile
3. Rule building
4. System restructuring

Fig.3 Architecture of proposed work (flow: Construction of web site link
structure → Web site usage by users → Construction of user profile → Rule
building → System restructuring)
1. Construction of website link structure
A repository of the web link structure is created in this phase. The web site
has a predefined set of links, each with its complete URL, arranged according
to the standard navigation map. The steps for this phase are:
Step 1- Make a complete navigation chart of the website, from which the web
link structure is created.
Step 2- Create a database table for the web link structure named WEB_LINK with
the following fields:
Link ID, Link Title, URL, Link Target
Step 3- Insert the whole navigation chart into this table.

2. Construction of user profile


A new user should register in the system before using it. Whenever a user
registers, a standard link structure is generated for him/her, and initially
the standard website with the standard navigation map is shown to the user.
Whenever the user visits a link, the web usage data is stored for further
personalization; this data forms the user profile. Traditionally,
personalization is performed using Web Usage Mining, which creates the user
profile on the basis of pattern discovery in the web access log. In the
proposed system, the user profile will be generated using the traditional web
access log as well as the proposed approach. In the proposed approach, the
user’s accesses are tracked according to the link structure as a count per
accessed link. The steps are as follows:
Step 1- Create a database table of registered users with the following fields:
User ID, Password, User Name, Email ID etc.
Step 2- Create a replica of the web link structure table named
WEB_LINK_USERID with the following fields:
Link ID, Link Title, URL, Link Target
Step 3- Create a table for the user profile named USER_PROFILE with the
following fields:
User ID, Link ID, Count
Step 4- Create a table for user profile details named USER_PROFILE_DETAIL with
the following fields:
User ID, Link ID, Date, Time, IP add, HTTP Code, Method, Agent
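
The profile tables described above could be created and updated roughly as
follows. This is a sketch only: it uses Python's built-in sqlite3 module as a
stand-in for the MySQL database named in the implementation details, it omits
the registered-user table and the WEB_LINK_USERID replica, and the column
names and the helper functions `create_schema` and `record_visit` are
assumptions introduced for illustration.

```python
import sqlite3


def create_schema(conn):
    """Create the link-structure and profile tables (column names assumed)."""
    conn.executescript("""
        CREATE TABLE WEB_LINK (
            link_id INTEGER PRIMARY KEY, link_title TEXT,
            url TEXT, link_target TEXT);
        CREATE TABLE USER_PROFILE (
            user_id INTEGER, link_id INTEGER,
            visit_count INTEGER NOT NULL DEFAULT 0,
            PRIMARY KEY (user_id, link_id));
        CREATE TABLE USER_PROFILE_DETAIL (
            user_id INTEGER, link_id INTEGER, visit_date TEXT, visit_time TEXT,
            ip_add TEXT, http_code INTEGER, method TEXT, agent TEXT);
    """)


def record_visit(conn, user_id, link_id, date, time, ip, code, method, agent):
    """Log one visit and increment the user's access count for the link."""
    conn.execute(
        "INSERT INTO USER_PROFILE_DETAIL VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (user_id, link_id, date, time, ip, code, method, agent))
    updated = conn.execute(
        "UPDATE USER_PROFILE SET visit_count = visit_count + 1 "
        "WHERE user_id = ? AND link_id = ?",
        (user_id, link_id)).rowcount
    if updated == 0:  # first visit to this link by this user
        conn.execute("INSERT INTO USER_PROFILE VALUES (?, ?, 1)",
                     (user_id, link_id))
    conn.commit()
```

Keeping the per-visit detail rows separate from the per-link counter mirrors
the split between USER_PROFILE_DETAIL and USER_PROFILE in the steps above.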

3. Rule building
In the proposed system, a rule set is defined to classify the web resources
frequently used by a user, for example by applying a threshold parameter to
find frequently visited pages. Web restructuring is applied after the usage of
a resource reaches the threshold value or after a specified number of logins.

4. System re-structuring
In this phase the system scans USER_PROFILE to find frequent patterns in the
usage data. It then restructures the WEB_LINK_USERID structure according to
the rules decided in phase 3.
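
One way to picture the rule-building and restructuring phases together is the
Python sketch below. It assumes a single threshold rule: links visited at
least `threshold` times are moved to the front of the user's navigation
order, most visited first. The function name `restructure_links`, the default
threshold of 5 and the "move to front" policy are illustrative assumptions;
the proposal does not fix these details.

```python
def restructure_links(links, access_counts, threshold=5):
    """Move links visited at least `threshold` times to the front.

    links: list of link IDs in the standard navigation order.
    access_counts: dict mapping link ID -> visit count for one user.
    """
    frequent = [l for l in links if access_counts.get(l, 0) >= threshold]
    # Most-visited links first; the sort is stable, so ties keep standard order.
    frequent.sort(key=lambda l: access_counts[l], reverse=True)
    others = [l for l in links if access_counts.get(l, 0) < threshold]
    return frequent + others
```

For example, with the standard order home, about, products, support, contact
and counts of 9 for products and 5 for support, the personalized order becomes
products, support, home, about, contact.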
The following figure explains the overall working of the proposed system:

Fig.4 Overall working of proposed work (elements: New user, Website, USER,
Usage data, WEB_LINK_USERID, Personalization, USER_PROFILE,
USER_PROFILE_DETAIL)

The proposed system will also perform traditional personalization using Web
Usage Mining on the web log, and the two approaches will be compared.

6. Implementation Details

Software Requirement:

• PHP
• HTML, DHTML, JavaScript, CSS
• MySQL
• XAMPP Server
• Platform: Windows 7 and higher

Hardware Requirement:

Processor: PIV or above
RAM: 1 GB or above
HDD: 160 GB or above
Modem: Any

7. References

[1] M. Albanese, A. Picariello, C. Sansone, L. Sansone, “A Web Personalization
System based on Web Usage Mining Techniques”, in Proc. of WWW2004, May 2004,
New York, USA.

[2] B. Mobasher, H. Dai, T. Luo, Y. Sung, J. Zhu, “Integrating web usage and
content mining for more effective Personalization”, in Proc. of the
International Conference on Ecommerce and Web Technologies (ECWeb2000),
Greenwich, UK, September 2000.

[3] Jiawei Han and Micheline Kamber, “Data Mining: Concepts and Techniques”,
2nd ed., Morgan Kaufmann Publishers, March 2006. ISBN 1-55860-901-6.

[4] K. Sridevi and Dr. R. Umarani, “Web Personalization Approaches: A Survey”,
International Journal of Advanced Research in Computer and Communication
Engineering, Vol. 2, Issue 3, March 2013.

[5] V. Anitha, Dr. P. Isaki, “A Survey on Predicting User Behavior Based on
Web Server Log Files in a Web Usage Mining”, IEEE, 2016.

[6] Faten Khalil, Jiuyong Li, Hua Wang, “Integrating Recommendation Models for
Improved Web Page Prediction Accuracy”, Conferences in Research and Practice
in Information Technology (CRPIT), Vol. 74, 2008.

[7] Rajesh K Shukla, Dr Sanjay Silakari, Dr P K Chande, “Existing Trends and
Techniques for Web Personalization”, IJCSI International Journal of Computer
Science Issues, Vol. 9, Issue 4, No 1, July 2012.

[8] Suneetha K.R & Dr. R. Krishnamoorthi, “Data Preprocessing and Easy
Access Retrieval of Data through Data Ware House”, Proceedings of the World
Congress on Engineering and Computer Science WCECS, Vol 1, 2009.

[9] Kobra Etminani, Mohammad-R. Akbarzadeh-T. & Noorali Raeeji Yanehsari,
“Web Usage Mining: users' navigational patterns extraction from web logs
using Ant-based Clustering Method”, IFSA-EUSFLAT 2009.

[10] G. Neelima, Dr. Sireesha Rodda, “Predicting user behavior through
Sessions using the Web log mining”, IEEE, 2016 (978-1-4673-8810-8).

[11] G. Neelima and Sireesha Rodda, “An Overview on Web Usage Mining”,
Springer International Publishing Switzerland, December 2015.

[12] Nasraoui, O., “World Wide Web Personalization”, Encyclopedia of Data
Mining and Data Warehousing, J. Wang, Ed., 2005, Idea Group.

[13] Mobasher, B., “Web Usage Mining and Personalization”, in Practical
Handbook of Internet Computing, M.P. Singh, Editor. 2004, CRC Press. p.
15.137.
