Web Personalization by Kanish
1. Introduction
Content on the Web is growing rapidly across many domains, and so is the need
to identify and retrieve exactly the content that users are looking for.
Predicting user needs has therefore become essential to improving the usability
of a Web site. In brief, Web Personalization can be defined as any action that
adapts the information or services provided by a web site to an individual
user, or a set of users, based on knowledge acquired from their navigational
behavior, as recorded in the web site’s logs. This information is often
combined with the content and the structure of the web site as well as the
user’s interests and preferences. Using these sources of information as input
to pattern discovery techniques, the proposed system tailors the provided
content to the needs of each visitor of the website. The personalization
process can result in the dynamic generation of suggestions, the creation of
pages according to the needs of the user, or the highlighting of existing
hyperlinks that the user is likely to need. Most of the earlier research
efforts in Web
Retrieving the most relevant information from the Web is difficult because of
the huge number of documents available in various formats. Users have to go
through a long list of snippets to pick the relevant one, which is a
time-consuming process, and user satisfaction becomes secondary. One approach
to satisfying the requirements of the user is to personalize the information
available on the Web, called Web Personalization. Web Personalization is the
process that adapts the information or services provided by a Web site to the
needs of a specific user or set of users, using the knowledge gained about
those users. Web Personalization can be a solution to the information overload
problem, as its objective is to provide users with what they really want or
need, without their having to ask or search for it explicitly. It is a
multidisciplinary area that brings data together to produce personalized
output for individual users or groups of users. This approach helps
researchers improve the efficiency of Information Retrieval (IR) systems.
Web Mining is the mining of data on the World Wide Web, and it supplies the
processing needed to personalize that data. The Web data may be of the
following kinds:
• Content of the Web pages (actual Web content)
• Inter-page structure
• Usage data, which describes how the web pages are accessed by users
• User profiles, which hold information collected about users (cookies/session data)
With personalization, the content of web pages is modified to better fit user
needs. This may involve actually creating web pages that are unique per user,
or using the desires of a user to determine which web documents to retrieve.
Personalization can also target a group of customers with specific interests,
based on their visits to a website. Personalization techniques include the use
of cookies, the use of databases, and machine learning strategies.
Personalization can be viewed as a type of Clustering, Classification, or even
Prediction [3][4].
Web mining [5] comprises three basic categories:
1. Web Content Mining
2. Web Structure Mining
3. Web Usage Mining
2. Motivation
Considering the amount of data and the variety of users on the World Wide Web,
keyword-based search results may not provide the relevant information to the
user, as each user’s intention is different and may not be reflected in the
keywords they use. For these reasons, web personalization has attracted many
researchers seeking a mechanism to understand the user better and provide the
most relevant information. Users may not have time to fill in data describing
their interests, likes, dislikes, background, educational qualifications, etc.
(the explicit method). Many web mining researchers have worked on this
challenge and provided a few techniques for automatic personalization. The
best-known example to date is Amazon, where users need not give their details
and the system fetches the relevant information for them. Today, the internet
has become part and parcel of our lives, and one cannot imagine a world without
it; every day millions of people use the internet for various purposes, mostly
for information. Users are often unhappy with the amount of information they
are given, because it needs further filtering, which is very time-consuming;
they expect the system to understand their thoughts. Understanding the user is
not as simple as it sounds, and web personalization is one step towards that
goal.
3. Problem Statement
Recommender systems are also used in the news reading domain. Tapestry,
GroupLens, PHOAKS, WebMate, Alipes, Personal View Agent (PVA) and Lotus Notes
are among the systems that suggest interesting news to readers. Another
interesting application of personalization is in adaptive hypermedia systems.
For example, WebWatcher helps its users by modifying the pages that they
browse. Web personalization systems have also been used for e-learning.
4. Literature Survey
4.1 Web Personalization
The content on the Web in various fields is rapidly increasing and the need for
identifying and retrieving the content exactly based on the needs of the users is
The above architecture uses the Web site’s structure, the Web logs created by
observing the user’s navigational behavior, and the User Profiles created
according to the user’s preferences, along with the Web site’s content, to
analyze and extract the information needed to find the patterns relevant to
the user. This analysis creates a recommendation model which is presented to
the user.
The phases of web personalization [7][12][13] are:
1. Collection of Web data / user profiling
2. Preprocessing of Web data
3. Log analysis and web usage mining
4. Decision making / final recommendation phase
The Web contains a huge number of web sites. A web site usually contains a
great amount of information distributed across hundreds of pages. Without
proper guidance, a visitor often wanders aimlessly without visiting important
pages, loses interest and leaves the site sooner than expected. This
consideration is the basis of the great interest in Web Usage Mining in both
the academic and the industrial world. On an online shopping or e-commerce
site, if the user does not find the pages he requires, he will simply switch to
another web site. To avoid this, we have to find the user’s frequent access
patterns from his previous visits or history. Web Usage Mining generates user
access patterns from the web log records. These web log records are stored in
the web server or
The process of WUM [8] includes three phases, shown in Fig. 2.1:
• data preprocessing,
• pattern discovery and
• pattern analysis.
Data Preprocessing
The purpose of data preprocessing is to extract useful data from the raw web
log and then transform these data into the form necessary for pattern
discovery. Because of the large amount of irrelevant information in the web
log, the original log cannot be used directly in the web log mining procedure;
hence, in the data preprocessing phase, raw Web logs need to be cleaned,
analyzed and converted for the next step. The data recorded in server logs,
such as the user IP address, browser, viewing time, etc., can be used to
identify users and sessions. However, because some page views may be cached by
the user’s browser or by a proxy server, the data collected in server logs are
not entirely reliable. This problem can be partly solved by using other kinds
of usage information such as cookies.
The data cleaning algorithm is as follows [10]:
Input: log_table
Output: refine_log_table
Begin
  Read records in log_table
  For each record in log_table
    Read fields (Status code)
    If Status code = 200 Then
      Get all fields
      If suffix.URL_Link = {*.gif, *.jpg, *.css, *.ico} Then
        Remove suffix.URL_Link
      Else
        Save fields in new table
      End if
    End if
    Next record
  End for
End
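The cleaning step above can be sketched in Python as follows. This is a minimal illustration under assumptions, not the implementation from [10]: the record layout (a dict with "ip", "url" and "status" keys) and the function name clean_log are hypothetical, and the sketch assumes that requests for image, stylesheet and icon files are dropped while all other successful requests are kept.

```python
# Suffix list taken from the data cleaning algorithm in the text.
IGNORED_SUFFIXES = (".gif", ".jpg", ".css", ".ico")


def clean_log(log_table):
    """Keep only successful (status 200) requests for non-resource URLs.

    Each record is assumed to be a dict with "ip", "url" and "status" keys
    (a hypothetical layout chosen for this sketch).
    """
    refined = []
    for record in log_table:
        if record["status"] != 200:
            continue  # skip failed or redirected requests
        # Ignore any query string when checking the file suffix.
        path = record["url"].split("?", 1)[0].lower()
        if path.endswith(IGNORED_SUFFIXES):
            continue  # skip images, stylesheets and icons
        refined.append(record)
    return refined


log_table = [
    {"ip": "10.0.0.1", "url": "/index.html", "status": 200},
    {"ip": "10.0.0.1", "url": "/logo.gif", "status": 200},
    {"ip": "10.0.0.2", "url": "/missing.html", "status": 404},
    {"ip": "10.0.0.2", "url": "/style.css?v=2", "status": 200},
]
refine_log_table = clean_log(log_table)
print([r["url"] for r in refine_log_table])  # ['/index.html']
```

Only the first record survives: the second and fourth are resource files, and the third did not return status 200.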
The user identification algorithm is as follows [10]:
Input: refine_log_table
Output: identification of user
Begin
  Read records in refine_log_table
  For each record in refine_log_table do
    If current IP is not in ListOfIP then
      add the current IP to ListOfIP, mark the whole record as a new user
      and assign a new userID
    else
      assign the old userID
    End if
  End for
End
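A minimal Python sketch of this IP-based user identification is given below. The record layout (dicts with an "ip" key), the function name identify_users and the sequential numbering of user IDs are assumptions made for illustration. Since the approach relies on the IP address alone, visitors who share a proxy would be treated as a single user.

```python
def identify_users(refine_log_table):
    """Assign a user_id to every record, one ID per distinct IP address."""
    user_ids = {}  # maps IP -> userID; plays the role of ListOfIP
    labelled = []
    for record in refine_log_table:
        ip = record["ip"]
        if ip not in user_ids:
            # First time this IP is seen: treat it as a new user.
            user_ids[ip] = len(user_ids) + 1
        # Copy the record and attach the (new or existing) user ID.
        labelled.append({**record, "user_id": user_ids[ip]})
    return labelled


records = [
    {"ip": "10.0.0.1", "url": "/index.html"},
    {"ip": "10.0.0.2", "url": "/index.html"},
    {"ip": "10.0.0.1", "url": "/support.html"},
]
for row in identify_users(records):
    print(row["ip"], row["user_id"])
```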
Pattern Discovery
Pattern discovery "draws upon methods and algorithms developed from several
fields such as statistics, data mining, machine learning and pattern recognition"
[9].
Several methods and techniques have already been developed for this step as
summarized below:
Pattern Analysis
Pattern Analysis is the final stage of WUM (Web Usage Mining), which involves
the validation and interpretation of the mined pattern.
On the World Wide Web (WWW), logs of HTTP traffic are recorded continuously by
most origin web servers as well as intermediate proxies. Web server logs are
plain text (ASCII) files that are independent of the server platform. A Web
log file [10] records activity information when a Web user submits a request
to a Web server. The main source of raw data is the web access log, which we
shall refer to as the log file.
66.249.65.107 - - [08/Oct/2007:04:54:20 -0400] "GET /support.html HTTP/1.1" 200 11179 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
• Status: The HTTP status code returned to the client, e.g., 200 is “ok” and 404
is “not found”.
• Remote URL.
• Requested URL.
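To illustrate how such a log entry can be split into fields, the following Python sketch parses one line with a regular expression. It assumes the common combined log layout (IP, identity, user, timestamp, request, status, size, referrer, user agent); the pattern, the field names and the function parse_log_line are illustrative choices, and the sample line is a cleaned-up copy of the entry above in which the empty referrer field is assumed to be "-".

```python
import re

# Assumed layout: ip ident user [time] "method url protocol" status size
# optionally followed by "referrer" "user agent".
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_log_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])  # numeric HTTP status code
    return fields


line = (
    '66.249.65.107 - - [08/Oct/2007:04:54:20 -0400] '
    '"GET /support.html HTTP/1.1" 200 11179 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)
entry = parse_log_line(line)
print(entry["ip"], entry["url"], entry["status"])
```

The resulting dictionaries have the "ip", "url" and "status" keys that a cleaning or user identification step would need.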
5. Proposed Work
5.1 Proposed Model
11 Shri Ram Institute of Science & Technology, Jabalpur
“An Intelligent Approach of Web Personalization Using Web Access Mining”
Fig.3 Architecture of proposed work (phases shown: construction of web site link structure, construction of user profile, rule building, system restructuring)
1. Construction of website link structure
A repository of the web link structure is created in this phase. The web site
has a predefined set of links, each with its complete URL according to the
standard navigation map. The steps for this phase are:
Step 1 - Make a complete navigation chart of the website, from which the web
link structure will be created.
Step 2 - Create a database for the web link structure named WEB_LINK with the
following fields:
Link ID, Link Title, URL, Link Target
Step 3 - Insert the whole navigation chart details into this database.
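The three steps above can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the MySQL database of the proposed system, and the column names, column types, sample links and the helper build_web_link are assumptions (in particular, "Link Target" is interpreted here as the HTML target of the link).

```python
import sqlite3


def build_web_link(conn, navigation_chart):
    """Create the WEB_LINK table (Step 2) and load the navigation chart (Step 3)."""
    conn.execute(
        """
        CREATE TABLE WEB_LINK (
            link_id     INTEGER PRIMARY KEY,
            link_title  TEXT NOT NULL,
            url         TEXT NOT NULL,
            link_target TEXT
        )
        """
    )
    conn.executemany(
        "INSERT INTO WEB_LINK (link_id, link_title, url, link_target) "
        "VALUES (?, ?, ?, ?)",
        navigation_chart,
    )
    conn.commit()


# Step 1: a hypothetical navigation chart (link ID, title, URL, target).
navigation_chart = [
    (1, "Home", "/index.php", "_self"),
    (2, "Products", "/products.php", "_self"),
    (3, "Support", "/support.php", "_self"),
]

conn = sqlite3.connect(":memory:")
build_web_link(conn, navigation_chart)
for row in conn.execute("SELECT link_id, link_title, url FROM WEB_LINK ORDER BY link_id"):
    print(row)
```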
3. Rule building
In the proposed system, a rule set is defined to classify the web resources
frequently used by a user, for example by applying a threshold parameter to
find frequently visited pages. Web restructuring is applied once the usage of a
resource reaches the threshold value or after a specified number of logins.
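A threshold rule of this kind might look like the Python sketch below. The function name frequent_pages, the list-of-URLs input and the threshold value are hypothetical; the source does not specify how visits are stored or which threshold is used.

```python
from collections import Counter


def frequent_pages(visits, threshold):
    """Return the pages visited at least `threshold` times, most visited first."""
    counts = Counter(visits)
    return [page for page, n in counts.most_common() if n >= threshold]


# Hypothetical visit history of a single user.
visits = [
    "/support.php",
    "/index.php",
    "/support.php",
    "/products.php",
    "/support.php",
    "/index.php",
]
print(frequent_pages(visits, threshold=2))  # ['/support.php', '/index.php']
```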
4. System re-structuring
In this phase the system scans USER_PROFILE to find frequent patterns in the
usage data. It then restructures the WEB_LINK_USERID structure according to
the rules decided in phase 3.
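One possible reading of this restructuring step is sketched below in Python: links whose visit count reaches the threshold are moved to the front of the user's link list. The source does not define what the restructuring does in detail, so the reordering strategy, the function restructure_links and the data layout (a list of link dicts plus a per-user visit count taken from USER_PROFILE) are all assumptions.

```python
def restructure_links(web_link, visit_counts, threshold):
    """Return a per-user link order with frequently visited links first.

    web_link:     list of dicts with a "url" key (rows of WEB_LINK)
    visit_counts: dict mapping URL -> number of visits by this user
    threshold:    minimum visit count for a link to be promoted
    """
    frequent = [l for l in web_link if visit_counts.get(l["url"], 0) >= threshold]
    others = [l for l in web_link if visit_counts.get(l["url"], 0) < threshold]
    # Most visited first; the stable sort keeps the original order for ties.
    frequent.sort(key=lambda l: visit_counts[l["url"]], reverse=True)
    return frequent + others


web_link = [
    {"title": "Home", "url": "/index.php"},
    {"title": "Products", "url": "/products.php"},
    {"title": "Support", "url": "/support.php"},
]
visit_counts = {"/index.php": 2, "/products.php": 1, "/support.php": 3}

personalized = restructure_links(web_link, visit_counts, threshold=2)
print([link["title"] for link in personalized])  # ['Support', 'Home', 'Products']
```

The returned list stands in for the user-specific WEB_LINK_USERID structure; links below the threshold keep their original navigation order.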
The following figure explains the overall working of the proposed system:
Fig.4 Overall working of proposed work (elements shown: New user, Website USER, USER_PROFILE, USER_PROFILE_DETAIL)
6. Implementation Details
Software Requirement:
PHP
MySQL
XAMPP Server
Hardware Requirement:
7. References
[2] B. Mobasher, H. Dai, T. Luo, Y. Sung, J. Zhu, “Integrating web usage and
content mining for more effective personalization”, in Proc. of the
International Conference on E-commerce and Web Technologies (ECWeb2000),
Greenwich, UK, September 2000.
[3] Jiawei Han and Micheline Kamber, “Data Mining: Concepts and Techniques”,
2nd ed., Morgan Kaufmann Publishers, March 2006. ISBN 1-55860-901-6.
[5] V. Anitha, Dr. P. Isaki, “A Survey on Predicting User Behavior Based on Web
Server Log Files in a Web Usage Mining”, IEEE, 2016.
[6] Faten Khalil, Jiuyong Li, Hua Wang, “Integrating Recommendation Models for
Improved Web Page Prediction Accuracy”, Conferences in Research and Practice
in Information Technology (CRPIT), Vol. 74, 2008.
[8] Suneetha K.R. and Dr. R. Krishnamoorthi, “Data Preprocessing and Easy
Access Retrieval of Data through Data Ware House”, Proceedings of the World
Congress on Engineering and Computer Science (WCECS), Vol. 1, 2009.
[11] G. Neelima and Sireesha Rodda, “An Overview on Web Usage Mining”,
Springer International Publishing Switzerland, December 2015.