
(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 8, November 2010, ISSN 1947-5500, http://sites.google.com/site/ijcsis/

**Implementation of Polynomial Neural Network in Web Usage Mining**

S.Santhi

Research Scholar, Mother Teresa Women's University, Kodaikanal, India

Dr. S. Purushothaman

Principal, Sun College of Engineering and Technology, Nagercoil, India

Abstract—Education, banking, various businesses and other essential human needs are now available on the Internet. Day by day, the numbers of users and of service providers of these facilities are growing exponentially. Users face the challenge of reaching their targets among the enormous information on the web, while web site owners strive to retain their visitors against their competitors. Personalized attention to a user is one of the best solutions to meet these challenges, and thousands of papers have been published about personalization, most distinct either in how they gather users' logs, how they preprocess the web logs, or in the mining algorithm. In this paper, a simple codification is performed to filter the valid web logs. The codified logs are preprocessed with polynomial vector preprocessing and then trained with the back propagation algorithm. The computational effort is calculated for various sets of usage logs. The results demonstrate the superiority of the algorithm over conventional methods.

**Keywords**— web usage mining; back propagation algorithm; polynomial vector preprocessing

I. INTRODUCTION

Web users feel comfortable if they reach the desired web page with minimum navigation on a web site. A study of users' recent behaviour on the web is useful for predicting their desired target page. Generally, users' browsing patterns are stored in the web logs of a web server. These patterns are learned through efficient algorithms to find the target page. The Backpropagation Algorithm with Polynomial Vector Preprocessing (BPAPVP) is implemented for learning the patterns. With the learned knowledge, various sets of users' browsing patterns are tested, and the results are observed and presented as an analysis of the computational effort of the algorithm. The analysis of the results demonstrates the correctness of the algorithm. Thus BPAPVP leads to improved web usage mining compared with numerous conventional methods.

A. Literature Review

Michael Chau et al. [1] attempted to use a Hopfield net for web analysis. Web structure and content analysis are incorporated into the network through a new network design. Their algorithm performed better (70% accuracy) than traditional web search algorithms such as

breadth-first search (42.6% accuracy) and best-first search (48.2% accuracy). David Martens et al. [2] proposed a new active-learning-based approach (ALBA) to extract comprehensible rules from opaque SVM models. They applied ALBA to several publicly available data sets and confirmed its predictive accuracy. Dilhan Perera et al. [3] mined data collected from students working in teams with an online collaboration tool in a one-semester software development project. Clustering was applied to find both groups of similar teams and similar individual members, and sequential pattern mining was used to extract sequences of frequent events. The results revealed interesting patterns characterizing the work of stronger and weaker students. Key results point to the value of analysis based on each resource and on individuals, rather than just at the group level. They also found that some key measures can be mined from early data, in time for facilitators as well as individuals in the groups to use them. Some of the patterns are specific to their context (i.e., the course requirements and the tool used); others are more generic and consistent with psychological theories of group work, e.g., the importance of group interaction and leadership for success. Edmond H. Wu et al. [4] introduced an integrated data warehousing and data mining framework for website management. The model focuses on the page, user and time attributes to form a multidimensional data model which can be frequently updated and queried. Experiments showed that the data model is effective and flexible for different analysis tasks. Guang-Bin Huang et al. [5] proposed a simple learning algorithm capable of real-time learning, which can automatically determine the parameters of the network in a single pass. This learning algorithm was compared with the BP and k-NN algorithms. There are 4601 instances, and each instance has 57 attributes. In the simulation, 3000 randomly selected instances compose the training set and all the rest are used for testing. RLA achieves good testing accuracy at very fast learning speed, whereas BP needs to spend 4641.9 s on learning, which is not realistic in such a practical real-time application. In the forest type prediction problem, 100,000 training data and 481,012


testing data have been taken. The testing time of k-NN can be as long as 26 hours, whereas RLA finished within 65.648 seconds. Incorporating a neural network (NN) into the supervised learning classifier system (UCS) [6] offers a good compromise between compactness, expressiveness, and accuracy. A simple artificial NN is used as the classifier's action, yielding a more compact population size, better generalization and the same or better accuracy while maintaining a reasonable level of expressiveness. Negative correlation learning (NCL) is also applied during the training of the resultant NN ensemble, and NCL is shown to improve the generalization of the ensemble. Hongjun Lu et al. [7] proposed a neural network approach to extract concise symbolic rules with high accuracy. Although they improved the speed of network training by developing fast algorithms, the time required to extract rules by their neural network approach is still longer than the time needed by the decision tree approach. They tried to reduce the training time and improve the classification accuracy by reducing the number of input units through feature selection. James Caverlee et al. [8] presented the Thor framework for sampling, locating and partitioning QA-Pagelets (Query-Answer pagelets) from the deep web, the large and growing collection of web-accessible databases. Their experiments showed that the proposed page clustering algorithm achieves low-entropy clusters and that the sub-tree clustering algorithms identify QA-Pagelets with excellent precision and recall. Lotfi Ben Romdhane [9] extended a neural model for causal reasoning to mechanize the monotonic class. He developed the Unified Neural Explainer (UNEX) for causal reasoning (independent, incompatibility and open). UNEX is mechanized by the use of fuzzy AND-ing networks whose activation is based on a new principle called softmin. A battery of 1000 random manifestations/cases was considered. UNEX had a coverage ratio greater than 0.95 in 220 cases (22%). Magdalini Eirinaki et al. [10] presented a survey of the use of web mining for web personalization. A review of the most common methods and the technical issues that occur is given, along with a brief overview of the most popular tools and applications available from software vendors. Mankuan Vai et al. [11] developed a systematic approach that creates a Hopfield network to represent qualitative knowledge about a system for analysis and reasoning. A simple six-node neural network is designed as a building block to capture basic qualitative relations. The objective of the transistor modelling technique is to determine the topology of an equivalent circuit and to extract its element values from the measured device data. The ultimate advantage of the neural network lies in its capability of implementation as a parallel distributed processor, which removes the time-consuming factor of sequentially updating individual neurons. C. Porcel et al. [12] presented a new fuzzy

linguistic recommender system that facilitates the acquisition of user preferences to characterize user profiles. Users provide their preferences by means of an incomplete fuzzy linguistic preference relation. The user profile is completed with user preferences about collaboration possibilities with other users. This recommender system therefore acts as a decision support system that makes decisions about both the resources that could interest a researcher and his or her collaboration possibilities with other researchers to form interesting working groups. The experimental results showed user satisfaction with the received recommendations: the averages of the precision, recall and F1 metrics (F1 is a combination metric that gives equal weight to precision and recall) are 67.50%, 61.39% and 63.51%, respectively. Ranieri Baraglia et al. [13] proposed a recommender system that helps users navigate the web by providing dynamically generated links to pages that have not been visited and are of potential interest. They contributed a privacy-enhanced recommender system that creates serendipitous recommendations without breaching user privacy. They state that a system is privacy-safe if two conditions hold: (i) the user activity cannot be tracked, and (ii) the user activity cannot be inferred. They conducted a set of experiments to assess the quality of the recommendations. Sankar K. Pal et al. [14] summarized the different types of web mining and their basic components, along with their current state of the art. The limitations of existing web mining methods and tools are explained, and the relevance of soft computing is illustrated through examples and diagrams. Tianyi Jiang et al. [15] examined the problem of optimal partitioning of customer bases into homogeneous segments for building better customer profiles, and presented the direct grouping approach as a solution. That approach partitions customers not on the basis of computed statistics and particular clustering algorithms, but by directly combining the transactional data of several customers and building a single model of customer behaviour on the combined data. They formulated the optimal partitioning problem as a combinatorial optimization problem and showed that it is NP-hard. Three suboptimal polynomial-time direct grouping methods, Iterative Merge (IM), Iterative Growth (IG), and Iterative Reduction (IR), were then proposed; among them, the IM method provides the best performance. It is shown that the best direct grouping method significantly dominates the statistics-based and one-to-one approaches across most of the experimental conditions, while still being computationally tractable. It is also shown that the distribution of the sizes of the customer segments generated by the best direct grouping method follows a power law, and that micro-segmentation provides the best approach to personalization. Vir V. Phoha et al. [16] developed a new learning algorithm for fast web page allocation on a server using the self-organizing properties


of the neural network (NN). They compared the performance of the algorithm with round-robin (RR). As the number of input objects increases, the algorithm achieves a hit ratio close to 0.98, whereas the RR scheme never achieves more than 0.4. Xiaozhe Wang et al. [17] proposed a concurrent neuro-fuzzy model to discover and analyze useful knowledge from available web log data. They made use of the cluster information generated by a self-organizing map for pattern analysis, and of a fuzzy inference system to capture the chaotic trend and provide short-term (hourly) and long-term (daily) web traffic trend predictions. Yu-Hui Tao et al. [18] explored a new data source called intentional browsing data (IBD) for potentially improving the effectiveness of WUM applications. IBD is a category of online browsing actions, such as "copy", "scroll" or "save as", that is not recorded in web log files. Consequently, this research aims to build a basic understanding of IBD, which will lead to its easy adoption in WUM research and practice. Specifically, the paper formally defines IBD and clarifies its relationship with other browsing data. Zhicheng Dou et al. [19] developed an evaluation framework based on real query logs to enable large-scale evaluation of personalized search. They evaluated five algorithms: (i) a click-based algorithm (P-Click), (ii) long-term user topic interests (L-Topic), (iii) short-term interests (S-Topic), (iv) a hybrid of L-Topic and S-Topic (LS-Topic), and (v) group-based personalization (G-Click). They found that no personalization algorithm can outperform the others for all queries, and concluded that different methods have different strengths and weaknesses. Zi Lu et al. [20] reviewed related research results in this area and their practical significance for a comprehensive explanation of various effect functions based on utility theory. They used data on Internet development in China and related intelligent decision models to calculate the effect function. Based on the findings, they explained the features of the effect of website information flow on realistic human flow from various aspects. The results showed that the effect of website information flow can be divided into substitution and enhancement, so that the relationship of website information flow in guiding human flow changes from one dimension to a multi-dimensional morphology. On one hand, website information flow lags to some extent, but is enhanced gradually and grows faster than realistic human flow; on the other hand, comparing the evolution trends of the intensities of the two functions shows that the enhancement function occurs later than substitution, but develops faster and with greater force. A comparison between simulated and actual values showed that the effect of website information flow is basically in line with the relationship of realistic human flow. These results can support government and business in making decisions on web information publication. Through the comparison between the enhancement and substitution effects, they found that the substitution and enhancement effects of website information flow on realistic human flow exist simultaneously. The development trend of the enhancement effect is quicker than that of the substitution effect, and the enhancement effect is stronger: in the initial period of the network economy the substitution effect is stronger, and in the later period the enhancement effect is stronger and quicker.

II. PROBLEM DEFINITION

Users' browsing patterns are gathered from the web server, and only the valid logs are extracted, i.e., the logs that do not contain robots.txt, .jpg, .gif, etc., or unsuccessful requests. These logs are codified with the metadata of the web site. The codified patterns are then applied to the polynomial vector for preprocessing, and the preprocessed data are fed to the back propagation algorithm for training on the usage patterns. Machine-learning-based web usage mining assumes no statistical information about the web logs. This work falls under the category of supervised learning and employs a two-phase strategy: a) a training phase and b) a testing phase. In the training phase, the original logs are codified by simple substitution of a unique page_id for each page name for all successful HTML requests, and are interpolated by preprocessing into a polynomial vector. The n-dimensional patterns are inner-producted to obtain two-dimensional vectors, which are trained by a neural classifier to learn the nature of the logs; the BPA takes the role of the neural classifier in this work. By training the classifier on a specific user's logs, reasonably accurate suggestions can be derived. In the testing phase, various users' logs are supplied to the trained classifier to decide which page-id is to be suggested. The flow charts of both phases are given in Figure 1(a) and Figure 1(b).
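The page_id substitution described above can be illustrated with a short sketch. The table excerpt and the 12-column zero padding mirror Table I and Table II later in the paper, but the helper names here are hypothetical:

```python
# Hypothetical excerpt of the site's codification table (page name -> page_id),
# following Table I; the full table maps 52 pages.
PAGE_IDS = {"index.html": 1, "aboutus.html": 2, "Dissertation.html": 3, "Whatwedo.html": 4}

def codify(session_pages, pad_to=12):
    """Substitute unique page_ids for page names in one user's session.
    Pages outside the table are dropped; the sequence is zero-padded to a
    fixed width, matching the 12-column rows of Table II."""
    ids = [PAGE_IDS[p] for p in session_pages if p in PAGE_IDS]
    return ids + [0] * (pad_to - len(ids))
```

For example, `codify(["index.html", "aboutus.html", "Whatwedo.html"])` yields `[1, 2, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0]`, the same shape as one row of Table II.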

Fig. 1(a) Training Phase


Fig. 1(b) Testing Phase

III. IMPLEMENTATION

The simulation of personalization through web usage mining has been implemented using MATLAB 7®. Sample sets of logs were taken from ProtechSC's web server. These logs are filtered and codified. Table II gives sample codified logs obtained after codification of the extended log format. Each number refers to a webpage; the % symbol marks a comment, and the number after the percent sign is the line number. Users' patterns over 50 days have been collected: 25 patterns have been used for training and the remaining patterns for testing.

A. Filtering the Log File

The web logs are collected from the web server of www.protechsc.net. A sample web log file of this site is given in Fig. 2.

TABLE I - CODIFICATION TABLE OF WWW.PROTECHSC.NET

| PageName | Code | PageName | Code |
|---|---|---|---|
| index.html | 1 | Fingerprint_ANN.html | 26 |
| aboutus.html | 2 | FacialRecog.html | 27 |
| Dissertation.html | 3 | ObjectRecog.html | 28 |
| Whatwedo.html | 4 | HarmonicAnalysis.html | 29 |
| Projecttopics.html | 5 | ImageCompression.html | 30 |
| Services.html | 6 | ImageDeconvolution.html | 31 |
| consultation.html | 7 | Intrusion.html | 32 |
| Contactus.html | 8 | ImageCompression.html | 33 |
| PaymentDetails.html | 9 | ImageRestoration.html | 34 |
| Enquies&Comment.html | 10 | ObjectTracing.html | 35 |
| Algorithm | 11 | DigitalModulation.html | 36 |
| Flowchart | 12 | EDM_Matching.html | 37 |
| Submit | 13 | CuttingTool.html | 38 |
| SpeechSeparation.html | 14 | ToolWear.html | 39 |
| WaveletPackett.html | 15 | PowerForecasting.html | 40 |
| PwdAuthentication.html | 16 | RemoteSpeaker.html | 41 |
| OFDM_Frequency.html | 17 | SpeakerIdentification.html | 42 |
| CharRecog.html | 18 | SegmentationTextures.html | 43 |
| CarotidArtery.html | 19 | Steganalysis.html | 44 |
| AnalysisMRI.html | 20 | Steganagraphy.html | 45 |
| BPA_Char.html | 21 | SoftSecurity.html | 46 |
| DirectSearch.html | 22 | SurvillanceRobot.html | 47 |
| Detect_micro classfication.html | 23 | TransmitterPlacement.html | 48 |
| Cloud_Contamination.html | 24 | TextureSegmentation.html | 49 |
| Info_retrieval | 25 | ImageRecovery.html | 50 |
| | | WoodDefect.html | 51 |
| | | 3DFacial.html | 52 |

Figure 2. Sample web logs of www.protechsc.net

The filtering process is as follows:

Step 1: Select the logs which contain neither robots.txt nor requests for image files.
Step 2: Group the logs by IP address.
Step 3: Codify the requested pages using the codification table (Table I).
Step 4: Store only the IP address and visited page-ids into the database, and make use of them for the polynomial preprocessing.

These steps are pictorially presented in Figure 3.
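These steps might be sketched as follows, assuming a common-log-format layout for the ProtechSC logs (the actual field layout and the helper names here are assumptions, not taken from the paper):

```python
import re
from collections import defaultdict

# Common-log-format pattern: IP, identity, user, [timestamp], "GET path ...", status.
# The real ProtechSC log layout may differ; this regex is an assumption.
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "GET (\S+) [^"]*" (\d{3})')

def filter_logs(lines):
    """Steps 1-2: keep only successful, non-robot, non-image requests,
    grouped by IP address. Codification (Step 3) is applied afterwards."""
    sessions = defaultdict(list)
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        ip, page, status = m.groups()
        if status != "200":
            continue                                   # drop unsuccessful requests
        if "robots.txt" in page or re.search(r"\.(jpg|gif|png)$", page, re.I):
            continue                                   # drop robot and image requests
        sessions[ip].append(page.rsplit("/", 1)[-1])   # keep page names per IP
    return dict(sessions)
```

Each per-IP list of page names can then be mapped to page-ids through Table I before polynomial preprocessing.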


Figure 3. Filtering the Logs

TABLE II - CODIFIED WEBPAGE DETAILS OF A USER

a = [1, 2, 3, 4, 5, 6, 7, 8,13, 0, 0, 0; %1
     1, 2, 4, 5,14, 9,10,11,13, 8, 3, 0; %2
     1, 2, 4, 5,14,10,11, 5,15,10,13, 4; %3
     5,15, 9,10,11,12,13, 6, 8, 7, 0, 0; %4
     5, 7, 8, 3, 4, 6, 0, 0, 0, 0, 0, 0; %5
     5,16, 9,10,11,12, 3, 7, 8, 0, 0, 0; %6
     5,17,10,11, 3, 6, 8, 1, 0, 0, 0, 0; %7
     5,26, 9,10,11,12, 5,18, 9,10,11,12; %8
     2, 3, 5,27,10,11,12, 6, 8, 0, 0, 0; %9
     2, 4, 7, 5,19,10,11,12,13, 0, 0, 0; %10
     2, 6, 5, 3, 4, 0, 0, 0, 0, 0, 0, 0; %11
     3, 4, 5, 6, 7, 8, 5,32, 9,10,11, 0; %12
     3, 5, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0; %13
     3, 7, 5,45, 9,10,11,12, 0, 0, 0, 0; %14
     1, 4, 5, 7, 8, 3, 6, 5,38, 9,10,11; %15
     4, 6, 3, 5,41, 9,10,11,12,13, 8, 1; %16
     4, 5,18, 9,10,11,12, 2, 0, 0, 0, 0; %17
     4, 6, 5, 8, 3, 5, 0, 0, 0, 0, 0, 0; %18
     6, 3, 5, 7, 8, 1, 0, 0, 0, 0, 0, 0; %19
     6, 7, 4, 3, 5,22, 5,34, 5,17, 9, 8; %20
     1, 4, 5,22, 9,10,11, 3, 0, 0, 0, 0; %21
     2, 4, 8, 5,29, 5,32, 5,40, 9,10,11; %22
     3, 6, 7, 5,14, 5,16, 5, 4, 0, 0, 0; %23
     7, 8, 3, 4, 5, 0, 0, 0, 0, 0, 0, 0; %24
     1, 3, 4, 5,14, 9,10,11,12, 2,13, 0] %25

B. Polynomial Interpolation

Polynomial interpolation is the interpolation of a given navigation pattern by a polynomial set obtained by taking the outer product of the given navigation sequence. Polynomial interpolation forms the basis for comparing information between two points, and the pre-processing generates a polynomial decision boundary. The pre-processing of the input vector is done as follows. Let X represent the normalized input vector,

X = Xi ; i = 1, ..., nf (1)

where Xi is the i-th feature of the input vector and nf is the number of features (nf = 11). An outer-product matrix Xop of the original input vector is formed:

      | X1X1  X1X2  X1X3  ...  X1X11 |
      | X2X1  X2X2  X2X3  ...  X2X11 |
Xop = | X3X1  X3X2  X3X3  ...  X3X11 |  (2)
      |  ...   ...   ...  ...   ...  |
      | X11X1 X11X2 X11X3 ... X11X11 |

Using the Xop matrix, the following polynomials are generated:

(i) Product of inputs (NL1), denoted by Σ wij Xi Xj (i ≠ j): the off-diagonal elements of the outer-product matrix. (3) The pre-processed input vector is 55-dimensional.

(ii) Quadratic terms (NL2), denoted by Σ wii Xi²: the diagonal elements of the outer-product matrix. (4) The pre-processed input vector is 11-dimensional.

(iii) A combination of products of inputs and quadratic terms (NL3), denoted by Σ wij Xi Xj (i ≠ j) + Σ wii Xi²: the diagonal and off-diagonal elements of the outer-product matrix. (5) The pre-processed input vector is 66-dimensional.

(iv) Linear plus NL1 (NL4): the pre-processed input vector is 66-dimensional. (6)

(v) Linear plus NL2 (NL5): the pre-processed input vector is 22-dimensional. (7)

(vi) Linear plus NL3 (NL6): the pre-processed input vector is 55-dimensional. (8)

In NL4, NL5 and NL6, the term 'linear' represents the normalized input pattern without pre-processing. When the network is trained with a fixed pre-processing of the input vector, the number of iterations required to reach the desired MSE is less than that required without pre-processing. The combinations of the different pre-processing methods with the synaptic weight update algorithm are shown in Table III; BPA weight updates have been used with fixed pre-processed input vectors for learning.

C. Back Propagation Algorithm

A neural network is constructed from highly interconnected processing units (nodes or neurons) which perform simple mathematical operations. Neural networks are characterized by their topology, their weight vectors and the activation functions used in the hidden layers and the output layer. The topology refers to the number of hidden layers and the connections between nodes in the hidden layers. The activation functions that can be used are the sigmoid, hyperbolic tangent and sine. Network models can be static or dynamic; static networks include single-layer and multilayer perceptrons. A perceptron, or adaptive linear element (ADALINE), refers to a computing unit and forms the basic building block for neural networks. The input to a perceptron is the summation of input pattern vectors weighted by weight vectors. In most applications one hidden layer is sufficient. The activation function used to train the artificial neural network here is the sigmoid function.

1) Training

1. Read the log files and filter them.
2. Separate the data into inputs and targets.
3. Preprocess the data with any of the NL polynomials.
4. Calculate the principal component vector by Z = Z * Zᵀ (9), where Z denotes the cleaned logs.
5. Train the BPA.

5.a Forward propagation
(i) The weights of the network are initialized.
(ii) The inputs and outputs of a pattern are presented to the network.
(iii) The output of each node in the successive layers is calculated:
O (output of a node) = 1 / (1 + exp(-Σ Wij Xi)) (10)
(iv) The error of a pattern is calculated:
E(p) = (1/2) Σ (d(p) - o(p))² (11)

5.b Backward propagation
(i) The error for the nodes in the output layer is calculated:
δ(output layer) = o(1 - o)(d - o) (12)
(ii) The weights between the output layer and the hidden layer are updated:
W(n+1) = W(n) + η δ(output layer) o(hidden layer) (13)
(iii) The error for the nodes in the hidden layer is calculated using the updated weights between the hidden and output layers:
δ(hidden layer) = o(1 - o) Σ δ(output layer) W (14)
(iv) The weights between the hidden layer and the input layer are updated:
W(n+1) = W(n) + η δ(hidden layer) o(input layer) (15)

The above steps complete one weight update. The second pattern is then presented and the same steps yield the second weight update. When all the training patterns have been presented once, a cycle of iteration, or epoch, is completed.
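The pre-processing and the forward/backward steps above can be sketched in a few lines. This is a minimal illustration only: the paper's experiments used MATLAB 7, and NumPy, the layer sizes, the learning rate and the toy AND target below are illustrative assumptions, not the paper's code or data:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def polynomial_features(x):
    """NL1, NL2 and NL3 from the outer-product matrix (Eqs. (2)-(5)):
    off-diagonal products, squared terms, and their concatenation."""
    xop = np.outer(x, x)                          # nf x nf outer-product matrix
    iu = np.triu_indices(len(x), k=1)             # unique off-diagonal pairs
    nl1 = xop[iu]                                 # 55 values when nf = 11
    nl2 = np.diag(xop)                            # 11 values when nf = 11
    return nl1, nl2, np.concatenate([nl1, nl2])   # NL3: 66 values when nf = 11

def train_bpa(X, D, hidden=5, eta=0.5, epochs=5000, seed=1):
    """One-hidden-layer back propagation following Eqs. (10)-(15)."""
    rng = np.random.default_rng(seed)
    W1 = rng.uniform(-0.5, 0.5, (X.shape[1], hidden))   # step (i): init weights
    W2 = rng.uniform(-0.5, 0.5, (hidden, D.shape[1]))
    for _ in range(epochs):
        for x, d in zip(X, D):                   # one epoch presents every pattern
            h = sigmoid(x @ W1)                  # Eq. (10), hidden-layer outputs
            o = sigmoid(h @ W2)                  # Eq. (10), network output
            delta_o = o * (1 - o) * (d - o)      # Eq. (12)
            W2 += eta * np.outer(h, delta_o)     # Eq. (13)
            delta_h = h * (1 - h) * (W2 @ delta_o)   # Eq. (14), uses updated W2
            W1 += eta * np.outer(x, delta_h)     # Eq. (15)
    return W1, W2

def predict(X, W1, W2):
    return sigmoid(sigmoid(X @ W1) @ W2)

# Toy usage: learn logical AND of two inputs; a constant bias column is
# appended, a common practice the paper does not spell out.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
D = np.array([[0], [0], [0], [1]], dtype=float)
W1, W2 = train_bpa(X, D)
```

For an 11-feature input such as one row of Table II, `polynomial_features` returns vectors of length 55, 11 and 66, matching the NL1, NL2 and NL3 dimensions stated above.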
The errors of all the training patterns are calculated and displayed on the monitor as the mean squared error (MSE).

2) Testing

1. Read the filtered logs and separate them into inputs and targets.
2. Preprocess the data with a polynomial function.
3. Process with the final weights of the BPA.
4. Generate the suggestions from the output layer.

5. Present the suggestions through templates.

IV. RESULTS AND DISCUSSION

Figure 4 presents the mean squared error and classification performance of the BPA without preprocessing of the input vectors. Fig. 5 to Fig. 10 present the MSE and classification performance of the BPA with preprocessed input vectors. The computational effort, mean squared error, and iterations required for the various algorithms are presented in Table IV. From Table III, it can be noted that the algorithm (BPA + NL2) requires the least computational effort to achieve a minimum of 80% classification.

V. CONCLUSION

In this work, a preprocessing approach has been implemented for an ANN to learn web usage patterns. The number of arithmetic operations required to train the network with a pre-processed input vector is higher, indicating greater computational effort per iteration, but the number of iterations required is less than that for training without pre-processing. The classification performance after preprocessing is better than that of the network trained without pre-processing. The proposed method has yet to be tried on different types of web sites.

REFERENCES

[1] M. Chau and H. Chen, "Incorporating Web Analysis into Neural Networks: An Example in Hopfield Net Searching," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 37, No. 3, pp. 352-358, May 2007.
[2] David Martens, Bart Baesens, and Tony Van Gestel, "Decompositional Rule Extraction from Support Vector Machines by Active Learning," IEEE Transactions on Knowledge and Data Engineering, Vol. 21, No. 2, pp. 178-191, February 2009.
[3] Dilhan Perera, Judy Kay, Irena Koprinska, Kalina Yacef, and Osmar R. Zaïane, "Clustering and Sequential Pattern Mining of Online Collaborative Learning Data," IEEE Transactions on Knowledge and Data Engineering, Vol. 21, No. 6, pp. 759-772, June 2009.
[4] Edmond H. Wu, Michael K. Ng, and Joshua Z. Huang, "A Data Warehousing and Data Mining Framework for Web Usage Management," Communications in Information and Systems, Vol. 4, No. 4, pp. 301-324, 2004.
[5] Guang-Bin Huang, Qin-Yu Zhu, and Chee-Kheong Siew, "Real-Time Learning Capability of Neural Networks," IEEE Transactions on Neural Networks, Vol. 17, No. 4, pp. 863-878, July 2006.
[6] Hai H. Dam, Hussein A. Abbass, Chris Lokan, and Xin Yao, "Neural-Based Learning Classifier Systems," IEEE Transactions on Knowledge and Data Engineering, Vol. 20, No. 1, pp. 26-39, January 2008.
[7] Hongjun Lu, Rudy Setiono, and Huan Liu, "Effective Data Mining Using Neural Networks," IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, pp. 957-961, December 1996.
[8] James Caverlee and Ling Liu, "QA-Pagelet: Data Preparation Techniques for Large-Scale Data Analysis of the Deep Web," IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 9, pp. 1247-1261, September 2005.


[9] Lotfi Ben Romdhane, "A Softmin-Based Neural Model for Causal Reasoning," IEEE Transactions on Neural Networks, Vol. 17, No. 3, pp. 732-744, May 2006.
[10] Magdalini Eirinaki and Michalis Vazirgiannis, "Web Mining for Web Personalization," ACM Transactions on Internet Technology, Vol. 3, No. 1, pp. 1-27, February 2003.
[11] Mankuan Vai and Zhimin Xu, "Representing Knowledge by Neural Networks for Qualitative Analysis and Reasoning," IEEE Transactions on Knowledge and Data Engineering, Vol. 7, No. 5, pp. 683-690, October 1995.
[12] C. Porcel and E. Herrera-Viedma, "Dealing with incomplete information in a fuzzy linguistic recommender system to disseminate information in university digital libraries," Knowledge-Based Systems, Vol. 23, pp. 40-47, 2010.
[13] Ranieri Baraglia and Fabrizio Silvestri, "Dynamic Personalization of Web Sites Without User Intervention," Communications of the ACM, Vol. 50, No. 2, pp. 63-67, February 2007.
[14] Sankar K. Pal and Pabitra Mitra, "Web Mining in Soft Computing Framework: Relevance, State of the Art and Future Directions," IEEE Transactions on Neural Networks, Vol. 13, No. 5, pp. 1163-1176, September 2002.
[15] Tianyi Jiang and Alexander Tuzhilin, "Improving Personalization Solutions through Optimal Segmentation of Customer Bases," IEEE Transactions on Knowledge and Data Engineering, Vol. 21, No. 3, pp. 305-320, March 2009.
[16] Vir V. Phoha, S. Sitharama Iyengar, and Rajgopal Kannan, "Faster Web Page Allocation with Neural Networks," IEEE Internet Computing, pp. 18-26, November-December 2002.
[17] Xiaozhe Wang, Ajith Abraham, and Kate A. Smith, "Intelligent Web Traffic Mining and Analysis," Journal of Network and Computer Applications, Vol. 28, pp. 147-165, 2005.
[18] Yu-Hui Tao, Tzung-Pei Hong, and Yu-Ming Su, "Web Usage Mining with Intentional Browsing Data," Expert Systems with Applications, pp. 1893-1904, 2008.
[19] Zhicheng Dou, Ruihua Song, Ji-Rong Wen, and Xiaojie Yuan, "Evaluating the Effectiveness of Personalized Web Search," IEEE Transactions on Knowledge and Data Engineering, Vol. 21, No. 8, pp. 1178-1190, August 2009.
[20] Zi Lu, Ruiling Han, and Jie Duan, "Analyzing the effect of website information flow on realistic human flow using intelligent decision models," Knowledge-Based Systems, Vol. 23, pp. 40-47, 2010.

Figure 5. MSE and percentage of correctly proposed webpages using (BPA+NL1) with preprocessing of the input vector (Table II)

Figure 6. MSE and percentage of correctly proposed webpages using (BPA+NL2) with preprocessing of the input vector (Table II)

Figure 4. MSE and percentage of correctly proposed webpages using BPA without preprocessing of the input vector (Table II)

Figure 7. MSE and percentage of correctly proposed webpages using (BPA+NL3) with preprocessing of the input vector (Table II)


Figure 8. MSE and percentage of correctly proposed webpages using (BPA+NL4) with preprocessing of the input vector (Table II)

Figure 9. MSE and percentage of correctly proposed webpages using (BPA+NL5) with preprocessing of the input vector (Table II)

Figure 10. MSE and percentage of correctly proposed webpages using (BPA+NL6) with preprocessing of the input vector (Table II)

AUTHORS PROFILE

S. Santhi received her B.Sc. and M.Sc. degrees in Computer Science from the University of Madras and Alagappa University in 1997 and 2000, respectively. She completed her M.Phil. in Computer Science at Mother Teresa Women's University in 2003. Her areas of research include Data Mining and Neural Networks.

Dr. S. Purushothaman is working as a professor at Sun College of Engineering, Nagercoil, India. He received his Ph.D. from IIT Madras. His areas of research include Artificial Neural Networks, Image Processing and Signal Processing. He has published more than 50 research papers in national and international journals.

