(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 8, November 2010
Implementation of Polynomial Neural Network in Web Usage Mining
S. Santhi
Research Scholar, Mother Teresa Women's University, Kodaikanal, India

Dr. S. Purushothaman
Principal, Sun College of Engineering and Technology, Nagarkoil, India
Abstract— Education, banking, various businesses, and many everyday human needs are now made available on the Internet, and the number of users and service providers of these facilities grows exponentially day by day. Users face the challenge of reaching their target among the enormous information on the web, while on the other side website owners strive to retain their visitors against their competitors. Personalized attention to a user is one of the best solutions to meet these challenges. Thousands of papers have been published on personalization; most differ in how they gather users' logs, how they preprocess the web logs, or in the mining algorithm. In this paper a simple codification is performed to filter the valid web logs. The codified logs are preprocessed with polynomial vector preprocessing and then trained with the back propagation algorithm. The computational effort is measured for various sets of usage logs. The results demonstrate that the algorithm performs better than conventional methods.
Keywords— web usage mining; back propagation algorithm; polynomial vector preprocessing
I. INTRODUCTION
Web users feel comfortable if they reach the desired web page with minimum navigation on a web site. A study of users' recent behavior on the web is useful for predicting their desired target page. Generally, users' browsing patterns are stored in the web logs of a web server. These patterns are learned through efficient algorithms to find the target page. The back propagation algorithm with polynomial vector preprocessing (BPAPVP) is implemented for learning the patterns. With the learned knowledge, various sets of users' browsing patterns are tested. The results are observed and presented as an analysis of the computational effort of the algorithm. The analysis of the results confirms the correctness of the algorithm. Thus BPAPVP leads to improved web usage mining compared with numerous conventional methods.
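Since the back propagation algorithm (BPA) is named above as the learner, a minimal sketch of a single-hidden-layer BPA may be helpful. This is a generic illustration, not the paper's configuration: the layer sizes, learning rate, weight ranges, and lack of bias terms are all assumptions made here for brevity.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class BPA:
    """One hidden layer, sigmoid activations, plain gradient descent.
    Illustrative only; the paper does not specify its architecture."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

    def forward(self, x):
        self.h = [sigmoid(sum(w * xi for w, xi in zip(row, x)))
                  for row in self.w1]
        return sigmoid(sum(w * h for w, h in zip(self.w2, self.h)))

    def train(self, x, target, lr=0.5):
        o = self.forward(x)
        delta_o = (o - target) * o * (1.0 - o)      # output-layer gradient
        for j, hj in enumerate(self.h):
            # hidden-layer gradient uses the pre-update output weight
            delta_h = delta_o * self.w2[j] * hj * (1.0 - hj)
            self.w2[j] -= lr * delta_o * hj          # hidden -> output update
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * delta_h * xi   # input -> hidden update
        return (o - target) ** 2                     # squared error before update
```

Repeatedly calling `train` on a pattern drives the squared error down, which is the behavior the training phase relies on.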
A. Literature Review
Michael Chau et al. [1] attempted to use a Hopfield net for web analysis. Web structure and content analysis are incorporated into the network through a new network design. Their algorithm performed better (70% accuracy) than traditional web search algorithms such as breadth-first search (42.6% accuracy) and best-first search (48.2% accuracy). David Martens et al. [2] proposed a new active-learning-based approach (ALBA) to extract comprehensible rules from opaque SVM models. They applied ALBA to several publicly available data sets and confirmed its predictive accuracy. Dilhan Perera et al. [3] mined data collected from students working in teams using an online collaboration tool in a one-semester software development project. Clustering was applied to find both groups of similar teams and similar individual members, and sequential pattern mining was used to extract sequences of frequent events. The results revealed interesting patterns characterizing the work of stronger and weaker students. Key results point to the value of analysis based on each resource and on individuals, rather than just at the group level. They also found that some key measures can be mined from early data, in time for these to be used by facilitators as well as by individuals in the groups. Some of the patterns are specific to their context (i.e., the course requirements and tool used); others are more generic and consistent with psychological theories of group work, e.g., the importance of group interaction and leadership for success. Edmond H. Wu et al. [4] introduced an integrated data warehousing and data mining framework for website management. The model focuses on the page, user, and time attributes to form a multidimensional model which can be frequently updated and queried. The experiments showed that the data model is effective and flexible for different analysis tasks. Guang-Bin Huang et al.
[5] proposed a simple learning algorithm capable of real-time learning, which can automatically determine the parameters of the network in a single pass. This learning algorithm was compared with the BP and k-NN algorithms on a data set of 4601 instances, each with 57 attributes. In the simulation, 3000 randomly selected instances compose the training set and all the rest are used for testing. RLA achieves good testing accuracy at very fast learning speed, whereas BP needs to spend 4641.9 s on learning, which is not realistic in such a practical real-time application. In the forest type prediction problem, 100,000 training data and 481012
http://sites.google.com/site/ijcsis/ ISSN 1947-5500
testing data have been taken. The testing time of k-NN can be as long as 26 hours, whereas RLA finished within 65.648 seconds. Incorporating a neural network (NN) into the supervised learning classifier system (UCS) [6] offers a good compromise between compactness, expressiveness, and accuracy. A simple artificial NN is used as the classifier's action, obtaining a more compact population size, better generalization, and the same or better accuracy while maintaining a reasonable level of expressiveness. Negative correlation learning (NCL) is also applied during the training of the resultant NN ensemble; NCL is shown to improve the generalization of the ensemble. Hongjun Lu et al. [7] proposed a neural network to extract concise symbolic rules with high accuracy. Although they improved the speed of network training by developing fast algorithms, the time required to extract rules by their neural network approach is still longer than the time needed by the decision tree approach. They tried to reduce the training time and improve the classification accuracy by reducing the number of input units through feature selection. James Caverlee et al. [8] presented the Thor framework for sampling, locating, and partitioning QA-Pagelets (Query-Answer pagelets) from the deep web, the large and growing collection of web-accessible databases. Their experiments showed that the proposed page clustering algorithm achieves low-entropy clusters and the sub-tree clustering algorithms identify QA-Pagelets with excellent precision and recall. Lotfi Ben Romdhane [9] extended a neural model for causal reasoning to mechanize the monotonic class. They developed the Unified Neural Explainer (UNEX) for causal reasoning (independent, incompatibility, and open). UNEX is mechanized by the use of fuzzy AND-ing networks, whose activation is based on a new principle called softmin. They considered a battery of 1000 random manifestations/cases; UNEX had a coverage ratio greater than 0.95 in 220 cases (22%).
Magdalini Eirinaki et al. [10] presented a survey of the use of web mining for web personalization. A review of the most common methods that are used, as well as technical issues that occur, is given, along with a brief overview of the most popular tools and applications available from software vendors. Mankuan Vai et al. [11] developed a systematic approach that creates a Hopfield network to represent qualitative knowledge about a system for analysis and reasoning. A simple six-node neural network is designed as a building block to capture basic qualitative relations. The objective of the transistor modelling technique is to determine the topology of an equivalent circuit and to extract its element values from the measured device data. The ultimate advantage of the neural network is its capability of being implemented as a parallel distributed processor, which removes the time-consuming factor of sequentially updating individual neurons. C. Porcel et al. [12] presented a new fuzzy linguistic recommender system that facilitates the acquisition of user preferences to characterize user profiles. They allowed users to provide their preferences by means of incomplete fuzzy linguistic preference relations. The user profile is completed with user preferences on the collaboration possibilities with other users. Therefore, this recommender system acts as a decision support system that makes decisions about both the resources that could be interesting for a researcher and his/her collaboration possibilities with other researchers to form interesting working groups. The experimental results showed user satisfaction with the received recommendations: the averages of the precision, recall, and F1 metrics (F1 is a combined metric that gives equal weight to precision and recall) are 67.50%, 61.39%, and 63.51%, respectively.
Ranieri Baraglia et al. [13] proposed a recommender system that helps users navigate the web by providing dynamically generated links to pages that have not been visited and are of potential interest. They contributed a privacy-enhanced recommender system that allows for creating serendipitous recommendations without breaching users' privacy. They stated that a system is privacy-safe if two conditions hold: (i) the user activity cannot be tracked, and (ii) the user activity cannot be inferred. They conducted a set of experiments to assess the quality of the recommendations. Sankar K. Pal et al. [14] summarized the different types of web mining and their basic components, along with their current state of the art. The limitations of existing web mining methods and tools are explained, and the relevance of soft computing is illustrated through examples and diagrams. Tianyi et al. [15] examined the problem of optimal partitioning of customer bases into homogeneous segments for building better customer profiles and presented the direct grouping approach as a solution. That approach partitions the customers not based on computed statistics and particular clustering algorithms, but by directly combining transactional data of several customers and building a single model of customer behaviour on that combined data. They formulated the optimal partitioning problem as a combinatorial optimization problem and showed that it is NP-hard. Then, among three suboptimal polynomial-time direct grouping methods, Iterative Merge (IM), Iterative Growth (IG), and Iterative Reduction (IR), the IM method was shown to provide the best performance. It is shown that the best direct grouping method significantly dominates the statistics-based and one-to-one approaches across most of the experimental conditions, while still being computationally tractable.
It is also shown that the distribution of the sizes of customer segments generated by the best direct grouping method follows a power law distribution and that micro-segmentation provides the best approach to personalization. Vir V. Phoha et al. [16] developed a new learning algorithm for fast web page allocation on a server using the self-organizing properties
of the neural network (NN). They compared the performance of the algorithm with round-robin (RR). As the number of input objects increases, the algorithm achieves a hit ratio close to 0.98, whereas the RR scheme never achieves more than 0.4. Xiaozhe Wang et al. [17] proposed a concurrent neuro-fuzzy model to discover and analyze useful knowledge from the available web log data. They made use of the cluster information generated by a self-organizing map for pattern analysis, and a fuzzy inference system to capture the chaotic trend and provide short-term (hourly) and long-term (daily) web traffic trend predictions. Yu-Hui et al. [18] explored a new data source called intentional browsing data (IBD) for potentially improving the effectiveness of WUM applications. IBD is a category of online browsing actions such as "copy", "scroll", or "save as" that is not recorded in web log files. Consequently, this research aims to build a basic understanding of IBD, which will lead to its easy adoption in WUM research and practice. Specifically, the paper formally defines IBD and clarifies its relationship with other browsing data. Zhicheng Dou et al. [19] developed an evaluation framework based on real query logs to enable large-scale evaluation of personalized search. They evaluated five algorithms: (i) a click-based algorithm (P-Click), (ii) long-term user topic interests (L-Topic), (iii) short-term interests (S-Topic), (iv) a hybrid of L-Topic and S-Topic (LS-Topic), and (v) group-based personalization (G-Click). They found that no personalization algorithm can outperform the others for all queries and concluded that different methods have different strengths and weaknesses. Zi Lu et al. [20] reviewed related research results in this area and their practical significance for a comprehensive explanation of various effect functions based on utility theory. They used data on Internet development in China and related intelligent decision models to calculate the effect function.
Based on the findings, they explained the features of the effect of website information flow on realistic human flow from various aspects. Research results showed that the effect of website information flow can be divided into substitution and enhancement, so that the relationship of the website information flow in guiding the human flow changes from one-dimensional to multi-dimensional morphology. They indicated that, on one hand, website information flow is lagged to some extent, but is enhanced gradually and grows faster than realistic human flow; on the other hand, by comparing the evolution trends of the intensities of the two functions, it can be seen that the enhancement function occurs later than the substitution, but develops faster and has greater force. Following comparison between the simulation value and the actual value, it is shown that the effect of website information flow is basically in line with the relationship of realistic human flow. These results can support government and business in making decisions on web information publication. Through the comparison between the enhancement effect and the substitution effect, they found that the substitution and enhancement effects of website information flow on realistic human flow exist simultaneously. The development trend of the enhancement effect is quicker than that of the substitution effect, and the enhancement effect is stronger. The information flow guiding human flow in the initial period of the network economy suggests that the substitution effect is stronger, and in the later period the enhancement effect is stronger and quicker.

II. PROBLEM DEFINITION
Users' browsing patterns are gathered from the web server, and only the valid logs are extracted, i.e., logs that do not contain robots.txt, .jpg, .gif, etc. and that do not represent unsuccessful requests. These logs are codified with the metadata of the web site. The codified patterns are then applied to the polynomial vector for preprocessing, and the preprocessed data are fed to the back propagation algorithm for training on the usage patterns. Machine-learning-based web usage mining assumes no statistical information about the web logs. This work falls under the category of supervised learning, employing a two-phase strategy: a) training phase and b) testing phase. In the training phase, the original logs are codified by simple substitution of a unique page_id in place of the page name for all successful HTML requests, and are interpolated by preprocessing into a polynomial vector. The n-dimensional patterns are reduced by inner products to obtain 2-dimensional vectors, which are trained by a neural classifier to learn the nature of the logs. BPA takes the role of the neural classifier in this work. By training the classifier on a specific user's logs, reasonably accurate suggestions can be derived. In the testing phase, various users' logs are supplied to the trained classifier to decide which page_id is to be suggested. The flow charts of both phases are given in Figure 1(a) and Figure 1(b).

Fig. 1(a) Training Phase
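The filtering, codification, and preprocessing steps above can be sketched as follows. This is an illustrative reconstruction under stated assumptions: the Common Log Format field positions, the exclusion list, and the degree-2 monomial expansion are all assumptions made here, since the paper does not give the exact codification scheme or polynomial form.

```python
from itertools import combinations_with_replacement

# Markers of invalid requests named in the text (assumed matching rule).
EXCLUDE = ("robots.txt", ".jpg", ".gif")

def is_valid(log_line):
    """Keep only successful (status 200) requests that are not robot or
    image fetches. Assumes Common Log Format field positions."""
    parts = log_line.split()
    url, status = parts[6], parts[8]
    return status == "200" and not any(t in url.lower() for t in EXCLUDE)

def codify(urls, page_ids):
    """Substitute each page name with a unique page_id, assigned on
    first sight, as described for the training phase."""
    return [page_ids.setdefault(u, len(page_ids) + 1) for u in urls]

def polynomial_vector(x, degree=2):
    """Expand a codified pattern into all monomials up to `degree`
    (one plausible form of 'polynomial vector preprocessing')."""
    feats = [1.0]  # bias term
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(range(len(x)), d):
            term = 1.0
            for i in combo:
                term *= x[i]
            feats.append(term)
    return feats
```

For example, `polynomial_vector([2, 3])` yields `[1.0, 2.0, 3.0, 4.0, 6.0, 9.0]` (bias, linear, and quadratic terms). The inner-product reduction to 2-dimensional vectors mentioned above is not reproduced here, since the paper does not specify the reference vectors involved.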
 