Project On ML in Python

A Project Report On Book Recommendation System Submitted to RAJIV GANDHI UNIVERSITY OF KNOWLEDGE AND TECHNOLOGIES, RK VALLEY, KADAPA in partial fulfillment of the requirements for the award of the Degree of BACHELOR OF TECHNOLOGY IN ELECTRONICS AND COMMUNICATION ENGINEERING Submitted by L.Hussaina R170417 G.Prasuna R170097 over ney DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING RAJIV GANDHI UNIVERSITY OF KNOWLEDGE TECHNOLOGIES (Catering the Educational Needs of Gifted Rural Youth of AP) R.K. Valley, Vempalli (M), Kadapa(dist)- 516330 2019-2023RAJIV GANDHI UNIVERSITY OF KNOWLEDGE, ‘TECHNOLOGIES (Catering the Educational Needs of Gifted Rural Youth of AP) R.K. Valley, Vempalli(M), Kadapa(dist)— 516330 2019-2023 DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING os, . RGUKT R.K.Vallley CERTIFICATE ‘This is to certify that the project report entitled “ Book Recommendation System” a bonafide record of the project work done and submitted by L.HUSSAINA —_(R170417) G.PRASUNA (R170097) for the partial fulfillment of the requirements for the award of B.Tech. Degree in ELECTRONICS AND COMMUNICATION ENGINEERING,RGUKT, RK Valley, Kadapa. GUIDE ASSITANT PROFESSOR B.SHAIKMOHAMMAD RAFI P.JANARDHANA REDDY INTERNAL EXAMINER EXTERNAL EXAMINERDECLARATION ‘We here by declare that the project report entitled “Book Recommendation System submitted to the Department of ELECTRONICS AND COMMUNICATION) ENGINEERING in partial fulfillment of requirements for the award of the degree of BACHELOR OF TECHNOLOGY. This project is the result of our own effort and that it has not been submitted to any other University or Institution for the award of any degree or diploma other than specified above. L. Hussaina (R170417) G.Prasuna(R170097)ACKNOWLEDGEMENTS We are thankful to our guide Rafi sir and Head of the Department of ELECTRONICS AND COMMUNICATION ENGINEERING, Mr. B. MADHAN MOHAN for his valuable, guidance and encouragement. His helping attitude and suggestions have helped us in the successful completion of the project. ‘We have great pleasure in expressing our hearty thanks to our beloved Director Mrs. Sandhya Rani for spending her valuable time with us to complete this project. Successful completion of any project cannot be done without proper support and encouragement, We sincerely thanks to the Management for providing all the necessary facilities during the course of study. We would like to thank our parents and friends, who have the greatest contributions in all our achievements, for the great care and blessings in making us successful in all our endeavors. Yours Sincerely| L.-Hussaina(R170417) G.Prasuna (R170097)Book Recommendation System Abstract Now-a-days, everyone depending on reviews by others in many things such as selecting a ‘movie to watch, buying products, reading a book. Recommender system are used for that purpose only. A recommender system is a kind of filtering system that predicts a user's rating of an item, Recommender systems recommend items to users by filtering through a large database of information using a ranked list of predicted ratings of items. Online Book recommender system is a recommender system for ones who love books. When selecting a book to read, individuals read and rely on the book ratings and reviews that previous users have written. In this paper, Hybrid Recommender system is used in which Collaborative Filtering and Content-Based Filtering techniques are used. The author used Collaborative techniques such as Clustering in which data-points are grouped into clusters. Algorithms such ‘as K-means clustering and Gaussian mixture are used for clustering, The better algorithm was selected with the help of silhouette score and used for clustering. Matrix Factorization technique such as Truncated-SVD which takes sparse matrix as input is used for reducing the features of a dataset. Content Based Filtering System used TF-IDF vectorizer which took statements as input and return a matrix of vectors. RMSE (Root Mean Square Error) is used for finding the deviation of an absolute value from an obtained value and that value is used for finding the fundamental accuracy.CONTENTS LINTRODUCTION 1.1 Introduction 1.2 Motivation of the work 1.3 Problem Statement 2.Methodology 3.IMPLEMENTATION 3.1 Reading and loading of the Datasets 3.2 Preprocessing of the data 3.3 Create Piviot Table 3.4 Vectorizing the data 3.5 Calculating Eucledian distance using Cosine similarity 3.6 Generating output using recommendation algorithm, ‘4.Algorithms and Techniques 4.1 LogisticRegression 4,2 Extra Tree Regressor 4.3 DecisionTreeClassifier 4.4 RandomForestClassifier 4.5 Ada Boost Regressor 5.RESULTS AND DISCUSSION 6.Condusion 7.References\LINTRODUCTION 1.1 Introduction Now-a-days, online rating and reviews are playing an important role in books sales. Readers were buying books depend on the reviews and ratings by the others. Recommender | system focuses on the reviews and ratings by the others and filters books. In this paper, Hybrid! |recommender system is used to boost our recommendations. The technique used by ‘recommender systems is Collaborative filtering, This technique filters information by |collecting data from other users. Collaborative filtering systems apply the similarity index- based technique. The ratingsof those items by the users who have rated both items determine |the similarity of the items. The similarity of users is determined by the similarity of the ratings |given by the users to an item. Content-based filtering uses the description of the items and |gives recommendations which are similar to the description of the items. With these two |filtering systems, books are recommended not only based on the user’s behaviour but also with the content of the books. So, our recommendation system recommends books to the new users |also. In this recommender system, books are recommended based on collaborative filtering technique and similar books are shown using content based filtering, 1.2 Motivation of the work In the present world, all products are buying based on the reviews and ratings by the |others, There are so many products with high rating and reviews but we only put our interest in some products which we like. Recommender system works on this principle only i.e. it recommends products based on the interest of the users. Our idea is to create recommender | system that recommends books based on the user's interest i.e. we recommends books which |are similar to the books that user already liked. It can also recommend books which are liked [by similar users. Similar users are those who liked the books which are liked by the current luser. 1.3 Problem Statement Recommending books using Machine learning algorithm is the main goal of this project. |Books are recommended by the clustering model and we are going to train and build using various features such as user’s rating, book description, book titles etc. The system groups users into clusters so that each data point within cluster is similar and dissimilar to the data |point in the other cluster. The system we would like to develop will also be able to find an Javerage rating for each cluster and it is going to find top rated books of users from each |dluster. All these books shortlisted by our system will be used for training our model in future. ‘The prediction model needs to be trained so as to produce better results.2.Methodology ‘The goal of this step is to find and acquire all the related datasets or data sources. In this, ‘step, the main aim is to identify various available data sources, as data are often collected from various online sources like databases and files. The size and the quality of the data in the collected dataset will determine the efficiency of the model. Data Pre-processing ‘The goal of this step is to study and understand the nature of data that was acquired in the previous step and also to know the quality of data. In this step, we will check for any null values and remove them as they may affect the efficiency, Identifying duplicates in the dataset and removing them is also done in this step. Feature Extraction After pre-processing the acquired data, the next step is to reduce the features i.e. Dimensionality reduction. The reduced features should be able to give high efficiency. ‘We used Matrix Factorization technique such as Truncated SVD which takes sparse ‘matrix as input for reduction of features. ‘Training Methods LogisticRegression Extra Tree Regressor DecisionTreeClassifier RandomForestClass ‘Ada Boost Regressor Testing Data Once Clustering model has been trained on pre-processed dataset, then the model is tested using different data points. In this testing step, the model is checked for the silhouette score for checking goodness of clustering, All the training methods need to be verified for finding out the best model to be used. 3.IMPLEMENTATION 3.1 Reading and loading of the Datasets Reading and loading of dataset using pandas 3.2 Preprocessing of the data Perform preprocessing of data such as displaying first &last rows,checking shape etc 3.3 Create Piviot Table Creating Pivot Table 3.4 Vectorizing the data ‘We have Vectorized the data for better performance 3.5 Calculating Eucledian distance using Cosine similarity Calculated Eucledian distance using Cosine similarity3.6 Generating output using recommendation algorithm. We have generated a output using recommendation algorithms 4.Algorithms and Techniques 4.1 LogisticRegression Since the outcome is binary and we have a reasonable number of examples at our disposal compared to number of features, this approach seems suitable. At the core of this method is a logistic or sigmoid function that quantifies the difference between each prediction and its corresponding true value.When presented with a number of inputs, it assigns different ‘weights to features (based on their relative importance). Since for this data it already knows the output beforehand, it continuously adjusts the ‘weights such that when these weights summed up with their features are introduced in the logistic function, the results are as near as possible to the actual ones. Once presented with a test value, it again inserts the value into our logistic function and returns the output as a number between 0 and 1, which represents the probability of that test value being in a Particular class. 4,2 Extra Tree Regressor The extra trees algorithm, like the random forests algorithm, creates many decision trees, but the sampling for each tree is random, without replacement. This creates a dataset for each tree with unique samples. A specific number of features, from the total set of features, are also selected randomly for each tree. 4.3 DecisionTreeClas fier Decision Tree is one of the most powerful and popular algorithm. Decision-tree algorithm falls under the category of supervised learning algorithms. It works for both continuous as well as categorical output variables. A decision tree is a flowchart-like tree structure where an internal node represents a feature(or attribute), the branch represents a decision rule, and each leaf node represents the outcome. The topmost node in a decision tree is known as the root node. It learns to partition on the basis of the attribute value. It partitions the tree in a recursive manner called recursive partitioning. This flowchart-like structure helps you in decision-making. It' visualization like a flowchart diagram which easily mimics the human level thinking. That is why decisiontrees are easy to understand and interpret. The basic idea behind any decision tree algorithm is as follows: 1. Select the best attribute using Attribute Selection Measures (ASM) to split the records. 2. Make that attribute a decision node and breaks the dataset into smaller subsets. 3, Start tree building by repeating this process recursively for each child until one of the conditions will match: + All the tuples belong to the same attribute value. + There are no more remaining attributes. + There are no more instances. 4.4 RandomForestClassifier ‘The random forest algorithm can also be thought of as an ensemble method in machine learning. The input to a random forest algorithm is a dataset consisting of records, with attributes. Random subsets of the input are created. On each of the random subset created, a decision tree will be constructed. The final class of a test record will be decided by the algorithm which uses the majority vote technique. Random forest algorithm makes use of the out of bag error technique.Each tree is constructed using the following algorithm: 1. Let the number of training cases be N, and the number of variables in the classifier be M. 2. We are told the number m of input variables to be used to determine the decision at a nodeof the tree;m should be much less than M. 3. Choose a training set for this tree by selecting N times with replacement from all N available training cases (i.e. take a bootstrap sample). Use the rest of the cases to estimate the error of the tree, by predicting their classes. 4, For each node in the tree, randomly choose m variables on which to base the decision at that node, Calculate the best split based on these m variables in the training set. 5, Each tree is fully grown and not pruned (as may be done in constructing a normal tree Classifier) 45 Ada Boost Regressor An AdaBoost regressor is a meta-estimator that begins by fitting a regressor on the original dataset and then fits additional copies of the regressor on the same dataset but where the weights of instances are adjusted according to the error of the current prediction,5.RESULTS AND DISCUSSION Table 11. Root Mean Squire and Accuracy on Validation Data SNo. Models Root mean square error ‘Accuracy on validation data 1 R 156556.45293730005, (0.022388308037899925 2 om 20691,78703623191, 0.9829226515205226 3 PER 25373.003231378854 0.9743215863591789 4 Em 211339154013095 0.982185059749982 5 8 35891,80415270158 0.9186175250138749 6 x8 29863.671852777836 0.9644277897507563, 7 BR 23021 588532473676 0.9771753318099993 Here We have created a Book recommandation system with Decision tree model of highest accuracy 98.216% 6.Conclusion In these methods colloborative filtering is the most used model and in the methods used to achireve this colloborative feeding decision tree regression algorithm is used because of its highest accuracy and low square mean error....Here we created a recommendation systems for book recommendation based on popularity ratings and interests of the user with an accuracy of 98.216 percent. 7.References + hnups://ieeexplore.iece.org/document/8303693/ ‘© https://journalofcloudcomputing springeropen, com/counter/pdt/10,1186/s13677-020-00199- 2.pdf2pdi=button%20sticky ‘@ hitps://media springernature.com/Iw685/springer-staticimage/art%3A 10,1186%2Fs13677- 020-00199-2/MediaObjects/13677_2020_199_Figi_HTML.png?as=webp tae af : gi12S/L109810812519, www Recommendation_Systems_on_Social_Network

Project On ML in Python

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Project On ML in Python

Uploaded by

Copyright:

Available Formats

You might also like