SAVEETHA ENGINEERING COLLEGE
Affiliated to Anna University | Approved by AICTE

LABORATORY RECORD NOTEBOOK

Name of the Student: ..........................................
Register Number: ..........................................
Department: ..........................................
Semester: ..........................................
Subject: ..........................................

SAVEETHA ENGINEERING COLLEGE
Affiliated to Anna University | Approved by AICTE

Department of ..........................................

LABORATORY RECORD NOTEBOOK
202... - 202...

It is to certify that this is a bonafide record of work done by
Mr./Ms. .......................... Register Number .......................... of
I / II / III / IV year B.E. / B.Tech. / M.E. / M.B.A., Department of
.........................., in the .......................... Laboratory in the
.......................... semester.

Staff in-Charge                                      Head of the Department

University Examination held on ..........................

Internal Examiner                                    External Examiner

INDEX

Sl.No  Date        Name of the Experiment
1      06-09-2021  Implementation of Linear Regression Using Gradient Descent
2      08-09-2021  Implementation of Simple Linear Regression Model for Predicting the Marks Scored
3      13-09-2021  Implementation of Multivariate Linear Regression Model for Sales Prediction
4      27-09-2021  Implementation of Logistic Regression Model to Predict the Placement Status of Student
5      29-09-2021  Implementation of Logistic Regression Using Gradient Descent
6      11-10-2021  Implementation of Decision Tree Classifier Model for Predicting Employee Churn
7      13-10-2021  Implementation of Decision Tree Regressor Model for Predicting the Salary of the Employee
8      20-10-2021  Implementation of K Means Clustering for Customer Segmentation
9      27-10-2021  Implementation of Movie Recommender System
10     01-11-2021  Implementation of SVM for Spam Mail Detection

Ex.No: 1
Date: 06-09-2021

Implementation of Linear Regression Using Gradient Descent

Aim:
To implement the simple linear regression model using gradient descent.

Theory:
Regression analysis is a statistical method for modelling the relationship between a dependent variable and one or more independent variables. It helps us understand how the value of the dependent variable changes with respect to an independent variable, and it predicts continuous/real values such as temperature, age, salary, price, etc.

Linear regression is a statistical regression method used for predictive analysis.

Types: If there is only one input variable (x), the model is called simple linear regression; if there is more than one input variable, it is called multiple linear regression.
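Note: The update rules in the source code below follow from differentiating the mean-squared error with respect to the slope $m$ and the intercept $c$. For predictions $\hat{y}_i = m x_i + c$,

    $MSE = \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2$, $\quad \frac{\partial MSE}{\partial m} = \frac{2}{n}\sum_{i=1}^{n} x_i(\hat{y}_i - y_i)$, $\quad \frac{\partial MSE}{\partial c} = \frac{2}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)$

and each iteration updates $m \leftarrow m - L \cdot \partial MSE/\partial m$ and $c \leftarrow c - L \cdot \partial MSE/\partial c$, where $L$ is the learning rate (0.01 in the code).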
Source Code:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("student_scores.csv")
df.head()
df.describe()
df.isnull().sum()
df.info()

x = df.Hours
x.head()
y = df.Scores
y.head()

# Gradient descent for y = m*x + c
iterations = 1000
n = len(df)
m = 0
c = 0
L = 0.01          # learning rate
loss = []
for i in range(iterations):
    y_pred = m * x + c
    MSE = (1 / n) * sum((y_pred - y) ** 2)
    dm = (2 / n) * sum(x * (y_pred - y))   # gradient w.r.t. slope
    dc = (2 / n) * sum(y_pred - y)         # gradient w.r.t. intercept
    c = c - L * dc
    m = m - L * dm
    loss.append(MSE)
print(m, c)

y_pred = m * x + c
y_pred
plt.scatter(x, y, color="blue")
plt.plot(x, y_pred)
plt.xlabel("Study hours")
plt.ylabel("Scores")
plt.title("Study hours vs Scores")

plt.plot(loss)
plt.xlabel("iterations")
plt.ylabel("loss")

Output:

DATA_HEAD:
   Hours  Scores
0    2.5      21
1    5.1      47
2    3.2      27
3    8.5      75
4    3.5      30

DATA_DESCRIBE:
           Hours     Scores
count  25.000000  25.000000
mean    5.012000  51.480000
std     2.525094  25.286887
min     1.100000  17.000000
25%     2.700000  30.000000
50%     4.800000  47.000000
75%     7.400000  75.000000
max     9.200000  95.000000

m & c:
9.778905988234964 2.4644522714760995

GRAPH:
[Plot: Study hours vs Scores, scatter of the data with the fitted regression line]
[Plot: loss vs iterations, the MSE falls steeply and then flattens as gradient descent converges]

Result:
Thus, the simple linear regression model using gradient descent was implemented.

Ex.No: 2
Date: 08-09-2021

Implementation of Simple Linear Regression Model for Predicting the Marks Scored

Aim:
To implement the simple linear regression model for predicting the marks scored by a student based on study hours.

Theory:
Simple linear regression is a regression model that estimates the relationship between one independent variable and one dependent variable using a straight line. Both variables should be quantitative. Linear regression most often uses the mean-squared error (MSE) to measure the error of the model.

If the goal is prediction, forecasting or error reduction, linear regression can be used to fit a predictive model to an observed dataset of values of the response and explanatory variables. If additional values of the explanatory variables are then collected without an accompanying response value, the fitted model can be used to predict the response.
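Note: scikit-learn's LinearRegression, used below, solves the ordinary least-squares problem, which for simple linear regression has the closed form $b_1 = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sum_i (x_i - \bar{x})^2$ and $b_0 = \bar{y} - b_1 \bar{x}$. The following sketch (not part of the original record; it assumes the same student_scores.csv file) computes these directly. Because it fits on the full dataset rather than the 80% training split, its values will differ slightly from the coefficients reported in the output.

import numpy as np
import pandas as pd

# Closed-form least-squares estimates for y = b0 + b1*x
df = pd.read_csv("student_scores.csv")
x, y = df["Hours"].values, df["Scores"].values
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print("b0:", b0, "b1:", b1)   # comparable to lr.intercept_ and lr.coef_[0]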
Source Code:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("student_scores.csv")
df.head()
df.isnull().sum()
df.duplicated().sum()

x = df.Hours.values
y = df.Scores.values
x = x.reshape(-1, 1)   # sklearn expects a 2-D feature array
x.shape

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
x_train.shape
x_test.shape

from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(x_train, y_train)
print("The value of b0:", lr.intercept_)
print("The value of b1:", lr.coef_[0])

y_pred = lr.predict(x_test)
y_pred

from sklearn import metrics
MSE = metrics.mean_squared_error(y_test, y_pred)
print("Mean Squared Error is", MSE)
R2 = metrics.r2_score(y_test, y_pred)
print("R Squared Error is", R2)

plt.scatter(x_test, y_test, c="blue")
plt.plot(x_test, y_pred)
plt.title("Linear Regression - Hours vs Scores")
plt.xlabel("Hours")
plt.ylabel("Scores")

y1 = lr.predict([[9.4]])
print("when hours = 9.4, the score is :", y1[0])

Output:

HEAD:
   Hours  Scores
0    2.5      21
1    5.1      47
2    3.2      27
3    8.5      75
4    3.5      30

b0:
The value of b0: 2.018160041434683

b1:
The value of b1: 9.910656480642237

Predicted Output:
array([16.88414476, 33.73226078, 75.357018  , 26.79480124, 60.49103328])

MSE:
Mean Squared Error is 21.5987693072174

R SQUARED ERROR:
R Squared Error is 0.9454906892105356

GRAPH:
[Plot: Linear Regression - Hours vs Scores, scatter of the test data with the fitted line]

PREDICTED VALUE:
when hours = 9.4, the score is : 93.19619966334325

Result:
Thus, the simple linear regression model was implemented for predicting the marks scored from study hours, and the performance of the model was evaluated.

Ex.No: 3
Date: 13-09-2021

Implementation of Multivariate Linear Regression Model for Sales Prediction

Aim:
To implement the multivariate linear regression model for predicting sales.

Theory:
Multivariate regression is a supervised machine learning algorithm involving multiple data variables for analysis. It is an extension of multiple regression with one dependent variable and multiple independent variables; based on the independent variables, we try to predict the output. Multivariate regression is a method used to measure the degree to which more than one independent variable (the predictors) and more than one dependent variable (the responses) are linearly related.
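Note: With the coefficients reported in the output below, the fitted model is the linear equation $\text{sales} = b_0 + b_1\,\text{TV} + b_2\,\text{radio} + b_3\,\text{newspaper}$. As a check, substituting TV = 25, radio = 35, newspaper = 45 gives $299.49 + 4.458(25) + 19.650(35) - 0.278(45) \approx 1086.17$, which matches the printed prediction.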
Source Code:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("Advertising_data.csv")
df.head()
df.describe()
df.isnull().sum()
df.shape

x = df[["TV", "radio", "newspaper"]]
y = df["sales"]

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=101)

from sklearn.linear_model import LinearRegression
l = LinearRegression()
l.fit(x_train, y_train)

y_pred = l.predict(x_test)
x_test
print("Regressor slope: ", l.coef_)
print("Regressor intercept:", l.intercept_)
y_pred

from sklearn import metrics
MSE = metrics.mean_squared_error(y_test, y_pred)
print("MSE is {}".format(MSE))
r2 = metrics.r2_score(y_test, y_pred)
print("R squared error is {}".format(r2))

l.predict([[25, 35, 45]])

Output:

DATA_HEAD:
      TV  radio  newspaper  sales
0  230.1   37.8       69.2   2210
1   44.5   39.3       45.1   1040
2   17.2   45.9       69.3    930
3  151.5   41.3       58.5   1850
4  180.8   10.8       58.4   1290

b0:
b0 is : 299.4893030495332
b1:
b1 is : 4.458402011996426
b2:
b2 is : 19.649703415540504
b3:
b3 is : -0.2781463981925975

MSE:
MSE is 44021.18291449688

R SQUARED ERROR:
R squared error is 0.8601145185017868

PREDICTED VALUE:
when TV=25, radio=35, newspaper=45, the sales is : 1086.1723849746945

Result:
Thus, the multivariate linear regression model for predicting sales was implemented.

Ex.No: 4
Date: 27-09-2021

Implementation of Logistic Regression Model to Predict the Placement Status of Student

Aim:
To implement the logistic regression model to predict the placement status of a student.

Theory:
Logistic regression is one of the most popular machine learning algorithms and comes under the supervised learning technique. It is used for predicting a categorical dependent variable from a given set of independent variables. Because logistic regression predicts the output of a categorical dependent variable, the outcome must be a categorical or discrete value.

Logistic regression is much like linear regression except in how it is used: linear regression is used for solving regression problems, whereas logistic regression is used for solving classification problems.
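Note: The categorical columns of the placement dataset are converted to integers with scikit-learn's LabelEncoder before fitting. A minimal illustration of its behaviour (with made-up values): classes are sorted and then mapped to 0, 1, 2, ..., so "Not Placed" encodes to 0 and "Placed" to 1.

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
# Classes are assigned integer codes in sorted order
print(le.fit_transform(["Placed", "Not Placed", "Placed"]))   # [1 0 1]
print(le.classes_)                                            # ['Not Placed' 'Placed']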
["status"]-le.fit_transform(datal["status"]) datal xedatal iloe[:,-1] x yedatal "status"] from sklearn.model_selection import train_test_split x_train.x_testy_train.y_test=train_test_split(x,y,test_size=0.2,random_state=0) from sklearn.linear_model import LogisticRegression Ir=LogisticRegression(solver="liblinear") Ir-fit(<_train,y_train) y_pred=Ir predict(x_test) y_pred from sklearn.metries import accuracy_score accuracy=accuracy_score(y_testy_pred) accuracy from sklearn.metrics import confusion_matrix confusion=confusion_matrix(y_testyy_pred) confusion from sklearn.metries import classification_report classification_report|=classification_report(y_test.y_pred) classification_reportl Irpredict({{1,80,1,90,1,1,90,1,0,85,1,85]]) Output: DATA_HEAD : sLno gender ssc_p ssc_b hec_p hec_b hec_s dogroo_p dogree.t workox otest_p specialisation mba_p status @ 1M 6700 Obes 0100 Ofwe Canmore 6800 SdBTah —No=S80~=~«MMNR ED Ped. ZTE 42M 7038 Cantal 7833 Owe Scenen—T1AB_—SeBToch Yes «888. AMMF 28. Paced 2IC 23M 6500 coral Geo Cow —Ame_——GAOD Conmsigné No 750 BF S750 Paced 2 34M 00 Cantal 5200 Cortal Scenen 5200 SedBToch «No 680A ssp, NA 4 § M8580 Cental 7860 Canta Conmerce 7830 Conmsilgnt No GEA «AMIRFn 55.80 Paced 428 PREDICTED VALUE: array(l0, DT, 1, Oe Le OW, Le Le Le Le Le Le de Le 0, Or Te Or O Dn DO Ds 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 11) ACCURACY : 0.813953488372093 CONFUSION MATRIX : array({{11, 5], { 3, 24]], dtype=int64) CLASSIFICATION REPORT: precision recall fi-score support \n\n ° 0.79 0.69 0.73 16\n 1 0. 0.83 0.86 21\n\n accuracy 0.81 43\n macro avg Oe 0.79 0.80 AS\nweignted avg ot 0.8 oat a3\n" Result: Thus, the logistic regression model for predicting the placement status of students was implemented 15 EXP.NO :5 Date: 29-09-2021 Implementation of Logistic Regression Usi Aim: To implement the Logistic Regression model using Gradient Descent. Theory: Logistic Regression is a supervised learning classification algorithm used to predict the probability of a target variable. The nature of a target or a dependent variable is dichotomous, which means there would be only two possible classes. Logistic Regression predicts the output of a categorical dependent variable. Therefore, the outcome must be a categorical or discrete value. It can be yes or no, 0 or 1, true or false, etc but instead of giving the exact value as 0 or 1. it gives the probabilistic values, which lie between Oand 1 Logistic Regression is much similar to Linear Regression except that how they are used. Linear regression is used for solving Regression problems, whereas Logistic regression is used for solving classification problems ‘The hypothesis function for Logistic Regression is given by, h(x) = g(0"x) Cost Function: The cost function for logistic regression can be found using Cross-Entropy. It generally calculates the difference between two probability distributions. So, we have, 3(0) == ¥ Cost(he(x®), ¥) Here, the Cost(h(x"”), y) is given by, 16 Coeitats) {somes 1 —log(1 — h8(x)) ify = 0 Ina simplified version, Cost Function for logistic regression can be written as, (0) = = E [y(i) logtha(x®)) + (1-¥®) log (1- ho(x®))] Gradient Descent: ‘The main aim of Gradient Descent is to minimize the cost function. The gradient descent for logistic regression is similar to linear regression, Now, to minimize the cost function, ‘we need to run the gradient descent function on each parameter, Gj: =8j-a 2 10) $0, Repeat{ 6) 8) aE (hx) - yx? 
Source Code:

import numpy as np

def hypothesis(X, theta):
    # Sigmoid of the linear combination theta^T x for every row of X
    z = np.dot(theta, X.T)
    s = 1 / (1 + np.exp(-z))
    return s

def cost(X, y, theta):
    # Cross-entropy cost
    y1 = hypothesis(X, theta)
    c = -(1 / len(X)) * np.sum(y * np.log(y1) + (1 - y) * np.log(1 - y1))
    return c

def gradient_descent(X, y, theta, alpha, iterations):
    m = len(X)
    J = [cost(X, y, theta)]
    for i in range(iterations):
        h = hypothesis(X, theta)
        for j in range(len(X.columns)):
            theta[j] = theta[j] - (alpha / m) * np.sum((h - y) * X.iloc[:, j])
        J.append(cost(X, y, theta))
    return J, theta

def predict(X, y, theta, alpha, iterations):
    J, th = gradient_descent(X, y, theta, alpha, iterations)
    h = hypothesis(X, th)
    y_pred = []
    for i in range(len(h)):
        if h[i] >= 0.5:
            y_pred.append(1)
        else:
            y_pred.append(0)
    return J, y_pred

import pandas as pd
data = pd.read_excel("Placement_Data.xlsx")
data.head()

data1 = data.copy()
data1 = data1.drop(["sl_no", "salary"], axis=1)
data1.head()
data1.isnull().sum()
data1.duplicated().sum()

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
data1["gender"] = le.fit_transform(data1["gender"])
data1["ssc_b"] = le.fit_transform(data1["ssc_b"])
data1["hsc_b"] = le.fit_transform(data1["hsc_b"])
data1["hsc_s"] = le.fit_transform(data1["hsc_s"])
data1["degree_t"] = le.fit_transform(data1["degree_t"])
data1["workex"] = le.fit_transform(data1["workex"])
data1["specialisation"] = le.fit_transform(data1["specialisation"])
data1["status"] = le.fit_transform(data1["status"])
data1

X = data1.iloc[:, :-1]
X.insert(0, "x0", 1)   # bias column
X
y = data1["status"]

theta = [0.5] * len(X.columns)
J, y_pred = predict(X, y, theta, 0.0001, 25000)
accuracy = sum(y_pred == y) / len(y)
accuracy

import matplotlib.pyplot as plt
plt.plot(J)
plt.xlabel("iterations")
plt.ylabel("cost")

Output:

DATA_HEAD:
[Table: first five rows of the placement dataset, as in experiment 4]

ACCURACY:
0.8558139534883721

GRAPH:
[Plot: cost vs iterations, the cost decreases steadily over the 25000 iterations]

Result:
Thus, the logistic regression model was implemented using gradient descent for predicting the placement status of a student.

Ex.No: 6
Date: 11-10-2021

Implementation of Decision Tree Classifier Model for Predicting Employee Churn

Aim:
To implement the decision tree classifier model for predicting employee churn.

Theory:
A decision tree is a supervised learning technique that can be used for both classification and regression problems, though it is mostly preferred for solving classification problems. It is a tree-structured classifier in which internal nodes represent the features of a dataset, branches represent the decision rules and each leaf node represents the outcome.

Tree models where the target variable takes a discrete set of values are called classification trees; decision trees where the target variable takes continuous values (typically real numbers) are called regression trees. Classification and regression tree (CART) is the general term for both.

There are various algorithms in machine learning, so choosing the best algorithm for the given dataset and problem is the main point to remember while creating a machine learning model. There are two main reasons for using a decision tree: (i) decision trees usually mimic human thinking ability while making a decision, so they are easy to understand; (ii) the logic behind a decision tree is easy to follow because it shows a tree-like structure.

The advantage of the decision tree is that it is easy to use and understand and can handle both categorical and numerical data. The disadvantage is that it is prone to overfitting and requires some measure of how well it is doing.
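Note: The classifier below is built with criterion="entropy", so splits are chosen by information gain. For a node whose samples belong to classes in proportions $p_i$, the entropy is $H = -\sum_i p_i \log_2 p_i$; for example, a node split 50/50 between two classes has $H = 1$ bit, while a pure node has $H = 0$. At each node, the feature whose split produces the largest reduction in entropy (the information gain) is selected.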
Source Code:

import pandas as pd
data = pd.read_csv("Employee.csv")
data.head()
data.info()
data.isnull().sum()
data["left"].value_counts()

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
data["salary"] = le.fit_transform(data["salary"])
data.head()

x = data[["satisfaction_level", "last_evaluation", "number_project",
          "average_montly_hours", "time_spend_company", "Work_accident",
          "promotion_last_5years", "salary"]]
x.head()
y = data["left"]

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=100)

from sklearn.tree import DecisionTreeClassifier
dt = DecisionTreeClassifier(criterion="entropy")
dt.fit(x_train, y_train)
y_pred = dt.predict(x_test)

from sklearn import metrics
accuracy = metrics.accuracy_score(y_test, y_pred)
accuracy
dt.predict([[0.5, 0.8, 9, 260, 6, 0, 1, 2]])

Output:

DATA_HEAD:
[Table: first five rows of the employee dataset, with columns satisfaction_level, last_evaluation, number_project, average_montly_hours, time_spend_company, Work_accident, left, promotion_last_5years, Departments and salary]

ACCURACY:
0.9833333333333333

PREDICTED VALUE:
array([0], dtype=int64)

Result:
Thus, the decision tree classifier model was implemented for predicting employee churn.

Ex.No: 7
Date: 13-10-2021

Implementation of Decision Tree Regressor Model for Predicting the Salary of the Employee

Aim:
To implement the decision tree regressor model for predicting the salary of the employee.

Theory:
A decision tree is a flowchart-like structure in which each internal node represents a test on a feature, each leaf node represents a class label, and branches represent conjunctions of features that lead to those class labels. Decision trees build regression or classification models in the form of a tree structure, breaking a dataset down into smaller and smaller subsets while an associated decision tree is incrementally developed.

The final result is a tree with decision nodes and leaf nodes. A decision node has two or more branches, each representing values of the attribute tested; a leaf node represents a decision on the numerical target. The topmost decision node in the tree, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical data.

The decision tree is one of the most commonly used, practical approaches for supervised learning. It can be used to solve both regression and classification tasks, with the latter being put more into practical application. It is a tree-structured model with three types of nodes: the root node, the initial node, represents the entire sample and may get split into further nodes; the interior nodes represent the features of a dataset, with branches representing the decision rules; and the leaf nodes represent the outcome. This algorithm is very useful for solving decision-related problems.
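Note: A decision tree regressor predicts, for any input, the mean of the training targets in the leaf that the input falls into, so its predictions are piecewise constant. The model below is evaluated with the R-squared score, $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$, where 1 means perfect prediction and 0 means no better than always predicting the mean.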
Source Code:

import pandas as pd
data = pd.read_csv("Salary.csv")
data.head()
data.info()
data.isnull().sum()

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
data["Position"] = le.fit_transform(data["Position"])
data.head()

x = data[["Position", "Level"]]
y = data["Salary"]

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=2)

from sklearn.tree import DecisionTreeRegressor
dt = DecisionTreeRegressor()
dt.fit(x_train, y_train)
y_pred = dt.predict(x_test)

from sklearn import metrics
mse = metrics.mean_squared_error(y_test, y_pred)
mse
r2 = metrics.r2_score(y_test, y_pred)
r2
dt.predict([[5, 6]])

Output:

HEAD:
            Position  Level  Salary
0   Business Analyst      1   45000
1  Junior Consultant      2   50000
2  Senior Consultant      3   60000
3            Manager      4   80000
4    Country Manager      5  110000

MEAN SQUARED ERROR:
462500000.0

R-SQUARED ERROR:
0.48611111111111116

PREDICTED VALUE FOR THE INPUT [5, 6]:
array([150000.])

Result:
Thus, the decision tree regressor model for predicting the salary of the employee was implemented.

Ex.No: 8
Date: 20-10-2021

Implementation of K Means Clustering for Customer Segmentation

Aim:
To implement K means clustering for customer segmentation.

Theory:
K-means clustering is one of the simplest and most popular unsupervised machine learning algorithms. Typically, unsupervised algorithms make inferences from datasets using only input vectors, without referring to known or labelled outcomes.

K-means clustering aims to partition data into k clusters in such a way that data points in the same cluster are similar while data points in different clusters are farther apart. The similarity of two points is determined by the distance between them; the similarity measure is at the core of k-means clustering.

Source Code:

import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv("Mall_Customers.csv")
data.head()
data.info()
data.isnull().sum()

# Elbow method: fit k-means for k = 1..10 and record the WCSS
from sklearn.cluster import KMeans
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init="k-means++")
    kmeans.fit(data.iloc[:, 3:])
    wcss.append(kmeans.inertia_)

plt.plot(range(1, 11), wcss)
plt.xlabel("no of clusters")
plt.ylabel("wcss")
plt.title("Elbow Method")

# Final model with k = 5; label each customer with its cluster
km = KMeans(n_clusters=5)
km.fit(data.iloc[:, 3:])
y_pred = km.predict(data.iloc[:, 3:])
data["cluster"] = y_pred

df0 = data[data["cluster"] == 0]
df1 = data[data["cluster"] == 1]
df2 = data[data["cluster"] == 2]
df3 = data[data["cluster"] == 3]
df4 = data[data["cluster"] == 4]

plt.scatter(df0["Annual Income"], df0["Score"], c="red", label="cluster0")
plt.scatter(df1["Annual Income"], df1["Score"], c="black", label="cluster1")
plt.scatter(df2["Annual Income"], df2["Score"], c="blue", label="cluster2")
plt.scatter(df3["Annual Income"], df3["Score"], c="green", label="cluster3")
plt.scatter(df4["Annual Income"], df4["Score"], c="magenta", label="cluster4")
plt.legend()
plt.title("Customer Segments")
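Note: The quantity collected in the loop above is the within-cluster sum of squares, $WCSS = \sum_{c=1}^{k} \sum_{x \in C_c} \lVert x - \mu_c \rVert^2$, which scikit-learn exposes as the inertia_ attribute of a fitted KMeans model. WCSS always decreases as $k$ grows; the "elbow", where the rate of decrease slows sharply, is read off the plot as a good choice of $k$ (here $k = 5$).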
Output:

DATA_HEAD:
   CustomerID  Gender  Age  Annual Income  Score
0           1    Male   19             15     39
1           2    Male   21             15     81
2           3  Female   20             16      6
3           4  Female   23             16     77
4           5  Female   31             17     40

ELBOW METHOD GRAPH:
[Plot: Elbow Method, WCSS against the number of clusters; the curve bends at k = 5]

CUSTOMER SEGMENTATION GRAPH:
[Plot: Customer Segments, scatter of Annual Income vs Score coloured by the five clusters]

Result:
Thus, K means clustering for customer segmentation was implemented.

Ex.No: 9
Date: 27-10-2021

Implementation of Movie Recommender System

Aim:
To implement the movie recommender system.

Theory:
Recommender systems are systems designed to recommend things to the user based on many different factors. They predict the products that users are most likely to purchase and to be interested in.

A recommender system deals with a large volume of information by filtering the most important pieces based on the data provided by a user and other factors that capture the user's preference and interest. It finds the match between user and item and imputes the similarities between users and items for recommendation.

Both the users and the services provided have benefited from these kinds of systems, and the quality of the decision-making process has also improved through them.
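Note: The implementation below pivots the ratings into a user-by-title matrix and then uses DataFrame.corrwith, which computes the Pearson correlation of every title's rating column with the chosen title's column over the users who rated both. A tiny self-contained illustration of the pattern (with made-up data, not the ratings file used in the record):

import pandas as pd

# Toy ratings: three users, two titles
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "title":   ["A", "B", "A", "B", "A"],
    "rating":  [5, 4, 3, 2, 4],
})
# Rows = users, columns = titles, values = ratings (NaN where unrated)
mat = ratings.pivot_table(index="user_id", columns="title", values="rating")
print(mat.corrwith(mat["A"]))   # Pearson correlation of each title with "A"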
Source Code:

import pandas as pd
data = pd.read_csv("movies.csv")
data.head()

# Number of ratings per title, and the user x title rating matrix
rating = pd.DataFrame(data.groupby("title")["rating"].count())
movie_df = data.pivot_table(index="user_id", columns="title", values="rating")
movie_df

movie = movie_df["Star Wars (1977)"]
movie.head()
similar_movie = movie_df.corrwith(movie)   # correlate every title with Star Wars
similar_movie

correlation_df = pd.DataFrame(similar_movie, columns=["Correlation"])
correlation_df
correlation_df = correlation_df.dropna()
correlation_df
correlation_df.sort_values("Correlation", ascending=False).head(10)

# Keep only titles with more than 100 ratings to filter out spurious correlations
correlation_df = correlation_df.join(rating)
correlation_df
correlation_df = correlation_df[correlation_df["rating"] > 100].sort_values("Correlation", ascending=False)
correlation_df.head(6)

def similarmovie(movie_df, rating, moviename):
    movie = movie_df[moviename]
    similar_movie = movie_df.corrwith(movie)
    correlation_df = pd.DataFrame(similar_movie, columns=["Correlation"])
    correlation_df = correlation_df.dropna()
    correlation_df = correlation_df.join(rating)
    correlation_df = correlation_df[correlation_df["rating"] > 100].sort_values("Correlation", ascending=False).head(6)
    return correlation_df

similarmovie(movie_df, rating, "Return of the Jedi (1983)")

Output:

DATA_HEAD:
   user_id  item_id  rating  timestamp             title
0        0       50       5  881250949  Star Wars (1977)
1      290       50       5  880473582  Star Wars (1977)
2       79       50       4  891271545  Star Wars (1977)
3        2       50       5  888552084  Star Wars (1977)
4        8       50       5  879362124  Star Wars (1977)

Similar Movies:
                                           Correlation  rating
title
Return of the Jedi (1983)                     1.000000     507
Empire Strikes Back, The (1980)               0.721229     368
Star Wars (1977)                              0.672556     584
Raiders of the Lost Ark (1981)                0.467391     420
Indiana Jones and the Last Crusade (1989)     0.422294     331
Sneakers (1992)                               0.412559     150

Result:
Thus, the movie recommender system was implemented successfully.

Ex.No: 10
Date: 01-11-2021

Implementation of SVM for Spam Mail Detection

Aim:
To implement spam mail detection using the support vector machine algorithm.

Theory:
Support vector machine (SVM) is one of the most popular supervised learning algorithms, used for classification as well as regression problems. Primarily, however, it is used for classification problems in machine learning.

The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes, so that a new data point can easily be put in the correct category in the future. This best decision boundary is called a hyperplane.

Types of SVM:
There are two types of SVM:
1. Linear SVM
2. Non-linear SVM

Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes with a single straight line, the data is termed linearly separable, and the classifier used is called a linear SVM classifier.

Non-linear SVM: If data is linearly arranged, it can be separated with a straight line; for non-linearly separable data, a single straight line cannot be drawn, and a non-linear SVM classifier is used.

Source Code:

import pandas as pd
data = pd.read_csv("spam.csv")
data.head()
data.info()
data.isnull().sum()

x = data["EmailText"].values
y = data["Label"].values

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

# Turn the raw email text into bag-of-words count features
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer()
x_train = cv.fit_transform(x_train)
x_test = cv.transform(x_test)

from sklearn.svm import SVC
svc = SVC()
svc.fit(x_train, y_train)
y_pred = svc.predict(x_test)
y_pred

from sklearn import metrics
accuracy = metrics.accuracy_score(y_test, y_pred)
accuracy

Output:

DATA_HEAD:
[Table: first five rows of the spam dataset, with columns Label (spam/ham) and EmailText]

Predicted values:
[Array of predicted labels (spam/ham) for the test set]

Accuracy:
0.9766816143497757

Result:
Thus, the email spam detector using the support vector machine algorithm was successfully implemented.
