Linear Regression: More examples

In this session, we shall learn about some limitations of linear regression. Let us first consider a sample data set, which will be useful for our study.

First, we import the required packages.

import pandas as pd                 # the pandas library is useful for data processing
import numpy as np
import matplotlib.pyplot as plt
# the following python directive helps to plot the graphs in the notebook directly
%matplotlib inline

We will consider data generated by y = x², where x denotes equally spaced points in the interval [0, 5].

x = np.arange(0, 5, 0.1)
y = x**2
print("x:", x)
print("y:", y)
sample_data = [(x_i, y_i) for (x_i, y_i) in zip(x, y)]
print("sample data:", sample_data)
column_names = ['X', 'Y']
sample_df = pd.DataFrame(data=sample_data, columns=column_names)

sample data: [(0.0, 0.0), (0.1, 0.010000000000000002), (0.2, 0.04000000000000001), (0.30000000000000004, 0.09000000000000002), (0.4, 0.1

The dataset contains a predictor and a response variable.

sample_df.head()

     X     Y
0  0.0  0.00
1  0.1  0.01
2  0.2  0.04
3  0.3  0.09
4  0.4  0.16

print(sample_df)

Seeing the data as mere numbers might not be very illuminating, so let us use some graphical ways to visualize the data.

# we will plot a scatter plot of the data
plt.scatter(sample_df['X'], sample_df['Y'])
plt.title("Sample data : X vs Y")
plt.xlabel("X")
plt.ylabel("Y")
plt.show()

[Figure: scatter plot "Sample data : X vs Y"]

Question: Is the trend linear (or at least linear looking)?

Computing β₀, β₁ for the sample data set

# Let us now compute beta_1 and beta_0 from the sample data set
n = len(sample_df.index)              # number of data points in the data set
print("number of data points in the data set:", n)

# first let us compute x_bar and y_bar
x_bar = 0
y_bar = 0
for i in range(n):                    # access each row of the data set
    x_bar += float(sample_df.iat[i, 0])   # access element at X column
    y_bar += float(sample_df.iat[i, 1])   # access element at Y column
x_bar = x_bar / n
y_bar = y_bar / n
print("x bar:", x_bar, "y bar:", y_bar)

sigma_xx = 0
sigma_xy = 0
for i in range(n):                    # access each row of the data set
    sigma_xx += (float(sample_df.iat[i, 0]) - x_bar)**2   # computing (x_i - x_bar)^2
    sigma_xy += (float(sample_df.iat[i, 0]) - x_bar) * (float(sample_df.iat[i, 1]) - y_bar)   # computing (x_i - x_bar)(y_i - y_bar)
print("sigma_xx:", sigma_xx, "sigma_xy:", sigma_xy)

# now we can compute beta_1 and beta_0
beta_1 = sigma_xy / sigma_xx
beta_0 = y_bar - beta_1 * x_bar
print("beta_0:", beta_0, "beta_1:", beta_1)

Plotting the regression line

Having computed β₀ and β₁, we will now plot the line y = β₀ + β₁x along with the points in the data set.

x_min = np.inf
x_max = -np.inf
for i in range(n):                    # access each row of the data set
    x_i = float(sample_df.iat[i, 0])  # access element at X column
    x_min = min(x_min, x_i)
    x_max = max(x_max, x_i)

x = np.linspace(x_min, x_max, 100)    # creates a series of points on the x axis
y = beta_1 * x + beta_0
plt.plot(x, y, '-r', label='regression line')
plt.scatter(sample_df['X'], sample_df['Y'])
plt.title("Sample data X vs Y Regression")
plt.xlabel("X")
plt.ylabel("Y")
plt.legend(loc='upper left')
plt.grid()
plt.show()

[Figure: scatter plot with fitted line, "Sample data X vs Y Regression"]
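Before moving on, we can optionally sanity-check the hand-computed coefficients against numpy's built-in least-squares fit; np.polyfit with degree 1 solves the same least-squares problem. This is a minimal sketch, assuming sample_df, beta_0 and beta_1 from the cells above are in scope.

# optional cross-check of the hand-computed coefficients using np.polyfit
# (assumes sample_df, beta_0 and beta_1 from the cells above are in scope)
slope, intercept = np.polyfit(sample_df['X'], sample_df['Y'], deg=1)
print("np.polyfit slope:    ", slope, "   beta_1:", beta_1)
print("np.polyfit intercept:", intercept, "   beta_0:", beta_0)
assert np.isclose(slope, beta_1) and np.isclose(intercept, beta_0)

The two computations should agree up to floating-point error.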
Residual Plot

Sometimes it is useful to plot the error (residual) e_i versus the fitted values ŷ_i = β₀ + β₁x_i.

e_i_residuals_list = []
for i in range(n):                    # access each row of the data set
    x_i = float(sample_df.iat[i, 0])  # access element at X column
    y_i = float(sample_df.iat[i, 1])  # access element at Y column
    y_pred_i = beta_1 * x_i + beta_0  # compute the prediction obtained using the regression coefficients
    e_i = y_i - y_pred_i              # compute the difference between the actual observation y_i and the prediction y_pred_i
    e_i_residuals_list.append(e_i)    # append the value of e_i to the list

# plot the residuals e_i
plt.scatter(sample_df['Y'], e_i_residuals_list, color='g')
plt.title("Residual plot")
plt.xlabel("Y responses")
plt.ylabel("Residuals")
plt.grid()
plt.show()

[Figure: "Residual plot", residuals vs Y responses]

Note that the residual vs response plot shows a significant trend in the residual behaviour as y varies.

Residual vs Predictor Plot

Let us now plot the residuals against the predictor.

# plot the residuals e_i against the predictors x_i
plt.scatter(sample_df['X'], e_i_residuals_list, color='g')
plt.title("Residual vs. predictor plot")
plt.xlabel("Predictor X")
plt.ylabel("Residuals")
plt.grid()
plt.show()

[Figure: "Residual vs. predictor plot", residuals vs predictor X]

Note that the residual vs predictor plot also shows a significant trend in the residual behaviour as x varies. This indicates that the linear model assumption used to fit the data might not be a good assumption.

Let us now compute the sample correlation.

# note that sigma_xy and sigma_xx have already been computed; hence we will now compute sigma_yy
# also note that y_bar was computed before
sigma_yy = 0
for i in range(n):                    # access each row of the data set
    y_i = float(sample_df.iat[i, 1])  # access element at Y column
    sigma_yy += (y_i - y_bar)**2
print("sigma_yy:", sigma_yy)

# then we will compute the sample correlation
sample_correlation = sigma_xy / (np.sqrt(sigma_xx * sigma_yy))
print("sample correlation:", sample_correlation)

sigma_yy: 2673.30525000002
sample correlation: 0.9670508513356978

Let us compute R².

# note that the sum of squared residuals needs to be computed
sum_sq_residuals = 0
for i in range(n):                    # access each row of the data set
    x_i = float(sample_df.iat[i, 0])  # access element at X column
    y_i = float(sample_df.iat[i, 1])  # access element at Y column
    y_pred_i = beta_1 * x_i + beta_0
    sum_sq_residuals += (y_i - y_pred_i)**2
print("sum of squared residuals:", sum_sq_residuals)

# then we will compute the R^2 quantity
r_sq = 1 - sum_sq_residuals / sigma_yy
print("R^2:", r_sq)

Note: The R² value is quite high, which might seem to indicate that the fit is good enough. However, taken together with the residual plots, we see that despite the good R² value, the linear model assumption is questionable.
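The sample correlation and R² computed above can also be cross-checked with library routines: np.corrcoef returns the sample correlation directly, and for simple linear regression with an intercept, R² equals the square of the sample correlation. This is a minimal sketch, assuming sample_df, sum_sq_residuals and sigma_yy from the cells above are in scope.

# optional cross-check of the sample correlation and R^2
# (assumes sample_df, sum_sq_residuals and sigma_yy from the cells above are in scope)
r = np.corrcoef(sample_df['X'], sample_df['Y'])[0, 1]      # sample correlation coefficient
print("correlation from np.corrcoef:", r)
print("correlation squared:         ", r**2)               # equals R^2 for simple linear regression
print("R^2 from residuals:          ", 1 - sum_sq_residuals / sigma_yy)

Here the correlation is about 0.967, so R² is about 0.935: high, even though the residual plots show that the linear fit is systematically off.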

Exercises: