87_Suryaprakash Dubey ass-13

1. What is Linear Regression?

1. Linear Regression is a supervised machine learning algorithm.
2. It helps us solve regression problems.
3. It is a predictive model used for finding a linear relationship between a dependent variable and one or more independent variables.

2. How do you represent a simple linear regression?

y = mx + c
m >> Slope
c >> Intercept
Dependent Variable >> Continuous data
Independent Variable >> Continuous / Discrete

3. What is multiple linear regression?

In multiple regression the dependent/target variable is one, but there is more than one independent variable.
Equation of multiple regression:
y = m1x1 + m2x2 + m3x3 + ... + mnxn + c
Dependent Variable >> Continuous data
Independent Variable >> Continuous / Discrete

4. What are the assumptions made in the Linear regression model?

1. Linearity
2. Independence
3. No Multicollinearity
4. Normality
5. Homoscedasticity

5. What if these assumptions get violated?

1. If linearity is violated - a transformation such as log, square root, or cube root is applied.
2. If independence or the no-multicollinearity assumption is violated - the correlated attributes are clubbed together, or only one attribute is kept and the other similar attributes are dropped.

6. What is the assumption of homoscedasticity?

Homoscedasticity means the residuals (errors) have a roughly constant spread around the regression line across all values of the independent variables.

7. What is the assumption of normality?

Normality >> Normality means the errors (or the data points) follow a normal distribution.
1. We always try to keep our data points near the mean.
2. Or else, we try to gather the data points within the first, second and third standard deviations.

8. How to prevent heteroscedasticity?

Possible reasons for heteroscedasticity arising:
- It often occurs in data sets which have a large range between the largest and the smallest observed values, i.e. there are outliers.
- When the model is not correctly specified.
- If observations are mixed with different measures of scale.
- When an incorrect transformation of the data is used to perform the regression.
- Skewness in the distribution of a regressor, and possibly some other sources.

Ways to handle it:
1. We can use a different specification for the model.
2. Weighted Least Squares is one of the common statistical methods. It is a generalization of ordinary least squares and linear regression in which the error covariance matrix is allowed to be different from an identity matrix.
3. Use MINQUE: the theory of Minimum Norm Quadratic Unbiased Estimation (MINQUE) involves three stages. First, defining a general class of potential estimators as quadratic functions of the observed data, where the estimators relate to a vector of model parameters. Second, specifying certain constraints on the desired properties of the estimators, such as unbiasedness. Third, choosing the optimal estimator by minimizing a "norm" which measures the size of the covariance matrix of the estimators.

9. What does multicollinearity mean?

Input variables are highly correlated with each other. If they are kept as they are, they increase the dimensionality of the model and make it complex.

10. What are feature selection and feature scaling?

In machine learning and statistics, feature selection, also known as variable selection, attribute selection or variable subset selection, is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. It helps with:
- simplification of models to make them easier to interpret by researchers/users.
- shorter training times.
- avoiding the curse of dimensionality.
- improving the data's compatibility with a learning model class.
- encoding inherent symmetries present in the input space.

Feature scaling algorithms bring all attributes into a fixed range, say [-1, 1] or [0, 1], so that no feature can dominate the others.
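As a rough illustration of feature scaling, here is a minimal sketch using scikit-learn's MinMaxScaler and StandardScaler on a small made-up array (the data and the two "features" are invented for the example, not taken from the assignment):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two hypothetical features on very different scales:
# e.g. age in years and salary in rupees.
X = np.array([[25, 30_000],
              [32, 48_000],
              [47, 120_000],
              [51, 95_000]], dtype=float)

# Min-max scaling squeezes every column into [0, 1].
X_minmax = MinMaxScaler().fit_transform(X)

# Standardization rescales every column to mean 0 and standard deviation 1.
X_std = StandardScaler().fit_transform(X)

print(X_minmax)
print(X_std)
```

After either transform, the salary column can no longer dominate the age column simply because its raw numbers are larger.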
11. How to find the best fit line in a linear regression model?

1. The line which passes closest to the maximum number of data points is called the best fit line.
2. It is the line with the lowest Mean Squared Error (MSE) or Sum of Squared Errors (SSE).
3. The Gradient Descent algorithm helps to find the best fit line.
4. The GD algorithm finds a single line (the best fit line) from an infinite number of possible regression lines.

12. Why do we square the error instead of using modulus?

1. The absolute error is often closer to what we want when making predictions from our model, but squaring penalizes the predictions that contribute the most to the total error.
2. The squared function is differentiable everywhere, while the absolute error is not differentiable at all points in its domain (its derivative is undefined at 0). This makes the squared error more amenable to the techniques of mathematical optimization. To optimize the squared error, we can compute the derivative, set its expression equal to 0, and solve. To optimize the absolute error, we require more complex techniques with more computations.

13. What are the techniques adopted to find the slope and the intercept of the linear regression line which best fits the model?

1. The Gradient Descent algorithm works on partial derivatives (PD).
2. It helps to reduce the cost function (loss function).
3. It helps to get the best m and c values.
4. We follow "baby steps" while working with the GD algorithm.
5. The size of these baby steps depends entirely on the learning rate that we set in the model.
6. A common default learning rate is 0.001 (L = 0.001).
7. If we change the learning rate to L = 1, then we might overshoot the global minimum.

14. What is the cost function in Linear Regression?

It is the metric used to measure the accuracy of the best fit line.
MSE = Mean Squared Error
MSE = sum((Ya - Yp)**2) / N
N = Number of samples
MSE = Cost Function = Loss Function

15. Briefly explain the gradient descent algorithm.

1. The Gradient Descent algorithm works on partial derivatives (PD).
2. It helps to reduce the cost function (loss function).
3. It helps to get the best m and c values.
4. We follow "baby steps" while working with the GD algorithm.
5. The size of these baby steps depends entirely on the learning rate that we set in the model.
6. A common default learning rate is 0.001 (L = 0.001).
7. If we change the learning rate to L = 1, then we might overshoot the global minimum.

(A minimal implementation sketch is shown below.)
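The following is a minimal sketch of batch gradient descent for simple linear regression (y = mx + c), written with NumPy on synthetic data; the data, seed, learning rate and iteration count are illustrative assumptions, not values from the assignment:

```python
import numpy as np

# Synthetic data roughly following y = 3x + 5 with some noise (illustrative only).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3 * x + 5 + rng.normal(0, 1, size=100)

m, c = 0.0, 0.0          # start with an arbitrary line
learning_rate = 0.001    # the "baby step" size (L)
n = len(x)

for _ in range(10_000):
    y_pred = m * x + c
    # Partial derivatives of MSE = sum((y - y_pred)**2) / n with respect to m and c.
    dm = (-2 / n) * np.sum(x * (y - y_pred))
    dc = (-2 / n) * np.sum(y - y_pred)
    # Move against the gradient by one baby step.
    m -= learning_rate * dm
    c -= learning_rate * dc

print(m, c)  # should end up close to 3 and 5
```

With L = 1 on this same data the loop diverges instead of converging, which is the "overshooting the global minimum" effect described above.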
16. How to evaluate regression models?

1. Mean Absolute Error (scale variant): MAE = sum(|Ya - Yp|) / N
2. Mean Squared Error (scale variant): MSE = sum((Ya - Yp)**2) / N
3. R2 Score = Coefficient of Determination (scale invariant)
   R2 Score = 1 - SSE/SST = (SST - SSE)/SST
   0 is the worst score in terms of the coefficient of determination.
   1 is the best score in terms of the coefficient of determination.

17. Which evaluation technique should you prefer to use for data having a lot of outliers in it?

Mean Absolute Error (MAE) is preferable for data having too many outliers in it, because MAE is robust to outliers, whereas MSE and RMSE are very susceptible to outliers and penalize them heavily by squaring the residuals.

18. What is a residual? How is it computed?

A residual is the difference between the observed value and the predicted value. For data points above the line the residual is positive, and for data points below the line the residual is negative.
Error or residual = (Ya - Yp)

19. What are SSE, SSR, and SST? What is the relationship between them?

SSE = sum((Ya - Yp)**2)
SSR = sum((Yp - Ym)**2)
SST = sum((Ya - Ym)**2)
SST = SSE + SSR

20. What's the intuition behind R-Squared?

We use linear regression to predict y given some value of x. But suppose that we had to predict a y value without a corresponding x value. Without using regression on the x variable, our most reasonable estimate would simply be the average of the y values. However, this line will not fit the data very well. One way to measure the fit of the line is to calculate the sum of the squared residuals - this gives us an overall sense of how much prediction error a given model has.

Now, if we predict the same data with regression, we will see that the least-squares regression line seems to fit the data much better. Using least-squares regression, the sum of the squared residuals is considerably reduced, so least-squares regression eliminates a considerable amount of prediction error.

R-squared tells us what percent of the prediction error in the y variable is eliminated when we use least-squares regression on the x variable. As a result, R² is also called the coefficient of determination. Many formal definitions say that R² tells us what percent of the variability in the y variable is accounted for by the regression on the x variable. The value of R² varies from 0 to 1.

21. What does the coefficient of determination explain?

The coefficient of determination tells us what percent of the variability in the y variable is accounted for by the regression on the x variable.

22. Can R² be negative?

Yes, R² can be negative. The formula for R² is:
R² = 1 - SSE/SST
If the sum of squared errors of the model (SSE) is greater than the total sum of squares around the mean line (SST), i.e. the model fits worse than simply predicting the mean, R-squared will be negative.
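As a small illustration of these metrics, here is a sketch that computes MAE, MSE and R² with scikit-learn on made-up actual/predicted values (the numbers are invented for the example), including one outlier to show how much harder MSE punishes it than MAE:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual (Ya) and predicted (Yp) values; the last point is an outlier.
y_actual    = np.array([3.0, 5.0, 7.5, 9.0, 30.0])
y_predicted = np.array([2.8, 5.4, 7.0, 9.3, 12.0])

mae = mean_absolute_error(y_actual, y_predicted)   # sum(|Ya - Yp|) / N
mse = mean_squared_error(y_actual, y_predicted)    # sum((Ya - Yp)**2) / N
r2  = r2_score(y_actual, y_predicted)              # 1 - SSE/SST

print(f"MAE = {mae:.2f}, MSE = {mse:.2f}, R2 = {r2:.2f}")
# The single outlier contributes 18 to MAE's numerator but 324 to MSE's,
# which is why MAE is the more robust choice when outliers are present.
```

If the predictions were worse than simply predicting the mean of y_actual, r2_score would return a negative value, matching question 22.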
23. What are the flaws in R-squared?

There are two major flaws:

Problem 1: R² increases with every predictor added to a model. Because R² always increases and never decreases, the model can appear to fit better simply because more terms were added. This can be completely misleading.

Problem 2: Similarly, if our model has too many terms and too many high-order polynomials, we can run into the problem of over-fitting the data. When we over-fit the data, a misleadingly high R² value can lead to misleading predictions.

24. What is adjusted R²?

Adjusted R-squared is used to determine how reliable the correlation is between the independent variables and the dependent variable. On addition of variables that are genuinely correlated with the target, the adjusted R-squared will increase, and for variables with no correlation with the dependent variable, the adjusted R-squared will decrease.
Adjusted R² will always be less than or equal to R².

25. What is the Coefficient of Correlation? Definition and formula

The correlation coefficient is a statistical concept which helps in establishing a relation between predicted and actual values obtained in a statistical experiment. The calculated value of the correlation coefficient explains the exactness between the predicted and actual values.

R(x, y) = cov(x, y) / (STDx * STDy)
R(x, y) = sum((Xi - Xm) * (Yi - Ym)) / (sum((Xi - Xm)**2) * sum((Yi - Ym)**2))**0.5

26. What is the difference between correlation and covariance?

Covariance is a statistical term that refers to a systematic relationship between two random variables, in which a change in one variable is reflected by a change in the other. The covariance value can range from -infinity to +infinity, with a negative value indicating a negative relationship and a positive value indicating a positive relationship.
Correlation is limited to values between -1 and +1.
A change in scale affects covariance but does not affect the correlation.

27. What is the relationship between R-Squared and Adjusted R-Squared?

R2 Score = 1 - SSE/SST
Adjusted R-squared is used to determine how reliable the correlation is between the independent variables and the dependent variable. On addition of variables that are genuinely correlated with the target, the adjusted R-squared will increase, and for variables with no correlation with the dependent variable, the adjusted R-squared will decrease.
Adjusted R² will always be less than or equal to R².

28. What is the difference between overfitting and underfitting?

In overfitting, a statistical model describes random error or noise instead of the underlying relationship. Overfitting occurs when a model is excessively complex, such as having too many parameters relative to the number of observations. A model that has been overfit has poor predictive performance, as it overreacts to minor fluctuations in the training data.

Underfitting occurs when a statistical model or machine learning algorithm cannot capture the underlying trend of the data. Underfitting would occur, for example, when fitting a linear model to non-linear data. Such a model too would have poor predictive performance.

29. How to identify if the model is overfitting or underfitting?

Overfitting is when the model fits the training dataset almost perfectly. While this may sound like a good fit, it is the opposite: an overfit model performs far worse on unseen data. A model can be considered overfit when it fits the training dataset perfectly but does poorly on new test datasets.

On the other hand, underfitting takes place when a model is too simple, or has not been trained enough, to capture the meaningful patterns in the training data. (A small train-versus-test check is sketched below.)
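A minimal sketch of that check, comparing training and test R² for an intentionally over-complex polynomial model on made-up data (the data, polynomial degree, split and seed are assumptions for the illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Noisy, roughly linear data (illustrative only).
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(60, 1))
y = 2 * X.ravel() + 1 + rng.normal(0, 2, size=60)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# A deliberately over-complex model: degree-10 polynomial regression.
model = make_pipeline(PolynomialFeatures(degree=10), LinearRegression())
model.fit(X_train, y_train)

# A large gap between the two scores (train high, test low) signals overfitting;
# both scores being low would instead signal underfitting.
print("train R2:", model.score(X_train, y_train))
print("test  R2:", model.score(X_test, y_test))
```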
30. How to interpret a Q-Q plot in a Linear regression model?

Points on the normal Q-Q plot provide an indication of the normality of the dataset. If the data is normally distributed, the points will fall on the 45-degree reference line. If the data is not normally distributed, the points will deviate from the reference line.
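As a rough sketch, the residuals of a fitted regression line can be put on a normal Q-Q plot with scipy's probplot; the data below is synthetic and purely illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from sklearn.linear_model import LinearRegression

# Synthetic linear data with normally distributed noise (illustrative only).
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 1))
y = 4 * X.ravel() + 2 + rng.normal(0, 1.5, size=200)

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)

# If the residuals are normally distributed, the points fall close to the
# straight reference line that probplot draws.
stats.probplot(residuals, dist="norm", plot=plt)
plt.title("Normal Q-Q plot of residuals")
plt.show()
```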
31. What are the advantages and disadvantages of Linear Regression?

Advantages
1. The linear regression model performs well on linearly separable data.
2. The linear regression model is easy to implement and easy to interpret.
3. When the LR model gets overfitted, we can reduce the overfitting by using L1 and L2 regularization.

Disadvantages
1. We make assumptions about the data in linear regression (linearity, independence, etc.).
2. Linear regression is sensitive to outliers.

32. What is the use of regularisation? Explain L1 and L2 regularisations.

L1 and L2 regularization are the best ways to manage overfitting and perform feature selection when there is a large set of features.

L1 regularization, also called lasso regression, adds the "absolute value of magnitude" of the coefficients as a penalty term to the loss function.
L2 regularization, also called ridge regression, adds the "squared magnitude" of the coefficients as the penalty term to the loss function.

When the predictor variables are highly correlated, multicollinearity can become a problem. This can cause the coefficient estimates of the model to be unreliable and have high variance. That is, when the model is applied to a new set of data it hasn't seen before, it's likely to perform poorly.

One way to get around this issue is to use a method known as lasso regression, which instead seeks to minimize RSS + lambda * sum(|Bj|), where j ranges from 1 to p and lambda >= 0. This second term in the equation is known as a shrinkage penalty. When lambda = 0, this penalty term has no effect and lasso regression produces the same coefficient estimates as least squares. However, as lambda approaches infinity the shrinkage penalty becomes more influential, and the predictor variables that aren't important in the model get shrunk towards zero, some all the way to zero and thus dropped from the model.

The advantage of lasso regression compared to least squares regression lies in the bias-variance tradeoff:
MSE = Variance + Bias**2 + Irreducible error
The basic idea of lasso regression is to introduce a little bias so that the variance can be substantially reduced, which leads to a lower overall MSE.

[Figure: test MSE versus lambda, decomposed into variance and squared bias, comparing the least squares coefficient estimates (lambda = 0) with the lasso coefficient estimates (lambda = some value > 0).]

When we use ridge regression, the coefficients of each predictor are shrunk towards zero but none of them can go completely to zero. In cases where only a small number of predictor variables are significant, lasso regression tends to perform better because it is able to shrink insignificant variables completely to zero and remove them from the model. However, when many predictor variables are significant in the model and their coefficients are roughly equal, ridge regression tends to perform better because it keeps all of the predictors in the model. Whichever model produces the lowest test mean squared error (MSE) is the preferred model to use. (A small comparison sketch follows.)
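To make that comparison concrete, here is a hedged sketch using scikit-learn's LinearRegression, Ridge and Lasso on synthetic data where only a few features matter; the data, alpha values and split are assumptions made for the illustration, not part of the original notebook:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic data: 20 features, but only the first 3 actually drive y.
rng = np.random.default_rng(7)
X = rng.normal(size=(200, 20))
y = 5 * X[:, 0] - 3 * X[:, 1] + 2 * X[:, 2] + rng.normal(0, 1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=7)

models = {
    "least squares": LinearRegression(),
    "ridge (L2)":    Ridge(alpha=1.0),   # shrinks coefficients, never exactly to zero
    "lasso (L1)":    Lasso(alpha=0.1),   # can shrink irrelevant coefficients to exactly zero
}

for name, model in models.items():
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    zero_coefs = int(np.sum(model.coef_ == 0))
    print(f"{name:14s}  test MSE = {mse:.3f}  coefficients at zero = {zero_coefs}")
```

Whichever of the three gives the lowest test MSE on a given dataset would be the one to prefer, as stated above.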
