1. Train-Test Split - The train set is used to fit the model, and the test set is used to
measure its accuracy on data it has not seen. A common convention is to use 75% of the
data for training and 25% for testing. Splitting the data leaves enough samples to fit the
model while holding back a separate portion, so it is easy to check whether the model's
predictions are correct.
2. Cross Validation - The data is split into k subsets (folds) of roughly equal size. In
each of k rounds, one fold is held out for validation and the remaining k-1 folds are used
for training, so every sample is used for validation exactly once.
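The two splitting schemes described above can be sketched in plain Python. This is a minimal illustration, not the script used for the results: the function names `train_test_split_indices` and `k_fold_indices`, the seed, and the sample counts are all assumptions made for the example.

```python
import random

def train_test_split_indices(n_samples, test_fraction=0.25, seed=0):
    """Shuffle the indices 0..n_samples-1 and return (train, test) index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

def k_fold_indices(n_samples, k):
    """Yield (train, validation) index lists for k contiguous folds."""
    # The first n_samples % k folds get one extra sample,
    # so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size

# 75% / 25% split of 100 hypothetical samples
train, test = train_test_split_indices(100)
print(len(train), len(test))

# 3-fold split of 10 hypothetical samples: each sample is validated once
folds = list(k_fold_indices(10, 3))
for train_idx, val_idx in folds:
    print("validation:", val_idx, "train:", train_idx)
```

Each index appears in exactly one validation fold, which is the property that distinguishes cross validation from a single train-test split.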
EXPLORATORY DATA ANALYSIS
• The heatmap above shows that Volume and Price have a correlation coefficient of about
0.8
• The accuracy of the fitted model can be compared with this correlation as a check on the
evaluation
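A heatmap cell of this kind normally shows a Pearson correlation coefficient, which can be computed as in the sketch below. The function name `pearson_correlation` and the sample values are assumptions for illustration and are not taken from the dataset. For a simple linear regression fitted by least squares, the R2 score on the fitting data equals the square of this coefficient, so a correlation of about 0.8 corresponds to an R2 of about 0.64.

```python
import math

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values, not the Volume/Price data
price = [1.0, 2.0, 3.0, 4.0, 5.0]
volume = [5.2, 9.8, 15.1, 19.7, 25.3]

r = pearson_correlation(price, volume)
print("correlation:", round(r, 4))
print("squared correlation:", round(r ** 2, 4))
```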
Model Selection
• The dataset contains two variables, one dependent and one independent, and both are
continuous, so Simple Linear Regression is chosen
• Simple Linear Regression uses the formula y = mx + c, where y is Salary (the dependent
variable) and x is Experience (the independent variable)
• In this equation, c is the intercept and m is the coefficient (slope). The intercept is the
predicted value of y when x is zero. The coefficient is the change in y for a one-unit
increase in x, and its sign gives the direction of the relationship between the independent
and dependent variables.
• The model is evaluated using the K-Fold method.
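The slope m and intercept c of y = mx + c can be estimated by ordinary least squares, as in the minimal sketch below. The function name `fit_simple_linear_regression` and the sample points are assumptions for illustration; the original script is not shown in the slides and may rely on a library instead.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (m, c) that minimise the squared error of y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y).
    c = mean_y - m * mean_x
    return m, c

# Hypothetical points lying exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

m, c = fit_simple_linear_regression(xs, ys)
print(m, c)  # 2.0 1.0
```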
EVALUATION METRICS AND ITS PLOTS
KFold(9).py
• Intercept is 0.07268747227117345
• Coefficient is 5.00615349
Results
• Mean absolute Error: 2.3107620550991297
• Mean Squared Error: 8.502593164969865
• Root mean square error: 2.9159206376322837
• R2 score: 0.64256142169696
• For the given dataset, the graph and results above show an R2 score of about 0.64: the
model explains roughly 64% of the variance in the dependent variable. The remaining 36% is
unexplained variation around the best-fit line (predicted Y line), not a count of outliers.
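The four metrics listed above can be computed as in the following sketch. The function name `regression_metrics` and the sample values are assumptions for illustration. Only the number passed to `math.sqrt` at the end is taken from the results above, to show that the root mean square error is the square root of the mean squared error.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)  # RMSE is the square root of MSE, not of MAE
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                   # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total variation
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}

# Hypothetical true and predicted values
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)

# Square root of the mean squared error reported in the results above
rmse_from_reported_mse = math.sqrt(8.502593164969865)
print(rmse_from_reported_mse)
```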
EVALUATION METRICS AND ITS PLOTS
Volume vs Price for 6 Splits
• Intercept is 0.07268747227117345
• Coefficient is 5.00615349
Results
• Evaluation of results for Train set(6 splits)
• Mean absolute error: 2.3371761276958445
• Mean Squared Error: 8.63311338146916
• Root mean squared error: 1.5287825639036587
• R2 Score: 0.6331644636128946
Results
• Evaluation of results for Test set(6 splits)
• Mean absolute error: 2.1780552084385225
• Mean Squared Error: 7.846847017015575
• Root mean squared error: 1.4758235695497353
• R2 Score: 0.6811464554522144
• The reported root mean squared error values equal the square root of the mean absolute
error. The square roots of the reported Mean Squared Error values are about 2.938 (train)
and 2.801 (test).
Volume vs Price for 7 Splits
• Intercept is 0.07268747227117345
• Coefficient is 5.00615349
Results
• Evaluation of results for Train set(7 splits)
• Mean absolute error: 2.344536190212473
• Mean Squared Error: 8.682053103393914
• Root mean squared error: 1.5311878363585811
• R2 Score: 0.6322338273115873
Results
• Evaluation of results for Test set(7 splits)
• Mean absolute error: 2.106690168287516
• Mean Squared Error: 7.4182507201259655
• Root mean squared error: 1.4514441664382118
• R2 Score: 0.6990249898793017
• The reported root mean squared error values equal the square root of the mean absolute
error. The square roots of the reported Mean Squared Error values are about 2.947 (train)
and 2.724 (test).
• With the data split into train and test sets using 7 splits, the test-set R2 score is
about 0.699, so the model explains about 69.9% of the variance in the test data
EVALUATION METRICS AND ITS PLOTS
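Volume vs Price for 8 Splits
• Intercept is 0.07268747227117345
• Coefficient is 5.00615349
RESULTS
• Evaluation of results for Train set(8 splits)
• Mean absolute error: 2.3565584343671806
• Mean Squared Error: 8.76938690363058
• Root mean squared error: 1.5351086067009008
• R2 Score: 0.6274437855372744
RESULTS
• Evaluation of results for Test set(8 splits)
• Mean absolute error: 1.9901874002227724
• Mean Squared Error: 6.635036994344865
• Root mean squared error: 1.410740018650769
• R2 Score: 0.7377094776620245
• The reported root mean squared error values equal the square root of the mean absolute
error. The square roots of the reported Mean Squared Error values are about 2.961 (train)
and 2.576 (test).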
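Volume vs Price for 9 Splits
• Intercept is 0.07268747227117345
• Coefficient is 5.00615349
RESULTS
• Evaluation of results for Train set(9 splits)
• Mean absolute error: 2.3434892642156426
• Mean Squared Error: 8.70135498942837
• Root mean squared error: 1.5308459309204316
• R2 Score: 0.6324243477825833
RESULTS
• Evaluation of results for Test set(9 splits)
• Mean absolute error: 2.048649542445251
• Mean Squared Error: 6.910707922234623
• Root mean squared error: 1.4313104283995317
• R2 Score: 0.7157499601499937
• The reported root mean squared error values equal the square root of the mean absolute
error. The square roots of the reported Mean Squared Error values are about 2.950 (train)
and 2.629 (test).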
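A comparison across 6 to 9 splits like the one reported above could be produced along the lines of this sketch. It is not the author's script. The helper names (`fit_line`, `r2_score`, `k_fold_r2`), the synthetic data, the round-robin assignment of samples to folds, and the averaging of scores over folds are all assumptions made for illustration.

```python
import random

def fit_line(xs, ys):
    """Least-squares slope m and intercept c for y = m*x + c."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def k_fold_r2(xs, ys, k):
    """Return (mean train R2, mean test R2) over k folds."""
    n = len(xs)
    train_scores, test_scores = [], []
    for fold in range(k):
        # Sample i belongs to fold i % k; the current fold is held out.
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        m, c = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        for idx, scores in ((train_idx, train_scores), (test_idx, test_scores)):
            y_true = [ys[i] for i in idx]
            y_pred = [m * xs[i] + c for i in idx]
            scores.append(r2_score(y_true, y_pred))
    return sum(train_scores) / k, sum(test_scores) / k

# Synthetic data (an assumption for illustration, not the Volume/Price dataset)
rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(90)]
ys = [5.0 * x + 0.07 + rng.gauss(0.0, 1.0) for x in xs]

results = {k: k_fold_r2(xs, ys, k) for k in (6, 7, 8, 9)}
for k, (train_r2, test_r2) in results.items():
    print(k, "splits: train R2", round(train_r2, 4), "test R2", round(test_r2, 4))
```

Because the model is refitted on each set of training folds, a sketch like this yields a separate slope and intercept per fold rather than a single pair for every split count.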