
Team members:

Kranthi Kumar (329)
Narasimharao (204)
Lakshmi Ganesh (990)
Akshay Babu (679)

Problem Statement:
To build a model that predicts the sales of Indic brands from the money spent on marketing
across different platforms.

Data:
The file records the advertising spend on three platforms (TV, Radio, Newspaper) together
with the corresponding Sales.
It contains 200 observations of 4 variables.

Variables - Characteristics
TV - number
Radio - number
Newspaper - number
Sales - number

Data preparation:
In the provided file all the data was in a single column, so it had to be split into
separate columns. There were no outliers and no missing values.
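
The preparation step described above can be sketched as follows. This is only an illustration using the Python standard library: the comma delimiter, the sample rows, and the helper names (split_rows, count_missing, iqr_outliers) are assumptions, since the report does not state how the original file was delimited or which tool was used. The 1.5 x IQR rule is one common outlier check, not necessarily the one the team applied.

```python
import statistics

# Hypothetical raw file content: every record sits in one delimited column.
# The comma delimiter and the sample values are assumptions for illustration.
RAW_ROWS = [
    "TV,Radio,Newspaper,Sales",
    "100.0,20.0,30.0,12.5",
    "150.0,,10.0,15.0",  # Radio is missing in this record
    "50.0,5.0,40.0,8.0",
]


def split_rows(raw_rows, delimiter=","):
    """Split single-column text rows into a header and numeric records."""
    header = [name.strip() for name in raw_rows[0].split(delimiter)]
    records = []
    for line in raw_rows[1:]:
        fields = [field.strip() for field in line.split(delimiter)]
        # Empty fields become None so missing values can be counted later.
        records.append([float(field) if field else None for field in fields])
    return header, records


def count_missing(header, records):
    """Count missing (None) entries per column."""
    return {
        name: sum(1 for rec in records if rec[i] is None)
        for i, name in enumerate(header)
    }


def iqr_outliers(values):
    """Return values outside 1.5 * IQR of the quartiles (a common rule of thumb)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]


header, records = split_rows(RAW_ROWS)
missing = count_missing(header, records)
print(header)
print(missing)
```

On the real file, a result of zero missing values per column and an empty outlier list for each variable would match what the report states.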

Data mining solution:


Building models using simple and multiple linear regression:
Model 1
Sales ~ TV
Observations:
p-value: < 0.00000000000000022
Multiple R-squared: 0.8122
Adjusted R-squared: 0.8112
Degrees of freedom: 198
Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  6.974821  0.322553    21.62    <0.0000000000000002 ***
TV           0.055465  0.001896    29.26    <0.0000000000000002 ***
SSE = 1043.549
Interpretation:
The model p-value is significant because it is below 0.05. Multiple R-squared and Adjusted
R-squared are fairly high (about 0.81), so TV spend alone explains roughly 81% of the
variation in sales. The coefficients of both the intercept and the variable TV are
significant because their p-values are below 0.05. SSE is 1043.549.
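
For reference, the quantities reported for a one-variable model (slope, intercept, SSE, R-squared, Adjusted R-squared) can be computed by ordinary least squares as in the sketch below. It is not the team's code: the function name fit_simple_ols and the toy data are hypothetical, and the numbers it prints are unrelated to the 200-observation data set.

```python
def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # SSE: sum of squared residuals; SST: total sum of squares around the mean.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - sse / sst
    # One predictor, so the residual degrees of freedom are n - 2.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return {"intercept": intercept, "slope": slope, "sse": sse,
            "r2": r2, "adj_r2": adj_r2, "df": n - 2}


# Toy values for illustration only (not taken from the report's data).
tv_spend = [10.0, 50.0, 100.0, 150.0, 200.0]
sales = [7.4, 9.9, 12.3, 15.4, 18.0]
fit = fit_simple_ols(tv_spend, sales)
print(fit)
```

With 200 observations the same formula gives the 198 degrees of freedom shown for Models 1 to 3.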

Model 2
Sales ~ Radio
Observations:
p-value: 0.0000003883
Degrees of freedom: 198
Multiple R-squared: 0.1222
Adjusted R-squared: 0.1178
Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  12.2357   0.6535      18.724   <0.0000000000000002 ***
Radio        0.1244    0.0237      5.251    0.000000388 ***
SSE = 4876.81
Interpretation:
The model p-value is significant because it is below 0.05, but Multiple and Adjusted
R-squared are low (about 0.12), so Radio spend alone explains little of the variation in
sales. The coefficients of both the intercept and the variable Radio are significant because
their p-values are below 0.05. SSE is 4876.81.

Model 3
Sales ~ Newspaper
Observations:
p-value: 0.02549
Degrees of freedom: 198
Multiple R-squared: 0.02495
Adjusted R-squared: 0.02003
Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  13.95955  0.63829     21.870   <0.0000000000000002 ***
Newspaper    0.03832   0.01703     2.251    0.0255 *
SSE = 5417.355
Interpretation:
The model p-value is significant because it is less than 0.05, but Multiple and Adjusted
R-squared are very low (about 0.02), so Newspaper spend alone explains almost none of the
variation in sales. The coefficients of both the intercept and the variable Newspaper are
significant at the 0.05 level. SSE is 5417.355, the highest of all the models.

Model 4
Sales ~ TV + Radio
Observations:
p-value: < 0.00000000000000022
Degrees of freedom: 197
Multiple R-squared: 0.9026
Adjusted R-squared: 0.9016
Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  4.630879  0.290308    15.95    <0.0000000000000002 ***
TV           0.054449  0.001371    39.73    <0.0000000000000002 ***
Radio        0.107175  0.007926    13.52    <0.0000000000000002 ***
SSE = 541.2105
Interpretation:
The model p-value is significant because it is less than 0.05. Multiple and Adjusted
R-squared are high (about 0.90), so the model fits the data well. The coefficients of the
intercept and of both variables, TV and Radio, are significant. SSE is 541.2105, far lower
than in Models 1 to 3.

Model 5
Sales ~ TV + Newspaper
Observations:
p-value: < 0.00000000000000022
Degrees of freedom: 197
Multiple R-squared: 0.8236
Adjusted R-squared: 0.8219
Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  6.234744  0.375430    16.607   <0.0000000000000002 ***
TV           0.055091  0.001844    29.869   <0.0000000000000002 ***
Newspaper    0.026021  0.007271    3.579    0.000434 ***
SSE = 979.8426
Interpretation:
The model p-value is significant because it is less than 0.05. Multiple and Adjusted
R-squared are fairly high (about 0.82) but lower than in Model 4, so Model 4 fits the data
better than this model. The coefficients of the intercept and of both variables, TV and
Newspaper, are significant. SSE is 979.8426, lower than for TV alone (1043.549) but well
above Model 4 (541.2105).

Model 6
Sales ~ Radio + Newspaper
Observations:
p-value: 0.000002277
Degrees of freedom: 197
Multiple R-squared: 0.1236
Adjusted R-squared: 0.1147
Coefficients:
             Estimate   Std. Error  t value  Pr(>|t|)
(Intercept)  12.060729  0.728494    16.556   <0.0000000000000002 ***
Radio        0.119510   0.025383    4.708    0.0000047 ***
Newspaper    0.009474   0.017304    0.548    0.585
SSE = 4869.4
Interpretation:
The model p-value is significant because it is less than 0.05. The coefficients of the
intercept and the variable Radio are significant, but that of Newspaper is not, because its
p-value (0.585) is above 0.05. Both Multiple and Adjusted R-squared are low (about 0.12), so
the model explains little of the variation in sales.
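
The significance check used in these interpretations rests on the t value, which is the estimate divided by its standard error. The sketch below recomputes it for the Model 6 coefficients and derives an approximate two-sided p-value. Note the assumption: it uses the standard normal distribution in place of the t distribution, which is a close approximation at 197 degrees of freedom but not exactly what the regression software reports. The function name is hypothetical.

```python
import math


def t_and_approx_p(estimate, std_error):
    """Return the t value and a normal-approximation two-sided p-value."""
    t_value = estimate / std_error
    # Two-sided tail probability under the standard normal distribution.
    p_value = math.erfc(abs(t_value) / math.sqrt(2))
    return t_value, p_value


# Estimates and standard errors copied from the Model 6 coefficient table.
radio_t, radio_p = t_and_approx_p(0.119510, 0.025383)
news_t, news_p = t_and_approx_p(0.009474, 0.017304)

for name, t_value, p_value in [("Radio", radio_t, radio_p),
                               ("Newspaper", news_t, news_p)]:
    verdict = "significant" if p_value < 0.05 else "not significant"
    print(f"{name}: t = {t_value:.3f}, approx p = {p_value:.4g} ({verdict})")
```

The recomputed t values agree with the table to rounding, and Newspaper's approximate p-value lands well above 0.05, which is why it is judged not significant.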

Model 7
Sales ~ Radio + Newspaper + TV
Observations:
p-value: < 0.00000000000000022
Degrees of freedom: 196
Multiple R-squared: 0.9026
Adjusted R-squared: 0.9011
Coefficients:
             Estimate   Std. Error  t value  Pr(>|t|)
(Intercept)  4.6251241  0.3075012   15.041   <0.0000000000000002 ***
TV           0.0544458  0.0013752   39.592   <0.0000000000000002 ***
Radio        0.1070012  0.0084896   12.604   <0.0000000000000002 ***
Newspaper    0.0003357  0.0057881   0.058    0.954
SSE = 541.2012
Interpretation:
The model p-value is significant because it is less than 0.05. Multiple and Adjusted
R-squared are high (about 0.90). The coefficients of the intercept, TV, and Radio are
significant, but that of Newspaper is not, because its p-value (0.954) is above 0.05. SSE is
541.2012, practically the same as Model 4 (541.2105), so adding Newspaper does not improve
the fit.

From the above models, Model 4 is the best: all of its coefficients are significant, its SSE
is among the lowest, and its Adjusted R-squared (0.9016) is the highest. So, we will use this
model to predict sales.
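
The comparison above can be cross-checked from the reported figures alone. The sketch below recomputes Adjusted R-squared from each model's Multiple R-squared, the 200 observations, and the number of predictors, then picks the model with the highest value. The dictionary layout and function name are illustrative choices; the R-squared values are those listed in the report, so small rounding differences from the reported Adjusted R-squared are expected.

```python
N_OBS = 200  # number of observations stated in the Data section

# Model label -> (Multiple R-squared as reported, number of predictors)
MODELS = {
    "Model 1: Sales ~ TV": (0.8122, 1),
    "Model 2: Sales ~ Radio": (0.1222, 1),
    "Model 3: Sales ~ Newspaper": (0.02495, 1),
    "Model 4: Sales ~ TV + Radio": (0.9026, 2),
    "Model 5: Sales ~ TV + Newspaper": (0.8236, 2),
    "Model 6: Sales ~ Radio + Newspaper": (0.1236, 2),
    "Model 7: Sales ~ Radio + Newspaper + TV": (0.9026, 3),
}


def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared: penalises R-squared for each extra predictor."""
    residual_df = n_obs - n_predictors - 1
    return 1 - (1 - r2) * (n_obs - 1) / residual_df


adjusted = {
    label: adjusted_r2(r2, N_OBS, k) for label, (r2, k) in MODELS.items()
}
best_model = max(adjusted, key=adjusted.get)

for label, value in adjusted.items():
    print(f"{label}: adjusted R-squared = {value:.4f}")
print("Highest adjusted R-squared:", best_model)
```

Models 4 and 7 share the same Multiple R-squared, but Model 7 pays the penalty for a third predictor that adds nothing, so Model 4 comes out ahead.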
Testing data set:
TV      Radio   Newspaper  Projected sales
276.12  47.25   74.736     24.729325
53.4    49.125  48.708     12.803405
20.64   57.375  74.844     11.903847
181.8   51.625  63.18      20.062588
216.96  13.5    63.072     17.890983
10.44   61.125  81         11.750372
69      41      25.38      12.782015
144.24  24.5    12.528     15.110375
10.32   2.625   1.08       5.474126
239.76  3.25    22.896     18.033880
79.32   7.25    26.136     9.726787
257.64  30      4.32       21.874347
28.56   43.875  71.172     10.888226
117     9.5     7.776      12.019566

Predicted sales using Model 4 (multiple linear regression on TV and Radio):

1          2          3          4          5          6          7
24.729325  12.803405  11.903847  20.062588  17.890983  11.750372  12.782015
8          9          10         11         12         13         14
15.110375  5.474126   18.033880  9.726787   21.874347  10.888226  12.019566
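
As a check on these figures, the predictions can be reproduced from the Model 4 coefficients with Sales = 4.630879 + 0.054449 * TV + 0.107175 * Radio. The sketch below applies that equation to the TV and Radio values of the testing data set and compares the result with the projected sales listed in the report. The function and variable names are illustrative; small differences in the fifth decimal place are expected because the printed coefficients are rounded.

```python
# Model 4 coefficients as reported (Sales ~ TV + Radio).
INTERCEPT = 4.630879
COEF_TV = 0.054449
COEF_RADIO = 0.107175


def predict_sales(tv, radio):
    """Predicted sales from the Model 4 regression equation."""
    return INTERCEPT + COEF_TV * tv + COEF_RADIO * radio


# (TV, Radio, projected sales) copied from the testing data set.
TEST_ROWS = [
    (276.12, 47.25, 24.729325), (53.4, 49.125, 12.803405),
    (20.64, 57.375, 11.903847), (181.8, 51.625, 20.062588),
    (216.96, 13.5, 17.890983), (10.44, 61.125, 11.750372),
    (69, 41, 12.782015), (144.24, 24.5, 15.110375),
    (10.32, 2.625, 5.474126), (239.76, 3.25, 18.033880),
    (79.32, 7.25, 9.726787), (257.64, 30, 21.874347),
    (28.56, 43.875, 10.888226), (117, 9.5, 12.019566),
]

max_gap = 0.0
for tv, radio, reported in TEST_ROWS:
    predicted = predict_sales(tv, radio)
    max_gap = max(max_gap, abs(predicted - reported))
    print(f"TV={tv:<7} Radio={radio:<7} predicted={predicted:.6f} "
          f"reported={reported:.6f}")
print(f"Largest absolute difference: {max_gap:.6f}")
```

Newspaper spend does not enter the calculation, which is consistent with Model 4 being the model chosen for prediction.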

Exploratory Analysis
Conclusion: From the above analysis, TV and Radio advertising affect sales more
than Newspaper advertising. TV is the strongest single predictor: on its own it
explains about 81% of the variation in sales (R-squared 0.8122), and sales rise
as TV spend rises. Adding Radio raises this to about 90% (R-squared 0.9026), so
both the amount spent on TV and the amount spent on Radio are used to predict
sales. Newspaper spend explains only about 2.5% of the variation on its own, and
its coefficient is not significant once Radio is in the model (Models 6 and 7),
so it is not included in the variables used to predict sales.
