

Summer Semester, 2020


Term Paper
ECO204, Sec: 2
Course Instructor: Biplab Kumar Nandi
Date of Submission: 1 October, 2020.

Term paper on the analysis of Car average mileage

Group name: BASIC

NAME ID

Farhana Hoque Mahee 2019-1-10-014

Anika Islam Tithi 2019-1-10-279

Md. Rafiu Haq 2019-1-10-247

Tajria Noor Usha 2019-1-30-020



Introduction
Model Building:
This term paper is an empirical analysis of car average mileage. It reports the results of an
analysis of the factors that affect average miles per gallon. The model is built on a research
hypothesis, and the research hypothesis is motivated by the data. The data set on car average
mileage was provided to us by our course instructor.
Table: 01
Data Set: Car Mileage
Observation   Average miles     Cubic feet of     Top speed, miles   Vehicle weight, hundreds
              per gallon (Y)    cab space (x1)    per hour (x2)      of pounds (x3)
1 65.4 89 96 17.5
2 56 92 97 20
3 55.9 92 97 20
4 49 92 105 20
5 46.5 92 96 20
6 46.2 89 105 20
7 45.4 92 97 20
8 59.2 50 98 22.5
9 53.3 50 98 22.5
10 43.4 94 107 22.5
11 41.1 89 103 22.5
12 40.9 50 113 22.5
13 40.9 99 113 22.5
14 40.4 89 103 22.5
15 39.6 89 100 22.5
16 39.3 89 103 22.5
17 38.9 91 106 22.5
18 38.8 50 113 22.5
19 38.2 91 106 22.5
20 42.2 103 109 25
21 40.9 99 110 25
22 40.7 107 101 25
23 40 101 111 25
24 39.3 96 105 25
25 38.8 89 111 25
26 38.4 50 110 25
27 38.4 117 110 25
28 38.4 99 110 25
29 46.9 104 90 27.5
30 36.3 107 112 27.5
31 36.1 114 103 27.5
32 36.1 101 103 27.5
33 35.4 97 111 27.5
34 35.3 113 111 27.5
35 35.1 101 102 27.5
36 35.1 98 106 27.5
37 35 88 106 27.5
38 33.2 86 109 30
39 32.9 86 109 30
40 32.3 92 120 30
41 32.2 113 106 30

Multiple regression model:


Y= Bₒ+ B₁X₁ + B₂X₂ + B₃X₃ + e
Here, Y is the average miles per gallon, X₁= cubic feet of cab space, X₂= top speed, miles per hour,
X₃= vehicle weight, hundreds of pounds.
Research Question: Do the cubic feet of cab space, the top speed and the weight of a vehicle
affect its average miles per gallon?
Research Objective:
1. To estimate whether the independent variables affect the average miles per gallon of a vehicle.
2. To identify whether there is any relationship between the independent variables and the
dependent variable of the model.

Population regression model


Y= Bₒ+ B₁X₁ + B₂X₂ + B₃X₃ + e
Here, the dependent variable (Y) is average miles per gallon.
The independent variables are cubic feet of cab space (X₁), top speed (X₂) and vehicle weight
(X₃).

Assumptions about expected sign of independent variables


The sign of B₁ (the coefficient of X₁) should be negative: the more cubic feet of cab space, the
more fuel the car consumes, which lowers the average miles per gallon travelled by the car.
The sign of B₂ (the coefficient of X₂) should be negative: a higher top speed is expected to raise
fuel consumption, so average miles per gallon decreases.
The sign of B₃ (the coefficient of X₃) should be negative: the heavier the vehicle, the more fuel
it consumes, which lowers average miles per gallon.
Conversely, lower cab space, top speed and vehicle weight are each expected to raise the average
miles per gallon travelled by the car.

Assumption of the error term


We include an error term in the model because we cannot be sure that cubic feet of cab space,
top speed and vehicle weight are the only variables that matter. These three independent
variables are not guaranteed to explain 100% of the variation in average miles per gallon (the
dependent variable). The error term accounts for the other factors related to average miles per
gallon that are not included in the model.
The assumptions about the error term (e) in the multiple regression model are:
1) The error term (e) is a random variable with a mean or expected value of zero; that is, E(e) = 0.
For given values of X₁, X₂, X₃, …, Xp the expected or average value of Y is given by:
E(Y) = Bₒ + B₁X₁ + B₂X₂ + B₃X₃ + … + BpXp
E(Y) represents the average of all possible values of Y that might occur for the given values
of X₁, X₂, X₃, …, Xp.
2) The variance of e is denoted by σ² and is the same for all values of the independent
variables.
3) The values of e are independent. The value of e for a particular set of values of the
independent variables is not related to the value of e for any other set of values.
4) The error term (e) is a normally distributed random variable reflecting the deviation between
the Y value and the expected value of Y given by Bₒ + B₁X₁ + B₂X₂ + B₃X₃ + … + BpXp.
Because Bₒ, B₁, …, Bp are constants, for given values of X₁, X₂, …, Xp the
dependent variable (Y) is also a normally distributed random variable.
Descriptive statistics
Table: 02
Statistic            Average miles    Cubic feet of    Top speed,        Vehicle weight,
                     per gallon       cab space        miles per hour    hundreds of pounds
Mean                 41.40            90.98            105.39            24.39
Standard Error       1.18             2.72             0.96              0.51
Median               39.30            92.00            106.00            25.00
Mode                 40.90            89.00            103.00            22.50
Standard Deviation   7.54             17.40            6.14              3.25
Sample Variance      56.79            302.67           37.69             10.56
Kurtosis             2.00             1.69             -0.01             -0.78
Skewness             1.46             -1.38            -0.26             0.05
Range                33.20            67.00            30.00             12.50
Minimum              32.20            50.00            90.00             17.50
Maximum              65.40            117.00           120.00            30.00
Sum                  1697.40          3730.00          4321.00           1000.00
Sample size          41.00            41.00            41.00             41.00

Here the sample size is 41, which means the data set contains 41 observations. All the statistics
below are based on these 41 observations.

The mean provides the measure of central location for the data. Here the mean is 41.40 for Average
miles per gallon (y) and 90.98 for cubic feet of cab space (x₁),105.39 for top speed (x2) and 24.39
for vehicle weight (x3).

The standard error reported in Table 02 is the standard error of the mean: the sample standard
deviation divided by the square root of the sample size. It measures how precisely the sample
mean estimates the population mean; it is not a distance between observed values and a regression
line. Here, the standard error of average miles per gallon (y) is 1.18 (that is, 7.54/√41).
Moreover, the standard error of cubic feet of cab space (x₁) is 2.72, the standard error of top
speed (x2) is 0.96, and the standard error of vehicle weight (x3) is 0.51.
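
As a quick arithmetic check, the standard errors in Table 02 can be reproduced from the standard deviations and the sample size. The sketch below is only illustrative: the function name `standard_error_of_mean` is our own, and the inputs are the rounded values printed in Table 02.

```python
import math

def standard_error_of_mean(std_dev, n):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return std_dev / math.sqrt(n)

n = 41  # sample size from Table 02
# Sample standard deviations from Table 02 (rounded to two decimals).
std_devs = {"y": 7.54, "x1": 17.40, "x2": 6.14, "x3": 3.25}

standard_errors = {name: round(standard_error_of_mean(sd, n), 2)
                   for name, sd in std_devs.items()}
print(standard_errors)
```

The rounded results agree with the Standard Error row of Table 02.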

The standard deviation measures the typical distance between a single observation and the
mean. For average miles per gallon (y), the standard deviation is 7.54.
For cubic feet of cab space (x1), the standard deviation is 17.40, which is the highest among the
variables, so this variable is the most dispersed in absolute terms.
The standard deviation of top speed (x2) is 6.14.
For vehicle weight (x3) the standard deviation is 3.25, which is the lowest among the variables.
Because the variables are measured in different units, these values are not directly comparable.
The spread of a variable does not by itself show whether homoscedasticity holds: homoscedasticity
concerns the variance of the error term, which we examine with the residual plot in the
heteroscedasticity test.

Skewness measures the direction and degree of asymmetry in the data distribution. The
skewness for average miles per gallon (y) is 1.46, which is positive. The mean (41.40) is greater
than the median (39.30), which is consistent with data that are skewed to the right.
The skewness for cubic feet of cab space (x1) is -1.38, which means the data are negatively
skewed. The mean (90.98) is less than the median (92.00), which is consistent with data that are
skewed to the left.
The skewness for top speed (x2) is -0.26, which means the data are slightly negatively skewed.
The mean (105.39) is less than the median (106.00) by a small amount, so the distribution is
close to symmetric.
The skewness for vehicle weight (x3) is 0.05, which is positive but very close to zero, so the
distribution is approximately symmetric. Here the mean (24.39) is slightly less than the median
(25.00).

Analysis of regression estimation result

Multiple linear regression equation


ŷ = bₒ + b₁x₁ + b₂x₂ + b₃x₃
  = 129.72 - 0.06x₁ - 0.49x₂ - 1.27x₃

Interpretation of coefficients
Table: 03
Coefficients
Intercept (bₒ) 129.7168027
Cubic feet of cab space -0.060212413
Top speed, miles per hour -0.491517252
Vehicle weight, hundreds of pounds -1.272550564

Here, the dependent variable is average miles per gallon (y). The independent variables are cubic
feet of cab space (x₁), top speed (x₂) and vehicle weight (x₃). The values of bₒ, b₁, b₂ and b₃
have been found by using OLS. Ordinary least squares (OLS) is a linear least squares method that
estimates the unknown parameters of a linear regression model by minimizing the sum of squared
residuals. Normality of the errors is not needed to compute the estimates; it is needed for the
t and F tests used later.
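
The OLS estimates can be obtained by solving the normal equations (X'X)b = X'y. The following is a minimal sketch in plain Python, assuming the data are exactly as transcribed from Table 01; the helper names `solve` and `ols` are our own and are not part of any library. If the transcription is correct, the printed coefficients should be close to those in Table 03.

```python
# Data transcribed from Table 01 (41 observations).
y = [65.4, 56.0, 55.9, 49.0, 46.5, 46.2, 45.4, 59.2, 53.3, 43.4, 41.1, 40.9, 40.9, 40.4,
     39.6, 39.3, 38.9, 38.8, 38.2, 42.2, 40.9, 40.7, 40.0, 39.3, 38.8, 38.4, 38.4, 38.4,
     46.9, 36.3, 36.1, 36.1, 35.4, 35.3, 35.1, 35.1, 35.0, 33.2, 32.9, 32.3, 32.2]
x1 = [89, 92, 92, 92, 92, 89, 92, 50, 50, 94, 89, 50, 99, 89, 89, 89, 91, 50, 91, 103, 99,
      107, 101, 96, 89, 50, 117, 99, 104, 107, 114, 101, 97, 113, 101, 98, 88, 86, 86, 92, 113]
x2 = [96, 97, 97, 105, 96, 105, 97, 98, 98, 107, 103, 113, 113, 103, 100, 103, 106, 113, 106,
      109, 110, 101, 111, 105, 111, 110, 110, 110, 90, 112, 103, 103, 111, 111, 102, 106, 106,
      109, 109, 120, 106]
x3 = [17.5] + [20.0] * 6 + [22.5] * 12 + [25.0] * 9 + [27.5] * 9 + [30.0] * 4

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(y, *xs):
    """Return [b0, b1, ..., bp] minimizing the sum of squared residuals."""
    rows = [[1.0] + [float(x[i]) for x in xs] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

coef = ols(y, x1, x2, x3)
fitted = [coef[0] + coef[1] * a + coef[2] * b + coef[3] * c for a, b, c in zip(x1, x2, x3)]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
print([round(b, 4) for b in coef])
```

Because the model includes an intercept, the residuals sum to zero up to rounding, which is a convenient sanity check on the fit.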
bₒ = 129.72 is the y-intercept. It is the predicted value of average miles per gallon (ŷ) when
cubic feet of cab space (x₁), top speed (x₂) and vehicle weight (x₃) are all equal to zero.
Because no car in the sample has values near zero for these variables, the intercept has no
practical interpretation here.
b₁ = -0.06 indicates that if cab space increases by 1 cubic foot, average miles per gallon
decreases by 0.06, on average, holding the other independent variables (x2 and x3)
constant. There is a negative relationship between average miles per gallon and cubic feet of cab
space (x1).
b₂ = -0.49 indicates that if top speed increases by 1 mile per hour, average miles per
gallon decreases by 0.49, on average, holding the other independent variables (x1 and
x3) constant. Average miles per gallon and top speed are inversely related.
b₃ = -1.27 means that if vehicle weight increases by 1 unit (100 pounds), average
miles per gallon decreases by 1.27, on average, holding the other independent variables
(x1 and x2) constant. Average miles per gallon is negatively related to vehicle weight.
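
To illustrate how the coefficients are used, the sketch below plugs the values of observation 1 from Table 01 into the estimated equation and also shows the "holding other variables constant" effect of one extra unit of weight. The function name `predict_mpg` is our own; the coefficients are those reported in Table 03.

```python
# Coefficients from Table 03.
B0, B1, B2, B3 = 129.7168027, -0.060212413, -0.491517252, -1.272550564

def predict_mpg(cab_space, top_speed, weight):
    """Predicted average miles per gallon from the estimated equation."""
    return B0 + B1 * cab_space + B2 * top_speed + B3 * weight

# Observation 1 in Table 01: y = 65.4, x1 = 89, x2 = 96, x3 = 17.5.
predicted = predict_mpg(89, 96, 17.5)
residual = 65.4 - predicted
print(round(predicted, 2), round(residual, 2))

# One extra unit of weight (100 pounds) with cab space and top speed unchanged
# changes the prediction by exactly b3.
effect = predict_mpg(89, 96, 18.5) - predicted
print(round(effect, 2))
```

For this car the model predicts about 54.90 miles per gallon, so the residual is about 10.50.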
Interpretation of goodness of fit

[Figure: Line fit plots of actual and predicted average miles per gallon against (a) cubic feet
of cab space, (b) top speed, miles per hour, and (c) vehicle weight, hundreds of pounds.]

Table: 04
Regression Statistics
Multiple R 0.85
R Square 0.72
Adjusted R Square 0.70
Standard Error 4.11
Observations 41.00

R² = 0.72 = 72%

As R² = 72%, 72% of the variation in average miles per gallon (y) is explained by
cubic feet of cab space (x₁), top speed (x₂) and vehicle weight (x₃). The adjusted R², which
accounts for the number of independent variables, is 0.70. This model is a reasonably good fit
because R² is close to 1.
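
The regression statistics in Table 04 follow directly from the sums of squares reported in the ANOVA table (Table 06). The sketch below recomputes them; the variable names are our own.

```python
import math

# Sums of squares from the ANOVA table (Table 06).
ssr, sse, sst = 1645.324523, 626.275477, 2271.6
n, p = 41, 3  # observations and number of independent variables

r_squared = ssr / sst                                    # share of variation explained
adj_r_squared = 1 - (sse / (n - p - 1)) / (sst / (n - 1))  # penalizes extra variables
std_error = math.sqrt(sse / (n - p - 1))                 # standard error of the estimate

print(round(r_squared, 2), round(adj_r_squared, 2), round(std_error, 2))
```

Rounded to two decimals these match the R Square, Adjusted R Square and Standard Error rows of Table 04.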
All the graphs of line fit plots show that the actual values of y and the predicted values of y are
very close, which also indicates that this model is a good fit.

Hypothesis test
T test:
Table: 05
                                     Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept                            129.72         11.75            11.04    0.00      105.91      153.52
Cubic feet of cab space              -0.06          0.04             -1.51    0.14      -0.14       0.02
Top speed, miles per hour            -0.49          0.12             -4.11    0.00      -0.73       -0.25
Vehicle weight, hundreds of pounds   -1.27          0.24             -5.33    0.00      -1.76       -0.79

1. We want to test the relationship between average miles per gallon (y) and cubic feet of
cab space (x1), whether they are statistically significant or not.
Hypothesis:
Hₒ : B₁=0 [ There is no significant relationship between average miles per gallon (y)
and cubic feet of cab space (x1) ]
Ha : B₁≠0
t₁= b₁/ Sb₁
= -0.06/0.04
= -1.5
Critical value approach:
df = n – p – 1 = 41 - 3 - 1 = 37, α = 0.01,
tα/₂ = 2.715
The rejection rule for this two-tailed test is: reject Hₒ if t ≤ -tα/₂ or t ≥ tα/₂.
Here, -2.715 < -1.5 < 2.715.
As the t statistic lies between the negative and positive critical values, the null hypothesis
cannot be rejected at the 1 percent level of significance.
p- value approach
p-value = 0.14, α=0.01
As, p-value is greater than α, null hypothesis cannot be rejected, at 1 percent level of
significance.
As a result, we cannot conclude that B₁ ≠ 0: the data show no statistically significant
relationship between average miles per gallon and cubic feet of cab space.

2. We want to test the relationship between average miles per gallon (y) and top speed, miles
per hour (x2), whether they are statistically significant or not.
Hypothesis:
Hₒ : B₂=0 [ There is no significant relationship between average miles per gallon (y) and
top speed (x2) ]
Ha : B₂≠0
t₂= b₂/ Sb₂
= -0.49/0.12
= -4.08
Critical value approach:
df = 37, α = 0.01, tα/₂ = 2.715
Therefore, t ≤ -tα/₂ (-4.08 ≤ -2.715)
As the t statistic is smaller than the negative critical value, the null hypothesis is rejected
at the 1 percent level of significance.
p- value approach
p-value = 0.00, α=0.01
As, p-value is less than α, null hypothesis is rejected, at 1 percent level of significance.
As a result, we can conclude that B₂ ≠ 0, so there is a statistically significant relationship
between average miles per gallon (y) and top speed (x2).

3. We want to test the relationship between average miles per gallon (y) and vehicle weight,
hundreds of pounds (x3), whether they are statistically significant or not.
Hypothesis:
Hₒ : B₃=0 [ There is no significant relationship between average miles per gallon (y) and
vehicle weight (x3) ]
Ha : B₃≠0
t₃= b₃/ Sb₃
= -1.27/0.24
= -5.29
Critical value approach:
df = 37, α = 0.01, tα/₂ = 2.715
Therefore, t ≤ -tα/₂ (-5.29 ≤ -2.715)
As the t statistic is smaller than the negative critical value, the null hypothesis can be
rejected at the 1 percent level of significance.
p- value approach
p-value = 0.00, α=0.01
As, p-value is less than α, null hypothesis can be rejected, at 1 percent level of
significance.
As a result, we can conclude that B₃ ≠ 0, so there is a statistically significant relationship
between average miles per gallon (y) and vehicle weight (x3).
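
The three t tests above follow the same steps, which can be sketched as a small function. This is an illustration only: the name `t_test` is our own, the inputs are the rounded values from Table 05, and the critical value 2.715 is the table value used in the text rather than something computed here.

```python
# Rounded coefficients and standard errors from Table 05.
estimates = {
    "x1 (cab space)": (-0.06, 0.04),
    "x2 (top speed)": (-0.49, 0.12),
    "x3 (vehicle weight)": (-1.27, 0.24),
}
T_CRITICAL = 2.715  # t table value for alpha/2 = 0.005 and df = 37, as used in the text

def t_test(coef, std_err, t_crit):
    """Return the t statistic and whether H0: B = 0 is rejected (two-tailed)."""
    t_stat = coef / std_err
    return t_stat, abs(t_stat) >= t_crit

results = {name: t_test(b, se, T_CRITICAL) for name, (b, se) in estimates.items()}
for name, (t_stat, reject) in results.items():
    print(f"{name}: t = {t_stat:.2f}, reject H0: {reject}")
```

The t statistics differ slightly from the t Stat column of Table 05 because the table values are computed from unrounded coefficients and standard errors; the decisions are the same.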

F test
Table: 06
ANOVA
             df    SS            MS            F             Significance F
Regression   3     1645.324523   548.4415077   32.40161324   0.00
Residual     37    626.275477    16.92636424
Total        40    2271.6

We want to test whether a significant relationship exists between average miles per gallon (y)
and the set of independent variables: cubic feet of cab space (x₁), top speed, miles per hour
(x₂) and vehicle weight, hundreds of pounds (x₃).
Hypothesis:
Ho: B₁ = B₂ = B₃ = 0 [ This means average miles per gallon is not explained by any of the
independent variables (x1, x2 and x3). ]
Ha: At least one of B₁, B₂, B₃ is not equal to 0
F= MSR/MSE
= 548.44/16.93
= 32.4

Critical value approach:


df = 3, 37 α=0.01
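Fα = 4.51 [ The F distribution table has no entry for 37 denominator degrees of freedom, so we
used the value for 30. This value is slightly larger than the exact critical value, so the test
is conservative. ]

Therefore, F ≥ Fα (32.4 ≥ 4.51)
As the F statistic is greater than the F critical value, we can reject the null hypothesis.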
P value approach:
P- value = 0.00, α= 0.01
p- value ≤ α
As, p- value is smaller than α, we can reject null hypothesis.
As a result, we can conclude that a significant relationship is present between average miles per
gallon (y) and the set of independent variables: cubic feet of cab space (x1), top speed, miles
per hour (x2) and vehicle weight, hundreds of pounds (x3). This indicates that the model is
significant overall.
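
The F statistic can be recomputed from the ANOVA table, and it is also a function of R². The sketch below shows both routes; the variable names are our own, and the critical value 4.51 is the table value used in the text.

```python
# Sums of squares and degrees of freedom from the ANOVA table (Table 06).
ssr, sse = 1645.324523, 626.275477
df_regression, df_residual = 3, 37

msr = ssr / df_regression   # mean square due to regression
mse = sse / df_residual     # mean square error
f_stat = msr / mse

F_CRITICAL = 4.51  # table value used in the text (alpha = 0.01, df = 3 and 30)
reject_h0 = f_stat >= F_CRITICAL

# Equivalent computation from R squared.
r_squared = ssr / (ssr + sse)
f_from_r2 = (r_squared / df_regression) / ((1 - r_squared) / df_residual)

print(round(f_stat, 2), reject_h0)
```

Both routes give the same F value, which shows that the overall F test and R² carry the same information about the fit.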

Multicollinearity
Table: 07
Pairwise correlation among variables
                                     Average miles   Cubic feet of   Top speed,       Vehicle weight,
                                     per gallon      cab space       miles per hour   hundreds of pounds
Average miles per gallon             1
Cubic feet of cab space              -0.32           1
Top speed, miles per hour            -0.64           0.01            1
Vehicle weight, hundreds of pounds   -0.77           0.32            0.44             1

Here, we will check the pair-wise correlation coefficient among independent variables, to see if
they have multicollinearity problem or not.
According to the rule of thumb, a sample correlation coefficient greater than +0.7 or less than -0.7
for two independent variables is a warning of potential problems with multicollinearity. Because,
if two independent variables are highly correlated, it is not possible to determine the separate effect
of any particular independent variable on the dependent variable.
The correlation coefficient between cubic feet of cab space (x₁) and top speed (x₂), Rx₁x₂, is
0.01, which is far below 0.7. The two variables are essentially uncorrelated. As a result, there
is no multicollinearity problem between cubic feet of cab space and top speed.
The correlation coefficient of top speed (x₂) and vehicle weight (x₃), Rx₂x₃ is 0.44, which is less
than 0.7. They have correlation but not excessive correlation. As a result, there is no
multicollinearity problem between top speed and vehicle weight.
The correlation coefficient of cubic feet of cab space (x₁) and vehicle weight (x₃), Rx₁x₃ is 0.32,
which is less than 0.7. They have correlation but not excessive correlation. As a result, there is no
multicollinearity problem between cubic feet of cab space and vehicle weight.
As a result, there is no multicollinearity problem in this model.
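
The rule of thumb above can be written as a short check. In this sketch the function `pearson` is our own helper for the sample correlation coefficient, and the pairwise values are the ones reported in Table 07; the 0.7 threshold is the rule of thumb stated in the text.

```python
import math

def pearson(a, b):
    """Sample correlation coefficient between two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

# Pairwise correlations among the independent variables, from Table 07.
pairwise = {("x1", "x2"): 0.01, ("x2", "x3"): 0.44, ("x1", "x3"): 0.32}
THRESHOLD = 0.7  # rule-of-thumb warning level

flagged = [pair for pair, r in pairwise.items() if abs(r) > THRESHOLD]
print(flagged)  # an empty list means no pair exceeds the threshold
```

Applying `pearson` to each pair of independent-variable columns in Table 01 would be the way to recompute the entries of Table 07.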

Heteroscedasticity test

[Figure: Residual plot against average miles per gallon. Horizontal axis: average miles per
gallon; vertical axis: residual.]


Heteroscedasticity refers to unequal variability of the error term: the variance of e is not
constant across observations.
In our model we check for a heteroscedasticity problem using the residuals.

The residual plot against average miles per gallon (y) does not show a horizontal band, which
indicates that the variance of e is not constant. As a result, the model has a heteroscedasticity
problem.
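
A visual check of the residual plot can be complemented by a rough numerical comparison of the residual spread in two halves of the plot. The sketch below is an informal illustration and not the method used in this paper: the function name `variance_ratio` is our own, and the example residuals are hypothetical, not taken from our regression.

```python
import statistics

def variance_ratio(ordered_residuals):
    """Ratio of residual variance in the upper half to the lower half.

    `ordered_residuals` must already be sorted by the variable on the
    horizontal axis of the residual plot. A ratio far from 1 hints at
    non-constant variance; this is an informal check, not a formal test.
    """
    half = len(ordered_residuals) // 2
    lower = ordered_residuals[:half]
    upper = ordered_residuals[-half:]
    return statistics.variance(upper) / statistics.variance(lower)

# Hypothetical residuals for illustration only (not taken from the paper).
example = [-0.5, 0.4, -0.3, 0.6, -0.2, -3.0, 4.0, -5.0, 6.0, -2.0]
ratio = variance_ratio(example)
print(round(ratio, 1))
```

In this made-up example the spread of the residuals grows sharply in the second half, which is the kind of pattern that would not form a horizontal band in a residual plot.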

Normality assumption

[Figure: Normal probability plot. Horizontal axis: sample percentile; vertical axis: average
miles per gallon.]

[Figure: Standardized residual plot. Horizontal axis: average miles per gallon; vertical axis:
standardized residual.]


If the error term is normally distributed, about 95% of the standardized residuals should lie
between -2 and +2. In the plot, more than 80% of the standardized residuals lie in this range.
Therefore, on the basis of the standardized residuals, this plot gives us no strong reason to
question the assumption that e has a normal distribution.
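
The share of standardized residuals inside the -2 to +2 band can be counted directly instead of being read off the plot. The sketch below shows the calculation; the function name `share_within` is our own and the example values are hypothetical, not the standardized residuals of our model.

```python
def share_within(standardized_residuals, limit=2.0):
    """Fraction of standardized residuals lying in [-limit, +limit]."""
    inside = sum(1 for r in standardized_residuals if -limit <= r <= limit)
    return inside / len(standardized_residuals)

# Hypothetical standardized residuals for illustration only.
example = [-1.2, 0.3, 2.6, -0.4, 0.9, 1.1, -2.1, 0.0, 0.5, -0.7]
share = share_within(example)
print(share)
```

With the actual standardized residuals from the regression output, the resulting share can be compared with the roughly 95% expected under a normal distribution.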
Conclusion
After evaluating the regression result, we found the following results:
1. Two of the three independent variables (top speed and vehicle weight) are statistically
significant according to the t test; cubic feet of cab space is not.
2. The model is a good fit according to R² value.
3. According to F test, this model is overall significant.
4. This model does not have any multicollinearity problem.
5. It follows normal distribution, as more than 80 percent standardized residual lies
between -2 to +2.
6. The model has heteroscedasticity problem.
We cannot say that the model is completely reasonable, because it does not satisfy the constant
variance assumption. The main problem in this model is heteroscedasticity, and we have to find a
remedy for it. Without correction, the results of this model cannot be fully relied upon. If we
cannot correct this problem, then we have to specify another model, for example by adding more
variables, and check whether that model satisfies all the conditions of a reasonable model.
All the other properties of a reasonable model are fulfilled, but the constant variance property
is not. Under heteroscedasticity the OLS coefficient estimates remain unbiased, but they are no
longer efficient, and the usual standard errors, and therefore the t and F tests, are unreliable.
That is why we have to look for a better model.