STREAM B; GROUP 9
From bwght = 119.77 - 0.514 cigs:
The predicted infant birth weight is 119.77 ounces when zero cigarettes are smoked per day,
on average, during pregnancy. Each additional cigarette smoked per day on average during
pregnancy reduces the predicted birth weight by 0.514 ounces.
R^2 = 0.0227
0.0227 × 100% = 2.27%
For a simple linear regression model, the correlation coefficient (r) can be computed from
the coefficient of determination.
Given R^2 = 0.0227:
|r| = √0.0227 = 0.151
Since the slope is negative, r takes the negative root.
Therefore the correlation coefficient is r = -0.151; it is negative because the slope is negative.
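Since the other computations in this document are done by hand, here is a minimal sketch of the same sign-adjusted square root in Python (the R^2 and slope values are the ones from the fitted model above):

```python
import math

r_squared = 0.0227   # coefficient of determination from the fitted model
slope = -0.514       # slope of bwght on cigs, negative

# In simple linear regression, |r| = sqrt(R^2); r takes the sign of the slope
r = math.copysign(math.sqrt(r_squared), slope)
print(round(r, 3))  # -0.151
```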
(v). What is the predicted birth weight when cigs = 20 (one pack per day)?
Given cigs=20
From
bwght=119.77-0.514cigs
bwght=119.77-0.514(20)
bwght=119.77-10.28
bwght=109.49 ounces
Therefore, when cigs = 20 the predicted birth weight will be 109.49 ounces.
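As a quick check, the prediction can be computed directly from the fitted equation (a minimal sketch; the coefficients are the ones estimated above):

```python
def predict_bwght(cigs):
    """Predicted birth weight in ounces from the fitted model
    bwght = 119.77 - 0.514*cigs."""
    return 119.77 - 0.514 * cigs

print(round(predict_bwght(0), 2))   # 119.77 -> the intercept: no cigarettes
print(round(predict_bwght(20), 2))  # 109.49 -> one pack per day
```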
Question 2
From the given information, the laboratory collected data on the cost of materials used for
testing necessary products over a period of one year, and wants to know whether the costs of
materials A, B and C have a significant effect on the overall cost of testing. From the given
observations:
Interpretation:
R^2 = 0.861, i.e. 0.861 × 100% = 86.1%, meaning that 86.1 percent of the variation in the
dependent variable is explained by the independent variables.
(c). Which of the three coefficients can be considered as the most efficient? Why?
Reason: it can be considered the most efficient coefficient because it has the lowest
standard error, 5.181.
The regressor that should be kept in our equation is the one for material C.
Reason: it has a significant relationship with the dependent variable, as shown by its p-value
of 0.0343.
Question 3. Consequences of violating the OLS assumptions
Assumption 1: The relationship between the dependent and independent variables is linear. If
this assumption is violated, the model is incorrect and hence unreliable; when you use the
model for extrapolation, you are likely to get erroneous results. Hence, you should always
plot a graph of observed against predicted values. If this graph is symmetrically distributed
along the 45-degree line, you can be sure that the linearity assumption holds. If the linearity
assumption does not hold, you need to change the functional form of the regression, which
can be done by taking non-linear transformations of the independent variables.
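For example (an illustration added here, not part of the original solution), a non-linear transformation of a regressor simply replaces the variable before fitting; the data below are made up so that y grows with log(x):

```python
import math

# Hypothetical data where y grows with log(x), so a straight line in x fits poorly
x = [1, 2, 4, 8, 16, 32]
y = [0.1, 0.8, 1.4, 2.0, 2.9, 3.4]

# Transform the regressor instead of the response: x -> log(x)
x_log = [math.log(v) for v in x]

# Textbook OLS of y on the transformed regressor: b = Sxy / Sxx
n = len(x)
mx = sum(x_log) / n
my = sum(y) / n
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x_log, y)) / \
    sum((xi - mx) ** 2 for xi in x_log)
a = my - b * mx
print(round(a, 3), round(b, 3))  # slope close to 1 on the log scale
```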
Assumption 2: The X values are independent of the error term. This assumption is also
referred to as exogeneity; when correlation between the regressors and the error term exists,
there is endogeneity. Violations of this assumption can occur because of simultaneity between
the independent and dependent variables, omitted-variable bias, or measurement error in the
independent variables.
Assumption 3: The error term has a population mean of zero. The error term accounts for the
variation in the dependent variable that the independent variables do not explain, and random
chance should determine its values. For your model to be unbiased, the average value of the
error term must equal zero. If this assumption is violated, you end up with estimates that do
not accurately represent the influence of the variables on the dependent variable.
Assumption 5: Observations of the error term are uncorrelated with each other. One
observation of the error term should not predict the next. For instance, if a positive error
for one observation systematically increases the probability that the following error is
positive, that is positive correlation; if the subsequent error is more likely to have the
opposite sign, that is negative correlation. This problem is known both as serial correlation
and as autocorrelation, and it is most likely to occur in time-series models.
If this assumption is violated, the OLS estimates will not be BLUE and will not be reliable
enough. Since the assumption is most likely to be violated in time-series regression models,
it deserves particular attention there. You can check for autocorrelation by viewing the
residual time-series plot; if autocorrelation is present in the model, you can try taking lags
of the independent variables to correct for the trend component.
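Beyond the residual plot, a standard numeric check for first-order serial correlation is the Durbin-Watson statistic (my illustration, not part of the original text):

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: values near 2 suggest no first-order
    autocorrelation, values toward 0 positive autocorrelation, and
    values toward 4 negative autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Residuals that keep the same sign in long runs -> positive autocorrelation
print(durbin_watson([1, 1, 1, -1, -1, -1]))   # 4/6 ~ 0.67, well below 2
# Residuals that alternate sign every period -> negative autocorrelation
print(durbin_watson([1, -1, 1, -1, 1, -1]))   # 20/6 ~ 3.33, well above 2
```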
Assumption 6: No independent variable is a perfect linear function of the other explanatory
variables. Perfect correlation occurs when two variables have a Pearson correlation
coefficient of +1 or -1: when one of the variables changes, the other changes by a completely
fixed proportion, and the two variables move in unison.
Assumption 7: The error term is normally distributed (optional). OLS does not require that the
error term follow a normal distribution to produce unbiased estimates with the minimum
variance. However, satisfying this assumption allows you to perform statistical hypothesis
tests and generate reliable confidence and prediction intervals. The easiest way to determine
whether the residuals follow a normal distribution is to assess a normal probability plot; if
the residuals follow the straight line on that plot, they are normally distributed.
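A purely numeric alternative to the probability plot (my addition, not from the original text) is the Jarque-Bera statistic, which combines skewness and excess kurtosis of the residuals:

```python
from statistics import mean, pstdev

def jarque_bera(residuals):
    """Jarque-Bera statistic: near 0 when residuals look normal; large
    values signal skewness and/or excess kurtosis (non-normal errors)."""
    n = len(residuals)
    m = mean(residuals)
    s = pstdev(residuals)
    skew = sum((e - m) ** 3 for e in residuals) / (n * s ** 3)
    kurt = sum((e - m) ** 4 for e in residuals) / (n * s ** 4)
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)

# Perfectly symmetric residuals -> zero skewness, statistic stays small
print(round(jarque_bera([-2, -1, 0, 1, 2]), 3))  # 0.352
```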
Violation of the no-multicollinearity assumption (Assumption 6) can lead to opposite
(unexpected) signs for your regression coefficients (e.g. you expect the independent variable
to impact your dependent variable positively, but the regression model gives a negative
coefficient). In that case it is highly likely that the regression suffers from
multicollinearity. If the variable is not that important intuitively, then dropping that
variable or any of the correlated variables can fix the problem.
Conclusion: Linear regression models are extremely useful and have a wide range of
applications. When you use them, make sure that all the assumptions of OLS regression are
satisfied while doing an econometric test, so that your efforts do not go to waste. These
assumptions are extremely important and cannot simply be neglected. Having said that, these
OLS assumptions will often be violated in practice; however, that should not stop you from
conducting your econometric test.
QN4. Sources of Endogeneity.
An ice cream vendor sells ice cream on a beach. He collects data on total sales (Y) and
selling price (X) for 2 years. The vendor used to increase the price of the ice cream once
the temperature (Z) was high, as demand went up, but he forgot to mention this pricing
strategy to the data scientist. The linear regression therefore suggests that as the selling
price increases, sales increase, because temperature (Z) was left out of the model.
Thus, the issue of endogeneity arises when we have a Z that is related to Y but is also
related to X and not included in the model.
Sources of Endogeneity.
(I). Omitted variable: a variable that is correlated with both the independent variable in the
model and with the error term, but is unfortunately omitted from the model. Suppose the true
model is
Yi = α + β·Xi + γ·Zi + ui
but Zi is omitted from the regression model. Then the model that is actually estimated is
Yi = α + β·Xi + ui
The Z variable has therefore been absorbed into the error term, and if the correlation of X
and Z is not equal to zero, then X is correlated with the error term. Here X is not exogenous
for α and β, since given X the distribution of Y depends not only on α and β but also on γ
and Z.
(II). Simultaneity: the explanatory variable is jointly determined with the dependent
variable. In other words, X causes Y but Y also causes X. It is one cause of endogeneity.
Suppose that two variables are codetermined, with each affecting the other. A system of
simultaneous equations occurs when two or more left-hand-side variables are functions of each
other, that is:
y1 = α1 + β1·x1 + γ2·y2 + e1
y2 = α2 + γ1·x1 + γ2·y1 + e2
(III). Measurement error. Suppose the true model underlying the data is
y = α + β·x* + u
but the regressor is observed only with error, x = x* + e. Regressing y on the mismeasured x
makes the regressor correlated with the composite error term (u - β·e), which biases the OLS
estimate of β toward zero.
Possible remedies for an omitted variable include:
(2). Accept the omission only if the omitted variable is uncorrelated with all included
variables; otherwise the coefficient estimates will be biased up or down.
(3). Find a proxy variable. Suppose y is the outcome, q is the omitted variable and z is the
proxy for q. The proxy z has to be correlated with q, and z must have no direct effect on y
once the included regressors and q are controlled for.
Therefore, an Instrumental Variables (IV) model is estimated by Two-Stage Least Squares
(2SLS), which has the following stages:
Stage 1: Predict x2 as a function of all the other variables plus an instrument (call it z):
x2 = α + γ1·x1 + γ2·z + ν
and create predicted values of x2; call them x2p.
Stage 2: Predict y as a function of x2p and all the other variables (but not z):
y = α + β1·x1 + β2·x2p + e
Note: adjust the standard errors to account for the fact that x2p is predicted.