STREAM B; GROUP 9
From bwght = 119.77 - 0.514 cigs:
The predicted infant birth weight is 119.77 ounces when zero cigarettes are smoked per day,
on average, during pregnancy. Each additional cigarette smoked per day on average during
pregnancy reduces the predicted birth weight by 0.514 ounces.
R^2 = 0.0227
0.0227 × 100% = 2.27%
For a simple linear regression model, the correlation coefficient (r) can be computed from
the coefficient of determination.
Given R^2 = 0.0227:
|r| = √0.0227 = 0.151
Since the slope is negative, r takes the negative root.
Therefore the correlation coefficient is r = -0.151; it is negative because the slope is negative.
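Since the other computations in this document are done by hand, here is a minimal sketch of the same sign-adjusted square root in Python (the R^2 and slope values are the ones from the fitted model above):

```python
import math

r_squared = 0.0227   # coefficient of determination from the fitted model
slope = -0.514       # slope of bwght on cigs, negative

# In simple linear regression, |r| = sqrt(R^2); r takes the sign of the slope
r = math.copysign(math.sqrt(r_squared), slope)
print(round(r, 3))  # -0.151
```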
(v). What is the predicted birth weight when cigs = 20 (one pack per day)?
Given cigs=20
From
bwght=119.77-0.514cigs
bwght=119.77-0.514(20)
bwght=119.77-10.28
bwght=109.49 ounces
Therefore, when cigs = 20 the predicted birth weight will be 109.49 ounces.
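As a quick check, the prediction can be computed directly from the fitted equation (a minimal sketch; the coefficients are the ones estimated above):

```python
def predict_bwght(cigs):
    """Predicted birth weight in ounces from the fitted model
    bwght = 119.77 - 0.514*cigs."""
    return 119.77 - 0.514 * cigs

print(round(predict_bwght(0), 2))   # 119.77 -> the intercept: no cigarettes
print(round(predict_bwght(20), 2))  # 109.49 -> one pack per day
```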
Question 2
From the given information, the laboratory collected data on the cost of materials used for
testing necessary products over a period of one year, and wants to know whether the costs of
materials A, B and C have a significant effect on the overall cost of testing. From the given
observations:
Interpretation:
R^2 = 0.861, i.e. 0.861 × 100% = 86.1%, meaning that 86.1 percent of the variation in the
dependent variable is explained by the independent variables.
(c). Which of the three coefficients can be considered as the most efficient? Why?
Reason: it can be considered the most efficient coefficient because it has the lowest
standard error, 5.181.
The regressor that should be kept in our equation is the one for material C.
Reason: it has a significant relationship with the dependent variable, as shown by its p-value
of 0.0343.
Question 3. Consequences of violating the OLS assumptions
Assumption 1: The relationship between the dependent and independent variables is linear. If
this assumption is violated, the model is incorrect and hence unreliable; when you use the
model for extrapolation, you are likely to get erroneous results. Hence, you should always
plot a graph of observed against predicted values. If this graph is symmetrically distributed
along the 45-degree line, you can be sure that the linearity assumption holds. If the linearity
assumption does not hold, you need to change the functional form of the regression, which
can be done by taking non-linear transformations of the independent variables.
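For example (an illustration added here, not part of the original solution), a non-linear transformation of a regressor simply replaces the variable before fitting; the data below are made up so that y grows with log(x):

```python
import math

# Hypothetical data where y grows with log(x), so a straight line in x fits poorly
x = [1, 2, 4, 8, 16, 32]
y = [0.1, 0.8, 1.4, 2.0, 2.9, 3.4]

# Transform the regressor instead of the response: x -> log(x)
x_log = [math.log(v) for v in x]

# Textbook OLS of y on the transformed regressor: b = Sxy / Sxx
n = len(x)
mx = sum(x_log) / n
my = sum(y) / n
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x_log, y)) / \
    sum((xi - mx) ** 2 for xi in x_log)
a = my - b * mx
print(round(a, 3), round(b, 3))  # slope close to 1 on the log scale
```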
Assumption 2: The X values are independent of the error term. This assumption is also
referred to as exogeneity; when correlation between the regressors and the error term exists,
there is endogeneity. Violations of this assumption can occur because of simultaneity between
the independent and dependent variables, omitted-variable bias, or measurement error in the
independent variables.
Assumption 3: The error term has a population mean of zero. The error term accounts for the
variation in the dependent variable that the independent variables do not explain, and random
chance should determine its values. For your model to be unbiased, the average value of the
error term must equal zero. If this assumption is violated, you end up with estimates that do
not accurately represent the influence of the variables on the dependent variable.
Assumption 5: Observations of the error term are uncorrelated with each other. One
observation of the error term should not predict the next. For instance, if a positive error
for one observation systematically increases the probability that the following error is
positive, that is positive correlation; if the subsequent error is more likely to have the
opposite sign, that is negative correlation. This problem is known both as serial correlation
and as autocorrelation, and it is most likely to occur in time-series models.
If this assumption is violated, the OLS estimates will not be BLUE and will not be reliable
enough. Since the assumption is most likely to be violated in time-series regression models,
it deserves particular attention there. You can check for autocorrelation by viewing the
residual time-series plot; if autocorrelation is present in the model, you can try taking lags
of the independent variables to correct for the trend component.
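Beyond the residual plot, a standard numeric check for first-order serial correlation is the Durbin-Watson statistic (my illustration, not part of the original text):

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: values near 2 suggest no first-order
    autocorrelation, values toward 0 positive autocorrelation, and
    values toward 4 negative autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Residuals that keep the same sign in long runs -> positive autocorrelation
print(durbin_watson([1, 1, 1, -1, -1, -1]))   # 4/6 ~ 0.67, well below 2
# Residuals that alternate sign every period -> negative autocorrelation
print(durbin_watson([1, -1, 1, -1, 1, -1]))   # 20/6 ~ 3.33, well above 2
```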
Assumption 6: No independent variable is a perfect linear function of the other explanatory
variables. Perfect correlation occurs when two variables have a Pearson correlation
coefficient of +1 or -1: when one of the variables changes, the other changes by a completely
fixed proportion, and the two variables move in unison.
Assumption 7: The error term is normally distributed (optional). OLS does not require that the
error term follow a normal distribution to produce unbiased estimates with the minimum
variance. However, satisfying this assumption allows you to perform statistical hypothesis
tests and generate reliable confidence and prediction intervals. The easiest way to determine
whether the residuals follow a normal distribution is to assess a normal probability plot; if
the residuals follow the straight line on that plot, they are normally distributed.
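A purely numeric alternative to the probability plot (my addition, not from the original text) is the Jarque-Bera statistic, which combines skewness and excess kurtosis of the residuals:

```python
from statistics import mean, pstdev

def jarque_bera(residuals):
    """Jarque-Bera statistic: near 0 when residuals look normal; large
    values signal skewness and/or excess kurtosis (non-normal errors)."""
    n = len(residuals)
    m = mean(residuals)
    s = pstdev(residuals)
    skew = sum((e - m) ** 3 for e in residuals) / (n * s ** 3)
    kurt = sum((e - m) ** 4 for e in residuals) / (n * s ** 4)
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)

# Perfectly symmetric residuals -> zero skewness, statistic stays small
print(round(jarque_bera([-2, -1, 0, 1, 2]), 3))  # 0.352
```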
Violation of the no-multicollinearity assumption (Assumption 6) can lead to opposite
(unexpected) signs for your regression coefficients (e.g. you expect the independent variable
to impact your dependent variable positively, but the regression model gives a negative
coefficient). In that case it is highly likely that the regression suffers from
multicollinearity. If the variable is not that important intuitively, then dropping that
variable or any of the correlated variables can fix the problem.
Conclusion: Linear regression models are extremely useful and have a wide range of
applications. When you use them, make sure that all the assumptions of OLS regression are
satisfied while doing an econometric test, so that your efforts do not go to waste. These
assumptions are extremely important and cannot simply be neglected. Having said that, these
OLS assumptions will often be violated in practice; however, that should not stop you from
conducting your econometric test.
QN4. Sources of Endogeneity.
An ice cream vendor sells ice cream on a beach. He collects data on total sales (Y) and
selling price (X) for 2 years. The vendor used to increase the price of the ice cream once
the temperature (Z) was high, as demand went up, but he forgot to mention this pricing
strategy to the data scientist. The linear regression therefore suggests that as the selling
price increases, sales increase, because temperature (Z) was left out of the model.
Thus, the issue of endogeneity arises when we have a Z that is related to Y but is also
related to X and not included in the model.
Sources of Endogeneity.
(I). Omitted variable: a variable that is correlated with both the independent variable in the
model and with the error term, but is unfortunately omitted from the model. Suppose the true
model is
Yi = α + β·Xi + γ·Zi + ui
but Zi is omitted from the regression model. Then the model that is actually estimated is
Yi = α + β·Xi + ui
The Z variable has therefore been absorbed into the error term, and if the correlation of X
and Z is not equal to zero, then X is correlated with the error term. Here X is not exogenous
for α and β, since given X the distribution of Y depends not only on α and β but also on γ
and Z.
(II). Simultaneity: the explanatory variable is jointly determined with the dependent
variable. In other words, X causes Y but Y also causes X. It is one cause of endogeneity.
Suppose that two variables are codetermined, with each affecting the other. A system of
simultaneous equations occurs when two or more left-hand-side variables are functions of each
other, that is:
y1 = α1 + β1·x1 + γ2·y2 + e1
y2 = α2 + γ1·x1 + γ2·y1 + e2
(III). Measurement error. Suppose the true model underlying the data is
y = α + β·x* + u
but the regressor is observed only with error, x = x* + e. Regressing y on the mismeasured x
makes the regressor correlated with the composite error term (u - β·e), which biases the OLS
estimate of β toward zero.
Possible remedies for an omitted variable include:
(2). Accept the omission only if the omitted variable is uncorrelated with all included
variables; otherwise the coefficient estimates will be biased up or down.
(3). Find a proxy variable. Suppose y is the outcome, q is the omitted variable and z is the
proxy for q. The proxy z has to be correlated with q, and z must have no direct effect on y
once the included regressors and q are controlled for.
Therefore, an Instrumental Variables (IV) model is estimated by Two-Stage Least Squares
(2SLS), which has the following stages:
Stage 1: Predict x2 as a function of all the other variables plus an instrument (call it z):
x2 = α + γ1·x1 + γ2·z + ν
and create predicted values of x2; call them x2p.
Stage 2: Predict y as a function of x2p and all the other variables (but not z):
y = α + β1·x1 + β2·x2p + e
Note: adjust the standard errors to account for the fact that x2p is predicted.