Professional Documents
Culture Documents
Following are information from 420 schools in the year 2020 to evaluate the school
performance. Among the key variables in the dataset, some are-
Following are some statistical outputs from STATA software which you will need to answer
the questions. If you take help from these tables in answering the requirements, please
mention the Table number which you are referring to.
Requirements:
a) What type of dataset is it and why? What is the sample size? Which one is (are) the
dependent and independent variable(s)?
Ans:
Data Set Type Time Period Entity
Time Series Multiple Single
Cross Sectional Single Multiple
Panel Multiple Multiple
b) Briefly explain the key aspects of all the variables based on the descriptive/summary
statistics.
Ans:
Testscr: On average students of all the schools have test score of 500 points. As the standard
deviation of testscr is 15, there is a large deviation in Testscr among the schools. Minimum
score of a student in the data set 400 points. The highest score of a student among all the
school is 600 points.
STR: Average student teacher ratio is 20%. So, there is on average 1 teacher for every 5
students. As the the standard deviation of STR is 2, we can infer that there is little difference
among the schools in terms of STR. Minimum there is 1 teacher for 7 students. Maximum
there is 1 teacher for every 4 students.
AvgInc: On average a student’s family income is $15000. As the standard deviation is 7, so
there is moderate deviation in the family income of the students. As the minimum family
income of student is $5000, so we can say that there are many poor students who need
financial assistance. Again, some families are rich as the maximum family income is $55000.
Most of the families have lower income.
EL_Pct: On average 16% students are English learners. As the standard deviation of EL_Pct
is 18, there is a significant deviation among the schools in terms of English learner students.
There are some schools that have all native speaker students as the minimum value is 0%.
There are some schools that have non-native students mostly as the maximum value is 85%.
On average, the number of English learners among the schools are low.
Ans:
When two or more independent variables are highly correlated, then this problem in the
multiple linear regression model is called multicolinearity.
No, there is no multicollinearity in this dataset. Because, all the correlation value among the
independent variables in the correlation matrix in Table-2 are less than 0.80 and the VIF
value for the independent variable are less than 10 in Table-3.
Ans:
If the variance of the error term is not constant, then it is defined as heteroscedasticity.
In table-4, the Breusch-Pagan Test shows the P-value 0.45> 0.05, so we cannot reject the null
hypothesis which means that there is no heteroscedasticity in the data set.
e) Write down the simple population regression model considering only STR. Write
down the sample regression LINE considering only STR.
f) Based on requirement (e), interpret the economic and statistical significance of the
intercept and the slope of the simple linear regression line. Do you agree that class
size significantly influences students’ exam performance?
g) Based on requirement (e), if a school has a test score of 590 points and class size
(STR) of 18, what will be the predicted test score and the corresponding residual/error
of that school?
h) Based on requirement (e), comment on the goodness of fit of this simple linear
regression model.
i) Write down the multiple population regression model considering STR, AVGINC and
EL_PCT. Write down the sample regression LINE considering STR, AVGINC and
EL_PCT.
j) Based on requirement (i), interpret the economic and statistical significance of the
intercept and the slopes of the multiple linear regression line. Do you still agree that
class size significantly influences students’ exam performance?
Ans:
Constant:
Economic significance: If all the co-efficients of the independent variables are zero, then the
Testscr will be 480 units.
Statistical significance: As the P-value of the constant is 0.00 < 0.05, the constant value is
significant at 5% significance level.
STR:
Economic significance: If STR increases by 1%, then Testscr will decrease by .07 units
considering all other independent variables remain constant.
Statistical significance: As the P-value of the STR is 0.80 > 0.05, the STR variable is
insignificant at 5% significance level.
AvgInc:
Economic significance: If AvgInc increases by $1000, then Testscr will increase by 1.5 units
considering all other independent variables remain constant.
Statistical significance: As the P-value of the AvgInc is 0.00 < 0.05, the AvgInc variable is
significant at 5% significance level.
El_Pct:
Economic significance: If El_Pct increases by 1%, then Testscr will decrease by 0.49 units
considering all other independent variables remain constant.
Statistical significance: As the P-value of the El_Pct is 0.00 < 0.05, the El_Pct variable is
significant at 5% significance level.
As the STR variable is insignificant at 5% significance level, class size does not influence
students’ exam performance.
k) Based on requirement (i), if a school has a test score of 590 points, class size (STR) of
20, average income of $10 thousand and English learning students of 30%, what will
be the predicted test score and the corresponding residual/error of that school?
l) Based on requirement (i), comment on the goodness of fit of this multiple linear
regression model.
m) Based on requirements (h) and (l), does the multiple linear regression model fit better
than the simple linear regression model? Why or why not?
n) Comment on the validity of the whole/overall multiple regression model.
Ans: F-statistics of the multiple linear regression model is 300 with P-value 0.00 which is less
than .05, so the overall model is significant and valid.
o) Based on requirement (i), suppose you are considering taking the natural logarithm of
some variables in the multiple regression model. How would you explain the slope if
you take natural logarithm of (1) AVGINC only, (2) TESTSCR only, and (3) both
AVGINC and TESTSCR.