
Use a significance level of 𝛼 = 0.05 throughout the exercises, except where stated differently.

Exercise 1 (Exam December 2012)


In a speed skating tournament, the electronic clock was temporarily out of order so that the 500 meters for
men were clocked manually by two teams of officials (A and B). The time measurements for the ten best
skaters are shown in Table 1.1, together with their differences (A – B) and the signs of those differences.
These data have the following sample means and standard deviations (for A, B, and A – B, respectively):
$\bar{x}_A = 36.244$, $s_A = 0.411$; $\bar{x}_B = 36.260$, $s_B = 0.411$; $\bar{x}_{A-B} = -0.016$, $s_{A-B} = 0.024$.

Table 1.1: Clocked times for top ten 500 m


Skater        1      2      3      4      5      6      7      8      9     10
A         35.58  35.90  36.00  36.03  36.12  36.21  36.37  36.56  36.73  36.94
B         35.62  35.92  36.03  36.01  36.11  36.25  36.35  36.58  36.77  36.96
A – B     -0.04  -0.02  -0.03   0.02   0.01  -0.04   0.02  -0.02  -0.04  -0.02
Sign          -      -      -      +      +      -      +      -      -      -

a. Perform the appropriate t test to investigate whether the mean time clocked by A is lower than the
mean time clocked by B, by using the following steps: (i) State H0 and Ha. (ii) Give the distribution of the
relevant test statistic and its degrees of freedom, and find the relevant critical value. (iii) Compute the
value of the test statistic. (iv) Draw your conclusion, also in common speech.
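
The arithmetic in step (iii) can be checked with a short script. This is a minimal sketch, assuming that the paired-samples statistic t = d̄ / (s_d / √n) on the differences A – B is the one intended; the variable names are illustrative and not part of the exam.

```python
from math import sqrt
from statistics import mean, stdev

# Differences A - B from Table 1.1 (in seconds)
d = [-0.04, -0.02, -0.03, 0.02, 0.01, -0.04, 0.02, -0.02, -0.04, -0.02]

n = len(d)
d_bar = mean(d)                    # sample mean of the differences
s_d = stdev(d)                     # sample standard deviation (divides by n - 1)
t_stat = d_bar / (s_d / sqrt(n))   # paired t statistic (assumed test)
df = n - 1                         # degrees of freedom

print(f"mean = {d_bar:.3f}, sd = {s_d:.3f}, t = {t_stat:.3f}, df = {df}")
```

The printed mean and standard deviation should agree with the summary values quoted above the table.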

b. Perform a sign test to investigate whether the times clocked by A are lower than those clocked by B, by
using the following steps: (i) State H0 and Ha. (ii) Give the relevant test distribution. (iii) Determine the
exact P-value. (iv) Draw your conclusion, also in common speech.
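
The exact P-value in step (iii) is a binomial tail probability. The sketch below assumes the sign test counts the positive signs in Table 1.1 (three out of ten, no ties) and uses the lower tail of a Binomial(10, 0.5) distribution; the names are illustrative.

```python
from math import comb

n = 10        # number of non-zero differences in Table 1.1
n_plus = 3    # number of positive signs (A clocked higher than B)

# P(X <= 3) for X ~ Binomial(10, 0.5): the lower tail matches "A lower than B"
p_value = sum(comb(n, k) for k in range(n_plus + 1)) / 2**n

print(p_value)
```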

Table 1.2: Two non-parametric tests for differences between A and B

Ranks
        Group    N     Mean Rank   Sum of Ranks
Time    A        10    10.25       102.50
        B        10    10.75       107.50
        Total    20

Test Statisticsa
                                  Time
Mann-Whitney U                    47.500
Wilcoxon W                        102.500
Z                                 -.189
Asymp. Sig. (2-tailed)            .850
Exact Sig. [2*(1-tailed Sig.)]    .853b
a. Grouping Variable: Official
b. Not corrected for ties.

Ranks
                           N     Mean Rank   Sum of Ranks
A-B     Negative Ranks     7a    6.57        46.00
        Positive Ranks     3b    3.00        9.00
        Ties               0c
        Total              10
a. A < B
b. A > B
c. A = B

Test Statisticsa
                          A-B
Z                         -1.916b
Asymp. Sig. (2-tailed)    .055
a. Wilcoxon Signed Ranks Test
b. Based on positive ranks.

c. Table 1.2 shows outcomes of two non-parametric tests to evaluate whether the times of A are lower
than those of B. (i) Which of these two tests is preferred in this situation, and why? (ii) State H0 and Ha
of this test. (iii) Determine the relevant (approximate) test distribution. (iv) Perform the test and draw a
clear conclusion, also in common speech.
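
The rank sums and the Z value reported for the signed ranks test in Table 1.2 can be reproduced from the differences in Table 1.1. The sketch below assumes midranks for tied absolute differences and the usual normal approximation, once without and once with the tie correction of the variance; all names are illustrative.

```python
from math import sqrt
from collections import Counter

# Differences A - B from Table 1.1, in hundredths of a second
# (integers avoid spurious ties or non-ties caused by floating point)
d = [-4, -2, -3, 2, 1, -4, 2, -2, -4, -2]
n = len(d)

# Midranks of |d|: tied values share the average of the ranks they occupy
abs_sorted = sorted(abs(x) for x in d)
rank_of = {}
for v in set(abs_sorted):
    positions = [i + 1 for i, a in enumerate(abs_sorted) if a == v]
    rank_of[v] = sum(positions) / len(positions)

w_plus = sum(rank_of[abs(x)] for x in d if x > 0)    # sum of positive ranks
w_minus = sum(rank_of[abs(x)] for x in d if x < 0)   # sum of negative ranks

mu = n * (n + 1) / 4
var = n * (n + 1) * (2 * n + 1) / 24
# Tie correction: subtract sum(t^3 - t) / 48 over groups of t tied |d| values
var_tie = var - sum(t**3 - t for t in Counter(abs_sorted).values()) / 48

z_plain = (w_plus - mu) / sqrt(var)
z_tie = (w_plus - mu) / sqrt(var_tie)

print(w_plus, w_minus, round(z_plain, 3), round(z_tie, 3))
```

The tie-corrected value agrees with the Z reported in the table; the uncorrected value is slightly closer to zero.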

d. (i) State the assumptions needed for the tests in (a), (b), and (c). (ii) Which of these three tests do you
prefer in this situation? (iii) Provide a clear motivation of your choice.
Exercise 2 (Resit July 2014)
Consider the market value (in Euros) of football players in the Eredivisie, the highest football league in the
Netherlands. A newspaper is interested in whether the market values of foreign players are higher than the
market values of Dutch players. Independent random samples of 28 Dutch players and 14 foreign players
who played at least one game in the 2013/14 season are available. A dummy variable indicates whether a
player is foreign (0 for Dutch players, 1 for foreign players).

Figure 2.1: Boxplots of market values of Dutch and foreign football players

a. Figure 2.1 shows boxplots of the market values of the Dutch and foreign players, as well as boxplots
of the logarithms. Give two reasons why the logarithm of the market values is more appropriate for
the application of a two-sample t test.

Table 2.1: Logarithm of market values of Dutch and foreign football players
Group Statistics
                  Foreign   N     Mean      Std. Deviation   Std. Error Mean
logMarketValue    0         28    13.078    1.074            .203
                  1         14    13.753    .754             .202

Independent Samples Test
                                Levene's Test   t-test for Equality of Means
                                for Equality
                                of Variances                                                                                 95% Confidence Interval of the Difference
logMarketValue                  F       Sig.    t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   Lower   Upper
  Equal variances assumed       1.959   .169    ???     ???      .042              .676              .321                    .027    1.325
  Equal variances not assumed                   ???     ???      .024              .676              .286                    .095    1.256
b. A two-sample t test is performed to test whether the market value is higher for foreign players than
for Dutch players. The results are shown in Table 2.1. Note that the logarithm of market value is
used. (i) First consider the output of Levene’s test. Which one of the two t tests should be applied
to compare the means: the one in the first row (pooled variance) or the one in the second row
(unpooled variance)? Motivate your answer. (ii) State H0 and Ha of the t test. (iii) Give the test
distribution, including the value(s) of the degrees of freedom. (iv) Compute the value of the test
statistic (write down the formula and perform step-by-step calculations). (v) Draw a clear
conclusion from the reported P-value, also in common speech.
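
For step (iv), the standard errors in both rows of Table 2.1 can be recomputed from the group statistics. This is a hedged sketch that evaluates both the pooled and the unpooled (Welch) formulas without choosing between them; the variable names are illustrative. Because the group means are rounded to three decimals, the difference computed here is .675 rather than the reported .676.

```python
from math import sqrt

n1, m1, s1 = 28, 13.078, 1.074   # Dutch players (Foreign = 0)
n2, m2, s2 = 14, 13.753, 0.754   # foreign players (Foreign = 1)

diff = m2 - m1

# Pooled-variance version (first row of the SPSS table)
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se_pooled = sqrt(sp2 * (1 / n1 + 1 / n2))
t_pooled = diff / se_pooled
df_pooled = n1 + n2 - 2

# Unpooled (Welch) version (second row of the SPSS table)
v1, v2 = s1**2 / n1, s2**2 / n2
se_welch = sqrt(v1 + v2)
t_welch = diff / se_welch
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(f"pooled: se = {se_pooled:.3f}, t = {t_pooled:.3f}, df = {df_pooled}")
print(f"welch:  se = {se_welch:.3f}, t = {t_welch:.3f}, df = {df_welch:.2f}")
```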

Table 2.2: Rank sums of market values


Ranks
Foreign N Mean Rank Sum of Ranks
0 28 18.55 519.50
MarketValue 1 14 27.39 383.50
Total 42

c. Table 2.2 contains the rank sums of the market values of the Dutch and foreign players. Perform
the Wilcoxon rank sum test to investigate whether the market value is higher for foreign players
than for Dutch players, using the following steps: (i) Note that the actual market value is used in
Table 2.2 instead of the logarithm. Is this an issue for this test? (ii) State H0 and Ha. (iii) Retrieve the
value of the Wilcoxon test statistic from Table 2.2. (iv) Compute the mean and the standard
deviation of the distribution of the Wilcoxon test statistic. (v) Determine the P-value using the
normal approximation. (vi) Draw your conclusion, also in common speech.
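
Steps (iv) and (v) can be sketched as follows, assuming the test statistic W is the rank sum of the smaller sample (the 14 foreign players) and that no continuity or tie correction is applied; the names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n1, n2 = 14, 28    # foreign players (smaller sample) and Dutch players
w = 383.5          # rank sum of the foreign players, from Table 2.2
big_n = n1 + n2

mu_w = n1 * (big_n + 1) / 2                  # mean of W under H0
sigma_w = sqrt(n1 * n2 * (big_n + 1) / 12)   # standard deviation of W under H0
z = (w - mu_w) / sigma_w
p_upper = 1 - NormalDist().cdf(z)            # upper tail: Ha says foreign values are higher

print(f"mu = {mu_w}, sigma = {sigma_w:.2f}, z = {z:.2f}, p = {p_upper:.4f}")
```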
Exercise 3 (Resit July 2013)

The management of a hospital wishes to investigate whether the operation times of a surgeon decline over
time because of learning effects. These times were recorded for four successive periods, denoted from 1
(first period) to 4 (last period). Further, the hospital scheduled the operations in a conventional way in
periods 1 and 2 and according to new management principles in periods 3 and 4. Table 3.1 contains
summary statistics of the operation times (in minutes) for the various sub-periods.

Table 3.1: Descriptive statistics of operation times per sub-period


Sub-period 1 2 3 4 1&2 3&4 1, 2, 3 & 4
Mean 56.95 43.40 40.14 39.85 49.91 40.05 44.54
Standard Dev. 16.79 8.80 8.51 7.20 14.82 8.11 12.62
Number Obs. 37 40 65 27 77 92 169

a. Show that the standard deviation in scheduling period 1&2 is significantly different from that in period
3&4, by using the following steps: (i) State H0 and Ha. (ii) Give the relevant test distribution with the
degrees of freedom. (iii) Determine the approximate rejection region for this test statistic. (iv) Compute
the value of the test statistic and draw your conclusion, also in common speech.
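
The test statistic in step (iv) is a ratio of sample variances. A minimal sketch, assuming the larger variance is placed in the numerator and using the standard deviations from Table 3.1 (names are illustrative):

```python
s_12, n_12 = 14.82, 77   # periods 1&2: standard deviation and sample size
s_34, n_34 = 8.11, 92    # periods 3&4: standard deviation and sample size

f_stat = s_12**2 / s_34**2              # ratio of sample variances
df_num, df_den = n_12 - 1, n_34 - 1     # numerator and denominator degrees of freedom

print(f"F = {f_stat:.3f} with ({df_num}, {df_den}) degrees of freedom")
```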

b. Perform the appropriate t-test to investigate whether the mean operation time in periods 1&2 is larger
than that in periods 3&4, by using the following steps: (i) State H0 and Ha. (ii) Give the (approximate)
degrees of freedom of the test distribution, and compute the value of the test statistic. (iii) Find an
approximation of the P-value. (iv) Draw your conclusion, also in common speech.
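
The statistic in step (ii) can be checked numerically. The sketch below assumes the unequal-variances (Welch) form of the two-sample t statistic with the Welch-Satterthwaite degrees of freedom; whether this is the form intended by the exam is an assumption, and the names are illustrative.

```python
from math import sqrt

m_12, s_12, n_12 = 49.91, 14.82, 77   # periods 1&2 (Table 3.1)
m_34, s_34, n_34 = 40.05, 8.11, 92    # periods 3&4 (Table 3.1)

v1, v2 = s_12**2 / n_12, s_34**2 / n_34
t_stat = (m_12 - m_34) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation of the degrees of freedom
df_welch = (v1 + v2) ** 2 / (v1**2 / (n_12 - 1) + v2**2 / (n_34 - 1))

print(f"t = {t_stat:.2f}, approximate df = {df_welch:.1f}")
```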

Table 3.2: Outcomes of two ANOVA tests for operation times (n=169)
Sum of Squares Df Mean Square F Sig.
2 Periods (1&2 and 3&4)
Between Groups 4071 ??1 … 30 0.000
Within Groups 22671 ??2 …
Total 26742 168
4 Periods (1, 2, 3 and 4)
Between Groups 7599 ??3 …. 22 0.000
Within Groups 19143 ??4 ….
Total 362072 168

c. Table 3.2 shows outcomes of two ANOVA tests. The first test (on top) compares the two scheduling
periods, and the second test (at the bottom) compares the four sub-periods. First consider the test on
top, comparing period 1&2 with period 3&4. (i) State H0 and Ha. (ii) Determine the two degrees of
freedom denoted by “??1” and “??2” in Table 3.2. (iii) Perform the test and draw your conclusion in a
way that is easily understandable by hospital managers. Next consider the test at the bottom,
comparing the four sub-periods. (iv) State H0 and Ha. (v) Determine the two degrees of freedom
denoted by “??3” and “??4” in Table 3.2. (vi) Perform the test and draw your conclusion in a way that is
easily understandable by hospital managers.
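
The degrees of freedom and F values in the ANOVA table can be cross-checked from the sums of squares. This sketch assumes the standard one-factor ANOVA decomposition with k groups and n = 169 observations; the helper `anova_f` is a hypothetical name introduced for illustration.

```python
n_obs = 169

def anova_f(ss_between, ss_within, k, n):
    """Return (df_between, df_within, F) for a one-factor ANOVA with k groups."""
    df_b, df_w = k - 1, n - k
    return df_b, df_w, (ss_between / df_b) / (ss_within / df_w)

two_periods = anova_f(4071, 22671, k=2, n=n_obs)
four_periods = anova_f(7599, 19143, k=4, n=n_obs)

print(two_periods)
print(four_periods)
```

The rounded F values can be compared with the entries 30 and 22 in the table.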
Table 3.3: Outcomes of two non-parametric tests for operation times
Test 2 periods (1&2 and 3&4)                      Test 4 periods (1, 2, 3, and 4)
Period    N      Mean Rank    Sum of Ranks        Period    N      Mean Rank
1&2       77     107.19       8253.50             1         37     128.82
3&4       92     66.43        6111.50             2         40     87.18
Total     169                                     3         65     66.02
Test statistic                6111.50             4         27     67.43
Z                             -5.398              Total     169
Asymp. Sig. (2-tailed)        .000                Test statistic   43.113

d. Table 3.3 shows the outcomes of two non-parametric tests. The test on the left compares the two
scheduling periods, and the test on the right compares the four sub-periods. First consider the test on
the left, comparing period 1&2 with period 3&4. (i) Which test has been applied here? (ii) State H0 and
Ha. (iii) Perform the test and draw your conclusion. Next consider the test on the right, comparing the
four sub-periods. (iv) Which test has been applied here? (v) State H0 and Ha. (vi) Perform the test, show
all steps, and draw your conclusion.
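
For the four-period comparison, a rank-based statistic can be recomputed from the mean ranks in Table 3.3. The sketch assumes the Kruskal-Wallis form H = 12 / (N(N+1)) · Σ nᵢ (R̄ᵢ − (N+1)/2)² without tie correction, so a small gap to the reported 43.113 is to be expected (presumably because the software corrects for ties); the names are illustrative.

```python
# Sub-period -> (sample size, mean rank), from the right-hand part of Table 3.3
groups = {1: (37, 128.82), 2: (40, 87.18), 3: (65, 66.02), 4: (27, 67.43)}

big_n = sum(n for n, _ in groups.values())
grand_mean_rank = (big_n + 1) / 2   # average rank over all observations

h_stat = 12 / (big_n * (big_n + 1)) * sum(
    n * (r - grand_mean_rank) ** 2 for n, r in groups.values()
)
df = len(groups) - 1

print(f"N = {big_n}, H = {h_stat:.2f}, df = {df}")
```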

e. (i) Consider again the ANOVA test and the nonparametric test for comparing period 1&2 with period
3&4. Which one of the two tests do you prefer? Motivate your answer. (ii) Is it possible to make a
choice between the ANOVA test and the nonparametric test for comparing all four sub-periods?
Motivate your answer.
Exercise 4 (Exam December 2013)

For a period of six weeks, a small business places an advertisement in the Business section and in the Sports
section of a local newspaper on every working day. To evaluate their advertising strategy, they keep track
of the number of inquiries from customers that they receive each day as a result of each of the
advertisements.

Table 4.1: Number of inquiries by newspaper section


Group Statistics
             Section     N     Mean     Std. Deviation   Std. Error Mean
Inquiries    Business    30    10.10    1.768            .323
             Sports      30    8.87     1.978            .361

Independent Samples Test
                                Levene's Test   t-test for Equality of Means
                                for Equality
                                of Variances                                                                                 95% Confidence Interval of the Difference
Inquiries                       F       Sig.    t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   Lower   Upper
  Equal variances assumed       .238    .628    2.546   58       .014              1.233             .484                    .264    2.203
  Equal variances not assumed                   2.546   57.288   .014              1.233             .484                    .263    2.203

a. To test whether there is a significant difference in the average number of inquiries resulting from
the advertisements in the Business section and the Sports section, an independent samples t-test is
performed. The results are shown in Table 4.1. (i) First consider the output of Levene’s test. Which
one of the two t-tests should be applied to compare the means: the one in the first row (pooled
variance) or the one in the second row (unpooled variance)? Motivate your answer. (ii) State H0
and Ha of the t-test. (iii) Give the test distribution, including the values of the degrees of freedom.
(iv) Draw a clear conclusion from the reported P-value, also in common speech.
Table 4.2: Number of inquiries by week day
Descriptives
Inquiries
                                                       95% Confidence Interval for Mean
            N    Mean    Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
Monday      12   10.58   2.275            .657         9.14          12.03         8         14
Tuesday     12   8.92    1.676            .484         7.85          9.98          6         12
Wednesday   12   8.67    1.497            .432         7.72          9.62          6         12
Thursday    12   8.67    1.923            .555         7.45          9.89          5         12
Friday      12   10.58   1.505            .434         9.63          11.54         8         13
Total       60   9.48    1.961            .253         8.98          9.99          5         14

ANOVA
Inquiries
                  Sum of Squares   F       Sig.
Between Groups    48.900           3.776   .009
Within Groups     178.083
Total             226.983

b. Now 1-factor ANOVA is performed to test whether there are significant differences between the
average numbers of inquiries per week day. (i) State H0 and Ha. (ii) Use the rule-of-thumb to check if
the assumption of equal variances is reasonable. Motivate your answer. (iii) Give the test
distribution, including the values of the degrees of freedom. (iv) Check the value of the test statistic
(write down the formula and perform step-by-step calculations). (v) Draw a clear conclusion based
on the reported P-value, also in common speech.
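
Steps (ii) and (iv) involve only a few divisions, which can be checked as follows. The sketch assumes the rule-of-thumb compares the largest and smallest group standard deviations (a ratio below 2 is a common threshold, but the course's exact rule is an assumption here); the names are illustrative.

```python
# Standard deviations per week day, from Table 4.2
sds = {"Monday": 2.275, "Tuesday": 1.676, "Wednesday": 1.497,
       "Thursday": 1.923, "Friday": 1.505}
sd_ratio = max(sds.values()) / min(sds.values())

k, n = 5, 60                       # number of groups and observations
df_between, df_within = k - 1, n - k
ms_between = 48.900 / df_between   # mean square between groups
ms_within = 178.083 / df_within    # mean square within groups
f_stat = ms_between / ms_within

print(f"sd ratio = {sd_ratio:.2f}, F = {f_stat:.3f}, df = ({df_between}, {df_within})")
```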
Exercise 5 (Exam December 2004)
In a survey on the degree of happiness, 1000 residents of the Netherlands filled out the questionnaire. The
question arises whether this sample is representative with respect to gender (male or female) and age in
three categories (under 25 years, between 25 and 50 years, and older than 50 years). From figures of
Statistics Netherlands (CBS), the percentage of persons in each of these six groups is known. The table
below presents the number of respondents in each group, as well as the percentage of the population in
each group (in parentheses).

CBS       age < 25       25 ≤ age ≤ 50    age > 50
Male      135 (14.6%)    140 (16.4%)      225 (18.0%)
Female    140 (14.5%)    150 (16.6%)      210 (19.9%)

Test whether the sample is representative with respect to gender and age group.
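
One way to set up this test is as a χ² goodness-of-fit test with six categories. The sketch below assumes that approach, with expected counts equal to 1000 times the population shares; the names are illustrative.

```python
# Order: male <25, male 25-50, male >50, female <25, female 25-50, female >50
observed = [135, 140, 225, 140, 150, 210]
pop_share = [0.146, 0.164, 0.180, 0.145, 0.166, 0.199]

n = sum(observed)
expected = [n * p for p in pop_share]   # expected counts under representativeness

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

print(f"chi-square = {chi2:.2f}, df = {df}")
```

The statistic is then compared with the χ² critical value for the chosen significance level and these degrees of freedom.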
Exercise 6 (Exam December 2013)

For a period of six weeks, a small business places an advertisement in the Business section and in the Sports
section of a local newspaper on every working day. To evaluate their advertising strategy, they keep track
of the number of inquiries from customers that they receive each day as a result of each of the
advertisements.

Figure 6.1: Number of inquiries by week day and newspaper section

a. Figure 6.1 shows a profile plot for the average number of inquiries by week day and newspaper section.
What are your expectations based on this plot for the following effects on the number of inquiries in a
2-factor ANOVA: (i) the effect of week day, (ii) the effect of newspaper section, (iii) the interaction of
week day and newspaper section? Motivate your answer.

Table 6.1: Number of inquiries by week day and newspaper section


Descriptive Statistics
Dependent Variable: Inquiries
Day Section Mean Std. Deviation N
Business 12.33 1.633 6
Monday Sports 8.83 1.169 6
Total 10.58 2.275 12
Business 9.67 1.366 6
Tuesday Sports 8.17 1.722 6
Total 8.92 1.676 12
Business 9.00 1.673 6
Wednesday Sports 8.33 1.366 6
Total 8.67 1.497 12
Business 10.00 1.265 6
Thursday Sports 7.33 1.506 6
Total 8.67 1.923 12
Business 9.50 1.049 6
Friday Sports 11.67 1.033 6
Total 10.58 1.505 12
Business 10.10 1.768 30
Total Sports 8.87 1.978 30
Total 9.48 1.961 60

Tests of Between-Subjects Effects


Dependent Variable: Inquiries
Source Type III Sum of Squares F Sig.
Corrected Model 129.150a 7.334 .000
Intercept 5396.017 2757.760 .000
Day 48.900 6.248 .000
Section 22.817 11.661 .001
Day * Section 57.433 7.338 .000
Error 97.833
Total 5623.000
Corrected Total 226.983
a. R Squared = .569 (Adjusted R Squared = .491)

b. Table 6.1 contains the results of the 2-factor ANOVA. (i) Use the rule-of-thumb to check if the
assumption of equal variances is reasonable. Motivate your answer. (ii) Give the test distribution for
the interaction effect, including the values of the degrees of freedom. (iii) Check the value of the test
statistic for the interaction effect (write down the formula and perform step-by-step calculations). (iv)
Draw a clear conclusion regarding the significance of the interaction effect based on the reported P-
value, also in common speech.
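
The check in step (iii) can be sketched as below, assuming a balanced two-factor design with 5 week days, 2 sections, and 60 observations, so that the interaction has (5 − 1)(2 − 1) degrees of freedom and the error term has 60 − 10. The standard-deviation ratio uses the ten cell values of Table 6.1; the names are illustrative.

```python
a, b, n = 5, 2, 60                 # week days, sections, observations
df_inter = (a - 1) * (b - 1)       # interaction degrees of freedom
df_error = n - a * b               # error degrees of freedom

ms_inter = 57.433 / df_inter       # mean square for Day * Section
ms_error = 97.833 / df_error       # mean square error
f_inter = ms_inter / ms_error

# Cell standard deviations (Business, Sports) for Monday to Friday
cell_sds = [1.633, 1.169, 1.366, 1.722, 1.673, 1.366, 1.265, 1.506, 1.049, 1.033]
sd_ratio = max(cell_sds) / min(cell_sds)

print(f"F = {f_inter:.3f}, df = ({df_inter}, {df_error}), sd ratio = {sd_ratio:.2f}")
```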
Exercise 7 (Resit July 2014)
Starting a business is a dream of many Economics students, but it requires substantial start-up capital. Data
on start-up costs (in thousands of US Dollars) are available for five different types of businesses.

Table 7.1: Cross table of start-up costs and type of business


Cost * Business Crosstabulation
                                   Business
                                   Pizza   Bakery   Shoe   Gift   Pet     Total
Cost    low     Expected Count     6.9     5.8      6.4    6.4    8.5     34.0
                Count              5       4        7      5      13      34
        high    Expected Count     6.1     5.2      5.6    5.6    7.5     30.0
                Count              8       7        5      7      3       30
Total           Expected Count     13.0    11.0     12.0   12.0   16.0    64.0
                Count              13      11       12     12     16      64

Chi-Square Tests
Value Asymp. Sig. (2-sided)
Pearson Chi-Square 8.209 .084
N of Valid Cases 64

a. In Table 7.1, a cross table of start-up costs and the type of business is computed. The start-up costs
are thereby divided into two categories: ‘low’ (costs less than or equal to $75,000) and ‘high’ (costs
higher than $75,000). (i) Which χ2 test is performed here? (ii) Show how the expected count 6.9 is
computed for pizza restaurants with a low start-up cost (write down the formula and perform step-
by-step calculations). (iii) What rule-of-thumb can be used to check whether this χ2 test is suitable?
Is it fulfilled for the given data? Motivate your answer. (iv) Give the test distribution, including the
value(s) of the degrees of freedom. (v) Draw a clear conclusion based on the reported P-value, also
in common speech.
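
The expected counts and the Pearson statistic in Table 7.1 can be reproduced from the observed counts. A minimal sketch, assuming expected count = (row total × column total) / grand total; the names are illustrative.

```python
# Observed counts per business type: Pizza, Bakery, Shoe, Gift, Pet
low = [5, 4, 7, 5, 13]
high = [8, 7, 5, 7, 3]
rows = [low, high]

row_tot = [sum(r) for r in rows]
col_tot = [sum(c) for c in zip(*rows)]
n = sum(row_tot)

# Expected count per cell: row total times column total, divided by n
expected = [[rt * ct / n for ct in col_tot] for rt in row_tot]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(rows, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(rows) - 1) * (len(col_tot) - 1)
smallest_expected = min(min(r) for r in expected)

print(round(expected[0][0], 1), round(chi2, 3), df, round(smallest_expected, 2))
```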
Exercise 8 (Exam December 2008)

It is popular belief that men take more risk than women, particularly when they are single. The returns on
asset portfolios are recorded for four groups of investors, single men, married men, single women, and
married women. Results are shown in Table 8.1 and the boxplots in Figure 8.1. In total, there are 284
observations.

Table 8.1: Descriptive Statistics of returns on asset portfolios


Status N Minimum Maximum Mean Std. Deviation
Single men 47 -14.43 30.11 6.08 9.73
Married men 112 -10.92 36.95 13.86 10.15
Single women 39 -19.11 29.90 9.45 12.48
Married women 86 -11.99 40.90 15.46 9.69

Figure 8.1: Boxplots of returns on asset portfolios

[Figure: boxplots of Return (vertical axis, from -20.00 to 50.00) by Status (single man, married man, single woman, married woman); two points are labelled 269 and 259.]

a. Focus on single men and single women. Test whether or not the average returns of these two
groups are different. You may assume that the variances in the two groups are not significantly
different. Use the following steps: (i) State H0 and Ha. (ii) Give the formula of the relevant test
statistic. (iii) Give the formula of the pooled variance, and give the value of the degrees of freedom
of the test distribution. (iv) Compute the value of the test statistic. (v) Draw your conclusion, also in
common speech.
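
Steps (iii) and (iv) can be checked numerically. The sketch uses the pooled-variance two-sample t statistic, as permitted by the equal-variances assumption stated in the question, with the summary statistics of Table 8.1; the names are illustrative.

```python
from math import sqrt

n1, m1, s1 = 47, 6.08, 9.73    # single men
n2, m2, s2 = 39, 9.45, 12.48   # single women

# Pooled variance: weighted average of the two sample variances
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se = sqrt(sp2 * (1 / n1 + 1 / n2))
t_stat = (m1 - m2) / se
df = n1 + n2 - 2

print(f"pooled variance = {sp2:.2f}, t = {t_stat:.3f}, df = {df}")
```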

Table 8.3: ANOVA on Returns


Sum of Squares df Mean Square F Sig.
Between Groups 3263.414 ?? 1087.805 10.261 .000
Within Groups 29684.974 ?? 106.018
Total 32948.388 ??
Table 8.4: Ranks and Test
Status N Mean Rank Return
Single men 47 93.11 Chi-Square 26.041
Married men 112 152.02 Df ??
Single women 39 126.58 Asymp. Sig. .000
Married women 86 164.32
Total 284

b. Tables 8.3 and 8.4 contain results on the comparison of the average returns of all four groups of
investors. Answer the following questions: (i) What assumptions are needed for the test in Table
8.3? (ii) Compute the three values for the degrees of freedom in Table 8.3, with an explanation. (iii)
What conclusion do you draw from Table 8.3? (iv) Which test is performed in Table 8.4, and what is
the value of the degrees of freedom in Table 8.4? (v) What conclusion do you draw from Table 8.4?

Table 8.5: Tests of Between-Subjects Effects; Dependent Variable: Return


Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 3263 3 1088 10.26 .000
Intercept 29815 1 29815 281.23 .000
Female 367 ?? 367 3.46 .064
Single 2821 ?? 2821 26.61 .000
Female * Single 47 ?? 47 .44 .508
Error 29685 280 106
Total 76987 284
Corrected Total 32949 283

c. Table 8.5 contains further results on the comparison of the returns of all four groups of investors.
Answer the following questions: (i) What method is applied here? (ii) Compute the three values for
the degrees of freedom in Table 8.5, with an explanation. (iii) What conclusions do you draw from
Table 8.5?
Exercise 9

Data on age (in years) and income (in $) are available for males with an age between 25 and 64 years and
with similar education level (bachelor, not master). Answer the following questions.

a. What causes the vertical lines in the scatter plot?


b. Give a clear interpretation of the results in the regression table.
c. Give a clear interpretation of the histogram of the residuals. What are the consequences of your
findings?

Figures and tables

Coefficientsa
Unstandardized Coefficients Standardized
Coefficients
Model B Std. Error Beta t Sig.
(Constant) 24874.374 2637.420 9.431 .000
1
age 892.114 61.764 .188 14.444 .000
a. Dependent Variable: income
Exercise 10

A company is interested in the effect of radio and newspaper advertising on its sales. A sample of 22 cities
with approximately equal populations is selected for study during a test period of one month. Each city is
allocated a specific expenditure level both for local radio and for local newspaper advertising, and the sales
during the test month are recorded. All variables are expressed in thousands of dollars, and the combined
advertisement expenditures for radio and newspaper range from 40 to 100 thousand dollars per city.

a. Table 10.1 contains results obtained for two regression models. (i) Provide a clear interpretation of
the significance, the sign and the magnitude of the regression coefficient of ‘Radio’ in Model 1. (ii)
Provide a clear interpretation of the significance, and where relevant the sign and the magnitude of
the regression coefficients of Model 2.

b. Figure 10.1 shows a diagnostic plot for Model 2. (i) Give a clear interpretation of this scatter plot.
(ii) Do you think that the regression model needs to be adapted? If yes, how?

Figures and tables

Table 10.1: Regression models, with Sales as dependent variable


Model 1 Model 2
Variable Coefficient P-value Coefficient P-value
(Constant) 731.250 .000 148.201 .316
Radio 11.683 .002 13.733 .000
Newspaper 17.335 .000
Residual Variance (s2) 77366 31447
R Square .383 .762

Figure 10.1: Diagnostic plot


Exercise 11

Starbucks lists the number of calories for all their food items on the menu.

A regular customer is concerned with the amount of carbohydrates they consume. They look up the
carbohydrate content of some of the food items in an online database and try to predict the amount of
carbohydrates from the number of calories.

a. Table 11.1 and Figure 11.1 contain SPSS output from such a regression. (i) What are the assumptions
of the linear regression model? (ii) Which assumption is violated? (iii) What are the consequences on
the regression results?

Starbucks adds a new sandwich to the food menu. They know the amount of fat, carbohydrates, fiber and
protein (all in grams) in the sandwich and want to use this information to determine the number of calories.

b. Table 11.2 and Figure 11.2 contain SPSS output from a linear regression model. (i) Are any of the
assumptions violated? (ii) Give a clear interpretation of the significance, and where relevant the sign
and the magnitude of the regression coefficients. (iii) The new sandwich contains 15g fat, 40g
carbohydrates, 2g fiber and 20g protein. Predict the number of calories of this sandwich. (iv) Is it
possible to find a better prediction?
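
The point prediction in step (iii) is a direct substitution into the estimated equation of Table 11.2. A minimal sketch, with illustrative dictionary names:

```python
# Estimated coefficients from Table 11.2 (dependent variable: calories)
coef = {"const": 5.334, "fat": 8.954, "carb": 3.842, "fiber": -0.024, "protein": 3.998}

# Nutrient content of the new sandwich, in grams
sandwich = {"fat": 15, "carb": 40, "fiber": 2, "protein": 20}

calories_hat = coef["const"] + sum(coef[name] * grams for name, grams in sandwich.items())

print(f"predicted calories = {calories_hat:.1f}")
```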

c. Table 11.3 contains additional results from an ANOVA for overall significance of the model. (i)
State H0 and Ha. (ii) Give the test distribution, with the values of the degrees of freedom. (iii) Check
the value of the test statistic (give the formula and perform step-by-step calculations). (iv) Draw your
conclusion from the reported P-value, also in common speech.
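
The F value in Table 11.3 can be checked from the sums of squares. The sketch assumes k = 4 explanatory variables and n = 77 observations (total df 76), so the regression has k degrees of freedom and the residual n − k − 1; the names are illustrative.

```python
k = 4                  # explanatory variables: fat, carb, fiber, protein
n = 76 + 1             # total degrees of freedom in Table 11.3 plus one

df_reg, df_res = k, n - k - 1
ms_reg = 835768.169 / df_reg    # mean square regression
ms_res = 8026.636 / df_res      # mean square residual
f_stat = ms_reg / ms_res

print(f"df = ({df_reg}, {df_res}), F = {f_stat:.3f}")
```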
Figures and tables

Table 11.1: Regression coefficients


Coefficientsa
Unstandardized Coefficients Standardized
Coefficients
Model B Std. Error Beta t Sig.
(Constant) 8.944 4.746 1.884 .063
1
calories .106 .013 .675 7.923 .000
a. Dependent Variable: carb

Figure 11.1: Diagnostic plots


Table 11.2: Regression coefficients
Coefficientsa
Unstandardized Coefficients Standardized
Coefficients
Model B Std. Error Beta t Sig.
(Constant) 5.334 4.149 1.285 .203
fat 8.954 .182 .603 49.270 .000
2 carb 3.842 .078 .604 49.265 .000
fiber -.024 .700 .000 -.035 .972
protein 3.998 .184 .307 21.763 .000
a. Dependent Variable: calories

Figure 11.2: Diagnostic plots


Table 11.3: ANOVA for regression
ANOVAa
Model Sum of Squares df Mean Square F Sig.
Regression 835768.169 ?? 208942.042 1874.238 .000b
2 Residual 8026.636 ?? 111.481
Total 843794.805 76
a. Dependent Variable: calories
b. Predictors: (Constant), protein, carb, fat, fiber
Exercise 12 (Exam December 2013)
For 98 professions, the average yearly income (in Euro) and an average score for the prestige (on a
scale from 0 to 100) are collected from a survey. In addition, data is available on the percentage of
women in the field, as well as the type of occupation (‘bc’ = blue collar, ‘wc’ = white collar, ‘prof’
= professional and managerial).

Table 12.1 and Figure 12.1 contain results from two simple regression models: one with the average
income as the dependent variable, and one with the logarithm of the average income as the
dependent variable. The prestige of the profession is the explanatory variable.

Table 12.1: Simple regression models


Coefficientsa
Model Unstandardized Coefficients Standardized t Sig.
Coefficients
B Std. Error Beta
(Constant) -7819.295 5454.036 -1.434 .155
1
prestige 1051.269 108.450 .703 9.694 .000
a. Dependent Variable: income

Coefficientsb
Model Unstandardized Coefficients Standardized t Sig.
Coefficients
B Std. Error Beta
(Constant) 9.417 .103 91.099 .000
2
prestige .023 .002 .751 11.152 .000
b. Dependent Variable: ln_income

Figure 12.1: Simple regression models

a. (i) What are the assumptions of a linear regression model? (ii) Consider now the plots of the
residuals against the fitted values from the two regressions. Which model do you prefer?
Motivate your choice based on the regression assumptions. (iii) Interpret the coefficient of
prestige in the model of your choice. (iv) In the sample, prestige takes values between 17.3 and
87.2. Is there a meaningful interpretation of the constant term? Motivate your answer.
Table 12.2 contains the estimated coefficients for two regression models with the logarithm of the
average income as the dependent variable. The explanatory variables in model 4 are prestige,
percentage of women, as well as two dummy variables: one for white collar occupations (1 if
occupation = ‘wc’ and 0 otherwise), and one for professional and managerial occupations (1 if
occupation = ‘prof’ and 0 otherwise). In model 3, the two dummy variables are removed.
Table 12.2: Regression models
Coefficientsa
Model Unstandardized Coefficients Standardized t Sig.
Coefficients
B Std. Error Beta
(Constant) 9.725 .078 124.673 .000
3 prestige .021 .001 .698 14.815 .000
women -.008 .001 -.479 -10.168 .000
(Constant) 9.715 .107 91.052 .000
prestige .021 .003 .698 7.961 .000
4 women -.008 .001 -.509 -9.510 .000
wc .087 .073 .071 1.192 .236
prof .012 .105 .011 .114 .910
a. Dependent Variable: ln_income

b. (i) Why is the effect of the type of occupation measured by two dummy variables in model 4? (ii)
Give a clear interpretation of model 4, including the significance, and where relevant the sign
and magnitude of the coefficients. Can you give a clear interpretation of the coefficients of ‘wc’
and ‘prof’ based on the information in Table 12.2?

Model summaries for the two regression models from Table 12.2 are given in Table 12.3.
Table 12.3: Model summaries
Model Summaryc
                                                                             Change Statistics
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   Sig. F Change
3       .890a   .791       .787                .240743890391198             .791              180.209    .000
4       .892b   .795       .786                .241131075429721             .004              .848       .432
a. Predictors: (Constant) , prestige, women
b. Predictors: (Constant) , prestige, women, wc, prof
c. Dependent Variable: ln_income

c. Consider the F-test in Table 12.3 for the two regression models. (i) Write down the regression
equation for model 4 (for the theoretical model with parameters β, not the estimated model). (ii)
State H0 and Ha. (iii) Give the test distribution under H0, including the values of the degrees of
freedom. (iv) Draw a clear conclusion based on the reported P-value, also in common speech.
d. Now consider the models from Table 12.2 for prediction. (i) Motivate why model 3 is preferable
over model 4 for making predictions. (ii) Compute a point prediction for the average income for
a profession with a prestige score of 60 and a percentage of women of 40.
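
The point prediction in step (ii) can be sketched as follows, using the model 3 coefficients from Table 12.2. Converting the predicted logarithm back to Euros with a plain exponential is an assumption of this sketch (it ignores any retransformation correction); the names are illustrative.

```python
from math import exp

# Model 3 coefficients from Table 12.2 (dependent variable: ln_income)
const, b_prestige, b_women = 9.725, 0.021, -0.008

prestige, women = 60, 40    # prestige score and percentage of women

ln_income_hat = const + b_prestige * prestige + b_women * women
income_hat = exp(ln_income_hat)   # naive back-transformation to Euros

print(f"predicted ln(income) = {ln_income_hat:.3f}, income = {income_hat:.0f}")
```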
Exercise 13 (Resit July 2013)
We consider data on travel distance (in kilometers) and travel time (in minutes) of commuters to the
Erasmus University Rotterdam. These data consist of 198 observations, 99 for students and 99 for
employees of this university. Figure 13.1 shows a scatter diagram, and Table 13.2 shows the outcomes
of a simple regression.

Figure 13.1

Table 13.2: Regression outcome (Dependent variable: travel time)


Unstandardized Coefficients
B Std. Error t Sig.
(Constant) 14.624 1.408 10.386 .000
distance .605 .031 19.664 .000

a. (i) Provide an interpretation of the results in Table 13.2. (ii) State the assumptions needed for
this regression. (iii) Use Figure 13.1 to conclude that at least one of these assumptions will be
violated, and provide a clear explanation of your answer. (iv) What are the consequences of this
violation for the results in Table 13.2?

Table 13.3: Regression outcome (Dependent variable: travel time)


Unstandardized Coefficients
B Std. Error t Sig.
(Constant) 11.265 2.413 4.669 .000
distance .710 .104 6.827 .000
distance_squared/100 -.152 .057 -2.668 .008
staff 2.723 3.898 .699 .486
distance X staff .224 .101 2.218 .028
(distance_squared/100) X staff .077 .058 1.328 .186

b. Table 13.3 shows results of a multiple regression, where “distance_squared/100” is one percent
of the squared distance, “staff” has value 1 for employees and 0 for students, and the last two
variables (with “X”) are the products of the indicated variables. (i) Give a clear interpretation of
the results in Table 13.3, including the significance, and where relevant the sign and magnitude
of the coefficients. (ii) Suppose that a professor and a student both live at 25 kilometers from the
university. What is the expected difference of their travel times?
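
For step (ii), only the terms involving “staff” differ between the two commuters. The sketch below assumes the professor is coded as an employee (staff = 1) and the student as staff = 0, and uses the coefficients of the multiple regression table; the names are illustrative.

```python
# Coefficients of the terms that involve "staff"
b_staff = 2.723            # staff
b_dist_staff = 0.224       # distance X staff
b_dist2_staff = 0.077      # (distance_squared/100) X staff

distance = 25              # kilometers

# Expected travel time of the employee minus that of the student (minutes)
difference = (b_staff
              + b_dist_staff * distance
              + b_dist2_staff * distance**2 / 100)

print(f"expected difference = {difference:.2f} minutes")
```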
Table 13.4: ANOVA for Table 13.3 and R-squared for Tables 13.2 and 13.3
ANOVA Table 13.3       Sum of Squares   df    Mean Square   F      Sig.
Regression             69975            5     13995         79.3   .000a
Residual               33894            192   177
Total                  103869           197
R-squared Table 13.2   0.674
R-squared Table 13.3   0.703
a Predictors: (Constant), distance, distance_squared, staff, distance X staff, distance_squared X staff

c. Table 13.4 contains further results. (i) Explain the value 5 for the “df” in Table 13.4, by stating
the relevant null hypothesis. (ii) What conclusion do you draw from the ANOVA results in
Table 13.4? (iii) Use the values of R-squared in Table 13.4 to test whether the variables that are
added in Table 13.3, as compared to Table 13.2, are jointly significant. Show all steps of your
testing procedure.
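
The joint test in step (iii) can be computed from the two R-squared values. The sketch assumes the F statistic F = ((R²₁ − R²₀) / g) / ((1 − R²₁) / (n − k − 1)), with g = 4 added variables, n = 198 observations, and k = 5 explanatory variables in the larger model; the names are illustrative.

```python
r2_small = 0.674    # simple regression (distance only)
r2_large = 0.703    # multiple regression (five explanatory variables)

n, k = 198, 5       # observations and explanatory variables in the larger model
g = 4               # added variables: distance_squared, staff, two interactions

df_num, df_den = g, n - k - 1
f_stat = ((r2_large - r2_small) / df_num) / ((1 - r2_large) / df_den)

print(f"F = {f_stat:.2f} with ({df_num}, {df_den}) degrees of freedom")
```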

Table 13.5: Cross table


Distance
Small Medium Large Total
Student Count 30 24 45 99
Expected Count 36.5 27.0 35.5 99.0
Employee Count 43 30 26 99
Expected Count 36.5 27.0 35.5 99.0
Total Count 73 54 71 198
Expected Count 73.0 54.0 71.0 198.0

d. In Table 13.5, the travel distance is divided in three groups: small (less than 10 km), medium
(between 10 and 25 km), and large (more than 25 km). (i) Perform an explicit computation to
check the value for the Expected Count for students living less than 10 km from the university.
Which statistical assumption is the basis for this computation? (ii) The test statistic has
numerical outcome 8.07. Determine the degrees of freedom of this test statistic and the
corresponding critical value. (iii). Perform the test and draw your conclusion, also in common
speech.
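
The expected counts and the test statistic for Table 13.5 can be reproduced as follows. A minimal sketch, assuming expected count = (row total × column total) / grand total; the names are illustrative.

```python
# Observed counts per distance group: small, medium, large
student = [30, 24, 45]
employee = [43, 30, 26]
rows = [student, employee]

row_tot = [sum(r) for r in rows]
col_tot = [sum(c) for c in zip(*rows)]
n = sum(row_tot)

# Expected count per cell: row total times column total, divided by n
expected = [[rt * ct / n for ct in col_tot] for rt in row_tot]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(rows, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(rows) - 1) * (len(col_tot) - 1)

print(expected[0][0], round(chi2, 2), df)
```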
Exercise 14
We consider the Phillips curve, that is, the (negative) relation between inflation and unemployment.
The data are taken from the US, with monthly data from January 1970 to December 2006, on two
variables: DP, the monthly change in the inflation rate, and DU, the monthly change in the
unemployment rate. Furthermore, DP(-1) and DU(-1) are the one-month lagged values of DP and
DU, respectively, and D9006 is a dummy variable with value 0 before June 1990 and value 1
from June 1990 onwards.

Question 1

Provide an interpretation (both statistical and economic) of the results in Table 14.1.

Question 2

Provide an interpretation (both statistical and economic) of the results in Table 14.2.

Question 3

Use the results in Tables 14.2 and 14.3 to perform a Chow break test, in four steps:
(i) State H0 and Ha.
(ii) Give the degrees of freedom of the F-test and the relevant critical value.
(iii) Compute the value of the F-statistic.
(iv) Draw your conclusion, also in common speech.
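A minimal sketch of step (iii), assuming the Chow break test is carried out as an F-test on the three D9006 terms that Table 14.3 adds to the model of Table 14.2. The F-statistic is computed twice, once from the R-squared values and once from residual sums of squares reconstructed from the reported standard errors of regression, so that the two routes can be compared. All variable names are illustrative.

```python
n = 443                           # observations in Tables 14.2 and 14.3
k_restricted, k_unrestricted = 3, 6
g = k_unrestricted - k_restricted # number of restrictions under H0
df_resid = n - k_unrestricted

# Route 1: R-squared values of Table 14.2 (restricted) and 14.3 (unrestricted)
r2_restricted, r2_unrestricted = 0.127898, 0.133655
f_r2 = ((r2_unrestricted - r2_restricted) / g) / ((1 - r2_unrestricted) / df_resid)

# Route 2: residual sum of squares = (S.E. of regression)^2 * (n - k)
ssr_restricted = 0.328794 ** 2 * (n - k_restricted)
ssr_unrestricted = 0.328830 ** 2 * df_resid
f_ssr = ((ssr_restricted - ssr_unrestricted) / g) / (ssr_unrestricted / df_resid)

print(f"F (from R-squared) = {f_r2:.3f} with ({g}, {df_resid}) degrees of freedom")
print(f"F (from SSR)       = {f_ssr:.3f}")
```

Both routes should agree up to rounding of the reported inputs. The value is then compared with the 5% critical value of the F(3, 437) distribution from a table.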

Question 4

Use the results in Tables 14.4 and 14.5 to perform a Goldfeld-Quandt test, in four steps:
(i) State H0 and Ha.
(ii) Compute the value of the F-statistic.
(iii) Give the degrees of freedom of the F-test and the relevant critical value.
(iv) Draw your conclusion, also in common speech.
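Steps (ii) and (iii) can be sketched as follows, assuming the Goldfeld-Quandt statistic is formed as the ratio of the two squared standard errors of regression from Tables 14.4 and 14.5, with the larger one in the numerator. The variable names are illustrative.

```python
k = 3                                  # constant, DP(-1), DU(-1)

s_early, n_early = 0.363050, 119       # Table 14.4: 1970M02 to 1979M12
s_late, n_late = 0.358877, 120         # Table 14.5: 1997M01 to 2006M12

# Ratio of estimated error variances, larger variance in the numerator
f_gq = s_early ** 2 / s_late ** 2
df_num, df_den = n_early - k, n_late - k

print(f"F = {f_gq:.3f} with ({df_num}, {df_den}) degrees of freedom")
```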
Tables and figures

Figure 14.1: Time series plots of DP (left panel) and DU (right panel) over the sample period (plots not reproduced here)

Figure 14.2: Scatter plot of DU (vertical axis) against DP (horizontal axis) (plot not reproduced here)

TABLE 14.1 Dependent Variable: DU


Sample (adjusted): 1970M02 2006M12
Included observations: 443 after adjustments
Coefficient Std. Error t-Statistic Prob.
C 0.001064 0.008333 0.127671 0.8985
DU(-1) 0.144571 0.047100 3.069454 0.0023
DP(-1) 0.005672 0.023909 0.237235 0.8126
R-squared 0.020969
S.E. of regression 0.175335

TABLE 14.2 Dependent Variable: DP


Sample (adjusted): 1970M02 2006M12
Included observations: 443 after adjustments
Coefficient Std. Error t-Statistic Prob.
C -0.004434 0.015627 -0.283765 0.7767
DP(-1) 0.332933 0.044834 7.425862 0.0000
DU(-1) -0.208746 0.088323 -2.363436 0.0185
R-squared 0.127898
S.E. of regression 0.328794
TABLE 14.3 Dependent Variable: DP
Sample (adjusted): 1970M02 2006M12
Included observations: 443 after adjustments
NOTE: D9006 has value 0 till 1990M05, 1 from 1990M06 onwards
Coefficient Std. Error t-Statistic Prob.
C -0.002337 0.021287 -0.109809 0.9126
DP(-1) 0.387677 0.057497 6.742530 0.0000
DU(-1) -0.224221 0.104186 -2.152116 0.0319
D9006 -0.004862 0.031377 -0.154942 0.8769
D9006*DP(-1) -0.147636 0.092431 -1.597261 0.1109
D9006*DU(-1) 0.108945 0.199527 0.546017 0.5853
R-squared 0.133655
S.E. of regression 0.328830

TABLE 14.4 Dependent Variable: DP


Sample (adjusted): 1970M02 1979M12
Included observations: 119 after adjustments
Coefficient Std. Error t-Statistic Prob.
C 0.046224 0.033774 1.368642 0.1738
DP(-1) 0.252090 0.089665 2.811474 0.0058
DU(-1) -0.240156 0.162024 -1.482225 0.1410
R-squared 0.082083
S.E. of regression 0.363050

TABLE 14.5 Dependent Variable: DP


Sample: 1997M01 2006M12
Included observations: 120
Coefficient Std. Error t-Statistic Prob.
C -0.006249 0.032831 -0.190351 0.8494
DP(-1) 0.232859 0.090909 2.561435 0.0117
DU(-1) -0.293748 0.265080 -1.108148 0.2701
R-squared 0.059476
S.E. of regression 0.358877
Exercise 15 (Resit July 2014)
Two topics that always come up in US election campaigns are the state of the economy and the
gasoline price. We investigate whether there is a relationship between the two. Data are available
from Q1 1987 to Q1 2014. Here DGDP and DGasPrice denote the quarterly change in gross
domestic product and gasoline price, DUM2007 is a dummy variable with value 0 for the quarters
before 2007 and value 1 afterwards, and DGDP(-1) denotes the value of DGDP in the previous
quarter, with similar definitions for DGDP(-2) and for DGasPrice(-1) and DGasPrice(-2).

Table 15.1: Regression models with DGDP as dependent variable

Coefficients(a)
                           Unstandardized           Standardized
                           Coefficients             Coefficients
Model                      B         Std. Error     Beta              t       Sig.
1    (Constant)            37.740    11.189                           3.373   .001
     DGDP(-1)                .343      .098          .340             3.501   .001
     DGDP(-2)                .169      .098          .168             1.727   .087
2    (Constant)            36.213    11.543                           3.137   .002
     DGDP(-1)                .329      .104          .326             3.166   .002
     DGDP(-2)                .209      .104          .207             2.014   .047
     DGasPrice(-1)         13.905    30.223          .043              .460   .646
     DGasPrice(-2)        -35.380    30.428         -.109            -1.163   .248
a. Dependent Variable: DGDP

Model Summary
                                                              Change Statistics
Model   R         R Square   Adjusted    Std. Error of        R Square   F Change   Sig. F
                             R Square    the Estimate         Change                Change
1       .438(a)   .192       .176        76.128               .192       12.320     .000
2       .451(b)   .203       .172        76.309               ???        ???        .497
a. Predictors: (Constant), DGDP(-1), DGDP(-2)
b. Predictors: (Constant), DGDP(-1), DGDP(-2), DGasPrice(-1), DGasPrice(-2)
Number of observations: 107

a. (i) Which hypotheses are tested by means of models 1 and 2 in Table 15.1? What is the name of this test?
(ii) Give the test distribution, including the value(s) of the degrees of freedom. (iii) Compute the value of
the test statistic. (iv) Draw a clear conclusion based on the reported P-value, also in common speech.
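The two entries marked ??? for model 2 in Table 15.1 can be reconstructed from the reported R Square values; a minimal sketch, with illustrative variable names, follows. Because exactly two regressors are added, the tail probability of the F-distribution has a closed form, which allows a check against the reported Sig. F Change of .497. That shortcut holds for 2 numerator degrees of freedom only.

```python
n = 107                 # number of observations
k_full = 5              # constant plus four regressors in model 2
g = 2                   # added regressors: DGasPrice(-1), DGasPrice(-2)
df_resid = n - k_full

r2_model1, r2_model2 = 0.192, 0.203

r2_change = r2_model2 - r2_model1
f_change = (r2_change / g) / ((1 - r2_model2) / df_resid)

# Closed form, valid for 2 numerator df only:
# P(F > x) = (1 + 2x / d2) ** (-d2 / 2) for F(2, d2)
p_value = (1 + 2 * f_change / df_resid) ** (-df_resid / 2)

print(f"R Square Change = {r2_change:.3f}")
print(f"F Change        = {f_change:.3f} with ({g}, {df_resid}) degrees of freedom")
print(f"P-value         = {p_value:.3f}")
```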
Table 15.2: Regression models with DGDP as dependent variable

Coefficients(a)
                              Unstandardized           Standardized
                              Coefficients             Coefficients
Model                         B         Std. Error     Beta              t       Sig.
3    (Constant)               37.740    11.189                           3.373   .001
     DGDP(-1)                   .343      .098          .340             3.501   .001
     DGDP(-2)                   .169      .098          .168             1.727   .087
4    (Constant)               48.093    17.869                           2.691   .008
     DGDP(-1)                   .182      .142          .180             1.283   .202
     DGDP(-2)                   .293      .143          .290             2.054   .043
     DUM2007                 -26.870    23.632         -.143            -1.137   .258
     DUM2007*DGDP(-1)           .292      .196          .225             1.488   .140
     DUM2007*DGDP(-2)          -.311      .198         -.237            -1.571   .119
a. Dependent Variable: DGDP

Model Summary
                                                              Change Statistics
Model   R         R Square   Adjusted    Std. Error of        R Square   F Change   Sig. F
                             R Square    the Estimate         Change                Change
3       .438(a)   .192       .176        76.128               .192       12.320     .000
4       .487(b)   .238       .200        75.017               ???        ???        .114
a. Predictors: (Constant), DGDP(-1), DGDP(-2)
b. Predictors: (Constant), DGDP(-1), DGDP(-2), DUM2007, DUM2007*DGDP(-1), DUM2007*DGDP(-2)
Number of observations: 107

b. (i) Which hypotheses are tested by means of models 3 and 4 in Table 15.2? What is the name of this test?
(ii) Give the test distribution, including the value(s) of the degrees of freedom. (iii) Compute the value of
the test statistic. (iv) Draw a clear conclusion based on the reported P-value, also in common speech.
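For model 4 in Table 15.2, the missing R Square Change and F Change follow from the reported R Square values in the same way. The sketch below assumes the test is an F-test on the three DUM2007 terms; the variable names are illustrative.

```python
n = 107                 # number of observations
k_full = 6              # constant plus five regressors in model 4
g = 3                   # added terms: DUM2007 and its two interactions
df_resid = n - k_full

r2_model3, r2_model4 = 0.192, 0.238

r2_change = r2_model4 - r2_model3
f_change = (r2_change / g) / ((1 - r2_model4) / df_resid)

print(f"R Square Change = {r2_change:.3f}")
print(f"F Change        = {f_change:.3f} with ({g}, {df_resid}) degrees of freedom")
```

The conclusion in step (iv) rests on comparing the reported Sig. F Change of .114 with the significance level of 0.05.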
Table 15.3: Regression model for observations from Q3 1987 to Q1 1998

Coefficients(a)
                        Unstandardized           Standardized
                        Coefficients             Coefficients
Model                   B         Std. Error     Beta              t       Sig.
5    (Constant)         37.700    15.659                           2.408   .021
     DGDP(-1)             .274      .153          .273             1.793   .081
     DGDP(-2)             .266      .153          .265             1.739   .090
a. Dependent Variable: DGDP

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
5       .444(a)   .197       .157                48.709
a. Predictors: (Constant), DGDP(-1), DGDP(-2)
Number of observations: 43

Table 15.4: Regression model for observations from Q3 2003 to Q1 2014

Coefficients(a)
                        Unstandardized           Standardized
                        Coefficients             Coefficients
Model                   B         Std. Error     Beta              t       Sig.
6    (Constant)         32.334    18.795                           1.720   .093
     DGDP(-1)             .484      .161          .480             3.007   .005
     DGDP(-2)             .016      .161          .016              .102   .919
a. Dependent Variable: DGDP

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
6       .488(a)   .238       .200                96.738
a. Predictors: (Constant), DGDP(-1), DGDP(-2)
Number of observations: 43

c. Use the results in Tables 15.3 and 15.4 to perform a Goldfeld-Quandt test, using the following steps: (i)
State H0 and Ha. (ii) Compute the value of the test statistic. (iii) Give the test distribution, including the
value(s) of the degrees of freedom, and the relevant critical value(s). (iv) Draw a clear conclusion, also in
common speech.
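Steps (ii) and (iii) of part c can be sketched as follows, assuming the Goldfeld-Quandt statistic is the ratio of the squared standard errors of the estimate from Tables 15.3 and 15.4, with the larger one in the numerator. The variable names are illustrative.

```python
k = 3                                   # constant, DGDP(-1), DGDP(-2)

s_first, n_first = 48.709, 43           # Table 15.3: Q3 1987 to Q1 1998
s_second, n_second = 96.738, 43         # Table 15.4: Q3 2003 to Q1 2014

# Ratio of estimated error variances, larger variance in the numerator
f_gq = s_second ** 2 / s_first ** 2
df_num, df_den = n_second - k, n_first - k

print(f"F = {f_gq:.3f} with ({df_num}, {df_den}) degrees of freedom")
```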
