Homework 3

This homework consists of a SAS exercise, some small questions, and a theoretical exercise. For the SAS exercise, it is again important that you write up your results in report format. The small questions require only brief answers. Note that this homework must be completed individually, and the marks you receive for it contribute to your final score for Analysis of Continuous Data.

1 A SAS Exercise

The dataset caschoolsimp.xlsx contains a random sample of California elementary school districts. The data consist of test scores (Y: testscr) and class sizes (X: stratio). The test score is a districtwide average of reading and math scores on the Stanford Achievement Test, a test used by school districts in the USA. The student-teacher ratio, i.e. the total number of students in the district divided by the number of teachers, is used as a measure of the (overall) class size in the district. Policy makers are interested in whether reducing class size, for instance by hiring more teachers, improves students' education. Skeptics worry that reducing class size will increase costs without producing substantial benefits.¹ The aim of this homework is to study the association between the two variables by fitting a simple linear regression model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, and to give a detailed discussion of the model quality. Answer the following questions:

1. Give the parameter estimates, their standard deviations and their 95% confidence intervals.
2. Give a clear and useful interpretation of the estimated regression coefficient.
3. Perform a two-sided statistical hypothesis test of the hypothesis that the regression coefficient is zero (use $\alpha = 0.05$).
4. Calculate a 95% confidence interval for the regression coefficient and interpret the interval.
5. Assess the assumptions underlying the linear regression model (scope of the model, linearity of the curve, constancy of the variance, lack-of-fit, study of outliers and residuals, ...).
6. Write an executive summary containing your main conclusions from your statistical analysis (max. 1/2 page).

¹ You can find more information on the research conducted on this subject at http://www.ed.gov/pubs/ReducingClass/index.html

2 Some Small Questions

Answer the following questions briefly. Although the questions are closely related to the simulation exercise, you must not write the answers in your report. Just write them on a separate paper.

1. In the data analysis in the previous section, would your results still be valid if the error in the regression model was not normal? Explain.
2. Give the advantages and disadvantages of a one-sided and a two-sided hypothesis test. When is it appropriate to do a one-sided test?
3. What are the minimal conditions under which the regression parameter estimators are unbiased?

4. What is the interpretation of a conditional expectation, $E(Y \mid X = x)$? (You may assume that X and Y are not independent.)
5. Can your findings be used to address the question of the policy makers? Give a short discussion of the potential issues.

3 A Theoretical Exercise

In a clinical study, physicians are interested in the effect of a new treatment on the blood pressure. They observe patients who use the new treatment and patients who do not. On the other hand, a certain gene is known to have an extreme influence on the blood pressure. In this exercise, we aim to find out what happens when the treatment effect is estimated with and without accounting for the gene effect. Suppose that the blood pressure is modeled correctly by the following underlying model:

$Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{1i} x_{2i} + \varepsilon_i$ (1)

where $\varepsilon_i \sim N(0, \sigma^2)$ $(i = 1, \ldots, n)$. Here $Y_i$ is the blood pressure for individual $i$, $x_{1i}$ indicates whether person $i$ uses the new treatment ($x_{1i} = 1$) or not ($x_{1i} = 0$), and $x_{2i} = 1$ ($x_{2i} = 0$) indicates the presence (absence) of the gene. Model (1) represents the true data-generating model, but this is of course unknown to the statistician. Assume that the outcomes are obtained from a randomized study. Assume that $P(X_{2i} = 1 \mid X_{1i} = 0) = q_0$ and $P(X_{2i} = 1 \mid X_{1i} = 1) = q_1$.

1. Suppose we analyze the data with the model

$Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$, (2)

where $\varepsilon_i \sim N(0, \sigma^2)$ $(i = 1, \ldots, n)$. Write the parameters $\beta_1$ and $\beta_2$ as functions of the parameters of Model (1), and interpret these parameters.

2. Suppose now that we do not know that there is a “confounding” effect of the gene. In this case, we want to analyse the data with the simple model

$Y_i = \beta_0 + \beta_1 x_{1i} + \varepsilon_i$, (3)

where $\varepsilon_i \sim N(0, \sigma^2)$ $(i = 1, \ldots, n)$. Write the parameter $\beta_1$ as a function of the parameters of Model (1), and interpret this parameter.
3. What happens if $\beta_3 = 0$ in Model (1)? Give a discussion for both Models (2) and (3).
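One possible way to sanity-check derived expressions for this exercise is to simulate a large sample from Model (1) and fit Models (2) and (3) by least squares. The sketch below is only an illustration and is not part of the assignment: it uses Python with NumPy rather than SAS, and the function names (`simulate`, `ols`), the parameter values, the sample size, and the equal-probability treatment assignment are all assumptions chosen for the example.

```python
import numpy as np


def simulate(n, beta, q0, q1, sigma=1.0, seed=0):
    """Draw (x1, x2, y) from Model (1). All numeric inputs are illustrative."""
    rng = np.random.default_rng(seed)
    # Treatment indicator, assigned at random with probability 1/2 (assumption).
    x1 = rng.integers(0, 2, size=n)
    # Gene indicator with P(X2 = 1 | X1 = 0) = q0 and P(X2 = 1 | X1 = 1) = q1.
    q = np.where(x1 == 1, q1, q0)
    x2 = (rng.random(n) < q).astype(int)
    b0, b1, b2, b3 = beta
    y = b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2 + rng.normal(0.0, sigma, n)
    return x1, x2, y


def ols(y, *columns):
    """Least-squares coefficients for an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Hypothetical parameter values (beta0, beta1, beta2, beta3), q0 and q1.
x1, x2, y = simulate(n=200_000, beta=(120.0, -5.0, 15.0, 4.0), q0=0.3, q1=0.3)

fit2 = ols(y, x1, x2)  # Model (2): treatment and gene, no interaction
fit3 = ols(y, x1)      # Model (3): treatment only

print("Model (2) estimates (intercept, x1, x2):", fit2)
print("Model (3) estimates (intercept, x1):", fit3)
```

Changing `q0`, `q1`, or the interaction coefficient and rerunning gives numbers to compare against the expressions obtained analytically; the simulation does not replace the derivation.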
