Regression and Correlation Methods

11.1 INTRODUCTION

In Chapter 8, statistical methods for comparing the means of a normally distributed outcome variable between two populations were presented based on t tests. Suppose we call the outcome variable y and the group classification (or class) variable x. For t test applications, x takes on two values. Another way of looking at the methods in Chapter 8 is as techniques for assessing the possible association between a normally distributed variable y and a categorical variable x. We will see that these techniques are special cases of linear-regression methods. In linear regression, we will study how to relate a normally distributed outcome variable y to one or more predictor variables $x_1, \ldots, x_k$, where the x's may be either continuous or categorical variables.

EXAMPLE: Obstetrics
Obstetricians sometimes order tests to measure estriol levels from 24-hour urine specimens taken from pregnant women who are near term because the level of estriol has been found to be related to infant birthweight. The test can provide indirect evidence of an abnormally small fetus. The relationship between estriol level and birthweight can be quantified by fitting a regression line that relates the two variables.

In Chapter 10, we also studied the Kappa statistic, which is a measure of association between two categorical variables. This index is useful when we are interested in how strong the association is between two categorical variables rather than in predicting one variable as a function of the other variable. To quantify the association between two continuous variables, we can use the correlation coefficient. In this chapter we consider hypothesis-testing methods for correlation coefficients and extend the concept of a correlation coefficient to describe association among several continuous variables.

EXAMPLE: Hypertension
Much discussion has taken place in the literature concerning the familial aggregation of blood pressure.
In general, children whose parents have high blood pressure tend to have higher blood pressure than their peers. One way of expressing this relationship is by computing a correlation coefficient relating the blood pressure of parents and children over a large collection of families.

In this chapter, we discuss methods of regression and correlation analysis in which the relationship between two different variables in the same sample is studied. The extension of these methods to the case of multiple-regression analysis, where the relationship between more than two variables at a time is considered, is also discussed.

11.2 GENERAL CONCEPTS

EXAMPLE: Obstetrics
Greene and Touchstone conducted a study to relate birthweight and estriol level in pregnant women [1]. Figure 11.1 is a plot of the data from the study; the actual data points are listed in Table 11.1. As can be seen from the figure, there appears to be a relationship between estriol level and birthweight, although this relationship is not consistent and considerable scatter exists throughout the plot. How can this relationship be quantified?

If x = estriol level and y = birthweight, then we can postulate a linear relationship between y and x that is of the following form:

EQUATION 11.1
$E(y \mid x) = \alpha + \beta x$

where $E(y \mid x)$ = expected or average birthweight (y) among women with a given estriol level (x). That is, for a given estriol level x, the average birthweight is $E(y \mid x) = \alpha + \beta x$.

DEFINITION 11.1
The line $y = \alpha + \beta x$ is the regression line, where $\alpha$ is the intercept and $\beta$ is the slope of the line.

The relationship $y = \alpha + \beta x$ is not expected to hold exactly for every woman. For example, not all women with a given estriol level have babies with identical birthweights. Thus, an error term e, which represents the variability of birthweight among all babies of women with a given estriol level x, is introduced into the model.
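To make the regression line concrete, the short Python sketch below evaluates $E(y \mid x) = \alpha + \beta x$ at a few estriol levels. It assumes the intercept and slope of the line drawn in Figure 11.1 (y = 21.52 + 0.608x, with birthweight in g/100 and estriol in mg/24 hr); the function name and the three estriol levels are hypothetical choices made only for illustration.

```python
# Assumed values: intercept and slope of the line drawn in Figure 11.1.
ALPHA = 21.52  # intercept, in g/100
BETA = 0.608   # slope, in g/100 per mg/24 hr of estriol


def expected_birthweight(estriol, alpha=ALPHA, beta=BETA):
    """Return E(y | x) = alpha + beta * x, in units of g/100."""
    return alpha + beta * estriol


# Hypothetical estriol levels (mg/24 hr), chosen only for illustration.
for x in (10, 15, 20):
    mean_y = expected_birthweight(x)
    print(f"estriol {x}: expected birthweight {mean_y:.2f} (about {mean_y * 100:.0f} g)")
```

Under these assumed values, a woman with an estriol level of 15 mg/24 hr has an expected birthweight of about 30.64 in units of g/100, or roughly 3064 g, and each additional 1 mg/24 hr of estriol raises the expected birthweight by 0.608 units, or about 61 g. These are averages; an individual birthweight can differ from the line.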
FIGURE 11.1 Data from the Greene-Touchstone study relating birthweight and estriol level in pregnant women near term
[Scatter plot of birthweight against estriol (mg/24 hr), with estriol ticks from 7 to 27 and the regression line y = 21.52 + 0.608x drawn through the points.]
Source: Based on the American Journal of Obstetrics and Gynecology, 85(1), 1-9, 1963.

TABLE 11.1 Sample data from the Greene-Touchstone study relating birthweight and estriol level in pregnant women near term

  i   Estriol       Birthweight   |   i   Estriol       Birthweight
      (mg/24 hr)    (g/100)       |       (mg/24 hr)    (g/100)
  1   7             28            |  17   [illegible]   [illegible]
  2   9             [illegible]   |  18   25            32
  3   8             [illegible]   |  19   27            [illegible]
  4   12            27            |  20   18            34
  5   [illegible]   27            |  21   18            [illegible]
  6   16            27            |  22   18            35
  7   16            24            |  23   16            35
  8   [illegible]   30            |  24   19            34
  9   18            30            |  25   18            35
 10   16            31            |  26   [illegible]   36
 11   [illegible]   30            |  27   18            37
 12   19            31            |  28   20            38
 13   21            30            |  29   22            40
 14   24            28            |  30   25            38
 15   18            32            |  31   24            43
 16   18            32            |

Source: Based on the American Journal of Obstetrics and Gynecology, 85(1), 1-9, 1963.

Let's assume e follows a normal distribution, with mean 0 and variance $\sigma^2$. The full linear-regression model then takes the following form:

EQUATION 11.2
$y = \alpha + \beta x + e$

where e is normally distributed with mean 0 and variance $\sigma^2$.

DEFINITION 11.2
For any linear-regression equation of the form $y = \alpha + \beta x + e$, y is called the dependent variable and x is called the independent variable because we are trying to predict y as a function of x.

EXAMPLE: Obstetrics
Birthweight is the dependent variable and estriol is the independent variable for the problem posed in Example 11.3 because estriol levels are being used to try to predict birthweight.

One interpretation of the regression line is that for a woman with estriol level x, the corresponding birthweight will be normally distributed with mean $\alpha + \beta x$ and variance $\sigma^2$. If $\sigma^2$ were 0, then every point would fall exactly on the regression line, whereas the larger $\sigma^2$ is, the more scatter occurs about the regression line. This relationship is illustrated in Figure 11.2.

How can $\beta$ be interpreted?
If $\beta$ is greater than 0, then as x increases, the expected value of $y = \alpha + \beta x$ will increase.

EXAMPLE: Obstetrics
This appears to be the case in Figure 11.1 for birthweight and estriol because as estriol increases, the average birthweight correspondingly increases.

If $\beta$ is less than 0, then as x increases, the expected value of y will decrease.
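The roles of $\sigma^2$ and the sign of $\beta$ can be illustrated with a small simulation of the model $y = \alpha + \beta x + e$. The sketch below is only an illustration under stated assumptions: it borrows the intercept and slope of the line drawn in Figure 11.1, while the grid of estriol levels, the error standard deviations, the negative slope, the seed, and all function names are hypothetical. No study data are used.

```python
import random
import statistics


def simulate(x_values, alpha, beta, sigma, seed=0):
    """Draw one y per x from y = alpha + beta*x + e, with e ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    return [alpha + beta * x + rng.gauss(0.0, sigma) for x in x_values]


def residual_sd(x_values, y_values, alpha, beta):
    """Spread of the vertical distances y - (alpha + beta*x) about the line."""
    residuals = [y - (alpha + beta * x) for x, y in zip(x_values, y_values)]
    return statistics.pstdev(residuals)


def mean_change(x_values, y_values):
    """Mean y in the upper half of the x range minus mean y in the lower half."""
    pairs = sorted(zip(x_values, y_values))
    half = len(pairs) // 2
    low = statistics.mean([y for _, y in pairs[:half]])
    high = statistics.mean([y for _, y in pairs[half:]])
    return high - low


ALPHA, BETA = 21.52, 0.608                # assumed: line drawn in Figure 11.1
X = [7 + 0.1 * i for i in range(201)]     # hypothetical estriol levels, 7 to 27

# Scatter about the line grows with sigma; sigma = 0 puts every point on the line.
for sigma in (0.0, 1.0, 4.0):             # hypothetical error standard deviations
    y = simulate(X, ALPHA, BETA, sigma)
    print(f"sigma={sigma}: residual SD = {residual_sd(X, y, ALPHA, BETA):.2f}")

# A positive slope raises the average y as x increases; a negative slope lowers it.
for beta in (BETA, -BETA):                # -BETA is a hypothetical negative slope
    y = simulate(X, ALPHA, beta, 1.0)
    print(f"beta={beta}: change in mean y = {mean_change(X, y):+.2f}")
```

With the error standard deviation set to 0 the residual spread is exactly 0, matching the statement that every point would then fall on the regression line. The half-range comparison in `mean_change` is a deliberately crude summary, chosen so that the direction of the slope is visible without fitting a line.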
