CHAPTER 11
CORRELATION

As researchers, we are concerned with detecting relationships between and among phenomena. Many research studies are designed to find out whether there is an association between two variables. For example, we may want to determine if people's ages are related to their blood pressures, or whether students' anxiety levels are related to their achievement scores. By discovering a relationship between variables, we can often predict a person's status on a given variable if we know how he or she performs on the other variable. Since prediction is one of the major goals of any science, the discovery of relationships is of paramount importance. In this chapter, we present a way to show the relationship between two sets of data and one commonly used technique for measuring this relationship. In Chapter 12, we will demonstrate a method for making predictions about associated variables.

Note that the statistical techniques presented so far describe frequency distributions or make inferences from samples where scores were obtained for only one variable. In these cases, each datum represented a measurement based on only one characteristic, and we made statistical inferences using the sampling distribution that was appropriate for that statistic. Now we consider another very useful statistical technique that allows us to measure the relationship between two sets of data obtained from the same sample, or between data from two samples where individuals in the samples have been matched on some basis. For example, this technique permits us to specify the relationship between the pretest and posttest achievement test scores for fourth-grade students, or the motivation ratings and aptitude scores for a group of college freshmen, or the high school grade-point averages and the college senior grade-point averages for a group of students.
In such studies, we are not looking for differences between two groups; instead, we want to discover to what extent two sets of data are related. This statistical technique is called correlation. To illustrate its use, suppose we have the following arithmetic and spelling achievement scores for eight students:

Student    Arithmetic achievement scores    Spelling achievement scores
A          6                                6
B          4                                4
C          3                                3
D          2                                2
E          [not legible in source]          [not legible in source]
F          [not legible in source]          [not legible in source]
G          [not legible in source]          [not legible in source]
H          [not legible in source]          [not legible in source]

To depict graphically the correlation between the variables of arithmetic achievement and spelling achievement, we draw what is called a scatter diagram. To draw this diagram, which is shown in Figure 11-1, we choose one of the variables, for instance arithmetic achievement, to be represented on the vertical axis, and the other variable, spelling achievement, to be represented on the horizontal axis. Note that in this diagram the arithmetic scores are ranged along the vertical axis with the lowest score placed at the bottom, and the spelling scores are ranged along the horizontal axis with the lowest score placed at the left. This is the conventional method for arranging the scores in a scatter diagram, and it is consistent with the traditional Cartesian coordinate system.

[Figure: scatter diagram; vertical axis, arithmetic scores; horizontal axis, spelling scores]
FIGURE 11-1. Scatter diagram of arithmetic scores and spelling scores.

Our data indicate that Student A received an arithmetic score of 6 and a spelling score of 6. To plot these two scores for Student A, we must locate both scores in the diagram and make one dot that will represent both scores. To do this, we locate value 6 on the arithmetic axis and extend a horizontal line from this position across the diagram. Next we locate value 6 on the spelling axis and extend a line vertically from this position. At the point where these two lines intersect we place a dot.
This one dot now represents both scores for Student A, as Figure 11-1 shows. Following the same procedure for each student in the group, we form a scatter diagram of the scores for all the students. We then find that we can draw a straight line connecting all of these dots in the scatter diagram in Figure 11-1. This diagram shows that for every increase in score value on one variable, there is a corresponding increase on the other variable. Since this is true for every pair of scores in our data, we conclude that the relationship between arithmetic scores and spelling scores is perfect. In statistical terms, we would call this particular relationship a perfect positive correlation. It is called perfect because the amount of increase in a score on one variable is exactly proportional to the amount of increase in the score on the corresponding variable, with no exceptions. It is called positive because an increase in a score on one variable is associated with an increase in the score on the corresponding variable.

Now let's look at a scatter diagram that depicts a perfect negative correlation. Figure 11-2 is a scatter diagram that represents the relationship between the speeds of runners and the amounts of weight they are carrying. It is evident from this diagram that the speed of each runner is inversely related to the amount of weight he or she is carrying. Again in statistical terms, we would say that there is a perfect negative correlation between these two variables. It is perfect because we can draw a straight line that runs through all of the dots in Figure 11-2. It is a negative correlation because the two variables are inversely related; that is, an increase in a score on one variable is associated with a decrease in the score on the corresponding variable.

[Figure: scatter diagram; horizontal axis, weight carried]
FIGURE 11-2. A perfect negative correlation.
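The idea that a perfect correlation means one straight line through every dot can be checked with a few lines of code. What follows is only a minimal sketch in Python and is not part of the original text. The function name `line_direction` is hypothetical, the first data set uses Student A's pair (6, 6) together with three further pairs as read from the student table, and the runner data are invented for illustration.

```python
def line_direction(pairs):
    """Return "positive" or "negative" if every (x, y) dot lies on one
    sloping straight line; return None otherwise.

    Integer scores are assumed so that the comparison below is exact.
    """
    points = sorted(set(pairs))
    if len(points) < 2:
        return None
    (x0, y0), (x1, y1) = points[0], points[-1]
    dx, dy = x1 - x0, y1 - y0
    if dx == 0 or dy == 0:
        return None  # a vertical or horizontal line shows no direction
    for x, y in points:
        # Cross-multiplied slope test: the dot is on the line through the
        # two end points only if both sides are equal.
        if (y - y0) * dx != (x - x0) * dy:
            return None
    return "positive" if dx * dy > 0 else "negative"


# Arithmetic and spelling scores for Students A through D.
arithmetic_spelling = [(6, 6), (4, 4), (3, 3), (2, 2)]
# Hypothetical (weight carried, speed) pairs, invented for this sketch.
weight_speed = [(4, 10), (5, 9), (6, 8), (7, 7)]
# Hypothetical scores that do not fall on a single line.
scattered = [(1, 2), (2, 1), (3, 3)]

print(line_direction(arithmetic_spelling))  # positive
print(line_direction(weight_speed))         # negative
print(line_direction(scattered))            # None
```

A result of None only says that the relationship is less than perfect; it says nothing about how strong the relationship is, which is the job of the coefficient introduced next.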
To express the relationship between two variables statistically, we must have some numerical index showing the degree of correlation. This index is termed the correlation coefficient, and its magnitude indicates the degree to which two frequency distributions of data are related.

Many different correlation techniques are available to the statistician. Which one is appropriate for use in a particular situation depends on the nature of the data being analyzed. This chapter presents a method for computing one type of correlation coefficient: the Pearson product-moment correlation coefficient. It is named after its originator, Karl Pearson, and is derived by examining functions of deviations of values from the "best-fitting" line. The term moment is taken from the science of mechanics and refers to certain functions of deviations. The symbol for this correlation coefficient is r. It is one of the more commonly used correlational techniques. To use it properly, however, we must assume that the variables are linearly related, and that the scores on each variable come from normally distributed populations. If these assumptions cannot be made, this type of correlation analysis is inappropriate and other techniques must be used. (In Chapter 19 we examine a correlation technique that can be used for data that do not meet these requirements.)

The coefficient of correlation for the perfect positive correlation shown in Figure 11-3(a) is r = 1.00. The coefficient for the perfect negative correlation shown in Figure 11-3(b) is r = -1.00. These are the maximum values for r.

[Figure: five scatter diagrams. (a) Perfect positive correlation, r = 1.00. (b) Perfect negative correlation, r = -1.00. (c) Zero correlation, r = .00. (d) Moderate positive correlation. (e) Moderate negative correlation.]
FIGURE 11-3. Scatter diagrams depicting various levels of correlation.

Note that the sign of the correlation coefficient indicates whether the correlation is positive or negative and that the size of the perfect correlation is the same (1.00) regardless of whether it is positive or negative. This is important to remember because a common mistake is to think that a coefficient of r = -1.00 represents no correlation. The coefficient that indicates no degree of correlation is r = .00. This condition occurs when scores on one variable are not related in any way to scores on the other variable. Figure 11-3(c) is a scatter diagram of uncorrelated data, representing the relationship between the IQs of soldiers and their rifle marksmanship scores.

As you may imagine, a perfect correlation between two variables rarely occurs. Almost every time a relationship exists between two variables, it is less than perfect. In such cases, the size of the coefficient is less than 1.00. For example, r = .85 indicates that there is a fairly strong positive correlation between two variables; a smaller positive coefficient indicates that the positive correlation is not as strong, and a positive coefficient close to zero indicates that there is practically no positive correlation. Likewise, a negative coefficient of large size indicates a fairly strong negative correlation, and r = -.12 indicates a weak negative correlation. Thus we see that all positive coefficients indicate direct relationships and all negative coefficients indicate inverse relationships, and that the size of the coefficient indicates the strength of the relationship. Figures 11-3(d) and 11-3(e) depict scatter diagrams showing moderate positive and negative correlations.

Suppose we obtain IQs and reading scores for a group of students and prepare the scatter diagram shown in Figure 11-4. We can see that the dots on the diagram tend to lie in a positive direction, although they certainly do not lie in a straight line. This indicates that the correlation is positive, but less than perfect. We can compute the correlation coefficient for these data; in this case, r = .75.
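Because the sign and the size of r carry separate information, it can help to pull them apart explicitly. The short Python sketch below is an illustration only; the function name `describe` is hypothetical, and the sketch deliberately attaches no verbal labels such as "strong" or "weak" to particular sizes.

```python
def describe(r):
    """Split a correlation coefficient into its direction and its size."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.00 and 1.00")
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return direction, abs(r)


print(describe(1.00))   # ('positive', 1.0)
print(describe(-1.00))  # ('negative', 1.0)
print(describe(0.00))   # ('none', 0.0)
print(describe(-0.12))  # ('negative', 0.12)
```

The first two calls return the same size, 1.0, which is the point stressed above: r = -1.00 is a perfect correlation, and only r = .00 indicates no correlation.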
[Figure: scatter diagram with best-fitting line; horizontal axis, reading achievement scores]
FIGURE 11-4. Scatter diagram of IQs and reading scores.

Let's see how we determine the size of a correlation from the scatter of the dots in the diagram. First we draw a straight line through the dots that best represents the linear trend shown in the diagram. This line is positioned in the scatter diagram so that the average distance of the dots from it is as small as possible, as Figure 11-4 shows. If we measure the perpendicular distance of each dot from this line, square each distance, and sum these squared distances, the sum will be smaller than the sum we could obtain by placing the line in any other position in the diagram. A line placed in this fashion is called a best-fitting line.

The total of the distances that the dots lie from this best-fitting line is inversely related to the size of the correlation coefficient. For example, if the dots are widely scattered, the distances of the dots from the best-fitting line are great and the size of the coefficient is small. On the other hand, if the dots deviate very little from the best-fitting line, the coefficient is large. If there is no deviation from the best-fitting line, as in Figures 11-3(a) and 11-3(b), the coefficient is either 1.00 or -1.00.

CALCULATION OF THE CORRELATION COEFFICIENT

We do not need to prepare a scatter diagram to determine the degree of correlation between two variables; we have a statistical procedure that allows us to compute the correlation coefficient directly from the data. However, a scatter diagram is helpful because it gives us a visual indication of the linearity of the relationship and the variability of the data on each variable. Formula 14 presents the formula for calculating the Pearson product-moment correlation coefficient.

FORMULA 14. Calculation of the Pearson product-moment correlation coefficient.

$$
r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^{2} - (\sum X)^{2}\right]\left[N\sum Y^{2} - (\sum Y)^{2}\right]}}
$$

N = number of pairs of scores
df = N - 2

Since we are dealing with two sets of data, we generally assign the symbol X to the scores on one variable and the symbol Y to the scores on the other variable. The expression NΣXY in the numerator of Formula 14 indicates that we obtain the product of each pair of scores, sum these products, and then multiply this sum by the number of pairs of scores (N).

To illustrate the application of Formula 14 with a simple example, suppose a teacher wishes to determine if the scores fourth-grade students obtain on a spelling test are directly related to their reading scores. The teacher obtains spelling scores (X) and reading scores (Y) for 12 fourth-grade students; these scores are shown in Table 11-1. The computation of the correlation coefficient is given below the table.

TABLE 11-1. Computation of the Pearson product-moment correlation coefficient
[Table: columns Spelling (X) and Reading (Y), followed by the worked computation; the scores and the computation are not legible in the source.]

To calculate the Pearson product-moment correlation coefficient, you can use the PEARSON CORRELATION (r) program. This program uses Formula 14. You can enter the data at the terminal at the time you run the program, or prepare a data file on your data disk. If you enter the data at the terminal, you must enter the X score followed by the Y score, for each individual in the sample.

As mentioned earlier, it is always wise to prepare a file of the data so that if you need to use the data again you will not have to reenter it. Data files for use with the PEARSON CORRELATION (r) program must be prepared using the DATA FILE PREPARATION-#2 program. This program requires that you enter both scores for each individual before entering data for the next individual. (You cannot use the DATA FILE PREPARATION-#1 program to prepare a data file for use with the PEARSON CORRELATION (r) program because the data must be "read" from the file in pairs for each individual.)
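For readers who do not have the PEARSON CORRELATION (r) program at hand, the arithmetic of Formula 14 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the program described in the text: the function name `pearson_r` is hypothetical, the scores are entered as (X, Y) pairs in the same X-then-Y order the text requires, and the example data are invented because the scores in Table 11-1 cannot be read.

```python
import math


def pearson_r(pairs):
    """Pearson product-moment r by the raw-score formula (Formula 14).

    r = [N*sum(XY) - sum(X)*sum(Y)]
        / sqrt([N*sum(X^2) - sum(X)^2] * [N*sum(Y^2) - sum(Y)^2])

    `pairs` holds one (X, Y) tuple per individual.
    Returns (r, df), where df = N - 2.
    """
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least three pairs so that df = N - 2 is positive")

    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    sum_x2 = sum(x * x for x, _ in pairs)
    sum_y2 = sum(y * y for _, y in pairs)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("r is undefined when all scores on one variable are equal")
    return numerator / denominator, n - 2


# Hypothetical scores for five individuals, invented for this sketch.
scores = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
r, df = pearson_r(scores)
print(round(r, 2), df)  # 0.8 3
```

Working the example by hand gives N = 5, ΣX = 15, ΣY = 15, ΣXY = 53, ΣX² = 55, and ΣY² = 55, so the numerator is 5(53) - (15)(15) = 40 and the denominator is the square root of (50)(50), or 50, which yields r = .80 with df = 3.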
The DATA FILE PREPARATION-#2 program requires the same number of scores on each variable, and you must enter the two scores for each subject in turn. If you have missing data on one variable for a subject, you must exclude that subject altogether. Do not use a zero to indicate missing data; a zero is interpreted by the computer as a "score" of zero. This program is useful for preparing data files for a number of computer programs where paired data are used. The program gives us the opportunity to review the data and correct the entries, if necessary, before filing.

To illustrate the use of the PEARSON CORRELATION (r) program, consider the following example. To determine the degree of relationship between hostility and aggression in a group of adolescent boys, a social scientist used an appropriate personality inventory and obtained the following data.

[Table: paired Hostility and Aggression scores; the individual scores are not legible in the source.]

Because the social scientist is interested in detecting a relationship between two variables, the Pearson product-moment correlation is the appropriate statistical technique to apply to these data.

To use the computer program, first prepare a data file of the data given. To do this, use the DATA FILE PREPARATION-#2 program and enter the data, saving it on your data disk. Then use the PEARSON CORRELATION (r) program with the data file you prepared.

The output of this program provides the mean, variance, and standard deviation for each variable, and it yields a Pearson correlation coefficient of r = .869. This computer program prepares a scatter diagram of the data and presents it on the screen for you, if you choose. (The program also computes the coefficient of determination, which we will discuss in a moment, and calculates the t ratio, degrees of freedom, and probability, which are covered in later chapters dealing with statistical inference.)

This analysis indicates a substantial correlation between hostility and aggression scores in this sample. Figure 11-5 presents the scatter diagram of the preceding data. Note that there is a general positive trend in the distribution of the scores on these two variables, but that there is some "scatter" to the scores.

[Figure: scatter diagram; vertical axis, aggression scores; horizontal axis, hostility scores]
FIGURE 11-5. Scatter diagram of hostility scores and aggression scores.

The magnitude of a correlation coefficient depends on the nature of the relationship between the variables. If the relationship is not linear, the Pearson product-moment correlation coefficient does not accurately describe the relationship between the variables. There are other statistical methods for determining the correlation between variables that have curvilinear relationships. Also, the correlation between variables with large variances tends to be greater than the correlation between variables that have a curtailed range of values.

A word of caution is needed on the interpretation of a correlation coefficient. Although the coefficient indicates the degree to which two variables are related, this does not necessarily mean that there is a causal relationship between them. Correlation does not imply that one variable is causing the variation in the other. A simple example illustrates that correlation cannot be interpreted in this way. Suppose we find a correlation between children's neatness of appearance and their punctuality in arriving at school. By no stretch of the imagination can we say that being neat causes the children to be on time or that being punctual causes them to be neat. This relationship may actually be caused by a third variable, such as the kind of parental [...]

[The rest of this passage and the opening of the exercises, including the statement and data table for Exercise 1, are not legible in the source.]

Note: Save these data as a data file to use in exercises in later chapters.

a) Compute the Pearson product-moment correlation coefficient.
b) Compute the coefficient of determination. What does it indicate about the relationship between the variables of mechanical comprehension and divergent thinking in this sample?
c) Prepare a scatter diagram of these data.

2. An elementary school teacher wanted to see if there was a relationship between how long it took her pupils to complete a spelling quiz and the accuracy of their answers. She obtained the following data from the pupils.

[Table: paired completion times and spelling accuracy scores; the individual scores are not legible in the source.]

a) Compute the Pearson product-moment correlation coefficient.
b) Compute the coefficient of determination. What does it indicate about the relationship between spelling ability and completion times?
c) Prepare a scatter diagram of the relationship between these two variables for this sample.
