The Homework solutions from Classof1 are intended to help students understand the approach to solving the problem and not for submitting the same in lieu of their academic submissions for grades.
Association and Causality
A common pitfall in data interpretation is to mistake association for causality. If two variables are shown to be correlated, this does not in itself establish that there is a causal relationship between them. It may, however, suggest that a causal relationship exists and motivate scientists to try to identify a mechanism for it. Nevertheless, statistical data analysis in itself cannot establish causality.

For example, suppose that after a party some of the participants fall seriously ill. A doctor interviews all of the people who attended the party and finds out how ill they are and how much wine they consumed. Analysis of these data reveals a positive correlation between the level of illness and the amount of wine consumed. Does this prove that the wine was responsible for the illness? The data analysis certainly suggests that the wine may be responsible for the illness, or at least may be a contributing factor.

However, there are other possibilities. Suppose that the salted peanuts at the party are the real cause of the illness. Then the more peanuts a person consumed, the more ill that person is likely to be, so there is a causal relationship between peanuts and illness, with a consequent positive association. Suppose also that the more peanuts a person consumed, the thirstier that person is, and so the greater that person's wine consumption. Consequently, there is also a positive association between peanut consumption and wine consumption.

This scenario explains the positive association between wine consumption and illness, even though wine consumption in itself has nothing to do with the illness. In fact, conditional on the amount of peanuts consumed, wine consumption has nothing to do with illness. Therefore, even though there is a positive correlation between wine consumption and illness, it would be incorrect to use this result to infer that wine consumption caused the illness.

This illustrates how the positive correlation between wine consumption and illness can be explained by the presence of a third variable, in this case peanut consumption, which is positively correlated with both wine consumption and illness. The sample correlation coefficient is a convenient way of