Chapter 1 Basic Concepts

Stress is something that we are all forced to deal with throughout life. It arises in our daily interactions with those around us, in our interactions with the environment, in the face of an impending exam, and, for many students, in the realization that they are required to take a statistics course. Although most of us learn to respond and adapt to stress, the learning process is often slow and painful. This rather grim preamble may not sound like a great way to introduce a course on statistics, but it leads to a description of a practical research project, which in turn illustrates a number of important statistical concepts. I was involved in a very similar project a number of years ago, so this example is far from hypothetical.

A group of educators has put together a course designed to teach high school students how to manage stress and the effect of stress management on self-esteem. They need an outside investigator, however, who can tell them how well the course is working and, in particular, whether students who take the course have higher self-esteem than do students who have not taken the course. For the moment we will assume that we are charged with the task of designing an evaluation of their program. The experiment that we design will not be complete, but it will illustrate some of the issues involved in designing and analyzing experiments and some of the statistical concepts with which you must be familiar.

1.1 Important Terms

Although the program in stress management was designed for high school students, it clearly would be impossible to apply it to the population of all high school students in the country. First, there are far too many such students. Moreover, it makes no sense to apply a program to everyone until we know whether it is a useful program.
Instead of dealing with the entire population of high school students, we will draw a sample of students from that population and apply the program to them. But we will not draw just any old sample. We would like to draw a random sample, though I will say shortly that truly random samples are normally very impractical if not impossible. To draw a random sample, we would follow a particular set of procedures to ensure that each and every element of the population has an equal chance of being selected. (The common example to illustrate a random sample is to speak of putting names in a hat and drawing blindly. Although almost no one ever does exactly that, it is a nice illustration of what we have in mind.) Having drawn our sample of students, we will randomly assign half the subjects to a group that will receive the stress-management program and half to a group that will not receive the program.

This description has already brought out several concepts that need further elaboration; namely, a population, a sample, a random sample, and random assignment. A population is the entire collection of events (students' scores, people's incomes, rats' running speeds, etc.) in which you are interested. Thus, if you are interested in the self-esteem scores of all high school students in the United States, then the collection of all high school students' self-esteem scores would form a population, in this case a population of many millions of elements. If, on the other hand, you were interested in the self-esteem scores of high school seniors only in Fairfax, Vermont (a town of fewer than 4000 inhabitants), the population would consist of only about 100 elements.

The point is that a population can be of any size.
Populations could range from a relatively small set of numbers, which can be collected easily, to a large but finite set of numbers, which would be impractical to collect in their entirety. In fact, they can be an infinite set of numbers, such as the set of all possible cartoon drawings that students could theoretically produce, which would be impossible to collect. Unfortunately for us, the populations we are interested in are usually very large. The practical consequence is that we seldom, if ever, measure entire populations. Instead, we are forced to draw only a sample of observations from that population and to use that sample to infer something about the characteristics of the population.

Assuming that the sample is truly random, we not only can estimate certain characteristics of the population, but also can have a very good idea of how accurate our estimates are. To the extent that the sample is not random, our estimates may or may not be meaningful, because the sample may or may not accurately reflect the entire population.

Randomness has at least two aspects that we need to consider. The first has to do with whether the sample reflects the population to which it is intended to make inferences. This primarily involves random sampling from the population and leads to what is called external validity. External validity refers to the question of whether the sample reflects the population. A sample drawn from a small town in Nebraska would not produce a valid estimate of the percentage of the U.S. population that is Hispanic, nor would a sample drawn solely from the American Southwest. On the other hand, a sample from a small town in Nebraska might give us a reasonable estimate of the reaction time of people to stimuli presented suddenly. Right here you see one of the problems with discussing random sampling.
A nonrandom sample of subjects or participants may still be useful for us if we can convince ourselves and others that it closely resembles what we would obtain if we could take a truly random sample. On the other hand, if our nonrandom sample is not representative of what we would obtain with a truly random sample, our ability to draw inferences is compromised and our results might be very misleading.

Before going on, let us clear up one point that tends to confuse many people. The problem is that one person's sample might be another person's population. For example, if I were to conduct a study on the effectiveness of this book as a teaching instrument, one class's scores on an examination might be considered by me to be a sample, albeit a nonrandom one, of the population of scores of all students using, or potentially using, this book. The class instructor, on the other hand, is probably not terribly concerned about this book, but instead cares only about his or her own students. He or she would regard the same set of scores as a population. In turn, someone interested in the teaching of statistics might regard my population (everyone using my book) as a very nonrandom sample from a larger population (everyone using any textbook in statistics). Thus, the definition of a population depends on what you are interested in studying.

In our stress study it is highly unlikely that we would seriously consider drawing a truly random sample of U.S. high school students and administering the stress management program to them. It is simply impractical to do so. How then are we going to take advantage of methods and procedures based on the assumption of random sampling? The only way that we can do this is to be careful to apply those methods and procedures only when we have faith that our results would generally represent the population of interest. If we can't make this assumption, we need to redesign our study.
The issue is not one of statistical refinement so much as it is one of common sense. To the extent that we think that our sample is not representative of U.S. high school students, we must limit our interpretation of the results. To the extent that the sample is representative of the population, our estimates have validity.

The second aspect of randomness concerns random assignment. Whereas random selection concerns the source of our data and is important for generalizing the results of our study to the whole population, random assignment of subjects (once selected) to treatment groups is fundamental to the integrity of our experiment. Here we are speaking about what is called internal validity. We want to ensure that the results we obtain are the result of the differences in the way we treat our groups, not a result of who we happen to place in those groups. If, for example, we put all of the timid students in our sample in one group and all of the assertive students in another group, it is very likely that our results are as much or more a function of group assignment than of the treatments we applied to those groups. In actual practice, random assignment is usually far more important than random sampling.

Having dealt with the selection of subjects and their assignment to treatment groups, it is time to consider how we treat each group and how we will characterize the data that will result. Because we want to study the ability of subjects to deal with stress and maintain high self-esteem under different kinds of treatments, and because the response to stress is a function of many variables, a critical aspect of planning the study involves selecting the variables to be studied. A variable is a property of an object or event that can take on different values.
For example, hair color is a variable because it is a property of an object (hair) and can take on different values (brown, yellow, red, gray, etc.). With respect to our evaluation of the stress management program, such things as the treatments we use, the student's self-confidence, social support, gender, degree of personal control, and treatment group are all relevant variables.

In statistics, we dichotomize the concept of a variable in terms of independent and dependent variables. In our example, group membership is an independent variable, because we control it. We decide what the treatments will be and who will receive each treatment. We decide that this group over here will receive the stress management treatment and that group over there will not. If we had been comparing males and females, we clearly would not control a person's gender, but we do decide on the genders to study (hardly a difficult decision) and that we want to compare males versus females. On the other hand, the data (such as the resulting self-esteem scores, scores on personal control, and so on) are the dependent variables. Basically, the study is about the independent variables, and the results of the study (the data) are the dependent variables.
Independent variables may be either quantitative or qualitative and discrete or continuous, whereas dependent variables are generally, but certainly not always, quantitative and continuous, as we are about to define those terms.

We make a distinction between discrete variables, such as gender or high school class, which take on only a limited number of values, and continuous variables, such as age and self-esteem score, which can assume, at least in theory, any value between the lowest and highest points on the scale. As you will see, this distinction plays an important role in the way we treat data.

Closely related to the distinction between discrete and continuous variables is the distinction between quantitative and categorical data. By quantitative data (sometimes called measurement data), we mean the results of any sort of measurement: for example, grades on a test, people's weights, scores on a scale of self-esteem, and so on. In all cases, some sort of instrument (in its broadest sense) has been used to measure something, and we are interested in "how much" of some property a particular object represents.

On the other hand, categorical data (also known as frequency data or qualitative data) are illustrated in such statements as "There are 34 females and 26 males in our study" or "Fifteen people were classed as 'highly anxious,' 33 as 'neutral,' and 12 as 'low anxious.'" Here we are categorizing things, and our data consist of frequencies for each category (hence the name categorical data). Several hundred subjects might be involved in our study, but the results (data) would consist of only two or three numbers: the number of subjects falling in each anxiety category. In contrast, if instead of sorting people with respect to high, medium, and low anxiety, we had assigned them each a score based on some more or less continuous scale of anxiety, we would be dealing with measurement data, and the data would consist of scores for each subject on that variable. Note that in both situations the variable is labeled anxiety. As with most distinctions, the one between measurement and categorical data can be pushed too far. The distinction is useful, however, and the answer to the question of whether a variable is a measurement or a categorical one is almost always clear in practice.

* Many people have difficulty remembering which is the dependent variable and which is the independent variable. Note that both "dependent" and "data" start with a "d."
* Actually, a continuous variable is one in which any value between the extremes of the scale (e.g., 32.485687...) is possible. In practice, we treat a variable as discrete whenever it can take on only a few different values.

1.2 Descriptive and Inferential Statistics

Returning to our intervention program for stress, once we have chosen the variables to be measured and the schools have administered the program to the students, we are left with a collection of raw data: the scores. There are two primary divisions of the field of statistics that are concerned with the use we make of these data.

Whenever our purpose is merely to describe a set of data, we are employing descriptive statistics. For example, one of the first things that we would want to do with our data is to graph them, to calculate means (averages) and other measures, and to look for extreme scores or oddly shaped distributions of scores. These procedures are called descriptive statistics because they are primarily aimed at describing the data.
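
As a rough illustration of these descriptive procedures, the following Python sketch summarizes a small set of measurement data and tallies a set of categorical data. The self-esteem scores are invented for this example and the helper name `describe` is hypothetical; only the anxiety frequencies (15, 33, and 12) come from the text above.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical self-esteem scores (measurement data), invented for illustration.
# The value 90 is included on purpose as an extreme score.
scores = [32, 35, 28, 41, 37, 30, 35, 90, 33, 36]


def describe(values):
    """Return a few basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


summary = describe(scores)

# Categorical data: each subject is only sorted into a category,
# so the "results" reduce to one frequency per category.
categories = ["highly anxious"] * 15 + ["neutral"] * 33 + ["low anxious"] * 12
frequencies = Counter(categories)

print(summary)
print(frequencies)
```

In this made-up data set the single extreme score pulls the mean well above the median, which is the kind of feature that looking closely at the data is meant to reveal. Note also that the 60 categorized subjects produce only three numbers, whereas the measurement data keep one score per subject.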
Descriptive statistics was once looked down on as a rather uninteresting field populated primarily by those who drew distorted-looking graphs for such publications as Time magazine. Twenty-five years ago John Tukey developed what he called exploratory statistics, or exploratory data analysis (EDA). He showed the necessity of paying close attention to the data and examining them in detail before invoking more technically involved procedures. Some of Tukey's innovations have made their way into the mainstream of statistics, and will be studied in subsequent chapters, and some have not caught on as well. However, the emphasis that Tukey placed on the need to closely examine your data has been very influential, in part because of the high esteem in which Tukey was held as a statistician.

After we have described our data in detail and are satisfied that we understand what the numbers have to say on a superficial level, we will be particularly interested in what is called inferential statistics. In fact, most of this book will deal with inferential statistics. In designing our experiment on the effect of stress on self-esteem, we acknowledged that it was not possible to measure the entire population, and therefore we drew samples from that population. Our basic questions, however, deal with the population itself. We might want to ask, for example, about the average self-esteem score for an entire population of students who could have taken our program, even though all that we really have is the average score for a sample of students who actually went through the program.

A measure, such as the average self-esteem score, that refers to an entire population is called a parameter. That same measure, when it is calculated from a sample of data that we have collected, is called a statistic. Parameters are the real entities of interest, and the corresponding statistics are guesses at reality.
Although most of what we will do in this book deals with sample statistics (or guesses, if you prefer), keep in mind that the reality of interest is the corresponding population parameter. We want to infer something about the characteristics of the population (parameters) from what we know about the characteristics of the sample (statistics). In our hypothetical study we are particularly interested in knowing whether the average self-esteem score of a population of students who potentially might be enrolled in our program is higher, or lower, than the average self-esteem score of students who might not be enrolled. Again we are dealing with the area of inferential statistics, because we are inferring characteristics of populations from characteristics of samples.
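
The relationship between a parameter and a statistic can be sketched with a small simulation. Everything in the following Python example is an assumption made for illustration: the population is a set of simulated scores (mean near 50, standard deviation near 10), and the population size, sample size, and seed are arbitrary. In a real study the population is not available, which is exactly why the statistic has to stand in for the parameter.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of self-esteem scores (simulated, not real data).
population = [rng.gauss(50, 10) for _ in range(100_000)]

# Parameter: a measure computed on the entire population.
parameter = mean(population)

# Random sample: every element has an equal chance of being selected.
sample = rng.sample(population, 50)

# Statistic: the same measure computed on the sample only.
statistic = mean(sample)

# Random assignment: shuffle the sampled subjects, then split them in half.
shuffled = sample[:]
rng.shuffle(shuffled)
treatment, control = shuffled[:25], shuffled[25:]

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
print(f"group sizes: {len(treatment)} treatment, {len(control)} control")
```

The sample mean will usually land near the population mean without matching it exactly, and a different seed would give a different guess. The final lines sketch random assignment: once the sample is drawn, chance alone decides which subjects fall in the treatment group and which in the control group.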
