
Chapter 1: Introduction to Research

1.1 Introduction
1.2 Determining a Theory
1.3 Defining Variables
1.4 Developing the Hypothesis
1.5 Standardization
1.6 Selecting Subjects
1.7 Testing Subjects
1.8 Analyzing Results
1.9 Determining Significance
1.10 Communicating Results
1.11 Replication
1.12 Putting it All Together
1.13 Chapter Conclusion

Introduction

Research is the cornerstone of any science, including both the hard sciences such as chemistry or physics and the social (or soft) sciences such as psychology, management, or education. It refers to the organized, structured, and purposeful attempt to gain knowledge about a suspected relationship. Many argue that the structured attempt at gaining knowledge dates back to Aristotle and his identification of deductive reasoning. Deductive reasoning refers to a structured approach utilizing an accepted premise (known as a major premise), a related minor premise, and an obvious conclusion. This way of gaining knowledge has been called a syllogism, and by following downward from the general to the specific, knowledge can be gained about a particular relationship. An example of an Aristotelian syllogism might be:

Major Premise: All students attend school regularly
Minor Premise: John is a student
Conclusion: John attends school regularly

In the early 1600s, Francis Bacon identified a different approach to gaining knowledge. Rather than moving from the general to the specific, Bacon looked at the gathering of specific information in order to make general conclusions. This type of reasoning is called inductive and, unlike Aristotelian logic, allows new major premises to be determined. Inductive reasoning has been adopted into the sciences as the preferred way to explore new relationships because it allows us to use accepted knowledge as a means to gain new knowledge. For example:

Specific Premise: John, Sally, Lenny, and Sue attended class regularly
Specific Premise: John, Sally, Lenny, and Sue received high grades
Conclusion: Attending class regularly results in high grades

Researchers combine the powers of deductive and inductive reasoning into what is referred to now as the scientific method. It involves the determination of a major premise (called a theory or a hypothesis) and then the analysis of the specific examples (research) that would logically follow. The results might look something like:

Major Premise: Attending classes regularly results in high grades

Class Attendance (Suspected Cause):
Group 1: John, Sally, Lenny, and Sue attend classes regularly
Group 2: Heather, Lucinda, Ling, and Bob do not attend classes regularly

Grades (Suspected Effect):
Group 1: John, Sally, Lenny, and Sue received As and Bs
Group 2: Heather, Lucinda, Ling, and Bob received Cs and Ds

Conclusion: Attending class regularly results in higher grades when compared with not attending class regularly (the Major Premise or Hypothesis is therefore supported)

Utilizing the scientific method for gaining new information and testing the validity of a major premise, John Dewey suggested a series of logical steps to follow when attempting to support a theory or hypothesis with actual data. In other words, he proposed using deductive reasoning to develop a theory followed by inductive reasoning to support it. These steps can be found in Table 1.1.

Table 1.1: Dewey's Scientific Method

The steps involved in the research process can vary depending on the type of research being done and the hypothesis being tested. The most stringent types of research, such as experimental methods (sometimes called laboratory research), contain the most structured process. Naturalistic observation, surveys, and other non-intrusive studies are often less structured. A general process guide for doing research, especially laboratory research, can be found in Table 1.2.

Table 1.2: Steps Involved in the Research Process

Determining a Theory

While you may see a theory as an absolute, such as the theory of gravity or the theory of relativity, it is actually a changing phenomenon, especially in the soft or social sciences. Theories are developed based on what is observed or experienced, oftentimes in the real world. In other words, a theory may have no additional backing other than an educated guess or a hunch about a relationship.

For example, while teaching a college course in research, I notice that non-traditional students tend to be more involved in class lectures and perform better on class exams than traditional students. My theory, then, could be that older students are more dedicated to their education than younger students. At this point, however, I have noticed only a trend within a single class that may or may not exist elsewhere. I have developed a theory based on my observations and this theory, at least at this point, has no practical applications.

Most theories are less concerned with application and more concerned with explanations. For example, I could assume, based on my observations, that older students have witnessed the importance of education through their work and interactions with others. With this explanation, I now have a theoretical cause and effect relationship: Students who have had prior experience in the workforce are more dedicated to their education than students who have not had this experience.

Before moving beyond this point, it is always wise to do a literature review on your topic and areas related to your topic. Results from this search will likely help you determine how to proceed with your research. If, for example, you find that several studies have already been completed on this topic with similar results, doing yet another experiment may add little to what is already known. If this is the case, you would need to rethink your ideas and perhaps replicate the previous research using a different type of subject or a different situation, or you may choose to scrap the study altogether.

Defining Variables

Variables can be defined as any aspect of a theory that can vary or change as part of the interaction within the theory. In other words, variables are anything that can affect or change the results of a study. Every study has variables, as these are needed in order to understand differences. In our theory, we have proposed that students exposed to the workforce take a more active role in their education than those who have no exposure. Looking at this theory, you might see that several obvious variables are at play, including prior work experience and age of student. However, other variables may also play a role in or influence what we observed. It is possible that older students have better social skills, causing them to interact more in the classroom. They may have learned better studying skills, resulting in higher examination grades. They may feel awkward in a classroom of younger students or doubt their ability more and therefore try harder to succeed. All of these potential explanations or variables need to be addressed for the results of research to be valid.

Let's start with the variables that are directly related to the theory. First, the prior work experience is what we are saying has the effect on the classroom performance. We could say that work history is therefore the cause and classroom grades are the effect. In this example, our independent variable (IV), the variable we start with (the input variable), is work experience. Our dependent variable (DV), or the variable we end up with (the outcome variable), is grades.

We could add additional variables to our list to create more complex research. If we also looked at the effect of study skills on grades, study skills would become a second independent variable. If we wanted to measure the length of time to graduation along with grades, this would become a second dependent variable. There is no limit to the number of variables that can be measured, although the more variables, the more complex the study and the more complex the statistical analysis.

The most powerful benefit of increasing our variables, however, is control. If we suspect something might impact our outcome, we need to either include it as a variable or hold it constant between all groups. If we find that a variable we did not include or hold constant has an impact on our outcome, the study is said to be confounded. Variables that can confound our results, called confounding variables, are categorized into two groups: extraneous and intervening.

Extraneous Variables. Extraneous variables can be defined as any variable other than the independent variable that could cause a change in the dependent variable. In our study we might realize that age could play a role in our outcome, as could family history, education of parents or partner, interest in the class topic, time of day, or even preference for the instructor's teaching style or personality. The list, unfortunately, could be quite long and must be dealt with in order to increase the probability of reaching valid and reliable results.

Intervening Variables. Intervening variables, like extraneous variables, can alter the results of our research. These variables, however, are much more difficult to control for. Intervening variables include motivation, tiredness, boredom, and any other factor that arises during the course of research. For example, if one group becomes bored with their role in the research more so than the other group, the results may have less to do with our independent variable and more to do with the boredom of our subjects.

Developing the Hypothesis

The hypothesis is directly related to a theory but contains operationally defined variables and is in testable form. Hypotheses allow us to determine, through research, if our theory is correct.
In other words, does prior work experience result in better grades? When doing research, we are typically looking for some type of difference or change between two or more groups. In our study, we are testing the difference between having work experience and not having work experience on college grades. Every study has two hypotheses: one stated as a difference between groups and one stated as no difference between groups.

When stated as a difference between groups, our hypothesis would be: Students with prior work experience earn higher grades than students without prior work experience. This is called our research or scientific hypothesis. Because most statistics test for no difference, however, we must also have a null hypothesis. The null hypothesis is always written with the assumption that the groups do not differ. In this study, our null hypothesis would state: Students with work experience will not receive different grades than students with no work experience.

The null hypothesis is what we test through the use of statistics and is abbreviated H0. Since we are testing the null, we can assume that if the null is not true then some alternative to the null must be true. The research hypothesis stated earlier becomes our alternative, abbreviated H1. In order to make research as specific as possible we typically look for one of two outcomes, either the null or the alternative hypothesis. To conclude that there is no difference between the two groups means we are accepting our null hypothesis. If, however, we show that the null is not true, then we must reject it and therefore conclude that the alternative hypothesis must be true. While there may be a lot of gray area in the research itself, the results must always be stated in black and white. More on hypothesis testing will be discussed in chapter 9.
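The two hypotheses for the work experience study can also be written in symbols. The notation below is an illustrative assumption, not taken from the chapter: the symbol \mu stands for the mean grade of a population, and the subscripts name the two groups. The alternative is written as directional because the research hypothesis above predicts higher grades for students with work experience.

```latex
H_0:\ \mu_{\text{work}} = \mu_{\text{no work}}
\qquad
H_1:\ \mu_{\text{work}} > \mu_{\text{no work}}
```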

Standardization

Standardization refers to methods used in gathering and treating subjects for a specific study. In order to compare the results of one group to the results of a second group, we must assure that each group receives the same opportunities to succeed. Standardized tests, for instance, painstakingly assure that each student receives the same questions in the same order and is given the same amount of time, the same resources, and the same type of testing environment. Without standardization, we could never adequately compare groups.

For example, imagine that one group of students was given a particular test and allowed four hours to complete it in a quiet and well-lit room. A second group was given the same test but only allowed 30 minutes to complete it while sitting in a busy school lunchroom full of laughing and talking children. If group 1 scored higher than group 2, could we truly say that they did better? The answer is obviously no. To make sure we can compare results, we must make everything equal between the two or more groups. Only then could we say that group 1 performed better than group 2.

Standardization of the research methods is often a lengthy process. The same directions must be read to each student, the same questions must be given, and the same amount of time must be assured. All of these factors must be decided before the first subject can be tested. While standardization refers mainly to the testing situation itself, these principles of sameness involve the selection of subjects as well.

Selecting Subjects

If we want to know if Billy performed better than Sally, or if boys scored higher than girls in our class, or even if Asian children receive higher grades in our school than Caucasian children, the selection of subjects is rather simple. When we are testing the entire population of possible subjects, we are adequately assured that no subject bias has occurred. A population refers to the entire pool of possible subjects.
In a classroom or other setting where the entire population is relatively small, testing all subjects may be simple. However, if we are attempting to understand or gain knowledge related to a large population, such as all third grade children, all depressed adults, or all retail employees, gathering and testing everyone would be practically impossible. In this situation, we would need to gather a sample of the population, test this sample, and then make inferences aimed at the entire population the sample represents.

When determining which potential subjects from a large population to include in our study, there are several approaches to choose from. Each of these sampling techniques has its own strengths and, of course, its own weaknesses. The idea behind adequate sampling, however, remains the same: to gather a sample of subjects that is representative of the greater population. The ideal research sample is therefore often referred to as a representative sample.

Simple Random Sample. To assure that the sample of subjects taken from a known population truly represents the population, we could test every subject in the population and choose only those who fall around the mean of the entire population. This technique is usually pointless because doing so means we could just as easily have tested the entire population on our independent and dependent variables. Therefore, in order to make sure all possible subjects have an equal opportunity to be chosen, simple random sampling is most often the selection method used.

To choose a random group of 10 students from a class of 30, for example, we could put everyone's name in a hat and use the first ten names drawn as our sample. In this method, subjects are chosen just as B6 is chosen in a game of BINGO. This technique can work well with a small population but can be time consuming and archaic when the population size is large. To choose 30 students from a class of 250 students would be easier utilizing technology and what is referred to as a random number table. A random number table is a computer-generated list of numbers placed in random order. Each of the 250 students would be randomly assigned a number between one and 250. Then the groups would be formed once again using a random number generator. Figures 1.1 and 1.2 provide examples of how subject selection and subject assignment to groups might be determined based on this method.

Figure 1.1: Random Number Table

Figure 1.2: Random Assignment of Subjects to Groups
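The selection and assignment procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure behind the original figures: the roster of numbered students, the function names, and the seed values are all assumptions made for the example.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Draw k subjects so that every subject has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, k)


def random_assignment(subjects, n_groups=2, seed=None):
    """Shuffle the subjects, then deal them into n_groups of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]


# Hypothetical class roster: students numbered 1 through 250.
roster = list(range(1, 251))

# Choose 30 of the 250 students, then split them into two groups of 15.
sample = simple_random_sample(roster, 30, seed=1)
group_1, group_2 = random_assignment(sample, n_groups=2, seed=2)
```

Fixing the seed only makes the draw repeatable for demonstration; leaving it as None would give a fresh random selection on every run.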

Systematic Sample. When a population is very large, assigning a number to each potential subject could also be tiresome and time consuming. A systematic sample is a random sample compiled in a systematic manner. If you had a list of all licensed teachers, for example, and wanted to mail a survey to 200 of them, systematic sampling might be the sampling method of choice. For this example, a page and a teacher number on that page are determined at random. This would represent the first subject and the starting point for choosing the remaining subjects. A random number would be generated, for example 150. Then every 150th teacher would become a subject until you have selected enough for your study. If you reach the end of the list before selecting enough subjects, you would continue back at the beginning of the list. Once the subjects are selected, the technique of random assignment can be used to assign subjects to particular groups.

Stratified Random Sample. The use of a stratified sample refers to the breaking down of the population into specific subsets before choosing which ones will take part in the study. For example, if you are studying all third grade students in your state, you may want to make sure that every county in your state is represented in your study. If you used a simple random sampling technique, you could conceivably end up with many subjects from one county and no subjects from other counties. A stratified sample allows you to choose your subject pool randomly from a predetermined set of subsets. In this example, we may want to choose 10 subjects at random from each county within the state. Other subsets can also be used, such as age, race, or socioeconomic background. If you wanted to make sure that there were an equal number of males and females, you could use sex as your subset and then randomly choose the same number of subjects from each subset.
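The systematic and stratified procedures just described might look like the following in Python. This is a sketch under stated assumptions: the teacher list, the county labels, the function names, and the list sizes are hypothetical, and the interval is passed in rather than generated so the example stays easy to follow.

```python
import math
import random


def systematic_sample(population, k, interval, seed=None):
    """Pick a random starting point, then take every interval-th subject,
    wrapping back to the start of the list when the end is reached."""
    n = len(population)
    # Number of distinct subjects that can be reached before the walk repeats.
    reachable = n // math.gcd(n, interval)
    if k > reachable:
        raise ValueError("this interval revisits subjects before k are chosen")
    rng = random.Random(seed)
    start = rng.randrange(n)
    return [population[(start + i * interval) % n] for i in range(k)]


def stratified_sample(population, stratum_of, per_stratum, seed=None):
    """Group subjects by stratum, then draw per_stratum at random from each."""
    rng = random.Random(seed)
    strata = {}
    for subject in population:
        strata.setdefault(stratum_of(subject), []).append(subject)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}


# Hypothetical list of licensed teachers; every 150th one is surveyed.
teachers = [f"teacher_{i}" for i in range(10_001)]
teacher_sample = systematic_sample(teachers, k=200, interval=150, seed=3)

# Hypothetical students tagged with a county; 10 are drawn from each county.
students = [(i, county) for county in ("A", "B", "C") for i in range(50)]
by_county = stratified_sample(students, lambda s: s[1], per_stratum=10, seed=4)
```

The guard in systematic_sample reflects a practical detail the wrap-around rule leaves implicit: if the interval divides evenly into the list length, the same subjects come up again after one pass, so the sketch refuses such a combination instead of returning duplicates.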
This type of sampling is useful when the population has some known differences that could result in different outcomes. For instance, if you already know that 80% of the students are male, you may want to select 40 male students and 10 female students so that your sample represents the breakdown of sex within the population.

Cluster Sample. Cluster sampling could be considered a more specific type of stratified sample. When this technique is used, potential subsets of subjects are first randomly eliminated and then the remaining subsets are used to randomly select the sample of subjects to be used in the study. For example, if you are measuring the effect of prior work experience on college grades in a particular state, you may first make a list of all colleges in the state. Then you would randomly select a number of colleges to either include or eliminate in the selection process. Once you have a subset of colleges, you could use the same technique to randomly include or eliminate the specific classes. From the remaining classes, you would then randomly select a group of students with work experience and a group of students with no work experience to be placed in your two groups.

Nonprobability Sample. Nonprobability refers to a group of subjects chosen based on their availability rather than their degree of representativeness of the population. Surveys are often done in this manner. Imagine going to the local mall to gather information about the buying habits of mall shoppers. Your subject pool does not represent all mall shoppers but rather those mall shoppers who happen to walk by your location on that day. The same would hold true for a survey over the phone or via mail. Those who respond to your questions or return the mailed survey do not necessarily represent the population at large. Instead, they represent the population who was home and was willing to respond to your questions or those who took the time to complete and return the survey.

While at first glance this method seems unprofessional, it allows for the gathering of information in a short amount of time. It is not considered standardized research and would be scrutinized if submitted to a professional journal, but it does have its place. If you've ever visited a website and seen a survey, you might have felt compelled to click on the results link. When watching a news program, you may not have changed channels because you were waiting for the results of a survey that would be reported at the end of the program. We are highly interested in these informal polls, and using a nonprobability sample is a quick way to gather large amounts of information in a relatively short amount of time.

Testing Subjects

Once you have determined your variables, applied the concept of standardization, and selected your subjects, you are almost ready to begin the testing process. The concept of testing refers to the application or analysis of your independent and dependent variables. If there is any manipulation of the subjects in your study, it occurs during this phase. Before testing any human subject, however, some type of consent form is necessary. Consent forms basically describe the study, how the results will be used, and any possible negative effects that may occur. They also give the subject the right to withdraw from the study at any time without consequence. Your specific hypothesis does not need to be disclosed, but each subject must be made aware of any general concerns and be able to ask questions before testing can begin. More on consent forms will be discussed in chapter 2.
If your hypothesis, for example, asked if there is a difference in effectiveness of different treatments for depression, you might assign your subjects to one of several different groups: cognitive therapy, dynamic therapy, humanistic therapy, and possibly no therapy. Each subject would likely be tested prior to the study to determine a baseline for his or her level of depression and would then begin a predetermined and standardized treatment plan. Because you are standardizing your study, each subject should get identical treatment short of the independent variable. In other words, the only thing you want to be different between the groups is the type of therapy received. The no therapy group would be considered a control group and may participate in some type of non-therapy related activity while the other subjects receive therapy. This group is used to determine if time plays a role in the decrease of depressive symptoms. Without a control group you couldn't say that any particular therapy was more helpful than no therapy because subjects may have improved merely because of some outside factor unrelated to treatment. If you recall, these factors are called extraneous variables, and control groups, along with randomization, help to keep the impact of these variables to a minimum.

Analyzing Results

The specific analysis performed on the subjects depends on the type of subjects, the type of questions being asked, and the purpose of the research. When we have gathered and tested all possible subjects from a known population, we would use descriptive statistics to analyze our results. Descriptive statistics require the testing of everyone in the population and are used to describe the qualities of the population in numerical format. For example, we could say that the mean score on an IQ test for all third graders at Jefferson High School is 102. We could also state that there is no difference between the IQs of boys and girls within our subjects if the data support these conclusions.

When we are using a sample of subjects smaller than the entire population, we must make some inferences using what we call inferential statistics. As with any inference, we also assume a certain degree of error when making determinations about a population based on a sample of that population. Because of this, the results of inferential statistics are often stated within a predetermined level of confidence. If we found that the mean of one group was 10 points higher than the mean of a second group in our work experience and college grades study, we could not assume that the population means differ by exactly 10 points. We could, however, state that the means of the entire population are likely to differ by five to 15 points or that there is a 95% probability that the means of the entire population differ by ten points. In this sense, we are predicting the scores of the entire population based on the scores of our sample and stating them within a range or a predetermined level of confidence. This allows us to include the likely error that occurs whenever an inference is made.
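One way to turn a sample difference into a range like the one described above is a confidence interval for the difference between two means. The sketch below is only an illustration: the scores are invented so that the group means differ by 10 points, the function name is hypothetical, and the interval uses a normal approximation for simplicity (with groups this small, a t distribution would be the more usual choice).

```python
import math
from statistics import NormalDist, mean, variance


def mean_difference_interval(group_1, group_2, confidence=0.95):
    """Approximate confidence interval for the difference in population means."""
    diff = mean(group_1) - mean(group_2)
    # Standard error of the difference between two independent sample means.
    se = math.sqrt(variance(group_1) / len(group_1)
                   + variance(group_2) / len(group_2))
    # Two-sided critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff - z * se, diff + z * se


# Invented course averages (not real data); the means are 88 and 78.
work_experience = [92, 85, 88, 95, 90, 79, 91, 84, 96, 80]
no_work_experience = [78, 82, 75, 80, 71, 85, 77, 83, 74, 75]

low, high = mean_difference_interval(work_experience, no_work_experience)
```

The interval is centered on the observed 10-point difference, and its width grows as the scores become more variable or the groups become smaller, which is the "likely error" the text refers to.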

Determining Significance

The term significance when related to research has a very specific role. Significance refers to the level of certainty in the results of a study. We can say that our subjects differed by an average of ten points with 100% certainty because we personally witnessed this difference. To say that the population will differ is another story. To do this, we must determine how valid our results are based on a statistical degree of error. If we find, through the use of inferential statistics, that the grades of those with and without work experience are different, we must state the estimated error involved in this inference. While the standard acceptable error is 5%, it can be as high as 20% or as low as 0.1%.

The amount of error to be accepted in any study must be determined prior to beginning the study. In other words, if we want to be 95% confident in our results, we set the significance level at .05 (or 5%). If we want to be 99% confident, our significance level is set at .01. We can then state that there is a difference in the population means at the 95% confidence level or at the 99% confidence level if our statistics support this statement. If our statistics estimate that there is 10% error and we said we would accept only 5%, the results of our study would be stated as not significant. When determining significance, we are saying that a difference exists within our acceptable level of error and we must therefore reject the null hypothesis. When results are found to be not significant, the only option available is to accept the null hypothesis.

Communicating Results

Results of a study are disseminated in many forms. The highest level of communicating results is often in a peer-reviewed professional journal. Peer-reviewed refers to a group of professionals in a particular field who read all submissions and publish only those that meet the highest degree of scrutiny and applicability.
When errors are found in the sampling of subjects, the statistical analysis, or the inferences made, the study will often be rejected or returned to the author for revisions. Published articles in peer-reviewed journals would likely be the best source for research when you begin looking into your theory.

Results of research studies are also disseminated through textbooks, book chapters, conferences, presentations, and newsletters. For example, a study comparing the average salary in a particular county might be published in the local newspaper or in a brochure for the chamber of commerce. Our study of non-traditional students and work experience might be summarized in a board meeting of the college's department of student retention or published in a trade journal such as the Journal of Higher Education.

Some studies are never released, especially if the results do not add to the already available research. Other studies are meant only to provide direction for larger studies. Our study of college students may be used only to determine if a larger study is likely to result in important findings. If we get significant results, then a larger study, including a broader subject pool, may be conducted. These types of studies are often called pilot studies because the goal is not to gather knowledge about the population, but rather to guide further research in a particular area.

Replication

Replication is the key to the support of any worthwhile theory. Replication involves the process of repeating a study using the same methods, different subjects, and different experimenters. It can also involve applying the theory to new situations in an attempt to determine the generalizability to different age groups, locations, races, or cultures. For example, our study of non-traditional students may be completed using students from another college or from another state. It may be changed slightly to add additional variables such as age, sex, or race to determine if these variables play any role in our results. Replication, therefore, is important for a number of reasons, including (1) assurance that results are valid and reliable; (2) determination of generalizability or the role of extraneous variables; (3) application of results to real world situations; and (4) inspiration of new research combining previous findings from related studies.

Putting it All Together

Toward the beginning of this chapter we asked the question, do college students with work experience earn better grades than those without work experience? Knowing the steps involved in doing research and now having a basic understanding of the process, we could design our experiment and, with fictional results, determine our conclusions and how to report our findings to the world. To do this, let's start with our theory and progress through each of the ten steps.

Step 1: Determining a Theory. Theories are developed through our interaction with our environment. For our particular theory, we observed that older college students tend to perform better on classroom tests than younger students.
As we attempted to explain why, we developed our theory that real world work experience creates a motivation in students that allows them to perform better than students without this motivation. Our theory, therefore, states that prior work experience will result in higher grades.

Step 2: Defining Variables. Every experiment has an independent and a dependent variable. The independent variable (IV) is what we start with; it refers to the separation of our groups. In our case, we want to look at prior work experience, so the presence or absence of this would constitute our experimental groups. We may place those students who have been in the work force for one year or more in group 1 and those with less than one year in group 2. Our dependent variable is our outcome measure, so in our case we are looking for a difference in class grades. To operationally define the variable grades, we might use the final course average as our outcome measure. If the independent and dependent variable(s) are difficult to determine, you can always complete the following statement to help narrow them down: The goal of this study is to determine what effect _________ (IV) has on _________ (DV). For us, the goal is to determine what effect one year or more of prior work experience has on course average.

Step 3: Determining Hypothesis. When we plug our variables into our original theory we get our research hypothesis. Simply stated: Students with one or more years of prior work experience will receive higher final course averages than students with less than one year of prior work experience. Since statistical analysis often tests the null hypothesis, or the idea that there is no difference between groups, our null hypothesis could be stated as: Final course averages of students with one or more years of prior work experience will not differ from final course averages of students with less than one year of prior work experience.

Step 4: Standardization. To make sure that each subject, no matter which group they belong to, receives the same treatment, we must standardize our research. In our case, we are looking at final course averages, so we must make sure that each student receives the same instruction, the same textbook, and the same opportunities to succeed. While this may be difficult in the real world, our goal is to get as close as possible to the ideal. Therefore, we may choose to gather subjects from a general psychology class since this is a class required of most students and will not be affected by college major. We may further decide to research only those students who have a specific instructor to keep the instruction between the two groups as similar as possible. Remember, our goal is to assure, at least as much as possible, that the only difference between the two groups is the independent variable.

Step 5: Selecting Subjects. Because our population consists of all college students, it will be impossible to include everyone in the study. Therefore we need to apply some type of random selection. Since we want to use only those students who have the same instructor, we may ask all of this instructor's students, prior to any teaching, how much work experience they have had. Those who report a year or more become the potential subject pool for group 1 and those who have less than one year become the subject pool for group 2. We could, at this point, decide to include all of these subjects or to further reduce the subjects randomly. To reduce the subject pool we could assign each student in each group a random number and then choose, at random, a specific number of students to become subjects in our study. For the purpose of this example, we will randomly choose 20 students in each group to participate in our study.
Step 6: Testing Subjects. Since we are not applying any type of treatment to our subjects, this phase in the process can be omitted. If we were determining if the teaching styles of different instructors played a role in grades, we would randomly assign each student to a teacher. In that case, teaching style would become an independent variable in our study. Step 7: Analyzing Results. Our original question asked if final averages would be different between our two groups. To determine this we will look at the mean of each group. Therefore, we will add up the averages of the 20 subjects in each group and divide each of these by 20 (representing the number of subjects in each group). If, after comparing the means of each group, we find that group 1 has a mean of 88 and group 2 has a mean of 82 then we can descriptively state that there is a six-point difference between the means of the two groups. Based on this statistic, we would then begin to show support for our alternative hypothesis and can progress to the next step. Step 8: Determination of Significance. Our goal was not to describe what their averages were, but rather to make inferences about what is likely happening in the entire population. We must therefore apply inferential statistics to our results to determine the significance, or lack of significance, of our findings. We will set our confidence level at 95 percent and then apply statistical analysis to our results to see if the difference of six points with a sample size of 40 is significant. Imagine that we did find a significant difference. In this case we could say that with a 95% confidence level, students with one year or more of work experience receive higher averages than those with less than one year of work experience. Since our results are inconsistent with the null hypothesis, which stated that no difference exists between the two groups, we must reject it. And by rejecting the null, we accept our alternative hypothesis. Step 9: Communicating Results. When communicating the results of our study we need to do several things. We need to make a case for why we did this research, which is often based on our literature search. We then need to report the process we took in gathering our sample and applying the treatment. We can then report our results and argue that there is a difference between the two groups and that this difference is significant enough to infer it will be present in the entire population. Finally, we must evaluate our research in terms of its strengths, weaknesses, applicability, and needs for further study. In terms of strengths, we might include the rigors of gathering subjects and the fact that we used a random sample of students. We may argue that the statistical methods used were ideal for the study or that we considered the recommendations of previously completed studies in this area. Weaknesses might include the small sample size, the limited pool from which our sample was gathered, or the reliance on self-reported work experience. To discuss applicability and needs for further studies we could suggest that more studies be completed that use a broader base of subjects or different instructors. We could recommend that other variables be investigated such as student age, type and location of college, family educational history, sex, race, or socioeconomic background. 
We might even suggest that while our findings were significant they are not yet applicable until these other variables are investigated. The sections of a research report and how to write this report in order to communicate results are the main focus of Chapter 2. Step 10: Replication. The final step in any research is replication. This can be done by us but is most often completed by other researchers based on their own review of the literature and the recommendations made by previous researchers. If others complete a similar study, or look at different variables and continue to find the same results, our results become stronger. When ten other studies agree with ours, the chances are greatly improved that our results were accurate. If ten other studies disagree with our findings then the validity of our study will be, and most certainly should be, called into question. By replicating studies and using previously gained knowledge to search for new answers, our profession continues to move forward. After all, we used the ideas of other researchers to design our research, and future researchers may incorporate our findings to make recommendations in their research. The cycle is never ending and allows for perpetual seeking of new knowledge. Chapter Conclusion The process of research can be painstakingly time consuming. It can involve the overcoming of many obstacles and may unfortunately need to be revised several times as you progress through the steps. By completing your study in the correct order and making sure you don't forget important tasks, your progression from theory to publication will occur much more smoothly. For this reason, most graduate programs require that you work under the supervision of an experienced researcher for a number of years before beginning your own independent study. The final project in any Ph.D. program will be a dissertation, which is a culmination of your knowledge in the subject matter and your ability to do research that adds to the knowledge base in your field. This book will look at each of the areas discussed in this chapter in more detail and provide an overview of research methods. The goal is to give you a solid understanding of the different types of research, ideas for completing your own research, and a method for avoiding a tragic ending to a graduate career. Studies have shown that a large percentage of doctoral students complete their coursework but leave school prior to completing the required research. The reasons include running out of time, failing to progress in a logical order, becoming discouraged with obstacles, and simply fearing the research process. By having a solid understanding of research methods and statistical inference, your chances of completing a research project are greatly enhanced. Remember that nobody knows everything about doing research and that asking questions and getting advice along the way is not only accepted, it is highly recommended. Look at research as a global phenomenon and prepare for the whole gestalt of your project but always make sure you are proceeding in a logical and organized fashion. If you work hard and work smart, you'll soon be published and will be adding to the knowledge base in your specialty area. Once published, you are considered an expert and your research may someday appear in a college textbook or as a resource in someone else's research publication. Because this text is designed to focus on the methods of research, a basic understanding of statistics is assumed. As a refresher, however, and to better critique the results section of a research report, a discussion of descriptive and inferential statistics is included. During the course of reading this text, it may be wise to refer to these sections to clarify any statistical information presented in earlier chapters. 
By the end of this text, you should have a solid understanding of research methods and be able to intelligently analyze and critique a research report.
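As a brief refresher on Steps 7 and 8 of the worked example, the comparison of the two group means can be sketched in Python. Everything specific here is an assumption made for illustration: the scores are made up so that the group means come out to 88 and 82, the test shown is an independent-samples t-test with pooled variance (the chapter does not name a particular test), and the critical value of 2.024 is the two-tailed value for 38 degrees of freedom at the 95 percent confidence level as read from a standard t table.

```python
import math
import statistics

def pooled_t(sample_a, sample_b):
    """Independent-samples t statistic using a pooled variance estimate."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance is the sample variance (divides by n - 1)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df

# Made-up final course averages, 20 subjects per group
group_1_scores = [88 + d for d in range(-10, 11) if d != 0]  # mean of 88
group_2_scores = [82 + d for d in range(-10, 11) if d != 0]  # mean of 82

# Step 7: descriptive statistics
mean_1 = statistics.mean(group_1_scores)
mean_2 = statistics.mean(group_2_scores)
print("difference between means:", mean_1 - mean_2)

# Step 8: inferential statistics
t_stat, df = pooled_t(group_1_scores, group_2_scores)
T_CRITICAL = 2.024  # two-tailed, alpha = .05, df = 38 (from a t table)
reject_null = abs(t_stat) > T_CRITICAL
print("t =", round(t_stat, 2), "df =", df, "reject null:", reject_null)
```

With real data the same six-point difference could easily fail to reach significance if the scores within each group were more spread out, which is why the descriptive comparison in Step 7 is not enough on its own.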

Chapter 2: The Research Report


2.1 Introduction 2.2 How the Research Report is used 2.3 Gathering of Information 2.4 Forming the Hypothesis 2.5 The Research Report 2.6 Chapter Conclusion

Introduction While the final research report falls toward the end of the research steps discussed in the last chapter, the process of writing the report actually starts in the first step. As part of refining your theory, a literature review is necessary and therefore any discussion about research should begin with the research report. While you search for and read through related articles, you will also begin to see trends in the research, topics that are addressed, and areas in need of further study. This process will help you gain a better idea of what has been done, what knowledge base is out there, and on what topics previous researchers recommend further study.

How is The Research Report Used While each research project and each researcher has different ideas about the importance and application of a research project, most agree that there are three distinct purposes of gathering information. The first, and perhaps most applicable to professional journals and academia, is Fundamental Research. When a theory is developed and in turn the related hypothesis is tested to determine support for the theory, we are gathering fundamental knowledge about a particular topic area. The purpose of fundamental research is to gather information and

advance a particular field of knowledge in order to better understand it and eventually apply what we learn. Applied Research, the second of the three, relates to the actual application of the fundamental knowledge. If we completed a project that supported the idea that work experience increases college performance, we may then want to apply this new knowledge to the college population. An example of applied research, using this example, might include a work program for high school seniors or college freshmen where students would be placed in one of two groups: completed work program and no work program. By taking a baseline of performance for each of these groups, we could then measure their subsequent performance to determine if the work program increased, decreased or had no effect on their college performance. The purpose of applied research, hence, is to improve or develop new programs and initiatives that will ultimately improve the lives of individuals. The final type of research is less formal than the first two. Action Research often takes place within an individual classroom or department. Teachers, managers, or team leaders are typically involved in the research process and the results are usually not intended to be generalized to a larger population. For example, if the performance of your class is below expectations, you may implement different techniques aimed at improving grades, attendance, or participation. As you implement these new techniques you may begin to see a trend in the classroom. Technique one may reduce male participation but increase female participation; technique two may increase both but reduce grades; technique three may decrease both but increase class attendance and participation. Based on results from these applications, the teacher, manager, or team leader can redesign activities, policies, and techniques in order to improve performance. 
Action research is less rigid in that there is rarely ever a control group, the procedures are not necessarily standardized, the outcome is often subjective and the intent is not to publish. On the other hand, action research can be a highly effective way to improve situations within a specific classroom or department. Gathering of Information To understand what has been done and what is needed in your particular area of interest, a review of the literature is needed. This review most often starts at your university library although with the advancements in technology many databases are available entirely online through your university or as a fee-based service through a journal publisher. It is important to have an understanding of the topic that you are searching since you will likely use key words related to your topic. For our study on work experience and grades, we would likely begin our search by using key words or key phrases such as: College achievement Non traditional students College grades Success in college Motivation and grades Age and college success

These phrases help narrow down your search but could also result in articles that are not related or don't seem to be related based on the title or the abstract. The results you find may also cue you into other search terms that you hadn't originally associated with your topic. It may also be wise to include such phrases as meta-analysis or literature review as these articles are most likely to have an extensive works cited section and can usually shorten your search time. While your librarian can help you determine which search engines to use, Table 2.1 lists several important databases that may be good places to start. Table 2.1: Research Databases

Once you gather the articles you want to look into further, the next step is to gain access to the full text of each article. This can be a simple project at some universities or may be painfully arduous. Some databases contain the full text of the article, which makes it easy to read or print. Other times you can simply make copies from hard copies of the journal on your library shelves. If your library doesn't carry the article or the journal, you can often request it through a library exchange program or you may need to order it from the publisher at a nominal fee, especially if you are looking for a dissertation.

Forming the Hypothesis The entire process of gathering the published information may be quite time consuming and may require multiple trips to the library and a lot of time reading, highlighting and making notes. During this process you should start thinking about your particular study. Make notes as to what previous researchers are recommending and start to organize the articles into categories. Create a category for rejected articles, those that do not relate to your topic, but don't throw them away just yet. Make a category for meta-analysis or literature review articles to be used for summarizing and finding other research. And finally, make a category for articles that appear to relate to your topic. This last one can then be separated into different sections that will be helpful when writing the literature review of your report. For our study, such sections might include: biographical data on non-traditional students, differences in college grades, motivation for learning, or post-career education, to name only a few examples. Through the process of reviewing these articles, it is likely that you will find new articles of interest and need to request copies of the full text. This second round can then be incorporated into both your notes and article organization and may, in turn, result in yet a third request for copies. As you can see, there is no clear way to get all the articles you need from a single library search or a single database. Multiple trips will almost always be warranted and you may spend weeks or even months completing the whole process. During this time, however, you will gain a great deal of knowledge about your topic and will be able to fine-tune your theory in order to develop the hypothesis that you will eventually test. You may find that your original theory has been well tested so completing another study would not add additional information. 
However, you may also find, through your own deduction or through the researchers' recommendations, a new path that is both needed and of interest to you as a researcher.

Spend the time reviewing the literature wisely as doing so can prevent major headaches in the future. If you understand what pitfalls other researchers ran into you can avoid them before they interrupt your study. Also begin to look at feasibility during this process. If your idea is to place students in particular classes you will likely run into problems from both college administrators and potential subjects. If you need background information from subjects, make sure you will be able to gather it without jumping through too many hoops. In other words, do your best to fit the research needs to the practical limitations you will always be forced to deal with. The Research Report As you read through the large number of published studies, you will likely notice that the reports tend to follow both a topical pattern and a style of writing. Most professional journals require both in order to maintain consistency within the journal and to assure that information is organized in an understandable fashion. Imagine reading an article that begins with the actual experiment but in the middle begins to justify why the experiment was conducted. The article then reviews a limited number of articles and then jumps back to a discussion of how they chose their subjects. In the middle of all of this is a running critique of their methods along with haphazardly placed recommendations. Finally, the report ends with a list of articles but no reference is found as to the importance or use of these articles in the current study. This type of report is likely to be discarded well before the end. If it had contained important new knowledge this information never made it to its intended destination. The journal would likely not publish it and had it gotten published, would have frustrated the reader to the point of confusion and disregard. Therefore, we follow a specific writing style to avoid this type of mess. 
And, while following a style may seem time consuming and frustrating in itself, it helps assure that your newfound knowledge makes its way into the world. The American Psychological Association [APA] has developed what is the most well known and most used manual of publication style in any of the social sciences. The most recent version was published in 2002 and marks the fifth edition. While the text is somewhat daunting at first glance, the style does assure that your knowledge will be disseminated in an organized and understood fashion. Most research reports follow a specific list of sections as recommended by this manual. These sections include: Title Page, Abstract, Introduction, Methods, Results, Discussion, References, Appendices, and Author Note. Each of these areas will be summarized below, but for any serious researcher understanding the specifics of the APA manual is imperative. Title Page. The title page of a research report serves two important functions. First, it provides a quick summary of the research, including the title of the article, authors' names, and affiliation. Second, it provides a means for a blind evaluation. When submitted to a professional journal, a short title is placed on the title page and carried throughout the remainder of the paper. Since the authors' names and affiliation are only on the title page, removing this page prior to review reduces the chance of bias by the journal reviewers. Once the reviews are complete, the title page is once again attached and the recommendations of the reviewers can be returned to the authors. Abstract. The abstract is the second page of the research report. Consider the abstract a short summary of the article. It is typically between 100 and 150 words and includes a summary of the major areas of the paper. 
Often included in an abstract are the problem or original theory, a one- or two-sentence explanation of previous research in this area, the characteristics of the present study, the results, and a brief discussion statement. An abstract allows the reader to quickly understand what the article is about and helps him or her decide if further reading will be helpful. Introduction. The main body of the paper has four sections, with the introduction being the first. The purpose of the introduction is to introduce the reader to the topic and discuss the background of the issue at hand. For instance, in our article on work experience, the introduction would likely include a statement of the problem, for example: prior work experience may play an important role in student achievement in college.

The introduction also includes a literature review, which typically follows the introduction of the topic. All of the research you completed while developing your study goes here. It is important to bring the reader up to date and lead them into why you decided to conduct this study. You may cite research related to motivation and success after college and argue that gaining prior work experience may delay college graduation but also helps to improve the college experience and may ultimately further an individual's career. You may also review research that argues against your theory. The goal of the introduction is to lead the reader into your study so that he or she has a solid background of the material and an understanding of your rationale. Methods. The methods section is the second part of the body of the article. Methods refers to the actual procedures used to perform the research. Areas discussed will usually include subject recruitment and assignment to groups, subject attributes, and possibly pretest findings. Any surveys or treatments will also be discussed in this section. The main point of the methods section is to allow others to critique your research and replicate it if desired. The methods section is often the most systematic section in that small details are typically included in order to help others critique, evaluate, and/or replicate the research process. Results. Most experimental studies include a statistical analysis of the results, which is the major focus of the results section. Included here are the procedures and statistical analyses performed, the rationale for choosing specific procedures, and ultimately the results. Charts, tables, and graphs are also often included to better explain the treatment effects or the differences and similarities between groups. Ultimately, the end of the results section reports the acceptance or rejection of the null hypothesis. 
For example, is there a difference between the grades of students with prior work experience and students without prior work experience? Discussion. While the first three sections of the body are specific in terms of what is included, the discussion section can be less formal. This section allows the authors to critique the research, discuss how the results are applicable to real life or even how they don't support the original theory. Discussion refers to the authors' opportunity to discuss in a less formal manner the results and implications of the research and is often used to suggest needs for additional research on specific areas related to the current study. References. Throughout the paper and especially in the introduction section, articles from other authors are cited. The references section includes a list of all articles used in the development of the hypothesis that were cited in the literature review section. You may also see a section that includes recommended readings, referring to important articles related to the topic that were not cited in the actual paper. Appendices. Appendices are always included at the end of the paper. Graphs, charts, and tables are also included at the end, in part due to changes that may take place when the paper is formatted for publication. Appendices should include only material that is relevant and assists the reader in understanding the current study. Actual raw data is rarely included in a research paper. Author Note. Finally, the authors are permitted to include a short note at the end of the paper. This note is often personal and may be used to thank colleagues who assisted in the research but not to the degree of warranting coauthorship. This section can also be used to inform the reader that the current study is part of a larger study or represents the results of a dissertation. The author note is very short, usually no more than a few sentences. 
Chapter Conclusion We perform research in order to learn new information, to confirm previous research, or to explore new possibilities. Research is meant to be shared with others so that the members of each profession can grow together and continue to move forward. Imagine if the inventor of the wheel kept his new knowledge secret. What if every teacher in a school district did the same project with their classroom and found it to be a negative experience but never shared it with any other teacher?

In order to share this knowledge, a standardized method of disseminating the information is required, and often this method is the research report. Although we have not yet discussed the various types of research or the specifics of any statistical technique, having an idea of what your ultimate project will look like can only serve to improve the process. The research report is designed to be standard, utilizing the APA manual as a guide; parsimonious, in that excess unnecessary material should not be included; and professional. When performing research and writing the report, researchers are expected to follow strict ethical guidelines as directed by their professions. The APA ethical guidelines can be found on the Internet (http://www.apa.org) and are often used as a template for other professions. Any good researcher will read these ethical guidelines, understand them, and do everything in his or her power to adhere to them. Researchers must also be aware of policies and guidelines developed by the college or organization that is sponsoring the research or paying their salaries. Finally, when writing your research report, always remember to be polite, particular, and parsimonious. Keep your review of others' research polite and focused on the methods or results, not the researcher. Be aware of and discuss both the pros and cons of the studies but remember that future researchers will be doing the same critique of your study. Be particular in your choice of articles to include and make sure that they are relevant to your study. Discuss your methods and results in a matter-of-fact fashion. And finally, be parsimonious in your writing. Keep it as simple as possible, don't use big words just to appear more academic, and avoid the inclusion of fluff and fillers just to increase your paper's length. While a typical submission for publication in a journal is between 15 and 40 double-spaced pages, some of the best research reports are as short as four or five.

Chapter 3: Research Tools of the Trade


3.1 Introduction 3.2 Hardware Tools of Research 3.3 Software Tools of Research 3.4 Method Tools of Research 3.5 Chapter Conclusion

Introduction Every profession and professional activity has tools that help improve techniques and assure a quality product. Research is no exception. The tools of the trade are often classified into three categories: hardware, software, and methods or knowledge. Hardware tools may include hammers, copy machines, trucks, computers, or cell phones. Software tools relate to computer programs such as word processing or data base programs, and changeable forms such as written tests, worksheets, or rubrics. The methods of a profession refer to knowledge and understanding of the procedures involved. For example, when a medical professional performs a CAT scan on a patient, the machine they use is considered hardware, the computer program used to collect and analyze the data is the software, and the use of the machine and software as well as the interpretation of results and application to treatment is the methods. All three types of tools are typically necessary to every profession, with research being no exception. Hardware Tools of Research The most commonly used hardware tool in research is the computer. We use this device in order to run software, to measure changes in the autonomic nervous system, and to detect brain damage and other physical problems. Other instruments that might be seen in social science research might include a device to measure reflexes or muscle strength, or the hardware, such as blocks and puzzles, associated with an intelligence test. Hardware provides us with a means to interact with our subjects and therefore gather information on their performance. Software Tools of Research The software tools of research are typically more abundant than hardware tools in the social sciences. Software is usually thought of to mean computer programs that tell the hardware what to do, but any tool not related to a physical

device can be considered software. Included in this category are statistical software, consent forms, published tests, questionnaires, observation forms, and, to a lesser degree, the interview. Statistical Software. Simple statistical problems, such as determining the mean or the median of a small data set, can easily be done with a calculator. Most formulas that will be used in a research report, however, are a lot more complex. While a calculator will work, a statistical program can reduce the computation time by hours, days, or even weeks. Imagine trying to determine the mean, standard deviation, t-score, and z-score conversions of twelve data sets each containing 300 subjects. Even the best statistician will spend many hours on this project that could be done by a computer in a matter of minutes once the data is entered. The most widely used statistical software for social science research is the Statistical Package for the Social Sciences (SPSS), which is relatively easy to use if you have basic computer knowledge. SPSS can perform hundreds of statistical computations and even graph your data. Another program, SAS, also performs these functions and is gaining popularity with many researchers. Both, however, can be expensive to purchase so it would be wise to use your school's software or look into a student version. Consent Forms. The consent form is a necessity for anyone doing research with human subjects (see Figure 3.1). The purpose of the consent form is to provide information to the potential subject regarding the experiment or study and to answer questions regarding their participation. Consent forms should always include both potential benefits and harm that may result from participation as well as the option to quit the study at any time without repercussions. Table 3.1 lists ten important areas that should be included in a consent form. 
The title of the study, purpose, and researchers' names and affiliation are often included toward the top of the form. Contact information, such as phone numbers, should also be included. The bulk of the consent form involves the specific procedures that will be used, how the information gathered during the study will be used, and the expected experience that the subject will endure if he or she agrees to participate. Procedures and Requirements. Before a potential subject can consent to any study, especially those involving invasive techniques, he or she must be made aware of the process that will take place. The procedures and requirements section explains to the subject what he or she will experience during the study. If a survey is involved, this section might inform the subject that they will be asked to respond to questions related to their past work experience or their college grades, for example. Figure 3.1: Sample Statement of Informed Consent

Statement of Voluntary Consent To participate as a subject in the study described below

Date: January 14, 2004
Name of Study: The effects of work experience on college performance
Purpose of Study: To better understand the role of work experience and college performance and to make recommendations to the college administration regarding the college work study program.
Primary Researcher(s): Dr. Christopher L. Heffner, [Affiliation]

Contact Information: [Include office phone, school or organization address, and information regarding any committees on ethical research within your organization]

As a volunteer participant in the above mentioned research, I understand that I will be asked to complete a survey that will ask questions related to my work experience and college grades. The survey typically takes about 20 minutes to complete although this time can vary depending on each subject. I also understand that I may consider some of the questions personal in nature but that the information I provide will be used exclusively for this project and will in no way be associated with my name, address, student ID or any other identifiable information.

As a participant in this study I am aware that the questions on the research survey may cause anxiety or stress depending on my personal situation but that most find the experience harmless and even enjoyable. As a participant, I am aware that the responses I provide may assist future college students at this University and perhaps other colleges across the country.

By signing below, I state that I have read this consent form in its entirety and that all of my questions have been answered. I understand that I may withdraw from this study at any time and that my participation or lack of participation will in no way affect my status as a [student, patient, employee, etc.].

Subject Signature ________________________ Date ______
Witness Signature ________________________ Date ______

Use of Information. Whenever research is done, information is gathered. This information can be relatively innocuous, such as hair color, gender, or number of siblings, or it can be quite personal, such as sexual history, views on abortion, or yearly income. No matter what information is collected, the subject has the right to know how this information will be used. Often this section states that subjects are assigned a number and that their responses will not be associated with identifying information such as name, address, phone number, or any other characteristic that distinguishes them from others. In this situation, we would know that subject number 14 is a female who is pro-life and earns $35,000 per year, for example, but we wouldn't be able to identify her.

Table 3.1: Checklist for the Statement of Informed Consent

Potential Benefits. There are often potential benefits of a study for the individual subject, and there should always be potential benefits to the population at large. This section allows the researcher to make statements such as, "the results of this research could lead to better teaching methods that may ultimately improve the way children learn." Benefits to the potential subject should also be stated, such as free medication and treatment for a medical study, or consultation with a psychologist. Even something as simple as self-growth could be included in this section.

Potential Harms. Perhaps more important than the benefits are the potential harms that could result from participating in the study. The subject must be informed of any risks involved, both physical and emotional. Even if these potential harms are rare or farfetched, the subject has the right to know before consenting to participate. Often you will see statements that discuss the possibility of negative thoughts or feelings associated with the subject's responses or actions. For example, a study addressing depression after the death of a loved one could easily bring about sad thoughts and depressed feelings as the subject answers questions related to his or her experience. If this is a possibility, it should be addressed in the consent form.

Statement of Voluntary Consent. The statement of voluntary consent typically contains two main statements. First, the subject agrees that he or she has (a) read the form, (b) understood the form, and (c) had all questions answered. Second, the subject should be informed that he or she can drop out of the study at any point (if this is feasible without causing harm) without repercussions. This means that at any point from signing the form to the completion of the study, subjects can refuse to continue and will not be harassed, humiliated, or otherwise coerced.

Published Standardized Tests.
Often researchers want to gather information related to a general area such as personality or intelligence. For these instances, the use of a standardized test may be the best choice. With already published tests, validity and reliability have typically been established in advance, and you can save a lot of time that might otherwise be spent on test construction. Standardized tests can be classified into five main categories: achievement, aptitude, interest, personality, and intelligence.

Achievement Tests. Achievement tests are designed specifically to measure an individual's previously learned knowledge or ability. They are available for many topic areas related to psychology, education, business, and other fields. Achievement tests require that prior learning take place and that this learning be demonstrated in order to pass.

Aptitude Tests. Aptitude tests attempt to predict an individual's performance in some activity at some point in the future. They do not require any specific prior learning, although basic knowledge related to reading and writing is usually required and some preparation, such as studying up on math formulas or sentence structure, can be helpful. A well-known example of this type is the SAT (originally the Scholastic Aptitude Test), designed to predict future college performance.

Interest Inventories. Interest inventories also require only general knowledge, and no preparation is needed. These tests look at an individual's subjective interests in order to make predictions about some future behavior or activity. Perhaps the most used interest inventory is the Strong Interest Inventory, which compares interests related to specific careers in order to help guide an individual's career path. Endorsed interests are compared with the interests of successful individuals in various fields, and predictions are made regarding the test-taker's fit with the various career fields.

Personality Tests. Typically designed to assess and diagnose personality and mental health related disorders, personality tests are used extensively by psychologists in clinical, educational, and business related settings. By far the most widely used test of this type is the Minnesota Multiphasic Personality Inventory, Second Edition (MMPI-2), which compares an individual's responses on a series of true-false items to the responses of those suffering from various mental disorders such as depression, schizophrenia, and anxiety. The theory behind the test argues that if you endorse items similar to the items endorsed by those with depression, for example, then the chances that you are also depressed increase.

Intelligence Tests. Intelligence tests could be classified as aptitude tests since they are sometimes used to predict future performance. They could also be classified as personality tests since they can be used to diagnose disorders such as learning disabilities and mental retardation. However, because of their limited scope, we will place them in their own category. The purpose of an intelligence test is to obtain a summary score or intelligence quotient (IQ) of an individual's intellectual ability. Scores are compared to each other and can be broken down into different subcategories depending on the intelligence test used. The most commonly used tests of this type are the Wechsler Scales, including the Wechsler Adult Intelligence Scale (WAIS), the Wechsler Intelligence Scale for Children (WISC), and the Wechsler Preschool and Primary Scale of Intelligence (WPPSI).

Self-Response Questionnaires. Self-response questionnaires are a great way to gather large amounts of information in a relatively short amount of time. A questionnaire, similar to a survey you might see on a web page, allows subjects to respond to questions, rate responses, or offer opinions. Their responses can then be used to place them in specific categories or groups or can be compared to other subjects for data analysis. A concern with self-report, however, is the accuracy of the responses. Unlike direct observation, there is no way of knowing if the subject has told the truth or whether the question was understood as intended. There are several different methods for gathering information on a questionnaire or survey, including the Likert scale, the Thurstone technique, and the semantic differential.

Likert Scale. The Likert scale is a popular method used in surveys because it allows the researcher to quantify opinion-based items. Questions are typically grouped together and rated or responded to based on a five-point scale. This scale typically ranges in order from one extreme to the other, such as (1) very interested; (2) somewhat interested; (3) unsure; (4) not very interested; and (5) not interested at all. Items that might be rated with this scale, representing the subject's level of interest, could include a list of careers or academic majors, for example. Table 3.2 lists some examples of a five-point Likert Scale.

Table 3.2: Examples of Five-point Likert Scales

Thurstone Technique. The Thurstone technique allows subjects to express their beliefs or opinions by checking items that apply to them. It requires that a series of statements, usually 10 or 20, be created by experts in a specific area of interest. These statements are placed in order of intensity or rated in order of intensity by the experts. The subject is then asked to check all of the statements that apply to her and a median score is computed based on which items she has checked. For example, if the subject checked three items that were rated an intensity of four and three items rated an intensity of two, her median score would be 3. The subject who checked higher rated items would receive a higher median score and the subject who checked lower level items would receive a lower median score. Table 3.3 provides a hypothetical Thurstone scale for major depressive disorder with the symptoms starting at the bottom representing low intensity and then progressing upward to the top, which represents high intensity. Table 3.3: Sample Thurstone Questionnaire
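The median scoring described for the Thurstone technique can be sketched in a few lines of Python. This is only an illustration: the function name `thurstone_score` and the representation of a subject's answers as a list of expert-assigned intensities are assumptions made for this example, not part of any published Thurstone instrument.

```python
from statistics import median


def thurstone_score(checked_intensities):
    """Return the median intensity of the statements a subject checked.

    `checked_intensities` is a hypothetical list holding the expert-assigned
    intensity rating of each statement the subject endorsed.
    """
    if not checked_intensities:
        raise ValueError("the subject checked no statements")
    return median(checked_intensities)


# The example from the text: three items rated 4 and three items rated 2.
subject = [4, 4, 4, 2, 2, 2]
print(thurstone_score(subject))  # 3.0
```

A subject who checks mostly high-intensity statements ends up with a higher median, and a subject who checks mostly low-intensity statements ends up with a lower one, which matches the description in the text.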

Semantic Differential. Somewhat similar to a Likert scale, a semantic differential questionnaire asks subjects to rate their opinion on a scale representing two extremes and a series of points in between. Unlike the Likert scale, however, this technique usually provides a total of seven points rather than five, and the points in between the extremes are not labeled. The subject is therefore forced to provide his own rating on a one-to-seven scale, knowing only the description of the two extremes. Good examples of these semantic differentials include dichotomies such as extroverted/introverted, friendly/cruel, interested/not interested, or biological/environmental. Figure 3.2 provides an example of what the results of research might look like using a semantic differential technique.

Figure 3.2: Research Results of a Hypothetical Semantic Differential Scale

Observation Forms. Observation of subjects has been a longstanding means of gathering information. While this method works well for informal research or research involving only a single subject, it is not an easy task to generalize to the greater population. Scientific methods require that any observation be as standardized and objective as possible if generalization is to occur. Observation forms are often used to allow researchers to detail their observations on an agreed-upon scale, and observer ratings are often correlated to determine if all observers are measuring the same behaviors.

Human behavior is complex, and measuring it merely by watching presents unique challenges. Imagine measuring violent behavior in a group of 3rd graders during recess. What one observer sees as violent, pushing another child for example, a second researcher may view as aggressive but not violent. A third observer may miss the behavior altogether due to other behaviors or some distracter on the playground. To increase validity and reliability, the following steps are recommended:

1. Operationally define behaviors to be observed as much as possible.
2. Practice observations before the study and correlate the results to make sure all observers are measuring the same behaviors.
3. Use at least two observers per subject when possible in order to minimize missed behaviors or misinterpretations.
4. Retrain observers frequently and correlate these training observations.
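One way to "correlate" practice observations, as the steps above recommend, is to compute a Pearson correlation between two observers' ratings of the same children. The sketch below is a minimal illustration under stated assumptions: the ratings are made-up numbers, and the helper name `pearson_r` is hypothetical. Other agreement statistics could equally be used.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2 items)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("ratings show no variation; correlation is undefined")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: aggressive acts counted for five children by two observers.
observer_1 = [3, 5, 2, 4, 1]
observer_2 = [4, 5, 2, 3, 1]
print(round(pearson_r(observer_1, observer_2), 2))  # 0.9
```

A value close to 1 suggests the observers are rating the same behaviors in a similar way; a low value would be a signal to revisit the operational definitions or retrain the observers.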

Q-Sort. The Q-Sort is a technique adopted originally into humanistic psychology as a means for a client to self-evaluate his current status and then decide on treatment goals. Since the instrument is completed and interpreted by the client without the opinion of the therapist, it allows the client to be in complete control of what issues are to be worked on. This follows the humanistic view that the client, not the therapist, is the one with the answers to the client's problems. The Q-Sort consists of a number of cards, often as many as 40 or 50, each containing a single trait, belief, or behavior. The goal is to sort these cards into one of five columns ranging from statements such as "very much like me" to "not at all like me." There is typically a specific number of cards allowed for each column, forcing the client to balance the cards evenly. The qualities in each column are then recorded, and the results are used to assist the client in determining issues he or she wishes to work on in treatment. The Q-Sort can also be completed during and after treatment to assess changes and progress toward the self-determined goals.

Figure 3.3: Sample Outcome of a Q-Sort

The Interview. The interview is a well-known means of gathering information, especially in organizational settings such as hiring employees or gathering background information. Just like observations, interviews can be very subjective. It is therefore important to determine questions prior to the study and to develop a protocol that will be followed by all interviewers. Under this protocol, the same questions are asked in the same order of every subject, and responses are recorded exactly. It is also helpful if questions are kept as closed-ended as possible. For example, ask "How many siblings do you have?" rather than "Tell me about your siblings." What makes the interview different from a self-administered questionnaire is the ability to judge behaviors and to inspire or clarify responses. Interviews can be time consuming, especially if a large number of subjects are to be interviewed. And, if open-ended questions are used, the quantitative procedures of statistics become much more difficult.

Method Tools of Research

Knowledge of the procedures involved in research is perhaps the most important tool. Without this, the best computers and most reliable tests would be useless. For the researcher, there are several important areas that must be mastered, including knowledge of scientific methods, the use of statistical software, the practical application of statistical formulas, knowledge of writing style, and the ability to gather information and critically review the research of others. For this reason, this text would be considered a method tool in that it is designed to provide knowledge of research methods and to assist you in performing your own research and evaluating the research of others.

Chapter Conclusion

Prior to beginning any study, it is important to gather and understand the methods that are to be applied. When forms or subjective measures are to be used, the researcher should always gather the necessary material before subject recruitment and practice with this material before testing begins. While many researchers develop their own questionnaires, interviews, or consent forms, considering the use of a published form can be beneficial. Data collection methods using various types of scales can assist the researcher by providing high validity and reliability. It is often time consuming to develop the exact questions to be used, with each question written in a specific manner so as not to create any bias or confusion. Preprinted and published forms, when available, can eliminate this concern and free up a lot of time.

By utilizing the necessary hardware, mastering the use of selected software, gathering forms, training observers, and mastering the scientific method, you can develop a strong foundation on which to base your research. Rushing into an experiment typically creates more problems and costs more time than proceeding in an organized fashion. Spend the extra time to prepare for your study, because there's nothing worse than having to start over because of simple errors that could have easily been avoided.

The remainder of this book covers specific types of research, each with its own strengths and weaknesses. You will also be presented with some basic descriptive and inferential statistics that can be considered a refresher course for those who have taken a formal statistics class. And finally, since the ultimate goal of this text is to provide the tools necessary to evaluate the research of others and incorporate research into your professional activities, we will end with a chapter focused on the critical analysis of the research report.

Chapter 4: Single Subject Design


4.1 Introduction 4.2 ABAB Design 4.3 Multiple Baselines 4.4 Chapter Conclusion

Introduction

Single subject designs are thought to be a direct result of the research of B.F. Skinner, who applied the techniques of operant conditioning to subjects and measured the outcomes at various points in time. Because of this, single subject designs are often considered the design of choice when measuring behavioral change or when performing behavioral modification. Rather than comparing groups of subjects, this design relies on the comparison of treatment effects on a single subject or group of single subjects. An important aspect of this type of study is the gathering of pretest information, often called a baseline measure. It is important to measure the dependent variable or behavior prior to administering any treatment. Without this information, it is difficult, and likely impossible, to determine if any change has occurred. Also often associated with this design are repeated periods of measurement to determine not only whether a change occurred but also the degree of change through the process of behavioral modification. We'll look at the two most common applications of this design: the A-B-A-B design and multiple baselines.

A-B-A-B Design

The A-B-A-B design represents an attempt to measure a baseline (the first A), a treatment measurement (the first B), the withdrawal of treatment (the second A), and the re-introduction of treatment (the second B). In other words, the A-B-A-B design involves two parts: (1) gathering baseline information, applying a treatment, and measuring the effects of this treatment; and (2) measuring the return to baseline, or what happens when the treatment is removed, and then applying the treatment again and measuring the change.

As a simple example of how this design might work, imagine you just adopted two untrained puppies. Your first goal as a new dog owner is to teach both puppies to sit on command. You want to measure the effects of using a treat or biscuit versus using verbal and physical praise, so you give the treat to Puppy 1 and the praise to Puppy 2.

The initial A in this design refers to a baseline for each subject. To determine this, you might say the command "sit" to each puppy at ten different times and measure how many times each puppy actually sat. If puppy 1 sat twice, the ratio would be 2:10 or 20%. If puppy 2 sat twice, the ratio would be 2:10 or 20% as well. The next step would be to apply the treatment or training to each puppy. Each puppy would be commanded to sit twice every ten minutes for a two-hour period, for a total of 24 commands. Puppy 1 would receive a biscuit if he responded appropriately and puppy 2 would receive praise. Neither would receive any reinforcement or punishment for noncompliance. At the end of the two-hour period, the change due to treatment is measured (the first B). If puppy 1 sat 16 times, the ratio would be 16:24 or 67%, and if puppy 2 sat 12 times, the ratio would be 12:24 or 50%.

The training would then cease for a period of time, say 24 hours, and the puppies would then be commanded to sit in the same way we determined the original baseline. This second baseline (the second A) measures the effects of extinction, or the withdrawal of the positive reinforcer, on behavior. Without continual reinforcement, we are determining whether the behavior returns to the original baseline or whether the behavioral change we observed will continue. For the sake of this example, assume the second baseline for puppies 1 and 2 is 20% and 45% respectively. Finally, the treatment is once again applied (the second B) to measure the effects of spontaneous recovery. We'll assume the re-application of the treatment or training resulted in a ratio of 67% for puppy 1 and 80% for puppy 2.

So what does this all mean? If you look back at the original A-B, you'll notice that the training with a biscuit increased the ratio of response from 20% to 67% and the training with praise increased the behavior from 20% to 50%.
From this initial data, it appears as if the use of a biscuit produced a greater change in behavior. However, the withdrawal of the reinforcer produced a return to the baseline for puppy 1, while puppy 2 held onto some behavioral change. This would suggest that the use of a biscuit for training produces a greater change but results in a greater loss when it is no longer used. The puppy that received the praise was more likely to hold onto gains in his behavioral change. See Figure 4.1. The final training also suggests that the puppy that received the biscuit returned to where he was after the first treatment once the reinforcer was applied again, but that puppy 2 jumped way up to 80%. The final outcome for our single subject design using two subjects would suggest that the use of praise results in an overall increase in behavioral change when compared with the biscuit. In the final analysis, the praise would be considered superior to the biscuit as a reinforcer or training method.

Figure 4.1: Determination of Best Training Method
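The arithmetic in the puppy example can be laid out as a short Python sketch. The baseline and first-treatment ratios are computed from the counts given in the text; the second-baseline and second-treatment percentages are taken directly from the text because no counts are given for them. The function and dictionary key names are assumptions made for this illustration.

```python
def percent(successes, trials):
    """Response ratio as a whole-number percentage."""
    return round(100 * successes / trials)


# Phases: A1 = baseline, B1 = first treatment, A2 = withdrawal, B2 = re-treatment.
puppy_1 = {"A1": percent(2, 10), "B1": percent(16, 24), "A2": 20, "B2": 67}  # biscuit
puppy_2 = {"A1": percent(2, 10), "B1": percent(12, 24), "A2": 45, "B2": 80}  # praise


def summarize(phases):
    """Compare each later phase with the original baseline (percentage points)."""
    return {
        "initial_gain": phases["B1"] - phases["A1"],
        "retained_after_withdrawal": phases["A2"] - phases["A1"],
        "final_gain": phases["B2"] - phases["A1"],
    }


print(summarize(puppy_1))
# {'initial_gain': 47, 'retained_after_withdrawal': 0, 'final_gain': 47}
print(summarize(puppy_2))
# {'initial_gain': 30, 'retained_after_withdrawal': 25, 'final_gain': 60}
```

Read this way, the biscuit produces the larger initial gain (47 points versus 30), but only the praise retains any gain after withdrawal (25 points versus 0) and it ends with the larger final gain, which is the pattern the text describes.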

Finally, different treatments can be applied to different subjects in order to compare results. Figure 4.2 shows the application of the A-B-A-B-A design on three subjects, each receiving a different treatment. From the chart it is clear that treatment 1 increases the desired behavior but that this behavior returns quickly to baseline when the treatment is discontinued. Treatment 2 shows no change in behavior in any of the trials. Treatment 3, however, shows an increase in the desired behavior when the treatment is applied, and this increase remains consistent even when the treatment is discontinued. If the goal is a longstanding increase in behavior, treatment 3, from the information available and of the choices offered, is the best approach.

The A-B-A-B design can be altered to include any number of baselines and treatment phases. To determine the effects of treatment and the degree of extinction only, a simpler A-B-A design would be used. To determine if additional training changes the ultimate results, a more complex A-B-A-B-A-B-A-B design could be employed. Each study could be completed with only one subject, or the results of different subjects with different treatment approaches could be compared (see Figure 4.2). The complexity of the study depends on both the original intention and feasibility.

Figure 4.2: Application of Three Different Treatments on Three Single Subjects

When a more complex schedule is applied, the extinction phase for one treatment can become the baseline phase for an additional treatment. Simply put, this method allows successive treatments to be tested with only a single subject. Figure 4.3 demonstrates the hypothetical outcome of using a single subject to determine the effects of three different treatment methods. The initial A is considered the baseline for Treatment A, and B1 represents the application of this first treatment. A2 represents the removal of Treatment A and also acts as the baseline for the second treatment. The second treatment is applied at B2, followed by another extinction phase (A3), which then becomes the baseline for the third treatment applied at B3. The final A represents the extinction phase of the final treatment method.

Looking at the results in Figure 4.3, you will notice that Treatment A appears to have had the most effect. However, once Treatment A was discontinued, the behavior returned nearly to baseline. Treatment B had a moderate effect, but during the extinction phase the behavior remained at a moderate level, suggesting little or no extinction of the treatment effect. Finally, Treatment C appears to have reduced the desired behavior and is therefore acting as a punisher rather than a reinforcer. Luckily, the behavior returned to baseline once Treatment C was discontinued. The results therefore show that the best treatment in terms of change is Treatment A, but the best treatment in terms of longer standing impact is Treatment B. Treatment C is the worst treatment, as it had the opposite of the intended effect.

Figure 4.3: Determination of Best Training Method

Multiple Baselines

Some concerns with the A-B-A-B design include the effects of maturation, timing of training, amount of training, and other threats to internal validity. For instance, through the A-B-A-B design, we have no way of knowing if a little more biscuit training would have increased the response beyond that produced by the praise. We also don't know if other variables aside from those measured actually caused the change in behavior unless we spend a great amount of time and effort controlling for these possible confounds. A way to minimize these weaknesses is through the technique known as multiple baselines.

The multiple baselines approach uses a varying time schedule that allows the researcher to determine if the application of treatment is truly influencing the change in behavior. For example, we might vary the length of time in the initial baseline determination and then apply the treatment to determine if the change in behavior corresponds with the introduction of treatment. We might also apply varying amounts of a specific treatment (verbal praise versus verbal and physical praise) to better understand not only the best treatment but also the best amount of treatment.

Figure 4.4 uses the multiple baseline design to determine if the timing of treatment is important. Notice how the behavioral change took place for each subject immediately following the introduction of treatment. This graph shows that the timing of treatment is not important but also shows that change is directly related to the treatment. Figure 4.5 tells us a different story. Had we only tested subject A, we might have assumed that the treatment effected the change. Subjects B and C, however, demonstrate that the treatment actually had no effect on the behavioral change. In fact, the behavior changed at the same time for each subject regardless of when the treatment was applied. Without the time-lagged approach of the multiple baseline design, this phenomenon would have been missed.
Figure 4.4: Determination of Best Training Method

Figure 4.5: Determination of Best Training Method
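The time-lagged logic of the multiple baseline design can be illustrated with a small Python sketch. The session scores, onset sessions, and threshold below are invented for the illustration and are not the data behind Figures 4.4 and 4.5; the function names are likewise hypothetical. The idea is simply to check, subject by subject, whether the behavior changes at the session where that subject's treatment begins.

```python
def change_session(scores, threshold):
    """Index of the first session whose score reaches `threshold`, else None."""
    for session, score in enumerate(scores):
        if score >= threshold:
            return session
    return None


def change_tracks_treatment(subjects, threshold):
    """True if every subject's behavior changes exactly at its treatment onset."""
    return all(
        change_session(s["scores"], threshold) == s["onset"] for s in subjects
    )


# Pattern like the one described for Figure 4.4: staggered onsets, and each
# subject's behavior changes when that subject's treatment begins.
staggered_change = [
    {"onset": 3, "scores": [2, 2, 2, 8, 8, 8, 8, 8, 8, 8]},
    {"onset": 5, "scores": [2, 2, 2, 2, 2, 8, 8, 8, 8, 8]},
    {"onset": 7, "scores": [2, 2, 2, 2, 2, 2, 2, 8, 8, 8]},
]

# Pattern like the one described for Figure 4.5: staggered onsets, but every
# subject's behavior changes at the same session regardless of onset.
simultaneous_change = [
    {"onset": 3, "scores": [2, 2, 2, 8, 8, 8, 8, 8, 8, 8]},
    {"onset": 5, "scores": [2, 2, 2, 8, 8, 8, 8, 8, 8, 8]},
    {"onset": 7, "scores": [2, 2, 2, 8, 8, 8, 8, 8, 8, 8]},
]

print(change_tracks_treatment(staggered_change, threshold=5))     # True
print(change_tracks_treatment(simultaneous_change, threshold=5))  # False
```

In the second data set, looking at the first subject alone would suggest a treatment effect; only the staggered onsets of the other two subjects reveal that something other than the treatment is driving the change.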

Chapter Conclusion

Single subject designs rely on the application of treatment to a single subject or group of single subjects in order to determine treatment effects. They can be used to determine the effects of Pavlov's classical conditioning, including baseline, treatment, extinction, and spontaneous recovery. They can also be used to determine the effects of the same treatment on different subjects, different treatments on different subjects, and even different treatments on the same subject. Multiple baselines go beyond the simple A-B-A-B designs and control for more variables, providing a better understanding of the outcome as well as increased generalizability to the population at large. With multiple baseline designs, a time-lagged approach can be used to determine if the introduction of the treatment is actually causing the change in behavior.

Single subject designs are often considered the research method of choice for behavioral research attempting to measure changes in behavior due to the application of reinforcement. They provide a powerful means for teachers to determine the most effective reward or discipline technique for a specific student, or for managers to determine the best method of compensation or reward. However, the issue of generalizability is significant. Can we truly say that if the treatment causes a change in a single subject or even a small group of single subjects, this change will also occur within the whole population? Generalizability, often a major concern in research, is addressed in Chapter 5.

Chapter 5: Experimental Design


5.1 Introduction 5.2 Pre-Experimental Design 5.3 Quasi-Experimental Design 5.4 True Experimental Design 5.5 Chapter Conclusion

Introduction

The design of any experiment is of utmost importance because the experiment has the power to be the most rigorous type of research. The design, however, is always dependent on feasibility. The best approach is to control for as many confounding variables as possible in order to eliminate or reduce errors in the assumptions that will be made. It is also extremely desirable that any threats to internal or external validity be neutralized. In a perfect world, all research would do this and the results of research would be accurate and powerful. In the real world, however, this is rarely the case. We are often dealing with human subjects, which in itself confounds any study. We are also dealing with the constraints of time and situation, often resulting in less than perfect conditions in which to gather information.

There are three basic experimental designs, each containing subsets with specific strengths and weaknesses. These three basic designs are: (1) pre-experimental design; (2) quasi-experimental design; and (3) true experimental design. They are discussed below and, as you will discover, are addressed in order of effectiveness.


Pre-Experimental Design

Pre-experimental designs are so named because they follow basic experimental steps but fail to include a control group. In other words, a single group is often studied but no comparison with an equivalent non-treatment group is made. Examples include the following:

The One-Shot Case Study. In this arrangement, subjects are presented with some type of treatment, such as a semester of college work experience, and then the outcome measure is applied, such as college grades. Like all experimental designs, the goal is to determine if the treatment had any effect on the outcome. Without a comparison group, it is impossible to determine if the outcome scores are any higher than they would have been without the treatment. And, without any pretest scores, it is impossible to determine if any change within the group itself has taken place.

One Group Pretest Posttest Study. A benefit of this design over the previously discussed design is the inclusion of a pretest to determine baseline scores. To use this design in our study of college performance, we could compare college grades prior to gaining the work experience to the grades after completing a semester of work experience. We can now at least state whether a change in the outcome or dependent variable has taken place. What we cannot say is whether this change would have occurred even without the application of the treatment or independent variable. It is possible that mere maturation caused the change in grades and not the work experience itself.

The Static Group Comparison Study. This design attempts to make up for the lack of a control group but falls short in relation to showing if a change has occurred. In the static group comparison study, two groups are chosen, one of which receives the treatment and the other does not. A posttest score is then determined to measure the difference, after treatment, between the two groups. As you can see, this study does not include any pretesting, and therefore any differences between the two groups prior to the study are unknown.

Table 5.1: Diagrams of Pre-Experimental Designs

True Experimental Design True experimental design makes up for the shortcomings of the two designs previously discussed. They employ both a control group and a means to measure the change that occurs in both groups. In this sense, we attempt to control for all confounding variables, or at least consider their impact, while attempting to determine if the treatment is what truly caused the change. The true experiment is often thought of as the only research method that can adequately measure the cause and effect relationship. Below are some examples: Posttest Equivalent Groups Study. Randomization and the comparison of both a control and an experimental group are utilized in this type of study. Each group, chosen and assigned at random is presented with either the treatment or some type of control. Posttests are then given to each subject to determine if a difference between the two groups exists. While this is approaching the best method, it falls short in its lack of a pretest measure. It is difficult to determine if the difference apparent at the end of the study is an actual change from the possible difference at the beginning of the study. In other words, randomization does well to mix subjects but it does not completely assure us that this mix is truly creating an equivalency between the two groups. Pretest Posttest Equivalent Groups Study. Of those discussed, this method is the most effective in terms of demonstrating cause and effect but it is also the most difficult to perform. The pretest posttest equivalent groups design provides for both a control group and a measure of change but also adds a pretest to assess any differences between the groups prior to the study taking place. To apply this design to our work experience study, we would select students from the college at random and then place the chosen students into one of two groups using random assignment. 
We would then measure the previous semester's grades for each group to get a mean grade point average. The treatment, or work experience, would be applied to one group and a control would be applied to the other. It is important that the two groups be treated in a similar manner to control for variables such as socialization, so we may allow our control group to participate in some activity such as a softball league while the other group is participating in the work experience program. At the end of the semester, the experiment would end and the next semester's grades would be gathered and compared. If we found that the change in grades for the experimental group was significantly different from the change in the grades of our control group, we could reasonably argue that one semester of work experience compared to one semester of non-work related activity results in a significant difference in grades.
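The steps of the pretest posttest equivalent groups design can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the student labels, the grade point averages, and the size of the gains are all hypothetical, and the helper names assign_groups and mean_change are invented for this example. Whether the resulting difference is statistically significant would still require an inferential test, which is not shown here.

```python
import random
import statistics


def assign_groups(subjects, seed=0):
    """Randomly split subjects into an experimental and a control group."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def mean_change(pretest, posttest, group):
    """Mean posttest-minus-pretest change for the subjects in one group."""
    return statistics.mean(posttest[s] - pretest[s] for s in group)


# Hypothetical subjects, randomly assigned to two groups.
students = [f"student_{i}" for i in range(1, 9)]
experimental, control = assign_groups(students)

# Hypothetical pretest: previous semester's grade point average.
pretest = {s: 2.5 + 0.1 * i for i, s in enumerate(students)}

# Hypothetical posttest: next semester's grade point average.
# The gains (0.3 with work experience, 0.1 without) are made up for illustration.
posttest = {
    s: pretest[s] + (0.3 if s in experimental else 0.1) for s in students
}

# Compare the change in each group rather than the raw posttest scores.
difference = (
    mean_change(pretest, posttest, experimental)
    - mean_change(pretest, posttest, control)
)
print(f"Difference in mean grade change: {difference:.2f}")
```

Comparing the change in each group, rather than only the final grades, is what the pretest adds: any difference that existed between the groups before the treatment is subtracted out.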

Table 5.3: Diagrams of True Experimental Designs

Chapter Conclusion The experiment, especially the true experimental design, is often the method of choice when attempting to determine a cause and effect relationship. Utilizing randomization and the pre-testing and post-testing of both an experimental group and a control group allows us to control for more confounding variables than any other research method. These confounding variables, when not addressed, can often result in inaccurate findings. Controlling for confounding variables is important in research and especially important in the experimental designs. This process helps us assure valid results both internally and externally. The threats to internal validity, those that apply to the experimental situation itself, and external validity, those relating to the generalizability of our results to the real world, are also issues of great concern to researchers. As the saying goes: garbage in, garbage out. If we start with a flawed design, we will end up with flawed results. As the degree of control for each of the designs discussed increases, the difficulty in performing the research also increases. Feasibility is always an issue, and even when the most stringent control is used, the mere fact that the subjects have agreed to participate in the experiment may have a negative effect on the study's generalizability. Are volunteer subjects truly representative of the population at large? As you can see, there are varying degrees of experimental research, but there is no perfect experiment that controls for all possible variables and assures us of 100% generalizability.

Chapter 6: Other Research Designs


6.1 Introduction 6.2 Historical Research 6.3 Developmental Research 6.4 Qualitative Research 6.5 Chapter Conclusion

Introduction Non-experimental studies play an important role in research and are likely more popular than their more standardized counterpart. While quantitative information is typically gathered, the focus of the research designs to follow is more qualitative. The major designs discussed below include historical, developmental, and qualitative research designs.

Historical Research When we think of research, we often think of a laboratory or classroom where two or more groups receive different treatments or alternative training methods. We would then determine if the treatment or training had an impact on some outcome measure. This type of research is the best at predicting cause and effect relationships and is often cited as the most rigorous and standardized form. While the experiments described above have a definite place in the research arena, sometimes we gain the best knowledge by looking into the past rather than into the future. Historical research attempts to do just that. Through a detailed analysis of historical data, we can determine, perhaps to a lesser extent, cause and effect relationships. We can also help prevent the present day teachers, managers, and other users of research from making the same mistakes that were made in the past. Historical research can also mean gathering data from situations that have already occurred and performing statistical analysis on this data just as we would in a traditional experiment. The one key difference between this type of research and the type described in the first paragraph concerns the manipulation of data. Since historical research relies on data from the past, there is no way to manipulate it. Studying the grades of older students, for example, and younger students may provide some insight into the differences between these two groups, but manipulating the work experience is impossible. Therefore, historical research can often lead to present day experiments that attempt to further explore what has occurred in the past.


Developmental Research The purpose of developmental research is to assess changes over an extended period of time. For example, developmental research would be an ideal choice to assess the differences in academic and social development in low-income versus high-income neighborhoods. It is most common when working with children as subjects for obvious reasons and can be undertaken using several methods: longitudinal, cross sectional, and cross sequential. Longitudinal Studies. Longitudinal studies assess changes over an extended period of time by looking at the same groups of subjects for months or even years. Looking at academic and social development, we may choose a small sample from each of the low- and high-income areas and assess them on various measures every six months for a period of ten years. The results of longitudinal studies can provide valuable qualitative and quantitative data regarding the differences in development between various groups. The major concern with longitudinal research, aside from the obvious lack of control, randomization, and standardization, is the length of time it takes to complete the study. Imagine starting a project that must be constantly maintained for a period of ten or more years. The subject mortality rate due to illness, relocation, and other factors alone could result in major concerns, not to mention the amount of energy and time that must be devoted to the research. Cross Sectional Studies. One way to reduce the amount of time and the mortality rate in a developmental study is to assess different ages at the same time rather than using the same groups over an extended period. A cross sectional study might look at the same theory regarding academic and social development but assess a small group of three year olds, six year olds, nine year olds, and twelve year olds at the same time.
The assumption is that the differences between the age ranges represent natural development and that if a longitudinal study had been used, similar results would be found. The obvious benefit is in the length of time it takes to complete the study, but the assumption that the six year old group will achieve the same academic and social development as the nine year old group may be invalid. Cross Sequential Studies. Cross sequential studies combine both longitudinal and cross sectional methods in an attempt to both shorten the length of the research and minimize developmental assumptions. For this method, groups of children of different ages (three, six, and nine, for example) may be studied for a period of three years to both assess developmental changes and assure that the typical three year old is similar to the typical six year old after three years of development.

Qualitative Research The purpose of qualitative research is to gather non-numerical data to help explain or develop a theory about a relationship. Methods used to gain qualitative information include surveys, observation, case studies, and interviews, and the information derived from these means can be combined into a story-like description of what is happening or what has happened in the past. For example, to better understand variables that are difficult to quantify, such as attitudes, religious beliefs, or political opinions, qualitative research could be used to draw a picture about a specific population or group of people. Qualitative research is often also used as a pilot study in order to gather information that may later lead to a quantitative study.

Chapter Conclusion While the majority of the research studies published in professional journals are quantitative in nature, other types of research hold an important place in the gathering and understanding of information. These non-quantitative studies, while they may include some numerical data, often help provide more color, understanding, and personalization to information that might otherwise be missed. Historical research helps us understand what has occurred in the past so that we can change the present and future. Developmental research helps us to understand maturation and natural development in order to better assess the needs of people at various stages of life. Qualitative research helps us to add individuality and to explain phenomena that are not easily quantified. Any of these types of research can also be used to gather information that may later lead to a quantitative exploration of data.

Chapter 7: Variables, Validity, and Reliability


7.1 Introduction 7.2 Variables 7.3 Test Validity and Reliability 7.4 Experimental Validity 7.5 Chapter Conclusion

Introduction In the previous two units we have discussed the purpose of research, the research report, subject selection, and the various types of research design. The final unit, including this chapter, will begin to add quantitative knowledge to your research repertoire, which will allow you to critically analyze not only the methodologies of research but also the statistical results. Before analyzing any data, however, and even before testing any subjects, the issues of variable selection and control, reliability, and validity must be addressed. The purpose of any research is to determine if a theory is supported or not based on statistical analysis. A theory is an educated guess about a relationship, but in order for research to be conducted on a theory, it must first be operationalized. To operationalize a theory, all variables must be defined and the methods of conducting the research must be determined. Once this is done, the resulting statement about the relationship is called a hypothesis. The hypothesis is what gets tested in any research study.

Variables A variable, as opposed to a constant, is simply anything that can vary. If we were to study the effects of work experience on college performance, we might look at the grades of students who have worked prior to starting college and the grades of students who did not work prior to starting college. In this study, you may notice that both groups are students so student status remains constant between the two groups. You may also notice that work experience is not the same between the two groups, therefore work experience varies and is considered a variable. If we choose students for each group who are of similar age or similar background, we are holding these aspects constant and therefore, they too will not vary within our study. Every experiment has at least two types of variables: independent and dependent. The independent variable (IV) is often thought of as our input variable. It is independent of everything that occurs during the experiment because once it is chosen it does not change. In our experiment on college performance, we chose two groups at the onset, namely, those with work experience and those without. This variable makes up our two independent groups and is therefore called the independent variable. The dependent variable (DV), or outcome variable, is dependent on our independent variable or what we start with. In this study, college grades would be our dependent variable because it is dependent on work experience. If we chose to also look at men versus women, or older students versus younger students, then these variables would be other independent variables and the outcome, our dependent variable (college grades), would be dependent on them as well. Remember that whatever is the same between the two groups is considered a constant because they do not vary between groups but rather remain the same and therefore do not affect the outcome of each group differently. Confounding Variables. 
Researchers must be aware that variables outside of the independent variable(s) may confound or alter the results of a study. As previously discussed, any variable that can potentially play a role in the outcome of a study but which is not part of the study is called a confounding variable. If, for instance, we had two groups in the above mentioned study but did not control for age then age itself may be a confound. Imagine comparing students with work experience with a mean age of 40 with students without work experience and a mean age of 18. Could we reasonably say that work experience caused the student to receive higher grades? This extraneous variable can play havoc on our results as can any intervening variable such as motivation or attention. Addressing confounds before they alter the results of your study is always a wise decision.


Test Validity and Reliability Whenever a test or other measuring device is used as part of the data collection process, the validity and reliability of that test are important. Just as we would not use a math test to assess verbal skills, we would not want to use a measuring device for research that was not truly measuring what we purport it to measure. After all, we are relying on the results to show support or a lack of support for our theory, and if the data collection methods are erroneous, the data we analyze will also be erroneous. Test Validity. Validity refers to the degree to which our test or other measuring device is truly measuring what we intended it to measure. The test question 1 + 1 = _____ is certainly a valid basic addition question because it is truly measuring a student's ability to perform basic addition. It becomes less valid as a measurement of advanced addition because, while it addresses some required knowledge for addition, it does not represent all of the knowledge required for an advanced understanding of addition. On a test designed to measure knowledge of American History, this question becomes completely invalid. The ability to add two single digits has nothing to do with history. For many constructs, or variables that are artificial or difficult to measure, the concept of validity becomes more complex. Most of us agree that 1 + 1 = _____ would represent basic addition, but does this question also represent the construct of intelligence? Other constructs include motivation, depression, anger, and practically any human emotion or trait. If we have a difficult time defining the construct, we are going to have an even more difficult time measuring it. Construct validity is the term given to a test that measures a construct accurately, and there are different types of construct validity that we should be concerned with. Three of these, concurrent validity, content validity, and predictive validity, are discussed below. Concurrent Validity.
Concurrent validity refers to a measurement device's ability to vary directly with a measure of the same construct or inversely with a measure of an opposite construct. It allows you to show that your test is valid by comparing it with an already valid test. A new test of adult intelligence, for example, would have concurrent validity if it had a high positive correlation with the Wechsler Adult Intelligence Scale, since the Wechsler is an accepted measure of the construct we call intelligence. An obvious concern relates to the validity of the test against which you are comparing your test. Some assumptions must be made because there are many who argue the Wechsler scales, for example, are not good measures of intelligence. Content Validity. Content validity is concerned with a test's ability to include or represent all of the content of a particular construct. The question 1 + 1 = ___ may be a valid basic addition question. Would it represent all of the content that makes up the study of mathematics? It may be included on a scale of intelligence, but does it represent all of intelligence? The answer to these questions is obviously no. To develop a valid test of intelligence, not only must there be questions on math, but also questions on verbal reasoning, analytical ability, and every other aspect of the construct we call intelligence. There is no easy way to determine content validity aside from expert opinion. Predictive Validity. In order for a test to be a valid screening device for some future behavior, it must have predictive validity. The SAT is used by college screening committees as one way to predict college grades. The GMAT is used to predict success in business school. And the LSAT is used as a means to predict law school performance. The main concern with these, and many other predictive measures, is predictive validity because without it, they would be worthless.

We determine predictive validity by computing a correlation coefficient comparing SAT scores, for example, and college grades. If they are directly related, then we can make a prediction regarding college grades based on SAT score. We can show that students who score high on the SAT tend to receive high grades in college. Test Reliability. Reliability is synonymous with the consistency of a test, survey, observation, or other measuring device. Imagine stepping on your bathroom scale and weighing 140 pounds only to find that your weight on the same scale changes to 180 pounds an hour later and 100 pounds an hour after that. Based on the inconsistency of this scale, any research relying on it would certainly be unreliable. Consider an important study on a new diet program that relies on your inconsistent or unreliable bathroom scale as the main way to collect information regarding weight change. Would you consider their results accurate? A reliability coefficient is often the statistic of choice in determining the reliability of a test. This coefficient merely represents a correlation (discussed in chapter 8), which measures the intensity and direction of a relationship between two or more variables. Test-Retest Reliability. Test-retest reliability refers to the test's consistency among different administrations. To determine the coefficient for this type of reliability, the same test is given to a group of subjects on at least two separate occasions. If the test is reliable, the scores that each student receives on the first administration should be similar to the scores on the second. We would expect the relationship between the first and second administration to be a high positive correlation. One major concern with test-retest reliability is what has been termed the memory effect. This is especially true when the two administrations are close together in time.
For example, imagine taking a short 10-question test on vocabulary and then ten minutes later being asked to complete the same test. Most of us will remember our responses and when we begin to answer again, we may just answer the way we did on the first test rather than reading through the questions carefully. This can create an artificially high reliability coefficient as subjects respond from their memory rather than the test itself. When a pre-test and post-test for an experiment are the same, the memory effect can play a role in the results. Parallel Forms Reliability. One way to assure that memory effects do not occur is to use a different pre- and posttest. In order for these two tests to be used in this manner, however, they must be parallel or equal in what they measure. To determine parallel forms reliability, a reliability coefficient is calculated on the scores of the two measures taken by the same group of subjects. Once again, we would expect a high and positive correlation if we are to say the two forms are parallel. Inter-Rater Reliability. Whenever observations of behavior are used as data in research, we want to assure that these observations are reliable. One way to determine this is to have two or more observers rate the same subjects and then correlate their observations. If, for example, rater A observed a child act out aggressively eight times, we would want rater B to observe the same number of aggressive acts. If rater B witnessed 16 aggressive acts, then we know at least one of these two raters is incorrect. If their ratings are positively correlated, however, we can be reasonably sure that they are measuring the same construct of aggression. It does not, however, assure that they are measuring it correctly, only that they are both measuring it the same.
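A reliability coefficient of the kind described above can be sketched as a Pearson correlation between two sets of scores. The example below is a minimal illustration, not a prescribed procedure: the function name pearson_r and all of the scores and counts are hypothetical. It also illustrates the closing point about inter-rater reliability, since a rater who consistently counts twice as many aggressive acts as another still correlates perfectly with that rater.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2 scores)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / math.sqrt(spread_x * spread_y)


# Test-retest reliability: hypothetical scores for the same five subjects
# on two administrations of the same test.
first_administration = [12, 15, 9, 18, 14]
second_administration = [13, 14, 10, 17, 15]
test_retest_r = pearson_r(first_administration, second_administration)
print(f"Test-retest coefficient: {test_retest_r:.2f}")

# Inter-rater reliability: hypothetical counts of aggressive acts for four
# children. Rater B records exactly twice as many acts as rater A.
rater_a = [8, 3, 5, 10]
rater_b = [2 * count for count in rater_a]
inter_rater_r = pearson_r(rater_a, rater_b)
print(f"Inter-rater coefficient: {inter_rater_r:.2f}")
```

The second result shows why a high coefficient speaks to consistency only. The two raters agree perfectly on which children are more aggressive, yet their counts never match, so at least one of them is not measuring correctly.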

Experimental Validity If a study is valid, then it truly represents what it was intended to represent. Experimental validity refers to the manner in which variables influence both the results of the research and the generalizability of those results to the population at large. It is broken down into two groups: (1) Internal Validity and (2) External Validity.

Internal Validity. Internal validity refers to a study's ability to determine if a causal relationship exists between one or more independent variables and one or more dependent variables. In other words, can we be reasonably sure that the change (or lack of change) was caused by the treatment? Researchers must be aware of aspects that may reduce the internal validity of a study and do whatever they can to control for these threats. These threats, if left ignored, can reduce validity to the point that any results are meaningless, rendering the entire study invalid. There are eight major threats to internal validity that are discussed below and summarized in Table 7.1. History. History refers to any event outside of the research study that can alter or affect subjects' performance. Since research does not occur within a vacuum, subjects often experience environmental events that are different from one another. These events can play a role in their performance and must therefore be addressed. One way to assure that these events do not impact the study is to control them, or make everyone's experience identical except for the independent variable(s). Since this is often impossible, using randomization procedures can often minimize this risk, assuring that outside events that occur in one group are also likely to occur in the other. Maturation. While not a major concern in very short studies such as a survey study, maturation can play a major role in longer-term studies. Maturation refers to the natural physiological or psychological changes that take place as we age. This is especially important in childhood and must be addressed through subject matching or randomization. For instance, an episode of major depression typically decreases significantly within a six-month period even without treatment. Imagine we tested a new medication designed to treat depression.
If our results showed that subjects who took this medication showed a significant decrease in depressive symptoms within six months, could we truly say that the medication caused the decrease in symptoms? Probably not, especially since maturation alone would have shown similar results. Testing. People tend to perform better at any activity the more they are exposed to that activity. Testing is no exception. When subjects, especially in single group studies, are given a test as a pretest and then the same test as a posttest, the chances that they will perform better the second time due merely to practice is a concern. For this reason, two group studies with a control group are recommended. Statistical Regression. Statistical regression, or regression to the mean, is a concern especially in studies with extreme scores. It refers to the tendency for subjects who score very high or very low to score more toward the mean on subsequent testing. If you get a 99% on a test, for instance, the odds that your score will be lower the second time are much greater than the odds of increasing your score. Instrumentation. If the measurement device(s) used in your study changes during the course of the study, changes in scores may be related to the instrument rather than the independent variable. For instance, if your pretest and posttest are different, the change in scores may be a result of the second test being easier than the first rather than the teaching method employed. For this reason, it is recommended that pre- and posttests be identical or at least highly correlated. Selection. Selection refers to the manner in which subjects are selected to participate in a study and the manner in which they are assigned to groups. If there are differences between the groups prior to the study taking place, these differences will continue throughout the study and may appear as a change in a statistical analysis. 
Addressing these differences through subject matching or randomization is highly recommended. Experimenter Bias. We engage in research in order to learn something new or to support a belief or theory. Therefore, we as researchers may be biased toward the results we want. This bias can affect our observations and possibly even result in blatant research errors that skew the study in the direction we want. Using an experimenter who is unaware of the anticipated results (usually as part of a double blind study, in which neither the subjects nor the tester knows the anticipated results) works best to control for this bias. Mortality. Mortality, or subject dropout, is always a concern to researchers. It can drastically affect the results when the mortality rate or mortality quality is different between groups. Imagine in the work experience study if many motivated students dropped out of one group due to illness and many low motivated students dropped out of the other group due to personal factors. The result would be a difference in motivation between the two groups at the end and could therefore invalidate the results.

Table 7.1: Controlling for Threats to Internal Validity

Threat to Internal Validity    Controlling Threat
History                        Random selection, random assignment
Maturation                     Subject matching, randomization
Testing                        Control group
Statistical Regression         Omit extreme scores, randomization
Instrumentation                Instrumental consistency, assure alternative form reliability
Selection                      Random selection, random assignment
Experimenter Bias              Double blind study
Mortality                      Subject matching and omission
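Of the threats summarized above, statistical regression is perhaps the easiest to see in a small simulation. The sketch below rests on a simplifying assumption that is not part of the text: each observed score is treated as a stable true ability plus random measurement noise. All numbers and the function name simulate_retest are hypothetical.

```python
import random
import statistics


def simulate_retest(n_subjects=10000, seed=1):
    """Return the mean first-test and retest scores of the top 10% of first-test scorers."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    # Assumption: observed score = stable true ability + random noise.
    ability = [rng.gauss(70, 10) for _ in range(n_subjects)]
    first_test = [a + rng.gauss(0, 10) for a in ability]
    second_test = [a + rng.gauss(0, 10) for a in ability]

    # Select the subjects with extreme (top 10%) scores on the first test.
    cutoff = sorted(first_test)[int(0.9 * n_subjects)]
    top = [i for i in range(n_subjects) if first_test[i] >= cutoff]

    first_mean = statistics.mean(first_test[i] for i in top)
    second_mean = statistics.mean(second_test[i] for i in top)
    return first_mean, second_mean


first_mean, second_mean = simulate_retest()
print(f"Top scorers, first test:  {first_mean:.1f}")
print(f"Top scorers, second test: {second_mean:.1f}")
```

No treatment of any kind is applied between the two tests, yet the extreme scorers drift back toward the overall mean on the retest. A single group study that selected subjects for their extreme scores could mistake this drift for a treatment effect.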

External Validity. External validity refers to the generalizability of a study. In other words, can we be reasonably sure that the results of our study, which consists of a sample of the population, truly represent the entire population? Threats to external validity can result in significant results within a sample group but an inability for this to be generalized to the population at large. Four of these threats are discussed below and summarized in Table 7.2. Demand Characteristics. Subjects are often provided with cues to the anticipated results of a study. When asked a series of questions about depression, for instance, subjects may become wise to the hypothesis that certain treatments work better in treating mental illness. When subjects become wise to anticipated results (often called a placebo effect), they can begin to exhibit performance that they believe is expected of them. Making sure that subjects are not aware of anticipated outcomes (referred to as a blind study) reduces the possibility of this threat. Hawthorne Effects. Similar to a placebo effect, the Hawthorne effect refers to the research finding that the mere presence of others watching your performance causes a change in your performance. If this change is significant, can we be reasonably sure that it will also occur when no one is watching? Addressing this issue can be tricky, but employing a control group to measure the Hawthorne effect of those not receiving any treatment can be very helpful. In this sense, the control group is also being observed and will exhibit similar changes in behavior as the experimental group, therefore negating the Hawthorne effect. Order Effects (or Carryover Effects). Order effects refer to the order in which treatment is administered and can be a major threat to external validity if multiple treatments are used.
If subjects are given medication for two months, therapy for another two months, and no treatment for another two months, it would be possible, and even likely, that the level of depression would be least after the final no treatment phase. Does this mean that no treatment is better than the other two treatments? It likely means that the benefits of the first two treatments have carried over to the last phase, artificially elevating the no treatment success rates. Treatment Interaction Effects. The term interaction refers to the fact that treatment can affect people differently depending on the subject's characteristics. Potential threats to external validity include the interaction between treatment and any of the following: selection, history, and testing. As an example, assume a group of subjects volunteer for a study on work experience and college grades. One group agrees to find part time work the summer before starting their freshman year and the other group agrees to join a softball league over the summer. The group that agreed to work is likely inherently different from the group that agreed to play softball. The selection itself may have placed higher motivated subjects in one group and lower motivated students in the other. If the work group earns higher grades in the first semester, can we truly say it was caused by the work experience? It is likely that the motivation caused both the work experience and the higher grades.

Table 7.2: Controlling for Threats to External Validity

Threat to External Validity      Controlling Threat
Demand Characteristics           Blind study, control group
Hawthorne Effect                 Control group
Order Effects                    Counterbalancing treatment order, multiple groups
Treatment Interaction Effects    Subject matching, naturalistic observation
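Counterbalancing treatment order, listed above as the control for order effects, can be sketched as follows. This is one simple way to do it, assuming full counterbalancing in which every possible order is used equally often. The subject labels and the helper name assign_orders are hypothetical, and the three treatments are taken from the depression example in the text.

```python
from itertools import permutations

# The three phases from the depression example.
treatments = ["medication", "therapy", "no treatment"]

# Every possible order in which the three phases can be administered.
orders = list(permutations(treatments))


def assign_orders(subjects, orders):
    """Cycle through the orders so that each one is used equally often."""
    return {
        subject: orders[i % len(orders)] for i, subject in enumerate(subjects)
    }


# Hypothetical subjects; a multiple of six keeps the orders balanced.
subjects = [f"subject_{i}" for i in range(1, 13)]
schedule = assign_orders(subjects, orders)

for subject, order in schedule.items():
    print(subject, "->", ", ".join(order))
```

Because each treatment appears equally often in the first, second, and third position, any carryover from earlier phases is spread evenly across the treatments instead of favoring whichever one happens to come last.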


Chapter Conclusion Understanding how to manipulate variables and control for potential threats to experimental validity can be the difference between a solid research study and a near meaningless study. Variables are the basis for all of the statistics you will perform on your data. If you choose your variables wisely and make sure to control for as many confounds and threats to experimental validity as possible, your study is much more likely to add to the knowledge base in your area of specialty. Assuring that the measurement devices used are both valid and reliable will also add a lot to significant results. When any of these is called into question, the entire study gets called into question. Once again, garbage in garbage out.

Chapter 8: Descriptive Statistics


8.1 Introduction 8.2 Scales of Measurement 8.3 Types of Distributions 8.4 Measures of Central Tendency 8.5 Measures of Variability 8.6 The Correlation 8.7 Chapter Conclusion

Introduction A statistic is a numerical representation of information. Whenever we quantify or apply numbers to data in order to organize, summarize, or better understand the information, we are using statistical methods. These methods can range from somewhat simple computations such as determining the mean of a distribution to very complex computations such as determining factors or interaction effects within a complex data set. This chapter is designed to present an overview of statistical methods in order to better understand research results. Very few formulas or computations will be presented, as the goal is merely to understand statistical theory. Before delving into theory, it is important to understand some basics of statistics. There are two major branches of statistics, each with specific goals and specific formulas. The first, descriptive statistics, refers to the analysis of data of an entire population. In other words, descriptive statistics is merely using numbers to describe a known data set. The term population means we are using the entire set of possible subjects as opposed to just a sample of these subjects. For instance, the average test grade of a third grade class would be a descriptive statistic because we are using all of the students in the class to determine a known average. The second, inferential statistics, has two goals: (1) to determine what might be happening in a population based on a sample of the population (often referred to as estimation) and (2) to determine what might happen in the future (often referred to as prediction). Thus, the goals of inferential statistics are to estimate and/or predict. To use inferential statistics, only a sample of the population is needed. Descriptive statistics, however, require that the entire population be used. Many of the descriptive techniques are also used for inferential data, so we'll discuss these first. Let's start with a brief summary of data quality.
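The distinction between describing a population and estimating from a sample can be illustrated with the third grade class example. The grades below are hypothetical, and the sample size of four is an arbitrary choice for this sketch.

```python
import random
import statistics

# Hypothetical test grades for every student in a third grade class.
class_grades = [82, 90, 75, 68, 95, 88, 79, 91]

# Descriptive statistic: the whole class is used, so the mean is a known value.
class_mean = statistics.mean(class_grades)
print(f"Class mean (descriptive): {class_mean}")

# Inferential use: only a sample is available, so its mean is an estimate
# of the class mean and will usually differ from it somewhat.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = rng.sample(class_grades, 4)
sample_mean = statistics.mean(sample)
print(f"Sample mean (estimate):   {sample_mean}")
```

The computation is the same in both cases. What changes is the claim being made: the first mean describes the class exactly, while the second is only an estimate of it.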

Scales of Measurement Statistical information, including numbers and sets of numbers, has specific qualities that are of interest to researchers. These qualities, including magnitude, equal intervals, and absolute zero, determine what scale of measurement is being used and therefore what statistical procedures are best. Magnitude refers to the ability to know if one score is greater than, equal to, or less than another score. Equal intervals means that the possible scores are each an equal distance from each other. And finally, absolute zero refers to a point where none of the scale exists or where a score of zero can be assigned. When we combine these three scale qualities, we can determine that there are four scales of measurement. The lowest level is the nominal scale, which represents only names and therefore has none of the three qualities. A list of students in alphabetical order, a list of favorite cartoon characters, or the names on an organizational chart would all be classified as nominal data. The second level, called ordinal data, has magnitude only, and can be looked at as any set of data that can be placed in order from highest to lowest but where there is no absolute zero and no equal intervals. Examples of this type of scale would include Likert Scales and the Thurstone Technique. The third type of scale is called an interval scale, and possesses both magnitude and equal intervals, but no absolute zero. Temperature measured in degrees Fahrenheit or Celsius is a classic example of an interval scale because we know that each degree is the same distance apart and we can easily tell if one temperature is greater than, equal to, or less than another. These temperature scales, however, have no absolute zero because a reading of zero degrees does not mean that no temperature exists. Finally, the fourth and highest scale of measurement is called a ratio scale. A ratio scale contains all three qualities and is often the scale that statisticians prefer because the data can be more easily analyzed. 
Age, height, weight, and scores on a 100-point test would all be examples of ratio scales. If you are 20 years old, you not only know that you are older than someone who is 15 years old (magnitude) but you also know that you are five years older (equal intervals). With a ratio scale, we also have a point where none of the scale exists; when a person is born

his or her age is zero.

Table 8.1: Scales of Measurement

Scale Level   Scale of Measurement   Scale Qualities                             Example(s)
4             Ratio                  Magnitude, Equal Intervals, Absolute Zero   Age, Height, Weight, Percentage
3             Interval               Magnitude, Equal Intervals                  Temperature
2             Ordinal                Magnitude                                   Likert Scale, Anything rank ordered
1             Nominal                None                                        Names, Lists of words

Types of Distributions When datasets are graphed they form a picture that can aid in the interpretation of the information. The most commonly referred to type of distribution is called a normal distribution or normal curve and is often referred to as the bell shaped curve because it looks like a bell. A normal distribution is symmetrical, meaning the distribution and frequency of scores on the left side matches the distribution and frequency of scores on the right side. Many distributions fall on a normal curve, especially when large samples of data are considered. These normal distributions include height, weight, IQ, SAT scores, and GRE and GMAT scores, among many others. This is important to understand because if a distribution is normal, there are certain qualities that are consistent and help in quickly understanding the scores within the distribution. The mean, median, and mode of a normal distribution are identical and fall exactly in the center of the curve. This means that any score below the mean falls in the lower 50% of the distribution of scores and any score above the mean falls in the upper 50%. Also, the shape of the curve allows for a simple breakdown of sections. For instance, we know that about 68% of the population fall within one standard deviation of the mean (see Measures of Variability below) and that about 95% of the population fall within two standard deviations of the mean. Figure 8.1 shows the percentage of scores that fall between each standard deviation.

Figure 8.1: The Normal Curve

As an example, let's look at the normal curve associated with IQ scores (see Figure 8.2). The mean, median, and mode of Wechsler IQ scores is 100, which means that 50% of IQs fall at 100 or below and 50% fall at 100 or above. Since 68% of scores on a normal curve fall within one standard deviation of the mean and since an IQ score has a standard deviation of 15, we know that 68% of IQs fall between 85 and 115. Comparing the estimated percentages on the normal curve with the IQ scores, you can determine the percentile rank of scores merely by looking at the normal curve. For example, a person who scores 115 performed better than about 84% of the population (the 50% below the mean plus the 34% between 100 and 115), meaning that a score of 115 falls at about the 84th percentile. Add up the percentages below a score of 115 and you will see how this percentile rank was determined. See if you can find the percentile rank of a score of 70.

Figure 8.2: IQ Score Distributions
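The percentages read off the normal curve can also be computed directly. The following is a minimal sketch, assuming IQ scores follow an exactly normal distribution with a mean of 100 and a standard deviation of 15 as described above; the helper names share_within and percentile_rank are illustrative and are not part of the text.

```python
from statistics import NormalDist

# IQ scores as described in the text: mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)


def share_within(dist, k):
    """Proportion of scores within k standard deviations of the mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)


def percentile_rank(dist, score):
    """Percentage of the distribution falling at or below the given score."""
    return 100 * dist.cdf(score)


within_one = share_within(iq, 1)  # scores between 85 and 115
within_two = share_within(iq, 2)  # scores between 70 and 130

print(f"Within 1 SD (85 to 115): {within_one:.1%}")
print(f"Within 2 SD (70 to 130): {within_two:.1%}")
print(f"Percentile rank of 100: {percentile_rank(iq, 100):.0f}")
print(f"Percentile rank of 130: {percentile_rank(iq, 130):.0f}")
```

The same percentile_rank helper can be used to check your answer for a score of 70.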

Skew. The skew of a distribution refers to how the curve leans. When a curve has extreme scores on the right hand side of the distribution, it is said to be positively skewed. In other words, when high numbers are added to an otherwise normal distribution, the curve gets pulled in an upward or positive direction. When the curve is pulled downward by extreme low scores, it is said to be negatively skewed. The more skewed a distribution is, the more difficult it is to interpret. Figure 8.3: Distribution Skew

Kurtosis. Kurtosis refers to the peakedness or flatness of a distribution. A normal distribution or normal curve is considered a perfect mesokurtic distribution. Curves that contain more scores in the center than a normal curve tend to have higher peaks and are referred to as leptokurtic. Curves that have fewer scores in the center than the normal curve and/or more scores on the outer slopes of the curve are said to be platykurtic.

Figure 8.4: Distribution Kurtosis

Statistical procedures are designed specifically to be used with certain types of data, namely parametric and nonparametric. Parametric data consists of any data set that is of the ratio or interval type and which falls on a normally distributed curve. Non-parametric data consists of ordinal or nominal data that may or may not fall on a normal curve. When evaluating which statistic to use, it is important to keep this in mind. Using a parametric test (see Summary of Statistics in the Appendices) on non-parametric data can produce inaccurate results because of the difference in the quality of this data. Remember, in the ideal world, ratio, or at least interval data, is preferred and the tests designed for parametric data such as this tend to be the most powerful.

Measures of Central Tendency There are three measures of central tendency and each one plays a different role in determining where the center of the distribution or the average score lies. First, the mean is often referred to as the statistical average. To determine the mean of a distribution, all of the scores are added together and the sum is then divided by the number of scores. The mean is the preferred measure of central tendency because it is used more frequently in advanced statistical procedures; however, it is also the most susceptible to extreme scores. For example, if the scores 8, 9, and 10 were added together and divided by 3, the mean would equal 9. If the 10 was changed to 100, making it an extreme score, the mean would change drastically. The new mean of 8, 9, and 100 would be 39. The median is another method for determining central tendency and is the preferred method for highly skewed distributions. The median is simply the middle score when the scores are placed in order. For an even number of scores there will be two middle numbers and these are simply added together and divided by two in order to determine the median. Using the same distribution as above, the scores 8, 9, and 10 would have a median of 9. By changing the 10 to a score of 100 you'll notice that the median of this new positively skewed distribution does not change. The median remains equal to 9. Finally, the mode is the least used measure of central tendency. The mode is simply the most frequently occurring score. For distributions that have several peaks, the mode may be the preferred measure. There is no limit to the number of modes in a distribution. If two scores tie as the most frequently occurring score, the distribution would be considered bimodal. Three would be trimodal, and all distributions with two or more modes would be considered multimodal distributions.

Figure 8.5: Measures of Central Tendency

Interestingly, in a perfectly normal distribution, the mean, median, and mode are exactly the same. As the skew of the distribution increases, the mean and median begin to get pulled toward the extreme scores. The mean gets pulled the most which is why it becomes less valid the more skewed the distribution. The median gets pulled a little and the mode typically remains the same. You can often tell how skewed a distribution is by the distance between these three measures of central tendency.
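The effect of an extreme score on each measure can be checked with a few lines of code. This is a minimal sketch using the scores 8, 9, and 10 from the example above; the seven-score list used to show a bimodal distribution is made up for illustration.

```python
from statistics import mean, median, multimode

original = [8, 9, 10]
skewed = [8, 9, 100]  # the 10 replaced by an extreme score of 100

mean_original = mean(original)      # 9
mean_skewed = mean(skewed)          # 39, pulled toward the extreme score
median_original = median(original)  # 9
median_skewed = median(skewed)      # still 9

# Hypothetical scores in which 3 and 7 tie as the most frequent values.
bimodal_scores = [2, 3, 3, 5, 7, 7, 9]
modes = multimode(bimodal_scores)   # [3, 7]

print(f"Mean:   {mean_original} -> {mean_skewed}")
print(f"Median: {median_original} -> {median_skewed}")
print(f"Modes of {bimodal_scores}: {modes}")
```

The mean moves from 9 to 39 while the median stays at 9, which is why the median is preferred for highly skewed distributions.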

Measures of Variability Variability refers to how spread apart the scores of the distribution are or how much the scores vary from each other. There are four major measures of variability, including the range, interquartile range, variance, and standard deviation. The range represents the difference between the highest and lowest score in a distribution. It is rarely used because it considers only the two extreme scores. The interquartile range, on the other hand, measures the difference between the outermost scores in only the middle fifty percent of the scores. In other words, to determine the interquartile range, the score at the 25th percentile is subtracted from the score at the 75th percentile, representing the range of the middle 50 percent of scores. The variance is the average of the squared differences of each score from the mean. To calculate the variance, the difference between each score and the mean is squared and these squared differences are then added together. This sum is then divided by the number of scores minus one. When the square root is taken of the variance we call this new statistic the standard deviation. Since the variance represents the squared differences, the standard deviation represents the true differences and is therefore easier to interpret and much more commonly used. Since the standard deviation relies on the mean of the distribution, however, it is also affected by extreme scores in a skewed distribution.

The Correlation The correlation is one of the easiest descriptive statistics to understand and possibly one of the most widely used. The term correlation literally means co-relate and refers to the measurement of a relationship between two or more variables. A correlation coefficient is used to represent this relationship and is often abbreviated with the letter r. A correlation coefficient ranges between -1.0 and +1.0 and provides two important pieces of information regarding the relationship: Intensity and Direction. Intensity refers to the strength of the relationship and is expressed as a number between zero (meaning no correlation) and one (meaning a perfect correlation). These two extremes are rare as most correlations fall somewhere in between. In the social sciences, a correlation of 0.30 may be considered significant and any correlation above 0.70 is almost always significant. The absolute value of r represents the intensity of any correlation. Direction refers to how one variable moves in relation to the other. A positive correlation (or direct relationship) means that two variables move in the same direction, either both moving up or both moving down. For instance, high school grades and college grades are often positively correlated in that students who earn high grades in high school tend to also earn high grades in college. A negative correlation (or inverse relationship) means that the two variables move in opposite directions; as one goes up, the other tends to go down. For instance, depression and self-esteem tend to be inversely related because the more depressed an individual is the lower his or her self-esteem. As depression increases, then, self-esteem tends to decrease. The sign in front of the r represents the direction of a correlation.

Figure 8.6: Scatter plots for sample correlations

Scatter Plot. Correlations are graphed on a special type of graph called a scatter plot (or scatter gram). On a scatter plot, one variable (typically called the X variable) is placed on the horizontal axis (abscissa) and the Y variable is placed on the vertical axis (ordinate). For example, if we were measuring years of work experience and yearly income, we would likely find a positive correlation. Imagine we looked at ten subjects and found the hypothetical results listed in Table 8.2.

Table 8.2: Sample Correlation Data

Subject Number   Experience in Years   Income in Thousands
1                0                     20
2                5                     30
3                5                     40
4                10                    30
5                10                    50
6                15                    50
7                20                    60
8                25                    50
9                30                    70
10               35                    60

Notice how each subject has two pieces of information (years of experience and income). These are the two variables that we are looking at to determine if a relationship exists. To place this information in a scatter plot we will consider experience the X variable and income the Y variable (the results will be the same even if the variables are reversed) and then each dot will represent one subject. The scatter plot in Figure 8.7 represents this data. Notice how the line drawn through the data points has an upward slope. This slope represents the direction of the relationship and tells us that as experience increases so does income. Figure 8.7: Scatter Plot for Sample Data
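The strength and direction shown in the scatter plot can also be expressed as a single coefficient. The following is a minimal sketch that computes Pearson's r by hand for the ten subjects in Table 8.2; the function name pearson_r is illustrative, and the text itself does not report a value of r for this data.

```python
from math import sqrt

# Table 8.2: years of experience (X) and income in thousands (Y).
experience = [0, 5, 5, 10, 10, 15, 20, 25, 30, 35]
income = [20, 30, 40, 30, 50, 50, 60, 50, 70, 60]


def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of the products of paired deviations from each mean.
    co_deviation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return co_deviation / sqrt(ss_x * ss_y)


r = pearson_r(experience, income)
r_reversed = pearson_r(income, experience)

print(f"r = {r:.2f}")
print(f"r with the variables reversed = {r_reversed:.2f}")
```

The sign of r is positive, matching the upward slope of the line, and reversing the X and Y variables leaves the coefficient unchanged.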

Correlation and Causality. One common mistake made by people interpreting a correlation coefficient involves causality. When we see that depression and self-esteem are negatively correlated, we often surmise that depression must therefore cause the decrease in self-esteem. When contemplating this, consider the following correlations that have been found in research:

Positive correlation between ice cream consumption and drownings
Positive correlation between ice cream consumption and murder
Positive correlation between ice cream consumption and boating accidents
Positive correlation between ice cream consumption and shark attacks

If we were to assume that every correlation represents a causal relationship then ice cream would most certainly be banned due to the devastating effects it has on society. Does ice cream consumption cause people to drown? Does ice cream lead to murder? The truth is that often two variables are related only because of a third variable that is not accounted for within the statistic. In this case, the weather is this third variable because as the weather gets warmer, people tend to consume more ice cream. Warmer weather also results in an increase in swimming and boating and therefore increased drownings, boating accidents, and shark attacks. So looking back at the negative correlation between depression and self-esteem, it could be that depression causes self-esteem to go down, or that low self-esteem results in depression, or that a third variable causes the change in both. When looking at a correlation coefficient, be sure to recognize that the variables may be related but that this in no way implies that the change in one causes the change in the other.

Specific Correlations. Up to this point we have been discussing a specific correlation known as the Pearson Product Moment Correlation (or Pearson's r) which is abbreviated with the letter r. Pearson is the most commonly cited correlation but can only be used when there are only two variables that both move in a continuous linear (straight line) direction. When there are more than two variables, when the variables are dichotomous (true/false or yes/no) or rank ordered, or when the variables have a nonlinear or curved direction, different types of correlations would be used. The Biserial and Point Biserial Correlations are used when one variable is dichotomous and the other is continuous such as gender and income. The phi or tetrachoric correlations are used when both variables are dichotomous such as gender and race. And finally, Spearman's rho correlation is used with two rank ordered variables and eta is used when the variables are nonlinear.

Chapter Conclusion While this chapter provided only a quick, basic summary of descriptive statistics, it should give you a good idea of how data is summarized. Remember, the goal of descriptive statistics is to describe, in numerical format, what is currently happening within a known population. We use measures such as the mean, median, and mode to describe the center of a distribution. We use standard deviation, range, or interquartile range to describe the variability of a distribution, and we use correlations to describe relationships among two or more distributions. By knowing this information and by understanding the basics of charting and graphing in descriptive statistics, inferential statistics become easier to understand.
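As a short recap of the variability measures summarized above, the following is a minimal sketch that computes the range, interquartile range, variance, and standard deviation for one small set of scores. The eight scores are made up for illustration. Software packages differ in how they locate the 25th and 75th percentiles, so the interquartile range shown reflects the default convention of Python's statistics.quantiles and is only one reasonable answer.

```python
from statistics import quantiles, stdev, variance

# Hypothetical scores, used only for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: highest score minus lowest score.
score_range = max(scores) - min(scores)

# Interquartile range: 75th percentile minus 25th percentile.
q1, _, q3 = quantiles(scores, n=4)
iqr = q3 - q1

# Variance: sum of squared differences from the mean, divided by
# the number of scores minus one.
sample_variance = variance(scores)

# Standard deviation: square root of the variance.
standard_deviation = stdev(scores)

print(f"Range: {score_range}")
print(f"Interquartile range: {iqr}")
print(f"Variance: {sample_variance:.2f}")
print(f"Standard deviation: {standard_deviation:.2f}")
```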

Chapter 9: Inferential Statistics


9.1 Introduction 9.2 Inferential Procedures 9.3 Research Error 9.4 Alternative and Null Hypothesis 9.5 Probability of Error 9.6 Type I and Type II Errors 9.7 Chapter Conclusion

Introduction Since the purpose of this text is to help you to perform and understand research more than it is to make you an expert statistician, inferential statistics will be discussed in a somewhat abbreviated manner. Inferential statistics refer to the use of current information regarding a sample of subjects in order to (1) make assumptions about the population at large and/or (2) make predictions about what might happen in the future. The basic statistical methods explained in the previous chapter are used a great deal in inferential statistics, but the data is taken a step further in order to generalize or predict. We can easily determine the mean of a known sample of subjects by adding up all of their scores and dividing by the number of subjects. The mean of a sample is therefore a known variable. To determine the mean of a population that has not been tested or to predict the mean of a test that has not yet been taken requires the researcher to make assumptions because these variables are not known to us. The goal of inferential statistics is to do just that - to take what is known and make assumptions or inferences about what is not known. This chapter will focus on the basic statistical procedures used for various types of data and will conclude with an explanation of how this data is used to estimate errors and make inferences.

Inferential Procedures Specific procedures used to make inferences about an unknown population or unknown score vary depending on the type of data used and the purpose of making the inference. There are five main categories of inferential procedures that will be discussed in this chapter: t-test, ANOVA, Factor Analysis, Regression Analysis, and Meta Analysis.

t-Test. A t-test is perhaps the simplest of the inferential statistics. The purpose of this test is to determine if a difference exists between the means of two groups (think t for two). For example, to determine if the GPAs of students with prior work experience differ from the GPAs of students without this experience, we would employ the t-test by comparing the GPAs of each group to each other. To compare these groups, the t-test statistical formula includes the means, standard deviations, and number of subjects for each group. Each of these sets of data can be derived by using descriptive statistics discussed in the previous chapter. Therefore, the t-test can be computed by hand in a relatively short amount of time depending on the number of subjects within each data set.

ANOVA. The term ANOVA is short for Analysis of Variance and is typically used when there are one or more independent variables and two or more dependent variables. If we were to study the effects of work experience on college grades, we would have one independent and one dependent variable and a simple t-test would suffice. What if we also wanted to understand the effects of age, race, and economic background on college grades? To use a simple t-test would mean we would have to perform one t-test for every pair of data. For this example, we would need to compare work and grades, age and grades, race and grades, and income and grades, resulting in four independent statistical procedures. Add an additional dependent variable, such as length of time it takes to graduate, and we double the number of procedures required to eight. We could do eight t-tests or we could simply do an ANOVA, which analyzes all eight sets of data at one time. The ANOVA is superior for complex analyses for two reasons, the first being its ability to combine complex data into one statistical procedure. 
The second benefit over a simple t-test is the ANOVA's ability to determine what are called interaction effects. With a t-test we could determine if the means of older and younger students are different on the variable of grades (referred to as a main effect). We could also determine whether or not the means of whites and blacks differed in terms of grades (main effect as well), but we could not determine how these two variables (age and race) interact with each other. Consider the data in Table 9.1, representing the number of data points we would have for a study with just three independent variables (each with only two levels) and two dependent variables. If you look at the data closely, you may notice that the mean GPA for blacks is 3.0 and the mean GPA for whites is also 3.0. A simple t-test comparing the means of blacks and whites would certainly not find a difference. However, when you combine this with the variable of age, the data looks completely different. The mean GPA is 2.5 for older blacks, 3.5 for older whites, 3.5 for younger blacks, and 2.5 for younger whites. Now we can see that there is a difference between blacks and whites: (1) older whites have higher GPAs than older blacks and (2) younger blacks have higher GPAs than younger whites. This represents the interaction effects of race and age that would not have been detected by a simple t-test.

Table 9.1: Hypothetical Three Way Analysis of Variance with Two Means

Independent Variables            Dependent Variables
Work    Age        Race          GPA     Time
Yes     Older      Black         3.0     12
No      Older      Black         2.0     8
Yes     Older      White         4.0     12
No      Older      White         3.0     8
Yes     Younger    Black         3.0     4
No      Younger    Black         4.0     8
Yes     Younger    White         2.0     4
No      Younger    White         3.0     8
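The difference between a main effect and an interaction effect can be seen by averaging the GPAs in Table 9.1 in two ways. The following is a minimal sketch that only computes group means; it is not an ANOVA and performs no significance test, and the helper name group_means is illustrative.

```python
from collections import defaultdict
from statistics import mean

# Rows of Table 9.1: (work experience, age, race, GPA, years to graduate).
rows = [
    ("Yes", "Older", "Black", 3.0, 12),
    ("No", "Older", "Black", 2.0, 8),
    ("Yes", "Older", "White", 4.0, 12),
    ("No", "Older", "White", 3.0, 8),
    ("Yes", "Younger", "Black", 3.0, 4),
    ("No", "Younger", "Black", 4.0, 8),
    ("Yes", "Younger", "White", 2.0, 4),
    ("No", "Younger", "White", 3.0, 8),
]


def group_means(rows, key_indexes, value_index):
    """Average one column after grouping rows by the given key columns."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        groups[key].append(row[value_index])
    return {key: mean(values) for key, values in groups.items()}


# Main effect of race on GPA: both groups average 3.0.
gpa_by_race = group_means(rows, [2], 3)

# Age by race cells: the difference hidden by the main effect appears.
gpa_by_age_race = group_means(rows, [1, 2], 3)

for key, value in gpa_by_race.items():
    print(key, value)
for key, value in gpa_by_age_race.items():
    print(key, value)
```

Changing the value index to 4 and the key columns to work experience and age gives a starting point for the time-to-graduation exercise that follows.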

Looking at work experience and length of time to graduation also reveals interesting results. For those with work experience, the mean time to graduation was eight years. For those without work experience, the average time to graduation was also eight years. But this simple main effect does not tell the whole story. See if you can determine any interaction effects that play a role in the length of time to graduation.

Factor Analysis. A factor analysis is used when an attempt is being made to break down a large data set into different subgroups or factors. By using a somewhat complex procedure that is typically performed using specialized software, a factor analysis will look at each question within a group of questions to determine how these questions cluster together. If we were to give a class a test on basic mathematics and then perform a factor analysis on the results, for example, we would likely find that questions related to addition tend to be answered at the same rate and questions related to subtraction would tend to be answered at the same rate. In other words, students who are good at addition would do well on most addition questions and students who are poor at addition would score poorly on most addition questions. Therefore a math test consisting of addition and subtraction would likely have two factors.

Regression Analysis. When a correlation is used we are able to determine the strength and direction of the relationship between two or more variables. If we determined that the correlation between a midterm and a final exam was +.95, we could say that these two tests are strongly and directly related to each other. In other words, a student who scored high on one would likely score high on the other. Regression Analysis takes this a step further. By creating a regression formula based on the known data, we can predict a student's score on the final (for example) merely by knowing her score on the midterm. 
If two variables were correlated at +1.0 or -1.0 (perfect correlations) this prediction would be extremely accurate. If the correlation coefficient was +/-0.9, the prediction would be good but less accurate than a perfect correlation. The farther from a perfect correlation, the less accurate the results of the prediction. Take a look at the perfectly correlated scores for the first five students below and see if you can predict the final exam score for the sixth student based on her score on the midterm.

Table 9.2: Hypothetical Test Scores

Student   Midterm   Final
Bob       80        88
Sue       50        55
Ling      60        66
Frank     80        88
Henry     90        99
Lisa      70        ??

When the data set is much larger and the correlation less than perfect, making a prediction requires the use of statistical regression, which is basically a geometric formula used to determine where a score falls on a straight line. By using this statistic, we develop a formula that is used to estimate one data point based on another data point in a known correlation. The formula for the data above would be Final = Midterm × 1.1. Did you predict Lisa's score on the final correctly?

Meta Analysis. A meta analysis refers to the combining of numerous studies into one larger study. When this technique is used, each study becomes one subject in the new meta study. For instance, the combination of 12 studies on work experience and college grades would result in a meta study with 12 subjects. While the process is a little more complex than this in reality, the meta analysis basically combines many studies together to determine if the results of all of them, when taken as a whole, are significant. The meta study is especially helpful when different related studies conducted in the past have found different results.
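Returning to the regression example in Table 9.2, the formula Final = Midterm × 1.1 can be recovered from the data rather than spotted by eye. The following is a minimal sketch of an ordinary least squares fit of a straight line to the first five students, used here as one common way to build a regression formula; the function name fit_line is illustrative.

```python
# Midterm and final scores for the five students with known finals.
midterm = [80, 50, 60, 80, 90]
final = [88, 55, 66, 88, 99]


def fit_line(x, y):
    """Least squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-deviation of x and y divided by squared deviation of x.
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    # Intercept: forces the line through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(midterm, final)

# Predict Lisa's final from her midterm score of 70.
lisa_prediction = intercept + slope * 70

print(f"Final = {intercept:.1f} + {slope:.1f} x Midterm")
print(f"Predicted final for Lisa: {lisa_prediction:.0f}")
```

Because these five scores are perfectly correlated, the fitted slope is 1.1 and the intercept is zero. With real data the points would scatter around the line and the prediction would be only an estimate.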


Research Error Every statistic contains both a true score and an error score. A true score is the part of the statistic or number that truly represents what was being measured. An error score is that part of the statistic or number that represents something other than what is being measured. Imagine standing on your bathroom scale and weighing 140 pounds, then standing on your doctor's scale an hour later and weighing 142. Is it likely you gained 2 pounds on the way to the doctor's office? The difference between the two numbers has much more to do with error than it does weight gain, especially in that short of a time span. When a scale, or any measuring device, provides a score, this score is actually only an estimate of what your true score really is. When your bathroom scale reads 140 pounds, it should be interpreted as an estimate of your true weight, which may actually be 141. If this is the case, then your score (weight, in this case) of 140 reflects 141 pounds of true weight less one pound of error.

Confidence Level. When we use statistics to summarize any phenomenon, we are always concerned with how much of that statistic represents the true score and how much is error. Imagine a person scores a 100 on a standardized IQ test. Is his true IQ really 100 or could this score be off some due to an unknown level of error? Chances are that there is error associated with his score and therefore we must use this score of 100 as an estimate of his true IQ. When using an achieved score to estimate a true score, we must determine how much error is associated with it. Methods to estimate a true score are called estimators, and fall into three main groups: Point Estimation; Interval Estimation; and Confidence Interval Estimation. Point Estimation. In point estimation, the value of a sample statistic or achieved score is used as a best guess or quick estimate of the population statistic or true score. In other words, if a sample of students average 78 on a final examination, you could estimate that all students would average 78 on the same test. The major concern of point estimation is the lack of concern for error; the achieved score is assumed to be the true score. Interval Estimation. Interval estimation goes a step further and assumes that some level of error has occurred in the achieved score, which is almost always the case. If the sample students achieve an average of 78, we could estimate the amount of error and then provide an estimate of the true score based on an interval rather than a single point. There are different methods to determine error but perhaps the most commonly used is called the standard error of the mean. Using a simple statistical formula, the amount of error is determined and the true score is said to be the achieved score plus or minus the standard error of the mean. 
For instance, if the students average 78 on their exam and the standard error of the mean is determined to be 3 points, the students' true average would be estimated as 78 +/- 3 or between 75 and 81.

Confidence Interval Estimation. The confidence interval estimation uses the same method as the interval estimation but provides a level of confidence or certainty in the true score. Through more complex statistics, a specific level of confidence in an interval can be determined. We might say then, based on these statistics, that we are 95% confident that the true score lies somewhere between 75 and 81. The more confident we are, the larger the interval. Imagine this exam has a possibility of 100 points. We would be 100% sure that a student will score somewhere between 0 and 100. In fact, we are always 100% confident that a true score falls somewhere between the minimum possible score and the maximum possible score. Narrowing the true score down, however, reduces our level of confidence. We might be only 98% sure that the true score falls somewhere between 70 and 90, and only 95% confident that the true score falls somewhere between 75 and 81. A good way to look at confidence interval estimation is to consider the roll of a six-sided die. How confident would you be that rolling the die once would result in a number between one and six? You should be 100% confident because those are the only possible scores. How sure would you be that the roll would net an even number? Since half of the numbers are even and half are odd, you would be 50% confident. Now, what about rolling only a one? Since there are six possible scores and you are estimating the roll to net only one of those six, you should see the chance as 1 in 6. Therefore you would be about 17% confident that the next roll would result in a score of one. The more we pinpoint the score, the less confident we are in our prediction.
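The trade-off between confidence and interval width can be illustrated numerically. The following is a minimal sketch, assuming a made-up sample of ten exam scores that averages 78 and assuming the normal curve can be used to set the width of the interval; with a sample this small a statistician would more likely use the t distribution, so treat the exact limits as approximate. The function name confidence_interval is illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical exam scores, chosen so that the sample mean is 78.
exam_scores = [72, 85, 78, 90, 66, 81, 74, 79, 83, 72]

sample_mean = mean(exam_scores)

# Standard error of the mean: standard deviation divided by the
# square root of the number of scores.
standard_error = stdev(exam_scores) / sqrt(len(exam_scores))


def confidence_interval(center, standard_error, confidence):
    """Interval around center, using the normal curve (an assumption)."""
    # Number of standard errors needed to cover the requested share.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return center - margin, center + margin


ci_68 = confidence_interval(sample_mean, standard_error, 0.68)
ci_95 = confidence_interval(sample_mean, standard_error, 0.95)
ci_99 = confidence_interval(sample_mean, standard_error, 0.99)

for label, (low, high) in (("68%", ci_68), ("95%", ci_95), ("99%", ci_99)):
    print(f"{label} confident: {low:.1f} to {high:.1f}")
```

Each step up in confidence produces a wider interval around the same point estimate of 78.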

Alternative and Null Hypothesis The purpose of any research is to determine if your theory is true or not based on statistical analysis. A theory is an educated guess about a relationship, but in order for research to be conducted on a theory, it must first be operationalized. To operationalize a theory, all variables must be defined and the methods of conducting the research must be determined. Once this is done, the resulting statement about the relationship is called a hypothesis. The hypothesis is what gets tested in any research study. As discussed in chapter one, every experiment has two hypotheses. The null hypothesis states that there is no change or difference as a result of the independent variable. In other words, work experience does not result in a difference in grades among college students. The alternative hypothesis states that there is a change or difference. When we perform statistics, we are always testing for the null and therefore results of any statistical procedures are always stated in regard to the null hypothesis. If we find that students with work experience perform at the same level as those without work experience, for example, our results show that there is no difference. We would therefore accept our null hypothesis. If we find that one group performs significantly differently than the other, we would then reject the null hypothesis, and by definition, accept the alternative.

Probability of Error Since every score has some level of error, researchers must decide how much error they are willing to accept prior to performing their research. This acceptable error is then compared with the probability of error, and if the probability of error is less than the acceptable error, the results are said to be significant. For example, if we stated at the onset of the study that we would accept 5% error and our results indicated that the probability of error was 3%, we would reject the null hypothesis and state that the difference between the two groups was significant. If, however, the probability of error were shown to be 6%, we would accept the null hypothesis and state that the difference between the two groups was not significant. The probability of error is often abbreviated with a lower case p, and the acceptable error is abbreviated with a lower case alpha (α). When we accept the null, p > α, and when we reject the null, p <= α. You will often see these symbols at the end of significance statements in research reports. While alpha can vary from study to study, depending on the level set at the onset of the experiment, it should not change once the experiment begins. Common levels of acceptable error (referred to as significance levels) include, in order of use, 0.05, 0.01, 0.001, and 0.1.

Type I and Type II Errors Since we are accepting some level of error in every study, the possibility that our results are erroneous is directly related to our acceptable level of error. If we set alpha at 0.05, we are saying that we will accept 5% error, which means that if there were truly no difference and the study were conducted 100 times, we would expect about 5 of those studies to show a significant result by chance alone. How do we then know that our study doesn't fall in the 5% error category? We don't. Only through replication can we get a better idea of this. There are two types of error that researchers are concerned with: Type I and Type II.
A Type I error occurs when the results of research show that a difference exists but in reality there is no difference. This is directly related to alpha in that alpha was likely set too high and therefore lowering the amount of acceptable error would reduce the chances of a Type I error.

Lowering the amount of acceptable error, however, also increases the chances of a Type II error, which refers to the acceptance of the null hypothesis when in fact the alternative is true. When there is a real difference in the population but we fail to find this difference, our study is said to lack power. The probability of a Type II error is abbreviated with the lower case beta (β), and power, which equals 1 - β, refers to a study's strength to find a difference when a difference actually exists. In other words, for a given study, the greater the chances of a Type I error, the less likely a Type II error, and vice versa. These two errors are summarized in Figure 9.1.

Figure 9.1: Type I and Type II Errors
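The relationship between the acceptable error and the two error types can be illustrated with a small simulation. This is a sketch only, under assumptions not found in the text: two groups of 30 scores drawn from normal populations with a known standard deviation of 1, compared with a simple two-sided z-test. The function name `rejection_rate` and all parameter values are hypothetical.

```python
import random
from statistics import NormalDist

def rejection_rate(true_diff, alpha, n=30, trials=2000, seed=1):
    """Fraction of simulated two-group studies that reject the null hypothesis."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    std_error = (2 / n) ** 0.5  # SE of a difference in means when sigma = 1
    rejections = 0
    for _ in range(trials):
        group_a = [rng.gauss(0, 1) for _ in range(n)]
        group_b = [rng.gauss(true_diff, 1) for _ in range(n)]
        z = (sum(group_b) / n - sum(group_a) / n) / std_error
        p = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided probability of error
        if p < alpha:
            rejections += 1
    return rejections / trials

# No real difference: every rejection is a Type I error (rate close to alpha)
print("Type I rate at alpha=0.05:", rejection_rate(0.0, 0.05))
# Real difference of 0.5: rejections are correct, so this estimates power
print("Power at alpha=0.05:", rejection_rate(0.5, 0.05))
print("Power at alpha=0.01:", rejection_rate(0.5, 0.01))
```

With a real difference present, the stricter alpha of 0.01 rejects the null less often than 0.05, so fewer Type I errors are bought at the price of more Type II errors.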

Chapter Conclusion Inferential statistics are used to make inferences about an unknown variable based on known descriptions. Making sure you have a solid understanding of descriptive statistics plays an important role in taking this data to the next step. In this chapter we discussed basic inferential procedures and the different scenarios in which they are used. A t-test is a simple procedure used to compare the means of two groups. An ANOVA allows us to compare multiple groups on multiple independent variables and therefore can look at both main and interaction effects within the data set. Factor analysis is used to find subgroups or factors within a large distribution of scores. A regression analysis uses the correlation in order to make predictions about future or unknown scores. And finally, the meta-analysis is used to combine multiple studies into one larger study. We also discussed alternative and null hypotheses and the determination of Type I and Type II errors. These last two bear directly on the significance reported in any results section. Remember, if the probability of error in your study is greater than your accepted error, no matter what the numbers look like, you must accept the null hypothesis. In this case, the difference that may appear in the raw data is not significant enough to infer a difference in the overall population. Conversely, if the probability of error is less than the accepted error, you must reject the null hypothesis. There is no gray area in this aspect as there are only two outcomes of any inferential procedure: reject or accept the null.
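The two-outcome decision rule summarized above can be written as a short function. This is a minimal sketch: the name `decide` is hypothetical, and treating a probability of error exactly equal to alpha as a rejection is an assumption.

```python
def decide(p, alpha=0.05):
    """Apply the chapter's rule: reject the null when p <= alpha, else accept it."""
    if not (0 <= p <= 1 and 0 < alpha < 1):
        raise ValueError("p must be in [0, 1] and alpha in (0, 1)")
    return "reject the null" if p <= alpha else "accept the null"

print(decide(0.03))  # 3% probability of error against 5% acceptable error
print(decide(0.06))  # 6% probability of error against 5% acceptable error
```

Alpha is a parameter with a default because it is fixed before the study begins and should not be adjusted after the results are seen.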

Chapter 10: Critical Analysis


10.1 Introduction 10.2 Abstract 10.3 Introduction (Literature Review) 10.4 Methods 10.5 Results 10.6 Discussion 10.7 References 10.8 Appendices 10.9 Chapter Conclusion

Introduction While many professionals in education, psychology, management, and other social science fields perform research and use statistics to analyze results, many more read the results of research and apply it to the real world. Therefore it is vitally important to be able to critically analyze a research report to determine if the methods and results are valid and if they apply to you as a professional. This chapter will look at each of the major sections of the research report and will provide ideas for what to look for, how to apply the information, and how to determine if a specific study is worth incorporating into your work.

Abstract The abstract is almost always the first section we read in a research report. As such, it should provide valuable information about what is included in the remainder of the report. According to the APA manual, the abstract should be no more than 150 words in order to provide enough information about the paper without replicating large sections. In other words, a good abstract will include, in general, one or two sentences for each of the following: (1) statement of the problem; (2) brief summary of the literature; (3) brief summary of the methods used in the present research; (4) brief summary of the results found in the present research; and (5) the significance of the present study and/or need for further research.

Introduction (Literature Review) The purpose of the introduction is to give the reader a solid background of the phenomenon being studied. It should provide the reader with information that will lead to the statement of the problem. This information can include a need to replicate previous studies due to shortcomings or to apply previous research to a different population. It can also discuss a new theory and the qualitative and quantitative data that have led to this new idea about a relationship. In general, by the time the introduction section is read, the reader should be able to understand the need for the present study and have a solid understanding of the researcher's theory.

Methods The methods section is often the most precisely written part of a research report. Since replicating and analyzing methods are so important, a good deal of time should be spent analyzing this section. As a consumer of research, it is imperative for you to understand the foundation of each study and be able to critically analyze how the data that will lead to the results section were derived. When reading the methods section you should look for information regarding the subjects and the manner in which the subjects were selected. You should be able to discuss the pitfalls of not using randomization, or of various types of randomization. You should be able to understand the strengths and weaknesses of the type of design used and how the researchers used control groups or groups that were not equivalent. The use of standardized procedures is also important, as we ideally want every group to experience the same environment except for the variable(s) being measured. If confounding variables are not controlled for, you should be able to discuss how this lack of control might impact the results of the study. Issues related to internal and external validity should be carefully addressed as well. Look for how the researchers addressed subject maturation in longitudinal or longer term studies, how they handled extreme scores and the tendency for scores to regress toward the mean, and how they dealt with differences in mortality or dropout rates between groups. And finally, you should be aware of what assessment procedures were used, whether these instruments are valid and reliable, and whether or not they were used correctly.

Results The results section will likely require at least some basic understanding of statistics as this section is often the most technical. The main ideas that should be analyzed in a results section include the statistical procedures used, the reporting of the numerical findings, and the determination of significance. The procedures used should correspond with the data they are working with. For instance, if the data is nominal or ordinal and the procedure used was parametric (See Summary of Statistics), then the results will be skewed at best and completely invalid at worst. The numerical results of the statistics should also be reported, so if the study shows a significant t-test, the actual t-test score, the degrees of freedom, and the probability of error (p) as compared with the acceptable error (α) should all be reported. Finally, the decision for how much error was acceptable for the researchers should be addressed. Typically .05 or .01 is used, representing an acceptable error level of 5% or 1%, respectively. If the level is set differently from these two, the rationale should be addressed.
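To make the reported quantities concrete, the sketch below computes a t-test score and its degrees of freedom for two small groups. It assumes an independent-samples t-test with pooled (equal) variances, which is one common form and not necessarily the one a given report uses. The function name `pooled_t_test` and the scores are hypothetical. Converting t and the degrees of freedom into a probability of error (p) requires a t-distribution table or a statistics package and is left out here.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_test(group_a, group_b):
    """Return (t, degrees_of_freedom) for an independent-samples t-test."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance weights each group's sample variance by its df
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_error, df

# Hypothetical scores for two small groups
t, df = pooled_t_test([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"t({df}) = {t:.2f}")
```

A report that gives only "the difference was significant" without these numbers leaves the reader unable to check the finding.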

Discussion The discussion section allows the researchers to qualify their results and to discuss areas of concern regarding their research and areas of further study that may be needed. A good researcher will want to critique his or her own study before others do. This provides the reader with a more qualitative understanding of the findings. Shortcomings such as a lack of randomization, failure to use a control group, as well as many other issues, should be addressed and explained. Not only does this show that the researcher didn't just arbitrarily omit steps, but it may help future researchers improve upon the present study. The results of the research and the relevance to the topic at large should also be addressed. This is mainly to provide insight into what research might be completed in the future to either fix problems in the current study or advance the original theory. The researcher is typically well versed in the topic of study, or at least should be, and can help lead the profession to further advancements and knowledge through his or her discussion of the current study. The discussion section is the most qualitative part of a research report and therefore may be the most important section of the entire paper.

References References allow the reader to look up information and read more about particular phenomena or research that was discussed in the current study. While reading the study, the quality of the references should be considered. If, for instance, a study uses research from unprofessional or questionable sources, the validity of its arguments will also be questionable. Look for journals that are known and respected in the profession as this will help to determine the strength of the supporting material that led to the development of the original theory.

Appendices A research study can have many appendices or it can have none. When addressing this section, look at the need for the data in the first place. Did the author place information here just to take up space? Is the information confusing and difficult to understand? Was it addressed in the paper or just included arbitrarily? Does it provide information that helps the reader understand the methods or results of the study? And finally, does it provide assistance in analyzing the study as a whole and the need for replication or further research?

Chapter Conclusion While many professionals in education and the social sciences perform research, the majority of us use this information in real life application. Teachers use research regarding teaching methods and learning style to help improve the education of their students. Therapists use research to provide better treatment for specific mental illnesses. Managers use research to help them improve retention rates, worker satisfaction, or communication. The purpose of research, then, is not merely to gather information, but to communicate this information to the research consumers. Through this text, you should have a solid understanding of the importance of research, the methods of developing a hypothesis, and the specific designs used for particular types of research. You should understand the importance of standardization, randomization, controlling for confounds, and assuring internal and external validity. You should have a basic understanding of descriptive and inferential statistics, and be aware that the foundation for your study and the discussion of results are often more important than the results themselves. Perhaps most importantly though, you should understand how to critically analyze a research report from start to finish. You should be able to address the strengths and weaknesses of particular designs and the specific methods of a study. You should be able to discuss results and combine studies in order to develop your own theories and to apply this information to your professional life. Research is not an end in itself, nor is it a means to an end. Research is merely a continuing and vital part of our need to understand and grow.

1.2 Why Have We Chosen to Work with SPSS? There is no question that business, education, and all fields of science have come to rely heavily on the computer. This dependence has become so great that it is no longer possible to understand social and health science research without substantial knowledge of statistics and without at least some rudimentary understanding of statistical software. The number and types of statistical software packages that are available continue to grow each year. In this book we have chosen to work with SPSS, or the Statistical Package for the Social Sciences. SPSS was chosen because of its popularity within both academic and business circles, making it the most widely used package of its type. SPSS is also a versatile package that allows many different types of analyses, data transformations, and forms of output - in short, it will more than adequately serve our purposes. The SPSS software package is continually being updated and improved, and so with each major revision comes a new version of that package. In this book, we will describe and use the most recent version of SPSS, called SPSS for Windows 14.0.

2.1 Introduction to SPSS The capability of SPSS is truly astounding. The package enables you to obtain statistics ranging from simple descriptive numbers to complex analyses of multivariate matrices. You can plot the data in histograms, scatterplots, and other ways. You can combine files, split files, and sort files. You can modify existing variables and create new ones. In short, you can do just about anything you'd ever want with a set of data using this software package. A number of specific SPSS procedures are presented in the chapters that follow. Most of these procedures are relevant to the kinds of statistical analyses covered in an introductory level statistics or research methods course typically found in the social and health sciences, natural sciences, or business.
Yet, we will touch on just a fraction of the many things that SPSS can do. Our aim is to help you become familiar with SPSS, and we hope that this introduction will both reinforce your understanding of statistics and lead you to see what a powerful tool SPSS is, how it can actually help you better understand your data, how it can enable you to test hypotheses that were once too difficult to consider, and how it can save you incredible amounts of time as well as reduce the likelihood of making errors in data analyses.

Introduction to SPSS
SPSS is a software package used for conducting statistical analyses, manipulating data, and generating tables and graphs that summarize data. Statistical analyses range from basic descriptive statistics, such as averages and frequencies, to advanced inferential statistics, such as regression models, analysis of variance, and factor analysis. SPSS also contains several tools for manipulating data, including functions for recoding data and computing new variables, as well as for merging and aggregating datasets. SPSS also has a number of ways to summarize and display data in the form of tables and graphs.

1.2. Overview of SPSS for Windows


SPSS for Windows consists of five different windows, each of which is associated with a particular SPSS file type. This document discusses the two windows most frequently used in analyzing data in SPSS, the Data Editor and the Output Viewer windows. In addition, the Syntax Editor and the use of SPSS command syntax are discussed briefly. The Data Editor is the window that is open at start-up and is used to enter and store data in a spreadsheet format. The Output Viewer opens automatically when you execute an analysis or create a graph using a dialog box or command syntax to execute a procedure. The Output Viewer contains the results of all statistical analyses and graphical displays of data. The Syntax Editor is a text editor where you compose SPSS commands and submit them to the SPSS processor. All output from these commands will appear in the Output Viewer. This document focuses on the methods necessary for inputting, defining, and organizing data in SPSS.

1.3 Bivariate and Multivariate Correlational Research In any field of science, research represents the way in which predictions are tested, theories are developed, and the knowledge base is expanded. Without research, science would not exist, and so understanding research methodology is the key to understanding how scientists come to know what they do. Only by understanding the research process can you grasp the power of the scientific method while also realizing the limitations (and possibility of errors) inherent in the process. Scientists use a variety of methodologies in their pursuit of knowledge, and the social and behavioral sciences employ just about all of them. Some social science research is aimed primarily at description - the researcher may be interested in describing people's responses on a number of individual variables. For example, a political scientist may want to characterize attitudes toward the President and his social policies. Or a sociologist may want to describe the characteristics of a particular religious cult. In either case, the researcher would choose a variety of statistical measures, called univariate procedures, to describe responses on each of these individual variables. Typically these statistical measures would include indices of centrality (e.g., a mean or median) and dispersion (e.g., the range or standard deviation). Conceptually, these are not complex statistics; but computation - actually calculating them - may require tedious (and error-prone) work. Who would want to calculate by hand (or even with a calculator) the mean response to four questions taken from 2,200 people, a sample size not uncommon in some areas of social science research? Beyond simple description of individual variables, a great deal of research attempts to show that two variables are in some way related to one another.
Often, this type of research is also descriptive in nature, but rather than describing single variables, it describes relationships between two or more variables. If the procedure attempts to relate just two variables at a time, we refer to the techniques as bivariate. For example, a sociologist might attempt to show that there is a relationship between one's attitudes about a President and one's income level. Or a nutritionist may demonstrate a correlation between one's daily sodium intake and blood pressure. In this type of research, the investigator takes measures on two variables and determines how strongly they are linked and, in some cases, whether they are positively or negatively related. Common bivariate strategies with which you may be familiar include crosstabulation and correlation. As you may have already learned, although a researcher may be able to demonstrate a strong relationship between variables with correlational methodology, s/he is never justified in concluding causality between variables. Monthly sales of winter coats correlate negatively with the number of building construction injuries, yet obviously there is no causal connection between these two variables. Can you identify the mysterious unknown variable (sometimes called a lurking variable!) that explains the changes in both variables? (Hint: everybody talks about it, but no one ever does anything about it!) Of course, the correlational approach to understanding behaviors, attitudes, or responses need not be limited to the investigation of just two variables at a time. One's blood pressure (or other measures of cardiovascular health) may be related to many variables besides sodium intake, including exercise, water intake, other dietary factors, family history, medications, and so on. Statistical procedures which determine how several variables might relate to one or more variables of interest (e.g., blood pressure, and perhaps other indices of cardiovascular health) are referred to as multivariate procedures. Such procedures can be conceptually quite complex, and computationally, they are nearly impossible to perform without a computer.
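The difference between describing one variable and describing a relationship between two can be sketched in Python. The sodium and blood pressure values below are made up for illustration, and `pearson_r` is a hypothetical helper that computes the ordinary Pearson correlation coefficient by hand. A statistical package such as SPSS would produce the same kinds of numbers from its menus or syntax.

```python
from math import sqrt
from statistics import mean, median, stdev

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)

# Made-up values for illustration only
sodium = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0]      # grams per day
pressure = [112, 118, 117, 125, 130, 134]    # systolic, mmHg

# Univariate description of one variable: centrality and dispersion
print(mean(pressure), median(pressure), max(pressure) - min(pressure), stdev(pressure))
# Bivariate description of the relationship between two variables
print(round(pearson_r(sodium, pressure), 3))
```

A coefficient near +1 or -1 indicates a strong positive or negative relationship, but, as noted above, it says nothing about which variable, if either, causes the other.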

1.4 Experimental Research A third research strategy, called the experimental method, represents the scientist's way of trying to establish a cause-effect relationship between two or more variables. In a classic experimental design, the researcher systematically manipulates or changes the variable presumed to be the causal agent (this variable is called the independent variable) and then determines whether there are corresponding changes in the variable under observation (the presumed effect, called the dependent variable). Consider the previous correlational example investigating the relationship between sodium intake and blood pressure. If we should in fact find a correlation between these variables, we might then be interested in trying to establish a causal link between them. The experimenter might randomly assign participants to groups that receive different amounts of daily sodium intake (this represents a way of manipulating or changing the value of the independent variable) and compare blood pressure (the dependent variable) across the groups after a specified period of time. The experimenter would also attempt to control unwanted or extraneous variables (e.g., family history of cardiovascular disease) which might impact the dependent variable by holding these variables constant (e.g., equating the groups on family history). When held constant in this way, extraneous variables may be called control variables. By using the experimental method, which involves manipulating an independent variable while controlling extraneous variables, the researcher could indeed establish a stronger causal connection between the two variables of interest, in this case, sodium intake and blood pressure. Examples of statistics used in experimental research include the t-test and Analysis of Variance (ANOVA). Just as with correlational procedures, experimental methods need not be restricted to the investigation of two variables at a time.
They too may be multivariate, in that the effects of several independent variables may be investigated on one or more dependent variables simultaneously. As you might expect, these procedures can become quite complex, and are invariably simplified with the aid of a computer. The aforementioned distinctions among descriptive, correlational, and experimental methodologies are convenient categorizations for the beginning student, but in fact these clean distinctions are often lost in actual research situations. Some experimental designs are called quasi-experimental because they use independent variables which cannot be manipulated (e.g., race, eye color, or IQ). These designs are really more correlational than experimental in that they do not permit strong cause-effect conclusions. Some multivariate procedures combine correlational and experimental strategies (an example is a procedure called analysis of covariance). And indeed, some statistical experimental designs may be considered a variation of the "general linear model," one which is most easily conceptualized in terms of multiple regression, a correlational technique. While this overlap in methodologies is certainly confusing to the beginning student of statistics, one point clearly emerges. Research strategies and ensuing statistical procedures come in all shapes, sizes, and variations. These more subtle distinctions, however, are not critical for the beginning student of science; nevertheless, when dealing with multivariate procedures, their understanding becomes more useful, and so we will reiterate some of these distinctions when we approach those topics.
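Random assignment, the step that separates the experimental method from correlational work, can be sketched as follows. The participant labels, the three sodium conditions, and the function name `assign_to_groups` are all hypothetical. The fixed seed is there only so the example gives the same assignment each time it is run.

```python
import random

def assign_to_groups(participants, conditions, seed=None):
    """Randomly assign participants to conditions in near-equal group sizes."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    # Deal the shuffled participants out like cards, one condition at a time
    for index, person in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(person)
    return groups

participants = [f"P{i:02d}" for i in range(1, 13)]  # 12 hypothetical participants
conditions = ["low sodium", "medium sodium", "high sodium"]
groups = assign_to_groups(participants, conditions, seed=42)
for condition, members in groups.items():
    print(condition, members)
```

Because chance alone decides who lands in which group, extraneous variables such as family history tend to be spread evenly across the conditions.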

1.5 So Where Do Statistics and Computers Fit In? No matter what scientific methodology is used, the variables under investigation must always be measured in some way or another. And measurement generates numbers - or data - often much more data than we can readily interpret by quickly scanning the individual scores. In multivariate psychological or sociological studies, as many as 50-500 variables may be measured on several hundred participants in a single study. As a result, statistical tools have been devised to help the researcher summarize and interpret raw data. Statistical procedures not only provide a way of summarizing data into quick and manageable information (if you're in a hurry, knowing the mean IQ of 100 ninth graders is probably more meaningful than viewing all 100 IQ scores), but they also help the researcher make decisions about whether relationships between two or more variables, such as in our example with blood pressure and sodium intake, are actually real ones. Just as statistical methods provide us with a tool for transforming data into manageable information, computers enable us to use statistical tools of greater complexity and with greater effectiveness. One can crush ice cubes, for example, by beating them with a rolling pin, or by using the blender--which would you prefer? Computers offer the same type of advantage when one is dealing with the processing of information. Why use a calculator, or rely solely on your own mental capabilities, when computers can achieve the same ends with a lot less agony? Many jobs that require several hours of computer time would require days, even months, of human time. In fact, some tasks, because of their complexity and length, simply could not be performed without the help of the computer. Another advantage is that computers rarely make mistakes of their own. So-called computer errors are usually nothing more than human errors (e.g., giving the wrong command, a mistake in the software program, etc.)
that are attributed to the computer. In nearly all cases, the computer is merely doing what it has been instructed to do, rather than what it should have been instructed to do. Finally, the computer never tires or complains about tedious or repetitive tasks. Have you ever had to balance several months of check writing in your checkbook? As you know, even these relatively simple tasks can end up being rather arduous, especially if your balance and the bank's balance don't match. Just imagine how tedious the bank's job would be if its staff had to perform all the calculations for the checking accounts by hand, or with calculators! So, while computers may be sophisticated pieces of machinery beyond your immediate comprehension (you don't really have to know what a chip is or how many bytes are on the hard disk in order to use a computer effectively), they function simply as tools that can help you do your job more easily, efficiently, and effectively. As with any tool, you, the apprentice, may need to understand some general characteristics of the tool in order to use it effectively, and, as you might expect, the more training you have, the easier it will be to master its use. Now that you know that the computer is a tool, what does this tool work on? Is a computer really analogous to, say, a more conventional tool such as a table saw or blender? In a manner of speaking, yes. Just as these conventional tools can help transform raw material into finished products (wood into a desk, or vegetables into a puree), a computer starts with the raw material of data and produces a transformed product in the form of information. Although computers can also perform a wide range of other functions, the transformation of numbers (data) into information represents the traditional role of computers. This is exactly what you will be doing with the computer when you learn to use a statistical software package. Consider the simple example of isolated numbers in Table 1.1, Column A. Here we have raw data, a series of numbers with no immediate meaning.

Table 1.1

A          B
Score      Week's High Temperatures
71         Monday: 71 degrees
73         Tuesday: 73 degrees
68         Wednesday: 68 degrees
61         Thursday: 61 degrees
65         Friday: 65 degrees
           Week's Average: 67.6 degrees

But now consider the numbers under Column B. Now that headings have been added, the numbers organized, and the average computed, the numbers take on meaning - they have become information. In this case, data were processed (though not necessarily with a computer), with the result being information. Essentially, the computer performs a similar function: it takes a series of symbols and processes them so as to allow a meaningful interpretation.
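The step from Column A to Column B in Table 1.1 can be reproduced with a few lines of Python. This is only a sketch of the idea of turning data into information; the variable names are hypothetical, while the scores, day labels, and heading come from the table.

```python
from statistics import mean

scores = [71, 73, 68, 61, 65]  # Column A: raw data
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

print("Week's High Temperatures")  # Column B: information
for day, score in zip(days, scores):
    print(f"{day}: {score} degrees")
weekly_average = mean(scores)
print(f"Week's Average: {weekly_average:.1f} degrees")
```

The numbers are unchanged; the labels and the computed average are what make them interpretable.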

1.6 Answers to Some Basic Questions about Computers

1.6a How do you make the computer do what you want?

Now that you know some basics about computers, let's talk about how we make the computer process data to produce information. Obviously, we need a way to communicate with the computer to make it do what we want. This basic process is accomplished with the use of software. Software is a program or collection of programs used to perform a function. A program is a set of specific instructions for the computer to execute. A software package refers to the software program, along with other materials such as manuals. However you decide to use the computer, it is the software that enables you to do so. If you want to use the computer to keep track of various business accounts, you need software that allows you to do that. If you want to use the computer for word processing, or data analysis, or producing graphs, you need software that can be used for each different application. Our primary interest is in learning to use a software program for data analysis, but as part of that process, it is often necessary to learn to use other software systems (e.g., Windows) on your computer as well. With the advent of the first statistical package for data analysis in the mid 1960's, the task of telling the computer what to do and how to do it became greatly simplified for the person interested in data processing.

1.6b What is a statistical software package?

A statistical software package consists of a series of pre-written and pre-tested programs that perform a number of specified operations on data. For example, the software package may have a program that calculates the mean and median for a set of data. Many statistical software packages are currently available. Since these packages are merely large software programs, they are purchased separately from the computer and separately from each other. They are then stored on disk as part of the secondary or auxiliary memory of the main computer or server. Or if you're using a single-user (PC) system, they may be stored directly on the hard drive. The cost or annual subscription to these packages may range anywhere from several hundred to thousands of dollars. Each statistical package has its own set of unique capabilities and commands. However, there are common elements to the logic behind almost all statistical packages, so as you learn one system, you'll also find it easier to work with other systems.

1.6c Overview and characteristics of statistical software packages

The number and types of statistical software packages that are available continue to grow each year. Often these packages are designed to target a particular audience, for example, researchers in the social sciences, or in biomedical fields, or public health, and so on. Statistical software packages share many common features. For example, most can transform data, create and merge data files, construct new variables, and perform various statistical computations such as means, correlations, t-tests and so on. But not all packages can perform all operations as easily or efficiently, so there may be times when the researcher finds it necessary to use a package with which s/he is not familiar. A situation like this may arise when the user wants to perform a statistical analysis that is infrequently used by social scientists. Whereas one package may not be capable of performing this analysis easily, another might have just that capability. But as we've mentioned, once you have learned the logic and rules of one statistical software package, it is usually quite easy to switch to another package