Research

CONTENTS
Chapter 1 Introduction to Educational Research
Chapter 2 Quantitative, Qualitative, and Mixed Research
Chapter 3 Developing Research Questions and Proposal Preparation
Chapter 4 Research Ethics
Chapter 5 Standardized Measurement and Assessment
Chapter 6 Methods of Data Collection
Chapter 7 Sampling
Chapter 8 Validity of Research Results
Chapter 9 Experimental Research
Chapter 10 Quasi-Experimental and Single-Case Designs
Chapter 11 Non-experimental Quantitative Research
Chapter 12 Qualitative Research
Chapter 13 Historical Research
Chapter 14 Mixed Model and Mixed Method Research
Chapter 15 Descriptive Statistics
Chapter 16 Inferential Statistics
Chapter 17 Data Analysis in Qualitative Research
Chapter 18 Preparation of the Research Report
Page 1 of 179
Chapter 1 Introduction to Educational Research

The purpose of Chapter One is to provide an overview of educational research and to introduce you to some important terms and concepts. My discussion in this set of lectures will usually follow the same headings that are used in the book chapters, so you might want to have your book open as you read through my lectures. My goal is to help you better understand the material in the book.

Why Study Educational Research?

Here are a few reasons to take this course and learn about educational research:
To become "research literate," because we live in a society that is driven by research.
To improve your critical thinking skills.
To learn how to read and critically evaluate published research.
To learn how to design and conduct research in case the need arises one day.

Areas of Educational Research

There are many areas in educational research. As you can see in Table 1.1 (reproduced here for your convenience), there are 10 major divisions in our largest association, the American Educational Research Association (AERA), as well as many special interest groups (SIGs). Do you see any areas that are of interest to you?
To learn more about the areas of educational research and current issues, we recommend that you explore the AERA website at http://aera.net. By the way, the AERA has great student membership rates.

Examples of Educational Research

Many examples of educational research are discussed throughout your textbook. To get you started, we have reproduced the abstracts from four journal articles in this section of the book.
An excellent way to see examples of recent educational research articles is to browse through educational journals. One excellent journal to get you started is the Journal of Educational Psychology.

General Kinds of Research

In this section we discuss five general kinds of research: basic research, applied research, evaluation research, action research, and orientational research.

Basic and Applied Research

Basic research is research aimed at generating fundamental knowledge and theoretical understanding about basic human and other natural processes. Applied research is focused on answering practical questions to provide relatively immediate solutions. Basic and applied research can be viewed as the two endpoints of a research continuum, with the center representing research that mixes the two; the continuum reflects the idea that applied research can contribute to basic research and vice versa. Here is the continuum:

Basic ........................ Mixed ........................ Applied

Research examining the process of cognitive "priming" is an example of relatively basic research; a comparison of the effectiveness of two approaches to counseling is an example of relatively applied research. Basic and applied research are generally conducted by researchers at universities.

Evaluation Research

Evaluation involves determining the worth, merit, or quality of an evaluation object. Evaluation is traditionally classified according to its purpose:
Formative evaluation is used for the purpose of program improvement.
Summative evaluation is used for the purpose of making summary judgments about a program and decisions to continue or discontinue the program.
A newer and currently popular way to classify evaluation is to divide it into five types:
Needs assessment, which asks: Is there a need for this type of program?
Theory assessment, which asks: Is this program conceptualized in a way that it should work?
Implementation assessment, which asks: Was this program implemented properly and according to the program plan?
Impact assessment, which asks: Did this program have an impact on its intended targets?
Efficiency assessment, which asks: Is this program cost-effective?
Evaluation is generally done by program evaluators and is focused on specific programs or products.
Action Research

Action research focuses on solving practitioners' local problems. It is generally conducted by the practitioners themselves after they have learned about the research methods and concepts that are discussed in your textbook. It is important to understand that action research is also a state of mind; for example, teachers who are action researchers are constantly observing their students for patterns and thinking about ways to improve instruction, classroom management, and so forth. We hope you develop this state of mind as you read our textbook!

Orientational Research

Orientational research is done for the purpose of advancing an ideological position. It is traditionally called critical theory. We use the broader term orientational research because critical theory was originally concerned only with class inequalities and was based on Karl Marx's theory of economics, society, and revolution. Orientational research is focused on some form of inequality, discrimination, or stratification in society. Some areas in which inequality manifests itself are large differences in income, wealth, access to high-quality education, power, and occupation. Here are some major areas of interest to orientational researchers:
Class stratification (i.e., inequality resulting from one's economic class in society).
Gender stratification (i.e., inequality resulting from one's gender).
Ethnic and racial stratification (i.e., inequality resulting from one's ethnic or racial grouping).
Sexual orientation stratification (i.e., inequality and discrimination based on one's sexual orientation).
Many orientational researchers work for universities or interest group organizations.

Sources of Knowledge

In this section we discuss how people learn about the world around them and gain knowledge. The major ways we learn can be classified into experience, expert opinion, and reasoning.

Experience

The idea here is that knowledge comes from experience.
Historically, this view was called empiricism (i.e., original knowledge comes from experience). The term empirical means "based on observation, experiment, or experience."

Expert Opinion

Because we don't want to, and don't have time to, conduct research on everything, people frequently rely on expert opinion as they learn about the world. Note, however, that if you rely on an expert's opinion, it is important to make sure that the expert is an expert in the specific area under discussion, and you should check to see whether the expert has a vested interest in the issue.

Reasoning
Historically, this idea was called rationalism (i.e., original knowledge comes from thought and reasoning). There are two main forms of reasoning:
Deductive reasoning (i.e., the process of drawing a specific conclusion from a set of premises). Deductive reasoning is the classical approach used by the great rationalists in the history of Western civilization. Note that, in formal logic and mathematics, a conclusion from deductive reasoning will necessarily be true if the argument form is valid and the premises are true.
Inductive reasoning (i.e., reasoning from the particular to the general). The conclusion from inductive reasoning is probabilistic (i.e., you make a statement about what will probably happen). The so-called problem of induction is that the future might not resemble the present.

The Scientific Approach to Knowledge Generation

Science is also an approach for the generation of knowledge. It relies on a mixture of empiricism (i.e., the collection of data) and rationalism (i.e., the use of reasoning and theory construction and testing).

Dynamics of Science

Science has many distinguishing characteristics:
Science is progressive. In other words, "We stand on the shoulders of giants" (Newton).
Science is rational.
Science is creative.
Science is dynamic.
Science is open.
Science is critical.
Science is never-ending.
Basic Assumptions of Science

In order to do science, we usually make several assumptions. They are summarized in Table 1.3.
Scientific Methods

There are many scientific methods. The two major methods are the inductive method and the deductive method.
The deductive method involves the following three steps:
1. State the hypothesis (based on theory or research literature).
2. Collect data to test the hypothesis.
3. Make a decision to accept or reject the hypothesis.
The inductive method also involves three steps:
1. Observe the world.
2. Search for a pattern in what is observed.
3. Make a generalization about what is occurring.
Virtually any application of science includes the use of both the deductive and the inductive approaches to the scientific method, either in a single study or over time. This idea is demonstrated in Figure 1.1. The inductive method is a bottom-up method that is especially useful for generating theories and hypotheses; the deductive method is a top-down method that is especially useful for testing theories and hypotheses.
Theory

The word "theory" most simply means "explanation." Theories explain how and why something operates as it does. Some theories are highly developed and encompass a large terrain (i.e., "big" theories or "grand" theories); other theories are "smaller" theories or briefer explanations. We have summarized the key criteria to use in evaluating a theory in Table 1.4 and reproduced it here for your convenience.
The Principle of Evidence

According to the principle of evidence, what is gained in empirical research is evidence, NOT proof. This means that knowledge based on educational research is ultimately tentative. Therefore, please eliminate the word "proof" from your vocabulary when you talk about research results. Empirical research provides evidence; it does not provide proof. Also note that evidence increases when a finding has been replicated. Hence, you should NOT draw firm conclusions from a single research study.
Objectives of Educational Research

There are five major objectives of educational research.
1. Exploration. This is done when you are trying to generate ideas about something.
2. Description. This is done when you want to describe the characteristics of something or some phenomenon.
3. Explanation. This is done when you want to show how and why a phenomenon operates as it does. If you are interested in causality, you are usually interested in explanation.
4. Prediction. This is your objective when your primary interest is in making accurate predictions. Note that the advanced sciences make much more accurate predictions than the newer social and behavioral sciences.
5. Influence. This objective is a little different. It involves the application of research results to impact the world. A demonstration program is an example of this.
One convenient and useful way to classify research is into exploratory research, descriptive research, explanatory research, predictive research, and demonstration research.
Chapter 2 Quantitative, Qualitative, and Mixed Research

This chapter is our introduction to the three research methodology paradigms. A paradigm is a perspective based on a set of assumptions, concepts, and values that are held by a community of researchers. For most of the 20th century, the quantitative paradigm was dominant. During the 1980s, the qualitative paradigm came of age as an alternative to the quantitative paradigm, and it was often conceptualized as the polar opposite of quantitative research. Finally, although the modern roots of mixed research go back to the late 1950s, I think that it truly became the legitimate third paradigm with the publication of the Handbook of Mixed Methods in Social and Behavioral Research (2003, by Tashakkori and Teddlie). At the same time, mixed research has been conducted by practicing researchers throughout the history of research.

Characteristics of the Three Research Paradigms

There are currently three major research paradigms in education (and in the social and behavioral sciences): quantitative research, qualitative research, and mixed research. Here are the definitions of each:
Quantitative research: research that relies primarily on the collection of quantitative data. (Note that pure quantitative research will follow all of the paradigm characteristics of quantitative research shown in the left column of Table 2.1.)
Qualitative research: research that relies on the collection of qualitative data. (Note that pure qualitative research will follow all of the paradigm characteristics of qualitative research shown in the right column of Table 2.1.)
Mixed research: research that involves the mixing of quantitative and qualitative methods or paradigm characteristics.
Later in the lecture you will learn about the two major types of mixed research, mixed method and mixed model research. For now, keep in mind that the mixing of quantitative and qualitative research can take many forms. In fact, the possibilities for mixing are almost infinite.
Here is Table 2.1 for your convenience and review.
Quantitative Research Methods: Experimental and Nonexperimental Research

The basic building blocks of quantitative research are variables. Variables (something that takes on different values or categories) are the opposite of constants (something that cannot vary, such as a single value or category of a variable). Many of the important types of variables used in quantitative research are shown, with examples, in Table 2.2. Here is that table for your review:
In looking at the table, note that when we speak of measurement, the simplest classification is between categorical and quantitative variables. As you can see, quantitative variables vary in degree or amount (e.g., annual income), and categorical variables vary in type or kind (e.g., gender).
The other set of variables in the table (under the heading "role taken by the variable") are the kinds of variables we talk about when explaining how the world operates and when we design a quantitative research study. As you can see, independent variables (symbolized by "IV") are the presumed cause of another variable. Dependent variables (symbolized by "DV") are the presumed effect or outcome. Dependent variables are influenced by one or more independent variables. What are the IV and DV in the relationship between smoking and lung cancer? (Smoking is the IV and lung cancer is the DV.)

Sometimes we want to understand the process or variables through which one variable affects another variable. This brings us to the idea of intervening variables (also called mediator or mediating variables). Intervening variables are variables that occur between two other variables. For example, tissue damage is an intervening variable in the smoking and lung cancer relationship. We can use arrows (which mean "causes" or "affects") to draw the relationship, including the intervening variable, like this: Smoking ---> Tissue Damage ---> Lung Cancer.

Sometimes a relationship does not generalize to everyone; therefore, researchers often use moderator variables to show how the relationship changes across the levels of an additional variable. For example, perhaps behavioral therapy works better for males and cognitive therapy works better for females. In this case, gender is the moderator variable: the relationship between type of therapy (behavioral versus cognitive) and psychological relief is moderated by gender.

Now, I will talk about the two major types of quantitative research: experimental and nonexperimental research.

Experimental Research

The purpose of experimental research is to study cause-and-effect relationships. Its defining characteristic is active manipulation of an independent variable (i.e., it is only in experimental research that manipulation is present).
Also, random assignment (which creates "equivalent" groups) is used in the strongest experimental research designs.
Here is an example of an experiment.
              Experimental (E)   Control (C)
Pretest       O1                 O1
Treatment     XE                 XC
Posttest      O2                 O2

Where:
E stands for the experimental group (e.g., the new teaching approach).
C stands for the control or comparison group (e.g., the old or standard teaching approach).
Because the best way to make the two groups similar in the above research design is to randomly assign the participants to the experimental and control groups, let's assume that we have a convenience sample of 50 people and that we randomly assign them to the two groups in our experiment. Here is the logic of this experiment. First, we make our groups approximately the same at the start of the study by using random assignment (i.e., the groups are equated). Next, we pretest the participants to see how much they know. Then we manipulate the independent variable by using the new teaching approach with the experimental group and the old teaching approach with the control group. Finally (after the manipulation), we measure the participants' knowledge to see how much they know after having participated in our experiment. Let's say that the people in the experimental group show more knowledge improvement than those in the control group. What would you conclude? In this case, we can conclude that there is a causal relationship between the IV (teaching method) and the DV (knowledge); specifically, we can conclude that the new teaching approach is better than the old teaching approach. Make sense?

Now, let's say that in the above experiment we could not use random assignment to equate our groups. Let's say that, instead, we had our best teacher (Mrs. Smith) use the new teaching approach with the students in her 5th period class, and we had a newer and less experienced teacher (Mr. Turner) use the old teaching approach with his 5th period class. Let's again say that the experimental group did better than the control group. Do you see any problems with claiming that the difference between the two groups is due to the teaching method? The problem is that there are alternative explanations. First, perhaps the difference is because Mrs. Smith is the better teacher. Second, perhaps Mrs. Smith had the smarter students (remember that the students were not randomly assigned to the two groups; instead, we used two intact classrooms).

We have a name for the problem just described: the problem of alternative explanations. It is very possible that the difference we saw between the two groups was due to variables other than the IV. In particular, the difference might have been due to the teacher (Mrs. Smith vs. Mr. Turner) or to the IQ levels of the groups (perhaps Mrs. Smith's students had higher IQs than Mr. Turner's students). We have a special name for these kinds of variables: they are called extraneous variables. An extraneous variable is a variable that may compete with the independent variable in explaining the outcome. It is important to remember this definition, because extraneous variables can destroy the integrity of a research study that claims to show a cause-and-effect relationship. Remember this: if you are ever interested in identifying cause-and-effect relationships, you must always determine whether there are any extraneous variables you need to worry about. If an extraneous variable really is the reason for an outcome (rather than the IV), then we sometimes call it a confounding variable, because it has confused or confounded the relationship we are interested in.
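To make the logic of random assignment concrete, here is a minimal Python sketch of the 50-person example above. It assumes participants are identified only by hypothetical ID strings (P01 to P50), and the function names randomly_assign and mean_gain are our own illustrations, not procedures from the textbook. The seed only makes the shuffle reproducible; whatever seed is used, each person has the same chance of landing in either group.

```python
import random


def randomly_assign(participants, seed=None):
    """Randomly split participants into an experimental (E) and a control (C) group.

    A copy of the list is shuffled, the first half goes to E and the rest to C,
    so every participant has the same chance of ending up in either group.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def mean_gain(pretest, posttest):
    """Average posttest-minus-pretest change (O2 - O1) for one group."""
    if len(pretest) != len(posttest) or not pretest:
        raise ValueError("need matched, non-empty pretest and posttest scores")
    gains = [post - pre for pre, post in zip(pretest, posttest)]
    return sum(gains) / len(gains)


# Hypothetical convenience sample of 50 participants (IDs only, no real data).
sample = [f"P{i:02d}" for i in range(1, 51)]
experimental, control = randomly_assign(sample, seed=2024)
print("Experimental group size:", len(experimental))
print("Control group size:", len(control))
```

Once pretest (O1) and posttest (O2) scores have been collected, you would call mean_gain separately for each group and compare the two averages. Deciding whether such a difference is larger than chance alone would produce requires the inferential statistics covered in Chapter 16.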
Nonexperimental Research

Remember that the defining characteristic of experimental research is manipulation of the IV. In nonexperimental research, there is no manipulation of the independent variable. There also is no random assignment of participants to groups. This means that if you ever see a relationship between two variables in nonexperimental research, you cannot jump to a conclusion of cause and effect, because there will be too many alternative explanations for the relationship. In the chapter, we distinguish between two examples of nonexperimental research. In the "basic case" of causal-comparative research, there is one categorical IV and one quantitative DV. Example: gender (IV) and class performance (DV). You would look for the relationship by comparing the average performance levels of males and females. In the basic case of correlational research, there is one quantitative IV and one quantitative DV. Example: self-esteem (IV) and class performance (DV). You would look for the relationship by calculating the correlation coefficient. The correlation coefficient is a number that varies between -1 and +1, where 0 stands for no relationship. The farther the number is from 0, the stronger the relationship. If the sign of the correlation coefficient is positive (e.g., +.65), then you have a positive correlation, which means the two variables move in the same direction (as one variable increases, so does the other). Education level and annual income are positively correlated (i.e., the higher the education, the higher the annual income). If the sign of the correlation coefficient is negative (e.g., -.71), then you have a negative correlation, which means the two variables move in opposite directions (as one variable increases, the other decreases). Smoking and life expectancy are negatively correlated (i.e., the more a person smokes, the lower the life expectancy).
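The two basic nonexperimental cases can be sketched in a few lines of Python. This is a minimal illustration, assuming scores are stored in plain lists; group_means and pearson_r are hypothetical helper names, pearson_r computes the standard Pearson correlation coefficient, and the numbers in the usage lines are toy values chosen only to show the range of the coefficient, not data from any study.

```python
import math


def mean(values):
    return sum(values) / len(values)


def group_means(scores_by_group):
    """Basic causal-comparative case: average DV score for each category of the IV."""
    return {group: mean(scores) for group, scores in scores_by_group.items()}


def pearson_r(x, y):
    """Basic correlational case: Pearson correlation between two quantitative variables.

    The result always lies between -1 and +1; 0 means no linear relationship.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a variable with no variation cannot be correlated")
    return sxy / math.sqrt(sxx * syy)


# Toy values for illustration only.
x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0: perfect positive correlation
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0: perfect negative correlation
print(pearson_r(x, [1, 3, 2, 3, 1]))   # 0.0: no linear relationship
print(group_means({"category A": [80, 90], "category B": [70, 80]}))
```

Keep in mind that neither a difference in group means nor a nonzero correlation says anything, by itself, about cause and effect; each only shows that the two variables are related.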
We will show you how to improve on the two basic nonexperimental designs in later chapters, but for now, please remember these important points:
1) You can obtain much stronger evidence for causality from experimental research than from nonexperimental research (e.g., a strong experiment is better than causal-comparative or correlational research).
2) You cannot conclude that a relationship is causal when you only have one IV and one DV in nonexperimental research (without controls). Therefore, the basic cases of both causal-comparative and correlational research are severely flawed!
3) In later chapters we explain three necessary conditions for causality (relationship, temporal order, and lack of alternative explanations).
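For a preview of the three necessary conditions required to make a firm statement of cause and effect, read this next section. It is provided as supplemental or preview material for this topic, which occurs in many chapters of the book. If you have had enough for now, just skip to the next section of this lecture, entitled Qualitative Research. There are three necessary conditions that you must establish whenever you want to conclude that a relationship is causal. They are shown in the following table:
Condition 1 (Relationship): the presumed cause and the presumed effect must be related.
Condition 2 (Temporal order): changes in the presumed cause must occur before changes in the presumed effect.
Condition 3 (Lack of alternative explanations): there must be no plausible alternative explanation for the relationship.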
Our experiment met these criteria quite nicely. That is, we had a relationship between teaching method and knowledge; the manipulation occurred before the posttest; and because we randomly assigned the people to the two groups, there should be no other variables that can explain away the relationship. On the other hand, in the basic cases of causal-comparative and correlational research, where we only observed a relationship between two variables (we had no manipulation or random assignment), we have only established condition 1. We can only conclude that the two variables are related. In Chapter 11 we will show you how to design nonexperimental research that performs better than the basic cases on the three conditions above. Still, remember that even when these basic cases are improved, experimental research with random assignment is better for studying cause and effect than nonexperimental research. Another way of saying this is that if you want to show that one thing causes another thing, then, if it is feasible, you will want to CONDUCT AN EXPERIMENT.
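The logic of the three necessary conditions can also be written as a small decision rule. The following Python sketch is only our own illustration (the class CausalEvidence and the function warranted_conclusion are hypothetical names); it encodes the point that a causal claim needs all three conditions, while a relationship alone supports only the conclusion that two variables are related.

```python
from dataclasses import dataclass


@dataclass
class CausalEvidence:
    relationship: bool                 # Condition 1: the IV and DV are related
    temporal_order: bool               # Condition 2: changes in the IV come before changes in the DV
    no_alternative_explanations: bool  # Condition 3: no plausible rival explanation remains


def warranted_conclusion(evidence: CausalEvidence) -> str:
    """Return the strongest conclusion the evidence supports."""
    if not evidence.relationship:
        return "no relationship established"
    if evidence.temporal_order and evidence.no_alternative_explanations:
        return "causal relationship supported"
    return "variables are related; causality not established"


# The randomized teaching-method experiment described above meets all three conditions.
experiment = CausalEvidence(True, True, True)
# The basic causal-comparative and correlational cases establish only condition 1.
basic_nonexperimental = CausalEvidence(True, False, False)

print(warranted_conclusion(experiment))
print(warranted_conclusion(basic_nonexperimental))
```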
Qualitative Research Methods

We described qualitative research earlier, in Table 2.1. There are five major types of qualitative research: phenomenology, ethnography, case study research, grounded theory, and historical research. All of these approaches are similar in that they are qualitative approaches. Each approach, however, has some distinct characteristics and tends to have its own roots and following.
Here are the definitions of, and an example for, each type of qualitative research:
Phenomenology: a form of qualitative research in which the researcher attempts to understand how one or more individuals experience a phenomenon. For example, you might interview 20 widows and ask them to describe their experiences of the deaths of their husbands.
Ethnography: the form of qualitative research that focuses on describing the culture of a group of people. Note that a culture is the shared attitudes, values, norms, practices, language, and material things of a group of people. For example, you might decide to go and live in a Mohawk community and study its culture and educational practices.
Case study research: a form of qualitative research that is focused on providing a detailed account of one or more cases. For example, you might study a classroom that was given a new curriculum for technology use.
Grounded theory: a qualitative approach to generating and developing a theory from data that the researcher collects. For example, you might collect data from parents who have pulled their children out of public schools and develop a theory to explain how and why this phenomenon occurs, ultimately developing a theory of school pull-out.
Historical research: research about events that occurred in the past. For example, you might study the use of corporal punishment in schools in the 19th century.

Mixed Research Methods

Mixed research is a general type of research (it's one of the three paradigms) in which quantitative and qualitative methods, techniques, or other paradigm characteristics are mixed in one overall study. Earlier we showed its major characteristics in Table 2.1. Now the two major types of mixed research are distinguished: mixed method versus mixed model research.
Mixed method research is research in which the researcher uses the qualitative research paradigm for one phase of a research study and the quantitative research paradigm for another phase of the study. For example, a researcher might conduct an experiment (quantitative) and, after the experiment, conduct an interview study with the participants (qualitative) to see how they viewed the experiment and whether they agreed with the results. Mixed method research is like conducting two mini-studies within one overall research study. Mixed model research is research in which the researcher mixes qualitative and quantitative research approaches within a stage of the study or across two of the stages of the research process. For example, a researcher might conduct a survey using a questionnaire that is composed of multiple closed-ended (quantitative) items as well as several open-ended (qualitative) items. As another example, a researcher might collect qualitative data and then quantify the data.

The Advantages of Mixed Research

First of all, we advocate the use of mixed research when it is feasible. We are excited about this new movement in educational research and believe it will help qualitative and quantitative researchers get along better and, more importantly, that it will promote the conduct of excellent educational research. Perhaps the major goal for researchers who design and conduct mixed research is to follow the fundamental principle of mixed research. According to this principle, the researcher should mix quantitative and qualitative research methods, procedures, and paradigm characteristics in such a way that the resulting mixture or combination has complementary strengths and nonoverlapping weaknesses. The examples just given for mixed method and mixed model research can be viewed as following this principle. Can you see how?

Here is a metaphor for thinking about mixed research: construct one fish net out of several fish nets that have holes in them by laying them on top of one another. The "new" net will not have any holes in it. The use of multiple methods or approaches to research works the same way. When different approaches are used to focus on the same phenomenon and they provide the same result, you have "corroboration," which means you have superior evidence for the result. Other important reasons for doing mixed research are to complement one set of results with another, to expand a set of results, or to discover something that would have been missed if only a quantitative or a qualitative approach had been used. Some researchers like to conduct mixed research in a single study, and this is what is truly called mixed research. However, it is interesting to note that virtually all research literatures are mixed at the aggregate level, even if no single researcher uses mixed research. That's because a research literature will usually contain some quantitative and some qualitative research studies.
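The mixed model example of quantifying qualitative data (a step sometimes called "quantitizing") can be sketched in Python. This is a minimal illustration, assuming the open-ended responses have already been coded into themes by the researcher; the theme labels and the function name quantitize are hypothetical, invented only for this example.

```python
from collections import Counter

# Hypothetical open-ended survey responses that a researcher has already coded into themes.
coded_responses = [
    ["workload", "support"],
    ["support"],
    ["workload", "technology"],
    ["support", "technology"],
    ["workload"],
]


def quantitize(coded):
    """Count how many respondents mentioned each theme (at most once per respondent)."""
    counts = Counter()
    for themes in coded:
        counts.update(set(themes))  # set() stops a repeated code from being counted twice
    return counts


theme_counts = quantitize(coded_responses)
n = len(coded_responses)
# Sort by frequency (highest first), then alphabetically, so the report order is stable.
for theme, count in sorted(theme_counts.items(), key=lambda item: (-item[1], item[0])):
    print(f"{theme}: {count} of {n} respondents ({count / n:.0%})")
```

Counting how many respondents mention each theme turns the qualitative codes into quantitative data that can be reported alongside the closed-ended survey items, while the original quotations remain available for qualitative interpretation.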
Our Research Typology

We have now covered the essentials of the three research paradigms and their subtypes. Let's put it all together in the following picture of our research typology:
Chapter 3 Problem Identification and Hypothesis Formation

The purpose of Chapter Three is to help you learn how to come up with a research topic, refine it, and develop a research proposal.

Sources of Research Ideas

Research ideas and research problems originate from many sources. We discuss four of these sources in the text: everyday life, practical issues, past research, and theory. Regardless of the source of your idea, a key point is that you must develop a questioning and inquisitive approach to life when you are trying to come up with research ideas.
Everyday life is one common source of research ideas. Based on a questioning and inquisitive approach, you can draw from your experiences and come up with many research topics. For example, think about which educational techniques or practices you believe work well, or do not work well. Would you be interested in doing a research study on one or more of those techniques or practices?
Practical issues can be a source of research ideas. What are some current problems facing education (e.g., problems facing administrators, teachers, students, and parents)? What research topics do you think could address some of these current problems?
Past research can be an excellent source of research ideas. In my opinion (BJ), past research is probably the most important source of research ideas. That's because a great deal of educational research has already been conducted on a multitude of topics, and, importantly, research usually generates more questions than it answers. Past research is also the best way to come up with a specific idea that will fit into and extend the research literature. For students planning on writing a thesis or dissertation, the use of past research is extremely helpful. Remember not to look only at the variables and the results, but also to examine carefully how the researchers conducted the study (i.e., examine the methods). When you read a research article, it will be helpful for you to think about the ideas shown in Table 3.1.
Theory (i.e., explanations of phenomena) can be a source of research ideas:
o Can you summarize and integrate a set of past studies into a theory?
o Are there any theoretical predictions needing empirical testing?
o Do you have any "theories" that you believe have merit? Test them!
o If there is little or no theory in the area of interest to you, then think about collecting data to help you generate a theory using the grounded theory technique.
Ideas that Can't Be Researched Empirically

The point in this section is that empirical research (i.e., research that is based on the collection of observable data) cannot provide answers to ultimate, metaphysical, or ethical questions. If a question asks which value is true or correct, empirical research can't offer the solution. For example: Is school prayer good? Should homosexuals be allowed to legally marry? Should the teaching of Christianity (and no other religion) be provided in public schools? These are moral and legal issues that cannot be directly addressed or resolved by empirical research in the social or behavioral sciences. John Dewey made the point that empirical research can provide answers about how to get to valued endpoints, but he took the valued endpoints for granted (e.g., democracy, equality, education for all). So do not expect to conduct an empirical research study that will "show whether school prayer should be adopted." Review of the Literature After you have identified your research idea and a general problem that sounds interesting to you, the next step is to become familiar with the published information on your topic. Conducting a literature review will help you to see whether your topic has already been researched and how you might need to revise your research idea. It will also show you methodological techniques and problems specific to your research problem that will help you in designing a study. Most importantly, after conducting a thorough literature review, your specific research questions and hypotheses will become clearer to you. A literature review can take a different form in qualitative and quantitative research. In qualitative research (which is often exploratory research), little prior literature may be available. Furthermore, too much review may make a researcher "myopic." Literature is especially important during the later stages (e.g., interpreting results, discussion) of exploratory research.
Still, for much qualitative research, we recommend that a literature review be conducted to see what has been done and to provide sensitizing concepts. Then, when data are collected, the researcher can use strategies (discussed in chapter 8) to minimize the researcher's biases. In quantitative research, the researcher directly "builds" on past research. Therefore, a review of prior research must be done before conducting the study. In quantitative research, the literature review will help you to see whether your research problem has already been studied. It will also show you the data collection instruments that have been used, the designs that have been used, and the theoretical and methodological issues that have arisen. Sources of Information There are several major sources of information for you to use when conducting a literature review. Books are a good starting point. They give you an overview and a summary of relevant research and theory. Journals are another excellent source. Journals provide more recent information than books and provide full-length empirical research articles for you to carefully examine. Computer databases are excellent sources for locating information. The most important computer database in education is ERIC. Other important databases are PsycINFO or PsycLIT (for psychological research), SocioFILE and Sociological Abstracts (for sociological research), and Dissertation Abstracts (for summaries of doctoral dissertations in education and related fields). We strongly recommend that you do not limit your search to a single computer database. We also strongly recommend that you do not search only for full-text articles, because this will eliminate most of the best published research. Conducting the Literature Search In this section, we have included some practical material on conducting the literature search. Because ERIC is the most important database in education, we have included a table (Table 3.4) that shows exactly how to search ERIC. You should access ERIC through your library to get the full version. In Table 3.6 we explain how to evaluate the quality of Internet resources. It is important for you to understand that the quality of material on the Internet varies widely and must be evaluated before use. Using the Public Internet. The Internet has obviously become extremely important. Below is a list of some useful subject directories, search engines, and meta-search engines from your chapter:
Feasibility of the Study Before deciding whether to carry out your research project, you must decide whether it would be feasible to conduct. You should do this as early as possible so you don't waste your time. This means that you must design a research study that can be carried out given your available resources (e.g., time, money, people). Interviewing all children with ADHD in your state probably would not be feasible for a single research study. Interviewing a set of children with ADHD at your school would be more feasible. Part of determining feasibility also involves making sure that the study can be carried out ethically. The Institutional Review Board will help you with this decision. So far we have discussed how to come up with your research topic and how to find the needed information.
As seen in the following figure (Figure 3.1 from your book), after you get your topic, you need to move to determining your research problem, your statement of the purpose of your study, and your statement of the research questions. If you are conducting a quantitative study, you will also need to state your hypotheses. Note that movement from the top to the bottom of Figure 3.1 involves a movement from the general to the specific (e.g., a hypothesis is much more specific than a research topic). Also note that as you move from the top to the bottom, you will need to conduct your literature review so that you can determine what specific research questions and/or hypotheses need to be addressed. In fact, it is usually helpful (when conducting basic or applied research) to start your literature review right at the beginning of the process shown in Figure 3.1.
(Note: to see the full process explained in this chapter, we recommend that you also view the concept map for chapter 3 on the companion website.)
Statement of the Research Problem

As seen in the above figure, the research problem is the educational issue or problem within your broad topic area. In other words, you start with your topic and then try to identify one or more research problems that you believe need to be solved in that topic area. In quantitative research, research problems tend to emphasize the need to explain, predict, or describe something. In qualitative research, research problems tend to focus on exploring a process, an event, or a phenomenon. Statement of the Purpose of the Study As seen in the figure, your research purpose follows from the problem you have selected. It is your statement of the intent or objective of your research study. It is important to include this statement in your proposals and final reports because it helps orient your reader to your study. In quantitative research, the purpose identifies the specific type of relationship being investigated using a specific set of variables. In qualitative research, the purpose focuses on exploring or understanding a phenomenon. Statement of Research Questions After you have completed your literature review and have digested the literature, you will need to make an exact statement of the specific research questions you want to pursue. This will help ensure that you have a good grasp of what you want to do. It will also enable you to communicate your idea to others and help guide the research process (e.g., what variables will be examined, what methods will be needed). A good literature review will logically end with your specific research questions. In quantitative research, a research question typically asks about a relationship that may exist between or among two or more variables. It should identify the variables being investigated and specify the type of relationship (descriptive, predictive, or causal) to be investigated. For example: What effect does playing football have on students' overall grade point average during the football season?
o We have included scripts for writing quantitative research questions in Table 3.7. In qualitative research, a research question asks about the specific process, issue, or phenomenon to be explored or described. For example: What are the social and cultural characteristics of a highly successful school where students and teachers get along well and students work hard and achieve highly? Here is another research question: How does the social context of a school influence preservice teachers' beliefs about teaching? Here is another: What is the experience of a teacher being a student like?
Formulating Hypotheses If you are conducting a quantitative research study, you will typically state the specific hypotheses that you have developed from your literature review. A hypothesis is the researcher's prediction of the relationship that exists among the variables being investigated.
If you wrote a research question, the hypothesis will be your tentative answer to that question. Take the quantitative research question stated above (i.e., What effect does playing football have on students' overall grade point average during the football season?). The related hypothesis might go like this: Students who play football during the football season will experience a decrease in their GPAs as compared to students not playing football. In quantitative research, hypotheses are stated before the data are collected. In qualitative research, by contrast, hypotheses are often generated as the data are collected and as the researcher gains insight into what is being studied.
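To make the football hypothesis concrete, the Python sketch below names the independent variable (whether a student plays football) and the dependent variable (change in GPA over the season). It then checks whether a set of scores falls in the predicted direction. The GPA values are invented for illustration and the function name `supports_predicted_direction` is hypothetical. A direction check by itself is not a hypothesis test. Formal significance testing is a separate step, covered under inferential statistics.

```python
from statistics import mean

def supports_predicted_direction(players, non_players):
    """Return (difference, in_direction) for the football hypothesis.

    players, non_players: change in GPA (end of season minus start)
    for each student in the group. The hypothesis predicts that
    players' GPAs drop relative to non-players', i.e., a negative
    difference in mean change.
    """
    difference = mean(players) - mean(non_players)
    return difference, difference < 0

# Hypothetical GPA changes, for illustration only (not real data).
players = [-0.3, -0.1, -0.2, 0.0]
non_players = [0.1, 0.0, -0.1, 0.2]

diff, in_direction = supports_predicted_direction(players, non_players)
print(f"Mean difference in GPA change: {diff:.2f}")
print("Data in predicted direction:", in_direction)
```

Writing the prediction as a signed difference forces you to say in advance which group you expect to score lower. That commitment is what separates a hypothesis from a general research question.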
The Research Proposal Once you have identified your research idea, reviewed the research literature, determined the feasibility of your study, and made a formal statement of the research questions (and hypotheses for a quantitative study), you are ready to develop a research proposal to guide your research study. It is essential that you develop your research proposal before conducting a research study. Doing so will force you to carefully spell out the rationale for your research study and to think about and specify each step of your study. Here are the major sections for a typical research proposal:
Title Page
Abstract
Introduction
o Include a statement of the research topic.
o Include a statement of the research problem(s).
o Include a summary of the prior literature.
o Include the purpose of the study.
o Include the research question(s).
o Include the hypotheses for quantitative studies.
Method
o Research Participants
o Apparatus and/or Instruments
o Procedure
Data Analysis
References
The following briefly explains what goes in the major sections just shown: I. Introduction This section is "V shaped," moving from general to specific. It includes a statement of the topic, problem, and purpose. It includes a discussion of the prior relevant research. Finally, it ends with the research questions and hypotheses of the study.
II. Method This section typically includes a discussion of the following. The research participants (e.g., Who are they? What are their characteristics? How many will there be? Where are they located? How will they be selected? What kind of response rate are you planning for?). The apparatus (e.g., Is any special equipment needed for your study?). The instruments to be used in the study (i.e., What are your specific variables, and how will you measure those variables? What specific data collection instruments will you use? What kinds of reliability and validity evidence are available for the instruments? Why are the instruments appropriate for your study and your particular participants?). The procedure (this is a narrative outline of the specific steps you intend to follow to carry out your data collection; it should be clear enough for someone to replicate your study). A section on design is sometimes included (often in the procedure section), describing the research design used (e.g., a nonequivalent comparison group design or a longitudinal design). III. Data Analysis This section includes a discussion of how you intend to organize and analyze the data that you collect. Quantitative studies use statistical data analysis procedures (e.g., ANOVA and regression). Qualitative research studies are based on inductive data analysis (e.g., searching for categories, patterns, and themes present in the transcribed data). Note that some research proposals include a separate section or "Chapter" for the literature review (especially dissertations). Also, some prefer to include the data analysis section in the Method section. For example, the research proposal for a dissertation might include the following three chapters: 1. Introduction 2. Literature Review 3. Method Consumer Use of the Literature Frequently there will be no need to conduct an empirical research study because the necessary research will have already been done.
In other words, many times only a literature review will be needed to answer your questions. We have provided checklists for evaluating research studies in Tables 3.8 and 3.9. These will help you to evaluate each study you review. Don't forget this point, which we want to continue to emphasize: never place too much confidence in a single research study. That is, you should place much more confidence in a research finding that has been replicated (i.e., shown in many different research studies).
It is important to view the full set of studies on an issue, and doing so brings the built-in benefit of replication. That is why we recommend that you pay special attention to meta-analyses when you find them in your literature searches. A meta-analysis is a quantitative technique for summarizing the results of multiple studies on a specific topic. It will tell you whether a variable has consistently been shown to have an effect, as well as the average size of the effect.
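One common way to compute the "average size of the effect" is a fixed-effect, inverse-variance weighted mean. In this approach, more precise studies (those with smaller sampling variance) count more. The Python sketch below assumes each study reports an effect size (for example, a standardized mean difference) and its sampling variance. The function name and the study values are hypothetical. Many published meta-analyses use random-effects models instead, so treat this as an illustration of the idea rather than the method any particular meta-analysis used.

```python
import math

def fixed_effect_mean(effects, variances):
    """Inverse-variance weighted mean effect size and its standard error.

    effects:   effect size reported by each study
    variances: sampling variance of each effect size (each must be > 0)
    """
    if not effects or len(effects) != len(variances):
        raise ValueError("need at least one study and one variance per effect")
    weights = [1.0 / v for v in variances]       # precise studies weigh more
    total = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, effects)) / total
    se = math.sqrt(1.0 / total)                  # standard error of the pooled mean
    return pooled, se

# Three hypothetical studies of the same intervention (invented numbers).
effects = [0.20, 0.45, 0.30]
variances = [0.04, 0.02, 0.05]
pooled, se = fixed_effect_mean(effects, variances)
print(f"Average effect: {pooled:.3f} (SE {se:.3f})")
```

Here the second study has the smallest variance, so it pulls the pooled estimate toward 0.45 more strongly than a simple average of the three effects would.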
Chapter 4 Research Ethics Note: as you read this lecture, it's a good idea to also look at the concept map for the chapter. Remember that you can click on different parts of the concept map to move upward or downward. Here is the link: http://www.southalabama.edu/coe/bset/johnson/dr_johnson/clickmaps/ch4/fr_ch4.htm
What Are Research Ethics? Ethics is the branch of philosophy that deals with values and morals. It is a topic that people may disagree on because it is based on people's personal value systems. What one person or group considers to be good or right might be considered bad or wrong by another person or group. In this chapter, we define ethics as the principles and guidelines that help us to uphold the things we value. The chapter discusses three major approaches to ethics. 1. Deontological Approach - This approach states that we should identify and use a universal code when making ethical decisions. An action is either ethical or not ethical, without exception. 2. Ethical skepticism - This viewpoint states that concrete and inviolate ethical or moral standards cannot be formulated. In this view, ethical standards are not universal but are relative to one's particular culture, time, and even individual. 3. Utilitarianism - This is a very practical viewpoint. It states that decisions about ethics should be based on an examination and comparison of the costs and benefits that may arise from an action. Note that the utilitarian approach is used by most people in academia (such as Institutional Review Boards) when making decisions about research studies. Ethical Concerns There are three primary areas of ethical concern for researchers: 1. The relationship between society and science. Should researchers study what is considered important in society at a given time? Should the federal government and other funding agencies use grants to affect the areas researched in a society? Should researchers ignore societal concerns? 2. Professional issues. The primary ethical concern here is fraudulent activity (fabrication or alteration of results) by scientists. Obviously, cheating or lying is never defensible. Duplicate publication (publishing the same data and results in more than one journal or other publication) should be avoided.
Partial publication (publishing several articles from the data collected in one study) is allowable as long as the different publications involve different research questions and different data, and as long as it facilitates scientific communication. Otherwise, it should be avoided. 3. Treatment of Research Participants. This is probably the most fundamental ethical issue in the field of empirical research. It is essential to ensure that research participants are not harmed physically or psychologically during the conduct of research. In the next section, we will go into the issue of treatment of research participants in depth. Ethical Guidelines for Research with Humans One set of guidelines specifically developed to guide research conducted by educational researchers is the AERA Guidelines. The AERA (American Educational Research Association) is the largest professional association in the field of education. Here is the link to the American Educational Research Association's Code of Ethics: http://www.aera.net/about/policy/ethics.htm Here are some of the most important issues discussed in the chapter (and in the AERA Guidelines). 1. Informed Consent. Potential research participants must be provided with information that enables them to make an informed decision about whether they want to participate in the research study. An actual consent form is shown in Exhibit 4.3. Here (shown in Table 4.1) is the information that you (the researcher) must put in a consent form so that potential participants are able to provide informed consent.
2. Informed Consent with Minors as Research Participants. Informed consent must be obtained from parents or guardians of minors. Also, assent must be obtained from minors who are old enough or have enough intellectual capacity to say they are willing to participate. Assent means the minor agrees to participate after being informed of all the features of the study that could affect the participant's willingness to participate. 3. Passive versus Active Consent. So far we have only talked about active consent (i.e., when consent is provided by the potential participant signing the consent form). Active consent is usually the preferred form of consent. Passive consent is the process whereby consent is assumed unless a form declining participation is returned. An example is shown in Exhibit 4.5. Here is the key passage in the passive consent form: "Participation in this study is completely voluntary. All students in the class will take the test. If you do not wish for your child to be in this study, please fill out the form at the bottom of this letter and return it to me. Also, please tell your child to hand in a blank test sheet when the class is given the mathematics test so that your child will not be included in the study."
4. Deception
Deception is present when the researcher provides misleading information or withholds information from participants about the nature and/or purpose of the study. Deception is allowable when the benefits outweigh the costs. However, the researcher is ethically obligated not to use any more deception than is needed to conduct a valid study. If deception is used, debriefing should be used. Debriefing is a poststudy interview in which all aspects of the study are revealed, any reasons for deception are explained, and any questions the participant has about the study are answered. Debriefing has two goals: 1. Dehoaxing: informing study participants about the deception that was used and the reasons for its use. 2. Desensitizing: helping study participants deal with and eliminate any stress or other undesirable feelings that the study might have created. 5. Freedom to Withdraw. Participants must be informed that they are free to withdraw from the study at any time without penalty. If you have a power relationship with the participants (e.g., if you are their teacher or employer), you must be extra careful to make sure that they really do feel free to withdraw. 6. Protection from Mental and Physical Harm. This is the most fundamental ethical issue confronting the researcher. Fortunately, much educational research poses minimal risk to participants (as compared, for example, to medical research). 7. Confidentiality and Anonymity. Confidentiality is a basic requirement in all studies. It means that the researcher agrees not to reveal the identity of the participant to anyone other than the researcher and his or her staff. A stronger and even better condition (if it can be met) is called anonymity. Anonymity means that the identity of the participant is not known by anyone in the study, including the researcher. An example would be where the researcher has a large group of people fill out a questionnaire but NOT write their names on it.
In this way, the researcher ends up with data, but no names.
Institutional Review Board The IRB is a committee consisting of professionals and lay people who review research proposals to ensure that the researcher adheres to federal and local ethical standards in the conduct of the research. Virtually every university in the U.S. has an IRB. Researchers must submit a Research Protocol to the IRB for review. A full example of a research protocol submitted to the IRB is shown in Exhibit 4.6. Three of the most important categories of review are exempt studies (i.e., studies involving no risk to participants and not requiring full IRB review), expedited review (i.e., the process by which a study is rapidly reviewed by fewer members than constitute the full IRB board), and full board review (i.e., review by all members of the IRB). Many educational studies fall into the exempt category. However, it is essential that you understand that the IRB staff, not the researcher, decide whether a research protocol is exempt. The IRB will provide formal documentation of this status for your study.
For more information about IRB regulations than is provided in the text, go here: http://ori.dhhs.gov/ Also, for your convenience, we have included in Table 4.3 the exempt categories used by the IRB.
Chapter 5 Standardized Measurement and Assessment (For the concept map that goes with this chapter, see the companion website.) Defining Measurement When we measure, we attempt to identify the dimensions, quantity, capacity, or degree of something. Measurement is formally defined as the act of measuring by assigning symbols or numbers to something according to a specific set of rules. Measurement can be categorized by the type of information that is communicated by the symbols or numbers assigned to the variables of interest. In particular, four levels or types of information are discussed next in the chapter. They are called the four "scales of measurement." Scales of Measurement 1. Nominal Scale. This is a nonquantitative measurement scale. It is used to categorize, label, classify, name, or identify variables. It classifies groups or types. Numbers can be used to label the categories of a nominal variable, but the numbers serve only as markers, not as indicators of amount or quantity (e.g., if you wanted to, you could mark the categories of the variable called "gender" with 1=female and 2=male). Some examples of nominal level variables are the country you were born in, college major, personality type, and experimental condition (e.g., experimental group or control group). 2. Ordinal Scale. This level of measurement enables one to make ordinal judgments (i.e., judgments about rank order). Any variable whose levels can be ranked (but where you don't know if the distance between the levels is the same) is an ordinal variable. Some examples are finishing position in a marathon, rank on the Billboard Top 40, and rank in class. 3. Interval Scale. This scale or level of measurement has the characteristics of rank order and equal intervals (i.e., the distance between adjacent points is the same). It does not possess an absolute zero point. Some examples are Celsius temperature, Fahrenheit temperature, and IQ scores.
Here is the idea of the lack of a true zero point: zero degrees Celsius does not mean no temperature at all. On the Fahrenheit scale, it equals 32 degrees (the freezing point of water). Zero degrees on these scales does not mean zero or no temperature.
4. Ratio Scale. This is a scale with a true zero point. It also has the key characteristic of each of the "lower level" scales: equal intervals (interval scale), rank order (ordinal scale), and the ability to mark a value with a name (nominal scale). Some examples of ratio level scales are number correct, weight, height, response time, Kelvin temperature, and annual income. Here is an example of the presence of a true zero point: if your annual income is exactly zero dollars, then you earned no annual income at all. (You can buy absolutely nothing with zero dollars.) Zero means zero. Assumptions Underlying Testing and Measurement Before I list the assumptions, note the difference between testing and assessment. According to the definitions that we use, testing is the process of measuring variables by means of devices or procedures designed to obtain a sample of behavior. Assessment is the gathering and integration of data for the purpose of making an educational evaluation, accomplished through the use of tools such as tests, interviews, case studies, behavioral observation, and specially designed apparatus and measurement procedures. In this section of the text, we also list the twelve assumptions that Cohen et al. consider basic to testing and assessment: 1. Psychological traits and states exist. A trait is a relatively enduring (i.e., long-lasting) characteristic on which people differ; a state is a less enduring or more transient characteristic on which people differ. Traits and states are actually social constructions. They are nonetheless real in the sense that they are useful for classifying and organizing the world, they can be used to understand and predict behavior, and they refer to something in the world that we can measure. 2. Psychological traits and states can be quantified and measured. For nominal scales, the number is used as a marker.
For the other scales, the numbers become more and more quantitative as you move from ordinal scales (shows ranking only) to interval scales (shows amount, but lacks a true zero point) to ratio scales (shows amount or quantity as we usually understand this concept in mathematics or everyday use of the term). Most traits and states measured in education are taken to be at the interval level of measurement. 3. Various approaches to measuring aspects of the same thing can be useful. For example, different tests of intelligence tap into somewhat different aspects of the construct of intelligence. 4. Assessment can provide answers to some of life's most momentous questions.
It is important that the users of assessment tools know when these tools will provide answers to their questions.
5. Assessment can pinpoint phenomena that require further attention or study. For example, assessment may identify someone as having dyslexia or low self-esteem or as being at risk for drug use. 6. Various sources of data enrich and are part of the assessment process. Information from several sources usually should be obtained in order to make an accurate and informed decision. For example, the idea of portfolio assessment is useful. 7. Various sources of error are always part of the assessment process. There is no such thing as perfect measurement. All measurement has some error. We define error as the difference between a person's true score and that person's observed score. The two main types of error are random error (e.g., error due to transient factors such as being sick or tired) and systematic error (e.g., error present every time the measurement instrument is used, such as an essay exam being graded by an overly easy grader). (Later, when we discuss reliability and validity, you might note that unreliability is due to random error and lack of validity is due to systematic error.) 8. Tests and other measurement techniques have strengths and weaknesses. It is essential that users of tests understand this so that they can use tests appropriately and intelligently. In this chapter, we will be talking about the two major characteristics of a test: reliability and validity. 9. Test-related behavior predicts non-test-related behavior. The goal of testing usually is to predict behavior other than the exact behaviors required while the exam is being taken. For example, paper-and-pencil achievement tests given to children are used to say something about their level of achievement. Another paper-and-pencil test (also called a self-report test) that is popular in counseling is the MMPI (i.e., the Minnesota Multiphasic Personality Inventory). Clients' scores on this test are used as indicators of the presence or absence of various mental disorders.
The point here is that the actual mechanics of measurement (e.g., self-reports, behavioral performance, projective) can vary widely and still provide good measurement of educational, psychological, and other types of variables. 10. Present-day behavior sampling predicts future behavior. Perhaps the most important reason for giving tests is to predict future behavior. Tests provide a sample of present-day behavior. However, this "sample" is used to predict future behavior. For example, an employment test given by someone in a Personnel Office may be used as a predictor of future work behavior.
Another example: the Beck Depression Inventory is used to measure depression and, importantly, to predict test takers' future behavior (e.g., are they a risk to themselves?).
11. Testing and assessment can be conducted in a fair and unbiased manner. This requires careful construction of test items and testing of the items on different types of people. Test makers always have to be on the alert to make sure tests are fair and unbiased. This assumption also requires that the test be administered to those types of people for whom it has been shown to operate properly. 12. Testing and assessment benefit society. Many critical decisions are made on the basis of tests (e.g., teacher competency, employability, presence of a psychological disorder, degree of teacher satisfaction, degree of student satisfaction). Without tests, the world would be much more unpredictable.
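Assumption 7 can be written as observed score = true score + error. The error can be split into a systematic part (the same on every occasion) and a random part (varying from one occasion to the next). The Python sketch below separates the two for repeated measurements of one person. It assumes the true score is known, which it never is in practice. The function name `split_error` is hypothetical. The readings reuse the bathroom-scale numbers (true weight 125 pounds) from the reliability discussion in this chapter.

```python
from statistics import mean, pstdev

def split_error(observed, true_score):
    """Split measurement error into systematic and random components.

    Systematic error (bias): how far the average observed score sits
    from the true score; it is the same on every occasion.
    Random error: scatter of the observed scores around their own mean,
    summarized here as a population standard deviation.
    """
    errors = [x - true_score for x in observed]   # E = X - T for each reading
    bias = mean(errors)                           # systematic component
    spread = pstdev(observed)                     # random component
    return bias, spread

readings = [135, 134, 134, 135, 136]   # five weighings on the same scale
bias, spread = split_error(readings, true_score=125)
print(f"systematic error: {bias:.1f} lb, random error (SD): {spread:.2f} lb")
```

With these numbers the systematic error is 9.8 pounds, while the random error is under one pound. The scale is therefore consistent (little random error, so reliable) but wrong (large systematic error, so not valid).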
Identifying a Good Test or Assessment Procedure
As mentioned earlier in the chapter, good measurement is fundamental for research. If we do not have good measurement, then we cannot have good research. That's why it's so important to use testing and assessment procedures that are characterized by high reliability and high validity.
Overview of Reliability and Validity
As an introduction to reliability and validity and how they are related, note the following:
- Reliability refers to the consistency or stability of test scores.
- Validity refers to the accuracy of the inferences or interpretations we make from test scores.
- Reliability is a necessary but not sufficient condition for validity (i.e., if you are going to have validity, you must have reliability, but reliability in and of itself is not enough to ensure validity).
Assume you weigh 125 pounds. If you weigh yourself five times and get 135, 134, 134, 135, and 136, then your scale is reliable but not valid. The scores were consistent but wrong! Again, you want your scale to be both reliable and valid.
Reliability
Reliability refers to consistency or stability. In psychological and educational testing, it refers to the consistency or stability of the scores that we get from a test or assessment procedure. Reliability is usually determined using a correlation coefficient (it is called a reliability coefficient in this context). Remember (from Chapter 2) that a correlation coefficient is a measure of relationship that ranges from -1.00 through 0 to +1.00, and the farther the number is from zero, the stronger the correlation. For example, minus one (-1.00) indicates a perfect negative correlation, zero indicates no correlation at all, and positive one (+1.00) indicates a perfect positive correlation. Regarding strength, -.85 is stronger than +.55, and +.75 is stronger than +.35. When you have a negative correlation, the variables move in opposite directions (e.g., poor diet and life expectancy); when you have a positive correlation, the variables move in the same direction (e.g., education and income). When looking at reliability coefficients, we are interested in the values ranging from 0 to 1; that is, we are only interested in positive correlations. Note that zero means no reliability, and +1.00 means perfect reliability. Reliability coefficients of .70 or higher are generally considered to be acceptable for research purposes. Reliability coefficients of .90 or higher are needed to make decisions that have impacts on people's lives (e.g., the clinical uses of tests). Reliability is empirically determined; that is, we must check the reliability of test scores with specific sets of people and obtain the reliability coefficients of interest to us.
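To make the idea of a reliability coefficient concrete, here is a minimal Python sketch (not taken from the textbook) that computes a Pearson correlation between two sets of scores and applies the .70 and .90 rules of thumb given above. The function names and the scores are hypothetical. The same correlation is what you would compute for test-retest scores, for scores on two equivalent forms, or for two judges' ratings in inter-scorer reliability.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two paired lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists with at least two scores each")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations and sums of squared deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a list with no variation has no defined correlation")
    return sxy / sqrt(sxx * syy)


def interpret_reliability(r):
    """Apply the rules of thumb from the text to a reliability coefficient."""
    if r >= 0.90:
        return "high enough for decisions that affect people's lives"
    if r >= 0.70:
        return "generally acceptable for research purposes"
    return "below the usual threshold for research"


# Hypothetical test-retest data: the same five people tested on two occasions.
time1 = [12, 15, 18, 20, 25]
time2 = [13, 14, 19, 21, 24]
r = pearson_r(time1, time2)
print(f"test-retest r = {r:.2f}: {interpret_reliability(r)}")
```

A coefficient computed this way describes one particular group of people, which is why the text stresses checking reliability with the people you actually study.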
There are four primary ways to measure reliability.
1. The first type of reliability is called test-retest reliability. This refers to the consistency of test scores over time. It is measured by correlating the test scores obtained at one point in time with the test scores obtained at a later point in time for a group of people. A primary issue is identifying the appropriate time interval between the two testing occasions. The longer the time interval between the two testing occasions, the lower the reliability coefficient tends to be.
2. The second type of reliability is called equivalent forms reliability. This refers to the consistency of test scores obtained on two equivalent forms of a test designed to measure the same thing. It is measured by correlating the scores obtained by giving two forms of the same test to a group of people. The success of this method hinges on the equivalence of the two forms of the test.
3. The third type of reliability is called internal consistency reliability. It refers to the consistency with which the items on a test measure a single construct. Internal consistency reliability requires only one administration of the test, which makes it a very convenient form of reliability. One type of internal consistency reliability is split-half reliability, which involves splitting a test into two equivalent halves and checking the consistency of the scores obtained from the two halves. The measure of internal consistency that we emphasize in the chapter is coefficient alpha (also sometimes called Cronbach's alpha). The beauty of coefficient alpha is that it is readily provided by statistical analysis packages, and it can be used both when test items are quantitative and when they are dichotomous (as in right or wrong). Researchers use coefficient alpha when they want an estimate of the reliability of a homogeneous test (i.e., a test that measures only one construct or trait) or an estimate of the reliability of each dimension on a multidimensional test. You will see it commonly reported in empirical research articles. Coefficient alpha will be high (e.g., greater than .70) when the items on a test are correlated with one another. But note that the number of items also affects the size of coefficient alpha (i.e., all else being equal, the more items you have on a test, the higher coefficient alpha will be). This latter point is important because it shows that it is possible to get a large alpha coefficient even when the items are not very homogeneous or internally consistent.
4. The fourth and last major type of reliability is called inter-scorer reliability. Inter-scorer reliability refers to the consistency or degree of agreement between two or more scorers, judges, or raters. For example, you could have two judges rate one set of papers. Then you would correlate their two sets of ratings to obtain the inter-scorer reliability coefficient, showing the consistency of the two judges' ratings.
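Statistical packages report coefficient alpha for you, but a small Python sketch shows what goes into the number. It uses the standard formula alpha = k/(k - 1) x (1 - (sum of the item variances) / (variance of the total scores)), where k is the number of items. The second function, standardized alpha = k x r / (1 + (k - 1) x r) with r the average inter-item correlation, is a standard result that the chapter does not give; it is included only to illustrate the point above that alpha rises with the number of items even when the items are only weakly related. All names and responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Coefficient (Cronbach's) alpha.

    item_scores holds one list per item; each list contains every
    respondent's score on that item, with respondents in the same order.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Each respondent's total score across all k items.
    totals = [sum(person) for person in zip(*item_scores)]
    sum_item_var = sum(variance(item) for item in item_scores)
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores do not vary, so alpha is undefined")
    return (k / (k - 1)) * (1 - sum_item_var / total_var)


def standardized_alpha(k, mean_r):
    """Alpha implied by k items whose average inter-item correlation is mean_r."""
    return (k * mean_r) / (1 + (k - 1) * mean_r)


# Hypothetical 4-point responses from five people on a three-item scale.
items = [
    [4, 3, 3, 2, 1],  # item 1
    [4, 4, 3, 2, 2],  # item 2
    [3, 4, 2, 2, 1],  # item 3
]
print(f"alpha = {cronbach_alpha(items):.2f}")

# Same modest average inter-item correlation (.20), increasingly long tests.
for k in (5, 10, 20):
    print(k, "items:", round(standardized_alpha(k, 0.20), 2))
```

With an average inter-item correlation of only .20, the implied alpha climbs from about .56 with 5 items to about .83 with 20 items, which is why a large alpha alone does not prove that a test is homogeneous.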

Validity
Validity refers to the accuracy of the inferences, interpretations, or actions made on the basis of test scores. Technically speaking, it is incorrect to say that a test is valid or invalid. It is the interpretations and actions taken based on the test scores that are valid or invalid. All of the ways of collecting validity evidence are really forms of what used to be called construct validity. All that means is that in testing and assessment, we are always measuring something (e.g., IQ, gender, age, depression, self-efficacy). Validation refers to gathering evidence supporting some inference made on the basis of test scores. There are three main methods of collecting validity evidence.
1. Evidence Based on Content
Content-related evidence is based on a judgment of the degree to which the items, tasks, or questions on a test adequately represent the domain of interest. Expert judgment is used to provide evidence of content validity. To make a decision about content-related evidence, you should try to answer these three questions:
- Do the items appear to represent the thing you are trying to measure?
- Does the set of items underrepresent the construct's content (i.e., have you excluded any important content areas or topics)?
- Do any of the items represent something other than what you are trying to measure (i.e., have you included any irrelevant items)?
2. Evidence Based on Internal Structure
Some tests are designed to measure one general construct, but other tests are designed to measure several components or dimensions of a construct. For example, the Rosenberg Self-Esteem Scale is a 10-item scale designed to measure the construct of global self-esteem. In contrast, the Harter Self-Esteem Scale is designed to measure global self-esteem as well as several separate dimensions of self-esteem. The statistical technique called factor analysis is used to determine the number of dimensions (i.e., factors) that are present. That is, it tells you whether a test is unidimensional (i.e., measures just one factor) or multidimensional (i.e., measures two or more dimensions). When you examine the internal structure of a test, you can also obtain a measure of test homogeneity (i.e., how well the different items measure the construct or trait). The two primary indices of homogeneity are the item-to-total correlation (i.e., correlate each item with the total test score) and coefficient alpha (discussed earlier under reliability).
3. Evidence Based on Relations to Other Variables
This form of evidence is obtained by relating your test scores to one or more relevant criteria. A criterion is the standard or benchmark that you want to predict accurately on the basis of the test scores. Note that when we use correlation coefficients as validity evidence, we call them validity coefficients. There are several different kinds of relevant validity evidence based on relations to other variables. The first is called criterion-related evidence, which is validity evidence based on the extent to which scores from a test can be used to predict or infer performance on some criterion, such as a test or future performance. Here are the two types of criterion-related evidence:
- Concurrent evidence: validity evidence based on the relationship between test scores and criterion scores obtained at the same time.
- Predictive evidence: validity evidence based on the relationship between test scores collected at one point in time and criterion scores obtained at a later time.
Here are three more types of validity evidence researchers should provide:
- Convergent evidence: validity evidence based on the relationship between the focal test scores and independent measures of the same construct. The idea is that you want your test (the one you are trying to validate) to correlate strongly with other measures of the same thing.
- Divergent evidence: evidence that the scores on your focal test are not highly related to the scores from other tests that are designed to measure theoretically different constructs. This kind of evidence shows that your test is not a measure of those other things (i.e., other constructs).
Putting the ideas of convergent and divergent evidence together, the point is that to show that a new test measures what it is supposed to measure, you want it to correlate with other measures of that construct (convergent evidence), but you also want it NOT to correlate strongly with measures of other things (divergent evidence). You want your test to overlap with similar tests and to diverge from tests of different things. In short, both convergent and divergent evidence are desirable.
Known groups evidence is also useful in demonstrating validity. This is evidence that groups that are known to differ on the construct do differ on the test in the hypothesized direction. For example, if you develop a test of gender roles, you would hypothesize that females will score higher on femininity and males will score higher on masculinity. Then you would test this hypothesis to see if you have evidence of validity.
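Two of the indices described above, the item-to-total correlation and the validity coefficients used for convergent and divergent evidence, are ordinary correlations. The Python sketch below is illustrative only: all scores are hypothetical, and the `corrected` option (leaving each item out of the total it is compared with) is a common variant that the chapter does not describe.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two paired lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_total_correlations(item_scores, corrected=False):
    """Correlate each item with the total test score (one list per item).

    With corrected=True, each item is removed from the total it is
    compared with, so the item is not partly correlated with itself.
    """
    totals = [sum(person) for person in zip(*item_scores)]
    results = []
    for item in item_scores:
        criterion = [t - x for t, x in zip(totals, item)] if corrected else totals
        results.append(pearson_r(item, criterion))
    return results


# Hypothetical item responses (one list per item, five respondents).
items = [[4, 3, 3, 2, 1], [4, 4, 3, 2, 2], [3, 4, 2, 2, 1]]
print([round(r, 2) for r in item_total_correlations(items, corrected=True)])

# Hypothetical scores for six people.
new_test = [10, 14, 9, 18, 15, 12]        # the focal test being validated
same_construct = [11, 15, 10, 17, 16, 9]  # established measure of the same construct
other_construct = [6, 7, 4, 5, 3, 5]      # measure of a theoretically different construct

print("convergent r =", round(pearson_r(new_test, same_construct), 2))
print("divergent r =", round(pearson_r(new_test, other_construct), 2))
```

In this made-up example the focal test correlates strongly with the measure of the same construct and hardly at all with the measure of a different construct, which is the pattern convergent and divergent evidence look for.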
Now, to summarize these three major methods for obtaining evidence of validity, look again at Table 5.6 (also shown below). If you think we have spent a lot of time on validity and measurement, it is because validity is so important in empirical research. Remember, without good measurement we end up with GIGO (garbage in, garbage out).
Using Reliability and Validity Information
You must be careful when interpreting the reliability and validity evidence provided with standardized tests and in empirical research journal articles. With standardized tests, the reported validity and reliability data are typically based on a norming group (which is an actual group of people). If the people with whom you intend to use a test are very different from those in the norming group, then the validity and reliability evidence provided with the test becomes questionable. Remember that what you need to know is whether a test will work with the people in your classroom or in your research study. When reading journal articles, you should view an article positively to the degree that the researchers provide reliability and validity evidence for the measures that they use. Two related questions to ask when reading and evaluating an empirical research article are "Is this research study based on good measurement?" and "Do I believe that these researchers used good measures?" If the answers are yes, then give the article high marks for measurement. If the answers are no, then you should invoke the GIGO principle (garbage in, garbage out).
Educational and Psychological Tests
Three primary types of educational and psychological tests are discussed in your textbook: intelligence tests, personality tests, and educational assessment tests.
1) Intelligence Tests
Intelligence has many definitions because a single prototype does not exist. Although far from being a perfect definition, here is our definition: intelligence is the ability to think abstractly and to learn readily from experience. Although the construct of intelligence is hard to define, it still has utility because it can be measured and it is related to many other constructs. For some examples of intelligence tests, click here.
2) Personality Tests.
Personality is a construct similar to intelligence in that a single prototype does not exist. Here is our definition: personality is the set of relatively permanent patterns that characterize individuals and can be used to classify them. Most personality tests are self-report measures. A self-report measure is a test-taking method in which the participants check or rate the degree to which various characteristics are descriptive of themselves. Performance measures of personality are also used. A performance measure is a test-taking method in which the participants perform some real-life behavior that is observed by the researcher. Personality has also been measured with projective tests. A projective test is a test-taking method in which the participants provide responses to ambiguous stimuli. The test administrator searches for patterns in participants' responses. Projective tests tend to be quite difficult to interpret and are not commonly used in quantitative research. For some examples of personality tests, click here.
3) Educational Assessment Tests.

There are four subtypes of educational assessment tests:
Preschool Assessment Tests.
--These are typically screening tests because the predictive validity of many of these tests is weak.
Achievement Tests.
--These are designed to measure the degree of learning that has taken place after a person has been exposed to a specific learning experience. They can be teacher-constructed or standardized tests. For some examples of achievement tests, click here.
Aptitude Tests.
--These focus on information acquired through the informal learning that goes on in life.
--They are often used to predict future performance, whereas achievement tests are used to measure current performance.
Diagnostic Tests.
--These tests are used to identify the locus of academic difficulties in students.
Sources of Information about Tests
The two most important sources of information about tests are the Mental Measurements Yearbook (MMY) and Tests in Print (TIP). Some additional sources are provided in Table 5.7. Also, here are some useful internet links (from Table 5.8):
Chapter 6 Methods of Data Collection
(Note: For the concept map that goes with this lecture, click here. Remember: concept maps help provide the big picture as well as show how the parts are interrelated.)
The purpose of Chapter 6 is to help you learn how to collect data for a research project. The term "method of data collection" simply refers to how the researcher obtains the empirical data to be used to answer his or her research questions. Once data are collected, they are analyzed, interpreted, and turned into information and results or findings. All empirical research relies on one or more methods of data collection. It is important to consider and use the fundamental principle of mixed research when planning a research study. The principle states that researchers should mix methods (including methods of data collection as well as methods of research) in a way that is likely to provide complementary strengths and nonoverlapping weaknesses. We will provide you with additional tables (not in the chapter because of space limitations) for each method of data collection so that you can compare the strengths and weaknesses of each method and put together the combination that will best serve your purpose and follow the fundamental principle of mixed research. The focus in this chapter is on methods of data collection, not methods of research (which are covered in later chapters). There are six major methods of data collection. We will briefly summarize each of these in this lecture:
- Tests (i.e., standardized tests that usually include information on reliability, validity, and norms, as well as tests constructed by researchers for specific purposes, skills tests, etc.).
- Questionnaires (i.e., self-report instruments).
- Interviews (i.e., situations where the researcher interviews the participants).
- Focus groups (i.e., a small group discussion with a group moderator present to keep the discussion focused).
- Observation (i.e., looking at what people actually do).
- Existing or secondary data (i.e., data that were collected earlier and then archived, or any other kind of data that were simply left behind at an earlier time for some other purpose).
Tests
Tests are commonly used in research to measure personality, aptitude, achievement, and performance. The last chapter discussed standardized tests; therefore, we include only a brief discussion in this chapter. Note that tests can also be used to complement other measures (following the fundamental principle of mixed research). In addition to the tests discussed in the last chapter, note that sometimes a researcher must develop a new test to measure the specific knowledge, skills, behavior, or cognitive activity that is being studied. For example, a researcher might need to measure response time to a memory task using a mechanical apparatus or develop a test to measure a specific mental or cognitive activity (which obviously cannot be directly observed). An excellent source of tests and other measures (which we did not get into the chapter in time) is The Directory of Unpublished Experimental Mental Measures (2003), edited by Goldman and Mitchell and published by the American Psychological Association. We list the major sources of tests and test reviews in Table 5.7, and the major internet sources for finding tests in Table 5.8. Remember that if a test has already been developed that purports to measure what you want to measure, then you should strongly consider using it rather than constructing a new one. The following table lists the strengths and weaknesses of tests. It, in conjunction with the tables for the other five major methods of data collection, will help you in applying the fundamental principle of mixed research:
Strengths and Weaknesses of Tests
Strengths of tests (especially standardized tests)
- Can provide measures of many characteristics of people.
- Often standardized (i.e., the same stimulus is provided to all participants).
- Allows comparability of common measures across research populations.
- Strong psychometric properties (high measurement validity).
- Availability of reference group data.
- Many tests can be administered to groups, which saves time.
- Can provide hard, quantitative data.
- Tests are usually already developed.
- A wide range of tests is available (most content can be tapped).
- Response rate is high for group-administered tests.
- Ease of data analysis because of the quantitative nature of the data.
Weaknesses of tests (especially standardized tests)
- Can be expensive if the test must be purchased for each research participant.
- Reactive effects such as social desirability can occur.
- Test may not be appropriate for a local or unique population.
- Open-ended questions and probing are not available.
- Tests are sometimes biased against certain groups of people.
- Nonresponse to selected items on the test.
- Some tests lack psychometric data.
Questionnaires
A questionnaire is a self-report data collection instrument that is filled out by research participants. Questionnaires are usually paper-and-pencil instruments, but they can also be placed on the web for participants to go to and fill out. Questionnaires are sometimes called survey instruments, which is fine, but the actual questionnaire should not be called "the survey." The word "survey" refers to the process of using a questionnaire or interview protocol to collect data. For example, you might do a survey of teacher attitudes about inclusion; the instrument of data collection should be called the questionnaire or the survey instrument. A questionnaire is composed of questions and/or statements. Because one way to learn to write questionnaires is to look at other questionnaires, you may want to study an example of a typical questionnaire that has mostly quantitative items: click here. For an example of a qualitative questionnaire, click here.
When developing a questionnaire, make sure that you follow the 15 Principles of Questionnaire Construction. I will briefly review the 15 principles now.
Principle 1: Make sure the questionnaire items match your research objectives.
Principle 2: Understand your research participants. Your participants (not you!) will be filling out the questionnaire. Consider the demographic and cultural characteristics of your potential participants so that you can make it understandable to them.
Principle 3: Use natural and familiar language. Familiar language is comforting; jargon is not.
Principle 4: Write items that are clear, precise, and relatively short. If your participants don't understand the items, your data will be invalid (i.e., your research study will have the garbage in, garbage out, or GIGO, syndrome). Short items are more easily understood and less stressful than long items.
Principle 5: Do not use "leading" or "loaded" questions. Leading questions lead the participant to where you want him or her to be. Loaded questions include loaded words (i.e., words that create an emotional reaction or response by your participants). Always remember that you do not want the participant's response to be the result of how you worded the question. Always use neutral wording.
Principle 6: Avoid double-barreled questions. A double-barreled question combines two or more issues in a single question. Here is a double-barreled question: "Do you elicit information from parents and other teachers?" It's double-barreled because if someone answered it, you would not know whether they were referring to parents or teachers or both.
Does the question include the word "and"? If yes, it might be a double-barreled question. Answers to double-barreled questions are ambiguous because two or more ideas are confounded.
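If you have many draft items, the "look for the word 'and'" check can be automated as a first screen. This is a minimal Python sketch under the assumption that a whole-word match on "and" is a useful flag; it does not decide whether an item is double-barreled, because "and" can join parts of a single issue and some double-barreled items avoid the word entirely. The function and variable names are hypothetical.

```python
import re

# Whole-word "and" (case-insensitive); "standard" or "handout" will not match.
AND_PATTERN = re.compile(r"\band\b", re.IGNORECASE)


def maybe_double_barreled(item_text):
    """Return True if an item contains the word "and" and deserves a second look.

    This is only a screening aid: "and" can join parts of one issue, and
    some double-barreled items are phrased without it.
    """
    return AND_PATTERN.search(item_text) is not None


draft_items = [
    "Do you elicit information from parents and other teachers?",
    "How can your principal improve the morale at your school?",
    "Do you use a standard lesson plan?",
]
for text in draft_items:
    flag = "REVIEW" if maybe_double_barreled(text) else "ok"
    print(f"{flag:6} {text}")
```

Flagged items still need a human reading; the point is only to make sure none slips past unexamined before pilot testing.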
Principle 7: Avoid double negatives. Does the answer provided by the participant require combining two negatives (e.g., "I disagree that teachers should not be required to supervise their students during library time")? If yes, rewrite it.
Principle 8: Determine whether an open-ended or a closed-ended question is needed. Open-ended questions provide qualitative data in the participants' own words. Here is an open-ended question: How can your principal improve the morale at your school? _______________________________________________ Closed-ended questions provide quantitative data based on the researcher's response categories. Here is an example of a closed-ended question:
Open-ended questions are common in exploratory research and closed-ended questions are common in confirmatory research.
Principle 9: Use mutually exclusive and exhaustive response categories for closed-ended questions. Mutually exclusive categories do not overlap (e.g., ages 0-10, 10-20, 20-30 are NOT mutually exclusive and should be rewritten as less than 10, 10-19, 20-29, 30-39, ...). Exhaustive categories include all possible responses (e.g., if you are doing a national survey of adult citizens, i.e., people 18 or older, then the categories 18-19, 20-29, 30-39, 40-49, 50-59, 60-69 are NOT exhaustive because there is nowhere to put someone who is 70 years old or older).
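The two requirements in Principle 9 can be checked mechanically by counting, for every possible answer, how many categories it falls into: more than one means the categories overlap, and zero means they are not exhaustive. The Python sketch below applies this to the age examples above, treating ages as whole years. The upper age limit of the check (120) and all names are assumptions for illustration.

```python
def in_category(bounds, age):
    """True if a whole-number age falls within (lower, upper); upper=None means no limit."""
    lower, upper = bounds
    return age >= lower and (upper is None or age <= upper)


def check_categories(categories, youngest, oldest=120):
    """Return (mutually_exclusive, exhaustive) for whole-number ages youngest..oldest.

    `oldest` is only the limit of this check (an assumption), not a survey rule.
    """
    counts = [sum(in_category(c, age) for c in categories)
              for age in range(youngest, oldest + 1)]
    mutually_exclusive = all(n <= 1 for n in counts)  # no age fits two categories
    exhaustive = all(n >= 1 for n in counts)          # every age fits some category
    return mutually_exclusive, exhaustive


# The examples from Principle 9, written as (lower, upper) bounds in years.
overlapping = [(0, 10), (10, 20), (20, 30)]
no_top_category = [(18, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 69)]
fixed = no_top_category + [(70, None)]  # adds "70 or older"

print(check_categories(overlapping, youngest=0, oldest=30))  # -> (False, True)
print(check_categories(no_top_category, youngest=18))        # -> (True, False)
print(check_categories(fixed, youngest=18))                  # -> (True, True)
```

The output shows that the two problems are independent: the overlapping set covers every age but places 10 and 20 in two categories, while the set without a top category never overlaps but leaves people 70 and older with nowhere to go.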
Principle 10: Consider the different types of response categories available for closed-ended questionnaire items. Rating scales are the most commonly used, including:
- Numerical rating scales (where the endpoints are anchored; sometimes the center point or area is also labeled), for example:
  Very Low  1  2  3  4  5  6  7  Very High
- Fully anchored rating scales (where all the points on the scale are anchored), for example a 5-point scale:
  1 Strongly Agree  2 Agree  3 Neutral  4 Disagree  5 Strongly Disagree
  or a 4-point scale:
  1 Strongly Agree  2 Agree  3 Disagree  4 Strongly Disagree
  Omitting the center point on a rating scale (e.g., using a 4-point rather than a 5-point rating scale) does not appreciably affect the response pattern. Some researchers prefer 5-point rating scales; other researchers prefer 4-point rating scales. Both generally work well. You should use somewhere from four to eleven points on your rating scale. Personally, I like the 4- and 5-point scales because all of the points are easily anchored. I do not recommend a 1 to 10 scale because too many respondents mistakenly view the 5 as the center point. If you want to use a wide scale like this, use a 0 to 10 scale (where 5 is the middle point) and label the 5 with the anchor "medium" or some other appropriate anchor.
- Rankings (i.e., where participants put their responses into rank order, such as most important, second most important, and third most important).
- Semantic differential (i.e., where one item stem and multiple scales that are anchored with polar opposites or antonyms are included and are rated by the participants).
- Checklists (i.e., where participants "check all of the responses in a list that apply to them").
Principle 11: Use multiple items to measure abstract constructs. This is required if you want your measures to have high reliability and validity. One approach is to use a summated rating scale (such as the Rosenberg Self-Esteem Scale, which is composed of 10 items, each measuring self-esteem). Another name for a summated rating scale is a Likert scale, because the summated rating scale was developed by the well-known social psychologist Rensis Likert. Here is the Rosenberg Self-Esteem Scale, which is a summated rating scale:
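Scoring a summated rating scale means adding each respondent's item responses into one total. Scales often include reverse-worded items (discussed under Principle 13), and these must be recoded before summing so that all items point in the same direction. The Python sketch below assumes a 4-point response format coded 1 to 4; which items are reverse-worded, and the respondent's answers, are hypothetical and are not the actual Rosenberg scoring key. Whether a high total means high or low self-esteem depends on how the response options are numbered (with 1 = Strongly Agree, as in the fully anchored example above, more agreement gives a lower total).

```python
def summated_score(responses, reverse_items=(), low=1, high=4):
    """Total score on a summated (Likert-type) rating scale for one respondent.

    responses: one number per item, each between low and high.
    reverse_items: 0-based positions of reverse-worded items; each is recoded
    as (low + high) - answer so that all items point in the same direction.
    """
    total = 0
    for position, answer in enumerate(responses):
        if not low <= answer <= high:
            raise ValueError(f"item {position + 1}: {answer} is outside {low}-{high}")
        if position in reverse_items:
            answer = (low + high) - answer  # e.g., on 1-4, 1 becomes 4 and 3 becomes 2
        total += answer
    return total


# Hypothetical respondent on a 10-item, 4-point scale; items 3 and 7
# (positions 2 and 6) are assumed to be reverse-worded for illustration only.
answers = [1, 2, 4, 1, 2, 2, 3, 1, 2, 1]
print(summated_score(answers, reverse_items={2, 6}))
```

Forgetting the recoding step is a common scoring error: the reversed items then pull the total in the wrong direction and lower the scale's internal consistency.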
Principle 12: Consider using multiple methods when measuring abstract constructs. The idea here is that if you use only one method of measurement, then your measurement may be an artifact of that method. On the other hand, if you use two or more methods of measurement, you will be able to see whether the answers depend on the method (i.e., are the answers corroborated across the methods of measurement, or do you get different answers for the different methods?). For example, you might measure students' self-esteem via the Rosenberg Scale just shown (which is used in a self-report form) as well as with teachers' ratings of the students' self-esteem; you might even want to observe the students in situations that should provide indications of high and low self-esteem.
Principle 13: Use caution if you reverse the wording in some of the items to prevent response sets. (A response set is the tendency of a participant to respond in a specific direction to items regardless of the item content.) Reversing the wording of some items can help ensure that participants don't just "speed through" the instrument, checking "yes" or "strongly agree" for all the items. On the other hand, you may want to avoid reverse wording if it creates a double negative. Also, recent research suggests that the use of reverse wording reduces the reliability and validity of scales. Therefore, you should generally use reverse wording sparingly, if at all. Principle 14: Develop a questionnaire that is easy for the participant to use. The participant must not get confused or lost anywhere in the questionnaire. Make sure that the directions are clear and that any filter questions used are easy to follow. Principle 15: Always pilot test your questionnaire. You will always find some problems that you have overlooked! The best pilot tests are with people similar to the ones to be included in your research study. After pilot testing your questionnaire, revise it and pilot test it again, until it works correctly. The following table lists the strengths and weaknesses of questionnaires. It, in conjunction with the tables for the other five major methods of data collection, will help you in applying the fundamental principle of mixed research:
Strengths and Weaknesses of Questionnaires
Strengths of questionnaires
- Good for measuring attitudes and eliciting other content from research participants.
- Inexpensive (especially mail questionnaires and group-administered questionnaires).
- Can provide information about participants' internal meanings and ways of thinking.
- Can administer to probability samples.
- Quick turnaround.
- Can be administered to groups.
- Perceived anonymity by respondents may be high.
- Moderately high measurement validity (i.e., high reliability and validity) for well-constructed and validated questionnaires.
- Closed-ended items can provide exact information needed by the researcher.
- Open-ended items can provide detailed information in respondents' own words.
- Ease of data analysis for closed-ended items.
- Useful for exploration as well as confirmation.
Weaknesses of questionnaires
- Usually must be kept short.
- Reactive effects may occur (e.g., respondents may try to show only what is socially desirable).
- Nonresponse to selected items.
- People filling out questionnaires may not recall important information and may lack self-awareness.
- Response rate may be low for mail and email questionnaires.
- Open-ended items may reflect differences in verbal ability, obscuring the issues of interest.
- Data analysis can be time consuming for open-ended items.
- Measures need validation.
Interviews
In an interview, the interviewer asks the interviewee questions (in person or over the telephone). Trust and rapport are important. Probing is available (unlike in paper-and-pencil questionnaires) and is used to reach clarity or gain additional information. Here are some examples of standard probes:
- Anything else?
- Any other reason?
- What do you mean?
Interviews may be quantitative or qualitative.
Quantitative interviews:
- Are standardized (i.e., the same information is provided to everyone).
- Use closed-ended questions.
Exhibit 6.3 has an example of an interview protocol. Note that it looks very much like a questionnaire! The key difference between an interview protocol and a questionnaire is that the interview protocol is read by the interviewer, who also records the answers (you have probably participated in telephone surveys before...you were interviewed).
Qualitative interviews:
- Are based on open-ended questions.
There are three types of qualitative interviews.
1) Informal Conversational Interview.
- It is spontaneous.
- It is loosely structured (i.e., no interview protocol is used).
2) Interview Guide Approach.
- It is more structured than the informal conversational interview.
- It includes an interview protocol listing the open-ended questions.
- The questions can be asked in any order by the interviewer.
- Question wording can be changed by the interviewer if it is deemed appropriate.
3) Standardized Open-Ended Interview.
- Open-ended questions are written on an interview protocol, and they are asked in the exact order given on the protocol.
- The wording of the questions cannot be changed.
The following table lists the strengths and weaknesses of interviews. It, in conjunction with the tables for the other five major methods of data collection, will help you in applying the fundamental principle of mixed research:
Strengths and Weaknesses of Interviews
Strengths of interviews
- Good for measuring attitudes and most other content of interest.
- Allows probing and posing of follow-up questions by the interviewer.
- Can provide in-depth information.
- Can provide information about participants' internal meanings and ways of thinking.
- Closed-ended interviews provide exact information needed by the researcher.
- Telephone and e-mail interviews provide very quick turnaround.
- Moderately high measurement validity (i.e., high reliability and validity) for well-constructed and tested interview protocols.
- Can use with probability samples.
- Relatively high response rates are often attainable.
- Useful for exploration as well as confirmation.
Weaknesses of interviews
- In-person interviews usually are expensive and time consuming.
- Reactive effects (e.g., interviewees may try to show only what is socially desirable).
- Investigator effects may occur (e.g., untrained interviewers may distort data because of personal biases and poor interviewing skills).
- Interviewees may not recall important information and may lack self-awareness.
- Perceived anonymity by respondents may be low.
- Data analysis can be time consuming for open-ended items.
- Measures need validation.
Focus Groups
A focus group is a situation where a focus group moderator keeps a small and homogeneous group (of 6-12 people) focused on the discussion of a research topic or issue. Focus group sessions generally last between one and three hours, and they are recorded using audio and/or video recording.
Focus groups are useful for exploring ideas and obtaining in-depth information about how people think about an issue. The following table lists the strengths and weaknesses of focus groups. This table, together with the tables for the other five major methods of data collection, will help you apply the fundamental principle of mixed research.

Strengths and Weaknesses of Focus Groups

Strengths of focus groups:
- Useful for exploring ideas and concepts.
- Provides a window into participants' internal thinking.
- Can obtain in-depth information.
- Can examine how participants react to each other.
- Allows probing.
- Most content can be tapped.
- Allows quick turnaround.

Weaknesses of focus groups:
- Sometimes expensive.
- May be difficult to find a focus group moderator with good facilitative and rapport-building skills.
- Reactive and investigator effects may occur if participants feel they are being watched or studied.
- May be dominated by one or two participants.
- Difficult to generalize results if small, unrepresentative samples of participants are used.
- May include a large amount of extra or unnecessary information.
- Measurement validity may be low.
- Usually should not be the only data collection method used in a study.
- Data analysis can be time-consuming because of the open-ended nature of the data.
Observation
In the method of data collection called observation, the researcher observes participants in natural and/or structured environments. It is important to collect observational data (in addition to attitudinal data) because what people say is not always what they do! Observation can be carried out in two types of environments:
- Laboratory observation (which is done in a lab set up by the researcher).
- Naturalistic observation (which is done in real-world settings).

There are two important forms of observation: quantitative observation and qualitative observation.

1) Quantitative observation involves standardization procedures, and it produces quantitative data. The following can be standardized:
- Who is observed.
- What is observed.
- When the observations are to take place.
- Where the observations are to take place.
- How the observations are to take place.

Standardized instruments (e.g., checklists) are often used in quantitative observation. Sampling procedures are also often used in quantitative observation:
- Time-interval sampling (i.e., observing during time intervals, e.g., during the first minute of each 10-minute interval).
- Event sampling (i.e., observing after an event has taken place, e.g., observing after the teacher asks a question).
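As a concrete illustration of time-interval sampling, the following Python sketch lists the observation windows for a session, assuming the observer watches the first minute of each 10-minute interval as in the example above. The function name `observation_windows` and the 50-minute session length are hypothetical choices for illustration, not something the book specifies.

```python
def observation_windows(session_minutes, interval=10, observe=1):
    """Return (start, end) minutes for time-interval sampling.

    Hypothetical helper: the observer records behavior during the first
    `observe` minutes of each `interval`-minute block and does not record
    during the rest of the block.
    """
    if observe > interval:
        raise ValueError("observation period cannot exceed the interval")
    windows = []
    for start in range(0, session_minutes, interval):
        # Clip the last window if the session ends partway through it.
        windows.append((start, min(start + observe, session_minutes)))
    return windows


# Hypothetical 50-minute class period.
print(observation_windows(50))
```

Event sampling would instead start each window when the target event occurs (e.g., a teacher question), so its windows cannot be listed in advance like this.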
2) Qualitative observation is exploratory and open-ended, and the researcher takes extensive field notes. The qualitative observer may take on four different roles that make up a continuum:
- Complete Participant (i.e., becoming a full member of the group and not informing the participants that you are studying them).
- Participant-as-Observer (i.e., spending extensive time "inside" and informing the participants that you are studying them).
- Observer-as-Participant (i.e., spending a limited amount of time "inside" and informing them that you are studying them).
- Complete Observer (i.e., observing from the "outside" and not informing the participants that you are studying them).

The following table lists the strengths and weaknesses of observational data. This table, together with the tables for the other five major methods of data collection, will help you apply the fundamental principle of mixed research.

Strengths and Weaknesses of Observational Data

Strengths of observational data:
- Allows one to directly see what people do without having to rely on what they say they do.
- Provides firsthand experience, especially if the observer participates in activities.
- Can provide relatively objective measurement of behavior (especially for standardized observations).
- Observer can determine what does not occur.
- Observer may see things that escape the awareness of people in the setting.
- Excellent way to discover what is occurring in a setting.
- Helps in understanding the importance of contextual factors.
- Can be used with participants with weak verbal skills.
- May provide information on things people would otherwise be unwilling to talk about.
- Observer may move beyond the selective perceptions of people in the setting.
- Good for description.
- Provides a moderate degree of realism (when done outside of the laboratory).

Weaknesses of observational data:
- Reasons for observed behavior may be unclear.
- Reactive effects may occur when respondents know they are being observed (e.g., people being observed may behave in atypical ways).
- Investigator effects (e.g., personal biases and selective perception of observers).
- Observer may "go native" (i.e., over-identify with the group being studied).
- Sampling of observed people and settings may be limited.
- Cannot observe large or dispersed populations.
- Some settings and content of interest cannot be observed.
- Collection of unimportant material may be moderately high.
- More expensive to conduct than questionnaires and tests.
- Data analysis can be time-consuming.
Secondary/Existing Data
Secondary data (i.e., data originally collected for a different purpose) are contrasted with primary data (i.e., original data collected for the new research study). The most commonly used secondary data are documents, physical data, and archived research data.
1. Documents. There are two main kinds of documents:
- Personal documents (i.e., things written or recorded for private purposes), such as letters, diaries, and family pictures.
- Official documents (i.e., things written or recorded for public or private organizations), such as newspapers, annual reports, yearbooks, and minutes.
2. Physical data (i.e., any material thing created or left by humans that might provide information about a phenomenon of interest to a researcher).
3. Archived research data (i.e., research data collected by other researchers for other purposes; these data are often saved on tape or CD so that others might later use them).

The following table lists the strengths and weaknesses of secondary/existing data. This table, together with the tables for the other five major methods of data collection, will help you apply the fundamental principle of mixed research.

Strengths and Weaknesses of Secondary Data

Strengths of documents and physical data:
- Can provide insight into what people think and what they do.
- Unobtrusive, making reactive and investigator effects very unlikely.
- Can be collected for time periods occurring in the past (e.g., historical data).
- Provides useful background and historical data on people, groups, and organizations.
- Useful for corroboration.
- Grounded in local setting.
- Useful for exploration.

Strengths of archived research data:
- Archived research data are available on a wide variety of topics.
- Inexpensive.
- Often are reliable and valid (high measurement validity).
- Can study trends.
- Ease of data analysis.
- Often based on high-quality or large probability samples.

Weaknesses of documents and physical data:
- May be incomplete.
- May be representative only of one perspective.
- Access to some types of content is limited.
- May not provide insight into participants' personal thinking for physical data.
- May not apply to general populations.

Weaknesses of archived research data:
- May not be available for the population of interest to you.
- May not be available for the research questions of interest to you.
- Data may be dated.
- Open-ended or qualitative data usually not available.
- Many of the most important findings have already been mined from the data.
Chapter 7 Sampling
(Reminder: Don't forget to utilize the concept maps and study questions as you study this and the other chapters.)
The purpose of Chapter 7 is to help you learn about sampling in quantitative and qualitative research. In other words, you will learn how participants are selected to be part of empirical research studies. Sampling refers to drawing a sample (a subset) from a population (the full set). The usual goal in sampling is to produce a representative sample (i.e., a sample that is similar to the population on all characteristics, except that it includes fewer people because it is a sample rather than the complete population). Metaphorically, a perfect representative sample would be a "mirror image" of the population from which it was selected (again, except that it would include fewer people).

Terminology Used in Sampling
Here are some important terms used in sampling:
- A sample is a set of elements taken from a larger population. The sample is a subset of the population, which is the full set of elements, people, or whatever you are sampling.
- A statistic is a numerical characteristic of a sample, but a parameter is a numerical characteristic of a population.
- Sampling error refers to the difference between the value of a sample statistic, such as the sample mean, and the true value of the population parameter, such as the population mean. Note: some error is always present in sampling. With random sampling methods, the error is random rather than systematic.
- The response rate is the percentage of people in the sample selected for the study who actually participate in the study.
- A sampling frame is just a list of all the people who are in the population. Here is an example of a sampling frame (a list of all the names in my population, and they are numbered). Note that the following sampling frame also has information on age and gender included in case you want to draw some samples and do some calculations.
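To connect the terms statistic, parameter, sampling error, and response rate, here is a minimal Python sketch. The ten ages below are a made-up population (not the book's sampling frame), the four-person sample is chosen by hand for illustration, and the function name `response_rate` is an illustrative assumption.

```python
from statistics import mean

# Hypothetical population (the full set): ages of ten people.
population_ages = [21, 23, 25, 30, 34, 38, 41, 45, 52, 61]
# Hypothetical sample (a subset) taken from that population.
sample_ages = [23, 34, 45, 61]

parameter = mean(population_ages)  # numerical characteristic of the population
statistic = mean(sample_ages)      # numerical characteristic of the sample
sampling_error = statistic - parameter


def response_rate(n_participated, n_selected):
    """Percentage of the people selected for the sample who actually take part."""
    return 100 * n_participated / n_selected


print("parameter:", parameter, "statistic:", statistic,
      "sampling error:", sampling_error)
print("response rate:", response_rate(240, 300), "%")
```

Here the sample mean overshoots the population mean by 3.75 years; a different sample would miss by a different amount.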
Random Sampling Techniques
The two major types of sampling in quantitative research are random sampling and nonrandom sampling. The former produces representative samples (apart from chance differences); the latter does not.

Simple Random Sampling
The first type of random sampling is called simple random sampling. It's the most basic type of random sampling. It is an equal probability selection method (abbreviated EPSEM). Remember that EPSEM means "everyone in the sampling frame has an equal chance of being in the final sample." You should understand that using an EPSEM is important because that is what produces "representative" samples (i.e., samples that represent the populations from which they were selected)! You will see below that simple random sampling is not the only equal probability selection method (EPSEM). It is, however, the most basic and well known.
Sampling experts recommend random sampling "without replacement" rather than random sampling "with replacement" because the former is a little more efficient in producing representative samples (i.e., it requires slightly fewer people and is therefore a little cheaper).
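Drawing a simple random sample by computer amounts to numbering everyone in the sampling frame and letting a random number generator pick the numbers. Here is a Python sketch using the standard library's `random.sample`, which samples without replacement (nobody can be picked twice), next to `random.choices`, which samples with replacement. The function names and the 2,500-person frame are illustrative assumptions.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Simple random sample WITHOUT replacement: nobody can be chosen twice."""
    rng = random.Random(seed)  # the seed only makes the draw reproducible
    return rng.sample(frame, n)


def sample_with_replacement(frame, n, seed=None):
    """For contrast: WITH replacement, the same person may be chosen again."""
    rng = random.Random(seed)
    return rng.choices(frame, k=n)


# Hypothetical sampling frame: everyone in the population numbered 1..2500.
frame = list(range(1, 2501))
chosen = simple_random_sample(frame, 100, seed=42)
print(sorted(chosen))
```

After the draw you would look up who holds each selected number and try to recruit those people.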
How do you draw a simple random sample? One way is to put all the names from your population into a hat and then select a subset (e.g., pull out 100 names from the hat). In the chapter we demonstrate the use of a table of random numbers. These days, researchers often use computer programs to randomly select their samples. To use a computer program (called a random number generator), you must make sure that you give each of the people in your population a number. Then the program will give you a list of randomly selected numbers within the range you give it. After getting the random numbers, you identify the people with those randomly selected numbers and try to get them to participate in your research study!

If you decide to use a table of random numbers such as the one shown on page 201 of the book, here's what you need to do. First, pick a place to start, and then move in one direction (e.g., move down the columns). Use the number of digits in the table that is appropriate for your population size (e.g., if there are 2500 people in the population, then use 4 digits). Skip any number that does not match someone in your population (e.g., 0000 or any number above 2500 in this example), and if you get the same number twice, just ignore it and move on to the next number. Once you get the set of randomly selected numbers, find out who those people are and try to get them to participate in your research study.

Systematic Sampling
Systematic sampling is the second type of random sampling. It is an equal probability selection method (EPSEM); remember that simple random sampling was also an EPSEM. Systematic sampling involves three steps:
1. Determine the sampling interval, which is symbolized by "k" (it is the population size divided by the desired sample size).
2. Randomly select a number between 1 and k, and include that person in your sample.
3. Also include each kth element after that person in your sample.
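The three systematic-sampling steps just listed can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes the frame is a list in which person 1 is the first entry.

```python
import random


def systematic_positions(population_size, k, start):
    """Positions start, start + k, start + 2k, ... up to the end of the frame."""
    return list(range(start, population_size + 1, k))


def systematic_sample(frame, n, seed=None):
    """Systematic sample from a numbered frame (person 1 is frame[0])."""
    k = len(frame) // n  # Step 1: sampling interval (population size / sample size)
    if k < 1:
        raise ValueError("desired sample size exceeds the population size")
    start = random.Random(seed).randint(1, k)  # Step 2: random start between 1 and k
    # Step 3: take every kth person from the random start to the end of the frame.
    return [frame[pos - 1] for pos in systematic_positions(len(frame), k, start)]


print(systematic_positions(50, 10, 5))  # k = 10, start = 5: [5, 15, 25, 35, 45]
```

Because `k` is rounded down, a population size that is not an exact multiple of the sample size can yield slightly more than `n` people; the steps above implicitly assume k works out evenly.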
For example, if k is 10 and your randomly selected number between 1 and 10 was 5, then you will select persons 5, 15, 25, 35, 45, and so on. When you get to the end of your sampling frame, you will have all the people to be included in your sample. One potential (but rarely occurring) problem is called periodicity (i.e., there is a cyclical pattern in the sampling frame). It could occur when you attach several ordered lists to one another (e.g., if you took lists from multiple teachers who had all ordered their lists on some variable such as IQ). On the other hand, stratification within one overall list is not a problem at all (e.g., if you have one list and have it ordered by gender or by IQ). Basically, if you are attaching multiple lists to one another, there could be a problem. It would be better to reorganize the lists into one overall list (i.e., sampling frame).

Stratified Random Sampling
The third type of random sampling is called stratified random sampling. First, stratify your sampling frame (e.g., divide it into the males and the females if you are using gender as your stratification variable). Second, take a random sample from each group (i.e., take a random sample of males and a random sample of females). Put these two sets of people together and you now have your final sample. (Note that you could also take a systematic sample from the joined lists if that's easier.)

There are actually two different types of stratified sampling. The first type, and the most common, is called proportional stratified sampling. In proportional stratified sampling you must make sure the subsamples (e.g., the samples of males and females) are proportional to their sizes in the population. Note that proportional stratified sampling is an equal probability selection method (i.e., it is EPSEM), which is good! The second type is called disproportional stratified sampling. In disproportional stratified sampling, the subsamples are not proportional to their sizes in the population. Here is an example showing the difference between proportional and disproportional stratified sampling: Assume that your population is 75% female and 25% male. Assume also that you want a sample of size 100 and you want to stratify on the variable called gender. For proportional stratified sampling, you would randomly select 75 females and 25 males from the population. For disproportional stratified sampling, you might randomly select 50 females and 50 males from the population.

Cluster Random Sampling
In this type of sampling you randomly select clusters rather than individual units in the first stage of sampling. A cluster has more than one unit in it (e.g., a school, a classroom, a team).
We discuss two types of cluster sampling in the chapter, one-stage and two-stage (note that more stages are possible in multistage sampling but are left for books on sampling). The first type of cluster sampling is called one-stage cluster sampling. To select a one-stage cluster sample, you first select a random sample of clusters.
Then you include in your final sample all of the individual units that are in the selected clusters.
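Stratified and cluster sampling both divide the sampling frame into groups, but they use the groups differently: stratified sampling draws a random sample from every stratum, while cluster sampling randomly selects some of the groups themselves. The Python sketch below shows both, using the 75% female / 25% male example from the stratified section and the "10 students from each of 15 classrooms" example given for two-stage sampling in the next paragraph. The population, the 30 classrooms of 25 students, and all function names are hypothetical. Because the classrooms are all the same size here, both cluster designs give everyone an equal chance of selection; probability proportional to size (PPS) selection for unequal clusters is not shown.

```python
import random
from collections import defaultdict


def proportional_allocation(strata_sizes, n):
    """Subsample sizes proportional to each stratum's share of the population."""
    total = sum(strata_sizes.values())
    # Rounding can make the parts sum to n plus or minus 1 for awkward proportions.
    return {s: round(n * size / total) for s, size in strata_sizes.items()}


def stratified_sample(frame, stratum_of, allocation, seed=None):
    """Random sample of allocation[s] people from each stratum s, then combine."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in frame:
        strata[stratum_of(person)].append(person)
    sample = []
    for s, size in allocation.items():
        sample.extend(rng.sample(strata[s], size))
    return sample


def one_stage_cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters, then keep every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for c in chosen for member in clusters[c]]


def two_stage_cluster_sample(clusters, n_clusters, per_cluster, seed=None):
    """Stage 1: random clusters. Stage 2: random members within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [m for c in chosen for m in rng.sample(clusters[c], per_cluster)]


# Hypothetical population: 750 females and 250 males.
people = [("F", i) for i in range(750)] + [("M", i) for i in range(250)]
alloc = proportional_allocation({"F": 750, "M": 250}, 100)  # {'F': 75, 'M': 25}
strat = stratified_sample(people, lambda p: p[0], alloc, seed=1)
disprop = stratified_sample(people, lambda p: p[0], {"F": 50, "M": 50}, seed=1)

# Hypothetical frame: 30 classrooms of 25 students each (equal-sized clusters).
classrooms = {f"room{r:02d}": [f"room{r:02d}-s{s:02d}" for s in range(25)]
              for r in range(30)}
one_stage = one_stage_cluster_sample(classrooms, 15, seed=2)      # 15 x 25 students
two_stage = two_stage_cluster_sample(classrooms, 15, 10, seed=2)  # 15 x 10 students
```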
The second type of cluster sampling is called two-stage cluster sampling. In the first stage you take a random sample of clusters (i.e., just like you did in one-stage cluster sampling). In the second stage, you take a random sample of elements from each of the clusters you selected in stage one (e.g., in stage two you might randomly select 10 students from each of the 15 classrooms you selected in stage one).

Important points about cluster sampling:
- When you select the same number of elements from each selected cluster, as in the two-stage example above, cluster sampling is an equal probability selection method (EPSEM) ONLY if the clusters are approximately the same size. (Remember that EPSEM is very important because that is what produces representative samples.)
- When clusters are not the same size, you must fix the problem by using the technique called "probability proportional to size" (PPS) for selecting your clusters in stage one. This will make your cluster sampling an equal probability selection method (EPSEM), and it will, therefore, produce representative samples.

Nonrandom Sampling Techniques
The other major type of sampling used in quantitative research is nonrandom sampling (i.e., when you do not use one of the random sampling techniques). There are four main types of nonrandom sampling. The first type is called convenience sampling (i.e., it simply involves using the people who are the most available or the most easily selected to be in your research study). The second type is called quota sampling (i.e., it involves setting quotas and then using convenience sampling to obtain those quotas). A set of quotas might be given to you as follows: find 25 African American males, 25 European American males, 25 African American females, and 25 European American females. You use convenience sampling to actually find the people, but you must make sure you have the right number of people for each quota.
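The quota procedure described above can be sketched in Python: walk through people in whatever order they happen to become available (the convenience part) and keep someone only while their category's quota is unfilled. The function name, the category labels, and the `category_of` argument are hypothetical. Note that nothing here is random, which is why quota samples are not representative in the way random samples are.

```python
def fill_quotas(available_people, quotas, category_of):
    """Quota sampling sketch.

    Goes through people in the order they happen to be available and keeps
    a person only while the quota for that person's category is unfilled.
    Returns the selected people and any quota counts still unmet.
    """
    remaining = dict(quotas)
    selected = []
    for person in available_people:
        category = category_of(person)
        if remaining.get(category, 0) > 0:
            selected.append(person)
            remaining[category] -= 1
            if all(count == 0 for count in remaining.values()):
                break  # every quota has been met
    return selected, remaining


# Hypothetical quotas modeled on the example above.
quotas = {
    ("African American", "male"): 25,
    ("European American", "male"): 25,
    ("African American", "female"): 25,
    ("European American", "female"): 25,
}
```

With real data, `category_of` might be something like `lambda p: (p["ethnicity"], p["gender"])`, where those field names are also hypothetical.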
The third type of nonrandom sampling is called purposive sampling (i.e., the researcher specifies the characteristics of the population of interest and then locates individuals who match those characteristics). For example, you might decide that you want to include only "boys who are in the 7th grade and have been diagnosed with ADHD" in your research study. You would then try to find 50 students who meet your "inclusion criteria" and include them in your research study. The fourth type of nonrandom sampling is called snowball sampling (i.e., each research participant is asked to identify other potential research participants who have a certain characteristic). You start with one or a few participants, ask them for more, find those, ask them for more, and continue until you have a sufficient sample size. This technique might be used for a hard-to-find population (e.g., where no sampling frame exists). For example, you might want to use snowball sampling if you wanted to do a study of people in your city who have a lot of power in the area of educational policy making (in addition to the already known positions of power, such as the school board and the school system superintendent).

Random Selection and Random Assignment
In random selection (using an equal probability selection method), you select a sample from a population using one of the random sampling techniques discussed earlier. The resulting random sample will be like a "mirror image" of the population, except for chance differences. For example, if you randomly select (e.g., using simple random sampling) 1000 people from the adult population in Ann Arbor, Michigan, the sample will look like the adult population of Ann Arbor. In random assignment, you start with a set of people (you already have a sample, which very well may be a convenience sample), and then you randomly divide that set of people into two or more groups (i.e., you take the full set and randomly divide it into subsets). The groups or subsets will be "mirror images" of each other (except for chance differences). For example, if you start with a convenience sample of 100 people and randomly assign them to two groups of 50 people, the two groups will be "equivalent" on all known and unknown variables (again, apart from chance differences). Random assignment generates similar groups, and it is used in the strongest of the experimental research designs.

Determining the Sample Size When Random Sampling Is Used
Would you like to know the answer to the question "How big should my sample be?" I will start with my four "simple" answers to your question:
- Try to get as large a sample as you can for your study (i.e., because the bigger the sample the better).
- If your population is size 100 or less, then include the whole population rather than taking a sample (i.e., don't take a sample; include the whole population).
- Look at other studies in the research literature and see how many they are selecting.
- For an exact number, just look at Figure 7.5, which shows recommended sample sizes.

There are many sample size calculators on the web, but they generally require you to learn a little bit of statistics first. I'll list some when we get to the chapter on statistics.

I want to make a few more points about sample size in this chapter. In particular, note that you will need larger samples under these circumstances:
- When the population is very heterogeneous.
- When you want to break down the data into multiple categories.
- When you want a relatively narrow confidence interval (e.g., note that the estimate that 75% of teachers support a policy plus or minus 4% is narrower than the estimate of 75% plus or minus 5%).
- When you expect a weak relationship or a small effect.
- When you use a less efficient technique of random sampling (e.g., cluster sampling is less efficient than proportional stratified sampling).
- When you expect to have a low response rate. The response rate is the percentage of people in your sample who agree to be in your study.
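To see why a narrower confidence interval needs a larger sample, here is a Python sketch of the standard normal-approximation formula for estimating a proportion, n = z^2 p(1 - p) / E^2, where E is the desired margin of error. This is a common textbook formula swapped in for illustration, not the method behind Figure 7.5; it assumes simple random sampling from a large population (no finite population correction), 95% confidence (z = 1.96), and that everyone selected responds. The function names are illustrative.

```python
import math


def required_sample_size(p, margin, z=1.96):
    """Approximate n for estimating a proportion p within plus or minus `margin`.

    Assumes simple random sampling from a large population and the normal
    approximation; z = 1.96 corresponds to 95% confidence.
    """
    return math.ceil(z**2 * p * (1 - p) / margin**2)


def margin_of_error(p, n, z=1.96):
    """Half-width of the approximate confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)


# The teacher-support example: 75% support, plus or minus 5 vs. 4 points.
print(required_sample_size(0.75, 0.05))
print(required_sample_size(0.75, 0.04))
```

Under these assumptions the formula gives 289 people for plus or minus 5 percentage points but 451 for plus or minus 4. If you expect a low response rate, you would need to select even more people to end up with that many participants.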
Sampling in Qualitative Research
Sampling in qualitative research is usually purposive (see the above discussion of purposive sampling). The primary goal in qualitative research is to select information-rich cases. There are several specific purposive sampling techniques that are used in qualitative research:
- Maximum variation sampling (i.e., you select a wide range of cases).
- Homogeneous sample selection (i.e., you select a small and homogeneous case or set of cases for intensive study).
- Extreme case sampling (i.e., you select cases that represent the extremes on some dimension).
- Typical-case sampling (i.e., you select typical or average cases).
- Critical-case sampling (i.e., you select cases that are known to be very important).
- Negative-case sampling (i.e., you purposively select cases that disconfirm your generalizations, so that you can make sure that you are not just selectively finding cases to support your personal theory).
- Opportunistic sampling (i.e., you select useful cases as the opportunity arises).
- Mixed purposeful sampling (i.e., you can mix the sampling strategies we have discussed into more complex designs tailored to your specific needs).
Chapter 8 Validity of Research Results
(Reminder: Don't forget to utilize the concept maps and study questions as you study this and the other chapters.)
In this chapter we discuss validity issues for quantitative research and for qualitative research.

Validity Issues in the Design of Quantitative Research
On page 228 we make a distinction between an extraneous variable and a confounding variable. An extraneous variable is a variable that MAY compete with the independent variable in explaining the outcome of a study. A confounding variable (also called a third variable) is an extraneous variable that DOES cause a problem because we know that it DOES have a relationship with the independent and dependent variables. A confounding variable is a variable that systematically varies with (or influences) the independent variable and also influences the dependent variable. When you design a research study in which you want to make a statement about cause and effect, you must think about which extraneous variables are probably confounding variables and do something about them. We gave an example of "The Pepsi Challenge" (on p. 228) and showed that anything that varies with the presentation of Coke or Pepsi is an extraneous variable that may confound the relationship (i.e., it may also be a confounding variable). For example, perhaps people are more likely to pick Pepsi over Coke if different letters are placed on the Pepsi and Coke cups (e.g., if Pepsi is served in cups with the letter "M" and Coke is served in cups with the letter "Q"). If this is true, then the variable of cup letter (M versus Q) is a confounding variable. In short, we must always worry about extraneous variables (especially confounding variables) when we are interested in conducting research that will allow us to make a conclusion about cause and effect.

There are four major types of validity in quantitative research: statistical conclusion validity, internal validity, construct validity, and external validity.
We will discuss each of these in this lecture.

Statistical Conclusion Validity
Statistical conclusion validity refers to the ability to make an accurate assessment about whether the independent and dependent variables are related and about the strength of that relationship. So the two key questions here are 1) Are the variables related? and 2) How strong is the relationship? Typically, null hypothesis significance testing (discussed in Chapter 16) is used to determine whether two variables are related in the population from which the study data were selected. This procedure will tell you whether a relationship is statistically significant or not. For now, just remember that a relationship is said to be statistically significant when we conclude that it is probably NOT just a chance occurrence, and a relationship is not statistically significant when the null hypothesis testing procedure says that any observed relationship is probably nothing more than normal sampling error or fluctuation. To determine how STRONG a relationship is, researchers use what are called effect size indicators. There are many different effect size indicators, but they all tell you how strong a relationship is. For now, remember that the first key question (Are the variables related?) is answered using null hypothesis significance testing, and the second key question (How strong is the relationship?) is answered using an effect size indicator. The concepts of significance testing and effect size indicators are explained in Chapter 16.
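Chapter 16 covers the standard significance tests, so the following Python sketch is only an illustration of the two key questions. It answers "Are the variables related?" with a permutation test (one simple way to carry out null hypothesis significance testing, swapped in here in place of whatever specific test the book uses) and "How strong is the relationship?" with Cohen's d, one common effect size indicator. The function names and the scores are hypothetical.

```python
import random
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Effect size: difference in means divided by the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


def permutation_p_value(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation test of the null hypothesis of no group difference.

    Shuffles the group labels many times and counts how often a difference in
    means at least as large as the observed one arises by chance alone.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)  # +1 counts the observed split itself


# Hypothetical posttest scores for two small groups.
treatment = [14, 13, 12, 11, 10]
control = [5, 4, 3, 2, 1]
print("p =", permutation_p_value(treatment, control, n_perm=5000))
print("d =", cohens_d(treatment, control))
```

A small p-value answers the first question (the difference is unlikely to be chance alone); d answers the second, expressing the difference in standard deviation units.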
Internal Validity
When I hear the term "internal validity," the word cause always comes into my mind. That's because internal validity is defined as the "approximate validity with which we infer that a relationship between two variables is causal" (Cook and Campbell, 1979, p. 37). A good synonym for the term internal validity is causal validity because that is what internal validity is all about. If you can show that you have high internal validity (i.e., high causal validity), then you can conclude that you have strong evidence of causality; however, if you have low internal validity, then you must conclude that you have little or no evidence of causality.

Types of Causal Relationships
There are two different types of causal relationships: causal description and causal explanation. Causal description involves describing the consequences of manipulating an independent variable. In general, causal description involves showing that changes in variable X (the IV) cause changes in variable Y (the DV):
X ----> Y
Causal explanation involves more than just causal description. Causal explanation involves explaining the mechanisms through which and the conditions under which a causal relationship holds. This involves the inclusion (in your research study) of mediating or intervening variables and moderator variables. Mediating and moderator variables are defined in Chapter Two in Table 2.2 (on page 36).

Criteria for Inferring Causation
There are three main conditions that are always required if you want to make a claim that changes in one variable cause changes in another variable. We call these the three necessary conditions for causality. These three conditions are summarized below in Table 11.1:
Condition 1: Variable X and variable Y must be related (the relationship condition).
Condition 2: Proper time order must be established (the temporal antecedence condition).
Condition 3: The relationship between X and Y must not be due to some confounding extraneous or "third" variable (the lack of alternative explanation condition).
If you want to conclude that X causes Y you must make sure that the three above necessary conditions are met. It is also helpful if you have a theoretical rationale explaining the causal relationship. For example, there is a correlation between coffee drinking and likelihood of having a heart attack. One big problem with concluding that coffee drinking causes heart attacks is that cigarette smoking is related to both of these variables (i.e., we have a Condition 3 problem). In particular, people who drink little coffee are less likely to smoke cigarettes than are people who drink a lot of coffee. Therefore, perhaps the observed relationship between coffee drinking and heart attacks is the result of the extraneous variable of smoking. The researcher would have to "control for" smoking in order to determine if this rival explanation accounts for the original relationship.
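The coffee example can be made concrete with a small Python sketch. The counts below are invented purely for illustration (they are not real data): they are built so that heavy coffee drinkers are more often smokers and smoking alone drives heart attack risk. "Controlling for" smoking here means comparing coffee groups separately within smokers and within nonsmokers, which is one simple form of statistical control.

```python
# Invented counts for illustration only:
# (smoking status, coffee level) -> (number of people, number of heart attacks)
counts = {
    ("smoker", "heavy"): (80, 16),
    ("smoker", "light"): (20, 4),
    ("nonsmoker", "heavy"): (20, 1),
    ("nonsmoker", "light"): (80, 4),
}


def overall_rate(coffee):
    """Heart attack rate for a coffee level, ignoring smoking."""
    people = sum(n for (smoking, level), (n, attacks) in counts.items() if level == coffee)
    attacks = sum(a for (smoking, level), (n, a) in counts.items() if level == coffee)
    return attacks / people


def rate_within(smoking, coffee):
    """Heart attack rate for a coffee level within one smoking group."""
    n, attacks = counts[(smoking, coffee)]
    return attacks / n


# Ignoring smoking, heavy coffee drinkers look riskier (17/100 vs 8/100).
print("overall:", overall_rate("heavy"), overall_rate("light"))
# Controlling for smoking, coffee level makes no difference within either group.
for smoking in ("smoker", "nonsmoker"):
    print(smoking, rate_within(smoking, "heavy"), rate_within(smoking, "light"))
```

In these made-up numbers the entire coffee "effect" disappears once smoking is held constant, which is exactly the rival explanation the researcher must rule out.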
Threats to Internal Validity
In this section, we discuss several threats to internal validity that have been identified by research methodologists (especially by Campbell and Stanley, 1963). These threats to internal validity usually call into question the third necessary condition for causality (i.e., the "lack of alternative explanation" condition). Before discussing the specific threats, I want you to get the basic idea of two weak designs in your head. The first weak design is the one-group pretest-posttest design, which is depicted like this:
O X O
In this design, a group is pretested, then a treatment is administered, and then the people are posttested. For example, you could measure your students' understanding of history at the beginning of the term, then teach them history for the term, and then measure them again on their understanding of history at the end of the term.
The second weak design to remember for this chapter is called the posttest-only design with nonequivalent groups. In this lecture, I will also refer to this design as a two-group design and sometimes as a multigroup design (since it has more than one group).
XTreatment O2
---------------------
XControl O2
In this design, there is no pretest, one group gets the treatment and the other group gets no treatment or some different treatment, and both groups are posttested (e.g., you teach two classes history for a quarter and measure their understanding at the end for comparison). Furthermore, the groups are found wherever they already exist (i.e., participants are not randomly assigned to these groups).
In comparing the two designs just mentioned, note that the comparison in the one-group design is between the participants' pretest scores and their posttest scores. The comparison in the two-group design is between the two groups' posttest scores. Some researchers like to call the point of comparison the "counterfactual." In the one-group pretest-posttest design shown above, the counterfactual is the pretest. In the two-group design shown above, the counterfactual is the posttest of the control group. Remember this key point: In each of the multigroup research designs (designs that include more than one group of participants), you want the different groups to be the same on all extraneous variables and different ONLY on the independent variable (e.g., such that one group gets the treatment and the other group does not). In other words, you want the only systematic difference between the groups to be exposure to the independent variable.
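Random assignment (described in Chapter 7) is the usual way to make the groups in a multigroup design the same on extraneous variables, apart from chance; the posttest-only design with nonequivalent groups above lacks exactly this step. Here is a minimal Python sketch, assuming a hypothetical convenience sample of 100 students with made-up pretest scores; the function name `randomly_assign` is illustrative.

```python
import random
from statistics import mean


def randomly_assign(people, n_groups=2, seed=None):
    """Shuffle the available people, then deal them into groups like cards."""
    rng = random.Random(seed)
    shuffled = list(people)
    rng.shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]


# Hypothetical convenience sample: 100 students with made-up pretest scores.
score_rng = random.Random(0)
students = [(f"student{i:03d}", score_rng.randint(40, 100)) for i in range(100)]

treatment, control = randomly_assign(students, seed=1)
# Apart from chance, the two groups should have similar average scores.
print(len(treatment), len(control),
      mean(score for _, score in treatment), mean(score for _, score in control))
```

Rerunning with different seeds gives slightly different group averages; those chance differences are what "equivalent apart from chance" means.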
The first threat to internal validity is called ambiguous temporal precedence. Ambiguous temporal precedence is defined as the inability of the researcher (based on the data) to specify which variable is the cause and which variable is the effect. If this threat is present then you are unable to meet the second of the three necessary conditions shown above in Table 11.1. That is, you cannot establish proper time order so you cannot make a conclusion of cause and effect. The second threat to internal validity is called the history threat. The history threat refers to any event, other than the planned treatment event, that occurs between the pretest and posttest measurement and has an influence on the dependent variable. In short, if both a treatment and a history effect occur between the pretest and the posttest, you will not know whether the observed difference between the pretest and the posttest is due to the treatment or due to the history event. In short, these two events are confounded. For example, the principal may come into the experimental classroom during the research study which alters the outcome. The history effect is a threat for the one group design but it is not a threat for the multigroup group design.
You probably want to know why this is true. In the one group design (shown above), you take the difference between the pretest and posttest scores as your measure of the effect of the treatment. All or part of that difference could be due to a history effect; therefore, you don't know whether the change in scores is due to the treatment or to the history effect. They are confounded. The basic history effect is not a threat to the two group design (shown above) because you are now comparing the treatment group to a comparison group, and as long as the history event affects both groups equally, the difference between the two groups will not be due to the history effect.
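To make this counterfactual logic concrete, here is a minimal Python simulation (a sketch for illustration, not a procedure from the textbook). It assumes a hypothetical history event that raises every participant's posttest score by the same amount, plus a treatment with a known effect; all of the numbers (means, standard deviations, effect sizes) are made up. The one group design uses each participant's own pretest as the counterfactual, while the two group design uses the control group's posttest.

```python
import random
import statistics


def estimate_effects(n=2000, treatment_effect=5.0, history_effect=8.0, seed=1):
    """Return (one group estimate, two group estimate) of the treatment effect.

    Hypothetical model: posttest = pretest + history_effect
    (+ treatment_effect if treated) + random noise.
    The history event happens to everyone, treated or not.
    """
    rng = random.Random(seed)

    def posttest(pretest, treated):
        bump = treatment_effect if treated else 0.0
        return pretest + history_effect + bump + rng.gauss(0, 2)

    # One group design: everyone is treated; the counterfactual is the pretest.
    pre = [rng.gauss(50, 10) for _ in range(n)]
    post = [posttest(p, treated=True) for p in pre]
    one_group = statistics.mean(post) - statistics.mean(pre)

    # Two group design: the counterfactual is the control group's posttest.
    treated_post = [posttest(rng.gauss(50, 10), treated=True) for _ in range(n)]
    control_post = [posttest(rng.gauss(50, 10), treated=False) for _ in range(n)]
    two_group = statistics.mean(treated_post) - statistics.mean(control_post)

    return one_group, two_group


one_group, two_group = estimate_effects()
print(f"one group estimate: {one_group:.1f}")  # about 13 = treatment + history
print(f"two group estimate: {two_group:.1f}")  # about 5 = treatment only
```

Only the one group estimate absorbs the history effect. Replacing the history bump with a maturation, testing, or instrumentation bump that affects everyone equally produces the same pattern; if the bump differed between groups (a selection-history effect, discussed later in this chapter), the two group estimate would be biased too.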
The third threat to internal validity is called maturation. Maturation is present when a physical or mental change occurs over time and affects the participants' performance on the dependent variable. For example, if you measure first grade students' ability to perform arithmetic problems at the beginning of the year and again at the end of the year, some of their improvement will probably be due to natural maturation (and not just to what you have taught them during the year). Therefore, in the one group design you will not know whether their improvement is due to the teacher or to maturation. Maturation is not a threat in the two group design because, as long as the people in both groups mature at the same rate, the difference between the two groups will not be due to maturation. If you are following this logic about why these two threats (history and maturation) are a problem for the one group design but not for the two group design, then you have grasped one of the major points of this chapter. The same logic applies to the next three threats: testing, instrumentation, and regression artifacts. The fourth threat to internal validity is called testing. Testing refers to any change on the second administration of a test as a result of having previously taken the test. For example, suppose you have a treatment that you believe will reduce students' racial stereotyping. You use the one group design, and your participants take a pretest and a posttest measuring their agreement with certain racial stereotypes. The problem is that their posttest scores may partly reflect having been sensitized to the issue of racial stereotypes by taking the pretest. Therefore, in the one group design you will not know whether the improvement from pretest to posttest is due to your treatment or to a testing effect.
Testing is not a threat in the two group design because, as long as the people in both groups are affected equally by the pretest, the difference between the two groups will not be due to testing. The two groups do differ on exposure to the treatment (i.e., one group gets the treatment and the other group does not). The fifth threat to internal validity is called instrumentation. Instrumentation refers to any change that occurs in the way the dependent variable is measured in the research study.
For example, suppose that one person does your pretest assessment of students' racial stereotyping but a different person does your posttest assessment. Also assume that the second person tends to overlook much of the stereotyping, whereas the first person picks up on all of it. The problem is that much of the apparent gain from the pretest to the posttest may be due to the posttest assessment failing to pick up on stereotyping. Therefore, in the one group design you will not know whether the improvement from pretest to posttest is due to your treatment for reducing stereotyping or to an instrumentation effect. Instrumentation is not a threat in the two group design because, as long as the people in both groups are affected equally by the instrumentation effect, the difference between the two groups will not be due to instrumentation.
The sixth threat to internal validity is called regression artifacts (or regression to the mean). Regression artifacts refer to the tendency of very high pretest scores to become lower, and of very low pretest scores to become higher, on posttesting. You should always be on the lookout for regression to the mean when you select participants based on extreme (very high or very low) test scores. For example, suppose that you select people who have extremely high scores on your racial stereotyping test. Some of these scores are probably artificially high because of transient factors and the lack of perfect reliability. Therefore, if stereotyping goes down from pretest to posttest, some or all of the change may be due to a regression artifact, and in the one group design you will not know whether the improvement from pretest to posttest is due to your treatment or to a regression artifact. Regression artifacts are not a threat in the two group design because, as long as the people in both groups are affected equally by statistical regression, the difference between the two groups will not be due to regression to the mean. The seventh threat to internal validity is called differential selection. Differential selection applies only to multigroup designs. It refers to forming the comparison groups in your study from participants who have different characteristics. Remember, we want our groups to be the same on all variables except the treatment variable; the treatment variable is the only variable that we want to be systematically different across the groups. Table 8.1 lists a few of the many characteristics on which the students in the different groups may differ (e.g., age, anxiety, gender, intelligence, reading ability). Unlike the previous five threats, selection is not an internal validity problem for the one group design, but it is a problem for two group and multigroup designs.
Looking at the definition again, you can see that selection is defined for two group and multigroup designs. It is not relevant to the internal validity of the single group design. As an example, assume that you select two classes for your study on reducing racial stereotyping. You use two fifth grade classes as your groups; one group will get your treatment and the other will act as a control. The problem is that these two groups of students may differ on variables other than your treatment variable, and any differences found at the posttest may be due to these "differential selection" differences rather than to your treatment.
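The regression artifact described above (the sixth threat) can be reproduced with a short Python simulation. This sketch assumes a hypothetical test on which each observed score is a stable true score plus transient error, so reliability is imperfect, and there is no treatment at all. People are selected only because their pretest scores are extreme; the cutoff, means, and standard deviations are made-up values.

```python
import random
import statistics


def regression_demo(n=10_000, cutoff=70.0, seed=2):
    """Select extreme pretest scorers and compare their pretest and posttest means.

    Hypothetical model: observed score = true score + transient error.
    Nothing happens between the two testings (no treatment, no history).
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(50, 10) for _ in range(n)]
    pretest = [t + rng.gauss(0, 6) for t in true_scores]
    posttest = [t + rng.gauss(0, 6) for t in true_scores]

    # Keep only people with extremely high pretest scores.
    selected = [i for i in range(n) if pretest[i] >= cutoff]
    pre_mean = statistics.mean(pretest[i] for i in selected)
    post_mean = statistics.mean(posttest[i] for i in selected)
    return len(selected), pre_mean, post_mean


count, pre_mean, post_mean = regression_demo()
print(f"{count} selected: pretest mean {pre_mean:.1f}, posttest mean {post_mean:.1f}")
```

The selected group's posttest mean falls back toward the overall mean of 50 even though nothing was done, because part of each selected person's high pretest score was transient error that does not recur. If both groups in a two group design were selected the same way, this drop would appear in both groups and cancel out of the comparison.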
The eighth threat to internal validity is called differential attrition (it is also sometimes called mortality). Attrition simply refers to participants dropping out of your research study. Differential attrition is the differential loss of participants from the various comparison groups. Just like the last threat, differential attrition is a problem for two group and multigroup designs but not for the single group design. (Notice the word differential in the definition.) For example, assume again that you are doing a study on racial stereotyping. Do you see how your result would be compromised if the children who are most likely to hold racial stereotypes drop out of one of your groups but not the other? The difference observed at the posttest may now be the result of differential attrition. The ninth threat to internal validity is actually a set of threats, called additive and interactive effects. Additive and interactive effects refer to the fact that the threats to validity can combine to produce a bias in the study, which threatens our ability to conclude that the independent variable is the cause of differences between groups on the dependent variable. They apply only to two group and multigroup designs; they do not apply to the one group design. These threats occur when the different comparison groups are affected differently (or differentially) by one of the earlier threats to internal validity (i.e., history, maturation, testing, instrumentation, or statistical regression). A selection-history effect occurs when an event occurring between the pretest and posttest differentially affects the different comparison groups. You can think of this as a differential history effect. A selection-maturation effect occurs if the groups mature at different rates. For example, first grade students may tend to change naturally in reading ability during the school year more than third grade students.
Hence, part of any observed difference in the reading ability of the two groups at the posttest may be due to maturation. You can think of this as a differential maturation effect. You should now be able to construct similar examples demonstrating the following: a selection-testing effect (where testing affects the groups differently); a selection-instrumentation effect (where instrumentation occurs differentially); and a selection-regression artifacts effect (where regression to the mean occurs differentially). Remember that the key for the selection effects is that the groups must be affected differently by the particular threat to internal validity.

External Validity
External validity has to do with the degree to which the results of a study can be generalized to and across populations of persons, settings, times, outcomes, and treatment variations. A good synonym for external validity is generalizing validity because it always has to do with how well you can generalize research results.
The major types of external validity are population validity, ecological validity, temporal validity, treatment variation validity, and outcome validity. I will discuss each of these now.
Population Validity
The first type of external validity is called population validity. Population validity is the ability to generalize the study results to individuals who were not included in the study. The issues are how well you can generalize your sample results to a population, and how well you can generalize your sample results across the different kinds of people in the larger population. Generalizing from a sample to a population is made possible by random selection techniques (i.e., a good sample lets you generalize to a population, as you learned in the earlier chapter on sampling). Generalizing across populations is possible when the result (e.g., the effectiveness of a particular teaching technique) holds across many different kinds of people (i.e., for many subpopulations). This is the question of "how widely does the finding apply?" If the finding applied to every single individual in the population, then it would have full population validity. Research results that apply broadly are welcomed by practitioners because they make their jobs easier. Both kinds of population validity are important; however, some methodologists (such as Cook and Campbell) are more concerned about generalizing across populations. That is, they want to know how widely a finding applies.

Ecological Validity
Ecological validity is present to the degree that a result generalizes across different settings. For example, suppose that you find that a new teaching technique works in urban schools. You might also want to know if the same technique works in rural and suburban schools. That is, you would want to know if the technique works across different settings. Reactivity is a threat to ecological validity. Reactivity is defined as an alteration in performance that occurs as a result of being aware of participating in a study.
In other words, reactivity occurs when research participants change their performance because they know they are being observed. Reactivity is a problem of ecological validity because the results might generalize only to other people who are also being observed. A good metaphor for reactivity comes from television: once you know that the camera is turned on YOU, you might shift into your television behavior. The same thing can happen in research studies with human participants who know that they are being observed. Another threat to ecological validity (not mentioned in the chapter) is called experimenter effects. This threat occurs when participants alter their performance because of some unintentional behavior or characteristic of the researcher. Researchers should be aware of this problem and do their best to prevent it.
Temporal Validity
Temporal validity is the extent to which the study results can be generalized across time. For example, assume you find that a certain discipline technique works well with many different kinds of children and in many different settings. After many years, you might notice that it is not working anymore; you will then need to conduct additional research to determine whether the technique is robust over time, and if it is not, to figure out why and to find out what works better. Likewise, findings from far in the past often need to be replicated to make sure that they still hold.

Treatment Variation Validity
Treatment variation validity is the degree to which one can generalize the results of the study across variations of the treatment. For example, if the treatment is varied a little, will the results be similar? This is important because when an intervention is administered by practitioners in the field, it is unlikely to be administered exactly as it was by the original researchers. This is, by the way, one reason why interventions that have been shown to work sometimes fail when they are broadly applied in the field.

Outcome Validity
Outcome validity is the degree to which one can generalize the results of a study across different but related dependent variables. For example, if a study shows a positive effect on self-esteem, will it also show a positive effect on the related construct of self-efficacy? A good way to understand the outcome validity of your research study is to include several outcome measures so that you can get a more complete picture of the overall effect of the treatment or intervention.

Here is a brief summary of external validity:
Population validity = generalizing to and across populations.
Ecological validity = generalizing across settings.
Temporal validity = generalizing across time.
Treatment variation validity = generalizing across variations of the treatment.
Outcome validity = generalizing across related dependent variables.
As you can see, all of the forms of external validity concern the degree to which you can make generalizations.
Construct Representation
Educational researchers must measure or represent many different constructs (e.g., intelligence, ADHD, types of online instruction, academic achievement). The problem is that there is usually no single behavior or operation that can provide a complete and perfect representation of a construct.
The researcher should always clearly specify (in the research report) the way the construct was represented so that a reader of the report can understand what was done and evaluate the quality of the measure(s). Operationalism refers to the process of representing a construct by a specific set of operations or measures. For example, you might choose to represent (or "operationalize") the construct of self-esteem by using the ten-item Rosenberg Self-Esteem Scale shown on page 165 and reproduced here for your convenience.
Why do you think Rosenberg used 10 items to represent self-esteem? Because it would be very hard to tap into this construct with a single item.
Rosenberg used what is called multiple operationalism (i.e., the use of several measures to represent a construct). Think about it like this: Would you want to use a single item to measure intelligence (e.g., how do you spell the word "restaurant")? No! You might even decide to use more than one test of intelligence to tap into the different dimensions of intelligence. Whenever you read a research report, be sure to check out how they represent their constructs. Then you can evaluate the quality of their representations or "operationalizations."
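Multiple operationalism usually ends in a single composite score for the construct. The Python sketch below averages several item responses into one score. It assumes a hypothetical 1-4 agreement format and a hypothetical set of reverse-keyed items (items worded so that agreement indicates less of the construct); for a real instrument such as the Rosenberg scale, follow its published scoring instructions rather than these assumptions.

```python
from statistics import mean


def composite_score(responses, reverse_keyed, low=1, high=4):
    """Average item responses into one score for the construct.

    responses: dict mapping item number -> response on a low..high scale.
    reverse_keyed: item numbers worded in the opposite direction
    (hypothetical here; take them from the instrument's scoring key).
    """
    scored = []
    for item, value in responses.items():
        if not low <= value <= high:
            raise ValueError(f"item {item}: response {value} is outside {low}-{high}")
        if item in reverse_keyed:
            value = low + high - value  # flip the item: 4 -> 1, 3 -> 2, and so on
        scored.append(value)
    return mean(scored)


# Hypothetical answers from one respondent to a 4-item set.
answers = {1: 4, 2: 1, 3: 3, 4: 2}
print(composite_score(answers, reverse_keyed={2, 4}))  # (4 + 4 + 3 + 3) / 4 = 3.5
```

Averaging rather than summing keeps the composite on the original 1-4 metric; either choice is defensible, and an instrument's own scoring rule should take precedence.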
Research Validity in Qualitative Research
Now we shift our attention to qualitative research! If you need a review of qualitative research, see pages 45-48 in Chapter 2 for a quick overview. Also look at the qualitative research article in Appendix B titled "You Don't Have to Be Sighted to Be a Scientist, Do You? Issues and Outcomes in Science Education." One potential threat to watch out for is researcher bias (i.e., searching out and finding or confirming only what you want or expect to find). Two strategies for reducing researcher bias are reflexivity (constantly thinking about your potential biases and how you can minimize their effects) and negative-case sampling (attempting to locate and examine cases that disconfirm your expectations). Now I will briefly discuss the major types of validity in qualitative research, and I will list some very important and effective strategies that can help you obtain high qualitative research validity, or trustworthiness.

Descriptive validity
Descriptive validity is present to the degree that the account reported by the researcher is accurate and factual. One very useful strategy for obtaining descriptive validity is investigator triangulation (i.e., the use of multiple investigators to collect and interpret the data). When the investigators agree about the descriptive details of the account, readers can place more faith in that account.

Interpretive validity
Interpretive validity is present to the degree that the researcher accurately portrays the meanings given by the participants to what is being studied. Your goal here is to "get into the heads" of your participants and accurately document their viewpoints and meanings. One useful strategy for obtaining interpretive validity is participant feedback, or member checking (i.e., discussing your findings with your participants to see if they agree, and making modifications so that you represent their meanings and ways of thinking).
Another useful strategy is to use low-inference descriptors in your report (i.e., descriptions phrased very close to the participants' accounts and the researcher's field notes).
Theoretical validity
Theoretical validity is present to the degree that a theoretical explanation provided by the researcher fits the data. I listed four helpful strategies for this type of validity. The first strategy is extended fieldwork (collecting data in the field over an extended period of time). The second is theory triangulation (using multiple theories and perspectives to help you interpret the data). The third is pattern matching (making unique or complex predictions and seeing if they occur; that is, did the fingerprint that you predicted actually occur?). The fourth strategy is peer review (discussing your interpretations and conclusions with peers or colleagues who are not as deep into the study as you are).

Internal validity
Internal validity is the same as it was for quantitative research. It is the degree to which a researcher is justified in concluding that an observed relationship is causal, that is, whether you can conclude that one event caused another event. The issue of causal validity is important if the qualitative researcher is interested in making any tentative statements about cause and effect. I have listed three strategies to use if you are interested in cause and effect in qualitative research. The first strategy is called researcher-as-detective (carefully thinking about cause and effect, examining each possible "clue," and then drawing a conclusion). The second is called methods triangulation (using multiple methods, such as interviews, questionnaires, and observations, in investigating an issue). The third strategy is called data triangulation (using multiple data sources, such as interviews with different types of people or observations in different settings). You do not want to limit yourself to a single data source.

External validity
External validity is pretty much the same as it was for quantitative research.
That is, it is still the degree to which you can generalize your results to other people, settings, and times. Note that generalizing has traditionally not been a priority of qualitative researchers. However, in many research areas today, it is becoming an important goal. One form of generalizing in qualitative research is called naturalistic generalization (generalizing based on similarity). When you make a naturalistic generalization, you look at your students or clients and generalize to the degree that they are similar to the students or clients in the qualitative research study you are reading. In other words, the reader of the report makes the generalizations rather than the researchers who produced the report. Qualitative researchers should provide the details necessary for readers to make naturalistic generalizations. Another way to generalize qualitative research findings is through replication: you are able to generalize when a research result has been shown with different sets of people, at different times, and in different settings.
Yet another style of generalizing is theoretical generalization (generalizing the theory that is based on a qualitative study, such as a grounded theory research study). Even if the particulars do not generalize, the main ideas and the process observed might generalize.
Here is a summary of the strategies used in qualitative research. (Note: they are also used in mixed research and can be used creatively even in quantitative research.)
The bottom line of this chapter is this: You should always try to evaluate the research validity of empirical studies before trusting their conclusions. And, if you are conducting research you must use validity strategies if your research is going to be trustworthy and defensible.
Chapter 9 Experimental Research
(Reminder: Don't forget to use the concept maps and study questions as you study this and the other chapters.) In this chapter we talk about what experiments are, how to control for extraneous variables, and two sets of experimental designs (weak designs and strong designs). (Note: In the next chapter we will talk about middle-of-the-road experimental designs; they are better than the weak designs discussed in this chapter but not as good as the strong designs discussed in this chapter. These middle-of-the-road, or medium quality, designs are called quasi-experimental designs.) It is important to remember that whenever an experimental research study is conducted, the researcher's interest is always in determining cause and effect. The causal variable is the independent variable (IV), and the effect or outcome variable is the dependent variable (DV). Experimental research allows us to identify causal relationships because we observe the result of systematically changing one or more variables under controlled conditions. This process is called manipulation.
The Experiment
Here is our definition of an experiment: The experiment is a situation in which a researcher objectively observes phenomena which are made to occur in a strictly controlled situation where one or more variables are varied and the others are kept constant. This means that we observe a person's response to a set of conditions that the experimenter presents. The observations are made in an environment in which all conditions other than the ones the researcher presents are kept constant or controlled. The conditions which the researcher presents are systematically varied to see if a person's responses change with the variation in these conditions.
Independent Variable Manipulation
The independent variable is the variable that is assumed to be the cause of the effect. It is the variable that the researcher varies or manipulates in a specific way in order to learn its impact on the outcome variable.

Ways of Manipulating the Independent Variable
In Figure 9.1 (on page 266) you can see three different ways to manipulate the independent variable. Here is that figure reproduced for your convenience:
First, the independent variable can be manipulated by presenting a condition or treatment to one group of individuals and withholding the condition or treatment from another group of individuals. This is the presence or absence technique. Second, the independent variable can be manipulated by varying the amount of a condition or variable such as varying the amount of a drug which is given to children with a learning disorder. This is the amount technique. A third way of manipulating the independent variable is to vary the type of the condition or treatment administered. One type of drug may be administered to one group of learning disabled children and another type of drug may be administered to another group of learning disabled children. This is the type technique.
Control of Confounding Variables
Potential confounding variables can be controlled for by using one or more of a variety of techniques that eliminate the differential influence an extraneous variable may have on the comparison groups in a research study. Differential influence occurs when the influence of an extraneous variable is different for the various comparison groups. For example, if one group is mostly females and the other group is mostly males, then gender may have a differential effect on the outcome. As a result, you will not know whether the outcome is due to the treatment or due to the effect of gender. If the comparison groups are the same on all extraneous variables at the start of the experiment, then differential influence is unlikely to occur. In experiments, we want our groups to be the same (or equivalent) on all potentially confounding extraneous variables. The control techniques are essentially attempts to make the groups similar or equivalent. Remember this important point: You want all of your comparison groups to be similar to each other (on all characteristics or variables) at the start of an experiment. Then, after manipulating the independent variable, you will be better able to attribute the difference observed at the posttest to the independent variable, because one group got a treatment and the other group did not. You want the only systematic difference between the groups in an experiment to be the variation of the independent variable, and you want the groups to be the same on all other variables (i.e., the same on extraneous or confounding variables). Now we will discuss six techniques that are used to control for confounding variables: random assignment, matching, holding the extraneous variable constant, building the extraneous variable into the research design, counterbalancing, and analysis of covariance.
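Differential influence is easy to see in a small Python simulation. This sketch assumes a hypothetical outcome on which females score a few points higher than males and, by default, a treatment that has no effect at all; the gender gap, group sizes, and noise level are made-up values. When one group is mostly female and the other mostly male, the groups still differ at the posttest; when the gender mix is the same, the spurious difference disappears.

```python
import random
import statistics


def posttest_difference(share_female_a, share_female_b, n=4000,
                        gender_effect=6.0, treatment_effect=0.0, seed=3):
    """Mean outcome of group A (treated) minus group B (untreated).

    Hypothetical model: outcome = 50 + gender_effect (if female)
    + treatment_effect (if treated) + noise.
    """
    rng = random.Random(seed)

    def outcome(share_female, treated):
        female = rng.random() < share_female
        score = 50 + (gender_effect if female else 0.0)
        score += treatment_effect if treated else 0.0
        return score + rng.gauss(0, 10)

    group_a = [outcome(share_female_a, treated=True) for _ in range(n)]
    group_b = [outcome(share_female_b, treated=False) for _ in range(n)]
    return statistics.mean(group_a) - statistics.mean(group_b)


print(f"{posttest_difference(0.8, 0.2):.1f}")  # spurious "effect", roughly 6 * 0.6
print(f"{posttest_difference(0.5, 0.5):.1f}")  # near zero
```

The control techniques that follow are, in effect, ways of making the two share_female values (and the values of every other extraneous variable) the same across groups.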
Random Assignment
Random assignment is the most important technique that can be used to control confounding variables because it has the ability to control for both known and unknown confounding extraneous variables. Because of this characteristic, you should randomly assign whenever and wherever possible. Random assignment makes the groups similar, within the limits of chance, on all variables at the start of the experiment. If random assignment is successful, the groups will be near mirror images of each other.
You must be careful not to confuse random assignment with random selection! The two techniques differ in purpose. (Note: I strongly recommend that you re-read the section titled Random Selection and Random Assignment on pages 216-217; it is only three paragraphs long, but it will help you with this very important distinction!) The purpose of random selection is to generate a sample that represents a larger population. This topic was covered in our earlier chapter on Sampling (Chapter 7). The purpose of random assignment is to take a sample (usually a convenience sample) and use the process of randomization to divide it into two or more groups that represent each other. That is, you use random assignment to create probabilistically equivalent groups. Note that random selection (randomly selecting a sample from a population) helps ensure external validity, and random assignment (randomly dividing a set of people into multiple groups) helps ensure internal validity. Because the primary goal of experimental research is to establish firm evidence of cause and effect, random assignment is more important than random selection in experimental research. If that is counterintuitive to you, then please reread it as many times as necessary. Random assignment controls for the problem of differential influence (discussed earlier). It does this by ensuring that each participant has an equal chance of being assigned to each comparison group. In other words, random assignment eliminates the problem of differential influence by making the groups similar on all extraneous variables. The equal probability of assignment means that not only are participants equally likely to be assigned to each comparison group, but the characteristics they bring with them are also equally likely to be assigned to each comparison group. This means that the research participants and their characteristics should be distributed approximately equally in all comparison groups! Again, random assignment is the best way to create equivalent groups for use in experimental research. Here is one way to carry out random assignment that we included in the first edition of our textbook:
Another way to conduct random assignment is to assign each person in your sample a number and then use a random assignment computer program. Here is one: http://www.graphpad.com/quickcalcs/randomize1.cfm
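If you prefer to do the randomization yourself, the two procedures discussed above can be sketched in a few lines of Python using the standard library's random module. The population, the participant labels, and the sample size here are hypothetical. random.sample draws a random sample (random selection), and shuffling followed by dealing people into groups performs random assignment, so each person is equally likely to land in each group.

```python
import random


def random_selection(population, sample_size, rng):
    """Random selection: draw a sample that represents the population."""
    return rng.sample(population, sample_size)


def random_assignment(sample, n_groups, rng):
    """Random assignment: shuffle the sample, then deal it into n_groups groups."""
    shuffled = list(sample)
    rng.shuffle(shuffled)
    return [shuffled[g::n_groups] for g in range(n_groups)]


rng = random.Random(2024)  # fixed seed so the assignment can be reproduced
population = [f"student_{i:03d}" for i in range(1, 501)]  # hypothetical roster
sample = random_selection(population, 40, rng)
treatment, control = random_assignment(sample, 2, rng)
print(len(treatment), len(control))  # 20 20
```

In experimental research the sample is often a convenience sample, in which case you would skip random_selection and pass your participant list straight to random_assignment; it is the assignment step that supports internal validity.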
Matching
Matching controls for confounding extraneous variables by equating the comparison groups on one or more variables that are correlated with the dependent variable. What you have to do is decide which extraneous variables you want to match on (i.e., decide which specific variables you want to make your groups similar on). These variables are called the matching variables. Matching controls for the matching variables; that is, it eliminates any differential influence of the matching variables. You can match your groups on one or more extraneous variables. For example, let's say that you decide to equate your two groups (treatment and control group) on IQ. That is, IQ is going to be your only matching variable. What you would do is rank order all of the participants on IQ. Then select the first two (i.e., the two people with the two highest IQs) and put one in the experimental treatment group and the other in the control group. (The best way to do this is to use random assignment to make these assignments. If you do this, then you have actually merged two control techniques: matching and random assignment.) Then take the next two highest IQ participants and assign one to the experimental group and one to the control group. Continue this process until you assign one of the two lowest IQ participants to one group and the other to the other group. Once you have completed this, your two groups will be matched on IQ! A weakness of matching when it is used alone (i.e., without also using random assignment) is that you will know that the groups are equated on the matching variable(s), but you will not know whether the groups are similar on other potentially confounding variables.
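The rank, pair, and randomize procedure just described (matching combined with random assignment) can be sketched in Python as follows. The participant names and IQ scores are hypothetical, and setting aside one leftover person when the number of participants is odd is an assumed choice that the text does not address.

```python
import random


def matched_random_assignment(participants, key, rng):
    """Rank on the matching variable, pair neighbors, then flip a coin within each pair.

    Returns (treatment, control, leftover); leftover is None when the
    number of participants is even.
    """
    ranked = sorted(participants, key=key, reverse=True)  # highest score first
    treatment, control = [], []
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # random assignment within the matched pair
        treatment.append(pair[0])
        control.append(pair[1])
    leftover = ranked[-1] if len(ranked) % 2 else None
    return treatment, control, leftover


people = [("Ana", 131), ("Ben", 97), ("Cy", 118), ("Dee", 104),
          ("Eli", 129), ("Fay", 101), ("Gus", 115), ("Hal", 95)]
treatment, control, leftover = matched_random_assignment(
    people, key=lambda person: person[1], rng=random.Random(7))
for t, c in zip(treatment, control):
    print(f"{t[0]} ({t[1]}) vs {c[0]} ({c[1]})")
```

Each pair differs by only a few IQ points, so the group means on IQ end up close, while the coin flip within each pair keeps the assignment random with respect to everything else.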
Holding the Extraneous Variable Constant
This technique controls for confounding extraneous variables by ensuring that the participants in the different treatment groups have the same amount or type of a variable. For example, if you are worried that IQ is a confounding variable, you might include only people who have an IQ of 120-125 in your research study. If you are worried about gender, you would study either females only or males only, but not both. A problem with this technique is that it can seriously limit your ability to generalize your study results (because you have limited your participants to only one type).
Building the Extraneous Variable into the Research Design
This technique takes a confounding extraneous variable and makes it an additional independent variable in your research study. For example, you might decide to include both females and males in your research study and treat gender as an additional independent variable. This technique is especially useful when you want to study any effect that the potentially confounding extraneous variable might have (i.e., you will be able to study the effect of your original independent variable as well as the effect of the additional variable(s) that you built into your design).
Counterbalancing
Counterbalancing is a technique used to control for sequencing effects (the two sequencing effects are order effects and carry-over effects). Note that this technique is relevant only for designs in which the participants receive more than one treatment condition (e.g., the repeated-measures design that is discussed later in the chapter). Sequencing effects are biasing effects that can occur when each participant must participate in each experimental treatment condition. Order effects are sequencing effects that arise from the order in which the treatments are administered. For example, as people complete their participation in their first treatment condition, they will become more familiar with the setting and testing process.
When these people later participate in their second treatment condition, they may perform better simply because of the familiarity with the setting and testing that they acquired earlier. This is how the order can have an effect on the outcome. Order effects need to be controlled. Carry-over effects are sequencing effects that occur when the effect of one treatment condition carries over to a second treatment condition. That is, participants' performance in a later treatment is different because of the treatment that occurred prior to it. When this occurs, the responses in subsequent treatment conditions are a function of the present treatment condition as well as any lingering effect of the prior treatment condition. Learning from the earlier treatment might carry over to later treatments. Physical conditions caused by the earlier treatment might also carry over if the time elapsing between the treatments is not long enough for the earlier effect to dissipate. Here is the good news! Counterbalancing is a control technique that can be used to control for order effects and carry-over effects.
You counterbalance by administering each experimental treatment condition to all groups of participants, but in different orders for different groups of people. For example, if your independent variable had just two treatment conditions, you could counterbalance by dividing your sample into two groups, giving the first group treatment one followed by treatment two, and giving the second group treatment two followed by treatment one.
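Here is a minimal Python sketch of complete counterbalancing, assuming every possible order of the treatment conditions is used and participants are rotated through the orders. The function name `counterbalance` is hypothetical. In practice you would randomly assign participants to orders (for example, by shuffling the participant list first); the fixed rotation here just keeps the example easy to check.

```python
from itertools import permutations

def counterbalance(participants, treatments):
    """Complete counterbalancing: use every possible treatment order and
    spread the participants evenly across those orders."""
    orders = list(permutations(treatments))  # k treatments give k! orders
    return {person: orders[i % len(orders)]
            for i, person in enumerate(participants)}

schedule = counterbalance(["P1", "P2", "P3", "P4"], ["Treatment 1", "Treatment 2"])
for person, order in schedule.items():
    print(person, ": ", " then ".join(order), sep="")
```

With two treatments there are only two orders, matching the example above. With three treatments there are already six orders, so complete counterbalancing quickly becomes cumbersome as the number of treatment conditions grows.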
Analysis of Covariance
Analysis of covariance (ANCOVA) is a statistical control technique that is used to statistically equate groups that differ on a pretest or some other variable. For example, in multigroup designs that have a pretest, ANCOVA is used to equate the groups on the pretest. As another example, in a learning research study you might want to control for intelligence: if one of two comparison groups has more bright students (and these students are expected to learn faster), then the difference between the groups might be due to the difference in IQ rather than the treatment variable. Analysis of covariance statistically adjusts the dependent variable scores for the differences that exist on an extraneous variable (your control variable). When selecting variables to control for, note that the only relevant extraneous variables are those that also affect participants' responses to the dependent variable.
Experimental Research Designs
A research design is the outline, plan, or strategy that you are going to use to obtain an answer to your research question. Research designs can be weak, strong, or quasi (moderately strong; that is, in between the weak and the strong designs), depending on the extent to which they control for the influence of confounding variables.
Weak Experimental Research Designs
Some research designs are considered weak because they do not control for the influence of many confounding variables.
The one-group posttest-only design is a very weak research design in which one group of research participants receives an experimental treatment and is then posttested on the dependent variable. A serious problem with this design is that you do not know whether the treatment condition had any effect on the participants, because you have no idea what their responses would have been if they had not been exposed to the treatment condition. That is, you don't have a pretest or a control group to make a comparison with. Another problem with this design is that you do not know whether some confounding extraneous variable affected the participants' responses to the dependent variable.
Because of these problems, this design generally provides little evidence about the effect of the treatment condition. The next design is the one-group pretest-posttest design. Here is a depiction of it:
The one-group pretest-posttest design is a research design in which one group of participants is pretested on the dependent variable and then posttested after the treatment condition has been administered. This is a better design than the one-group posttest-only design because it at least includes a pretest, which indicates how the participants did prior to administration of the treatment condition. In this design, the effect is taken to be the difference between the pretest and posttest scores. However, the design does not control for potentially confounding extraneous variables such as history, maturation, testing, instrumentation, and regression artifacts, so it is still difficult to identify the effect of the treatment condition.
The next of the weak experimental research designs is the posttest-only design with nonequivalent groups. Here is its depiction:
The posttest-only design with nonequivalent groups includes an experimental group that receives the treatment condition and a control group that does not receive the treatment condition (or receives some standard condition); both groups are posttested on the dependent variable. While this design includes a control group (which gives you something to compare the treatment group with), the participants are not randomly assigned to the groups, so there is little assurance that the two groups are equated on potentially confounding variables prior to the administration of the treatment condition. Because the participants were not randomly assigned to the comparison groups, this design does not control for differential selection, differential attrition, and the various additive and interaction effects.
For a summary of the threats to validity for the weak experimental designs, you should study Table 9.1 on page 277.
Strong Experimental Research Designs
A research design is considered to be a "strong research design" if it controls for the influence of confounding extraneous variables. This is typically accomplished by including one or more control techniques in the research design. The most important of these control techniques is random assignment. In addition to including control techniques, strong research designs include a control group, which is the comparison group that either does not receive the experimental treatment condition or receives some standard treatment condition. I will briefly discuss these strong designs: the pretest-posttest control-group design, the posttest-only control-group design, the factorial design, the repeated-measures design, and the factorial design based on a mixed model. (For a summary of all of these, look at and study Table 9.2 on page 281.) The first strong experimental design is the pretest-posttest control-group design. Here is a picture of it in its basic form:
The pretest-posttest control-group design is a strong research design in which research participants are randomly assigned to an experimental group and a control group. Both groups of participants are pretested on the dependent variable and then posttested after the experimental treatment condition has been administered to the experimental group. This is an excellent research design because it includes a control or comparison group and has random assignment. This design controls for all of the standard threats to internal validity. Differential attrition may or may not be a problem depending on what happens during the conduct of the experiment. Note that while this design is often presented as a two-group design, it can be expanded to include a control group and as many experimental groups as are needed to test your research question.
The next strong experimental research design is the posttest-only control group design. Here is a picture of it:
The posttest-only control-group design is a research design in which the research participants are randomly assigned to an experimental group and a control group and then posttested on the dependent variable after the experimental group has received the experimental treatment condition. This is an excellent research design because it includes a control or comparison group and has random assignment. Just like the previous design, it controls for all of the standard threats to internal validity. Differential attrition may or may not be a problem depending on what happens during the conduct of the experiment. This design does not include a pretest of the dependent variable, but this does not detract from its internal validity, because the control group and random assignment mean that the experimental and control groups are equated at the outset of the experiment.
The next strong experimental research design is the factorial design. For a depiction of this design, please go to page 281 and look at it in Table 9.2. The layout for a factorial design with two independent variables (type of instruction and level of anxiety) is shown in Figure 9.14 (p. 287) and here for your convenience.
A factorial design is a design in which two or more independent variables are simultaneously investigated to determine the independent and interactive influence that they have on the dependent variable. It also has random assignment to the groups. Each combination of independent variables is called a "cell." If both of the independent variables can be manipulated, research participants are randomly assigned to as many groups as there are cells in the factorial design. The research participants are administered the combination of independent variables that corresponds to the cell to which they have been assigned, and then they respond to the dependent variable. The data collected from this research give information on the effect of each independent variable separately and on the interaction between the independent variables. The effect of each independent variable on the dependent variable is called a main effect. There are as many main effects in a factorial design as there are independent variables. If a research design included the independent variables of gender and type of instruction, then there would potentially be two main effects, one for gender and one for type of instruction. An interaction effect between two or more independent variables occurs when the effect that one independent variable has on the dependent variable depends on the level of the other independent variable. For example, if gender is one independent variable and method of teaching mathematics is another independent variable, an interaction would exist if the lecture method was more effective for teaching mathematics to males and individualized instruction was more effective for teaching mathematics to females.
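To make main effects and interactions concrete, here is a small Python sketch using made-up cell means for the gender-by-teaching-method example. It assumes equal numbers of participants in every cell (so each marginal mean is a simple average of cell means). It only describes the pattern in the means; deciding whether the effects are statistically significant requires a factorial ANOVA, which this sketch does not perform.

```python
# Hypothetical mean math scores for each cell of a 2 x 2 factorial design.
cell_means = {
    ("male", "lecture"): 80, ("male", "individualized"): 70,
    ("female", "lecture"): 70, ("female", "individualized"): 80,
}

def marginal_mean(level, position):
    """Average the cell means for one level of one independent variable.
    position 0 is gender, position 1 is teaching method."""
    values = [m for cell, m in cell_means.items() if cell[position] == level]
    return sum(values) / len(values)

# Main effect: difference between the marginal means of an independent variable.
gender_effect = marginal_mean("male", 0) - marginal_mean("female", 0)
method_effect = marginal_mean("lecture", 1) - marginal_mean("individualized", 1)

# Simple effect of method (lecture minus individualized) within each gender.
simple_effects = {g: cell_means[(g, "lecture")] - cell_means[(g, "individualized")]
                  for g in ("male", "female")}

# Interaction: the effect of method depends on gender if the simple effects differ.
interaction = simple_effects["male"] - simple_effects["female"]

print("Main effect of gender:", gender_effect)
print("Main effect of method:", method_effect)
print("Simple effects of method:", simple_effects)
print("Interaction contrast:", interaction)
```

With these numbers both main effects are zero, yet the interaction is large: the lecture method helps males and individualized instruction helps females, so averaging over gender hides the effect of teaching method entirely.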
The next strong experimental research design is the repeated-measures design. Here is a picture of it in its basic form with counterbalancing:
A repeated-measures design is a design in which all research participants receive all experimental treatment conditions. For example, if you were investigating the effect of type of instruction on learning mathematics and you used two types of instruction (lecture method and individualized instruction), the participants would experience both types of instruction, first one and then the other. This design has the advantage of requiring fewer participants than other designs because the same participants take part in all experimental conditions. It also has the advantage that the participants in the various experimental conditions are equated, because the same participants are in all of the treatment conditions. If you use counterbalancing with this design, then all of the standard threats to internal validity are controlled for. Differential attrition may or may not be a problem depending on what happens during the conduct of the experiment.
The last strong experimental research design discussed in this chapter is the factorial design based on a mixed model. Here is a picture of this design when it has two independent variables:
The factorial design based on a mixed model is a factorial design in which different participants are randomly assigned to the different levels of one independent variable but all participants take all levels of another independent variable.
In the depiction above, participants are randomly assigned to the levels of variable B, and all participants receive all levels of variable A. All of the standard threats to internal validity are controlled for with this design if counterbalancing is used for the repeated-measures independent variable. Differential attrition may or may not be a problem depending on what happens during the conduct of the experiment.
As you study the designs in this chapter, two tables will be especially helpful. Table 9.1 on page 277 shows the depictions of all of the weak experimental research designs and the threats to internal validity for each of these designs. Table 9.2 on page 281 shows the depictions of all of the strong experimental research designs and the threats to internal validity for each of these designs. Here are copies of these two tables for your convenience.
Chapter 10 Quasi-Experimental and Single-Case Designs
(Reminder: Don't forget to use the concept maps and study questions as you study this and the other chapters.) The experimental research designs discussed in this chapter are used when it is impossible to randomly assign participants to comparison groups (quasi-experimental designs) and when a researcher is faced with a situation where only one or two participants can participate in the research study (single-case designs). Like the designs in the last chapter, quasi-experimental and single-case designs do involve manipulation of the independent variable (otherwise they would not be experimental research designs).
Quasi-Experimental Research Designs
These are designs that are used when it is not possible to control for all potentially confounding variables; in most cases this is because the participants cannot be randomly assigned to the groups. Causal explanations can be made when using quasi-experimental designs, but only when you collect data demonstrating that plausible rival explanations are unlikely, and even then the evidence will not be as strong as with one of the strong designs discussed in the last chapter. You can view quasi-experiments as falling in the center of a continuum with weak experimental designs on the far left side and strong experimental designs on the far right side. (In other words, quasi designs are not the worst and they are not the best. They are in-between, or moderately strong, designs.)
Weak Designs ------------------ Quasi Designs ------------------ Strong Designs
Three quasi-experimental research designs are presented in the text: the nonequivalent comparison-group design, the interrupted time-series design, and the regression discontinuity design.
Nonequivalent Comparison-Group Design
This is a design that contains a treatment group and a nonequivalent untreated comparison group, both of which are administered pretest and posttest measures. The groups are nonequivalent because you lack random assignment (although some control techniques, such as matching and statistical control, can help make the groups similar). Because of the lack of random assignment, there is no assurance that the groups are highly similar at the outset of the study. Here is a depiction of the nonequivalent comparison-group design:
Because there is no random assignment to groups, confounding variables (rather than the independent variable) may explain any difference observed between the experimental and control groups. The most common threat to the internal validity of this type of design is differential selection. The problem is that the groups may be different on many variables that are also related to the dependent variable (e.g., age, gender, IQ, reading ability, attitude, etc.). Here is a list of all of the primary threats to this design.
It is a good idea to collect data that can be used to demonstrate that key confounding variables are not the cause of the obtained results. Hence, you will need to think about potential rival explanations during the planning phase of your research study so that you can collect the data necessary to control for these factors. You can eliminate the influence of many confounding variables by using the various control techniques, especially statistical control (where you measure the confounding variables at the pretest and control for them using statistical procedures after the study has been completed) and matching (where you select people to be in the groups so that the members of the different groups are similar on the matching variables). Only when you can rule out the effects of confounding variables can you confidently attribute the observed group difference at the posttest to the independent variable.
Interrupted Time-Series Design
This is a design in which a treatment condition is assessed by comparing the pattern of pretest responses with the pattern of posttest responses obtained from a single group of participants. In other words, the participants are pretested a number of times and then posttested a number of times after or during exposure to the treatment condition. Here is a depiction of the interrupted time-series design:
The pretesting phase is called the baseline which refers to the observation of a behavior prior to the presentation of any treatment designed to alter the behavior of interest. A treatment effect is demonstrated only if the pattern of posttreatment responses differs from the pattern of pretreatment responses. That is, the treatment effect is demonstrated by a discontinuity in the pattern of pretreatment and posttreatment responses. For example, an effect is demonstrated when there is a change in the level and/or slope of the posttreatment responses as compared to the pretreatment responses. Here is an example where both the level and slope changed during the intervention:
Many confounding variables are ruled out in the interrupted time-series design because they are present in both the pretreatment and posttreatment responses (i.e., the pretreatment and posttreatment responses will not differ on most confounding variables). However, the main potentially confounding variable that cannot be ruled out is a history effect. The history threat is a plausible rival explanation if some event other than the treatment co-occurs with the onset of the treatment.
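The text describes a treatment effect as a change in level and/or slope. One simple way to put numbers on that idea (an assumption here, not a procedure given in the text) is to fit a separate straight line to the baseline observations and to the posttreatment observations, then compare the two lines at the point where the treatment begins. The data below are hypothetical and noise-free so the results are easy to check. Note that no calculation of this kind can rule out a history effect.

```python
def fit_line(times, values):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return mean_v - slope * mean_t, slope

# Hypothetical scores: six baseline (pretest) and six posttreatment observations.
pre_times, pre_scores = [1, 2, 3, 4, 5, 6], [10, 10.5, 11, 11.5, 12, 12.5]
post_times, post_scores = [7, 8, 9, 10, 11, 12], [15, 17, 19, 21, 23, 25]

pre_intercept, pre_slope = fit_line(pre_times, pre_scores)
post_intercept, post_slope = fit_line(post_times, post_scores)

onset = 7  # time of the first posttreatment observation
# Level change: gap between the two fitted lines when the treatment starts.
level_change = (post_intercept + post_slope * onset) - (pre_intercept + pre_slope * onset)
# Slope change: how much steeper (or flatter) the trend is after the treatment.
slope_change = post_slope - pre_slope

print("Change in level at onset:", level_change)
print("Change in slope:", slope_change)
```

Here the baseline trend projects a score of 13 at the onset, while the posttreatment line starts at 15 and rises four times as fast, so both the level and the slope change.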
Bonus material (not required)
Although not discussed in the text, there is an extension of the interrupted time-series design called the multiple time-series design; it is the basic interrupted time-series design with a comparable control group added to it. I mention this design because I want you to remember that YOU can put together different designs simply by using different combinations of pretests, posttests, and types of groups, varying the number of pretests and posttests, using a control group or not, including more than one outcome variable, and so forth. Both the experimental and control groups are repeatedly pretested in the multiple time-series design. Then the experimental group receives the treatment and the control group receives some standard treatment or no treatment, and, finally, both groups are repeatedly posttested. Here is a picture of the multiple time-series design:
Including a control group provides control for the history effect, but only if the different groups are truly comparable and any history effect influences both groups to the same degree (i.e., as long as you don't have a selection-history effect). The various additive and interactive effects remain as potential threats to this design.
Regression Discontinuity Design
This is a design that is used to assess the effect of a treatment condition by looking for a discontinuity in regression lines between individuals who score lower and higher than some predetermined cutoff score on an assignment variable.
Here is the depiction of the design:
For example, you might use a standardized test as your assignment variable, set the cutoff at 50, administer the treatment to those scoring 50 or higher, and use those with scores lower than 50 as your control group. This is actually quite a strong design, and for a number of years methodologists have been trying to get researchers to use it more frequently. One uses statistical techniques to control for differences on the assignment variable and then checks to see whether the groups differ significantly.
Here is an example where a difference or discontinuity is easily seen:
If you cannot assign the participants to the treatment condition based on their assignment variable scores, you will not be able to use this design. On the other hand, if you can do this, then this is an excellent design.
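Here is a minimal sketch of the regression discontinuity logic, assuming a cutoff of 50 on the assignment variable (as in the example above) and straight-line relationships on each side of the cutoff. It fits one regression line to the control group (scores below 50) and one to the treatment group (scores of 50 or higher) and reports the vertical gap between the two lines at the cutoff. The data and the function name `discontinuity_at_cutoff` are hypothetical; a real analysis would also check whether straight lines are the right shape and test whether the gap is statistically significant.

```python
def fit_line(x, y):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def discontinuity_at_cutoff(scores, outcomes, cutoff):
    """Fit separate lines below and at/above the cutoff; return the gap
    (treatment line minus control line) evaluated at the cutoff."""
    control = [(s, o) for s, o in zip(scores, outcomes) if s < cutoff]
    treated = [(s, o) for s, o in zip(scores, outcomes) if s >= cutoff]
    c_intercept, c_slope = fit_line([s for s, _ in control], [o for _, o in control])
    t_intercept, t_slope = fit_line([s for s, _ in treated], [o for _, o in treated])
    return (t_intercept + t_slope * cutoff) - (c_intercept + c_slope * cutoff)

# Hypothetical assignment-variable scores and posttest outcomes.
scores = [30, 35, 40, 45, 50, 55, 60, 65]
outcomes = [25, 27.5, 30, 32.5, 43, 45.5, 48, 50.5]

print("Estimated treatment effect at the cutoff:",
      discontinuity_at_cutoff(scores, outcomes, cutoff=50))
```

In these made-up data both lines have the same slope, but the treatment line sits 8 points higher at the cutoff, which is the kind of discontinuity the design looks for.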
Single-Case Experimental Designs
These are designs in which the researcher attempts to demonstrate an experimental treatment effect using single participants, one at a time. We discuss several single-case designs: the A-B-A design, the A-B-A-B design, the multiple-baseline design, and the changing-criterion design.
A-B-A and A-B-A-B Designs
The A-B-A design is a design in which the participant is repeatedly pretested (the first A phase, or baseline condition), then the experimental treatment condition is administered and the participant is repeatedly posttested (the B phase, or treatment phase). Following the posttesting stage, the pretreatment conditions are reinstated and the participant is again repeatedly tested on the dependent variable (the second A phase, or return-to-baseline condition).
Here is a depiction of the A-B-A design:
The effect of the experimental treatment is demonstrated if the patterns of the pretreatment and posttreatment responses (the first A phase and the B phase) differ, and the pattern of responses reverts to the original pretreatment level when the pretreatment conditions are reinstated (the second A, or return-to-baseline, phase). Including the second A phase controls for the potential rival hypothesis of history that is a problem in a basic time-series design (i.e., in an A-B design). Basically, you are looking for the "fingerprint" of a stable baseline (during the first A phase), then a clear jump or change in level or slope (during the B phase), and then a clear reversal or return to the stable baseline (during the second A phase). For example, if you hope for low values on your dependent measure (e.g., talking-out behavior), you would hope to see a high-low-high pattern. Conversely, if you hope for high values on your dependent measure (e.g., attending to what the teacher says), you would hope to see a low-high-low pattern. One limitation of the A-B-A design is that it ends with the baseline condition, or the withdrawal of the treatment condition, so the participant does not receive the benefit of the treatment condition at the end of the experiment. This limitation can be overcome by including a fourth phase, a second administration of the treatment condition, so that the design becomes an A-B-A-B design. A limitation of both the A-B-A and the A-B-A-B designs is that they depend on the pattern of responses reverting to baseline conditions when the experimental treatment condition is withdrawn. This may not occur if the experimental treatment is so powerful that its effect continues even when the treatment is withdrawn. If a reversal to baseline conditions does not occur, another design (such as the multiple-baseline design) must be used to demonstrate the effectiveness of the treatment condition.
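Judging a single-case graph is mainly a matter of visual inspection, but a crude numeric summary can help you check for the fingerprint described above. The sketch below assumes hypothetical talking-out counts (a behavior we want to be low) across the four phases of an A-B-A-B design and compares the phase means. It ignores trend and variability within each phase, which you should still inspect on the graph.

```python
from statistics import mean

# Hypothetical talking-out counts per session for one student (lower is better).
phases = {
    "A1": [9, 10, 9, 11, 10],  # first baseline phase
    "B1": [4, 3, 3, 2, 3],     # treatment phase
    "A2": [8, 9, 10, 9, 9],    # return to baseline (treatment withdrawn)
    "B2": [3, 2, 3, 2, 2],     # treatment reinstated
}
phase_means = {name: mean(sessions) for name, sessions in phases.items()}

# For a behavior we want to reduce, the fingerprint is high-low-high-low:
# the behavior drops when the treatment starts, rises again when it is
# withdrawn, and drops again when it is reinstated.
shows_fingerprint = (
    phase_means["A1"] > phase_means["B1"]
    and phase_means["A2"] > phase_means["B1"]
    and phase_means["A2"] > phase_means["B2"]
)

for name, m in phase_means.items():
    print(f"{name}: mean = {m:.1f}")
print("High-low-high-low pattern:", shows_fingerprint)
```

For a behavior you want to increase, you would reverse each comparison to look for the low-high-low-high pattern instead.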
Multiple-Baseline Design This is a design that investigates two or more people, behaviors, or settings to identify the effect of an experimental treatment. The key is that the treatment condition is successively administered to the different people, behaviors, or settings. Here is a depiction of the design:
The multiple-baseline design requires that baseline data be collected on the several people, behaviors, or settings, and then the experimental treatment is successively administered to those people, behaviors, or settings. The experimental treatment effect is demonstrated if a change in response occurs when the treatment is administered to each person, behavior, or setting (i.e., when the fingerprint you are looking for is observed). Here is an example in which a treatment fingerprint is easily seen:
Rival hypotheses are unlikely to account for the changes in the behavior if the behavior changes only after the treatment is administered to each successive person, behavior, or setting. This design avoids the problem of failure to revert to baseline that can exist with the A-B-A and A-B-A-B designs.
Changing-Criterion Design
This is a single-case design that is used when a behavior needs to be shaped over time or when it is necessary to gradually change a behavior through successive treatment periods to reach a desired criterion. This design involves collecting baseline data on the target behavior and then administering the experimental treatment condition across a series of intervention phases, where each intervention phase uses a different criterion of successful performance until the desired criterion is reached. The criterion used in each successive intervention phase should be large enough to detect a change in behavior but small enough that it can be achieved. Here is an example of this design.
Methodological Considerations in Using Single-Case Designs The following table presents some major methodological issues you must consider when using single-case designs.
Chapter 11 Nonexperimental Quantitative Research
(Reminder: Don't forget to use the concept maps and study questions as you study this and the other chapters.) Nonexperimental research is needed because there are many independent variables that we cannot manipulate for one reason or another (e.g., for ethical reasons, for practical reasons, and for literal reasons, such as when it is simply impossible to manipulate a variable). Here's an example of an experiment in which you could not manipulate the independent variable (smoking) for ethical and practical reasons: randomly assign 500 newborns to experimental and control groups (250 in each group), where the experimental-group newborns must smoke cigarettes and the controls do not smoke. Nonexperimental research is research in which the researcher does not manipulate the independent variable; the researcher studies what naturally occurs or has already occurred and studies how variables are related. Despite its limitations for studying cause and effect (compared to strong experimental research), nonexperimental research is very important in education.
Steps in Nonexperimental Research
The steps are pretty much the same as they were in experimental research; however, there are some new considerations to think about if you want to be able to make any cause-and-effect claims at all (i.e., that an IV ---> DV).
1. Determine the research problem and hypotheses to be tested. Note: it is important to have or develop a theory to test in nonexperimental research if you are interested in making any claims of cause and effect. This can include identifying mediating and moderating variables (see Table 2.2 on page 36 for definitions of these two terms).
2. Select the variables to be used in the study. Note: in nonexperimental research you will need to include some control variables (i.e., variables in addition to your IV and DV that measure key extraneous variables). This will help you rule out some alternative explanations.
3. Collect the data. Note: longitudinal data (i.e., data collected at more than one time point) are helpful in nonexperimental research for establishing the time ordering of your IV and DV if you are interested in cause and effect.
4. Analyze the data. Note: statistical control techniques will be needed because of the problem of alternative explanations in nonexperimental research.
5. Interpret the results. Note: conclusions of cause and effect will be much weaker in nonexperimental research than in strong experimental and quasi-experimental research because the researcher cannot manipulate the independent variable in nonexperimental research.
When examining or conducting nonexperimental research, it is important to watch out for the post hoc fallacy (i.e., arguing, after the fact, that A must have caused B simply because you have observed in the past that A preceded B). By the way, post hoc or inductive reasoning is fine (i.e., looking at your data and developing ideas to examine in future research), but you must always watch out for the fallacy just mentioned, and you must remember to empirically test any hypotheses that you develop after the fact so that you can check whether your hypothesis holds true with new data. In other words, after generating a hypothesis, you must test it. (This last point goes back to Figure 1.1 on page 18, which shows the research wheel.)
Independent Variables in Nonexperimental Research
This includes variables that cannot be manipulated, should not be manipulated, or were not manipulated. Here are some examples of categorical independent variables (IVs) that cannot be manipulated: gender, parenting style, learning style, ethnicity, retention in grade, personality type, drug use. Here are some examples of quantitative IVs that cannot be manipulated: intelligence, age, GPA, and any personality trait that is operationalized as a quantitative variable (e.g., level of self-esteem). Researchers are generally advised not to turn quantitative independent variables into categorical variables.
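One reason for this recommendation is that splitting a quantitative variable into categories throws away information. The Python sketch below uses deliberately artificial data (the DV simply equals the IV): the correlation between the original quantitative IV and the DV is 1.0, but after a median split of the IV into "low" and "high" the correlation drops to about 0.87. The helper `pearson_r` is written out by hand (a hypothetical name) so the example does not depend on any particular statistics package.

```python
from math import sqrt
from statistics import median

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: a quantitative IV (e.g., level of motivation) and a DV.
iv = list(range(1, 11))
dv = list(range(1, 11))

# Median split: recode the IV as 0 (low) or 1 (high).
cut = median(iv)  # 5.5 for these data
iv_split = [1 if score > cut else 0 for score in iv]

print("r with the quantitative IV:", round(pearson_r(iv, dv), 3))
print("r after a median split:    ", round(pearson_r(iv_split, dv), 3))
```

After the split, everyone with a score of 6 is treated the same as everyone with a score of 10, which is why the relationship looks weaker than it really is.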
Simple Cases of Causal-Comparative and Correlational Research Although the terms causal-comparative research and correlational research are dated, it is still useful to think about the simple cases of these (i.e., studies with only two variables). There are four major points in this section: 1. In the simple case of causal-comparative research you have one categorical IV (e.g., gender) and one quantitative DV (e.g., performance on a math test). The researcher checks whether the observed difference between the group means is statistically significant (i.e., unlikely to be due to chance alone) using a t-test or an ANOVA (statistical tests, discussed in Chapter 16, that tell you whether a difference between means is statistically significant). 2. In the simple case of correlational research you have one quantitative IV (e.g., level of motivation) and one quantitative DV (e.g., performance on a math test). The researcher checks whether the observed correlation is statistically significant (i.e., unlikely to be due to chance alone) using the t-test for correlation coefficients (which tells you whether the relationship is statistically significant; it is also discussed in Chapter 16). Remember that the commonly used correlation coefficient (i.e., the Pearson correlation) detects only linear relationships.
3. It is essential that you remember this point: Both of the simple cases of nonexperimental research are seriously flawed if you are interested in concluding that an observed relationship is a causal relationship. That's because "observing a relationship between two variables is not sufficient grounds for concluding that the relationship is a causal relationship." (Remember this important point!) 4. You can improve on the simple cases by controlling for extraneous variables and designing longitudinal studies (discussed below). Once you move on to these improved nonexperimental designs, you should drop the correlational and causal-comparative terminology and, instead, describe the design in terms of the research objective and the time dimension (which are discussed below and summarized in Table 11.3). The Three Necessary Conditions for Cause-and-Effect Relationships It is essential that you remember that researchers must establish three conditions if they are to make a defensible conclusion that changes in variable A cause changes in variable B. Here are the conditions (which have been stated in previous chapters) in a summary table:
Applying the Three Necessary Conditions for Causation in Nonexperimental Research Nonexperimental research is much weaker than strong experimental and quasi-experimental research for making justified judgments about cause and effect. It is, however, quite easy to establish condition 1 in nonexperimental research: just see whether the variables are related (e.g., Are the variables correlated? Is there a difference between the means?). It is much more difficult to establish conditions 2 and 3 (especially 3). When attempting to establish condition 2, researchers use logic and theory (e.g., we know that biological sex occurs before achievement on a math test) and design approaches that are covered later in this chapter (e.g., longitudinal research is a strong design for establishing proper time order). Condition 3 is a serious problem in nonexperimental research because it is always possible that an observed relationship is "spurious" (i.e., due to some confounding extraneous variable or "third variable"). When attempting to establish condition 3, researchers use logic and theory (e.g., make a list of extraneous variables that you want to measure in your research study), control techniques (such as statistical control and matching), and design approaches (such as using a longitudinal design rather than a cross-sectional design). The rest of the chapter explains these points. To get things started, you need to understand the idea of controlling for a variable. Here is an example. Did you know that there is a correlation between the number of fire trucks responding to a fire and the amount of fire damage? Obviously this is not a causal relationship (i.e., it is a spurious relationship): larger fires bring out more fire trucks and also cause more damage. In Figure 11.2, you can see that after we control for the size of fire, the original positive correlation between the number of fire trucks responding and the amount of fire damage becomes a zero correlation (i.e., no relationship).
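To make the fire-truck example concrete, here is a minimal Python sketch of partial correlation, one form of statistical control. It uses the standard first-order partial correlation formula and invented toy numbers (not real fire data) in which trucks and damage both rise with fire size but are otherwise unrelated; the function names `pearson` and `partial_correlation` are illustrative, not taken from any statistics package.

```python
import math
from statistics import fmean


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Invented toy data: trucks and damage each equal fire size plus "noise"
# that is unrelated to fire size and unrelated to the other noise.
fire_size = [1, 1, 2, 2, 3, 3, 4, 4]
trucks = [s + e for s, e in zip(fire_size, [1, -1, 1, -1, 1, -1, 1, -1])]
damage = [s + e for s, e in zip(fire_size, [1, -1, -1, 1, 1, -1, -1, 1])]

print("r(trucks, damage)             =", round(pearson(trucks, damage), 3))
print("r(trucks, damage | fire size) =",
      round(partial_correlation(trucks, damage, fire_size), 3))
```

With these toy values the zero-order correlation is positive (about 0.56), while the partial correlation controlling for fire size is zero up to floating-point rounding, which mirrors the pattern described for Figure 11.2.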
Here is one more example of controlling for a variable: There is a relationship between gender and income in the United States. In particular, men earn more money than women. Perhaps this relationship would disappear if we controlled for the amount of education people had. What do you think? To test this alternative explanation (i.e., that the relationship is due not to gender but to education), you could examine the average income levels of males and females at each level of education (i.e., to see whether males and females who have equal amounts of education differ in income). If gender and income are still related (i.e., if men earn more money than women at each level of education), then you would conclude: After controlling for education, there is still a relationship between gender and income. And, by the way, that is exactly what happens if you examine the real data (the relationship becomes a little smaller, but there is still a relationship). Can you think of any additional variables you would like to control for? That is, are there any other variables that you think will eliminate the relationship between gender and income?
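The logic of this check (compare men and women within each level of education) can be sketched in a few lines of Python. The records are made-up toy values in thousands of dollars, chosen only to show the computation; they are not the real income data mentioned above, and `gap_by_level` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import fmean


def gap_by_level(records):
    """Return {education level: mean male income - mean female income}.

    Assumes every education level contains at least one man and one woman.
    """
    incomes = defaultdict(list)
    for gender, level, income in records:
        incomes[(level, gender)].append(income)
    levels = sorted({level for level, _ in incomes})
    return {lvl: fmean(incomes[(lvl, "M")]) - fmean(incomes[(lvl, "F")])
            for lvl in levels}


# Made-up toy records: (gender, highest education, income in $1,000s).
records = [
    ("M", "HS", 40), ("F", "HS", 34), ("M", "HS", 44), ("F", "HS", 36),
    ("M", "BA", 60), ("F", "BA", 52), ("M", "BA", 64), ("F", "BA", 56),
]
for level, gap in gap_by_level(records).items():
    print(f"{level}: men earn {gap:.1f} more on average")
```

Because a gap remains at every education level in these toy numbers, the conclusion would be that education does not fully account for the gender and income relationship in this (invented) sample.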
Techniques of Control in Nonexperimental Research We discuss three ways to control for extraneous variables in nonexperimental research. 1. Matching. A "matching variable" is an extraneous variable you wish to control for (e.g., gender, income, intelligence) by using the technique called matching. If you have two groups (i.e., your IV is categorical), you could, for each person in group one, find someone similar on the matching variable and place that person in group two. In other words, you would in effect construct a control group. If your IV is quantitative, such as level of motivation, and you want to see whether motivation is related to test performance, you might decide to use GPA as your matching variable. To do this, you would have to find individuals with low, medium, and high GPAs at each level of motivation, filling each cell of the following table:

              Low          Medium       High
              Motivation   Motivation   Motivation
Low GPA       15 people    15 people    15 people
Medium GPA    15 people    15 people    15 people
High GPA      15 people    15 people    15 people
Technically speaking, matching makes your independent variable and the matching variable uncorrelated and unconfounded. As a result, if you still observe a relationship between your IV and your DV, you can conclude that it is not due to the matching variable, because you have controlled for that variable.
2. Holding the extraneous variable constant. If you use this strategy, you include in your study only participants who are at the same constant level on the variable that you want to control for. For example, if you want to control for gender using this strategy, you would include only females in your research study (or only males). If there is still a relationship between your IV and DV (e.g., motivation and test grades), you will be able to conclude that the relationship is not due to gender because you have made gender a constant (by including only one gender in your study). 3. Statistical control. Statistical control is based on the following logic: examine the relationship between the IV and the DV at each level of the control/extraneous variable (in practice, the computer does this for you). One type of statistical control is called partial correlation. This technique shows the correlation between two quantitative variables after statistically controlling for one or more quantitative control/extraneous variables. Again, a computer program (such as SPSS) does this for you. A second type of statistical control is called ANCOVA (analysis of covariance). This technique shows the relationship between a categorical IV and a quantitative DV after statistically controlling for one or more quantitative control/extraneous variables. Again, you just have to decide what you want to control for and collect the data; the computer will actually do the ANCOVA for you.
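As a rough illustration of what ANCOVA does behind the scenes, the Python sketch below computes covariate-adjusted group means using the pooled within-group regression slope. It assumes a single quantitative covariate and equal slopes across groups, and it leaves out the F test that a program such as SPSS would also report; the group data and the helper name `ancova_adjusted_means` are invented for illustration.

```python
from statistics import fmean


def ancova_adjusted_means(groups):
    """Covariate-adjusted group means using the pooled within-group slope.

    groups: dict mapping group name -> (covariate_values, outcome_values).
    """
    # Pooled within-group regression slope of the DV on the covariate.
    sxy = sxx = 0.0
    for xs, ys in groups.values():
        mx, my = fmean(xs), fmean(ys)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx

    grand_mean_x = fmean([x for xs, _ in groups.values() for x in xs])
    # Shift each group's DV mean to where it would be at the grand-mean covariate.
    return {name: fmean(ys) - slope * (fmean(xs) - grand_mean_x)
            for name, (xs, ys) in groups.items()}


toy = {
    "A": ([1, 2, 3], [2, 4, 6]),  # lower covariate scores
    "B": ([3, 4, 5], [4, 6, 8]),  # higher covariate scores
}
print(ancova_adjusted_means(toy))  # {'A': 6.0, 'B': 4.0}
```

In the toy data, group B has the higher raw mean (6 versus 4), but it also has higher covariate scores; once both groups are placed at the grand-mean covariate value, group A comes out ahead (6 versus 4). A reversal like this is the kind of change statistical control can reveal.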
Now I am going to talk about the two key dimensions that should be used in constructing a nonexperimental research design: the time dimension and the research objective dimension. (Note that these dimensions eliminate the need for the terms correlational and causal-comparative in nonexperimental research.)
The Time Dimension in Research Nonexperimental research can be classified according to the time dimension. In particular, Figure 11.3 shows and summarizes the three key ways that nonexperimental research data can vary along the time dimension: in cross-sectional research the data are collected at a single point in time, in longitudinal or prospective research the data are collected at two or more time points moving forward, and in retrospective research the researcher looks backward in time to obtain the desired data.
Classifying Nonexperimental Research by Research Objective The idea here is that nonexperimental research can be conducted for many reasons. The three most common objectives are description, prediction, and explanation. Descriptive nonexperimental research is used to provide a picture of the status or characteristics of a situation or phenomenon (e.g., What kind of personality do teachers tend to have based on the Myers-Briggs test?). Predictive nonexperimental research is used to predict the future status of one or more dependent variables (e.g., What variables predict who will drop out of high school?). Explanatory nonexperimental research is used to explain how and why a phenomenon operates as it does; the interest is in cause-and-effect relationships. One type of explanatory research that I want to mention in this lecture is called theoretical modeling or causal modeling or structural equation modeling (those are all synonyms). Causal modeling (i.e., constructing theoretical models and then checking their fit with the data) is commonly used in nonexperimental research. Causal modeling is used to study direct effects (the effect of one variable on another). Here is a way to depict a direct effect:

X -----> Y

It is also used to study indirect effects (the effect of one variable on another through an intervening or mediator variable). Here is a way to depict an indirect effect of X on Y:

X -----> I -----> Y

A strength of causal modeling in nonexperimental research is that researchers must develop detailed theories to test.
A weakness of causal modeling in nonexperimental research is that the causal models are tested with nonexperimental data, which means there is no manipulation, and you will recall that experimental research is stronger for studying cause and effect than nonexperimental research. Also, causal models with longitudinal data are generally better than causal models with cross-sectional data.
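For a single intervening variable estimated with ordinary least-squares regression, the direct and indirect effects can be sketched as follows: the indirect effect is the product of the X -----> I path and the I -----> Y path (controlling for X), and direct plus indirect equals the total effect of X on Y. This Python sketch uses invented scores and hand-written regression helpers (`slope` and `two_predictor_slopes` are illustrative names); full structural equation modeling software estimates many paths at once and also tests model fit, which this sketch does not attempt.

```python
from statistics import fmean


def slope(x, y):
    """OLS slope from regressing y on x."""
    mx, my = fmean(x), fmean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


def two_predictor_slopes(x1, x2, y):
    """OLS slopes (b1, b2) from regressing y on x1 and x2 together."""
    m1, m2, my = fmean(x1), fmean(x2), fmean(y)
    d1 = [a - m1 for a in x1]
    d2 = [b - m2 for b in x2]
    dy = [c - my for c in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Invented toy scores for X, the intervening variable I, and Y.
x = [1, 2, 3, 4, 5]
intervening = [2, 1, 4, 3, 5]
y = [2, 2, 5, 4, 7]

a = slope(x, intervening)                          # path X -> I
direct, b = two_predictor_slopes(x, intervening, y)  # X -> Y (direct), I -> Y
total = slope(x, y)                                # total effect of X on Y
print(f"direct = {direct:.3f}, indirect = {a * b:.3f}, total = {total:.3f}")
```

With these toy scores the direct effect is about 0.444 and the indirect effect about 0.756, and together they add up to the total effect of 1.2.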
Classifying Nonexperimental Research by Time and Research Objective So we have talked about two key dimensions for classifying nonexperimental research: the time dimension and the research objective dimension. Notice that these two dimensions can be crossed to form a 3-by-3 table, which yields nine types of nonexperimental research. Here is the resulting classification table:
If the above table seems complicated, note that all you really have to do is answer these two questions: 1. How are your data collected in relation to time (i.e., are the data retrospective, cross-sectional, or longitudinal)? 2. What is the primary research objective (i.e., description, prediction, or explanation)? Your answers to these two questions will lead you to one of the nine cells shown in the above table.
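The two questions amount to a lookup on two three-level factors. Here is a minimal Python sketch, assuming the cell names are simply formed by combining the two answers; the labels and the `classify` helper are hypothetical, and the wording in Table 11.3 may differ.

```python
from itertools import product

TIME = ("retrospective", "cross-sectional", "longitudinal")
OBJECTIVE = ("descriptive", "predictive", "explanatory")


def classify(time, objective):
    """Return an illustrative name for the cell matching the two answers."""
    if time not in TIME or objective not in OBJECTIVE:
        raise ValueError(f"unknown combination: {time!r}, {objective!r}")
    return f"{time} {objective} research"


cells = [classify(t, o) for t, o in product(TIME, OBJECTIVE)]
print(len(cells))                               # 9
print(classify("longitudinal", "explanatory"))  # longitudinal explanatory research
```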
Chapter 12 Qualitative Research (Reminder: Don't forget to utilize the concept maps and study questions as you study this and the other chapters.) Qualitative research relies primarily on the collection of qualitative data (i.e., nonnumeric data such as words and pictures). I suggest that, to put things in perspective, you start by reviewing the table showing the common differences between qualitative, quantitative, and mixed research. That is, take a quick look at Table 2.1 on page 31 (or go to lecture two because it is also included in the lecture). Next, to further understand what qualitative research is all about, please carefully examine Patton's excellent summary of the twelve major characteristics of qualitative research, which is shown in Table 12.1 (page 362) and below:
Now you should understand what qualitative research is. In the rest of the chapter, we discuss the four major types of qualitative research: phenomenology, ethnography, grounded theory, and case study. To get things started, note the key characteristics (i.e., purpose, origin, data-collection methods, data analysis, and report focus) of these four approaches as shown in Table 12.2 on page 363 and below:
Phenomenology The first major approach to qualitative research is phenomenology (i.e., the descriptive study of how individuals experience a phenomenon). Here is the foundational question in phenomenology: What is the meaning, structure, and essence of the lived experience of this phenomenon by an individual or by many individuals? The researcher tries to gain access to individuals' life-worlds, which is their world of experience; it is where consciousness exists. Conducting in-depth interviews is a common method for gaining access to individuals' life-worlds. Next, the researcher searches for the invariant structures of individuals' experiences (also called the essences of their experience). Phenomenological researchers often search for commonalities across individuals (rather than only focusing on what is unique to a single individual). For example, what are the essences of people's experience of the death of a loved one? Here is another example: What are the essences of people's experiences of an uncaring nurse? After analyzing your phenomenological research data, you should write a report that provides rich description and a "vicarious experience" of being there for the reader of the report. Shown next are two good examples. See if you get the feeling the patients had when they described caring and noncaring nurses. Here is a description of a caring nurse (from Exhibit 12.2) based on a phenomenological research study: In a caring interaction, the nurse's existential presence is perceived by the client as more than just a physical presence. There is the aspect of the nurse giving of oneself to the client. This giving of oneself may be in response to the client's request, but it is more often a voluntary effort and is unsolicited by the client. The nurse's willingness to give of oneself is primarily perceived by the client as an attitude and behavior of sitting down and really listening and responding to the unique concerns of the individual as a person of value. The relaxation, comfort, and security that the client expresses both physically and mentally are an immediate and direct result of the client's stated and unstated needs being heard and responded to by the nurse (From Creswell, 1998, p.289). From the same study of nurses, a description also was provided of a noncaring nurse. Here it is: The nurse's presence with the client is perceived by the client as a minimal presence of the nurse being physically present only. The nurse is viewed as being there only because it is a job and not to assist the client or answer his or her needs. Any response by the nurse is done with a minimal amount of energy expenditure and bound by the rules. The client perceives the nurse who does not respond to this request for assistance as being noncaring. Therefore, an interaction that never happened is labeled as a noncaring interaction.
The nurse is too busy and hurried to spend time with the client and therefore does not sit down and really listen to the client's individual concerns. The client is further devalued as a unique person because he or she is scolded, treated as a child, or treated as a nonhuman being or an object. Because of the devaluing and lack of concern, the client's needs are not met and the client has negative feelings, that is, frustrated, scared, depressed, angry, afraid, and upset (From Creswell, 1998, p.289).
Ethnography The second major approach to qualitative research is ethnography (i.e., the discovery and description of the culture of a group of people). Here is the foundational question in ethnography: What are the cultural characteristics of this group of people or of this cultural scene? Because ethnography originates in the discipline of anthropology, the concept of culture is of central importance. Culture is the system of shared beliefs, values, practices, language, norms, rituals, and material things that group members use to understand their world. One can study micro cultures (e.g., the culture of a classroom) as well as macro cultures (e.g., the culture of the United States of America). There are two additional or specialized types of ethnography. 1. Ethnology (the comparative study of cultural groups). 2. Ethnohistory (the study of the cultural past of a group of people). An ethnohistory is often done in the early stages of a standard ethnography in order to get a sense of the group's cultural history.
Here are some more concepts that are commonly used by ethnographers: Ethnocentrism (i.e., judging others based on your cultural standards). You must avoid this problem if you are to be a successful ethnographer! Emic perspective (i.e., the insider's perspective) and emic terms (i.e., specialized words used by people in a group). Etic perspective (i.e., the external, social scientific view) and etic terms (i.e., outsider's words or specialized words used by social scientists). Going native (i.e., identifying so completely with the group being studied that you are unable to be objective). Holism (i.e., the idea that the whole is greater than the sum of its parts; it involves describing the group as a whole unit, in addition to its parts and their interrelationships). The final ethnography (i.e., the report) should provide a rich and holistic description of the culture of the group under study. Case Study Research The third major approach to qualitative research is case study research (i.e., the detailed account and analysis of one or more cases). Here is the foundational question in case study research: What are the characteristics of this single case or of these comparison cases? A case is a bounded system (e.g., a person, a group, an activity, a process). Because the roots of case study are interdisciplinary, many different concepts and theories can be used to describe and explain the case. Robert Stake classifies case study research into three types: 1. Intrinsic case study (where the interest is only in understanding the particulars of the case). 2. Instrumental case study (where the interest is in understanding something more general than the case). 3. Collective case study (where interest is in studying and comparing multiple cases in a single research study). Multiple methods of data collection are often used in case study research (e.g., interviews, observation, documents, questionnaires). 
The case study final report should provide a rich (i.e., vivid and detailed) and holistic (i.e., describes the whole and its parts) description of the case and its context.
Grounded Theory
The fourth major approach to qualitative research is grounded theory (i.e., the development of inductive, "bottom-up," theory that is "grounded" directly in the empirical data). Here is the foundational question in grounded theory: What theory or explanation emerges from an analysis of the data collected about this phenomenon? It is usually used to generate theory (remember from earlier chapters that theories tell you "How" and "Why" something operates as it does; theories provide explanations). Grounded theory can also be used to test or elaborate upon previously grounded theories, as long as the approach continues to be one of constantly grounding any changes in the new data. Four important characteristics of a grounded theory are Fit (i.e., Does the theory correspond to real-world data?), Understanding (i.e., Is the theory clear and understandable?), Generality (i.e., Is the theory abstract enough to move beyond the specifics in the original research study?), and Control (i.e., Can the theory be applied to produce real-world results?). Data collection and analysis continue throughout the study. When collecting and analyzing data, the researcher needs theoretical sensitivity (i.e., being sensitive about what data are important in developing the grounded theory). Data analysis often follows three steps: 1. Open coding (i.e., reading transcripts line-by-line and identifying and coding the concepts found in the data). 2. Axial coding (i.e., organizing the concepts and making them more abstract). 3. Selective coding (i.e., focusing on the main ideas, developing the story, and finalizing the grounded theory). The grounded theory process is "complete" when theoretical saturation occurs (i.e., when no new concepts are emerging from the data and the theory is well validated). The final report should include a detailed and clear description of the grounded theory.
Final note: The chapter includes many examples of each of the four types of qualitative research to help in your understanding (i.e., phenomenology, ethnography, case study, and grounded theory). In addition, reading new examples in the published literature will help to further your understanding of these four important approaches to qualitative research.
Chapter 13 Historical Research (Reminder: Don't forget to utilize the concept maps and study questions as you study this and the other chapters.) What is Historical Research? Historical research is the process of systematically examining past events to give an account of what has happened in the past. It is not a mere accumulation of facts and dates or even a description of past events. It is a flowing, dynamic account of past events that involves an interpretation of these events in an attempt to recapture the nuances, personalities, and ideas that influenced them. One of the goals of historical research is to communicate an understanding of past events. Significance of Historical Research The following are five important reasons for conducting historical research (based on Berg, 1998): 1. To uncover the unknown (i.e., some historical events are not recorded). 2. To answer questions (i.e., there are many questions about our past that we not only want to know but can profit from knowing). 3. To identify the relationship that the past has to the present (i.e., knowing about the past can frequently give a better perspective of current events). 4. To record and evaluate the accomplishments of individuals, agencies, or institutions. 5. To assist in understanding the culture in which we live (e.g., education is a part of our history and our culture). Historical Research Methodology There is no single approach to conducting historical research, although there is a general set of steps that are typically followed. These include the following steps, although there is some overlap and movement back and forth between them: 1. Identification of the research topic and formulation of the research problem or question.
2. Data collection or literature review. 3. Evaluation of materials.
4. Data synthesis. 5. Report preparation or preparation of the narrative exposition. Each of these steps is discussed briefly below. Identification of the Research Topic and Formulation of the Research Problem or Question This is the first step in any type of educational research, including historical research. Ideas for historical research topics can come from many different sources such as current issues in education, the accomplishments of an individual, an educational policy, or the relationship between events. Data Collection or Literature Review This step involves identifying, locating, and collecting information pertaining to the research topic. The information sources are often contained in documents such as diaries or newspapers, records, photographs, relics, and interviews with individuals who have had experience with or have knowledge of the research topic. Interviews with individuals who have knowledge of the research topic are called oral histories. The documents, records, oral histories, and other information sources can be primary or secondary sources. A primary source is a source that has a direct involvement with the event being investigated, like a diary, an original map, or an interview with a person who experienced the event. A secondary source is a source that was created from a primary source, such as a book written about the event. Secondary sources are considered less useful than primary sources. Evaluation of Materials Every information source must be evaluated for its authenticity and accuracy because any source can be affected by a variety of factors such as prejudice, economic conditions, and political climate. There are two types of evaluations every source must pass. 1. External Criticism: this is the process of determining the validity, trustworthiness, or authenticity of the source. Sometimes this is difficult to do, but other times it can easily be done by handwriting analysis or by determining the age of the paper on which something was written. 2. Internal Criticism: this is the process of determining the reliability or accuracy of the information contained in the sources collected.
This is done by positive and negative criticism. Positive criticism refers to assuring that the statements made or the meaning conveyed in the sources are understood. This is frequently difficult because of the problems of vagueness and presentism. Vagueness refers to uncertainty in the meaning of the words and phrases used in the source. Presentism refers to the assumption that the present-day connotations of terms also existed in the past.
Negative criticism refers to establishing the reliability or authenticity and accuracy of the content of the sources used. This is the more difficult part because it requires a judgment about the accuracy and authenticity of what is contained in the source. Firsthand accounts by witnesses to an event are typically assumed to be reliable and accurate.
Historians often use three heuristics in handling evidence: corroboration, sourcing, and contextualization. Corroboration, or comparing documents to each other to determine whether they provide the same information, is often used to obtain information about accuracy and authenticity. Sourcing, or identifying the author, date of creation of a document, and the place it was created, is another technique that is used to establish the authenticity or accuracy of information. Contextualization, or identifying when and where an event took place, is another technique used to establish authenticity and accuracy of information. Data Synthesis and Report Preparation This refers to synthesizing, or putting the material collected into a narrative account of the topic selected. Synthesis refers to selecting, organizing, and analyzing the materials collected into topical themes and central ideas or concepts. These themes are then pulled together to form a coherent and meaningful whole. Be sure to watch out for these four problems that might be encountered when you attempt to synthesize the material collected and prepare the narrative account. 1. Trying to infer causation from correlated events is the first problem. Just because two events occurred together does not necessarily mean that one event was the cause of the other. 2. A second problem is defining and interpreting key words so as to avoid ambiguity and to ensure that they have the correct connotation. 3. A third problem is differentiating between evidence indicating how people should behave and how they in fact did behave. 4. A fourth problem is maintaining a distinction between intent and consequences. In other words, educational historians must verify whether the consequences observed from some activity or policy were in fact the intended consequences, rather than assuming that they were.
Chapter 14 Mixed Research: Mixed Method and Mixed Model Research (Reminder: Don't forget to utilize the concept maps and study questions as you study this and the other chapters.) This chapter is about mixed research. Mixed research is research in which quantitative and qualitative techniques are mixed in a single study. It is the third major research paradigm, adding an attractive alternative (when it is appropriate) to quantitative and qualitative research. Proponents of mixed research typically adhere to the compatibility thesis as well as to the philosophy of pragmatism. The compatibility thesis is the idea that quantitative and qualitative methods are compatible, that is, they can both be used in a single research study. The philosophy of pragmatism says that researchers should use the approach or mixture of approaches that works best in a real-world situation. In short, what works is what is useful and should be used, regardless of any philosophical assumptions, paradigmatic assumptions, or any other type of assumptions. (Pragmatism was started by the great American philosophers Charles Sanders Peirce, William James, and John Dewey.) Today, proponents of mixed research attempt to use what is called the fundamental principle of mixed research. According to this fundamental principle, the researcher should use a mixture or combination of methods that has complementary strengths and nonoverlapping weaknesses. To aid you in applying this fundamental principle, we have provided tables that show the strengths and weaknesses of quantitative research and qualitative research. Here they are for your convenience:
Here is a list of the strengths and weaknesses of mixed research. Looking at the strengths, you will see where you want to go in planning a mixed research study.
The Research Continuum Research can be viewed as falling along a research continuum with monomethod research placed on the far left side, fully mixed research placed on the far right side, and partially mixed research located in the center. You should be able to take any given research study and place it somewhere on the continuum.
Types of Mixed Research Methods There are two major types of mixed research: mixed model research and mixed method research. Mixed Model Research In mixed model research, quantitative and qualitative approaches are mixed within or across the stages of the research process. Here are the two mixed model research subtypes: within-stage and across-stage mixed model research. 1. In within-stage mixed model research, quantitative and qualitative approaches are mixed within one or more of the stages of research. An example of within-stage mixed model research would be using a questionnaire during data collection that included both open-ended (i.e., qualitative) questions and closed-ended (i.e., quantitative) questions. 2. In across-stage mixed model research, quantitative and qualitative approaches are mixed across at least two of the stages of research. Across-stage mixed model research designs are easily seen by examining designs 2 through 7 in Figure 14.2 (shown below):
Here is an example of across-stage mixed model research: A researcher wants to explore (qualitative objective) why people take on-line college courses. The researcher conducts open-ended interviews (qualitative data collection), asking participants why they take on-line courses, and then quantifies the results by counting the number of times each type of response occurs (quantitative data analysis); the researcher also reports the responses as percentages and examines the relationships between sets of categories or variables through the use of contingency tables. Note that this is design 2 shown above in Figure 14.2.
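The quantitizing step in this example (counting coded responses, converting the counts to percentages, and cross-tabulating categories) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the interview answers are assumed to have already been coded into categories by the researcher, and the category labels and the employment-status variable are hypothetical, not data from the book.

```python
from collections import Counter

def quantitize(categories):
    """Turn coded qualitative responses into {category: (count, percent)}."""
    counts = Counter(categories)
    total = len(categories)
    # most_common() lists the most frequently mentioned categories first
    return {category: (n, 100 * n / total) for category, n in counts.most_common()}

def crosstab(row_categories, column_categories):
    """Count each (row, column) combination: the cells of a contingency table."""
    return Counter(zip(row_categories, column_categories))

# Hypothetical coded interview data (one entry per participant).
reasons = ["convenience", "schedule", "convenience", "cost", "convenience", "schedule"]
employment = ["employed", "employed", "not employed", "not employed", "employed", "employed"]

for reason, (count, percent) in quantitize(reasons).items():
    print(f"{reason:12} {count:2} {percent:5.1f}%")

for (reason, status), count in sorted(crosstab(reasons, employment).items()):
    print(f"{reason:12} {status:13} {count}")
```

The counting is deliberately simple; the interpretive work lies in the coding that happens before it.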
Mixed Method Research
In mixed method research, a qualitative phase and a quantitative phase are included in the overall research study. It's like including a quantitative mini-study and a qualitative mini-study in one overall research study. Mixed method research designs are classified according to two major dimensions: 1. Time order (i.e., concurrent versus sequential) and 2. Paradigm emphasis (i.e., equal status versus dominant status). Below, in Figure 14.3, you can see the specific mixed method designs that result from crossing time order and paradigm emphasis. It is a 2-by-2 matrix, and it includes nine specific mixed method designs. In order to understand the designs, you need to first understand the notation that is used. QUAL and qual both stand for qualitative research. QUAN and quan both stand for quantitative research. Capital letters denote priority or increased weight. Lowercase letters denote lower priority or weight. A plus sign (+) indicates the concurrent collection of data. An arrow (→) represents a sequential collection of data. For example, qual → QUAN is a dominant status, sequential design where the overall study is primarily quantitative but it is preceded by a qualitative phase. Perhaps a researcher does an open-ended survey to find some important categories or variables that students say are important reasons for dropping out of on-line courses. Then, in the quantitative phase, the researcher does a quantitative study of predictors of dropping out, using quantitative statistical methods. In other words, the quantitative phase was primary and the qualitative phase was supportive (and occurred first). In order to use Figure 14.3, you need to ask yourself two questions: 1. Do you want to operate largely within one dominant paradigm or not (i.e., do you want to use a dominant status design or an equal status design?), and 2. Do you want to conduct the phases concurrently (i.e., at roughly the same time) or sequentially (i.e., one before the other)?
Your answers to these two questions will lead you to one of the designs in Figure 14.3. Your goal is to pragmatically design a study that fits your particular needs and circumstances.
It is important to understand that you are not limited to the mixed method or mixed model designs provided in this chapter. Our designs are provided to get you started. You should feel free to mix and match the designs into a design that best fits your needs. This includes designing studies that are a mix of mixed model and mixed method designs. Your goal, always, is to answer your research question(s), so design a study that will help you to do that.
Stages of Mixed Research Process
There are eight stages in the mixed research process, as shown in Figure 14.4 (in the text, and here for your convenience).
It is important to note that although the steps in mixed research are numbered, researchers often follow these steps in different orders, depending on what particular needs and concerns arise or emerge during a particular research study. For example, interpretation and validation of the data should be done throughout the data collection process. I will very briefly comment on each of the eight (nonlinear) steps:
(1) Determine whether a mixed design is appropriate.
Do you believe that you can best answer your research question(s) through the use of mixed research? Do you believe that mixed research will offer you the best design for the amount and kind of evidence that you hope to obtain as you conduct your research study?
(2) Determine the rationale for using a mixed design.
The five most important rationales or purposes for mixed research are shown below in Table 14.4:
You can see in Table 14.4 that mixed research can help researchers do many important things as they attempt to understand the world.
(3) Select the mixed method or mixed model research design.
We have already shown you, in this lecture, the basic mixed model designs and the basic mixed method designs. Remember that you can also build more unique and/or more complex designs than the ones we have shown as you plan a study that will help you to answer your research question(s).
(4) Collect the data.
Keep in mind the six major methods of data collection that we discussed in Chapter 6: tests, questionnaires, interviews, focus groups, observation, and secondary or already existing data (such as personal and official documents, physical data, and archived research data).
(5) Analyze the data.
You can use the quantitative data analysis techniques (Chapters 15 and 16) and the qualitative data analysis techniques (Chapter 17). You might want to use the technique of quantitizing (i.e., converting qualitative data into quantitative data). You might also want to use the technique of qualitizing (i.e., converting quantitative data into qualitative data).
For more information on data analysis in mixed research, I highly recommend the following:
Onwuegbuzie, A. J., & Teddlie, C. (2003). A framework for analyzing data in mixed methods research. In A. Tashakkori & C. Teddlie (Eds.), Handbook of mixed methods in social and behavioral research (pp. 351-383). Thousand Oaks, CA: Sage.
(6) Validate the data.
Data validation is something that should be done throughout your research study, because if your data are not trustworthy, then your study is not trustworthy. In Chapter 8 we discussed validity strategies used in quantitative research (pp. 228-248) and validity strategies used in qualitative research (pp. 249-256). You should consider using quantitative and qualitative validity strategies in your study, and you should mix these in a way that works best for your mixed research study.
(7) Interpret the data.
Data interpretation begins as soon as you enter the field or collect the first datum (datum is the singular of data), and data interpretation continues throughout your research study. Remember that data interpretation and data validation go hand-in-hand; that is, you want to make sure that you continually use strategies that will provide valid data and help you to make defensible interpretations of your data. A couple of strategies to use during data interpretation are reflexivity (i.e., self-awareness and critical self-reflection by the researcher on his or her potential biases and predispositions as these may affect the research process and conclusions) and negative-case sampling (i.e., attempting to locate and examine cases that disconfirm your expectations and tentative explanations).
(8) Write the research report.
Writing the report also can be started during data collection rather than waiting until the end. Remember that mixing MUST take place somewhere in mixed research if it is to truly be mixed research, and your report should also reflect mixing; that is, as you discuss your results you must relate the quantitative and qualitative parts of your research study to make sense of the overall study and to capitalize on the strengths of mixed research. In conclusion, mixed research is the newest research paradigm in educational research. It offers much promise, and we expect to see much more methodological work and discussion about mixed research in the future as more researchers and book authors become aware of this important approach to empirical research.
Chapter 15 Descriptive Statistics
An overview of the field of statistics is shown in Figure 15.1 (also shown below). As you can see, the field of statistics can be divided into descriptive statistics and inferential statistics (and there are further subdivisions under inferential statistics, which is the topic of the next chapter).
This chapter is about descriptive statistics (i.e., the use of statistics to describe, summarize, and explain or make sense of a given set of data). A data set (i.e., a set of data with the "cases" going down the rows and the "variables" going across the columns) is shown in Table 15.1. Once you put your data set (such as the one in Table 15.1) into a statistical program such as SPSS, you are ready to obtain all the descriptive statistics that you want (i.e., which will help you to make some sense out of your data).
Frequency Distributions
One useful way to view the data of a variable is to construct a frequency distribution (i.e., an arrangement in which the frequencies, and sometimes percentages, of the occurrence of each unique data value are shown). An example is shown in Table 15.2 in the book and here for your convenience.
When a variable has a wide range of values, you may prefer using a grouped frequency distribution (i.e., where the data values are grouped into intervals and the frequencies of the intervals are shown). For the above frequency distribution, one possible set of grouped intervals would be 20,000-24,999; 25,000-29,999; 30,000-34,999; 35,000-39,999; 40,000-44,999. Note that the categories developed for a grouped frequency distribution must be mutually exclusive (the property that intervals do not overlap) and exhaustive (the property that a set of intervals or categories covers the complete range of data values). An example of a grouped frequency distribution is shown on page 437.
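Both kinds of frequency distribution can be produced with a short Python sketch. The salary values below are hypothetical (they are not the book's Table 15.2), and the grouping function assumes whole-dollar values, so that intervals such as 20,000-24,999 and 25,000-29,999 leave no gaps between them.

```python
from collections import Counter

def frequency_distribution(values):
    """(value, frequency, percentage) for each unique value, in ascending order."""
    counts = Counter(values)
    n = len(values)
    return [(value, counts[value], 100 * counts[value] / n) for value in sorted(counts)]

def grouped_frequency_distribution(values, start, width, n_intervals):
    """(lower, upper, frequency) for consecutive intervals of equal width.

    Intervals such as 20000-24999 and 25000-29999 do not overlap (mutually
    exclusive) and, for whole-number data, leave no gaps (exhaustive).
    """
    rows = []
    for i in range(n_intervals):
        lower = start + i * width
        upper = lower + width - 1
        rows.append((lower, upper, sum(lower <= v <= upper for v in values)))
    # Exhaustive check: every value must fall into exactly one interval.
    if sum(count for _, _, count in rows) != len(values):
        raise ValueError("intervals do not cover every data value")
    return rows

# Hypothetical whole-dollar starting salaries.
salaries = [20000, 23500, 27000, 27000, 31000, 34000, 34000, 34000, 38500, 44000]

for value, freq, pct in frequency_distribution(salaries):
    print(f"{value:>6} {freq:>2} {pct:5.1f}%")
for lower, upper, freq in grouped_frequency_distribution(salaries, 20000, 5000, 5):
    print(f"{lower}-{upper}: {freq}")
```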
Graphic Representations of Data
Another excellent way to describe your data (especially for visually oriented learners) is to construct graphical representations of the data (i.e., pictorial representations of the data in two-dimensional space). Some common graphical representations are bar graphs, histograms, line graphs, and scatterplots.
Bar Graphs
A bar graph uses vertical bars to represent the data. The height of the bars usually represents the frequencies for the categories that sit on the X axis. Note that, by tradition, the X axis is the horizontal axis and the Y axis is the vertical axis. Bar graphs are typically used for categorical variables. Here is a bar graph of one of the categorical variables included in the data set for this chapter (i.e., the data set shown on page 435).
Histograms
A histogram is a graphic that shows the frequencies and shape that characterize a quantitative variable. In statistics, we often want to see the shape of the distribution of quantitative variables; having your computer program provide you with a histogram is a simple way to do this. Here is a histogram for a quantitative variable included in the data set for this chapter:
Line Graphs
A line graph uses one or more lines to depict information about one or more variables. A simple line graph might be used to show a trend over time (e.g., with the years on the X axis and the population sizes on the Y axis). Here is an example of a line graph (Figure 15.4):
Line graphs are used for many different purposes in research. For example, if you turn to page 290, you will see a line graph (Figure 9.15) used in factorial experimental designs to depict the relationship between two categorical independent variables and the dependent variable. Yet another line graph is shown on page 468 in the next chapter. This line graph shows that the "sampling distribution of the mean" is normally distributed. As you can see in the figures just listed, line graphs have in common their use of one or more lines within the graph (to depict the levels or characteristics of a variable or to depict the relationships among variables).
Scatterplots
A scatterplot is used to depict the relationship between two quantitative variables. Typically, the independent or predictor variable is represented by the X axis (i.e., on the horizontal axis) and the dependent variable is represented by the Y axis (i.e., on the vertical axis).
Here is an example of a scatterplot showing the relationship between two of the quantitative variables from the data set for this chapter:
Measures of Central Tendency
Measures of central tendency provide descriptive information about the single numerical value that is considered to be the most typical of the values of a quantitative variable. Three common measures of central tendency are the mode, the median, and the mean. The mode is simply the most frequently occurring number. The median is the center point in a set of numbers; it is also the fiftieth percentile. To get the median by hand, you first put your numbers in ascending or descending order. Then you check to see which of the following two rules applies:
Rule One. If you have an odd number of numbers, the median is the center number (e.g., three is the median for the numbers 1, 1, 3, 4, 9).
Rule Two. If you have an even number of numbers, the median is the average of the two innermost numbers (e.g., 2.5 is the median for the numbers 1, 2, 3, 7).
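The mode and the two median rules translate directly into code. This is a small sketch that spells out the hand procedure; Python's standard `statistics` module already provides `median` and `multimode` for everyday use.

```python
from collections import Counter

def modes(values):
    """All values that occur most often (a set of numbers can have more than one mode)."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)

def median(values):
    ordered = sorted(values)  # step 1: put the numbers in ascending order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Rule One: odd number of numbers -> the center number
        return ordered[middle]
    # Rule Two: even number of numbers -> average of the two innermost numbers
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([1, 1, 3, 4, 9]))  # 3
print(median([1, 2, 3, 7]))     # 2.5
print(modes([1, 1, 3, 4, 9]))   # [1]
```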
The mean is the arithmetic average (e.g., the average of the numbers 2, 3, 3, and 4 is equal to 3).
A Comparison of the Mean, Median, and Mode
The mean, median, and mode are affected by what is called skewness (i.e., lack of symmetry) in the data. Here is Figure 15.6, which shows a normal curve, a negatively skewed curve, and a positively skewed curve:
Look at the above figure and note that when a variable is normally distributed, the mean, median, and mode are the same number. When the variable is skewed to the left (i.e., negatively skewed), the mean shifts to the left the most, the median shifts to the left the second most, and the mode is the least affected by the presence of skew in the data. Therefore, when the data are negatively skewed, this happens: mean < median < mode. When the variable is skewed to the right (i.e., positively skewed), the mean is shifted to the right the most, the median is shifted to the right the second most, and the mode is the least affected. Therefore, when the data are positively skewed, this happens: mean > median > mode. For both negatively and positively skewed curves, if you start at the tail (where the curve is pulled out the most) and walk up the curve toward its peak, you will pass the mean first, then the median, and then the mode.
You can use the following two rules to provide some information about skewness even when you cannot see a line graph of the data (i.e., all you need is the mean and the median):
1. Rule One. If the mean is less than the median, the data are skewed to the left.
2. Rule Two. If the mean is greater than the median, the data are skewed to the right.
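The two rules can be checked with the standard `statistics` module. This is a rough rule-of-thumb check, not a formal skewness coefficient, and the example data sets are hypothetical.

```python
import statistics

def skew_direction(values):
    """Apply the mean-versus-median rules of thumb for skewness."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean < median:
        return "skewed left (negative)"   # Rule One
    if mean > median:
        return "skewed right (positive)"  # Rule Two
    return "no skew indicated"

print(skew_direction([30, 32, 35, 36, 40, 41, 45, 200]))  # skewed right (positive)
print(skew_direction([2, 60, 62, 64, 65, 66]))            # skewed left (negative)
```

A single extreme value (200 in the first list) is enough to pull the mean well above the median, which is why the mean is the measure most affected by skew.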
Measures of Variability
Measures of variability tell you how "spread out" a set of numbers is, or how much variability is present in it. They tell you how different your numbers tend to be. Note that measures of variability should be reported along with measures of central tendency because they provide very different but complementary and important information. To fully interpret one (e.g., a mean), it is helpful to know about the other (e.g., a standard deviation). An easy way to get the idea of variability is to look at two sets of data, one that is highly variable and one that is not very variable. For example, which of these two sets of numbers appears to be more spread out, Set A or Set B?
Set A. 93, 96, 98, 99, 99, 99, 100
Set B. 10, 29, 52, 69, 87, 92, 100
If you said Set B, then you are right! The numbers in Set B are more "spread out"; that is, they have more variability. All of the measures of variability should give us an indication of the amount of variability in a set of data. We will discuss three indices of variability: the range, the variance, and the standard deviation.
Range
A relatively crude indicator of variability is the range (i.e., the difference between the highest and lowest numbers). For example, the range in Set A shown above is 7 (100 - 93), and the range in Set B shown above is 90 (100 - 10).
Variance and Standard Deviation
Two commonly used indicators of variability are the variance and the standard deviation. Higher values for both of these indicators indicate a larger amount of variability than do lower values. Zero stands for no variability at all (e.g., for the data 3, 3, 3, 3, 3, 3, the variance and standard deviation will equal zero). When you have no variability, the numbers are a constant (i.e., the same number). Table 15.4 shows you how to easily calculate, by hand, the variance and standard deviation.
(Basically, you set up the three columns shown, get the sum of the third column, and then plug the relevant numbers into the variance formula.) The variance tells you (exactly) the average squared deviation from the mean, so it is expressed in "squared units." The standard deviation is just the square root of the variance (i.e., it brings the "squared units" back to regular units).
The standard deviation tells you (approximately) how far the numbers tend to vary from the mean. (If the standard deviation is 7, then the numbers tend to be about 7 units from the mean. If the standard deviation is 1500, then the numbers tend to be about 1500 units from the mean.)
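The hand calculation described above (a column of numbers, a column of deviations from the mean, and a column of squared deviations) can be sketched in Python as follows. The sketch assumes the divide-by-N form of the variance, matching the "average squared deviation" description. Many statistical programs (and Python's `statistics.variance`) instead divide by N - 1 when estimating a population variance from a sample, which gives a slightly larger value.

```python
import math

def value_range(values):
    """Highest number minus lowest number."""
    return max(values) - min(values)

def variance(values):
    """Average squared deviation from the mean (divide-by-N form)."""
    mean = sum(values) / len(values)
    deviations = [x - mean for x in values]   # second column: X minus the mean
    squared = [d ** 2 for d in deviations]    # third column: squared deviations
    return sum(squared) / len(values)         # sum of third column divided by N

def standard_deviation(values):
    """Square root of the variance, back in the original units."""
    return math.sqrt(variance(values))

set_a = [93, 96, 98, 99, 99, 99, 100]
set_b = [10, 29, 52, 69, 87, 92, 100]

for name, data in (("Set A", set_a), ("Set B", set_b)):
    print(name, value_range(data), round(variance(data), 2), round(standard_deviation(data), 2))
```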
Virtually everyone in education is already familiar with the normal curve (a picture of one is shown in Figure 15.7 on page 449). If data are normally distributed, then an easy rule to apply to the data is what we call the "68, 95, 99.7 percent rule." That is:
Approximately 68% of the cases will fall within one standard deviation of the mean.
Approximately 95% of the cases will fall within two standard deviations of the mean.
Approximately 99.7% of the cases will fall within three standard deviations of the mean.
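The three percentages in this rule come from the standard normal curve and can be checked with `statistics.NormalDist` (available in Python 3.8 and later). The sketch only verifies the rule itself; it does not test whether a given data set is actually normally distributed.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```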
Measures of Relative Standing
Measures of relative standing are used to provide information about where a particular score falls in relation to the other scores in a distribution of data. Two commonly used measures of relative standing are percentile ranks and z-scores. Here is Figure 15.8, which shows these and some additional types of standard scores. You can determine the mean of each type of standard score below by simply looking under Mean. You can determine the standard deviation by looking at how much the scores increase as you move from the mean to 1 SD.
Z-scores have a mean of 0 and a standard deviation of 1. Therefore, if you converted any set of scores (e.g., the set of student grades on a test) to z-scores, then that new set WILL have a mean of zero and a standard deviation of one.
IQ has a mean of 100 and a standard deviation of 15.
SAT has a mean of 500 and a standard deviation of 100.
Note: percentile ranks are a different type of score; because they only have ordinal measurement properties, the concept of standard deviation is not relevant.
Percentile Ranks
A percentile rank tells you the percentage of scores in a reference group (i.e., in the norming group) that fall below a particular raw score. For example, if your percentile rank is 93, then you know that 93 percent of the scores in the reference group fall below your score.
Z-Scores
A z-score tells you how many standard deviations (SD) a raw score falls from the mean. A z-score of 2 says a score falls two standard deviations above the mean. A z-score of -3.5 says the score falls three and a half standard deviations below the mean. To transform a raw score into z-score units, just use the following formula:
$$\text{Z-score} = \frac{\text{Raw score} - \text{Mean}}{\text{Standard Deviation}}$$
For example, you know that the mean for IQ scores is 100 and the standard deviation for IQ scores is 15 (because we told you this in the book and because you can see it by examining Figure 15.8). Therefore, if your IQ is 115, you can get your z-score:
$$\text{Z-score} = \frac{115 - 100}{15} = \frac{15}{15} = 1$$
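Both measures of relative standing can be sketched in a few lines of Python. The percentile-rank function follows the "percentage of scores that fall below" definition used here (other definitions also count half of any tied scores, which this sketch does not do), and the norming-group scores in the usage lines are hypothetical.

```python
def percentile_rank(score, reference_scores):
    """Percentage of reference-group scores that fall below `score`."""
    below = sum(1 for s in reference_scores if s < score)
    return 100 * below / len(reference_scores)

def z_score(raw, mean, sd):
    """Number of standard deviations a raw score lies from the mean."""
    return (raw - mean) / sd

norming_group = list(range(1, 11))        # hypothetical scores 1..10
print(percentile_rank(7, norming_group))  # 60.0 (six of ten scores are below 7)
print(z_score(115, mean=100, sd=15))      # 1.0 (the IQ example)
```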
An IQ of 115 falls one standard deviation above the mean. Note that once you have a set of z-scores, you can convert to any other scale by using this formula: New score = Z-score(SD of new scale) + mean of the new scale. For example, let's convert a z-score of three to an IQ score: New score = 3(15) + 100 (remember, the mean of IQ scores is 100 and the standard deviation of IQ scores is 15). Therefore, the new score (i.e., the IQ score converted from the z-score of 3 using the formula I just provided) is equal to 145 (3 times 15 is 45, and when 100 is added you get 145).
Examining Relationships Among Variables
We have been talking about relationships among variables throughout your textbook. For example, we have already talked about correlation (e.g., see Figure 2.2 on page 44), partial correlation (e.g., see page 341), analysis of variance, which is used for factorial designs (e.g., see pages 286-291), and analysis of covariance (e.g., see pages 274-275 and pages 341-342). At this point in this chapter on descriptive statistics, I introduce two additional techniques that you also can use for examining relationships among variables: contingency tables and regression analysis.
Contingency Tables
When all of your variables are categorical, you can use contingency tables to see if your variables are related. A contingency table is a table displaying information in cells formed by the intersection of two or more categorical variables. An example is shown in Table 15.6. When interpreting a contingency table, remember to use the following two rules:
Rule One. If the percentages are calculated down the columns, compare across the rows.
Rule Two. If the percentages are calculated across the rows, compare down the columns.
When you follow these rules, you will be comparing the appropriate rates (a rate is the percentage of people in a group who have a specific characteristic).
When you listen to the local and national news, you will often hear the announcers compare rates. The failure of some researchers to follow the two rules just provided has resulted in misleading statements about how categorical variables are related; so be careful.
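Rule One can be illustrated with a small Python sketch that converts counts into percentages calculated down the columns and then compares across a row. The counts are hypothetical (they are not the book's Table 15.6), and the table is stored as nested dictionaries purely for illustration.

```python
def column_percentages(table):
    """Convert counts to percentages calculated down each column."""
    columns = list(next(iter(table.values())))
    totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        row_name: {c: 100 * row[c] / totals[c] for c in columns}
        for row_name, row in table.items()
    }

# Hypothetical counts: rows = exam outcome, columns = tutoring group.
counts = {
    "passed": {"tutored": 40, "not tutored": 60},
    "failed": {"tutored": 10, "not tutored": 90},
}

percentages = column_percentages(counts)
# Percentages run down the columns, so (Rule One) compare across a row.
print(percentages["passed"])  # {'tutored': 80.0, 'not tutored': 40.0}
```

Notice that the raw counts alone (60 non-tutored students passed versus 40 tutored students) point the wrong way; only the rates (80% versus 40%) give the appropriate comparison.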
Regression Analysis
Regression analysis is a set of statistical procedures used to explain or predict the values of a quantitative dependent variable based on the values of one or more independent variables. In simple regression, there is one quantitative dependent variable and one independent variable. In multiple regression, there is one quantitative dependent variable and two or more independent variables. On pages 455-459, I show you the components of the regression equations (e.g., the Y-intercept and the regression coefficients). Here are the important definitions: Regression equation: the equation that defines the regression line (see Figure 15.9 in the book and below).
Here is the simple regression equation showing the relationship between starting salary (Y or your dependent variable) and GPA (X or your independent variable) (two of the variables in the data set included with this chapter on page 435).
$$\hat{Y} = 9{,}234.56 + 7{,}638.85(X)$$
The 9,234.56 is the Y intercept (look at the above regression line; it crosses the Y axis a little below $10,000; specifically, it crosses the Y axis at $9,234.56).
The 7,638.85 is the simple regression coefficient, which tells you the average amount of increase in starting salary that occurs when GPA increases by one unit. (It is also the slope, or the rise over the run.) Now, you can plug in a value for X (i.e., GPA) and easily get the predicted starting salary. If you put in 3.00 for GPA in the above equation and solve it, you will see that the predicted starting salary is $32,151.11. Now plug in another number within the range of the data (how about 3.5) and see what the predicted starting salary is. (Check your work: it is $35,970.54.)
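Plugging values into the regression equation is easy to automate. The first function below uses the chapter's Y-intercept and regression coefficient. The second is an addition for illustration only: it shows ordinary least squares, the standard way simple regression coefficients are estimated, applied to made-up points rather than to the book's data set.

```python
INTERCEPT = 9_234.56   # Y-intercept from the chapter's equation
SLOPE = 7_638.85       # simple regression coefficient for GPA

def predicted_salary(gpa):
    """Predicted starting salary (Y-hat) for a given GPA (X)."""
    return INTERCEPT + SLOPE * gpa

def least_squares(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

print(f"{predicted_salary(3.0):,.2f}")
print(f"{predicted_salary(3.5):,.2f}")
print(least_squares([2, 3, 4], [10, 13, 16]))  # hypothetical points on Y = 4 + 3X
```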
On pages 458-459, I show a multiple regression equation with two independent variables. The main difference is that in multiple regression, the regression coefficient is now called a partial regression coefficient, and this coefficient provides the predicted change in the dependent variable given a one unit change in the independent variable, controlling for the other independent variables in the equation. In other words, you can use multiple regression to control for other variables (i.e., for what we called in earlier chapters statistical control).
Chapter 16 Inferential Statistics
(REMINDER: as you read the lectures, it's a good idea to also look at the concept map for each chapter. The concept maps help to give you the big picture and see how the concepts are related. Here is the link to all of the concept maps; just select the one for this chapter: http://www.southalabama.edu/coe/bset/johnson/dr_johnson/2conceptmaps.htm) This is probably the most challenging chapter in your book. However, you can understand it. It just takes attention and effort. After you carefully study the material, it will become clear to you. I will also be available to answer any questions you have. Please start this chapter by taking a look (again) at the divisions in the field of statistics that were shown in Figure 15.1 (p. 434) and also shown in the previous lecture. This shows the "big picture." As you can see, inferential statistics is divided into estimation and hypothesis testing, and estimation is further divided into point and interval estimation. Inferential statistics is defined as the branch of statistics that is used to make inferences about the characteristics of a population based on sample data. The goal is to go beyond the data at hand and make inferences about population parameters. In order to use inferential statistics, it is assumed that either random selection or random assignment was carried out (i.e., some form of randomization is assumed). Looking at Table 16.1 (p. 464 and shown below), you can see that statisticians use Greek letters to symbolize population parameters (i.e., numerical characteristics of populations, such as means and correlations) and English letters to symbolize sample statistics (i.e., numerical characteristics of samples, such as means and correlations).
For example, we use the Greek letter mu (i.e., μ) to symbolize the population mean and the Roman/English letter X with a bar over it (called X bar, X̄) to symbolize the sample mean.
Sampling Distributions
One of the most important concepts in inferential statistics is that of the sampling distribution. That's because the use of sampling distributions is what allows us to make "probability" statements in inferential statistics. A sampling distribution is defined as "The theoretical probability distribution of the values of a statistic that results when all possible random samples of a particular size are drawn from a population." (For simplicity, you can view the idea of "all possible samples" as taking a million random samples. That is, just view it as taking a whole lot of samples!) One specific type of sampling distribution is called the sampling distribution of the mean. If you wanted to generate this distribution through the laborious process of doing it by hand (which you would NOT need to do in practice), you would randomly select a sample, calculate the mean, randomly select another sample, calculate the mean, and continue this process until you have calculated the means for all possible samples. This process will give you a lot of means, and you can construct a line graph to depict your sampling distribution of the mean (e.g., see Figure 16.1 on page 468). The sampling distribution of the mean is approximately normally distributed as long as the size of each sample is about 30 or more. Also, note that the mean of the sampling distribution of the mean is equal to the population mean! That tells you that repeated sampling will, over the long run, produce the correct mean. The spread or variance shows you that, in most particular samples, the sample mean will be somewhat different from the true population mean. Although I just described the sampling distribution of the mean, it is important to remember that a sampling distribution can be obtained for any statistic. For example, you could also obtain the following sampling distributions:
Sampling distribution of the percentage (or proportion).
Sampling distribution of the variance.
Sampling distribution of the correlation.
Sampling distribution of the regression coefficient.
Sampling distribution of the difference between two means.
The standard deviation of a sampling distribution is called the standard error. In other words, the standard error is just a special kind of standard deviation, and you learned what a standard deviation was in the last chapter. The smaller the standard error, the less variability is present in a sampling distribution. It is important to understand that researchers do not actually construct sampling distributions empirically! When conducting research, researchers typically select only one sample from the population of interest; they do not collect all possible samples. The computer program that a researcher uses (e.g., SPSS or SAS) uses the appropriate sampling distribution for you. The computer program will look at the type of statistical analysis you select (and also consider certain additional information that you have provided, such as the sample size in your study), and then the statistical program selects the appropriate sampling distribution.
(It's kind of like the Greyhound Bus analogy: Leave the driving to us...SPSS will take care of generating the appropriate sampling distribution for you if you give it the information it needs.)
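Although researchers never build a sampling distribution by hand, a short simulation makes the definition concrete. This sketch assumes a hypothetical, right-skewed population of 10,000 values and uses 5,000 random samples of size 30 as a stand-in for "all possible samples." It compares the standard deviation of the simulated sample means (the standard error) with the familiar formula sigma / sqrt(n), which ignores a small finite-population correction.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical, right-skewed population of 10,000 values.
population = [random.expovariate(1 / 50) for _ in range(10_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

def sampling_distribution_of_mean(pop, n, repetitions):
    """Means of many random samples of size n (a stand-in for 'all possible samples')."""
    return [statistics.fmean(random.sample(pop, n)) for _ in range(repetitions)]

sample_means = sampling_distribution_of_mean(population, n=30, repetitions=5_000)
standard_error = statistics.stdev(sample_means)  # SD of the sampling distribution

print(f"population mean             {mu:.2f}")
print(f"mean of the sample means    {statistics.fmean(sample_means):.2f}")
print(f"standard error (simulated)  {standard_error:.2f}")
print(f"sigma / sqrt(n)             {sigma / 30 ** 0.5:.2f}")
```

Even though the population is skewed, a histogram of `sample_means` would look roughly bell-shaped and centered on the population mean.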
So please remember that the idea of sampling distributions (i.e., the idea of probability distributions obtained from repeated sampling) underlies our ability to make probability statements in inferential statistics. Now, I'm going to cover the two branches of inferential statistics that were shown in Figure 15.1: estimation and hypothesis testing.
Estimation
The key estimation question is "Based on my random sample, what is my estimate of the population parameter?" The basic idea is that you are going to use your sample data to provide information about the population. There are actually two types of estimation. They can be understood first through the following analogy: Let's say that you take your car to your local car dealer's service department and you ask the service manager how much it will cost to repair your car. If the manager says it will cost you $500, then she is providing a point estimate. If the manager says it will cost somewhere between $400 and $600, then she is providing an interval estimate. In other words, a point estimate is a single number, and an interval estimate is a range of numbers. A point estimate is the value of your sample statistic (e.g., your sample mean or sample correlation), and it is used to estimate the population parameter (e.g., the population mean or the population correlation). For example, if you take a random sample of adults living in the United States and you find that the average income for the people in your sample is $45,000, then your best guess or your point estimate for the population of adults in the U.S. will be $45,000. In the above example, you used the value of the sample mean as the estimate of the population mean. Again, whenever you engage in point estimation, all you need to do is to use the value of your sample statistic as your "best guess" (i.e., as your estimate) of the (unknown) population parameter. Oftentimes, we like to put an interval around our point estimates as a reminder that the actual population value is likely to be somewhat different from our point estimate, because sampling error is always present in sampling.
An interval estimate (also called a confidence interval) is a range of numbers inferred from the sample that has a known probability of capturing the population parameter over the long run (i.e., over repeated sampling). (See Figure 16.2, p. 471, for a picture of twenty different confidence intervals randomly jumping around the population mean from sample to sample.) Here it is for your convenience:
The "beauty" of confidence intervals is that we know their probability (over the long run) of including the true population parameter. (You can't do this with a point estimate.) Specifically, if you have the computer provide you with a 95 percent confidence interval (based on your data), then you will be able to be "95% confident" that it will include the population parameter. That is, your level of confidence is 95%. For example, you might take the point estimate of annual income of U.S. adults of $45,000 (used earlier as a point estimate) and surround it by a 95% confidence interval. You might find that the confidence interval is $43,000 to $47,000. In this case, you can be "95% confident" that the average income is somewhere between $43,000 and $47,000. If you have the computer program give you a 99% confidence interval, then you can be "99% confident" that the confidence interval provided will include the population parameter (i.e., it will capture the true parameter 99% of the time in the long run).
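The "long run" interpretation can be checked with a small simulation. The Python sketch below is illustrative only: it assumes a normally distributed population with a known mean and standard deviation (the values 100 and 15 are hypothetical), draws many random samples, builds an interval around each sample mean using the normal (z) critical value with the population standard deviation treated as known, and reports the fraction of intervals that capture the true mean. The function name `coverage` is our own.

```python
import random
from statistics import NormalDist, fmean


def coverage(mu=100.0, sigma=15.0, n=25, level=0.95, reps=2000, seed=7):
    """Fraction of confidence intervals that capture the true mean mu.

    Assumes a normal population with known sigma (hypothetical values),
    so each interval is: sample mean +/- z * sigma / sqrt(n).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    half_width = z * sigma / n ** 0.5
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        m = fmean(sample)  # the point estimate from this one sample
        if m - half_width <= mu <= m + half_width:
            hits += 1
    return hits / reps


print(coverage(level=0.95))  # close to 0.95
print(coverage(level=0.99))  # close to 0.99
```

Because both calls use the same seed, they draw the same samples, so the wider 99% intervals capture the population mean at least as often as the 95% intervals do.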
You might ask: So why don't we just use 99% confidence intervals rather than 95% intervals, since we will make fewer mistakes? The answer is that for a given sample size, the 99% confidence interval will be wider (i.e., less precise) than the 95% confidence interval. For example, the interval $40,000 to $50,000 is wider than the interval $43,000 to $47,000. Many researchers use 95% confidence intervals. However, you may, at times, want to use other confidence levels (e.g., 90% or 99% confidence intervals).
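The trade-off between confidence and precision can be seen directly. The sketch below defines a hypothetical helper, `mean_ci`, that uses a large-sample normal approximation (sample mean plus or minus a z critical value times the estimated standard error); with small samples, a t critical value is more appropriate and gives somewhat wider intervals. The income figures are made up for illustration.

```python
from statistics import NormalDist, fmean, stdev


def mean_ci(data, level=0.95):
    """Large-sample (normal-approximation) confidence interval for a mean."""
    n = len(data)
    m = fmean(data)                            # the point estimate
    se = stdev(data) / n ** 0.5                # estimated standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # critical value for the chosen level
    return m - z * se, m + z * se


# Hypothetical annual incomes (in dollars) from a random sample.
incomes = [38_000, 52_000, 41_000, 47_500, 44_000, 50_500, 39_500, 46_000,
           43_000, 48_500, 45_000, 42_000, 49_000, 40_500, 46_500, 47_000]

low95, high95 = mean_ci(incomes, 0.95)
low99, high99 = mean_ci(incomes, 0.99)
print(f"95% CI: {low95:,.0f} to {high95:,.0f}")
print(f"99% CI: {low99:,.0f} to {high99:,.0f}")  # same center, wider interval
```

Both intervals are centered on the same point estimate; only the critical value changes, which is why raising the confidence level widens the interval.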
Hypothesis Testing Hypothesis testing is the branch of inferential statistics that is concerned with how well the sample data support a null hypothesis and when the null hypothesis can be rejected in favor of the alternative hypothesis. First note that the null hypothesis is usually the prediction that there is no relationship in the population. The alternative hypothesis is the logical opposite of the null hypothesis and says there is a relationship in the population. We use hypothesis testing when we expect a relationship to be present; in other words, we usually hope to nullify the null hypothesis and tentatively accept the alternative hypothesis. (Note: If you expect the null to be true, you can use the estimation approach described in this chapter; several additional procedures for this special case are discussed in Shadish, Cook, and Campbell's book Experimental and Quasi-Experimental Designs, 2002, pp. 52-53.) Here is the key question that is answered in hypothesis testing: "Is the value of my sample statistic unlikely enough (assuming that the null hypothesis is true) for me to reject the null hypothesis and tentatively accept the alternative hypothesis?" Note that it is the null hypothesis that is directly tested in hypothesis testing (not the alternative hypothesis).
To get the idea of null hypothesis testing in your head, reread Exhibit 16.1 (p. 473 and shown below). Exhibit 16.1 An Analogy From Jurisprudence The United States criminal justice system operates on the assumption that the defendant is innocent until proven guilty beyond a reasonable doubt. In hypothesis testing, this assumption is called the null hypothesis. That is, researchers assume that the null hypothesis is true until the evidence suggests that it is not likely to be true. The researcher's null hypothesis might be that a technique of counseling does not work any better than no counseling. The researcher is kind of like a prosecuting attorney. The prosecuting attorney brings someone to trial when he or she believes there is some evidence against the accused, and the researcher brings a null hypothesis to "trial" when he or she believes there is some evidence against the null hypothesis (i.e., the researcher actually believes that the counseling technique does work better than no counseling). In the courtroom, the jury decides what constitutes reasonable doubt, and they make a decision about guilt or innocence. The researcher uses inferential statistics to determine the probability of the evidence under the assumption that the null hypothesis is true. If this probability is low, the researcher is able to reject the null hypothesis and accept the alternative hypothesis. If this probability is not low, the researcher is not able to reject the null hypothesis. No matter what decision is made, things are still not completely settled because a mistake could have been made. In the courtroom, decisions of guilt or innocence are sometimes overturned or found to be incorrect. Similarly, in research, the decision to reject or not reject the null hypothesis is based on probability, so researchers sometimes make a mistake. However, inferential statistics gives researchers the probability of their making a mistake.
Here is the main point: In the United States system of jurisprudence, a defendant is "presumed innocent" until evidence calls this assumption into question. That is, the jury is told to assume that a person is innocent until they have heard all of the evidence and can make a decision. Likewise, in hypothesis testing, the null hypothesis is assumed to be true (i.e., it is assumed that there is no relationship) until evidence clearly calls this assumption into question. In jurisprudence, the jury rejects the claim of innocence (rejects the null) in the face of strong evidence to the contrary and reaches the opposite conclusion that the defendant is guilty. Likewise, in hypothesis testing, the researcher rejects the null hypothesis in the face of strong evidence to the contrary. In hypothesis testing, "strong evidence to the contrary" is found in a small probability value, which says the research result is unlikely if the null hypothesis is true. When the researcher rejects the null hypothesis (i.e., rejects the assumption of no relationship), he or she tentatively accepts the alternative hypothesis (i.e., the hypothesis that there is a relationship in the population). In short, in the procedure called hypothesis testing, the researcher states the null and alternative hypotheses. Then, if the probability value is small, the researcher rejects the null hypothesis, goes with the alternative hypothesis, and makes the claim that statistical significance has been found.
Now take a look at the research questions and the null and alternative hypotheses shown below and in Table 16.2 (p.474). When you look at the table be sure to notice that the null hypothesis has the equality sign in it and the alternative hypothesis has the "not equals" sign in it. You can also see in the table that hypotheses can be tested for many different kinds of research questions such as questions about means, correlations, and regression coefficients.
You may be wondering, when do you actually reject the null hypothesis and make the decision to tentatively accept the alternative hypothesis? Earlier I mentioned that you reject the null hypothesis when the probability of your result, assuming a true null, is very small. That is, you reject the null when the evidence would be unlikely under the assumption of the null. In particular, you set a significance level (also called the alpha level) to use in your research study, which is the point at which you would consider a result to be very unlikely. Then, if your probability value is less than or equal to your significance level, you reject the null hypothesis. It is essential that you understand the difference between the probability value (also called the p-value) and the significance level (also called the alpha level). The probability value is a number that is obtained from the SPSS computer printout. It is based on your empirical data, and it tells you the probability of your result or a more extreme result when it is assumed that there is no relationship in the population (i.e., when you are assuming that the null hypothesis is true, which is what we do in hypothesis testing and in jurisprudence). The significance level is just that point at which you would consider a result to be "rare." You are the one who decides on the significance level to use in your research study. A significance level is not an empirical result; it is the level that you set so that you will know what probability value will be small enough for you to reject the null hypothesis. The significance level that is usually used in education is .05. It boils down to this: if your probability value is less than or equal to the significance level (e.g., .05), then you will reject the null hypothesis and tentatively accept the alternative hypothesis. If not (i.e., if it is > .05), then you will fail to reject the null. You just compare your probability value with your significance level. You must memorize the definitions of probability value and significance level right away because they are at the heart of hypothesis testing. At the simplest level, the process comes down to seeing whether your probability value is less than (or equal to) your significance level. If it is, you are happy because you can reject the null hypothesis and make the claim of statistical significance. (Still, don't forget the last step of determining practical significance.)
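The decision rule itself is a single comparison. Here is a minimal Python sketch; the function name `decide` is hypothetical, the probability values in the loop are ones reported later in this chapter, and the significance level defaults to .05.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare a p-value (from your output) with the alpha level set in advance."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a probability value must lie between 0 and 1")
    # Reject when p <= alpha; otherwise we only "fail to reject" (never "accept") H0.
    return "reject H0" if p_value <= alpha else "fail to reject H0"


for p in (0.048, 0.001, 0.233):
    print(p, decide(p))
```

Note that alpha is an argument you choose before looking at the data, while the p-value comes from the analysis; the function only compares the two.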
This full process of hypothesis testing is summarized in Table 16.3 (p.480) and shown below. Be sure to note the final step shown in the table, because after conducting a hypothesis test, you must interpret your results, make a substantive, real-world decision, and determine the practical significance of your result. Here is Table 16.3, in case you don't have your book handy.
At the end of step 4 you will know whether your result is statistically significant. Step 5 shows that you must then decide what the results of your research study actually mean, because statistical significance does not tell you whether you have practical significance.
If a finding is statistically significant then you can claim that the evidence suggests that the observed result (e.g., your observed correlation or your observed difference between two means) was probably not just due to chance. That is, there probably is some nonzero relation present in the population. An effect size indicator can aid in your determination of practical significance and should always be examined to help interpret the strength of a statistically significant relationship. An effect size indicator is defined as a measure of the strength of a relationship. A finding is practically significant when the difference between the means or the size of the correlation is big enough, in your opinion, to be of practical use. For example, a correlation of .15 would probably not be practically significant, even if it was statistically significant. On the other hand, a correlation of .85 would probably be practically significant. Practical significance requires you to make a non-quantitative decision and to think about many different factors such as the size of the relationship, whether an intervention would transfer well to the real world, the costs of using a statistically significant intervention in the real world, etc. It is a decision that YOU make.
The next idea is for you to realize that whenever you conduct a hypothesis test, you will either make a correct decision about statistical significance or make an error. This idea is shown in Table 16.5 (p. 482), reproduced here for your convenience.
Looking at the top of the table (i.e., above the two columns) you will see that the null hypothesis is either true or not true in the empirical world. If you look at the side of the table (i.e., beside the two rows) you will see that you must make a decision to either fail to reject or to reject the null hypothesis. When the null is false you want to reject it, but when it is true you do not want to reject it. The four logical possibilities of hypothesis testing are shown in the table.
When the null hypothesis is true you can make the correct decision (i.e., fail to reject the null) or you can make the incorrect decision (rejecting the true null). The incorrect decision is called a Type I error or a "false positive" because you have erroneously concluded that there is an effect or relationship in the population. When the null hypothesis is false you can also make the correct decision (i.e., rejecting the false null) or you can make the incorrect decision (failure to reject the false null). The incorrect decision is called a Type II error or a "false negative" because you have erroneously concluded that there is no effect or relationship in the population. You need to memorize the definitions of Type I and Type II errors; after working with many examples of hypothesis testing, they will become easier to think about. Exercise: In law, a person is presumed to be innocent (i.e., innocence is the null hypothesis). Explain the idea of Type I and Type II errors here. Which error has occurred when an innocent person is found guilty? Which error has occurred when a guilty person is found innocent by the jury? (The answers are below.)
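A simulation makes the link between the significance level and the two kinds of errors concrete. The Python sketch below is an illustration under stated assumptions, not a procedure from the book: it repeatedly compares two groups drawn from normal populations with a known standard deviation, using a two-sided z test (chosen instead of the t-test so that only the standard library is needed). When the null hypothesis is true, every rejection is a Type I error, so the rejection rate should come out near alpha (.05). When the null is false (here, a hypothetical true difference of half a standard deviation), every failure to reject is a Type II error.

```python
import random
from statistics import NormalDist, fmean


def two_sided_p(mean_a, mean_b, sigma, n):
    """Two-sided p-value for a difference in means when sigma is known (z test)."""
    se = sigma * (2 / n) ** 0.5
    z = (mean_a - mean_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(true_diff, n=30, sigma=1.0, alpha=0.05, reps=4000, seed=1):
    """Share of simulated studies in which H0 (equal population means) is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        group_a = [rng.gauss(true_diff, sigma) for _ in range(n)]
        group_b = [rng.gauss(0.0, sigma) for _ in range(n)]
        if two_sided_p(fmean(group_a), fmean(group_b), sigma, n) <= alpha:
            rejections += 1
    return rejections / reps


type_i_rate = rejection_rate(true_diff=0.0)  # H0 true: rejections are Type I errors
power = rejection_rate(true_diff=0.5)        # H0 false: rejections are correct decisions
type_ii_rate = 1 - power                     # H0 false: failures to reject are Type II errors
print(f"Type I error rate: {type_i_rate:.3f}")
print(f"Type II error rate: {type_ii_rate:.3f}")
```

With these hypothetical settings, the Type I error rate stays close to the chosen alpha, while the Type II error rate depends on the true effect size and the sample size.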
Hypothesis Testing in Practice In this last section of the chapter, I apply the process of hypothesis testing (which is also called "significance testing") to the data set given in Table 15.1 (p. 435) and shown again here (below).
Since we are now using this data set for inferential statistics, we will assume that the 25 people were randomly selected. Note that there are three quantitative variables and two categorical variables (can you list them?). Also note that I will use the significance level of .05 for all of my statistical tests below.
(The answers to the earlier questions about the two types of errors: in the first case a Type I error was made, and in the second case a Type II error was made.)
Before I test some hypotheses, I want to point out the reason WHY we use hypothesis or significance testing: researchers do not want to interpret findings that are not statistically significant, because such findings are probably nothing more than a reflection of chance fluctuations.
Note that in all of the following examples I will be doing the same thing. I will get the p-value and compare it to my preset significance level of .05 to see if the relationship is statistically significant. Then I will also interpret the results by looking at the data, looking at an effect size indicator, and thinking about the practical importance of the result. Again, with practice, significance testing becomes very easy because you do the same procedure every single time. Determining practical significance is probably the hardest part. t-Test for Independent Samples One frequently used statistical test is called the t-test for independent samples. We use it when we want to determine whether the difference between two group means is statistically significant. Here is an example of the t-test for independent samples using our recent college graduate data set:
Research Question: Is the difference between the average starting salary for males and the average starting salary for females statistically significant? Here are the hypotheses (note that they are stated in terms of population parameters):
Null Hypothesis \(H_0: \mu_M = \mu_F\) (i.e., the population mean for males equals the population mean for females) Alternative Hypothesis \(H_1: \mu_M \neq \mu_F\) (i.e., the population mean for males does not equal the population mean for females)
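The SPSS output itself is not reproduced here, but the pieces behind it can be sketched. The Python below computes the pooled-variance t statistic for two independent groups and eta-squared (the between-groups sum of squares divided by the total sum of squares). The salary values are hypothetical (in thousands of dollars), not the Table 15.1 data, and turning t into a p-value is left to a statistics package because the Python standard library has no t distribution.

```python
from statistics import fmean, variance


def pooled_t(group_a, group_b):
    """Independent-samples t statistic (equal variances assumed) and its df."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    se = (pooled_var * (1 / na + 1 / nb)) ** 0.5
    return (fmean(group_a) - fmean(group_b)) / se, df


def eta_squared(*groups):
    """Proportion of total variance accounted for by group membership."""
    scores = [x for g in groups for x in g]
    grand = fmean(scores)
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    ss_total = sum((x - grand) ** 2 for x in scores)
    return ss_between / ss_total


males = [33, 34, 35]     # hypothetical starting salaries, $1,000s
females = [30, 31, 32]

t, df = pooled_t(males, females)
print(f"t({df}) = {t:.3f}, eta-squared = {eta_squared(males, females):.3f}")
```

For two groups, eta-squared also equals t² / (t² + df), so the effect size can be recovered from a reported t value and its degrees of freedom.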
The probability value was .048 (I got this off of my SPSS printout). Since my probability value of .048 is less than my significance level of .05, I reject the null hypothesis and tentatively accept the alternative. I conclude that the difference between the two means is statistically significant. Now I need to look at the actual means and interpret them for substantive and practical significance. The males' mean is $34,333.33 and the females' mean is $31,076.92. I can simply look at these means and see how different they are. To help in judging how different the means are, I also calculated an effect size indicator called eta-squared, which was equal to .16. This tells me that gender explains 16% of the variance in starting salary in my data set. I conclude that males earn more than females, and because this is an important issue in society, I also conclude that this difference is practically significant. One-Way Analysis of Variance One-way analysis of variance is used to compare two or more group means for statistical significance. Here is an example using our recent college graduate data set:
Research Question: Is there a statistically significant difference in the starting salaries of education majors, arts and sciences majors, and business majors? Here are the hypotheses (note that they are stated in terms of population parameters): Null Hypothesis \(H_0: \mu_E = \mu_{A\&S} = \mu_B\) (i.e., the population means for education students, arts and sciences students, and business students are all the same) Alternative Hypothesis \(H_1\): Not all equal (i.e., the population means are not all the same)
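As with the t-test, the F statistic and eta-squared behind the SPSS output can be sketched in a few lines. The Python below is a minimal one-way ANOVA; the three groups of salaries (in thousands of dollars) are hypothetical, and the p-value would still come from an F table or a statistics package.

```python
from statistics import fmean


def one_way_anova(*groups):
    """Return the F statistic, its (between, within) degrees of freedom, and eta-squared."""
    scores = [x for g in groups for x in g]
    grand = fmean(scores)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    # Variation of the scores around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    eta_sq = ss_between / (ss_between + ss_within)
    return f, (df_between, df_within), eta_sq


education = [28, 29, 30]        # hypothetical salaries, $1,000s
arts_sciences = [31, 32, 33]
business = [34, 35, 36]

f, dfs, eta_sq = one_way_anova(education, arts_sciences, business)
print(f"F{dfs} = {f:.2f}, eta-squared = {eta_sq:.2f}")
```

A large F only says that the group means are not all equal; it does not say which pairs differ, which is why post hoc tests follow.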
The probability value was .001 (I got this off of my SPSS printout). Since .001 is less than .05, I reject the null hypothesis and tentatively accept the alternative. I conclude that at least two of the means are significantly different. The effect size indicator, eta-squared, was equal to .467, which says that almost 47 percent of the variance in starting salary was explained or accounted for by differences in college major. Now I need to find out which of the three means are different. To decide which of these three means are significantly different, I must follow the post hoc testing procedure explained in the next section. Notice that if I had done an ANOVA with an independent variable composed of only two groups, I would not need follow-up tests (which are only needed when there are three or more groups). Post Hoc Tests in Analysis of Variance Here are the average starting salaries for the three groups examined in the previous analysis of variance (i.e., these are the three sample means): Education: $29,500; Arts and Sciences: $32,300; Business: $36,714.29. The question in post hoc testing is "Which pairs of means are significantly different?" In this case that results in three post hoc tests that need to be conducted: 1. First, is the difference between education and arts and sciences statistically significant? Here are the null and alternative hypotheses for this first post hoc test: Null Hypothesis \(H_0: \mu_E = \mu_{A\&S}\) (i.e., the population mean for education majors equals the population mean for arts and sciences majors)
Alternative Hypothesis \(H_1: \mu_E \neq \mu_{A\&S}\) (i.e., the population mean for education majors does not equal the population mean for arts and sciences majors) The Bonferroni "adjusted" p-value, which I got off the SPSS printout, was .233. Since .233 is > .05, I fail to reject the null that the population means for education and arts and sciences are equal. In short, this difference was not statistically significant.
2. Second, is the difference between education and business statistically significant? Here are the null and alternative hypotheses for this second post hoc test: Null Hypothesis \(H_0: \mu_E = \mu_B\) (i.e., the population mean for education majors equals the population mean for business majors) Alternative Hypothesis \(H_1: \mu_E \neq \mu_B\) (i.e., the population mean for education majors does not equal the population mean for business majors) The adjusted p-value was .001. Since .001 is < .05, I reject the null that the two population means are equal. I make the claim that the difference between the means is statistically significant. I also claim that the salaries are higher for business than for education students in the populations from which they were randomly selected. Because this finding could affect many students' choices about majors and because it may also reflect the nature of salary setting by the private versus public sectors, I also conclude that this difference is practically significant. 3. Third, is the difference between arts and sciences and business statistically significant? Here are the null and alternative hypotheses for this third post hoc test: Null Hypothesis \(H_0: \mu_B = \mu_{A\&S}\) (i.e., the population mean for business majors equals the population mean for arts and sciences majors) Alternative Hypothesis \(H_1: \mu_B \neq \mu_{A\&S}\) (i.e., the population mean for business majors does not equal the population mean for arts and sciences majors) The adjusted p-value was .031. Since .031 is < .05, I reject the null hypothesis that the two population means are equal. I make the claim that this difference between the means is statistically significant. I also claim that the salaries are higher for business than for arts and sciences students in the populations from which they were randomly selected. Because this finding could affect students' choices about majoring in business versus arts and sciences, I believe that this finding is practically significant.
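The Bonferroni adjustment multiplies each raw p-value by the number of comparisons (capping the result at 1), so that the adjusted values can be compared with the usual .05. We assume this is how the adjusted values on the SPSS printout were produced. In the Python sketch below, the group labels and raw p-values are hypothetical and deliberately not those of the salary example.

```python
from itertools import combinations


def bonferroni_adjust(p_values):
    """Multiply each raw p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


groups = ["Group 1", "Group 2", "Group 3"]  # hypothetical labels
pairs = list(combinations(groups, 2))       # every pair of groups: 3 comparisons
raw_p = [0.03, 0.001, 0.2]                  # hypothetical unadjusted p-values, one per pair

for (g1, g2), p_adj in zip(pairs, bonferroni_adjust(raw_p)):
    verdict = "significant" if p_adj <= 0.05 else "not significant"
    print(f"{g1} vs {g2}: adjusted p = {p_adj:.3f} ({verdict})")
```

Note how the first comparison, which would be significant at .05 on its own, is not significant once the adjustment for three comparisons is made.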
In short, based on my post hoc tests, I have found that two of the differences in starting salary were statistically significant, and, in my view, these differences were also practically significant. The t-Test for Correlation Coefficients This test is used to determine whether an observed correlation coefficient is statistically significant. Here is an example using our recent college graduate data set: Research Question: Is there a statistically significant correlation between GPA (X) and starting salary (Y)? Here are the hypotheses:
Null Hypothesis.
\(H_0: \rho_{XY} = 0\) (i.e., there is no correlation in the population)

Alternative Hypothesis.
\(H_1: \rho_{XY} \neq 0\) (i.e., there is a correlation in the population) The observed correlation in the sample was .63. The probability value was .001. Since .001 is < .05, I reject the null hypothesis. The observed correlation was statistically significant. I conclude that GPA and starting salary are correlated in the population. If you square the correlation coefficient you obtain a variance-accounted-for effect size indicator: .63 squared is .397, which means that almost 40 percent of the variance in starting salary is explained or accounted for by GPA. Because the effect size is large and because GPA is something that students can control through studying, I conclude that this statistically significant correlation is also practically significant.
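The t-test for a correlation coefficient converts r into a t statistic with n − 2 degrees of freedom. The Python sketch below computes Pearson's r from two lists and then t = r√(n − 2) / √(1 − r²); the data values are hypothetical, and converting t into a p-value is again left to a statistics package.

```python
import math
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation coefficient for paired data."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic (df = n - 2) for testing H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


gpa = [2.8, 3.0, 3.2, 3.4, 3.6, 3.9]  # hypothetical
salary = [29, 33, 30, 34, 33, 37]     # hypothetical, $1,000s

r = pearson_r(gpa, salary)
print(f"r = {r:.2f}, r squared = {r * r:.2f}, t({len(gpa) - 2}) = {t_for_r(r, len(gpa)):.2f}")
```

Applied to the values reported above (r = .63 with n = 25), `t_for_r` gives a t of about 3.89 with 23 degrees of freedom.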
The t-Test for Regression Coefficients This test is used to determine whether a regression coefficient is statistically significant. The multiple regression equation analyzed in the last chapter is shown here again, but this time we will test each of the two regression coefficients for statistical significance. \(\hat{Y} = 3{,}890.05 + 4{,}675.41(X_1) + 26.13(X_2)\) where \(\hat{Y}\) is predicted starting salary; 3,890.05 is the Y intercept (or predicted starting salary when GPA and GRE Verbal are zero); 4,675.41 is the regression coefficient for grade point average; 26.13 is the regression coefficient for GRE Verbal; \(X_1\) is grade point average (GPA); and \(X_2\) is GRE Verbal. Research Question One: Is there a statistically significant relationship between starting salary (Y) and GPA (\(X_1\)), controlling for GRE Verbal (\(X_2\))? That is, is the first regression coefficient statistically significant?

Here are the hypotheses: Null Hypothesis. \(H_0: \beta_{YX_1 \cdot X_2} = 0\) (i.e., the population regression coefficient expressing the relationship between starting salary and GPA, controlling for GRE Verbal, is equal to zero; that is, there is no relationship) Alternative Hypothesis. \(H_1: \beta_{YX_1 \cdot X_2} \neq 0\) (i.e., the population regression coefficient expressing the relationship between starting salary and GPA, controlling for GRE Verbal, is NOT equal to zero; that is, there IS a relationship) The observed regression coefficient was 4,675.41. The probability value was .035. Since .035 is < .05, I conclude that the relationship expressed by this regression coefficient is statistically significant. A good measure of effect size for regression coefficients is the semi-partial correlation squared (\(sr^2\)). In this case it is equal to .10, which means that 10% of the variance in starting salary is uniquely explained by GPA. Because GPA is something we can control and because the effect explains a good amount of variance in starting salary, I conclude that the relationship expressed by this regression coefficient is practically significant.
Research Question Two: Is there a statistically significant relationship between starting salary (Y) and GRE Verbal (X2), controlling for GPA (X1)? That is, is the second regression coefficient statistically significant?

Here are the hypotheses: Null Hypothesis. \(H_0: \beta_{YX_2 \cdot X_1} = 0\) (i.e., the population regression coefficient expressing the relationship between starting salary and GRE Verbal, controlling for GPA, is equal to zero; that is, there is no relationship) Alternative Hypothesis. \(H_1: \beta_{YX_2 \cdot X_1} \neq 0\) (i.e., the population regression coefficient expressing the relationship between starting salary and GRE Verbal, controlling for GPA, is NOT equal to zero; that is, there IS a relationship) The observed regression coefficient was 26.13. The probability value was .014. Since .014 is < .05, I conclude that the relationship expressed by this regression coefficient is statistically significant. A good measure of effect size for regression coefficients is the semi-partial correlation squared (\(sr^2\)). In this case it is equal to .15, which means that 15% of the variance in starting salary is uniquely explained by GRE Verbal. Because GRE Verbal is also something we can work at (as well as take preparation programs for) and because the effect explains 15% of the variance in starting salary, I conclude that the relationship expressed by this regression coefficient is practically significant.
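With two predictors, the squared semi-partial correlation for one predictor can be computed from the three pairwise correlations; it equals the increase in R² when that predictor is added to a model that already contains the other one. The Python sketch below uses the standard two-predictor formulas. The correlation values are hypothetical, because the pairwise correlations among GPA, GRE Verbal, and starting salary are not reported in this section.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """Multiple R-squared for Y regressed on X1 and X2, from pairwise correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def semipartial_sq(r_y1, r_y2, r_12):
    """Squared semi-partial correlation of Y with X1, with X2 partialled out of X1 only."""
    return (r_y1 - r_y2 * r_12) ** 2 / (1 - r_12 ** 2)


# Hypothetical correlations: Y with X1, Y with X2, and X1 with X2.
r_y1, r_y2, r_12 = 0.6, 0.5, 0.4
sr2_x1 = semipartial_sq(r_y1, r_y2, r_12)
gain = r_squared_two_predictors(r_y1, r_y2, r_12) - r_y2 ** 2  # R-squared gained by adding X1
print(f"sr2 for X1 = {sr2_x1:.3f}; R-squared gain from adding X1 = {gain:.3f}")
```

Setting `r_12` to 0 (uncorrelated predictors) makes the squared semi-partial equal to the simple r² of Y with X1; the more the predictors overlap, the less variance each one explains uniquely.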
The Chi-Square Test for Contingency Tables This test is used to determine whether a relationship observed in a contingency table is statistically significant. Research Question: Is the observed relationship between college major and gender statistically significant? The probability value was .046. Since .046 is < .05, I conclude that the observed relationship in the contingency table shown in Table 16.6 (p. 492) is statistically significant. The effect size indicator used for this contingency table is Cramér's V. It was equal to .496, which tells us that the relationship is moderately large. Because the effect size indicator suggested a moderately large relationship and because of the importance of these variables in real-world politics, I would also conclude that this relationship is practically significant.
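The chi-square statistic and Cramér's V can be computed directly from the cell counts. The Python sketch below does so for any two-way table; the counts in the example are hypothetical. For the p-value it uses a shortcut that is valid only when the table has 2 degrees of freedom (such as three majors by two genders): in that case the chi-square upper-tail probability is exactly exp(−χ²/2). Other table sizes need a statistics package.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic, degrees of freedom, and total N for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under H0
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, n


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V effect size: 0 means no relationship, 1 a perfect one."""
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))


def p_value_df2(chi2):
    """Chi-square upper-tail probability; this closed form holds ONLY for df = 2."""
    return math.exp(-chi2 / 2)


# Hypothetical counts: rows are three majors, columns are two genders.
table = [[6, 2], [3, 5], [2, 7]]
chi2, df, n = chi_square(table)
v = cramers_v(chi2, n, len(table), len(table[0]))
print(f"chi-square({df}) = {chi2:.2f}, p = {p_value_df2(chi2):.3f}, V = {v:.2f}")
```

As a rough consistency check on the values reported above (assuming the 25 graduates and a three-by-two table), V = .496 implies χ² ≈ .496² × 25 ≈ 6.15, and exp(−6.15/2) ≈ .046, which matches the reported probability value.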
Believe it or not, we are done. My goal in this last section was to show that every single time you do one of these tests, you do the same thing. You get your probability value, compare it to your significance level, and, finally, make a decision. You have now come a long way toward understanding the logic of significance testing. Remember, when reading journal articles, look out for those probability values (to see if they are less than .05), and also look for effect sizes and statements about whether a finding is practically significant. Congratulations!
Chapter 17 Qualitative Data Analysis (Reminder: Don't forget to utilize the concept maps and study questions as you study this and the other chapters.) The purposes of this chapter are to help you to grasp the language and terminology of qualitative data analysis and to help you understand the process of qualitative data analysis. Interim Analysis Data analysis tends to be an ongoing and iterative (nonlinear) process in qualitative research. The term we use to describe this process is interim analysis (i.e., the cyclical process of collecting and analyzing data during a single research study). Interim analysis continues until the process or topic the researcher is interested in is understood (or until you run out of time and resources!). Memoing Throughout the entire process of qualitative data analysis it is a good idea to engage in memoing (i.e., recording reflective notes about what you are learning from your data). The idea is to write memos to yourself when you have ideas and insights and to include those memos as additional data to be analyzed. Data Entry and Storage Qualitative researchers usually transcribe their data; that is, they type the text (from interviews, observational notes, memos, etc.) into word processing documents. It is these transcriptions that are later analyzed, typically using one of the qualitative data analysis computer programs discussed later in this chapter. Coding and Developing Category Systems This is the next major stage of qualitative data analysis. It is here that you carefully read your transcribed data, line by line, and divide the data into meaningful analytical units (i.e., segmenting the data). When you locate meaningful segments, you code them. Coding is defined as marking the segments of data with symbols, descriptive words, or category names. Again, whenever you find a meaningful segment of text in a transcript, you assign a code or category name to signify that particular segment.
You continue this process until you have segmented all of your data and have completed the initial coding. During coding, you must keep a master list (i.e., a list of all the codes that are developed and used in the research study). Then, the codes are reapplied to new segments of data each time an appropriate segment is encountered. To experience the process of coding, look at Table 17.2 and then try to segment and code the data. After you are finished, compare your results with the results shown in Table 17.3. These are shown here for your convenience.
Don't be surprised if your results are different from mine. As you can see, qualitative research is very much an interpretative process!
Now look at how I coded the above data...
Qualitative research is more defensible when multiple coders are used and when high inter- and intra-coder reliability are obtained. Intercoder reliability refers to consistency among different coders. Intracoder reliability refers to consistency within a single coder. Inductive and a Priori Codes There are many different types of codes that are commonly used in qualitative data analysis. You may decide to use a set of already existing codes with your data. These are called a priori codes. A priori codes are codes that are developed before examining the current data. Many qualitative researchers like to develop the codes as they code the data. These codes are called inductive codes. Inductive codes are codes that are developed by the researcher by directly examining the data. Co-Occurring and Facesheet Codes As you code your data, you may find that the same segment of data gets coded with more than one code. That's fine, and it commonly occurs. These sets of codes are called co-occurring codes. Co-occurring codes are codes that partially or completely overlap. In other words, the same lines or segments of text may have more than one code attached to them. Oftentimes you may have an interest in the characteristics of the individuals you are studying. Therefore, you may use codes that apply to the overall protocol or transcript you are coding. For example, in looking at language development in children you might be interested in age or gender. These codes that apply to the entire document or case are called facesheet codes. After you finish the initial coding of your data, you will attempt to summarize and organize your data. You will also continue to refine and revise your codes. This next major step of summarizing your results includes such processes as enumeration and searching for relationships in the data. Enumeration Enumeration is the process of quantifying data, and yes, it is often done in "qualitative" research.
For example, you might count the number of times a word appears in a document, or the number of times a code is applied to the data. Enumeration is very helpful in clarifying words that you will want to use in your report, such as "many," "some," "a few," and "almost all"; the numbers help clarify what you mean by frequency. When reading "numbers" in qualitative research, you should always check the basis of the numbers. For example, if one word occurs many times and the basis is the total number of words in all the text documents, the reason could be that many people used the word, or it could be that only one person used the word many times.
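The point about the basis of a count can be made concrete. Below is a minimal Python sketch, assuming coded data are stored as (participant, code) pairs; the participants, codes, and counts are invented for illustration. It counts each code in two ways: total applications, and the number of distinct participants whose data received the code.

```python
from collections import Counter

# Hypothetical coded segments: (participant ID, code applied to one segment).
# These data are invented for illustration only.
segments = [
    ("P1", "peer support"), ("P1", "peer support"), ("P1", "peer support"),
    ("P1", "peer support"), ("P2", "time pressure"), ("P3", "peer support"),
    ("P3", "time pressure"), ("P4", "time pressure"),
]

def code_applications(segments):
    """Basis 1: total number of times each code was applied."""
    return Counter(code for _, code in segments)

def participants_per_code(segments):
    """Basis 2: number of distinct participants whose data received each code."""
    people = {}
    for participant, code in segments:
        people.setdefault(code, set()).add(participant)
    return {code: len(ids) for code, ids in people.items()}

if __name__ == "__main__":
    total_participants = len({p for p, _ in segments})
    applications = code_applications(segments)
    breadth = participants_per_code(segments)
    for code in sorted(applications):
        print(f"{code}: applied {applications[code]} times, "
              f"by {breadth[code]} of {total_participants} participants")
```

In these made-up data, "peer support" is applied more often (5 times) than "time pressure" (3 times), yet it comes from only two of the four participants, so reporting the first number alone would overstate how widespread it is.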
Creating Hierarchical Category Systems
Sometimes codes or categories can be organized into different levels or hierarchies. For example, the category of fruit has many types falling under it (e.g., oranges, grapefruit, and kiwi). The idea is that some ideas or themes are more general than others, and thus the codes are related vertically. One interesting example (shown in Figure 17.2 on page 512) is Frontman and Kunkel's hierarchical classification of counselors' construal of success in the initial counseling session (i.e., what factors counselors view as being related to success). Their classification system has four levels and many categories. Here is a part of their hierarchical category system:
Showing Relationships Among Categories
Qualitative researchers have a broad view of what constitutes a relationship. The hierarchical system just shown is one type of relationship (a hierarchy, or strict inclusion, type).
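A strict-inclusion hierarchy is essentially a tree. The minimal Python sketch below stores one as nested dictionaries and answers the question "is X a kind of Y?" It reuses the fruit example from this chapter; the intermediate "citrus" level is a hypothetical addition (not from the text) so that the tree has more than two levels.

```python
from typing import Dict, List, Optional

# Each key is a category; its value holds that category's subcategories.
Hierarchy = Dict[str, "Hierarchy"]

# The chapter's fruit example. The "citrus" level is hypothetical.
fruit: Hierarchy = {
    "fruit": {
        "citrus": {"oranges": {}, "grapefruit": {}},
        "kiwi": {},
    }
}

def path_to(tree: Hierarchy, item: str) -> Optional[List[str]]:
    """Return the chain of categories from the top level down to `item`, or None."""
    for category, children in tree.items():
        if category == item:
            return [category]
        sub = path_to(children, item)
        if sub is not None:
            return [category] + sub
    return None

def is_kind_of(tree: Hierarchy, item: str, category: str) -> bool:
    """Strict inclusion: True if `item` falls somewhere below `category`."""
    path = path_to(tree, item)
    return path is not None and category in path[:-1]

if __name__ == "__main__":
    print(path_to(fruit, "oranges"))
    print(is_kind_of(fruit, "kiwi", "citrus"))
```

Because inclusion is checked along the whole path, oranges count as a kind of fruit even though they sit two levels below it.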
Several other types of relationships that you should be on the lookout for are listed in Table 17.6 (p. 514) and reproduced below for your convenience.
For practice, see if you can think of an example of each of Spradley's types of relationships. Also, see if you can think of some types of relationships that Spradley did not mention.
In Figure 17.3 you can see a typology, developed by Patton, of teacher roles in dealing with high school dropouts.
Typologies (also called taxonomies) are an example of Spradley's "strict inclusion" type of relationship.
Patton's example is interesting because it demonstrates a strategy that you can use to relate separate dimensions found in your data. Patton first developed two separate dimensions (continua, or one-dimensional typologies) in his data: (1) teachers' beliefs about how much responsibility they should take and (2) teachers' views about effective intervention strategies. He then crossed the two one-dimensional typologies to form a two-dimensional matrix, resulting in a new typology that relates the two dimensions. As you can see, Patton gave the nine roles in the matrix very descriptive labels (e.g., "Ostrich," "Counselor/friend," "Complainer"). In Table 17.7 (p. 517, reproduced here for your convenience), you can see another set of categories developed from a qualitative study in developmental psychology. These categories are ordered by time and show the characteristics (subcategories) associated with five stages of development in old age that were identified in the study. This is an example of Spradley's "sequence" type of relationship. Here is Table 17.7:
In the next section of the chapter, we discuss another tool for organizing and summarizing your qualitative research data: diagramming.
Drawing Diagrams
Diagramming is the process of making a sketch, drawing, or outline to show how something works or to clarify the relationship between the parts of a whole. Diagrams are especially helpful for visually oriented learners. There are many types of diagrams that can be used in qualitative research; for some examples, look again at Figure 17.2 (p. 512) and Figure 17.3 (p. 516). One type of diagram used in qualitative research that is similar to the diagrams used in causal modeling (e.g., Figure 11.5 on page 352) is the network diagram. A network diagram is a diagram showing the direct links between categories, variables, or events over time. An example of a network diagram based on qualitative research is shown in Figure 17.4 and below for your convenience.
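A network diagram can also be stored as a directed graph in which each category points to the categories it links to directly. The minimal Python sketch below does not reproduce Figure 17.4; its categories and links are hypothetical. It lists the direct links (the arrows you would draw) and uses a breadth-first search to find every category reachable through a chain of links.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical network: each category maps to the categories it links to directly.
# These categories are invented for illustration and are not taken from Figure 17.4.
network: Dict[str, List[str]] = {
    "mentor assigned": ["attendance improves"],
    "tutoring offered": ["homework completed"],
    "attendance improves": ["homework completed"],
    "homework completed": ["grades improve"],
    "grades improve": [],
}

def edges(graph: Dict[str, List[str]]) -> List[str]:
    """Return each direct link as an 'A -> B' string (one arrow in the diagram)."""
    return [f"{src} -> {dst}" for src, targets in graph.items() for dst in targets]

def downstream(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Breadth-first search: all categories reachable from `start` via one or more links."""
    seen: Set[str] = set()
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(graph.get(node, []))
    return seen

if __name__ == "__main__":
    for arrow in edges(network):
        print(arrow)
    print(sorted(downstream(network, "mentor assigned")))
```

Separating direct links from reachable ones mirrors the distinction a network diagram makes visually: an arrow is a direct link, while a chain of arrows suggests an indirect one.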
It is also helpful to develop matrices to depict your data. A matrix is a rectangular array formed into rows and columns. Patton's typology of teacher roles shown above is an example of a matrix. You can see examples of many different types of matrices (classifications usually based on two or more dimensions) and diagrams in Miles and Huberman's (1994) helpful book, "Qualitative Data Analysis: An Expanded Sourcebook." Developing a matrix is an excellent way both to find and to show a relationship in your qualitative data.
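Patton's strategy of crossing two one-dimensional typologies can be sketched as taking every combination of their levels and then placing cases into the resulting cells. The minimal Python sketch below assumes three levels per dimension (giving nine cells, as in Patton's matrix), but the level names and the teacher data are hypothetical placeholders, not Patton's actual categories or role labels.

```python
from itertools import product
from typing import Dict, List, Tuple

Cell = Tuple[str, str]

# Hypothetical levels for each one-dimensional typology (placeholders only).
responsibility = ["low", "medium", "high"]   # responsibility teachers take
strategy = ["view A", "view B", "view C"]    # views on effective intervention

def cross_typologies(rows: List[str], cols: List[str]) -> List[Cell]:
    """All (row level, column level) combinations, i.e., the cells of the matrix."""
    return list(product(rows, cols))

def fill_matrix(cases: Dict[str, Cell], rows: List[str], cols: List[str]) -> Dict[Cell, List[str]]:
    """Place each case in the cell matching its two classifications.

    Raises KeyError if a case uses a level that is not in `rows` or `cols`.
    """
    matrix: Dict[Cell, List[str]] = {cell: [] for cell in cross_typologies(rows, cols)}
    for case_id, cell in cases.items():
        matrix[cell].append(case_id)
    return matrix

# Hypothetical coded teachers: (responsibility level, strategy view).
teachers: Dict[str, Cell] = {
    "T1": ("high", "view A"),
    "T2": ("low", "view C"),
    "T3": ("high", "view A"),
}

if __name__ == "__main__":
    matrix = fill_matrix(teachers, responsibility, strategy)
    print("".ljust(10) + "".join(c.ljust(10) for c in strategy))
    for r in responsibility:
        counts = "".join(str(len(matrix[(r, c)])).ljust(10) for c in strategy)
        print(r.ljust(10) + counts)
```

In an actual study, the researcher would then give each occupied cell a descriptive label, as Patton did with his nine roles.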
As you can see, there are many interesting kinds of relationships to look for in qualitative research, and there are many different ways to find, depict, and present them in your qualitative research report. (More information about writing the qualitative report is given in the next chapter.)
Corroborating and Validating Results
As shown in the depiction of data analysis in qualitative research in Figure 17.1, corroborating and validating the results is an essential component of data analysis and of the qualitative research process. Corroborating and validating should be done throughout qualitative data collection, analysis, and write-up, because you want to present trustworthy results to your readers; otherwise, there is no reason to conduct a research study. Many strategies are provided in Chapter 8, especially in Table 8.2, which is reproduced here for your convenience.
Computer Programs for Qualitative Data Analysis
In this final section of the chapter, we discuss the use of computer programs in qualitative data analysis. Traditionally, qualitative data were analyzed "by hand" using some form of filing system. The availability of computer packages specifically designed for qualitative data analysis has significantly reduced the need for the traditional filing technique.
At the time of writing, the most popular qualitative data analysis packages were NUD*IST, ATLAS.ti, and Ethnograph.
Here is a table not included in your book that provides the links to the major qualitative software programs. Most of these companies will provide you, free of charge, with demonstration copies of these packages.
Bonus Table: Websites for Qualitative Data Analysis Programs
Program name        Website address
AnSWR (freeware)    http://www.cdc.gov/hiv/software/answr.htm
ATLAS.ti            http://atlasti.de/
Ethnograph          http://qualisresearch.com
HyperResearch       http://researchware.com
NVivo               http://www.qsrinternational.com
NUD*IST             http://www.qsrinternational.com
Qualitative data analysis programs can facilitate most of the techniques we have discussed in this chapter (e.g., storing and coding, creating classification systems, enumeration, attaching memos, finding relationships, and producing graphics). One highly useful feature of computer packages is the use of Boolean operators, which make it possible to perform complex searches that would be very time-consuming if done manually. Boolean operators are words that are used to create logical combinations, such as AND, OR, NOT, IF, THEN, and EXCEPT. For example, you can search for the co-occurrence of codes, which is one way to begin identifying relationships among your codes.
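To see what a Boolean code search does, here is a minimal Python sketch, assuming each text segment is stored with the set of codes attached to it. The segments and codes are hypothetical, and real packages provide their own query tools; this only illustrates the AND, OR, and NOT logic. An AND search returns the segments where two codes co-occur.

```python
from typing import Dict, Set

# Hypothetical coded transcript: segment ID -> set of codes attached to it.
coded: Dict[str, Set[str]] = {
    "seg1": {"peer support", "motivation"},
    "seg2": {"time pressure"},
    "seg3": {"peer support", "time pressure"},
    "seg4": {"motivation"},
}

def search_and(data: Dict[str, Set[str]], a: str, b: str) -> Set[str]:
    """Segments coded with both a AND b (co-occurring codes)."""
    return {seg for seg, codes in data.items() if a in codes and b in codes}

def search_or(data: Dict[str, Set[str]], a: str, b: str) -> Set[str]:
    """Segments coded with a OR b (or both)."""
    return {seg for seg, codes in data.items() if a in codes or b in codes}

def search_not(data: Dict[str, Set[str]], a: str, b: str) -> Set[str]:
    """Segments coded with a but NOT with b."""
    return {seg for seg, codes in data.items() if a in codes and b not in codes}

if __name__ == "__main__":
    print(sorted(search_and(coded, "peer support", "time pressure")))
    print(sorted(search_not(coded, "peer support", "time pressure")))
```

Running the AND search over every pair of codes is one simple way to list which codes co-occur most often before looking more closely at those segments.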
The chapter concludes by listing several advantages and disadvantages of computer packages for qualitative data analysis. You now know the basics of qualitative data analysis!
Chapter 18 Writing the Research Report
(Reminder: Don't forget to use the concept maps and study questions as you study this and the other chapters.) The purpose of this final chapter is to provide useful advice on how to organize and write a research paper that has the potential for publication. There are four main sections in this chapter:
1. General Principles Related to Writing the Research Report
2. Writing Quantitative Research Reports Using the APA Style
3. Writing Qualitative Research Reports
4. Writing Mixed Research Reports
General Principles Related to Writing the Research Report
We begin this section with some general writing tips and by listing some sources on writing. Simple, clear, and direct communication should be your most important goal when you write a research report.
Language
The following three guidelines will help you select appropriate language in your report:
1. Choose accurate and clear words that are free from bias. One way to do this is to be specific rather than general.
2. Avoid labeling people whenever possible.
3. Write about your research participants in a way that acknowledges their participation. For example, avoid the impersonal terms "subject" and "subjects"; terms such as research participants, children, or adults are preferable.
Keeping these guidelines in mind, give special attention to the following issues, which are explained more fully in our chapter and, especially, in the APA Publication Manual:
Gender. The bottom line is to avoid sexist language.
Sexual Orientation. Terms such as homosexual should be replaced with terms such as lesbians, gay men, and bisexual women or men. Specific instances of sexual behavior should be referred to with terms such as same gender, male-male, female-female, and male-female.
Racial and Ethnic Identity. Ask participants about their preferred designations and use them. Capitalize these designations (e.g., African American).
Disabilities. Do not equate people with their disability. For example, refer to a participant as a person who has cancer rather than as a cancer victim.
Age. Acceptable terms are boy and girl, young man and young woman, and male adolescent and female adolescent. Older person is preferred to elderly. Refer to people aged 18 and older as men and women.
Editorial Style
Italics
As a general rule, use italics infrequently. If you are submitting a paper for publication, you can now use italics directly rather than using underlines to signal what is to be italicized.
Abbreviations
Use abbreviations sparingly, and try to use conventional abbreviations (such as IQ, e.g., cf., and i.e.).
Headings
The APA Manual and our chapter specify five different levels of headings and the combinations in which they are to be used in your report.
If you are using two levels of headings, center the first level in upper- and lowercase letters (i.e., do not use all caps), and place the second level flush left, in upper- and lowercase letters and in italics. Here is an example:
Method (centered)
Procedure (flush left, italicized)
If you are using three levels of headings, do the first two levels as just shown for two levels. The third level should be in upper- and lowercase letters, italicized, indented, and ending with a period. Here is an example of how to use three levels of headings:
Method (centered)
Procedure (flush left, italicized)
Instruments. (indented and italicized; start the text on this same line)
Quotations
Quotations of fewer than 40 words should be inserted into the text and enclosed in double quotation marks. Quotations of 40 or more words should be displayed in a free-standing block of lines without quotation marks. Always include the author, year, and specific page from which the quote is taken.
Numbers
Use words for numbers that begin a sentence and for numbers below ten. See the APA Publication Manual for exceptions to this rule.
Physical Measurements
APA recommends using metric units for all physical measurements. You can also use other units, as long as you include the metric equivalent in parentheses.
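As a small illustration of this rule, here is a minimal Python sketch that appends the metric equivalent in parentheses. The conversion factors are the exact defined values (1 in = 2.54 cm, 1 ft = 0.3048 m, 1 lb = 0.45359237 kg); the unit abbreviations and the two-decimal rounding are assumptions for illustration, so check the APA manual for the abbreviation style it expects.

```python
# Exact (defined) factors for converting a few common non-metric units.
TO_METRIC = {
    "in": (2.54, "cm"),
    "ft": (0.3048, "m"),
    "lb": (0.45359237, "kg"),
}

def with_metric(value: float, unit: str, decimals: int = 2) -> str:
    """Return e.g. '12 in (30.48 cm)': the original value, then its metric equivalent."""
    if unit not in TO_METRIC:
        raise ValueError(f"no metric conversion defined for {unit!r}")
    factor, metric_unit = TO_METRIC[unit]
    return f"{value:g} {unit} ({value * factor:.{decimals}f} {metric_unit})"

if __name__ == "__main__":
    print(with_metric(12, "in"))
    print(with_metric(150, "lb"))
```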
Presentation of Statistical Results
Provide enough information to allow the reader to corroborate the results. See your book and the APA manual for specifics (e.g., an analysis of variance significance test of four group means would be presented like this: F(3, 32) = 8.79, p = .03). Note that the use of an equal sign is preferred when reporting probability values. If a probability value is less than .001, use p < .001 rather than p = .000.
Reference Citations in the Text
APA format is an author-date citation method; the text shows the specifics. Here is one example: "Smith (1999) found that . . ." Frequently you will put references at the end of sentences. Here is an example: "Mastery motivation has been found to affect achievement with very young children (Turner & Johnson, 2003)."
Reference List
All citations in the text must appear in the reference list. See page 456 of the text or the APA Manual for the specific format to follow. Here are two examples:
American Psychological Association. (1994). Publication manual of the American Psychological Association (4th ed.). Washington, DC: Author.
Turner, L. A., & Johnson, R. B. (2003). A model of mastery motivation for at-risk preschoolers. Journal of Educational Psychology, 95(3), 495-505.
Typing
Double-space all material. Use 1-inch margins. Use only one space between the end of a sentence and the beginning of the next sentence.
Writing Quantitative Research Reports Using the APA Style
There are seven major parts to the research report:
1. Title page
2. Abstract
3. Introduction
4. Method
5. Results
6. Discussion
7. References
I will make a few brief comments on each of these below.
Discussion of author notes, footnotes, tables, figure captions, and figures is only in the textbook (and, of course, in the APA Publication Manual).
1. Title Page. Your paper title should summarize the main topic of the paper in about 10 to 12 words.
2. Abstract. This should be a comprehensive summary of about 120 words. For a manuscript submitted for review, it is typed on a separate page.
3. Introduction. This section is not labeled. It should present the research problem and place it in the context of other research literature in the area.
4. Method. This section does not start on a separate page in a manuscript being submitted for review. The most common subsections are Participants (e.g., the number of participants, their characteristics, and how they were selected), Apparatus or Materials or Instruments (e.g., the materials used and how they can be obtained), and Procedure (e.g., a step-by-step account of what the researcher and participants did during the study, so that someone could replicate it).
5. Results. This section does not start on a separate page in your manuscript. It is where you report the results of your data analysis and statistical significance testing. Be sure to report the significance level that you are using (e.g., "An alpha level of .05 was used in this study") and report your observed effect sizes along with the tests of statistical significance. Tables and figures are expensive but can be used when they effectively illustrate your ideas.
6. Discussion. This is where you interpret and evaluate the results presented in the previous section. Be sure to state whether your hypotheses were supported. Also, answer the following questions:
(a) What does the study contribute?
(b) How has it helped solve the study problem?
(c) What conclusions and theoretical implications can be drawn from the study?
(d) What are the limitations of the study?
(e) What are some suggestions for future research in this area?
7. References. Center the word References at the top of the page and double-space all entries.
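The reporting guidelines in this chapter (the test statistic with its degrees of freedom, an exact p value with an equal sign, p < .001 for very small values, and an effect size alongside the test) can be collected in a small helper. This is a minimal Python sketch, not an official APA tool: the rounding rules are assumptions, plain text cannot show the italics APA uses for statistical symbols, and eta squared (computed as F × df_between / (F × df_between + df_within), which holds for a one-way analysis of variance) is only one of several effect size measures you might report. The p value is passed in rather than computed.

```python
def apa_number(x: float, decimals: int) -> str:
    """Format a value that cannot exceed 1 (e.g., p or eta squared) without a leading zero."""
    text = f"{x:.{decimals}f}"
    return text[1:] if text.startswith("0") else text

def format_p(p: float) -> str:
    """Exact p with an equal sign; p < .001 for very small values (never p = .000)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if p < 0.001:
        return "p < .001"
    # Assumed rounding rule: two decimals, or three when p is below .01.
    return f"p = {apa_number(p, 2 if p >= 0.01 else 3)}"

def eta_squared(f: float, df_between: int, df_within: int) -> float:
    """Eta squared for a one-way ANOVA, recovered from F and its degrees of freedom."""
    return (f * df_between) / (f * df_between + df_within)

def format_anova(f: float, df_between: int, df_within: int, p: float) -> str:
    """Build a report string such as 'F(3, 32) = 8.79, p = .03, η² = .45'."""
    eta2 = eta_squared(f, df_between, df_within)
    return (f"F({df_between}, {df_within}) = {f:.2f}, {format_p(p)}, "
            f"η² = {apa_number(eta2, 2)}")

if __name__ == "__main__":
    # Uses the chapter's example values for F, its degrees of freedom, and p.
    print(format_anova(8.79, 3, 32, 0.03))
```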
Writing Qualitative Research Reports
We recommend that qualitative researchers also follow the guidelines given above when writing manuscripts for publication, including the same seven major parts that were discussed for the quantitative research report.
Title Page and Abstract. The goals are exactly the same as before. You should provide a clear and descriptive title. The abstract should describe the key focus of the study, its key methodological features, and the most important findings.
Introduction. Clearly explain the purpose of your study and situate it in the research literature that is relevant to your study. In qualitative research, research questions will typically be stated in more open-ended and general forms, stating that the researcher hopes to "discover," "explore a process," "explain or understand," or "describe the experiences."
Method. It is important that qualitative researchers always include this section in their reports. This section includes information telling how the study was done, where it was done, with whom it was done, why the study was designed as it was, how the data were collected and analyzed, and what procedures were carried out to ensure the validity of the arguments and conclusions made in the report.
Results. The overriding concern when writing the results section is to provide sufficient and convincing evidence. Remember that assertions must be backed up with empirical data. The bottom line is this: It's about evidence.
- You will need to find an appropriate balance between description and interpretation in order to write a useful and convincing results section.
- Several specific strategies are discussed in the chapter (e.g., providing quotes, following interpretative statements with examples).
- Regardless of the specific format of your results section, you must always provide data (i.e., descriptions, quotes, data from multiple sources, and so forth) that back up your assertions.
- Effective ways to organize the results section include organizing the content around the research questions, a typology created in the study, the key themes, or a conceptual scheme used in the study.
- It can also be very helpful to use diagrams, matrices, tables, and figures to help communicate your ideas in a qualitative research report.
Discussion. You should state your overall conclusions and offer additional interpretations in this section of the report. Even if your research is exploratory, it is important to fit your findings back into the relevant research literature. You may also make suggestions for future research here.
Writing Mixed Research Reports
First, know your audience and write in a manner that clearly communicates.
The suggestions already discussed in this chapter for quantitative and qualitative reports also apply to mixed research. In general, try to use the same seven headings discussed above. Here are a few organization options:
1. Organize the introduction, method, and results by research question.
2. Organize some sections (e.g., method and results) by research paradigm (quantitative and qualitative).
3. Write essentially two separate subreports (one for the qualitative part and one for the quantitative part).
Note: In all cases, if you are writing a mixed research report, mixing must take place somewhere (e.g., at a minimum, the findings must be related and mixed in the discussion section).