Validity determines whether a research study measures what it intends to measure. There are two types of validity: internal validity, which concerns the causal conclusions of a study, and external validity, which concerns the generalizability of the results. Threats to internal validity include history effects, maturation effects, and issues with instrumentation that could provide alternative explanations for any observed changes in the dependent variable other than the independent variable. Maintaining validity is crucial for drawing accurate conclusions from research.
Validity asks whether the research truly measures that which it was intended to measure, or how truthful the research results are.
◦ In other words, does the research instrument allow you to hit "the bull’s eye" of your research object?
◦ Researchers generally determine validity by asking a series of questions, and will often look for the answers in the research of others.

What is External Validity?
In 1966, Campbell and Stanley proposed the commonly accepted definition of external validity.
◦ “External validity asks the question of generalizability: To what populations, settings, treatment variables and measurement variables can this effect be generalized?”
External validity is usually split into two distinct types, population validity and ecological validity, and both are essential elements in judging the strength of an experimental design.

Population validity is a type of external validity that describes how well results from the sample can be extrapolated to the population as a whole. It evaluates whether the sample represents the entire population, and also whether the sampling method is acceptable. For example, an educational study that looked at a single school could not be generalised to cover children at every school. On the other hand, a federally commissioned study that tested every pupil of a certain age group would have exceptionally strong population validity.

Ecological validity is a type of external validity that looks at the testing environment and determines how much it influences behavior. In the school test example, if the pupils are used to regular testing, then the ecological validity is high because the testing process is unlikely to affect behavior. On the other hand, taking each child out of class and testing them individually, in an isolated room, will dramatically lower ecological validity. The child may be nervous or ill at ease, and is unlikely to perform in the same way as in a classroom.

What is Internal Validity?
The easy way to describe internal validity is as the confidence that we can place in the cause-and-effect relationship in a study. It is a measure of how closely a researcher's experimental design follows the principle of cause and effect. The key question that you should ask in any experiment is:
◦ “Could there be an alternative cause, or causes, that explain my observations and results?”
To take an extreme example, a physics experiment into the effect of heat on the conductivity of a metal has high internal validity, because other possible causes can be tightly controlled.
Test validity is an indicator of how much meaning can be placed upon a set of test results: the degree to which a test procedure accurately measures what it was designed to measure.
Criterion validity assesses whether a test reflects a certain set of abilities. The validity of a selection test, for instance, is assessed by comparing the test scores with a non-test criterion. For example, a test for leadership skills will match the test scores with the traits displayed by known leaders.

Concurrent validity is a measure of how well a particular test correlates with a previously validated measure. It is commonly used in social science, psychology and education. The tests are for the same, or very closely related, constructs and allow a researcher to validate new methods against a tried and tested stalwart. IQ, Emotional Quotient, and most school grading systems are good examples of established tests that are regarded as having high validity. One common way of looking at concurrent validity is as measuring a new test or procedure against a gold-standard benchmark.

Predictive validity is a measure of how well a test predicts abilities. It involves testing a group of subjects for a certain construct and then comparing the scores with results obtained at some point in the future. Most educational and employment tests are used to predict future performance, so predictive validity is regarded as essential in these fields.

Content validity is an estimate of how well a measure represents every single element of a construct. For example, an educational test with strong content validity will represent the subjects actually taught to students, rather than asking unrelated questions.

Face validity is a measure of how representative a research project is "at face value", and whether it appears to be a good project. In other words, a test can be said to have face validity if it "looks like" it is going to measure what it is supposed to measure. For instance, if you prepare a test to measure whether students can perform multiplication, and the people you show it to all agree that it looks like a good test of multiplication ability, you have shown the face validity of your test.

What is Reliability?
The idea behind reliability is that any
significant results must be more than a one-off finding and be inherently repeatable. Other researchers must be able to perform exactly the same experiment, under the same conditions, and generate the same results. This will reinforce the findings and ensure that the wider scientific community will accept the hypothesis.

Reliability is an essential component of validity but, on its own, is not a sufficient measure of validity. A test can be reliable but not valid, whereas a test cannot be valid yet unreliable. Reliability, in simple terms, describes the repeatability and consistency of a test.
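
The claim that a test can be reliable but not valid can be illustrated with a small sketch. It assumes a hypothetical scale that always reads 5 kg too heavy; the weights, the `pearson_r` helper, and the `mean_error` helper are invented for illustration and do not come from any study. Test-retest consistency is estimated here with a Pearson correlation between two runs, which is one common choice rather than the only one.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def mean_error(measured, truth):
    """Average signed difference between measurements and true values."""
    return sum(m - t for m, t in zip(measured, truth)) / len(truth)

# Hypothetical true weights (kg) of five objects.
true_weights = [50.0, 60.0, 70.0, 80.0, 90.0]

# Hypothetical scale that always reads 5 kg too heavy, used on two occasions.
first_run = [w + 5.0 for w in true_weights]
second_run = [w + 5.0 for w in true_weights]

# Reliability: do repeated measurements agree with each other?
test_retest_r = pearson_r(first_run, second_run)

# Validity: do the measurements agree with the real world?
bias = mean_error(first_run, true_weights)

print(f"test-retest correlation: {test_retest_r:.2f}")
print(f"systematic error: {bias:.1f} kg")
```

The two runs agree perfectly with each other, so the scale is highly reliable, yet every reading is off by the same amount, so it does not describe the true weights.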
Validity defines the strength of the final results and whether they can be regarded as accurately describing the real world.

Threats to Internal & External Validity
Is the investigator’s conclusion correct?
Are the changes in the independent variable indeed responsible for the observed variation in the dependent variable?
Might the variation in the dependent variable be attributable to other causes?

Why is Internal Validity Important?
We often conduct research in order to determine cause-and-effect relationships. Can we conclude that changes in the independent variable caused the observed changes in the dependent variable? Is the evidence for such a conclusion good or poor? If a study shows a high degree of internal validity, then we can conclude we have strong evidence of causality. If a study has low internal validity, then we must conclude we have little or no evidence of causality.

History
Events outside of the study/experiment or
between repeated measures of the dependent variable may affect participants' responses to experimental procedures. Often, these are large-scale events (natural disaster, political change, etc.) that affect participants' attitudes and behaviors such that it becomes impossible to determine whether any change on the dependent measures is due to the independent variable or to the historical event. For example, consider a researcher testing whether a particular chemical increases anxiety. The researcher measures New York City residents' anxiety on September 5, 2001, gives them the chemical, and measures them again 20 days later. Scores are likely to have increased because of the terrorist attacks on September 11th, an external event. In addition to well-publicized national events, history effects can include subtle factors such as changes in the weather (for example, improving mood because people are outside more) or changes in public policy (for example, increasing stress because of changes to bankruptcy laws).

Maturation
Maturation occurs when subjects change as a function of time rather than because of the independent variable (for example by growing older, becoming more experienced, or experiencing physiological changes). Subjects may change during the course of the experiment or even between measurements.
Examples:
• Assessing a new teaching technique in children over a school year may be very difficult because of the changes that naturally occur in children's development during a year.
• Participants may become more conscious of health issues as they age during a study and therefore begin to reduce smoking.
• The performance of first graders in a learning experiment begins decreasing after 45 minutes because of fatigue.
• Young children might mature, and their ability to concentrate may change as they grow up.
Both permanent changes, such as physical growth, and temporary ones, like fatigue, provide "natural" alternative explanations; they may also change the way a subject would react to the independent variable. So upon completion of the study, the researcher may not be able to determine whether a discrepancy is due to time or to the independent variable.
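
One common way to separate history and maturation effects from the effect of the independent variable, which the notes above do not describe, is to compare the treated group with a control group measured at the same two points in time. The sketch below uses invented reading scores and hypothetical helper names (`mean`, `mean_gain`); it assumes both groups experience the same outside events and the same natural development, which is an assumption rather than something the data can guarantee.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

def mean_gain(pre, post):
    """Change in the group mean between pre-test and post-test."""
    return mean(post) - mean(pre)

# Hypothetical reading scores at the start and end of a school year.
treated_pre = [40, 45, 50, 55]     # children taught with the new technique
treated_post = [52, 57, 62, 67]
control_pre = [41, 44, 51, 54]     # children taught as usual
control_post = [48, 51, 58, 61]

# Naive estimate: everything that changed is credited to the technique.
naive_effect = mean_gain(treated_pre, treated_post)

# Change seen without the technique: history and maturation alone.
time_effect = mean_gain(control_pre, control_post)

# Estimate after removing the change that time alone would have produced.
adjusted_effect = naive_effect - time_effect

print(f"naive gain: {naive_effect}")
print(f"gain from time alone: {time_effect}")
print(f"adjusted effect: {adjusted_effect}")
```

In these made-up numbers, more than half of the apparent gain would have occurred anyway, which is exactly the rival explanation that history and maturation provide.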
Instrumentation
The instruments used for measuring the behavior may change over time (especially if they are human observers), or the way in which the participants are measured may change. Instrument decay occurs when the standards of measurement change over time. The question to ask is: did any change occur during the study in the way the dependent variable was measured?
Example: Two examiners for an instructional experiment administered the post-test with different instructions and procedures.

Volunteer problem
Participants who volunteer to be part of a study (and are not paid in any way) tend to be “different” from those who do not volunteer (Bell, 1962). Volunteers tend to be:
• More unconventional
• More self-confident
• More extraverted
• Higher in need for achievement
Avoid this by “paying” subjects in some form.

Confounding
A major threat to the validity of causal inferences is confounding: changes in the dependent variable may instead be attributable to the existence of, or variations in the degree of, a third variable that is related to the manipulated variable.

Repeated testing (also referred to as testing effects)
Repeatedly measuring the participants may lead to bias. Participants may remember the correct answers or may be conditioned to know that they are being tested. Repeatedly taking the same or similar intelligence tests usually leads to score gains, so the gains cannot be taken as evidence that the underlying skills have changed for good; practice with the test is a plausible rival hypothesis.

Selection bias
Selection bias refers to how participants are selected for the various groups in the study. Are the groups equivalent at the beginning of the study? The problem is that differences between groups may already exist at pre-test, and these may interact with the independent variable and thus be "responsible" for the observed outcome. Researchers and participants bring to the experiment a myriad of characteristics, some learned and others inherent: for example, sex, weight, hair, eye, and skin color, personality, mental capabilities, and physical abilities, but also attitudes like motivation or willingness to participate.

Mortality/differential attrition
This error occurs if inferences are made on the basis of only those participants who took part from the start to the end. Participants may have dropped out of the study before completion, possibly because of the study, programme, or experiment itself. For example, in a health experiment designed to determine the effect of various exercises, those subjects who find the exercise most difficult stop participating.

Experimenter bias
Experimenter bias occurs when the individuals who are conducting an experiment inadvertently affect the outcome by non-consciously behaving differently to members of control and experimental groups.
It is possible to eliminate experimenter bias through the use of double-blind study designs, in which the experimenter is not aware of the condition to which a participant belongs.

Contamination
This occurs when there is communication about the experiment between groups of participants. There are three possible outcomes of contamination:
• resentment: some participants’ performance may worsen because they resent being in a less desirable condition;
• rivalry: participants in a less desirable condition may boost their performance so they don’t look bad; and
• diffusion of treatments: control participants learn about a treatment and apply it to themselves.

Threats to external validity
A threat to external validity is an explanation of how you might be wrong in making a generalization. Generally, generalizability is limited when the
cause (i.e. the independent variable) depends on other factors; therefore, all threats to external validity interact with the independent variable.

Aptitude–treatment interaction: The sample may have certain features that interact with the independent variable, limiting generalizability. For example, comparative psychotherapy studies often employ specific samples (e.g. volunteers, the highly depressed). If psychotherapy is found effective for these sample patients, will it also be effective for non-volunteers, the mildly depressed, or patients with other concurrent disorders?

Situation: All situational specifics of a study (e.g. treatment conditions, time, location, lighting, noise, treatment administration, investigator, timing, scope and extent of measurement) potentially limit generalizability.

Pre-test effects: If cause-effect relationships can only be found when pre-tests are carried out, then this also limits the generality of the findings.

Post-test effects: If cause-effect relationships can only be found when post-tests are carried out, then this also limits the generality of the findings.