
CHAPTER 4:

CRITERIA FOR INFERRING EFFECTIVENESS:


HOW DO WE KNOW WHAT WORKS?

Important Terminology – Learning the language of research


Hypothesis: Tentative statement about the relationship between the independent variable (IV) and the dependent variable (DV).
Variable: An element, feature, or factor that can be changed, controlled, or measured in an experiment.
Independent (predictor) Variable: The variable(s) that we think causes or is associated with some change in the DV (for example, amount of time studied).
Dependent (outcome) Variable: The variable that we think is being acted upon (for example, the grade you get on the midterm).
Levels of Measurement for Variables. Four levels of measurement have been identified.
✓ Understanding the level of measurement of variables used in research is important because the level of
measurement determines the types of statistical analyses that can be conducted.
✓ The conclusions that can be drawn from research depend on the statistical analysis used.
LEVELS AND DESCRIPTIONS
1. NOMINAL: Nominal level measurement uses symbols to classify observations into mutually exclusive and exhaustive categories: no observation falls into more than one category, and sufficient categories must exist so that every observation falls into some category.
This is the most basic level of measurement.
✓ At this level we can determine only whether two observations are alike or different.
Example: In a survey of teachers, sex (biological and physiological characteristics) was determined by a question. Observations were sorted into two mutually exclusive and exhaustive categories, male and female (a dichotomous variable). Observations could be labeled with the letters M and F, or the numerals 0 and 1.
(*Note: it is also possible to include another category, intersex. Cf. the APA definitions of terms: https://www.apa.org/pi/lgbt/resources/sexuality-definitions.pdf)
✓ In the same survey, the variable of marital status could be measured by two categories (married and unmarried). But these categories must each be defined so that every possible observation fits into one category and no more than one: legally married, common-law marriage, religious marriage, civil marriage, living together, never married, divorced, informally separated, legally separated, widowed, annulled, abandoned, etc.
2. ORDINAL: Ordinal level measurement classifies observations into categories that are mutually exclusive and exhaustive. In addition, the categories can be ordered/ranked.
Example: job satisfaction can be ranked as very satisfied, satisfied, neutral, dissatisfied, or very dissatisfied; first, second, third place; private, sergeant, colonel.
3. INTERVAL: Interval level measurement classifies observations into mutually exclusive and exhaustive categories that can be ordered or ranked according to equal increments. No meaningful zero point exists.
Example: degrees Fahrenheit; standard scores on cognitive and affective scales.
4. RATIO: The ratio level is the same as the interval level, with the difference that 0 is meaningful and non-arbitrary: it truly indicates the absence of what is measured.
Example: area, speed, kelvins, number of schools attended, amount of money in your wallet/purse.
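The hierarchy above can be summarized as a small lookup: each level permits the operations of the levels below it plus one more. A minimal Python sketch (the operation names and example variables are illustrative, not an official taxonomy):

```python
# Illustrative sketch of the four levels of measurement.
# Each level supports the operations of the level above it, plus one more.
LEVELS = {
    "nominal":  {"operations": ["classify"],
                 "example": "sex, marital status"},
    "ordinal":  {"operations": ["classify", "rank"],
                 "example": "job satisfaction rating"},
    "interval": {"operations": ["classify", "rank", "add/subtract"],
                 "example": "degrees Fahrenheit"},
    "ratio":    {"operations": ["classify", "rank", "add/subtract", "ratio"],
                 "example": "amount of money in your wallet"},
}

def permitted(level):
    """Return the operations that are meaningful at a given level."""
    return LEVELS[level]["operations"]

print(permitted("ordinal"))   # ['classify', 'rank'] -- ranking, but no arithmetic
```

This is why the level of measurement constrains the statistics: a mean assumes "add/subtract" is meaningful (interval or above), while a mode only assumes "classify".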
FOUR KEY QUESTIONS FOR CRITICALLY APPRAISING EFFECTIVENESS STUDIES
*these are very important for our work this semester*
Four elements to address when critically appraising a study that examines the effectiveness of an intervention, program, or policy are:
1. Internal validity: Was the intervention, program, or policy the most plausible cause of the observed
outcome?
2. Measurement validity and bias: Was the outcome measured in a valid and unbiased manner?
3. Statistical conclusion validity: What is the probability that the apparent effectiveness, or lack thereof,
can be attributed to statistical chance?
4. External validity: Do the study participants, intervention procedures, and results seem applicable to
your practice context?
INTERNAL VALIDITY An effectiveness study has internal validity to the extent that design arrangements enable us to
determine whether the observed outcome is due to the intervention, not something else.
CAUSAL CRITERIA THREE CRITERIA ARE NEEDED TO ESTABLISH THAT AN INTERVENTION IS
REALLY THE CAUSE OF OBSERVED OUTCOMES:
1. Time order: The intervention must precede or coincide with the change in the outcome being
measured.
2. Correlation: Changes in the outcome measure must be associated with changes in the intervention
condition.
3. Plausible alternative explanations for the correlation must be ruled out.

Common Threats to Internal Validity


1. History: The possibility that other events may have coincided with the provision of the intervention and may be the real cause of the study outcome. Example: Following the Katrina disaster, victims were provided with eye movement desensitization and reprocessing (EMDR), as well as other services. Any reduction in trauma symptoms may be due to the EMDR or to the other services.
2. Passage of Time: The possibility that the mere passage of time may be the real cause of observed client
improvement after the onset of treatment. Example: Many survivors of natural disasters, such as Katrina, would
have experienced a decrease in trauma symptoms over time without any treatment.
3. Maturation: The possibility that developmental growth or change may be the real cause of the study outcome.
Example: If adolescents are provided with a long-term drug abuse intervention and followed into adulthood, many
of them will quit using drugs simply due to increasing maturity.
4. Statistical Regression to the Mean: The possibility that extreme scores at pretest that are not reflective of typical
functioning are the real cause of the study outcome. Example: Some individuals with unusually high scores on an
anxiety measure may simply be having an atypically bad day. If they are treated with an intervention, any
reduction in anxiety may simply be due to a return to more typical levels of anxiety.
5. Selectivity bias: The possibility that differences in outcomes between two groups being compared may be
because the two groups were not really equivalent (comparable) to begin with. Example: If a group of people
receives an intervention because they volunteered to receive it, and their outcomes are compared to those who
declined treatment, differences in outcome may be due to preexisting factors such as motivation.
6. Random Assignment (a safeguard against these threats, not a threat itself): Randomly assigning study participants to treatment conditions is the best way to create comparable study groups and avoid selectivity bias. Random assignment also controls for other threats to internal validity, including statistical regression, history, maturation, and passage of time.
If random assignment is used, with a sufficiently large number of participants:
✓ Contemporaneous events are unlikely to influence one group differently than the other.
✓ Each group experiences the same passage of time.
✓ Each group will experience similar developmental changes.
✓ Therefore, differences between the groups are more plausibly attributed to the treatment.
Random assignment works better with larger numbers of participants. If there are relatively few participants in a study, it is plausible that even randomly assigned groups will not be comparable. Researchers should provide data demonstrating that the random assignment "worked" and that the groups truly are comparable. Approaches to random assignment include tossing a coin and using a table of random numbers.
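The coin toss and random numbers table are what a computer's random number generator automates. A minimal Python sketch of random assignment to two equal groups (the function name and the two-group split are illustrative):

```python
import random

def randomly_assign(participants, seed=None):
    """Randomly assign participants to two equal groups.

    Shuffling and then splitting gives every participant the same chance
    of landing in either condition, which is what protects against
    selectivity bias. The seed is only for reproducibility of the example.
    """
    rng = random.Random(seed)
    shuffled = list(participants)   # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

treatment, control = randomly_assign(list(range(80)), seed=42)
print(len(treatment), len(control))   # 40 40
```

Note that the sketch guarantees equal group sizes but not equal group characteristics; as the text says, with few participants even randomly assigned groups can fail to be comparable, so researchers should still report evidence that the groups match at baseline.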
MEASUREMENT ISSUES (Measurement validity and bias)
Internal validity is not the only problem that can threaten the quality of a study. Measurement procedures may be
unreliable, invalid, or biased. If results of a study fail to support the effectiveness of an intervention, it may be that the
outcome measure was unreliable or invalid.
1. Reliability. A measure’s consistency. Example: A clinical instrument for depression indicates the same level of
depression in the same client from one day to the next.
2. Validity: A measure’s ability to accurately capture what it intends to measure. Example: An assessment tool
designed to measure anger management accurately assesses these skills as opposed to attitudes or knowledge.
3. Bias. Occurs when something influences the research data to incorrectly depict a phenomenon as being better or
worse than it really is. Typically more of a concern when studies indicate treatment success. Example: A
researcher tells participants the purpose of a study and his or her hope to prove the treatment is a success before
measuring the outcome.
4. Blinding: An approach to reducing the threat of measurement bias whereby those who are administering study
measures do not know the treatment status of the people they are assessing. The people who are collecting the
measures are called blind raters, and their ratings are called blind ratings.
Keep in mind:
✓ Some studies do not provide sufficient detail in their descriptions for an accurate assessment of their approach to
reducing study bias.
✓ Many studies have measurement flaws, but not all measurement flaws are fatal.

STATISTICAL CHANCE (Statistical conclusion validity)


Study results may be attributed to statistical chance. Most studies rule out chance (or sampling error) as a plausible explanation for the observed outcome if the probability is no more than .05 (5 in 100). When the probability of chance as an explanation for study findings is that low, the study outcome is called statistically significant. Regardless of the statistical test used in a study, it will ultimately report a probability (p) value. This refers to the probability that the results are due to chance. The smaller the p value, the less likely the results are due to chance. This usually appears in the form of p < .05 or p > .05. (In some works, you will see p values < .01 or < .001, which indicate an even smaller probability that the results occurred by chance.)
The smaller the sample size, the more difficult it is to achieve a statistically significant result. If a study fails to get statistically significant results, this may be due to a small sample size; in that case, the intervention should not be dismissed as ineffective. Conversely, studies with very large sample sizes can achieve statistical significance even when the results are trivial. Finding a "significant difference" when there really is no effect is a Type I error; failing to find a "significant difference" when there really is one, which is more likely with a small sample, is a Type II error.
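The logic behind a p value can be illustrated with a simple permutation test: pool the two groups' scores, reshuffle them many times, and count how often chance alone produces a difference in means at least as large as the one observed. A sketch in Python with made-up symptom scores (this shows the idea of a p value, not the specific test any given study would report):

```python
import random

def _mean(xs):
    return sum(xs) / len(xs)

def permutation_p_value(treated, control, n_permutations=10_000, seed=0):
    """Estimate the probability that chance alone produces a difference
    in group means at least as large as the observed one (two-sided)."""
    rng = random.Random(seed)
    observed = abs(_mean(treated) - _mean(control))
    pooled = list(treated) + list(control)
    n = len(treated)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)          # a chance re-labeling of who got treatment
        if abs(_mean(pooled[:n]) - _mean(pooled[n:])) >= observed:
            hits += 1
    return hits / n_permutations

# Made-up symptom scores for two small groups (lower = fewer symptoms).
p = permutation_p_value([4, 5, 6, 5, 7], [8, 9, 7, 8, 10])
print(p < .05)   # prints True: this difference is unlikely to be chance alone
```

With only five participants per group, a real but modest effect would often fail to reach p < .05 here, which is exactly the small-sample Type II risk described above.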

RULE OF THUMB:
Studies with far fewer than 80 total participants (or 40 in each treatment group) are “small”. Very large samples can
involve hundreds or even thousands of participants. Statistical significance refers only to the plausibility of chance as an
explanation for study results, and it does not address problems like measurement bias.

EXTERNAL VALIDITY. External validity is the extent to which results apply to settings other than the research
setting and clients other than the study participants. For evidence-based practitioners, the key question is whether the
study participants and practice setting reflect the client and setting of concern to the practitioner. Example: If you are
treating elderly Asian immigrant clients residing in a nursing home, to what extent do the studies you are reviewing apply
to this group of clients and this treatment setting?
Summary:
INTERNAL VALIDITY
Was the intervention, program, or policy the most plausible cause of the observed outcome?
Internal validity refers to the confidence we have that the results of a study accurately depict whether one variable is or is not a cause of another.
There are three criteria for establishing causation:
1. TIME ORDER: the cause must precede the effect.
2. CORRELATION: the two variables in a causal relationship must be empirically correlated with one another (a change in one variable is accompanied by a change in the other variable).
3. LACK OF PLAUSIBLE ALTERNATIVE EXPLANATION: the empirical correlation between two variables cannot be explained away as being due to the influence of some third variable that relates to both of them.
Threats to Internal Validity:
1. HISTORY (or contemporaneous events): the possibility that other events may have coincided with the provision of the intervention and may be the real cause of the observed outcome.
2. PASSAGE OF TIME OR MATURATION of participants (*maturation is especially relevant for young adults).
3. STATISTICAL REGRESSION TO THE MEAN: your average score would be close to the scores you get on your typical days, and your elevated or low scores would regress to the mean (return to their more typical level) on most days.
4. SELECTIVITY BIAS: if your study compares two groups, it is important to understand/examine whether the two groups are really comparable. Example: check for the proportion of minorities, level of functioning, economic or social support resources, and severity of the disorder. Random assignment can help reduce selectivity bias by ensuring that each participant has the same chance of being assigned to each treatment condition.
MEASUREMENT VALIDITY
Was the outcome measured in a valid and unbiased manner?
Reliability and validity of the outcome measure.
STATISTICAL CONCLUSION VALIDITY
What is the probability that the apparent effectiveness, or lack thereof, can be attributed to statistical chance?
*Influence of SAMPLE SIZE on the chances of getting significant results: increasing the sample size decreases the probability that meaningful differences in outcomes between groups could have been produced just by chance.
Two practical implications: first, if a study failed to reach significance but the results seem potentially meaningful, the intervention should not be dismissed as necessarily ineffective. Second, if the sample is very large, results that are trivial from a practical standpoint can achieve statistical significance.
**Rule of thumb for sample size: below 80 total participants (or fewer than 40 per group) is small.
EXTERNAL VALIDITY
Do the study participants, intervention procedures, and results seem applicable to your practice context?
Applicability of the inferences about the effectiveness of the intervention to settings other than the research setting and to clients other than the study participants.
WHAT IS INCLUDED IN A MANUSCRIPT?
Overview of the three main Content Areas of the Quantitative Research Article
(You can refer to the Holosko Chps.5-8 for review)

Usually in a quantitative research article you will find:


1. Abstract
2. Introduction
3. Method
4. Results
5. Discussion
6. References
*a Conclusions paragraph is not always required by the journal
We will now follow the classification offered by Holosko so that you can review the contents in the book.
Introduction - Review of Literature and Purpose
Method - Sample, Procedures, Measures
Results
Discussion, Implications

1. The INTRODUCTION is usually divided into two subsections: the Review of the Literature and the Purpose of the Study. However, this order can be inverted, or the Purpose of the Study can be embedded within the review of the literature.
The introduction section answers the question: Why is this study being conducted?

Elements of the Literature Review (with comments):
1. What is the P.O.I. of the phenomenon being studied? The authors review the prevalence, occurrence, and incidence of the phenomenon being studied.
2. Is this a balanced (pro and con) literature review? The authors present both pro and con literature to support their study.
3. Is the rationale for the study clear? The rationale for the study is clearly stated.
4. Does the literature justify the approach? It is important that the literature presented is related to the topic of the study and that it clearly justifies the purpose of the study.
5. Adequacy: It is important that the review of the literature is adequate.
Types of Study Purpose (with comments):
1. Statement of Purpose: The statement of purpose is usually a sentence that begins with "The purpose of this study is…"
2. Objectives: It is also possible that authors offer the readers a list of study objectives. They can be serialized (1, 2, 3) and/or listed as primary (major) or secondary (minor) objectives.
3. Research Questions: Authors may also decide to illustrate the study purposes by presenting a series of questions that the study hopes to answer.
4. Hypotheses: Authors may also decide to present their hypotheses, i.e., the relationships they expect to identify.
2. METHOD: The Method section describes the strategies used in recruiting study participants and conducting the analysis. It is a way to show the authors' understanding of the phenomenon and to illustrate the assumptions and scientific rigor applied when designing and implementing the study.
It includes four main areas: Sample Selection/Participants, Study Design, Data Collection Procedures, and Instruments.
3. RESULTS: The Results section presents the findings of the study. Findings are usually summarized in tables or figures to facilitate the review of important results.
4. DISCUSSION AND IMPLICATIONS: This is where the authors discuss what they have found and what it means in relation to the current state of the literature, by placing it within the larger body of evidence.
