TEST BANK
MULTIPLE CHOICE
ANS [C]
LOC: Concepts and Variables
TIP: Constructing Questions
[LO 2]
COG [Knowledge]
DIF [Easy]
9. “Compared with other campuses with which you are familiar, this
campus’s use of alcohol is … (choose one): 1) Greater than for other campuses,
2) Less than for other campuses, 3) About the same as for other campuses.”
This is which type of question? (4-13)
A) Open-ended
B) Closed-ended
C) Exhaustive
D) Other
ANS [B]
LOC: How Will We Know When We’ve Found It?
TIP: Constructing Questions
[LO 2]
COG [Knowledge]
DIF [Easy]
Instructor Resource
Bachman, Fundamentals of Research in Criminology and Criminal Justice, 4th Edition
SAGE Publishing, 2018
14. The use of multiple methods to study one research question is called (4-7)
A) Statistical divergence
B) Measurement interrelationship
C) Triangulation
D) Combination of operations
ANS [C]
LOC: How Will We Know When We’ve Found It?
TIP: Combining Measurement Operations
[LO 2]
COG [Comprehension]
DIF [Medium]
19. This type of validity establishes that a measure covers the full range of the
concept’s meaning (4-16)
A) Construct
B) Criterion
C) Content
D) Face
ANS [C]
LOC: Did We Measure What We Wanted to Measure?
TIP: Content Validity
[LO 5]
COG [Evaluation]
DIF [Hard]
20. When people drink alcohol, the alcohol is absorbed into their bloodstream
and is then gradually metabolized in their liver, which may be measured to
determine their ‘blood alcohol’ level. This may be used as an indirect measure to
validate self-reports about alcohol consumption during a certain period of time.
This is an example of __________ validity (4-16)
A) Criterion
B) Construct
C) Content
D) Face
ANS [A]
LOC: Did We Measure What We Wanted to Measure?
TIP: Criterion Validity
[LO 5]
COG [Evaluation]
DIF [Hard]
21. If students are given arithmetic tests covering roughly the same information this Friday and next Friday, and they score approximately the same on both tests, the test is said to have (4-17)
A) Validity
B) Inter-item probability
C) Split-halves probability
D) Reliability
ANS [D]
LOC: Did We Measure What We Wanted to Measure?
TIP: Reliability
[LO 7]
COG [Analysis]
DIF [Hard]
[LO 6]
COG [Comprehension]
DIF [Medium]
are (4-11)
A) Nominal level of measurement
B) Ordinal level of measurement
C) Interval level of measurement
D) Ratio level of measurement
ANS [C]
LOC: How Much Information Do We Really Have?
TIP: Interval Level of Measurement
[LO 4]
COG [Comprehension]
DIF [Easy]
28. A variable measured with fixed measuring units and an absolute zero point is at the (4-11)
A) Nominal level of measurement
B) Ordinal level of measurement
C) Interval level of measurement
D) Ratio level of measurement
ANS [D]
LOC: How Much Information Do We Really Have?
TIP: Ratio Level of Measurement
[LO 4]
COG [Comprehension]
DIF [Easy]
29. Available criminal justice data include official data, such as (4-22)
A) Government statistics
B) U.S. Census Bureau statistics
C) Uniform Crime Reports
D) All of the above
ANS [D]
LOC: How Will We Know When We’ve Found It?
TIP: Using Available Data
[LO 7]
COG [Knowledge]
DIF [Easy]
TRUE/FALSE
ANS [B]
LOC: How Will We Know When We’ve Found It?
TIP: Reliability
[LO 7]
COG [Comprehension]
DIF [Medium]
ANS [B]
LOC: Did We Measure What We Wanted to Measure?
TIP: Split Halves Reliability
[LO 6]
COG [Comprehension]
DIF [Hard]
A) TRUE
B) FALSE
ANS [A]
ESSAY
LOC: Concepts
TIP: Concepts
[LO 1]
COG [Knowledge]
DIF [Medium]
fewer units of conflict than the high level of conflict, which is represented by the
number 3. These numbers really have no mathematical qualities; they are just
used to represent relative rank in the measurement of conflict.
Ordinal level of measurement A measurement of a variable in which the numbers indicating the variable’s value specify only the order of the cases, permitting greater than and less than distinctions.
As with nominal variables, the different values of a variable measured at the
ordinal level must be mutually exclusive and exhaustive. They must cover the
range of observed values and allow each case to be assigned no more than one
value.
The Favorable Attitudes Toward Antisocial Behavior Scale measures attitudes
toward antisocial behavior among high school students with a series of questions
that each involves an ordinal distinction (see Exhibit 4.5). The response choices
to each question range from “very wrong” to “not wrong at all”; there’s no
particular quantity of “wrongness” that these distinctions reflect, but the idea is
that a student who responds that it is “not wrong at all” to a question about taking
a handgun to school has a more favorable attitude toward antisocial behavior
than does a student who says it is “a little bit wrong,” which is in turn more
favorable than those who respond “wrong” or “very wrong.”
Discrete measure is a measure that classifies cases in distinct categories.
An index is a composite measure based on summing, averaging, or otherwise
combining the responses to multiple questions that are intended to measure the
same variable; sometimes called a scale.
Exhibit 4.5 Example of Ordinal Measures: Favorable Attitudes Toward Antisocial
Behavior Scale
1. How wrong do you think it is for someone your age to take a handgun to
school?
Very wrong Wrong A little bit wrong Not wrong at all
2. How wrong do you think it is for someone your age to steal anything
worth more than $5?
Very wrong Wrong A little bit wrong Not wrong at all
3. How wrong do you think it is for someone your age to pick a fight with
someone?
Very wrong Wrong A little bit wrong Not wrong at all
4. How wrong do you think it is for someone your age to attack someone
with the idea of seriously hurting them?
Very wrong Wrong A little bit wrong Not wrong at all
5. How wrong do you think it is for someone your age to stay away from
school all day when their parents think they are at school?
Very wrong Wrong A little bit wrong Not wrong at all
Sources: Lewis, Chandra, Gwen Hyatt, Keith Lafortune, and Jennifer Lembach.
2010. History of the Use of Risk and Protective Factors in Washington State’s
Healthy Youth Survey. Portland, OR: RMC Research Corporation.
See also Arthur, Michael W., John S. Briney, J. David Hawkins, Robert D. Abbott,
Blair L. Brooke-Weiss, and Richard F. Catalano. 2007. “Measuring Risk and
5—the numbers can be compared in a ratio. Ratio numbers can be added and
subtracted, and because the numbers begin at an absolute zero point, they can
be multiplied and divided (so ratios can be formed between the numbers). For
example, people’s ages can be represented by values ranging from 0 years (or
some fraction of a year) to 120 or more. A person who is 30 years old is 15 years
older than someone who is 15 years old (30 – 15 = 15) and is twice as old as that
person (30/15 = 2). Of course, the numbers also are mutually exclusive and
exhaustive, so that every case can be assigned one and only one value.
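The age arithmetic above can be checked directly, since ratio-level values support both differences and ratios (the ages are the ones used in the example):

```python
# Ratio-level measurement: fixed units plus an absolute zero point,
# so both differences AND ratios between values are meaningful.
age_a = 30  # years
age_b = 15  # years

difference = age_a - age_b  # person A is 15 years older
ratio = age_a / age_b       # person A is twice as old

print(difference, ratio)  # 15 2.0
```

Note that neither operation would be meaningful for nominal or ordinal codes, where the numbers only label or rank categories.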
Exhibit 4.6 Ordinal-Level Variables Can Be Added to Create an Index With
Interval-Level Properties: Core Alcohol and Drug Survey
How do you think your close friends feel (or would feel) about you... (mark one for each line)
Response choices for each item: Do Not Disapprove | Disapprove | Strongly Disapprove
b. Smoking marijuana occasionally
c. Smoking marijuana regularly
e. Taking cocaine regularly
h. Trying amphetamines once or twice
i. Taking amphetamines regularly
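The idea behind Exhibit 4.6, summing ordinal items to build an index with interval-level properties, can be sketched as follows; the 1–3 coding of the response choices is an assumption for illustration, not the Core Survey's actual coding:

```python
# One respondent's answers to the five exhibit items, coded
# 1 = Do Not Disapprove, 2 = Disapprove, 3 = Strongly Disapprove.
# (This coding is illustrative; the survey's own coding may differ.)
responses = {
    "b_marijuana_occasionally": 2,
    "c_marijuana_regularly": 3,
    "e_cocaine_regularly": 3,
    "h_amphetamines_once_twice": 2,
    "i_amphetamines_regularly": 3,
}

# The index is the sum of the item codes. With 5 items it ranges
# from 5 (disapproves of nothing) to 15 (strongly disapproves of
# everything); sums of many ordinal items are commonly treated as
# having interval-level properties.
index_score = sum(responses.values())
print(index_score)  # 13
```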
less likely to respond accurately (Corse, Hirschinger, & Zanis, 1995). These
types of possibilities should always be considered when evaluating measurement
validity.
Face Validity
Researchers apply the term face validity to the confidence gained from careful
inspection of a concept to see if it is appropriate “on its face.” More precisely, we
can say that a measure has face validity if it obviously pertains to the concept
being measured more than to other concepts (Brewer & Hunter, 1989, p. 131).
For example, if college students’ alcohol consumption is what we are trying to
measure, asking for students’ favorite color seems unlikely on its face to tell us
much about their drinking patterns. A measure with greater face validity would be
a count of how many drinks they had consumed in the past week.
Face validity is the type of validity that exists when an inspection of the items
used to measure a concept suggests that they are appropriate “on their face.”
Although every measure should be inspected in this way, face validation on its
own is not the gold standard of measurement validity. The question “How much
beer or wine did you have to drink last week?” may look valid on its face as a
measure of frequency of drinking, but people who drink heavily tend to
underreport the amount they drink. So the question would be an invalid measure
in a study that includes heavy drinkers.
Content Validity
Content validity establishes that the measure covers the full range of the
concept’s meaning. To determine that range of meaning, the researcher may
solicit the opinions of experts and review literature that identifies the different
aspects of the concept. An example of a measure that covers a wide range of
meaning is the Michigan Alcoholism Screening Test (MAST). The MAST includes
24 questions representing the following subscales: recognition of alcohol
problems by self and others; legal, social, and work problems; help seeking;
marital and family difficulties; and liver pathology (Skinner & Sheu, 1982). Many
experts familiar with the direct consequences of substance abuse agree that
these dimensions capture the full range of possibilities. Thus, the MAST is
believed to be valid from the standpoint of content validity.
Content validity is the type of validity that establishes that a measure covers the full range of the concept’s meaning.
Criterion Validity
Consider the following scenario: When people drink an alcoholic beverage, the
alcohol is absorbed into their bloodstream and then gradually metabolized
(broken down into other chemicals) in their liver (NIAAA, 1997). The alcohol that
remains in their blood at any point, unmetabolized, impairs both thinking and
behavior (NIAAA, 1994). As more alcohol is ingested, cognitive and behavioral
consequences multiply. These biological processes can be identified with direct
measures of alcohol concentration in the blood, urine, or breath. Questions about
alcohol consumption, on the other hand, can be viewed as attempts to measure
indirectly what biochemical tests measure directly.
Criterion validity is the type of validity that is established by comparing the scores
obtained on the measure being validated to those obtained with a more direct or
already validated measure of the same phenomenon (the criterion).
Criterion validity is established when the scores obtained on one measure can
accurately be compared to those obtained with a more direct or already validated
measure of the same phenomenon (the criterion). A measure of blood-alcohol
concentration or a urine test could serve as the criterion for validating a self-
report measure of drinking, as long as the questions we ask about drinking refer
to the same period. Observations of substance use by friends or relatives could
also, in some circumstances, serve as a criterion for validating self-report
substance use measures.
An attempt at criterion validation is well worth the effort because it greatly increases confidence that the measure is actually measuring the concept of interest: the criterion provides independent evidence of measurement validity. However, often no other
variable might reasonably be considered a criterion for individual feelings or
beliefs or other subjective states. Even with variables for which a reasonable
criterion exists, the researcher may not be able to gain access to the criterion, as
would be the case with a tax return or employer document as criterion for self-
reported income.
Construct Validity
Measurement validity also can be established by showing that a measure is
related to a variety of other measures as specified in a theory. This validation
approach, known as construct validity, is commonly used in social research when
no clear criterion exists for validation purposes. For example, in one study of the
validity of the Addiction Severity Index (ASI), McLellan et al. (1985) compared
subject scores on the ASI to a number of indicators that they felt from prior
research should be related to substance abuse: medical problems, employment
problems, legal problems, family problems, and psychiatric problems. They could
not use a criterion-validation approach because they did not have a more direct
measure of abuse, such as laboratory test scores or observer reports. However,
their extensive research on the subject had given them confidence that these
sorts of other problems were all related to substance abuse, and thus their
measures seemed to be valid from the standpoint of construct validity. Indeed,
the researchers found that individuals with higher ASI ratings tended to have
more problems in each of these areas, giving us more confidence in the ASI’s
validity as a measure.
Construct validity is the type of validity that is established by showing that a
measure is related to other measures as specified in a theory.
The distinction between criterion and construct validation is not always clear.
Opinions can differ about whether a particular indicator is indeed a criterion for
the concept that is to be measured. For example, if you need to validate a
question-based measure of sales ability for applicants to a sales position, few
would object to using actual sales performance as a criterion. But what if you
want to validate a question-based measure of the amount of social support that
people receive from their friends? Should you just ask people about the social
support they have received? Could friends’ reports of the amount of support they
provided serve as a criterion? Even if you could observe people in the act of
counseling or otherwise supporting their friends, can an observer be sure that the
interaction is indeed supportive? There isn’t really a criterion here, just a
combination of related concepts that could be used in a construct validation
strategy.
What construct and criterion validation have in common is the comparison of
scores on one measure to scores on other measures that are predicted to be
related. It is not so important that researchers agree that a particular comparison
measure is a criterion rather than a related construct. But it is very important to
think critically about the quality of the comparison measure and whether it
actually represents a different measure of the same phenomenon. For example,
it is only a weak indication of measurement validity to find that scores on a new
self-report measure of alcohol use are associated with scores on a previously
used self-report measure of alcohol use.
When researchers use multiple items to measure a single concept, they are
concerned with inter-item reliability (or internal consistency). For example, if we
are to have confidence that a set of questions reliably measures an attitude, say,
attitudes toward violence, then the answers to the questions should be highly
associated with one another. The stronger the association between the individual
items and the more items included, the higher the reliability of the index.
Cronbach’s alpha is a reliability measure commonly used to measure inter-item
reliability. Of course, inter-item reliability cannot be computed if only one question
is used to measure a concept. For this reason, it is much better to use a multi-
item index to measure an important concept (Viswanathan, 2005).
Test-retest reliability A measurement showing that measures of a phenomenon taken at two points in time are highly correlated when the phenomenon has not changed, or have changed only as much as the phenomenon itself has changed.
Interitem reliability An approach that calculates reliability based on the
correlation among multiple items used to measure a single concept.
Cronbach’s alpha A statistic that measures the reliability of items in an index
or scale.
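As a concrete illustration of inter-item reliability, Cronbach's alpha can be computed from its standard formula, alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores). The scores below are invented for the example:

```python
# Cronbach's alpha for a small illustrative data set: 4 respondents
# answering 3 attitude items (all scores are made up).
def cronbach_alpha(items):
    """items: list of per-item score lists, one list per item,
    each containing one score per respondent."""
    k = len(items)      # number of items
    n = len(items[0])   # number of respondents

    def variance(xs):   # population variance, as is conventional here
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    # Each respondent's total (index) score across the k items.
    totals = [sum(item[i] for item in items) for i in range(n)]
    item_var = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_var / variance(totals))

items = [
    [3, 4, 3, 2],  # item 1, scores for respondents 1-4
    [3, 5, 3, 2],  # item 2
    [2, 4, 4, 1],  # item 3
]
print(round(cronbach_alpha(items), 2))  # 0.9
```

The more strongly the items covary relative to their individual variances, the closer alpha gets to 1, which matches the text's point that stronger associations among more items yield higher index reliability.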
3) Alternate-Forms Reliability
Researchers are testing alternate-forms reliability when they compare
subjects’ answers to slightly different versions of survey questions (Litwin, 1995).
A researcher may reverse the order of the response choices in an index or
modify the question wording in minor ways and then readminister that index to
subjects. If the two sets of responses are not too different, alternate-forms
reliability is established.
A related test of reliability is the split-halves reliability approach. A survey
sample is divided in two by flipping a coin or using some other random
assignment method. These two halves of the sample are then administered the
two forms of the questions. If the responses of the two halves are about the
same, the measure’s reliability is established.
Alternate-forms reliability A procedure for testing the reliability of responses to
survey questions in which subjects’ answers are compared after the subjects
have been asked slightly different versions of the questions or when randomly
selected halves of the sample have been administered slightly different versions
of the questions.
Split-halves reliability Reliability achieved when responses to the same
questions by two randomly selected halves of a sample are about the same.
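The random split just described can be sketched in Python; all responses below are invented, and comparing the halves' means stands in for the informal "about the same" judgment:

```python
import random

random.seed(42)  # reproducible random split

# Hypothetical 1-4 responses from 20 respondents (invented data).
sample = [2, 3, 3, 4, 1, 2, 3, 2, 4, 3, 2, 3, 1, 4, 3, 2, 3, 3, 2, 4]

# Randomly assign half the sample to form 1 and half to form 2,
# standing in for the coin flip the text describes.
indices = list(range(len(sample)))
random.shuffle(indices)
half_a = [sample[i] for i in indices[:10]]  # receives form 1
half_b = [sample[i] for i in indices[10:]]  # receives form 2

mean_a = sum(half_a) / len(half_a)
mean_b = sum(half_b) / len(half_b)

# If the two halves answer about the same on their respective
# forms, split-halves reliability is supported.
print(round(abs(mean_a - mean_b), 2))
```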
4) Intra-Observer and Inter-Observer Reliability
When ratings by an observer, rather than ratings by the subjects themselves, are
being assessed at two or more points in time, test-retest reliability is termed
intra-observer or intra-rater reliability. Let’s say a researcher observes a grade school cafeteria for signs of bullying behavior on multiple days. If his observations capture the same degree of bullying on every Friday, his observations can be said to be reliable. When researchers use more than one
observer to rate the same persons, events, or places, inter-observer reliability
is their goal. If observers are using the same instrument to rate the same thing,
their ratings should be very similar. In this case, the researcher interested in
cafeteria bullying would use more than one observer. If the measurement of
bullying is similar across the observers, we can have much more confidence that
the ratings reflect the actual degree of bullying behavior.
Intraobserver reliability (intrarater reliability) Consistency of ratings by an
observer of an unchanging phenomenon at two or more points in time.
Interobserver reliability When similar measurements are obtained by different
observers rating the same persons, events, or places.
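One simple way to quantify inter-observer reliability is percent agreement, sketched below with invented ratings; in practice researchers often prefer chance-corrected statistics such as Cohen's kappa:

```python
# Inter-observer reliability sketch: two observers rate the same 10
# cafeteria observation periods for bullying (0 = none, 1 = some,
# 2 = frequent). All ratings are invented for illustration.
observer_1 = [0, 1, 1, 2, 0, 0, 1, 2, 1, 0]
observer_2 = [0, 1, 2, 2, 0, 0, 1, 2, 0, 0]

# Percent agreement: the share of periods where the two observers
# assigned exactly the same rating.
agreements = sum(a == b for a, b in zip(observer_1, observer_2))
percent_agreement = agreements / len(observer_1)
print(percent_agreement)  # 0.8
```

Here the observers agree on 8 of 10 periods; the closer this share is to 1.0, the more confident we can be that the ratings reflect the actual degree of bullying rather than observer idiosyncrasies.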
See Exhibit 4.8, The Difference Between Reliability and Validity: Drinking Behavior.
10. Is it possible to achieve both reliability and validity? If so, how? (4-
19)
ANS: We must always assess the reliability of a measure if we hope to be able to establish its validity. Remember that a reliable measure is not necessarily a valid measure, as Exhibit 4.8 illustrates. This discrepancy is a common flaw of self-report measures of substance abuse. The multiple questions in self-report indexes of substance abuse are answered by most respondents in a consistent
way, so the indexes are reliable. However, a number of respondents will not
admit to drinking, even though they drink a lot. Their answers to the questions
are consistent, but they are consistently misleading. So the indexes based on
self-report are reliable but invalid. Such indexes are not useful and should be
improved or discarded. Unfortunately, many measures are judged to be worthwhile only on the basis of a reliability test.
The reliability and validity of measures in any study must be tested after the fact
to assess the quality of the information obtained. But then if it turns out that a
measure cannot be considered reliable and valid, little can be done to save the
study. Thus, it is supremely important to select in the first place measures that
are likely to be reliable and valid. In studies that use interviewers or observers,
careful training is often essential to achieving a consistent approach. In most
cases, however, the best strategy is to use measures that have been used
before and whose reliability and validity have been established in other contexts.
However, know that the selection of “tried and true” measures still does not
absolve researchers from the responsibility of testing the reliability and validity of
the measure in their own studies.
It may be possible to improve the reliability and validity of measures in a study
that already has been conducted if multiple measures were used. For example,
in a study of housing for homeless mentally ill persons, residents’ substance
abuse was assessed with several different sets of direct questions as well as with
reports from subjects’ case managers and others (Goldfinger et al., 1996). It was
discovered that the observational reports were often inconsistent with self-reports
and that different self-report measures were not always in agreement and were
thus unreliable. A more reliable measure of substance abuse was initial reports
of lifetime substance abuse problems. This measure was extremely accurate in
identifying all those who subsequently abused substances during the project.
11. Define the concept “youth gang.” As discussed in the text, what
difficulties might one confront in trying to appropriately define the
concept? Why is it so important to define a concept precisely? (4-2)
ANS: Do you have a clear image in mind when you hear the term youth gangs?
Although this is a very ordinary term, social scientists’ attempts to define
precisely the concept youth gang have not yet succeeded: “Neither gang
researchers nor law enforcement agencies can agree on a common definition...
and a concerted national effort... failed to reach a consensus” (Howell, 2003, p.
75). Exhibit 4.1 lists a few of the many alternative definitions of youth gangs.
As you can see, there are many different ideas about what constitutes a gang.
What is the basis of this conceptual difficulty? Howell (2003) suggests that
defining the term youth gangs has been difficult for four reasons:
• Youth gangs are not particularly cohesive.
• Individual gangs change their focus over time.
• Many have a “hodgepodge of features,” with diverse members and
unclear rules.
• There are many incorrect but popular “myths” about youth gangs. (pp. 27–28)
In addition, youth gangs are only one type of social group, and it is important to
define youth gangs in a way that distinguishes them from these other types of
groups—for example, childhood play groups, youth subculture groups, delinquent
groups, and adult criminal organizations. Whenever you define a concept, you
need to consider whether the concept is unidimensional or multidimensional. If it
is multidimensional, your job of conceptualization is not complete until you have
specified the related subconcepts that belong under the umbrella of the larger
concept. And finally, the concept you define must capture an idea that is distinctly
separate from related ideas.
LOC: Concepts
TIP: Defining Youth Gangs
[LO 1]
COG [Comprehension]
DIF [Medium]
reliability of the measure. If you take a test of your math ability and then retake
the test two months later, the test is performing reliably if you receive a similar
score both times, presuming that nothing happened during the two months to
change your math ability. Of course, if events between the test and the retest
have changed the variable being measured, then the difference between the test
and retest scores should reflect that change.
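The test-retest logic can be illustrated by correlating two administrations of the same test; the scores below are invented, and the Pearson correlation is computed from scratch:

```python
# Test-retest sketch: the same math test given to 6 students two
# months apart (all scores are invented for illustration).
test_1 = [70, 85, 60, 90, 75, 80]  # first administration
test_2 = [72, 83, 62, 91, 74, 79]  # retest two months later

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# A correlation near 1.0 indicates test-retest reliability,
# assuming math ability did not change over the two months.
print(round(pearson_r(test_1, test_2), 3))
```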
Inter-Item Reliability (Internal Consistency)
When researchers use multiple items to measure a single concept, they are
concerned with inter-item reliability (or internal consistency). For example, if we
are to have confidence that a set of questions reliably measures an attitude, say,
attitudes toward violence, then the answers to the questions should be highly
associated with one another. The stronger the association between the individual
items and the more items included, the higher the reliability of the index.
Cronbach’s alpha is a reliability measure commonly used to measure inter-item
reliability. Of course, inter-item reliability cannot be computed if only one question
is used to measure a concept. For this reason, it is much better to use a multi-
item index to measure an important concept (Viswanathan, 2005).
Test-retest reliability is demonstrated when measures of a phenomenon taken at two points in time are highly correlated, provided the phenomenon has not changed, or the measures have changed only as much as the phenomenon itself has changed.
Interitem reliability is an approach that calculates reliability based on the
correlation among multiple items used to measure a single concept.
Cronbach’s alpha is a statistic that measures the reliability of items in an index or
scale.
LOC: Reliability
TIP: Test-Retest Reliability
[LO 6]
COG [Application]
DIF [Hard]
LOC: Reliability
TIP: Inter-Item Reliability (Internal Consistency)
[LO 6]
COG [Application]
DIF [Hard]