
Lecture 4

Ch. 3. Measurement, Part 2


Thurs, Feb. 17th, 2022
Review of what we covered last class
1. Measurement assigns values to variables.
2. Variables represent the operational definitions of constructs.
3. Variables can be scaled in different ways (nominal, ordinal, interval,
or ratio) so that their values are interpretable in the context of your
study.
4. Your study tests a hypothesis about whether (and how) at least two
variables correlate.
Let’s stay focused on the “big picture” of hypothesis testing – as it
relates to measurement – for just a minute longer.
Hypotheses consist of predictor and outcome variables.
• Any study can be described as investigating “the effect of X on Y.”
• Theories are relatively useless without specifying a “direction.”
• Not all studies are designed to support a conclusion about the direction of
the effect. (Later in the semester we will learn more about this.)
• You are probably very familiar with the phrase “correlation does not equal causation.”
• But (good) hypotheses are nevertheless stated in directional, cause-and-effect terms.
• Thus, enter the Predictor Variable.
• Also called the “independent variable” (IV)
• The X variable. The cause. The predictor. The catalyst. The stimulus.
• And allow me to introduce the Outcome Variable.
• Also called the “dependent variable” (DV)
• The Y variable. The effect. The outcome. The reaction. The response.
Reliability
A fundamental law of measurement
Measured Value = True Value + Error
• In most cases, some non-zero amount of error variance will obscure the
true value from the measured value.
• Errors can be broken down into two kinds:
• Non-systematic errors (random factors that are out of the researcher’s control)
• Ex: respondents are in a bad mood during the study, or in a good mood!
• Ex: temperature and air pressure when measuring the height of the Empire State Building
• Systematic errors (non-random factors, unintentionally due to decisions of the
researcher)
• Ex: measuring well-being/happiness in Eastern cultures requires different kinds of instruments
• Ex: researchers define the “bottom” of the ESB as being the 1st floor and not the ground floor
• Good measurement will minimize the effects of non-systematic errors.
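To see the difference in action, here is a minimal sketch in Python (not from the lecture; all numbers are invented for illustration): random error washes out when you average many measurements, while a systematic bias does not.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

true_value = 443.2   # illustrative "true" height of the ESB in meters
n = 10_000

# Non-systematic error: random noise (e.g., temperature, air pressure)
random_error = rng.normal(loc=0.0, scale=0.5, size=n)

# Systematic error: a constant bias (e.g., measuring from the 1st floor
# instead of the ground floor)
systematic_error = 4.0

measured = true_value + random_error + systematic_error

# Averaging many measurements cancels out the random error...
print(f"Mean of measured values: {measured.mean():.2f}")
# ...but the systematic bias remains (the mean is still ~4.0 off).
print(f"Remaining bias: {measured.mean() - true_value:.2f}")
```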
Reliability
• The extent to which a measurement instrument provides consistent
results over time.
• If I repeatedly use the same thermometer to measure my daughter’s
temperature, will I get the same reading every time?
• If yes, the thermometer is reliable.
• If no, the thermometer is not reliable.
• If I repeatedly use the same scale to measure my weight, the same ruler to
measure my height, the same oven to cook my food, the same IQ test to
measure my intelligence, etc., etc., etc.
• The corollary: If the true value changes, then a reliable measure will
indicate that change with precision, with minimal error.
Two kinds of reliability that we care about
• Test-retest reliability
• Administer the same test twice, to the same group of respondents, after a window
of time.
• If scores on the two tests correlate strongly, then you can assume high reliability.
• Internal consistency (the textbook does not do a good job of explaining this concept)
• Assume that you have a multi-item scale
• ex: IQ test (intelligence) consists of 100 items and questions.
• ex: Myers-Briggs test (personality) consists of 93 items and questions.
• On average, how strongly do the items correlate with each other?
• Individual items should correlate with each other (thus, we call this internal consistency).
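Here is a minimal sketch (assuming Python with numpy; all scores are invented) of how each kind of reliability could be estimated: a correlation across two administrations for test-retest, and the average inter-item correlation for internal consistency.

```python
import numpy as np

# --- Test-retest reliability ---------------------------------------
# Scores for the same 6 respondents on the same test, two weeks apart.
time1 = np.array([98, 105, 112, 91, 120, 100])
time2 = np.array([99, 103, 115, 90, 118, 102])

# A strong correlation between the two administrations suggests
# high test-retest reliability.
r = np.corrcoef(time1, time2)[0, 1]
print(f"Test-retest correlation: {r:.2f}")

# --- Internal consistency ------------------------------------------
# Rows = respondents, columns = items of a multi-item scale.
items = np.array([
    [5, 4, 5, 4],
    [2, 3, 2, 2],
    [4, 4, 5, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
])

# "On average, how strongly do the items correlate with each other?"
# Items that hang together produce a high average correlation.
corr = np.corrcoef(items, rowvar=False)
off_diag = corr[np.triu_indices_from(corr, k=1)]
print(f"Average inter-item correlation: {off_diag.mean():.2f}")
```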
Multi-Item Scales
Recall: Complex and abstract constructs require
special attention.
• Thoughtful operational definitions, often consisting of multiple dimensions.
• Every dimension must therefore be measured separately, as you did during the in-class experiential exercise.
• Constructs may be uni-dimensional but require multiple items.
• Many consumer attitudes require multi-item measures.
• Let’s look at a few examples.
• One example (brand loyalty) of multiple dimensions per construct, measuring one item per dimension.
• Two examples (health consciousness, media skepticism) of a single dimension per construct, measuring multiple items.

[Figure: diagram of a multi-dimensional construct, “KPop quality,” with dimensions including album sales, dancing, popularity, rapping, and vocals]
Multiple dimensions of brand loyalty:
• Number of items purchased in the past from a brand
• The monetary value of past purchases from a brand
• Self-reported likelihood of buying again in the future
• Self-reported likelihood of recommending the brand to friends or family
• Self-reported emotional attachment to a brand
• Self-reported tendency to forgive a brand
These variables would be unlikely to display internal consistency (they do not
necessarily all correlate with each other), but they could all be measured
separately in a single study, to reflect different dimensions of brand loyalty.
In contrast, Health Consciousness is uni-dimensional, measured using 6 items.
• Health Consciousness: the readiness to undertake actions to improve one’s
well-being
• On a scale from 1 to 7, please indicate to what extent you agree/disagree
with the following statements (1 = strongly disagree, 7 = strongly agree).
• I reflect about my health a lot.
• I'm very self-conscious about my health.
• I'm alert to changes in my health.
• I'm usually aware of my health.
• I take responsibility for the state of my health.
• I'm aware of the state of my health as I go through the day.
• Individual scores are obtained by summing (or averaging) item scores.
Media Skepticism: Uni-dimensional, measured
using 5 items.
• Media Skepticism: degree to which individuals discount and distrust
information presented by the mass media.
• After watching a news program, participants rate the program on a 1-4 scale
(1 = not at all true, 4 = very true):
• The program was not very accurate in its portrayal of the problem.
• Most of the story was staged for entertainment purposes.
• The presentation was slanted and unfair.
• I think the story was fair and unbiased.
• I think important facts were purposely left out of the story.
• Individual scores are obtained by summing (or averaging) item scores.
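As a minimal sketch (assuming Python with numpy; the ratings are invented), here is how one participant's composite could be computed. Note that the fourth item is worded in the opposite direction from the others, so it would presumably be reverse-scored before averaging.

```python
import numpy as np

# One participant's raw ratings for items 1-5
# (1 = not at all true, 4 = very true).
raw = np.array([3, 4, 3, 1, 4])

scored = raw.astype(float).copy()
# Reverse-score item 4 ("fair and unbiased"): on a 1-4 scale,
# reversed = 5 - raw, so a 1 becomes a 4 and vice versa.
scored[3] = 5 - scored[3]

composite = scored.mean()
print(f"Media skepticism score: {composite:.2f}")   # 3.60
```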
Multi-item scales produce composite scores.
The procedure:
1. Measure a construct (e.g., quality, value) with multiple items.
2. Calculate the sum or the average of those items.
3. The resulting average is more likely to reflect the true value than any of
the items can reflect on their own.
In other words, multi-item scales exhibit greater reliability than 1-item scales!

We prefer multi-item scales for the same reasons that we employ multiple judges for sporting contests. The
average score across multiple judges is more likely to reflect the true value after accounting for inter-judge error.
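A minimal simulation (assuming Python with numpy; the values are invented) of the judges analogy: the panel's average lands closer to the true value than a single judge does.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

true_score = 8.0
n_judges = 7
n_contests = 10_000

# Each judge's score = true score + independent random error.
scores = true_score + rng.normal(0, 1.0, size=(n_contests, n_judges))

single_judge_error = np.abs(scores[:, 0] - true_score).mean()
panel_error = np.abs(scores.mean(axis=1) - true_score).mean()

print(f"Avg. error of one judge:  {single_judge_error:.2f}")  # ~0.80
print(f"Avg. error of panel mean: {panel_error:.2f}")         # ~0.30
```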
Validity
Validity
• The extent to which an instrument measures what it is supposed to
measure.
• Validity requires reliability (minimizing error) plus an estimate of the
construct that is true to the construct’s form.
• Example: what does the SAT actually measure?
• intelligence? college-preparedness? success in life? the ability to “pass tests?”
• Example: what does GPA actually measure?
• It is a valid measure of “success in school,” but any interpretation beyond that is invalid.
• Can a measure be reliable but not valid?
• Yes! See next slide.
Reliability vs. Validity
Kinds of validity we care about
1. Face validity
• You should be able to infer the construct being measured by reading the test
questions.
• As constructs become complex and abstract, face validity loses its appeal.
• Ex: I want to measure “how likely is it that you are lying to me?” without you knowing
that I am measuring your likelihood of lying.
2. Content validity
• Items should assess the construct as broadly, and from as many angles, as the
complexity of the construct dictates.
• Ex: Stony Brook asks all graduating seniors, before they leave: “how satisfied are you
with your experience at Stony Brook University?” on a 1-5 scale.
• Content validity is low, because there are likely many different factors that should be
assessed (e.g., housing, cost of tuition, safety, food, fees, faculty, social life, quality of
education, etc, etc, etc.)
3. Construct validity
• The instrument should behave consistently with its underlying theory.
• Essentially, a valid measure should demonstrate validity by relating in predictable
patterns with other variables.
• This kind of validity is the most academic; it is used for theory-building more than for common research applications.
Continuous vs. Discrete Data
Continuous vs. discrete data
• Continuous: an infinite number of possible values between any two
points on the measurement scale.
• Many ratio scales are continuous, but not all.
• Discrete: the variable can only take on a limited number of values.
• Nominal and ordinal scales are, by definition, discrete variables.
• Interval scales are usually discrete, but we bypass this by using multi-item
scales and averaging individual items to create a composite score per person.
• The best way to think about interval vs. ratio and continuous vs.
discrete is on the next slide.
Don’t think of the distinction as a defining characteristic.
Think of it as an additional layer.

[2 x 2 grid: scale type (ratio vs. interval) crossed with data type (discrete vs. continuous)]

             discrete                                   continuous
  ratio      number of drugs                            milligrams of a drug
             number of sales from repeat customers      percentage of market share
             number of correct responses on Exam 1      your Exam 1 score as a percentage of 100

  interval   Likert scale ratings                       Nasdaq composite index
             1-10 pain scale                            composite variables from multi-item scales
             SAT scores                                 your overall GPA
