
NCM111 - MEASUREMENT AND DATA QUALITY

MR NINO ARCHIE S. LABORDO

MEASUREMENT

• Measurement involves rules for assigning numbers to qualities of objects to designate the quantity of the attribute. Attributes do not inherently have numeric values; humans invent rules to measure attributes.

- A statement by early American psychologist L. L. Thurstone: "Whatever exists, exists in some amount and can be measured."

• Attributes are not constant; they vary from day to day, from situation to situation, or from one person to another.

• Measurement requires numbers to be assigned to objects according to rules.

- Rules for measuring temperature, weight, and other physical attributes are familiar to us. Rules for measuring many variables for nursing studies, however, have to be created.

Advantages of Measurement

• A major strength of measurement is that it removes guesswork and ambiguity in gathering and communicating information.

- Consider how handicapped health care professionals would be in the absence of measures of body temperature, blood pressure, and so on. Without such measures, subjective evaluations of clinical outcomes would have to be used.

• Not all measures are completely objective, but most incorporate mechanisms for minimizing subjectivity.

• Measurement also makes it possible to obtain reasonably precise information. Precision allows researchers to make fine distinctions among people with different degrees of an attribute.

- Instead of describing Nathan as "tall," we can depict him as being 6 feet 3 inches tall.

• Measurement is a language of communication. Numbers are less vague than words and can thus communicate information more clearly.

- If a researcher reported that the average oral temperature of a sample of patients was "somewhat high," different readers might develop different conceptions about the sample's physiologic state. If the researcher reported an average temperature of 99.6°F, however, there is no ambiguity.

Levels of Measurement

1. Nominal measurement
2. Ordinal measurement
3. Interval measurement
4. Ratio measurement

NOMINAL MEASUREMENT

• The lowest level, nominal measurement involves using numbers simply to categorize attributes. The numbers used in nominal measurement do not have quantitative meaning.

• Examples of variables that are nominally measured include gender and blood type.

- If we coded males as 1 and females as 2, the numbers would not have quantitative implications; the number 2 does not mean "more than" 1.

• Nominal measurement provides information only about categorical equivalence and nonequivalence, and so the numbers cannot be treated mathematically.

ORDINAL MEASUREMENT

• Ranks objects based on their relative standing on an attribute. If a researcher orders people from heaviest to lightest, this is ordinal measurement.

- As another example, consider this ordinal coding scheme for measuring ability to perform activities of daily living:
✓ 1 = completely dependent;
✓ 2 = needs another person's assistance;
✓ 3 = needs mechanical assistance; and
✓ 4 = completely independent.

• The numbers signify incremental ability to perform activities of daily living independently, as illustrated in the sketch below.
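An illustrative sketch (not from the handout) of what ordinal codes do and do not support, using the 1-4 ADL codes above with hypothetical patients:

```python
# Hypothetical ADL codes for five patients, following the 1-4 scheme above
adl_codes = [1, 4, 2, 4, 3]

# Ranking and order comparisons are meaningful at the ordinal level:
print(sorted(adl_codes))   # [1, 2, 3, 4, 4]
print(max(adl_codes))      # 4: the most independent patient

# Differences are NOT meaningful: the "gap" between codes 4 and 3 need not
# equal the gap between 2 and 1, so averaging these codes is questionable.
```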
INTERVAL MEASUREMENT

• Occurs when researchers can specify the ranking of objects on an attribute and the distance between those objects. Most educational and psychological tests yield interval-level measures.

- For example, the Stanford-Binet Intelligence Scale, a standardized intelligence (IQ) test used in many countries, is an interval measure. A score of 140 on the Stanford-Binet is higher than a score of 120, which, in turn, is higher than 100. Moreover, the difference between 140 and 120 is presumed to be equivalent to the difference between 120 and 100.

RATIO MEASUREMENT

• Ratio measurement is the highest level. Ratio scales, unlike interval scales, have a rational, meaningful zero and therefore provide information about the absolute magnitude of the attribute.

- The Fahrenheit scale for measuring temperature (interval measurement) has an arbitrary zero point. Zero on the thermometer does not signify the absence of heat; it would not be appropriate to say that 60°F is twice as hot as 30°F.

- Many physical measures, however, are ratio measures with a real zero. A person's weight, for example, is a ratio measure. It is acceptable to say that someone who weighs 200 pounds is twice as heavy as someone who weighs 100 pounds.
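A brief illustrative sketch of this interval-versus-ratio distinction (the values are the handout's examples; the unit conversions are standard):

```python
# Ratio statements need a true zero. Fahrenheit's zero is arbitrary:
fahrenheit_a, fahrenheit_b = 60.0, 30.0
print(fahrenheit_a / fahrenheit_b)  # 2.0, but "twice as hot" is NOT meaningful

def f_to_c(f):
    return (f - 32.0) * 5.0 / 9.0

# Same two temperatures expressed in Celsius give a completely different ratio:
print(f_to_c(fahrenheit_a) / f_to_c(fahrenheit_b))  # about -14, certainly not 2

# Weight has a true zero, so its ratios are invariant across units:
pounds_a, pounds_b = 200.0, 100.0
print(pounds_a / pounds_b)                          # 2.0 in pounds
print((pounds_a * 0.4536) / (pounds_b * 0.4536))    # 2.0 in kilograms as well
```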
Errors Of Measurement

• Situational contaminants - Scores can be affected by the conditions under which they are produced. For example, environmental factors (e.g., temperature, lighting, time of day) can be sources of measurement error.

• Response-set biases - Relatively enduring characteristics of respondents can interfere with accurate measurements.

• Transitory personal factors - Temporary states, such as fatigue, hunger, or mood, can influence people's motivation or ability to cooperate, act naturally, or do their best.

• Administration variations - Alterations in the methods of collecting data from one person to the next can affect obtained scores. For example, if some physiologic measures are taken before a feeding and others are taken after a feeding, then measurement errors can potentially occur.

• Item sampling - Errors can be introduced as a result of the sampling of items used to measure an attribute. For example, a student's score on a 100-item test of research methods will be influenced somewhat by which 100 questions are included.

Reliability Of Measuring Instruments

RELIABILITY

• Reliability is the consistency with which an instrument measures the attribute. If a scale weighed a person at 120 pounds one minute and 150 pounds the next, we would consider it unreliable.

• The less variation an instrument produces in repeated measurements, the higher its reliability.

• Reliability also concerns a measure's accuracy. An instrument is reliable to the extent that its measures reflect true scores, that is, to the extent that measurement errors are absent from obtained scores.

• A reliable instrument maximizes the true score component and minimizes the error component of an obtained score.

• Three aspects of reliability are of interest to quantitative researchers: stability, internal consistency, and equivalence.

STABILITY

• The stability of an instrument is the extent to which similar results are obtained on two separate occasions.

• The reliability estimate focuses on the instrument's susceptibility to extraneous influences over time, such as participant fatigue.

• Assessments of stability are made through test-retest reliability procedures. Researchers administer the same measure to a sample twice and then compare the scores.

- For example, consider the stability of a self-report scale that measured self-esteem.

• Because self-esteem is a fairly stable attribute that does not change much from one day to another, we would expect a reliable measure of it to yield consistent scores on two different days.

• As a check on the instrument's stability, we administer the scale 2 weeks apart to a sample of 10 people.

• The scores on the two tests are not identical but, on the whole, the differences are not large.

• Researchers compute a reliability coefficient, a numeric index that quantifies an instrument's reliability, to objectively determine how small the differences are. Reliability coefficients (designated as r) range from .00 to 1.00. The higher the value, the more reliable (stable) is the measuring instrument. In the example shown in Table 14.1, the reliability coefficient is high.

• Test-retest reliability is relatively easy to compute, but a major problem with this approach is that many traits do change over time, independently of the instrument's stability.

• Attitudes, mood, knowledge, and so forth can be modified by experiences between two measurements. Thus, stability indexes are most appropriate for relatively enduring characteristics, such as temperament.

• Even with such traits, test-retest reliability tends to decline as the interval between the two administrations increases.
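A minimal sketch of how such a test-retest coefficient can be computed; the 10 pairs of self-esteem scores below are hypothetical stand-ins (Table 14.1 itself is not reproduced in this handout):

```python
from statistics import correlation  # Pearson's r; available in Python 3.10+

# Hypothetical self-esteem scores for 10 people, administered 2 weeks apart
time1 = [55, 49, 78, 37, 44, 50, 58, 62, 48, 67]
time2 = [57, 46, 74, 35, 46, 48, 56, 63, 50, 63]

# The test-retest reliability coefficient is the correlation between
# the two administrations of the same instrument
r = correlation(time1, time2)
print(f"test-retest r = {r:.2f}")  # values near 1.00 indicate a stable instrument
```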
INTERNAL CONSISTENCY

• Internal consistency reliability is the most widely used reliability approach among nurse researchers. This approach is the best means of assessing an especially important source of measurement error in psychosocial instruments, the sampling of items.

• Scales and tests that involve summing item scores are almost always evaluated for their internal consistency.

• An instrument may be said to be internally consistent to the extent that its items measure the same trait.

• Internal consistency is usually evaluated by calculating coefficient alpha (or Cronbach's alpha). The normal range of values for coefficient alpha is between .00 and +1.00. The higher the reliability coefficient, the more accurate (internally consistent) the measure.
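A minimal sketch of the coefficient alpha computation, using the standard formula based on item and total-score variances; the 5-respondent, 4-item scores are hypothetical:

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """item_scores: one row per respondent, one column per item."""
    k = len(item_scores[0])                 # number of items
    columns = list(zip(*item_scores))       # transpose to per-item columns
    item_vars = sum(pvariance(col) for col in columns)
    total_var = pvariance([sum(row) for row in item_scores])
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
    return (k / (k - 1)) * (1 - item_vars / total_var)

scores = [
    [4, 3, 4, 3],
    [2, 2, 1, 2],
    [3, 3, 4, 4],
    [1, 2, 2, 1],
    [4, 4, 3, 4],
]
print(f"alpha = {cronbach_alpha(scores):.2f}")  # about .92 for these data
```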
EQUIVALENCE

• Equivalence, in the context of reliability assessment, primarily concerns the degree to which two or more independent observers or coders agree about the scoring on an instrument.

• With a high level of agreement, the assumption is that measurement errors have been minimized. The degree of error can be assessed through interrater (or interobserver) reliability procedures, which involve having two or more trained observers or coders make simultaneous, independent observations.

• An index of equivalence or agreement is then calculated with these data to evaluate the strength of the relationship between the ratings. When two independent observers score some phenomenon congruently, the scores are likely to be accurate and reliable.
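A minimal sketch of two agreement indexes for a pair of raters: simple proportion of agreement, and Cohen's kappa (a chance-corrected index; the handout does not name a specific index, so kappa is an assumption here). The binary codes are hypothetical:

```python
from collections import Counter

# Hypothetical binary codes assigned independently by two trained observers
rater_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
rater_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # raw agreement

# Chance agreement: probability both raters assign the same code by chance
pa, pb = Counter(rater_a), Counter(rater_b)
expected = sum((pa[c] / n) * (pb[c] / n) for c in set(rater_a) | set(rater_b))

kappa = (observed - expected) / (1 - expected)
print(f"agreement = {observed:.2f}, kappa = {kappa:.2f}")  # 0.80 and about 0.58
```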

Interpretation Of Reliability Coefficients

• Reliability coefficients are important indicators of an instrument's quality. Unreliable measures reduce statistical power and hence affect statistical conclusion validity.

• If data fail to support a hypothesis, one possibility is that the instruments were unreliable, not necessarily that the expected relationships do not exist. Knowledge about an instrument's reliability thus is critical in interpreting research results, especially if research hypotheses are not supported.

• Various things affect an instrument's reliability. For example, reliability is related to sample heterogeneity. The more homogeneous the sample (i.e., the more similar the scores), the lower the reliability coefficient will be.

• Reliability estimates vary according to the procedure used to obtain them. Estimates of reliability computed by different procedures are not identical, and so it is important to consider which aspect of reliability is most important for the attribute being measured.

VALIDITY

• The second important criterion for evaluating a quantitative instrument is its validity. Validity is the degree to which an instrument measures what it is supposed to measure.

- Suppose we wanted to assess patients' anxiety by measuring the circumference of their wrists. We could obtain highly accurate and precise measurements of wrist circumferences, but such measures would not be valid indicators of anxiety.

• Reliability and validity are not totally independent qualities of an instrument.

- A measuring device that is unreliable cannot possibly be valid.
- An instrument cannot validly measure an attribute if it is erratic and inaccurate.
- An instrument can, however, be reliable without being valid.

• Thus, the high reliability of an instrument provides no evidence of its validity; low reliability of a measure is evidence of low validity.

FACE VALIDITY

• Face validity refers to whether the instrument looks as though it is measuring the appropriate construct, especially to people who will be completing the instrument.

- Johnson and colleagues (2008) developed an instrument to measure cognitive appraisal of health among survivors of stroke. One part of the development process involved assessing the face validity of the items on the scale. Stroke survivors were asked a series of open-ended questions regarding their health appraisal after completing the scale, and then the themes that emerged were compared with the content of scale items to assess the congruence of key constructs.

CONTENT VALIDITY

• Content validity concerns the degree to which an instrument has an appropriate sample of items for the construct being measured and adequately covers the construct domain.

• Content validity is crucial for tests of knowledge, where the content validity question is: "How representative are the questions on this test of the universe of questions on this topic?"

• Content validity is also relevant in measures of complex psychosocial traits. Researchers designing a new instrument should begin with a thorough conceptualization of the construct so the instrument can capture the full content domain.

• Researchers typically calculate a content validity index (CVI) that indicates the extent of expert agreement. We have suggested a CVI value of .90 as the standard for establishing excellence in a scale's content validity.

- Bu and Wu (2008) developed a scale to measure nurses' attitudes toward patient advocacy. The content validity of their 84-item scale was rated by seven experts (a bioethicist, patient advocacy researchers, measurement experts). The scale's CVI was calculated to be .85.

• An instrument's content validity is necessarily based on judgment. No totally objective methods exist for ensuring the adequate content coverage of an instrument, but it is increasingly common to use a panel of substantive experts to evaluate the content validity of new instruments.
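A minimal sketch of one common way a CVI can be computed: the proportion of experts rating each item relevant, averaged across items. The 5-expert panel ratings are hypothetical, and the specific "rated 3 or 4 on a 4-point relevance scale" rule is an assumption, not stated in this handout:

```python
ratings = [            # rows = items, columns = experts; 1-4 relevance scale
    [4, 3, 4, 4, 3],
    [3, 4, 2, 4, 4],
    [4, 4, 4, 3, 4],
    [2, 3, 4, 3, 3],
]

def item_cvi(item_ratings):
    # An item counts as "relevant" when an expert rates it 3 or 4
    return sum(r >= 3 for r in item_ratings) / len(item_ratings)

i_cvis = [item_cvi(item) for item in ratings]       # item-level CVIs
scale_cvi = sum(i_cvis) / len(i_cvis)               # average-based scale CVI
print([f"{v:.2f}" for v in i_cvis])                 # 1.00, 0.80, 1.00, 0.80
print(f"scale CVI = {scale_cvi:.2f}")               # 0.90
```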

CRITERION-RELATED VALIDITY

• In criterion-related validity assessments, researchers seek to establish a relationship between scores on an instrument and some external criterion. The instrument, whatever abstract attribute it is measuring, is said to be valid if its scores correspond strongly with scores on the criterion.

• After a criterion is established, validity can be estimated easily.

• Criterion-related validity is helpful in assisting decision makers by giving them some assurance that their decisions will be effective, fair, and, in short, valid.

A. Predictive validity - refers to an instrument's ability to differentiate between people's performances or behaviors on a future criterion.

For example: When a school of nursing correlates students' incoming high school grades with their subsequent grade-point averages, the predictive validity of high school grades for nursing school performance is being evaluated.

B. Concurrent validity - refers to an instrument's ability to distinguish among people who differ in their present status on some criterion.

For example, a psychological test to differentiate between patients in a mental institution who could and could not be released could be correlated with current behavioral ratings of health care personnel.

The difference between predictive and concurrent validity, then, is the difference in the timing of obtaining measurements on a criterion.
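A minimal sketch of a predictive validity check along the lines of the high school grades example; all values below are hypothetical, not real student data:

```python
from statistics import correlation  # Python 3.10+

# Hypothetical incoming high school grades and later nursing-school GPAs
high_school_grade = [88, 92, 75, 81, 95, 79, 86, 90]
nursing_school_gpa = [3.2, 3.6, 2.5, 2.9, 3.8, 2.8, 3.1, 3.5]

# A validity coefficient: correlation of instrument scores with the criterion
validity_coefficient = correlation(high_school_grade, nursing_school_gpa)
print(f"predictive validity r = {validity_coefficient:.2f}")
```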
CONSTRUCT VALIDITY

• Construct validity is a key criterion for assessing the quality of a study, and construct validity has most often been linked to measurement issues. The key construct validity questions with regard to measurement are:

- "What is this instrument really measuring?" and "Does it validly measure the abstract concept of interest?"

• The more abstract the concept, the more difficult it is to establish construct validity; however, at the same time, the more abstract the concept, the less suitable it is to rely on criterion-related validity. What objective criterion is there for such concepts as empathy, role conflict, or separation anxiety?

KNOWN GROUPS

• One approach to construct validation is the known-groups technique. In this procedure, groups that are expected to differ on the target attribute are administered the instrument, and group scores are compared.

- For instance, in validating a measure of fear of the labor experience, the scores of primiparas and multiparas could be contrasted. Women who had never given birth would likely experience more anxiety than women who had already had children; one might question the validity of the instrument if such differences did not emerge.
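A minimal sketch of a known-groups comparison for the fear-of-labor example; the scale scores are hypothetical, and a formal significance test (e.g., an independent-samples t-test) is assumed but not specified in the handout:

```python
from statistics import mean, stdev

# Hypothetical fear-of-labor scores for two groups expected to differ
primiparas = [42, 38, 45, 40, 47, 44, 39, 43]   # no previous births
multiparas = [31, 35, 28, 33, 30, 36, 29, 32]   # previous births

print(f"primiparas: mean = {mean(primiparas):.1f}, sd = {stdev(primiparas):.1f}")
print(f"multiparas: mean = {mean(multiparas):.1f}, sd = {stdev(multiparas):.1f}")
# Construct validity is supported if the expected gap emerges (here, higher
# fear scores among primiparas); no gap would cast doubt on the instrument.
```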

FACTOR ANALYSIS

• Factor analysis is a method for identifying clusters of related items on a scale.

• The procedure is used to identify and group together different measures of some underlying attribute and to distinguish them from measures of different attributes.
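A minimal sketch of an exploratory factor analysis; scikit-learn's FactorAnalysis is one tool among several (the handout does not prescribe software), and the simulated item data are illustrative only:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_respondents = 200

# Simulate two underlying attributes, each driving three of six scale items
factor1 = rng.normal(size=(n_respondents, 1))
factor2 = rng.normal(size=(n_respondents, 1))
noise = 0.5 * rng.normal(size=(n_respondents, 6))
items = np.hstack([factor1, factor1, factor1, factor2, factor2, factor2]) + noise

# Extract two factors; the loadings show items 1-3 clustering on one factor
# and items 4-6 on the other, i.e., two distinct underlying attributes
fa = FactorAnalysis(n_components=2).fit(items)
print(np.round(fa.components_, 2))
```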
SENSITIVITY, SPECIFICITY AND LIKELIHOOD RATIOS

Calculating Sensitivity, Specificity and Likelihood Ratios

TERMS:

• Sensitivity is the ability of a measure to identify a "case" correctly, that is, to screen in or diagnose a condition correctly. A measure's sensitivity is its rate of yielding "true positives."

• Specificity is the measure's ability to identify noncases correctly, that is, to screen out those without the condition.

• Specificity is an instrument's rate of yielding "true negatives."

To determine an instrument's sensitivity and specificity, researchers need a reliable and valid criterion of "caseness" against which scores on the instrument can be assessed.

- EXAMPLE: Suppose we wanted to evaluate whether adolescents' self-reports about their smoking were accurate, and we asked 100 teenagers aged 13 to 15 whether they had smoked a cigarette in the previous 24 hours.

- The "gold standard" for nicotine consumption is cotinine levels in a body fluid, and so let us assume that we did a urinary cotinine assay.
- Sensitivity, in this example, is calculated as the proportion of teenagers who said they smoked and who had high concentrations of cotinine, divided by all real smokers as indicated by the urine test. Put another way, it is the true-positive findings divided by all real-positive findings.

- In this case, there was considerable underreporting of smoking, and so the sensitivity of the self-report was only .50.

- Specificity is the proportion of teenagers who accurately reported they did not smoke, or the true-negative findings divided by all real-negative findings. In our example, specificity is .83. There was considerably less over-reporting of smoking ("faking bad") than under-reporting ("faking good").

- (Sensitivity and specificity are sometimes reported as percentages rather than proportions, simply by multiplying the proportions by 100.)
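A minimal sketch of the arithmetic; the 2x2 counts below are hypothetical but chosen to be consistent with the values reported above (100 teenagers, sensitivity = .50, specificity of roughly .83):

```python
# Hypothetical cross-classification of self-report against the urine assay
true_pos = 10   # said they smoked, cotinine-positive
false_neg = 10  # denied smoking, cotinine-positive ("faking good")
true_neg = 66   # denied smoking, cotinine-negative
false_pos = 14  # said they smoked, cotinine-negative ("faking bad")

sensitivity = true_pos / (true_pos + false_neg)  # true positives / all real positives
specificity = true_neg / (true_neg + false_pos)  # true negatives / all real negatives
print(f"sensitivity = {sensitivity:.2f}")        # 0.50
print(f"specificity = {specificity:.3f}")        # 0.825, about the .83 reported above
```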

CRITIQUING DATA QUALITY IN QUANTITATIVE STUDIES

Guidelines For Critiquing Data Quality In Quantitative Studies

1. Is there congruence between the research variables as conceptualized (i.e., as discussed in the introduction of the report) and as operationalized (i.e., as described in the method section)?

2. If operational definitions (or scoring procedures) are specified, do they clearly indicate the rules of measurement? Do the rules seem sensible? Were data collected in such a way that measurement errors were minimized?

3. Does the report offer evidence of the reliability of measures? Does the evidence come from the research sample itself, or is it based on other studies? If the latter, is it reasonable to conclude that data quality would be similar for the research sample as for the reliability sample (e.g., are sample characteristics similar)?

4. If reliability is reported, which estimation method was used? Was this method appropriate? Should an alternative or additional method of reliability appraisal have been used? Is the reliability sufficiently high?

5. Does the report offer evidence of the validity of the measures? Does the evidence come from the research sample itself, or is it based on other studies? If the latter, is it reasonable to believe that data quality would be similar for the research sample as for the validity sample (e.g., are the sample characteristics similar)?

6. If validity information is reported, which validity approach was used? Was this method appropriate? Does the validity of the instrument appear to be adequate?

7. If there is no reliability or validity information, what conclusion can you reach about the quality of the data in the study?

8. If a diagnostic or screening tool was used, is information provided about its sensitivity and specificity, and were these qualities adequate?

9. Were the research hypotheses supported? If not, might data quality play a role in the failure to confirm the hypotheses?

