
CONCEPTUAL STAGES OF
DEVELOPING A MEASUREMENT
INSTRUMENT (QUESTIONNAIRE)
Desy Indra Yani
DEVELOPING MEASUREMENT
INSTRUMENTS (QUESTIONNAIRES)
Preparation:
•  Clearly define the construct of the
measurement instrument;
•  Conduct a pilot study to test the
measurement instrument;
•  Determine the validity of the measurement
instrument
Documentation
•  Clear description of the construct
•  Old and new versions of items
•  Formulation why certain scorings were chosen
•  Results of pilot testing
•  Final version of the measurement instrument
•  When there is no instrument available that measures the
construct of your interest, you may decide to develop a
measurement instrument yourself.
The Steps:
•  Definition and elaboration of the construct intended to be
measured
•  Choice of measurement method (e.g. questionnaire/
physical test)
•  Selecting and formulating items
•  Scoring issues
•  Pilot study
•  Field-testing
Other concepts
•  Tips for Developing and Testing Questionnaires/Instruments
•  Review of the Steps for Development of Quantitative Research Tools
•  Development and Validation of Instruments to Measure Learning of Expert-Like Thinking
•  Stages of Psychometric Measure Development: The example of the Generalized Expertise Measure (GEM)
Step 1: Definition and elaboration of
the construct intended to be
measured
•  The first step in instrument development is conceptualization, which
involves defining the construct and the variables to be measured.
•  Use the International Classification of Functioning, Disability and Health (ICF)
(WHO, 2011) or the model by Wilson and Cleary (1995) as a framework
for your conceptual model.
•  When the construct is not directly observable (latent variable), the best
choice is to develop a multi-item instrument (De Vet et al. 2011).
•  When the observable items are consequences of (reflecting) the
construct, this is called a reflective model.
•  When the observable items are determinants of the construct, this is
called a formative model.
•  When you are interested in a multidimensional construct, each dimension
and its relation to the other dimensions should be described.
Step 2: Choice of measurement method
(e.g. questionnaire/physical test)
•  Some constructs form an indissoluble alliance with a
measurement instrument, e.g. body temperature is
measured with a thermometer; and a sphygmomanometer
is usually used to assess blood pressure in clinical practice.
•  The options are therefore limited in these cases, but in
other situations more options exist.
•  For example, physical functioning can be measured with a
performance test, observations, or with an interview or
self-report questionnaire.
•  With a performance test for physical functioning,
information is obtained about what a person can do, while
by interview or self-report questionnaire information is
obtained about what a person perceives he/she can do.
Step 3: Selecting and formulating
items
•  To get input for formulating items for a multi-
item questionnaire, you could examine similar
existing instruments from the literature that
measure a similar construct (e.g. for a different
target population), and talk to experts (both
clinicians and patients) using in-depth interview
techniques.
•  You should pay careful attention to the
formulation of response options, instructions,
and choosing an appropriate recall period (Van
den Brink & Mellenbergh, 1998).
Step 4: Scoring issues
•  Many multi-item questionnaires contain 5-point
item scales, and therefore are ordinal scales.
•  Often a total score of the instrument is
considered to be an interval scale, which makes
the instrument suitable for more statistical
analyses. Several questions are important to
answer: how will you calculate (sub)scores? By
summing the items, using the mean item score, or
calculating Z-scores?
•  Are all items equally important or will you use
(implicit) weights? Note that when an instrument
has 3 subscales, with 5, 7, and 10 items
respectively, the total score calculated as the
mean of the mean score of each subscale differs
from the total score calculated as the mean of all
items.
•  How will you deal with missing values? If many
values are missing (>5-10%), consider multiple
imputation (Eekhout et al., 2014).
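The weighting issue above can be made concrete with a small sketch (hypothetical item scores on a 5-point scale; the three subscales have 5, 7, and 10 items, as in the example):

```python
from statistics import mean

# Hypothetical item scores (1-5) for one respondent, three subscales
# with 5, 7, and 10 items respectively.
subscales = [
    [4, 4, 5, 4, 5],                  # subscale A (5 items)
    [2, 3, 2, 3, 2, 3, 2],            # subscale B (7 items)
    [1, 1, 2, 1, 1, 2, 1, 1, 2, 1],   # subscale C (10 items)
]

# Option 1: mean of the subscale means -- each SUBSCALE weighs equally.
mean_of_means = mean(mean(s) for s in subscales)

# Option 2: mean over all items -- each ITEM weighs equally, so the
# 10-item subscale implicitly dominates the total score.
all_items = [x for s in subscales for x in s]
mean_of_items = mean(all_items)

print(round(mean_of_means, 2))  # subscales weighted equally
print(round(mean_of_items, 2))  # items weighted equally
```

The two totals disagree whenever subscale means differ, which is why the (implicit) weighting scheme should be chosen and documented deliberately.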
Step 5: Pilot study
•  Be aware that the first version of the
instrument you develop will (probably) not be
the final version.
•  It is sensible to (regularly) test your
instrument in small groups of people.
•  A pilot test is intended to test the
comprehensibility, relevance, and
acceptability and feasibility of your
measurement instrument.
Step 6: Field-testing
•  See guideline Evaluation of measurement
properties.
•  The consensus-based standards for the
selection of health status measurement
instruments (COSMIN) taxonomy
•  According to the taxonomy, the measurement
properties cover three quality domains:
reliability, validity, and responsiveness
Reliability
•  Reliability is the extent to which scores for
individuals who have not changed are the same
for repeated measurement under several
conditions
–  e.g. using different sets of items from the same
questionnaire (internal consistency);
–  over time (test-retest);
–  by different persons on the same occasion (inter-
rater); or
–  by the same persons on different occasions (intra-
rater).
The reliability domain contains the following measurement properties:
•  (i) internal consistency: the degree of interrelatedness among the
items (expressed by Cronbach’s α or Kuder-Richardson Formula
(KR-20) (21, 23);
–  when internal consistency is relevant, factor analysis or principal component analysis
should be applied to determine whether the items form one or more than one scale
•  (ii) reliability: the proportion of the total variance in the
measurements that reflects the “true” differences among individuals,
including test-retest, inter- and intra-observer reliability [this aspect
is reflected by the intraclass correlation coefficient (ICC) or Cohen’s κ]
(23, 25);
•  (iii) measurement error: the systematic and random error of an
individual’s score that is not attributed to true changes in the
construct to be measured, expressed by the standard error of
measurement (SEM). The SEM can be converted into the smallest
detectable change (SDC)
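A rough sketch of how these quantities relate, in pure Python with invented scores (for simplicity α is used here as the reliability estimate in the SEM formula; in practice a test-retest ICC is often preferred):

```python
from statistics import variance, stdev
from math import sqrt

# Hypothetical scores: 6 respondents x 4 items (rows = respondents).
scores = [
    [4, 4, 3, 4],
    [3, 3, 3, 2],
    [5, 4, 4, 5],
    [2, 2, 1, 2],
    [4, 5, 4, 4],
    [3, 2, 2, 3],
]
k = len(scores[0])                     # number of items
items = list(zip(*scores))             # columns = items
totals = [sum(row) for row in scores]  # total score per respondent

# Cronbach's alpha: degree of interrelatedness among the items.
alpha = (k / (k - 1)) * (1 - sum(variance(i) for i in items) / variance(totals))

# Standard error of measurement, using alpha as the reliability estimate.
sem = stdev(totals) * sqrt(1 - alpha)

# Smallest detectable change (95% confidence, difference of two measurements).
sdc = 1.96 * sqrt(2) * sem

print(round(alpha, 2), round(sem, 2), round(sdc, 2))
```

A change score smaller than the SDC cannot be distinguished from measurement error for an individual respondent.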
Validity
•  Validity is described as the degree to which
an instrument measures the construct(s) it
purports to measure
•  The validity domain contains three
measurement properties:
–  Content validity
–  Construct validity
–  Criterion validity
Content Validity
•  The degree to which the content of the
instrument is an adequate representation of
the construct to be measured (including face
validity).
•  Content validity is an assessment of whether
all items are relevant for the construct, aim,
and target population, and whether no important
items are missing (preferably judged by the target
group)
Construct Validity
•  Construct validity is divided into three aspects:
•  (a) structural validity: the degree to which the instrument scores are an
adequate reflection of the construct’s dimensionality.
–  Factor analysis should be performed to confirm the number of subscales present;
•  (b) hypotheses testing: the degree to which the instrument scores are
consistent with hypotheses based on the assumption that the instrument
validly measures the construct.
–  Many different hypotheses can be formulated and tested (e.g. the extent to which
scores on a particular instrument relate to scores on other instruments, or expected
differences in scores between “known” groups).
–  It is important in hypotheses testing to state hypotheses a priori, clearly indicating both
direction and magnitude of the correlation or difference.
–  For example, higher correlations are expected with similar constructs and variables, and
lower correlations with dissimilar constructs and variables;
•  (c) cross-cultural validity: the degree to which the performance of the
items on a translated or culturally adapted instrument is an adequate
reflection of the performance of the items of the original version of the
instrument
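A minimal sketch of hypotheses testing with an a priori directional hypothesis (all scores invented; the Pearson correlation is computed by hand to keep the example self-contained):

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores: the new instrument, an instrument measuring a
# similar construct, and one measuring a dissimilar construct.
new_instr = [10, 14, 9, 16, 12, 18, 11, 15]
similar = [11, 13, 10, 17, 12, 19, 10, 14]
dissimilar = [3, 1, 4, 2, 3, 2, 4, 1]

# A priori hypothesis (stated before looking at the data): the new
# instrument correlates more strongly with the similar construct
# than with the dissimilar one.
r_sim = pearson(new_instr, similar)
r_dis = pearson(new_instr, dissimilar)
assert r_sim > abs(r_dis)  # hypothesis supported in this toy data
```

The essential point is that both the direction and the expected magnitude of each correlation are fixed before the data are inspected.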
Criterion Validity
•  Criterion validity: the degree to which the scores of
an instrument are an adequate reflection of a “gold
standard”.
•  If no real gold standard is available (e.g. for
measuring health-related work functioning),
criterion validity cannot be evaluated.
Responsiveness
•  Responsiveness is described as the ability of an
instrument to detect change over time in the construct
to be measured.
•  The responsiveness domain is considered an aspect of
validity in a longitudinal context
•  Therefore, appropriate measures to evaluate
responsiveness are the same as those for hypotheses
testing and criterion validity.
•  The only difference here is that hypotheses should
focus on the change score of an instrument.
•  Another approach is to determine the area under the
receiver operator characteristic curve (AUC).
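One way to sketch the AUC approach: classify respondents as “changed” vs “unchanged” by an external anchor, then check how well the instrument’s change scores separate the two groups (pure Python, invented change scores; the AUC is computed directly as the probability that a randomly chosen changed respondent scores higher than an unchanged one):

```python
# Hypothetical change scores on the instrument, grouped by an external
# anchor question ("did you improve?": yes/no).
changed = [8, 6, 7, 9, 5, 7]      # anchor says improved
unchanged = [2, 4, 1, 5, 3, 2]    # anchor says no change

# AUC = P(change score of a changed person > that of an unchanged person),
# counting ties as 0.5 (equivalent to the Mann-Whitney U statistic
# divided by n1 * n2).
pairs = [(c, u) for c in changed for u in unchanged]
auc = sum(1.0 if c > u else 0.5 if c == u else 0.0 for c, u in pairs) / len(pairs)

print(round(auc, 2))  # values near 1.0 indicate good responsiveness
```

An AUC of 0.5 means the change scores discriminate no better than chance between changed and unchanged respondents.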
References
•  De Vet H.C.W., Terwee C.B., Mokkink L.B., Knol D.L. 2011
Measurement in Medicine: A Practical Guide. Cambridge
University Press.
•  Streiner D.L., Norman G.R. 2008 Health Measurement
Scales: A Practical Guide to Their Development and Use.
4th ed. Oxford University Press.
•  Van den Brink W.P., Mellenbergh G.J. 1998 Testleer en
testconstructie. Boom, Amsterdam.
•  Eekhout I., de Vet H.C.W., Twisk J.W.R., Brand J.P., de
Boer M.R., Heymans M.W. 2014 Missing data in a multi-
item instrument were best handled by multiple
imputation at the item score level. J Clin Epidemiol
67(3):335-342.
