
My Review and Views on the Scale Construction Article

Topic of the Article: Scale Construction Notes

Jamie DeCoster
Department of Psychology
University of Alabama
348 Gordon Palmer Hall
Box 870348
Tuscaloosa, AL 35487-0348
Phone: (205) 348-4431
Fax: (205) 348-8648

Date: June 5, 2000

Presented to: Ms. Nadia Ayub

Course Name: Psychometric Testing

Presented by: Shazia Hashmi

Student ID: 11771

The article covers the following topics in detail, and my review follows the same pattern.

Sr. #   Content                                            Page #
1       Introduction                                       1
2       Creating Items                                     2
3       Data Collection and Entry                          5
4       Validity and Reliability                           7
5       Measuring More than One Variable with a Scale      13
6       The Role of Revision                               15
7       Reporting a Scale                                  17

Chapter 1: Introduction

The introduction is very apt, just how the introduction to a scale construction article should be. It outlines what the
article is going to discuss in each section.

The article talks about the internal consistency of a scale as well as its validity and reliability, the two most
important and irreplaceable aspects of scale construction.

The items within a scale should typically be interchangeable, especially when they are trying to gauge a single
construct or theoretical variable.

Good scales possess both validity and reliability.

Sometimes a single questionnaire contains items from several different scales mixed together. In this case items may
not all be interchangeable - items should, however, be interchangeable within each subscale.

Chapter 2: Creating Items

This chapter provides a set of guidelines for writing good scale items, which are defined simply and clearly. Even
a layperson can understand the language used to describe the points to follow while writing items. For example:

o Items should be simple and straightforward
o Design items so that they can be answered with very little instruction
o Always avoid double-barreled questions
o Avoid non-monotonic questions
o Make the questions clear so that different respondents interpret them the same way
o Avoid vague words or phrases in the items
o Match the language, technical terms, and jargon to the level of the target audience the scale is
constructed for
o Avoid biased language and leading questions
o Questions should be unbiased and neutral
o Avoid emphasized text (underlined, italicized, or boldfaced), since this could influence respondents’ answers
o Try reverse coding a number of your items. To reverse code an item you just reword it however must not be
o Use a common structure because the purpose of a scale is to use a number of items to measure a single

The article explains that almost all psychological scales use items in a Likert scale format. The important
characteristic of a Likert scale is that respondents answer questions by picking a response on a numerical
continuum. The following is an example of a Likert scale question.

How much do you like playing Monopoly?

1          2          3          4          5          6          7
not at all                   somewhat                     very much

According to the article, research has shown that Likert scales with seven response options are more reliable than
equivalent items with more or fewer options.

This has given me a good understanding, and I will also use seven response options in my scale.
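
Since the guidelines above recommend reverse coding some items, the responses to those items have to be flipped before they are combined with the rest. The following is a minimal sketch, assuming the 1 to 7 response format from the Monopoly example. The function name `reverse_score` and the sample responses are hypothetical and are not taken from the article.

```python
def reverse_score(response, low=1, high=7):
    """Return the reverse-coded value of a response on a low..high Likert scale."""
    if not low <= response <= high:
        raise ValueError(f"response {response} is outside {low}..{high}")
    # On a 1-7 scale this maps 1 -> 7, 2 -> 6, ..., 7 -> 1.
    return low + high - response


# Hypothetical raw answers to one reverse-coded item from four respondents.
raw_responses = [1, 4, 7, 2]
flipped = [reverse_score(r) for r in raw_responses]
print(flipped)  # [7, 4, 1, 6]
```

The midpoint of the scale (4) stays where it is, which is one reason an odd number of response options is convenient.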

The last important point mentioned in the article is that each item should be completely self-contained, so that the
order in which the items are presented, even if randomized, does not change the responses.

Chapter 3: Data Collection and Entry

This chapter talks about administering the scale under similar conditions for all participants.
It also emphasizes that responses should not be influenced by the order of the items, and that if we plan to check
validity by comparing results with an outside criterion, we can collect those responses in a single experimental session.

The chapter also talks about making efficient use of research participants by conducting other experiments in the
same session in which they complete the scale. However, we should always consider any possible effects these other
tasks might have on the way respondents fill out the scale.

The chapter says that perhaps the best strategy for removing data entry errors is to have the
questionnaires entered into the computer twice, in two different files. This is referred to as double entry. The two
files can then be compared, since it is highly unlikely that they would contain the same typographical error on the
same item.
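
The comparison step of double entry can be sketched in a few lines. This is only an illustration under stated assumptions: both entries are taken to be CSV text with the same layout, and the function name `find_mismatches` and the sample data are hypothetical, not something the article prescribes.

```python
import csv
import io


def find_mismatches(first_rows, second_rows):
    """Return (row, column, first_value, second_value) for every cell that differs."""
    mismatches = []
    for row_idx, (row_a, row_b) in enumerate(zip(first_rows, second_rows)):
        for col_idx, (a, b) in enumerate(zip(row_a, row_b)):
            if a != b:
                mismatches.append((row_idx, col_idx, a, b))
    return mismatches


# Hypothetical data: the same two questionnaires typed in twice.
entry_one = "id,q1,q2,q3\n1,5,3,7\n2,4,4,6\n"
entry_two = "id,q1,q2,q3\n1,5,3,7\n2,4,1,6\n"

rows_one = list(csv.reader(io.StringIO(entry_one)))
rows_two = list(csv.reader(io.StringIO(entry_two)))

mismatches = find_mismatches(rows_one, rows_two)
print(mismatches)  # [(2, 2, '4', '1')]
```

Each reported cell would then be checked against the paper questionnaire. The sketch assumes both entries have the same number of rows and columns; a real check would also flag missing rows.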

The chapter also talks about using different computer software. The three most common options are to use either a
word processor (like WordPerfect or Microsoft Word), a spreadsheet (like Lotus 123 or Microsoft Excel), or a
database (like Paradox or Microsoft Access).

The best option is probably to use a spreadsheet program.

Chapter 4: Validity & Reliability

The article very correctly describes the two important aspects of scale construction, validity and reliability.
Validity and reliability are independent of each other. Validity is often thought of as the “accuracy” of the scale while
reliability is its “precision.” Scales that lack validity have systematic biases to them, while those that lack reliability
have large random errors associated with their measurement.

Validity: The article talks about different types of validity. For example, it says that Cook and Campbell (1979) provide
four major divisions of experimental validity: statistical conclusion validity, internal validity, construct validity, and
external validity. At the same time, the article points out construct validity as the most important one. The
validity that we are most concerned about when creating a scale is construct validity, which is the extent to which the
measurements taken in a study properly represent the underlying theoretical constructs.

When we attempt to validate a scale we try to demonstrate that our theoretical interpretation of the responses to the
scale is correct. Validity therefore measures the match between a variable representing a “true” measure of the
construct and the scale responses.
A scale by itself is therefore neither valid nor invalid. The question of validity comes in only when we attempt to relate
the scale to a particular theoretical construct. A scale could be valid for one purpose but invalid for another.

The article also talks about criterion validity, which applies when a correct measure of the construct (the criterion)
already exists. In this case you can demonstrate that your scale has criterion validity by showing that the scale is
related to the criterion, and no other demonstration can validate your scale if it is not related to the criterion. When
such a criterion exists the only thing that matters regarding validity is the relationship between your scale and the
criterion. Demonstrating that the two are related is sufficient to conclude that the scale is valid.

When no objective criterion exists, one method is to argue that the scale possesses face validity. A scale has face
validity if the items composing the scale are logically related to the underlying construct. In essence, face validity
asks whether the scale “looks” appropriate. Face validity is usually a strictly qualitative judgment and is most
convincing if supported with other more objective data.

For scales that lack objective criteria it is probably most important to demonstrate that the scale has convergent
validity. To do this you show that the responses to your scale are related to other measurements that are supposed to
be affected by the same variable.

When assessing convergent validity it is often useful to simultaneously assess divergent validity. To demonstrate
divergent validity you show that your scale is not related to measurements that are supposed to represent different

Although validity and reliability are defined to be independent concepts, they are related in an important way. It is very
difficult to determine the validity of a highly unreliable scale. Establishing criterion, convergent, and divergent validity
all involve showing statistically significant relationships between the scale and other measures. If your scale is
unreliable such relationships will be hard to demonstrate. You may have a valid scale in those cases, but you will not
be able to show it.

Keep in mind that validity measures how successfully your scale matches onto the theory that you propose. If you fail
to validate the scale, it does not necessarily mean that there is something wrong with the scale. It can also indicate
that there is a problem with the theory upon which you are basing your validation.

Reliability: For Reliability the article refers to some equations – something which we haven’t read in our course so
far. The article says that determining the reliability of a scale is somewhat different from determining the validity of a
scale. Unlike validity, reliability is a precisely defined mathematical concept. Reliability is measured on a scale of 0 to
1, where higher values represent greater reliability.

A particular measurement taken with the scale is therefore composed of two factors: the theoretical “true score” of the
scale and the variation caused by random factors. This relationship is summarized in the equation:

M = T + e (4.1)

where M is the actual scale measurement, T is the theoretical true score, and e is random error. The random error
could be either a positive or negative value.

The reliability coefficient (rho) is also defined through a formula, but the article notes that this formula cannot be
used to calculate reliability.
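
A small simulation may help show why the defining formula is not usable in practice while the correlation between parallel measurements is. The sketch below assumes the standard classical test theory definition of reliability as the variance of T divided by the variance of M; this review does not reproduce the article's own formula, so that definition is an assumption here. The sample size and the standard deviations are arbitrary illustrative choices. In a simulation we know every true score T, which is never the case with real respondents.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (spread_x * spread_y)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000

# M = T + e: true scores with SD 1.0, random error with SD 0.5.
true_scores = [rng.gauss(0, 1.0) for _ in range(n)]
measure_one = [t + rng.gauss(0, 0.5) for t in true_scores]
measure_two = [t + rng.gauss(0, 0.5) for t in true_scores]  # parallel measurement

# Needs the unobservable T, so it is only available inside a simulation.
variance_ratio = statistics.pvariance(true_scores) / statistics.pvariance(measure_one)

# Needs only observed scores, so it can be computed from real data.
parallel_r = pearson(measure_one, measure_two)

print(round(variance_ratio, 2), round(parallel_r, 2))
```

With these settings the theoretical reliability is 1.0 / (1.0 + 0.25) = 0.8, and both printed values should land close to it.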

Then the article gives different methods to calculate reliability. One way is to correlate the scores on parallel
measurements of the scale. There are a number of different ways to measure reliability using parallel measurements.
Some examples are:

o Test-Retest method.
o Alternate Forms method.
o Split-Halves method.

Another way to calculate reliability is to use a measure of internal consistency. The most popular of these
reliability estimates is Cronbach's alpha.
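
Cronbach's alpha can be computed by hand from the item variances and the variance of the total score. The sketch below uses the standard textbook formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The function name and the response data are hypothetical and serve only as an illustration.

```python
import statistics


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item responses."""
    k = len(item_scores[0])
    # zip(*rows) turns the respondent rows into one column per item.
    item_variances = [statistics.variance(column) for column in zip(*item_scores)]
    total_variance = statistics.variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of five respondents to four 7-point items.
responses = [
    [5, 6, 5, 6],
    [2, 3, 2, 2],
    [7, 6, 7, 7],
    [4, 4, 5, 4],
    [3, 2, 3, 3],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Because each respondent answers all four items in a similar way, alpha for this made-up data comes out high. Reverse-coded items must be flipped before alpha is calculated, otherwise they pull the value down.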

The reliability of a scale is heavily dependent on the number of items composing the scale. Even using items with
poor internal consistency you can get a reliable scale if your scale is long enough. One consequence of this is that
adding extra items to a scale will generally increase the scale's reliability, even if the new items are not particularly
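
The effect of scale length on reliability can be illustrated with the Spearman-Brown prediction formula, a standard psychometric result that this review does not quote from the article, so its use here is an assumption. It predicts the reliability of a scale made `length_factor` times longer with comparable items. The starting value of 0.2 is a hypothetical single-item reliability.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after lengthening a scale by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical: one weak item with reliability 0.2, then scales of 5, 10 and 20 such items.
single_item = 0.2
for n_items in (1, 5, 10, 20):
    print(n_items, round(spearman_brown(single_item, n_items), 3))
# 1 0.2
# 5 0.556
# 10 0.714
# 20 0.833
```

Even with weakly related items, the predicted reliability climbs steadily as items are added, which matches the point made above.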

The article also talks about how the number of respondents affects the reliability estimate. It says that the
estimate becomes more stable as you test your scale using more respondents. You should therefore use at least 20
participants when calculating reliability. Obtaining more data won't hurt, but will not strongly impact the stability of
your findings.

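Reliability has specific implications for the utility of your scale. The most that responses to a scale can correlate
with any other variable is the square root of the scale's reliability; the variability in your measure will prevent
anything higher. Therefore, the higher the reliability of your scale, the easier it is to obtain significant findings.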

Finally, the article aptly notes that low reliability does not call results obtained using a
scale into question. Low reliability only hurts your chances of finding significant results. It cannot cause you to obtain
false significance so using a scale with low reliability is analogous to conducting an experiment with a low number of

As we have not yet discussed the topic “Using Statistical Software” in the course, I am not able to review that part
of the chapter.

Chapter 5: Measuring More than One Variable with a Scale

I completely agree with the article here. It says that there are two basic situations where we might calculate more than
one variable from the items in the scale.
Subscales: The first situation is when all of our items are related to the same abstract variable, but we expect that
the answers to our items will be organized in clusters. In this case we are actually testing a single scale possessing
different subscales.
Statistically, all the items in our scale should be correlated, and the items within each subscale should be
interchangeable. In addition to calculating an overall score for our scale, we also calculate scores for each of our
subscales. To do this we simply take the average of the items belonging to each subscale.
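
The scoring of subscales can be sketched as follows. The item names, the two subscales, and the responses are hypothetical. Computing the overall score as the average of all items is an assumption of this sketch, since the text above only states how the subscale scores are calculated.

```python
from statistics import fmean


def score_scale(responses, subscales):
    """Average the items of each subscale, plus an overall average of all items.

    responses: dict mapping item name -> numeric response
               (reverse-coded items are assumed to be flipped already).
    subscales: dict mapping subscale name -> list of item names.
    """
    scores = {
        name: fmean([responses[item] for item in items])
        for name, items in subscales.items()
    }
    all_items = [item for items in subscales.values() for item in items]
    scores["overall"] = fmean([responses[item] for item in all_items])
    return scores


# Hypothetical scale with two subscales of two items each.
subscales = {"strategy": ["q1", "q2"], "luck": ["q3", "q4"]}
one_respondent = {"q1": 6, "q2": 4, "q3": 2, "q4": 4}

scores = score_scale(one_respondent, subscales)
print(scores)  # {'strategy': 5.0, 'luck': 3.0, 'overall': 4.0}
```

For concurrent scales, described next, the same function could be used while simply ignoring the "overall" entry, since an average across unrelated scales has no meaning.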

Concurrent Scales: In the second situation the items in the questionnaire are actually not all related to the same
underlying abstract variable. Rather, different groups of items are related to different variables. In this case we test
concurrent scales. Statistically, items in the same group must be highly correlated but will not necessarily correlate
with items in other groups.
When testing concurrent scales we do not calculate a score averaging over all the items, nor do we expect that the
average of all of the scales has any meaning. We simply calculate separate scores for each scale in the survey, as if it
were administered by itself.

Chapter 6: The Role of Revision

The chapter tells us that items should be heavily edited to provide the greatest validity and reliability. A scale will
typically be tested and revised three or four times prior to professional presentation.

The chapter also distinguishes between “pretests” and “tests” of a scale.

o Pretests are when we collect data purely so that we can know how to revise our scale.
o Tests are when we collect data that we expect will be used to establish the validity and the reliability of the
scale in some professional forum.

Finally, the chapter describes the typical procedure for developing a scale, with which I agree. It tells us that we do
not have to be as careful with the experimental setup when administering pretests as when administering tests. We
should bear in mind that the purpose of experimental methodology is to improve the quality of inferences.

Towards the end of the chapter, the article accurately points out that if the scale we are constructing contains
subscales then we should use more flexible criteria when deciding whether an item should be included or dropped
from the scale.
It is possible that an item might contribute strongly to its subscale but only weakly to the overall scale. We should
always make an explicit decision as to which is more important, or whether they are equally important. We must always
include the same items in our tests of the scale and subscale reliabilities.

Chapter 7: Reporting a Scale

The chapter very briefly describes the procedure to report a Scale:

o Always start the report with a discussion of the theoretical variables that motivate responses to the scale.
o If the scale is the sole object of presentation, explain why the scale is needed and recommend ways that it
can be used.
o Describe how the initial items were constructed, and why together they should represent the underlying
theoretical construct.
o Report the number of items in the final version of the scale, as well as two or three example items so that
the reader has an idea of what the exact content is like.
o Present the statistical analyses to demonstrate the scale's validity and reliability.
o It is best to present the reliability scores first, since a low reliability can impact the interpretation of validity
o Only present the validity and reliability analyses from the final test of the scale, which means reporting only a
small portion of the actual analyses performed during the construction of the scale.
o The validity and reliability analyses should always be based on the full set of items administered at the time
of the final test.
o At the end, include an appendix listing all of the items in the scale and any special procedures that might be
required for scale administration and scoring.
o Indicate what items (if any) are reverse coded. The item-total score correlations may also be reported in the
o Do not allow statistical calculations to dominate your discussion of the scale.

The article tells us to remember that the usefulness of a scale is as much dependent on the theoretical basis behind
its development as it is on the specific procedures and analyses performed in its construction. This should be
reflected in the report.

At the end, the article provides the references used to write this very valuable and informative article on test
construction. This article is helping me design my scale, which is also my term project.