
Developing inventories for satisfaction and Likert scales in a service environment

by

Contact Author: Karin Braunsberger, Ph.D.


Associate Professor of Marketing
University of South Florida St. Petersburg
College of Business
140 Seventh Avenue South
Bayboro Station 306
St. Petersburg, FL 33701-5016
Telephone: (727) 873-4082
Fax: (727) 873-4192
E-mail: braunsbe@stpt.usf.edu

Roger Gates
DSS Research
6750 Locke Avenue
Ft. Worth, TX 76116
Telephone: (817) 665-7000
Fax: (817) 665-7001
E-mail: rgates@dssresearch.com

Abstract

Purpose: To produce up-to-date inventories for satisfaction and Likert scales that contain

commonly used scale point descriptors and their respective mean scale values and standard

deviations.

Methodology/Approach: All data were collected online using the SSI Survey Spot Panel. The

panel is national (U.S.) in scope and was screened to include individuals 21-65 years of age. A

random sample was drawn. Thirty-nine satisfaction items and 19 agreement items were tested,

and the mean value and the standard deviation were calculated for each of these descriptors.

Findings: Even though only six of the items that had been tested by Jones and Thurstone (1955)

were included in the list of satisfaction scale descriptors, the semantic meanings of those six have

changed very little over the years.

Research limitations/implications: One limitation of the current study might be the chosen

service context, since scale point descriptor inventories developed within the context of health

insurance might not be valid in other service contexts.

Practical Implications: Since the present study focuses on two types of scales that are frequently

used in service environments, namely Likert and satisfaction scales, the major contribution of

this study is to provide researchers and managers in services marketing with quantitative

measurement of the meanings of commonly used scale point descriptors, which, as pointed out by

Myers and Warner (1968), will make possible the development of equal interval scales and thus

aid analyses of data sets. It will thus help service marketers to develop questionnaires that more

accurately reflect actual consumer satisfaction and opinions.

Keywords: Satisfaction scale inventory, Likert scale inventory.

Paper Type: Research paper.


1. Introduction

Researchers and managers in services marketing are often concerned with assessing

customer satisfaction and opinions (Bearden, Malhotra and Uscátegui, 1998). When developing

questions to assess satisfaction it has been strongly suggested that the end points of preference

response scales should be words or phrases that denote bi-polar extremes, and that all anchoring

points should be suitably spaced along the semantic continuum connecting the end points (Jones

and Thurstone, 1955). Jones and Thurstone (1955) further express the need to investigate the

semantic properties of commonly used scale point descriptors to make sure that they possess the

above properties and also carry meaning that is as clear as possible to subjects that represent the

researcher’s population of interest. Further, knowing the exact scale value of each scale point

descriptor is of importance when constructing successive-interval scales. Consequently,

Jones and Thurstone (1955) examine the semantic meanings, to respondents, of 51 scale point

descriptors using 9-point scales and subsequently present the research community with a listing

of words and phrases that range from those expressing “greatest like” to those conveying the

“greatest dislike.” That is, the authors succeed in constructing a “continuum of meaning” that

ranges from the end point “best of all” to its bi-polar extreme “despise” (p. 33), and further

provide future researchers with both the scale value and standard deviation of each of the tested

words and phrases.

Similarly, Myers and Warner (1968) argue that the construction of accurate and

meaningful scales requires that researchers comprehend the psychological meaning, to the

respondent, of scale point descriptors. These authors further assert that quantitative measurement

of the meanings of commonly used scale point descriptors would allow researchers to develop

equal interval scales that are desirable for subsequent statistical analyses of data sets.

Accordingly, Myers and Warner (1968) modify the technique introduced by Jones and Thurstone

(1955), investigate the psychological meaning of 50 commonly used scale point descriptors to

four different groups of respondents, and present the respective mean scale values and standard

deviations for all four groups of respondents. Even though the four subject groups are very

different from each other (i.e., housewives, business executives, undergraduate and graduate

business students), their mean scale values and standard deviations are very similar.

Similar studies have been conducted by Bartram and Yelding (1973), Vidali (1975),

and Wildt and Mazis (1978), and the findings indicate that inventory scale values such as those provided

by Jones and Thurstone (1955) and Myers and Warner (1968) “are surprisingly consistent among

very diverse groups of people,” “can be used with a high degree of confidence,” and are “likely

to provide psychological scales that are virtually equi-distant” (Vidali, 1975, p. 25).

Considering, however, that languages change over time (Graddol, 2004; Yang, 2000),

and no recent inventories are available, the purpose of the present study is to produce a current

inventory containing commonly used scale point descriptors and their respective mean scale

values and standard deviations. Since the present study focuses on two types of scales that are

frequently used in service environments, namely Likert and satisfaction scales, the major

contribution of this study is to provide researchers with quantitative measurement of the

meanings of commonly used scale point descriptors, which, as pointed out by Myers and Warner

(1968), will make possible the development of equal interval scales and thus aid statistical

analyses of data sets.

2. Methods

The goal of the present research was to develop inventories for two types of frequently

used response scales, namely satisfaction and Likert scales. A review of the literature focused on

locating commonly used scale point descriptors for both types of scales (see Tables 1 and 2).

Given that there is considerable overlap of scale point descriptors, a final number of 39

satisfaction items and 19 agreement items was chosen and tested.

[INSERT TABLES 1 AND 2 HERE]

The data collection followed the method first outlined by Jones and Thurstone (1955).

Accordingly, all satisfaction scale point descriptors were treated as items on nine-point scales

(from -4 to +4). Each scale was anchored to the left by “greatest dislike,” its midpoint by

“neither like nor dislike,” and to the right by “greatest like” (see Table 3 for the instructions

given to respondents). The procedure for the Likert scale point descriptors was similar, except

that the left-hand anchor read “greatest disagreement,” the scale midpoint “neither agree nor

disagree,” and the right-hand anchor “greatest agreement.” For each of the scale point

descriptors, respondents were asked to place a check mark in the space on the nine-point scale

that best described the meaning of the respective scale point descriptor.

[INSERT TABLE 3 HERE]

All data were collected online, in the United States. For that purpose, the SSI Survey Spot

Panel was used. The panel is national in scope and was screened to include individuals 21-65

years of age. A random sample was drawn, and of those invited to participate by the panel, 65%

qualified to participate in the survey. That is, because the present study focuses on creating an

inventory of satisfaction and agreement measures in the health insurance industry, we recruited

only subjects who actually had experience with such insurance, i.e., had group health insurance

through an employer [self or spouse]. Considering that 65% of the U.S. population has health

insurance, our samples are therefore representative of the population of interest. Further, only the

household decision-maker or co-decision-maker was qualified to participate. The response rate of

those who qualified was 62%. All subjects were asked to rate each of the 39 satisfaction and 19

agreement items. The satisfaction scale point descriptors were rated first, followed by the

agreement scale point descriptors. The order of the items within each of the categories (i.e.,

satisfaction and agreement descriptors) was random. Following the procedure outlined by Jones

and Thurstone (1955) and defended by Myers and Warner (1968), all subjects (N = 272) were

shown all scale-point descriptors within each category at once.

3. Data analysis and results

The mean value and the standard deviation were calculated for each of the scale point

descriptors (Tables 4 and 5). Interestingly, even though only six of the items that had been tested

by Jones and Thurstone (1955) were included in the list of satisfaction scale descriptors, the

semantic meanings of those six have changed very little over the years (see Table 4).
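
As an illustration of the computation summarized above, the following minimal Python sketch derives the number of valid responses, the mean scale value, and the standard deviation for each descriptor from raw ratings on the nine-point scale (-4 to +4). The ratings shown are hypothetical and are not the study's data. The function name summarize and the use of the sample (n - 1) standard deviation are assumptions, since the text does not state which estimator was used.

```python
from statistics import mean, stdev

# Hypothetical ratings on the nine-point scale (-4 to +4).
# These values are invented for illustration; they are NOT the study's data.
ratings = {
    "Excellent": [4, 4, 3, 4, 2],
    "Fair": [0, 1, 1, 0, -1],
    "Poor": [-2, -3, -1, -2, -2],
}


def summarize(ratings_by_descriptor):
    """Return {descriptor: (valid_n, mean, sample_std_dev)}.

    Assumption: the sample (n - 1) standard deviation is used.
    """
    summary = {}
    for descriptor, values in ratings_by_descriptor.items():
        for value in values:
            if not -4 <= value <= 4:
                raise ValueError(f"{descriptor!r}: rating {value} is outside -4..+4")
        summary[descriptor] = (len(values), mean(values), stdev(values))
    return summary


inventory = summarize(ratings)

# List descriptors from the most positive to the most negative mean.
for descriptor, (n, m, sd) in sorted(
    inventory.items(), key=lambda item: item[1][1], reverse=True
):
    print(f"{descriptor:<12}{n:>4}{m:>7.2f}{sd:>7.2f}")
```

Sorting by the mean scale value mirrors the way an inventory is typically read, from the most favorable to the least favorable descriptor.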

[INSERT TABLES 4 AND 5 HERE]

4. Discussion and conclusion

The current study examines the semantic properties of commonly used scale point

descriptors for both satisfaction and agreement scales, and subsequently provides inventories of

mean values and standard deviations for these scale point descriptors to be used by researchers.

Knowing a scale point descriptor’s mean value makes it possible to construct successive interval

and/or equal interval scales that support meaningful statistical analyses and interpretation.
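
One way such an inventory might be used is sketched below in Python: for a five-point satisfaction scale, the descriptor whose mean scale value lies closest to each of five evenly spaced target values is selected. The mean values are copied from Table 4. The targets (evenly spaced between -3 and +3), the nearest-mean selection rule, and the function name pick_equal_interval_labels are illustrative assumptions rather than a procedure proposed in this paper.

```python
# Mean scale values copied from Table 4 (a subset of the satisfaction items).
TABLE_4_MEANS = {
    "Very satisfied": 3.29,
    "Quite satisfied": 2.67,
    "Mostly satisfied": 2.39,
    "Satisfied": 1.88,
    "Fairly satisfied": 1.45,
    "Somewhat satisfied": 1.32,
    "Slightly satisfied": 0.94,
    "Neither satisfied nor dissatisfied": -0.02,
    "Slightly dissatisfied": -1.13,
    "Somewhat dissatisfied": -1.42,
    "Fairly dissatisfied": -1.66,
    "Dissatisfied": -1.85,
    "Mostly dissatisfied": -2.78,
    "Very dissatisfied": -3.08,
}


def pick_equal_interval_labels(means, n_points, low, high):
    """Pick one descriptor per evenly spaced target value in [low, high].

    Illustrative rule (an assumption): for each target, take the not yet
    used descriptor whose mean scale value is nearest to the target.
    """
    step = (high - low) / (n_points - 1)
    targets = [low + i * step for i in range(n_points)]
    remaining = dict(means)
    chosen = []
    for target in targets:
        label = min(remaining, key=lambda name: abs(remaining[name] - target))
        chosen.append((target, label, remaining.pop(label)))
    return chosen


scale = pick_equal_interval_labels(TABLE_4_MEANS, n_points=5, low=-3.0, high=3.0)
for target, label, value in scale:
    print(f"target {target:+.1f}: {label} (mean {value:+.2f})")
```

With the subset above, this rule selects “Very dissatisfied,” “Somewhat dissatisfied,” “Neither satisfied nor dissatisfied,” “Fairly satisfied,” and “Very satisfied.” A researcher would still need to weigh the standard deviations in Table 4, since a descriptor with a well-placed mean but a large standard deviation is understood less uniformly by respondents.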

Although the current study manages to overcome some of the limitations pointed out by

Myers and Warner (1968) – namely the use of relatively small samples that are not national in

scope and are not random in kind – one limitation that future research should

investigate may arise from the chosen product context. It is conceivable

that scale point descriptor inventories developed within the context of health insurance might not

be valid in other product contexts. However, even as we point to this limitation, Mittelstaedt

(1971, p. 236), who compares three different studies that focused on building scale point

descriptor inventories, helps us argue that the product context used to develop an inventory is not

very likely to impact the usefulness of that inventory in other product contexts: “In spite of

differences in time, place, subjects, instruments, instructions, referents and the contextual

differences which may arise from using widely different arrays of stimuli, the correspondence

among the scale values of the three studies seems remarkable.”

TABLE 1
Satisfaction Scales
Crosby and Stephens, 1987, Journal of Marketing Research (cited by Wirtz and Lee, 2003, Journal of Service Research)
    Displeased
    Pleased

Kolodinsky, 1999, Journal of Consumer Affairs
    Very dissatisfied
    Dissatisfied
    Neutral
    Satisfied
    Very satisfied

Peterson and Wilson, 1992, Journal of the Academy of Marketing Science
    Very satisfied
    Somewhat satisfied
    Somewhat dissatisfied
    Very dissatisfied
    Uncertain

Peterson and Wilson, 1992, Journal of Marketing Research
    Very satisfied
    Somewhat satisfied
    Unsatisfied
    Very unsatisfied

Peterson and Wilson, 1992, Journal of the Academy of Marketing Science
    Completely satisfied
    Very satisfied
    Fairly satisfied
    Somewhat dissatisfied
    Very dissatisfied

Preisser, 2002, Health Services and Outcomes Research Methodology
    Excellent
    Very good
    Good
    Fair
    Poor

SIP Servizio Opinioni, 1989, as cited in Peterson and Wilson, 1992, Journal of the Academy of Marketing Science
    Very satisfied
    Quite satisfied
    Not very satisfied
    Not at all satisfied

Weinstein, 1989, American Banker Consumer Survey
    Very satisfied
    Somewhat satisfied
    Completely unsatisfied

Westbrook, 1980, Journal of Marketing (T-D Scale)
    Delighted
    Pleased
    Mostly satisfied
    Mixed (about equally satisfied and dissatisfied)
    Mostly dissatisfied
    Unhappy
    Terrible

For reasons of completion and exploratory purposes, the following scale point descriptors were added
    Extremely satisfied, acceptable, slightly satisfied, OK, neither satisfied nor dissatisfied,
    slightly dissatisfied, fairly dissatisfied, completely dissatisfied, extremely dissatisfied

TABLE 2
Likert Scales
Albaum, 1997, Market Research Society
    Strongly agree
    Agree
    Neither agree nor disagree
    Disagree
    Strongly disagree

Hair, Bush and Ortinau, 2003, Marketing Research
    Definitely agree
    Generally agree
    Slightly agree
    Slightly disagree
    Generally disagree
    Definitely disagree

Jacoby and Matell, 1971, Journal of Marketing Research
    Agree
    Uncertain
    Disagree

McDaniel and Gates, 2002, Marketing Research
    Strongly agree
    Somewhat agree
    Neutral
    Somewhat disagree
    Strongly disagree

Menezes and Elbert, 1979, Journal of Marketing Research
    Strongly agree
    Generally agree
    Moderately agree
    Moderately disagree
    Generally disagree
    Strongly disagree

For reasons of completion and exploratory purposes, the following two scale point descriptors were added
    Completely agree
    Completely disagree

TABLE 3
Instructions to Respondents

WORD MEANING TEST

In this test are words and phrases that people might use to show like or dislike for
health insurance plans. For each word or phrase make a check mark to show what the word or
phrase means to you. Look at the examples.

Example I

Suppose you heard a person say that he/she “barely liked” his/her health insurance
plan. You would probably decide that he/she likes it only a little. To show the meaning of the
phrase “barely like,” you would probably check under +1 on the scale below.

                            Greatest          Neither Like        Greatest
                            Dislike           Nor Dislike         Like

                               -4   -3   -2   -1    0   +1   +2   +3   +4
Barely like                                              √

Example II

If you heard someone say he had the “greatest possible dislike” for a certain health
insurance plan, you would probably check under -4, as shown on the scale below.

                            Greatest          Neither Like        Greatest
                            Dislike           Nor Dislike         Like

                               -4   -3   -2   -1    0   +1   +2   +3   +4
Greatest possible dislike       √

For each phrase on the following pages, check along the scale to show how much like
or dislike the phrase means.

TABLE 4
Satisfaction Items
Item                                              Valid N    Mean  Std Dev
Excellent                                             272    3.74     0.94
Completely satisfied                                  272    3.58     1.24
Extremely satisfied                                   272    3.33     1.91
Very satisfied                                        272    3.29     1.21
Delighted                                             272    3.11     1.22
Very good                                             272    2.61     0.91
Quite satisfied                                       272    2.67     1.17
Mostly satisfied                                      272    2.39     1.14
Pleased                                               272    2.04     1.17
Satisfied                                             272    1.88     1.14
Good                                                  272    1.81     0.96
Fairly satisfied                                      272    1.45     1.01
Somewhat satisfied                                    272    1.32     0.86
Acceptable                                            272    1.22     0.82
Slightly satisfied                                    272    0.94     0.91
OK                                                    272    0.69     0.85
Fair                                                  272    0.47     0.92
Neutral                                               272    0.03     0.36
Mixed (about equally satisfied and dissatisfied)      272    0.00     0.43
Neither satisfied nor dissatisfied                    272   -0.02     0.36
Uncertain                                             272   -0.07     0.50
Slightly dissatisfied                                 272   -1.13     0.76
Somewhat dissatisfied                                 272   -1.42     0.81
Fairly dissatisfied                                   272   -1.66     0.98
Not very satisfied                                    272   -1.51     1.32
Displeased                                            272   -1.85     1.10
Unhappy                                               272   -1.87     1.10
Dissatisfied                                          272   -1.85     1.25
Poor                                                  272   -1.92     1.29
Unsatisfied                                           272   -2.14     1.21
Mostly dissatisfied                                   272   -2.78     0.92
Quite dissatisfied                                    272   -2.65     1.73
Very unsatisfied                                      272   -3.15     1.45
Not at all satisfied                                  272   -3.25     1.43
Very dissatisfied                                     272   -3.08     1.57
Terrible                                              272   -3.36     1.14
Completely dissatisfied                               272   -3.22     2.14
Completely unsatisfied                                272   -3.60     1.45
Extremely dissatisfied                                272   -3.71     1.02
* Statistically significant at the .05 level
Jones and Thurstone (1955) inventoried the scale point descriptors excellent (mean = 3.71, std
dev = 1.01); very good (mean = 2.56, std dev = .87); good (mean = 1.91, std dev = .76); fair
(mean = .78, std dev = .47); neutral (mean = .02, std dev = .18); poor (mean = -1.55, std dev =
.87)
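
The comparison in the note above can be made explicit with a short Python sketch that subtracts the Jones and Thurstone (1955) mean from the current mean for each of the six shared descriptors. All values are copied from Table 4 and from the note. The variable names are illustrative assumptions, and the sketch compares means only; it is not a significance test.

```python
# Current mean scale values (Table 4) for the six shared descriptors.
current = {
    "Excellent": 3.74,
    "Very good": 2.61,
    "Good": 1.81,
    "Fair": 0.47,
    "Neutral": 0.03,
    "Poor": -1.92,
}

# Mean scale values reported by Jones and Thurstone (1955), from the note.
jones_thurstone_1955 = {
    "Excellent": 3.71,
    "Very good": 2.56,
    "Good": 1.91,
    "Fair": 0.78,
    "Neutral": 0.02,
    "Poor": -1.55,
}

# Shift = current mean minus 1955 mean, rounded to two decimals.
shifts = {
    name: round(current[name] - jones_thurstone_1955[name], 2) for name in current
}

# Descriptor whose mean moved the most in absolute terms.
largest = max(shifts, key=lambda name: abs(shifts[name]))

for name, shift in shifts.items():
    print(f"{name:<10}{shift:+.2f}")
print("Largest absolute shift:", largest)
```

On these figures the largest absolute shifts are for “poor” (-0.37) and “fair” (-0.31), both small relative to the nine-point range of the scale.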

TABLE 5
Likert Items
Item                          Valid N    Mean  Std Dev
Completely agree                  272    3.63     0.91
Definitely agree                  272    3.32     1.13
Strongly agree                    272    3.05     1.60
Agree                             272    1.92     0.88
Generally agree                   272    1.67     0.91
Moderately agree                  272    1.62     1.03
Somewhat agree                    272    1.23     0.65
Slightly agree                    272    0.96     0.56
Neutral                           272    0.01     0.25
Neither agree nor disagree        272    0.00     0.32
Uncertain                         272   -0.08     0.44
Slightly disagree                 272   -0.95     0.84
Somewhat disagree                 272   -1.37     0.80
Moderately disagree               272   -1.65     1.12
Disagree                          272   -1.82     1.01
Generally disagree                272   -1.74     1.09
Strongly disagree                 272   -3.21     1.43
Definitely disagree               272   -3.45     1.18
Completely disagree               272   -3.69     1.18

* Statistically significant at the .05 level

REFERENCES

Albaum, G. (1997) “The Likert scale revisited: an alternate version,” Journal of the Market
Research Society, Vol 39 No 2, pp. 331-48.

Bearden, W. O., Malhotra, M. K., Uscátegui, K. H. (1998) “Customer contact and the evaluation
of service experiences: propositions and implications for the design of services,”
Psychology and Marketing, Vol 15 No 8, pp. 793-809.

Bartram, P., Yelding, D. (1973) “The development of an empirical method of selecting phrases
used in verbal rating scales,” Journal of the Market Research Society, Vol 15, pp. 151-56.

Crosby, L. A., Stephens, N. (1987) “Effects of relationship marketing on satisfaction, retention,
and prices in the life insurance industry,” Journal of Marketing Research, Vol 24
(November), pp. 404-11.

Graddol, D. (2004) “The future of language,” Science, Vol 303 (February 27), pp. 1329-31.

Hair, J. F., Bush, R. P., Ortinau, D. J. (2003) Marketing Research: Within a Changing
Information Environment, 2nd edition. McGraw-Hill, New York.

Jacoby, J., Matell, M. S. (1971) “Three-point Likert scales are good enough,” Journal of
Marketing Research, Vol 8 (November), pp. 495-500.

Jones, L. V., Thurstone, L. L. (1955) “The psychophysics of semantics: an experimental
investigation,” The Journal of Applied Psychology, Vol 39 No 1, pp. 31-6.

Kolodinsky, J. (1999) “Consumer satisfaction with a managed health care plan,” The Journal of
Consumer Affairs, Vol 33 No 2, pp. 223-36.

McDaniel, C., Gates, R. (2005) Marketing Research, 6th edition. Wiley, Hoboken, NJ.

Menezes, D., Elbert, N. F. (1979) “Alternative semantic scaling formats for measuring store
image: an evaluation,” Journal of Marketing Research, Vol 16 (February), pp. 80-7.

Mittelstaedt, R. A. (1971) “Semantic properties of selected evaluative adjectives: other
evidence,” Journal of Marketing Research, Vol 8 (May), pp. 236-37.

Myers, J. H., Warner, W. G. (1968) “Semantic properties of selected evaluation
adjectives,” Journal of Marketing Research, Vol 5 (November), pp. 409-12.

Peterson, R. A., Wilson, W. R. (1992) “Measuring customer satisfaction: fact and artifact,”
Journal of the Academy of Marketing Science, Vol 20 No 1, pp. 61-71.

Preisser, J. S. (2002) “Quasi-likelihood analysis of patient satisfaction with medical care,” Health
Services & Outcomes Research Methodology, Vol 3 No 4, pp. 233-45.

Vidali, J. J. (1975) “Context effects on scaled evaluatory adjective meaning,” Journal of the
Market Research Society, Vol 17 No 1, pp. 21-5.

Weinstein, M. (1989) “Consumers still like service, but their enthusiasm erodes,” American
Banker Consumer Survey, Vol 6, pp. 18, 20.

Westbrook, R. A. (1980) “A rating scale for measuring product/service satisfaction,” Journal of
Marketing, Vol 44 (Fall), pp. 68-72.

Wildt, A. R., Mazis, M. B. (1978) “Determinants of scale response: label versus position,”
Journal of Marketing Research, Vol 15 (May), pp. 261-7.

Wirtz, J., Lee, M. C. (2003) “An examination of the quality and context-specific applicability of
commonly used customer satisfaction measures,” Journal of Service Research, Vol 5 No
4, pp. 345-55.

Yang, C. D. (2000) “Internal and external forces in language change,” Language Variation and
Change, Vol 12 No 3, pp. 231-50.
