All content following this page was uploaded by Leon A. Kappelman on 17 June 2014.
…tionalization of a service quality construct that is theoretically grounded in a discrepancy or gap model. In conceptualizing service quality, Parasuraman et al. (1985, 1988, 1991, 1994b) use the "service quality model," which posits that one's perception of service quality is the result of an evaluation process whereby "the customer compares . . . the perceived service against the expected service" (Gronroos 1984, p. 37).

…hopes to receive (e.g., Parasuraman et al. 1985, 1988, 1991; Zeithaml et al. 1993). These multiple definitions and corresponding operationalizations of "expectations" in the SERVQUAL literature result in a concept that is loosely defined and open to multiple interpretations (Teas). Yet even when concise definitions are provided, various interpretations of the expectations construct can result in potentially serious measurement validity problems.
…please the customer (e.g., friendliness of a salesperson in a retail store)" (p. 116). This interpretation of expectations results in an inverse relationship between the SERVQUAL score, calculated as perception minus expectation (P - E), and actual service quality whenever perception scores are greater than expectation scores (i.e., P > E). This interpretation is consistent with the finding that user satisfaction scores were highest when actual user participation was in congruence with the user's need for participation, rather than merely maximized (Doll and Torkzadeh 1989).

These various interpretations of the "expectation" construct lead to a number of measurement problems. The findings suggest that a considerable portion of the variance in the SERVQUAL instrument is the result of measurement error induced by respondents' varying interpretations of the "expectations" construct (Teas 1993).

Three separate types of expectations have been described (Boulding et al. 1993): (1) the will expectation, what the customer believes will happen in their next service encounter; (2) the should expectation, what the customer believes should happen in the next service encounter; and (3) the ideal expectation, what a customer wants in an ideal sense. The ideal interpretation of expectation is often used in the SERVQUAL literature (Boulding et al. 1993). Boulding et al. (1993) differentiate between should and ideal expectations by stating that what customers think should happen may change as a result of what they have been told to expect by the service provider, as well as what the consumer views as reasonable and feasible based on what they have been told and their experience with the firm or a competitor's service. In contrast, an ideal expectation may "be unrelated to what is reasonable/feasible and/or what the service provider tells the customer to expect" (Boulding et al. 1993, p. 9). A series of experiments demonstrated results that were incompatible with the gap model of service quality (Boulding et al. 1993). Instead, the results demonstrate that service quality is influenced only by perceptions. Moreover, the results indicate that perceptions are influenced by both will and should expectations, but in opposite directions. Increasing will expectations leads to a higher perception of service quality, whereas an increasing expectation of what should be delivered during a service encounter will actually decrease the ultimate perception of the quality of the service provided (Boulding et al. 1993). Not only do these findings fail to support the gap model of service quality, but these results also demonstrate the wildly varying impact of different interpretations of the expectations construct.

Different methods have been used to operationalize "expectations" in developing IS versions of SERVQUAL (Pitt et al. 1995; Kettinger and Lee 1994). One study used the instructions to the survey to urge the respondents to "think about the kind of IS unit that would deliver excellent quality of service" (Pitt et al. 1995). The items then take a form such as: "E1: They will have up-to-date hardware and software." The second study (Kettinger and Lee 1994) used the form: "E1: Excellent college computing services will have up-to-date equipment."

Recall that some respondents to SERVQUAL were found to interpret expectations as forecasts or predictions (Teas 1993). This interpretation corresponds closely with the will expectation (Boulding et al. 1993). It is easy to see how this interpretation might be formed, especially with the "They will" phrasing (Pitt et al. 1995). Unfortunately, the impact of the will expectation on perceptions of service quality is opposite from that intended by the SERVQUAL authors and the (P-E) or gap model of service quality (Boulding et al. 1993).

In summary, a review of the literature indicates that respondents to SERVQUAL may have numerous interpretations of the expectations construct and that these various interpretations have different and even opposite impacts on perceptions of service quality. Moreover, some of the findings demonstrate that expectations influence only perceptions and that perceptions alone directly influence overall service quality (Boulding et al. 1993). These findings fail to support the (P-E) gap model of service
quality and indicate that the use of the expectations construct as operationalized by SERVQUAL-based instruments is problematic.

Applicability of SERVQUAL Across Industries

Another often-mentioned conceptual problem with SERVQUAL concerns the applicability of a single instrument for measuring service quality across different industries. Several researchers have articulated their concerns on this issue. A study of SERVQUAL across four different industries found it necessary to add as many as 13 additional items to the instrument in order to adequately capture the service quality construct in various settings, while at the same time dropping as many as 14 items from the original instrument based on the results of factor analysis (Carman 1990). The conclusion was that considerable customization was required to accommodate differences in service settings. Another study attempted to utilize SERVQUAL in the banking industry (Brown et al. 1993). The authors were struck by the omission of items which they thought a priori would be critical to subjects' evaluation of service quality. They concluded that it takes more than simple adaptation of the SERVQUAL items to effectively address service quality across diverse settings. A study of service quality for the retail sector also concluded that utilizing a single measure of service quality across industries is not feasible (Dabholkar et al. 1996).

Researchers of service quality in the information systems context appear to lack consensus on this issue. Pitt et al. (1995) state that they could not discern any unique features of IS that make the standard SERVQUAL dimensions inappropriate, nor could they discern any dimensions with some meaning of service quality in the IS domain that had been excluded from SERVQUAL. Kettinger and Lee (1994), however, found that SERVQUAL should be used as a supplement to the UIS (Baroudi and Orlikowski 1988) because that instrument also contains items that are important determinants of IS service quality. Their findings suggest that neither the UIS nor SERVQUAL alone can capture all of the factors which contribute to perceived service quality in the IS domain. For example, items contained in the UIS include the degree of training provided to users by the IS staff, the level of communication between the users and the IS staff, and the time required for new systems development and implementation, all of which possess strong face validity as determinants of IS service quality. In addition, Kettinger and Lee dropped the entire tangibles dimension from their IS version of SERVQUAL based on the results of confirmatory factor analysis. These findings contradict the belief that all dimensions of SERVQUAL are relevant and that there are no unique features of the IS domain not included in the standard SERVQUAL instrument (Pitt et al. 1995). It is difficult to argue that items concerning the manner of dress of IS employees and the visual attractiveness of IS facilities (i.e., tangibles) should be retained as important factors in the IS domain while issues such as training, communication, and time to complete new systems development are excluded. We agree that using a single measure of service quality across industries is not feasible (Dabholkar et al. 1996) and therefore future research should involve the development of industry-specific measures of service quality.

Empirical difficulties with the SERVQUAL instrument

A difference score is created by subtracting the measure of one construct from the measure of another in an attempt to create a measure of a third, distinct construct. For example, in scoring the SERVQUAL instrument, an expectation score is subtracted from a perception score to create such a "gap" measure of service quality. Even if one assumes that the discrepancy theory is correct and that these are the only (or at least, the last) two inputs into this cognitive process, it still raises the question: can calculated difference scores operationalize the outcome of a cognitive discrepancy? It appears that several problems with the use of difference scores make them a
poor measure of psychological constructs (e.g., Edwards 1995; Johns 1981; Lord 1958; Peter et al. 1993; Wall and Payne 1973). Among the difficulties related to the use of difference measures discussed in the literature are low reliability, unstable dimensionality, and poor predictive and convergent validities.

Reliability Problems With Difference Scores

Many studies demonstrate that Cronbach's (1951) alpha, a widely used method of estimating instrument reliability, is inappropriate for difference scores (e.g., Cronbach and Furby 1970; Edwards 1995; Johns 1981; Lord 1958; Peter et al. 1993; Prakash and Lounsbury 1983; Wall and Payne 1973). This is because the reliability of a difference score is dependent on the reliability of the component scores and the correlation between them. The correct formula for calculating the reliability of a difference score (r_D) is:

    r_D = (σ₁²r₁₁ + σ₂²r₂₂ − 2r₁₂σ₁σ₂) / (σ₁² + σ₂² − 2r₁₂σ₁σ₂)

where r₁₁ and r₂₂ are the reliabilities of the two component scores, σ₁² and σ₂² are the variances of the component scores, and r₁₂ is the correlation between the component scores (Johns 1981).

This formula shows that as the correlation of the component scores increases, the reliability of the difference scores decreases. An example was provided where the reliability of the difference score formed by subtracting one component from another, with an average reliability of .70 and a correlation of .40, is only .50 (Johns 1981). Thus, while the average reliability of the two components is .70, which is considered acceptable (Pitt et al. 1995; cf. Nunnally 1978), the correlation between the components reduces the reliability of the difference score to a level that most researchers would consider unacceptable (Peter et al. 1993).

An example of the overestimation of reliability caused by the misuse of Cronbach's alpha can be found in the analysis of service quality for a computer manufacturer (Parasuraman et al. 1994a; see Table 1). Note that Cronbach's alpha consistently overestimates the actual reliability for the difference scores of each dimension (column 2). Also note that the use of the correct formula for calculating the reliability of a difference score demonstrates that the actual reliabilities for the SERVQUAL dimensions may be as much as .10 lower than reported by researchers incorrectly using Cronbach's alpha. In addition, these findings show that the non-difference, direct response method results in consistently higher reliability scores than the (P-E) difference method of scoring.

These results have important implications for the IS-SERVQUAL (Pitt et al. 1995).
Table 1. Reliability estimates for the SERVQUAL dimensions (computer manufacturer data; Parasuraman et al. 1994a)

A Priori         Cronbach's alpha     Cronbach's alpha   Johns' alpha
Dimension        (Non-Difference)     (Difference)       (Difference)
Tangibles             .83                  .75                .65
Reliability           .91                  .87                .83
Responsiveness        .87                  .84                .81
Assurance             .86                  .81                .71
Empathy               .90                  .85                .81

Note: Difference scores calculated as perception minus expectation (P - E).
Cronbach's alpha, which consistently overestimates the reliability of difference scores, was used incorrectly. Even when using the inflated alpha scores, Pitt et al. note that two of three reliability measures for the tangibles dimension fall below the 0.70 level required for commercial applications. Had they utilized the appropriate modified alpha, they may have concluded that the tangibles dimension is not reliable in the IS context, a finding which would have been consistent with the results of Kettinger and Lee (1994).

A review of the literature clearly indicates that by utilizing Cronbach's alpha, researchers tend to overestimate the reliabilities of difference scores, especially when the component scores are highly correlated. Such is the case with the SERVQUAL instrument (Peter et al. 1993).

Predictive and Convergent Validity Issues With Difference Scores

Another problem with the SERVQUAL instrument concerns the poor predictive and convergent validities of the measure. Convergent validity is concerned with the extent to which multiple measures of the same construct agree with each other (Campbell and Fiske 1959). Predictive validity refers to the extent to which scores of one construct are empirically related to scores of other conceptually related constructs (Bagozzi et al. 1992; Kappelman 1995; Parasuraman et al. 1991). Direct response measures have demonstrated higher correlation values (ranging from .72 to .81) compared to the SERVQUAL difference scores (ranging from .51 to .71).

The predictive validity of difference scores, a non-difference direct response score, and perceptions-only scores for SERVQUAL in the context of a financial institution has been compared (Brown et al. 1993). Correlation analysis was performed between the various scores and a three-item behavioral intentions scale. Behavioral intentions include such concepts as whether the customer would recommend the financial institution to a friend or whether they would consider the financial institution first when seeking new services. The results of the study show that both the perceptions-only (.31) and direct response (.32) formats demonstrated higher correlations with the behavioral intentions scale than did the traditional difference score (.26).

The superior predictive and convergent validity of perception-only scores was confirmed (Cronin and Taylor 1992). Those results indicated higher adjusted r-squared values for perception-only scores across four different industries. The perception component of the perception-minus-expectation score consistently performs better as a predictor of overall service quality than the difference score itself (Babakus and Boller 1992; Boulding et al. 1993; Cronin and Taylor 1992; Parasuraman et al. 1991).
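For concreteness, the three scoring formats being compared here, the (P - E) difference, perceptions-only, and direct response formats, can be computed as follows. The ratings are made up for illustration and are not data from Brown et al. (1993) or any other cited study:

```python
# One respondent's 7-point ratings for three paired SERVQUAL-style items
# (hypothetical numbers for illustration only).
perceptions = [5, 6, 4]   # perception (P) items
expectations = [6, 6, 7]  # paired expectation (E) items
direct = [4, 4, 2]        # direct response items, anchored from
                          # "falls far short of expectations" (1)
                          # to "greatly exceeds expectations" (7)

# (P - E) gap scoring: average of perception minus paired expectation.
gap_score = sum(p - e for p, e in zip(perceptions, expectations)) / len(perceptions)

# Perceptions-only scoring: average the perception ratings directly.
perceptions_only = sum(perceptions) / len(perceptions)

# Direct response scoring: average the single combined-question ratings.
direct_response = sum(direct) / len(direct)

print(round(gap_score, 2), perceptions_only, round(direct_response, 2))
```

The difference format requires twice as many survey items as either alternative, which is part of the practical argument made later in the paper.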
Table 2. Factor structures reported in studies of the SERVQUAL instrument

Study | Instrument | Factor Analysis | Structure
Carman (1990) | Four modified SERVQUALs using 12-21 of the original items | Principal axis factor analysis with oblique rotation | Five to nine factors
Brensinger and Lambert (1990) | Original 22 items | Principal axis factor analysis with oblique rotation | Four factors with eigenvalues > 1
Parasuraman, Zeithaml, and Berry (1991) | Original 22 items | Principal axis factor analysis with oblique rotation | Five factors, but different from the a priori model; the tangibles dimension split into two factors, while the responsiveness and assurance dimensions loaded on a single factor
Finn and Lamb (1991) | Original 22 items | LISREL confirmatory factor analysis | Five-factor model had poor fit
Babakus and Boller (1992) | Original 22 items | (1) Principal axis factor analysis with oblique rotation; (2) confirmatory factor analysis | (1) Five-factor model not supported; (2) two factors
Cronin and Taylor (1992) | Original 22 items | Principal axis factor analysis with oblique rotation | Unidimensional structure
*Van Dyke and Popelka (1993) | 19 of original 22 items | Principal axis factor analysis with oblique rotation | Unidimensional structure
*Pitt, Watson, and Kavan (1995) | Original 22 items | Principal components and maximum likelihood with varimax rotation | (1) Financial institution: seven-factor model with tangibles and empathy split into two; (2) consulting firm: five factors, none matching the original; (3) information systems service firm: three-factor model
*Kettinger and Lee (1994) | Original 22 items | LISREL confirmatory factor analysis | Four-factor model; tangibles dimension dropped
*Kettinger, Lee, and Lee (1995) | Original 22 items | Principal axis factor analysis with oblique rotation | (1) Korea: three-factor model, tangibles retained; (2) Hong Kong: four-factor model, tangibles retained

*Measured information systems service quality.
Development of the instrument began with 97 paired questions (i.e., one for expectation and one for perception). Items (i.e., question pairs) were first dropped on the basis of within-dimension Cronbach coefficient alphas, reducing the pool to 54 question pairs. More items were then dropped or reassigned based on oblique-rotation factor loadings and within-dimension Cronbach coefficient alphas, resulting in a 34 paired-item instrument with a proposed seven-dimensional structure. A second data collection and analysis with this "revised" definition and operationalization of service quality resulted in the 22 paired-item SERVQUAL instrument with a proposed five-dimensional structure. Two of these five dimensions contained items representing seven of the original 10 dimensions. We are cautioned, however, that those who wish to interpret factors as real dimensions shoulder a substantial burden of proof (Cronbach and Meehl 1955). Moreover, such proof must rely on more than just empirical evidence (e.g., Bynner 1988; Galletta and Lederer 1989).

The results of several studies have demonstrated that the five dimensions claimed for the SERVQUAL instrument are unstable (see Table 2). SERVQUAL studies in the information systems domain have also demonstrated the unstable dimensionality of the instrument. The service quality of IS services was measured in three different industries: a financial institution, a consulting firm, and an information systems service business (Pitt et al. 1995). Factor analysis was conducted using principal components and maximum likelihood methods with varimax rotation for a range of models. Analysis indicated differing factor structures for each type of firm. Analysis of the results for the financial institution indicated a seven-factor model with both the tangibles and empathy dimensions split into two. These breakdowns should not be surprising. Pitt et al. note that "up-to-date hardware and software" are quite distinct from physical appearances in the IS domain. The empathy dimension was created by the original SERVQUAL authors from two distinctly different constructs, namely understanding and access, which were combined due to the factor loadings alone, without regard to underlying theory. Not only did IS-SERVQUAL not match the proposed model, its factor structure varied across settings. Analysis of the data from the consulting firm resulted in a five-factor model, although none of these factors matched the original a priori factors. The factor analysis of the information systems business data resulted in the extraction of only three factors.

LISREL confirmatory factor analysis was used on SERVQUAL data collected from users (i.e., students) of a college computing services department (Kettinger and Lee 1994). Analysis of this data resulted in a four-factor solution. The entire tangibles dimension was dropped. An IS version of SERVQUAL was used in a cross-national study (Kettinger et al. 1995). Results of exploratory common factor analysis with oblique rotation indicated a three-factor model from a Korean sample, and a four-factor model was extracted from a Hong Kong data set. The tangibles dimension was retained in the analysis of both of the Asian samples.

The unstable dimensionality of SERVQUAL demonstrated in many domains, including information services, is not just a statistical curiosity. The scoring procedure for SERVQUAL calls for averaging the P-E gap scores within each dimension (Parasuraman et al. 1988). Thus a high expectation coupled with a low perception for one item would be canceled by a low expectation and high perception for another item within the same dimension. This scoring method is only appropriate if all of the items in that dimension are interchangeable. Such averaging would be justified if SERVQUAL demonstrated a clear and consistent dimensional structure. However, given the unstable number and pattern of the factor structures, averaging groups of items to calculate separate scores for each dimension cannot be justified. Therefore, for scoring purposes, each item should be treated individually and not as part of some a priori dimension.
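The cancellation effect described above is easy to see with hypothetical ratings:

```python
# Two items within the same a priori dimension (hypothetical 7-point ratings).
perceptions = [2, 7]   # item 1: low perception;  item 2: high perception
expectations = [7, 2]  # item 1: high expectation; item 2: low expectation

# Per-item (P - E) gap scores, then the within-dimension average.
gaps = [p - e for p, e in zip(perceptions, expectations)]
dimension_score = sum(gaps) / len(gaps)

# The -5 gap on item 1 is exactly canceled by the +5 gap on item 2, so the
# averaged dimension score reports no quality shortfall at all.
print(gaps, dimension_score)  # [-5, 5] 0.0
```

Unless the two items really are interchangeable measures of one dimension, the averaged score of 0.0 hides a severe service failure on the first item.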
Perceived service quality can be increased by either improving actual performance or by managing expectations, specifically by reducing should expectations and/or increasing will expectations. These two different types of expectations are not differentiated by the traditional SERVQUAL gap scoring method. A better approach to understanding the impact of expectations on perceived service quality may be to measure will and should expectations separately and then compare them to a service quality measure that utilizes either a direct response or perceptions-only method of scoring.

Prescriptions for the use of SERVQUAL

The numerous problems associated with the use of difference scores suggest the need for an alternative response format. One alternative is to use the perceptions-only method of scoring. A review of the literature (Babakus and Boller 1992; Boulding et al. 1993; Cronin and Taylor 1992; Parasuraman et al. 1991, 1994) indicates that perceptions-only scores are superior to the perception-minus-expectation difference scores in terms of reliability, convergent validity, and predictive validity. In addition, the use of perceptions-only scores reduces by 50% the number of items that must be answered and measured (44 items to 22). Moreover, the findings of Boulding et al. (1993) suggest that expectations are a precursor to perceptions and that perceptions alone directly influence service quality.

A second alternative, suggested by Carman (1990) and Babakus and Boller (1992), is to revise the wording of the SERVQUAL items into a format combining both expectations and perceptions into a single question. Such an approach would maintain the theoretical value of expectations and perceptions in assessing service quality, as well as reduce the number of questions to be answered by 50%. This direct response format holds promise for overcoming the inherent problems with calculated difference scores. Items with this format could be presented with anchors such as "falls far short of expectations" and "greatly exceeds expectations." One study indicates that such direct measures possess higher reliability and improved convergent and predictive validity when compared to difference scores (Parasuraman et al. 1994a).

Conclusion

Recognizing that we cannot manage what we cannot measure, the increasingly competitive market for IS services has emphasized the need to develop valid and reliable measures of the service quality of information systems services providers, both internal and external to the organization. An important contribution to this effort was made with the suggestion of an IS-modified version of the SERVQUAL instrument (Pitt et al. 1995). However, earlier studies raised several important questions concerning the SERVQUAL instrument (e.g., Babakus and Boller 1992; Carman 1990; Cronin and Taylor 1992, 1994; Peter et al. 1993; Teas 1993, 1994). A review of the literature suggests that the use of difference scores with the IS-SERVQUAL instrument results in neither a valid nor reliable measure of perceived IS service quality. Those choosing to use any version of the IS-SERVQUAL instrument are cautioned. Scoring problems aside, the consistently unstable dimensionality of the SERVQUAL and IS-SERVQUAL instruments intimates that further research is needed to determine the dimensions underlying the construct of service quality. Given the importance of the service quality concept in IS theory and practice, the development of improved measures of service quality for an information systems services provider deserves further theoretical and empirical research.

References

Babakus, E., and Boller, G. W. "An Empirical Assessment of the SERVQUAL Scale," Journal of Business Research (24:3), 1992, pp. 253-268.

Bagozzi, R., Davis, F., and Warshaw, P. "Development and Test of a Theory of