different systems evaluated. In one study, 4,597 respondents evaluated 179 different systems ranging from micros to large mainframes [9]. Many have demonstrated more concern for reliability and validity issues. However, few studies have sustained the development of a questionnaire, and although many of the surveys consider several issues associated with general subjective satisfaction with the system, few if any focus directly on the interface. This research attempts to address these issues.

Review of the Development Process
The original questionnaire consisted of a total of 90 questions [10]. Five questions were overall reaction ratings of the system. The remaining 85 items were organized into 20 different groups, each of which had a main component question followed by related subcomponent questions. The questionnaire’s short version listed only the 20 main questions along with the five overall questions. Each question had a rating scale ascending from 1 on the left to 10 on the right, anchored at both endpoints with adjectives (e.g., inconsistent/consistent). These adjectives were always positioned so that the scale went from negative on the left to positive on the right. In addition, each item had “not applicable” as a choice. Instructions also encouraged raters to include any written comments. No empirical work was done to assess its reliability or validity.

The original questionnaire was modified and expanded to three sections in the Questionnaire for User Interface Satisfaction (QUIS 3.0). In Section I, three questions concerned the type of system being rated and the amount of time spent on the system. In Section II, four questions dealt with the user’s past computer experience. Section III was a modified version of the original questionnaire, with the rating scales changed from 1 through 10 to 1 through 9.
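As an illustration of the item format just described, the sketch below models a single QUIS 3.0-style rating item: a question anchored at both ends by an adjective pair, a 1 through 9 scale, a “not applicable” option, and room for written comments. This is only a hypothetical rendering of the paper's description; the class and field names are illustrative inventions, not part of the published instrument.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RatingItem:
    """One QUIS 3.0-style item: a 1..9 scale anchored by an adjective
    pair, with a "not applicable" option and free-form comments."""
    text: str                      # the question being rated
    negative_anchor: str           # left (negative) endpoint adjective
    positive_anchor: str           # right (positive) endpoint adjective
    scale_min: int = 1
    scale_max: int = 9             # QUIS 3.0 narrowed 1..10 to 1..9
    rating: Optional[int] = None   # None encodes "not applicable"
    comment: str = ""

    def record(self, value: Optional[int], comment: str = "") -> None:
        """Store a response, rejecting values outside the scale."""
        if value is not None and not (self.scale_min <= value <= self.scale_max):
            raise ValueError(f"rating must be in {self.scale_min}..{self.scale_max}")
        self.rating = value
        self.comment = comment

# Example: an item anchored inconsistent/consistent, as in the paper
item = RatingItem("Terminology used across the system",
                  negative_anchor="inconsistent",
                  positive_anchor="consistent")
item.record(7, "menu labels differ from the manual")
```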
QUIS’s generalizability could be established by having different user populations evaluate different systems. The development of QUIS included respondents who were: 1) students, 2) computer professionals, 3) computer hobbyists, and 4) novice users. Moreover, it was important to administer the QUIS under different experimental conditions: 1) strictly controlled experiments with a small number of subjects exposed to a system for a very short period of time, 2) less rigidly controlled manipulations with a medium number of participants who used a system for a limited time, and 3) a field study having no control, with volunteers who had used a system extensively. The characteristics of versions 3.0 and 4.0 were based on student evaluations of a system in a moderately controlled situation. The present sampling domain included computer professionals and hobbyists who had extensive and uncontrolled use of the evaluated systems.

A questionnaire’s reliability is related to the number of items and scaling steps: the larger the number of items and scaling steps, the higher the reliability of the questionnaire [7]. Thus, QUIS began with a large number of questions. In addition, 10-point scales were used, since more than 10 steps would contribute little to reliability [7]. However, QUIS had to be shortened to improve the percentage of completed questionnaires. Thus, successive versions of the questionnaire had fewer items while maintaining high reliability.
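The paper does not spell out the formula behind the item-count side of this relationship, but in classical test theory it is commonly captured by the Spearman-Brown prophecy formula; the statement below uses our own notation and is not taken from [7].

```latex
% Spearman-Brown prophecy formula (classical test theory).
% \rho   : reliability of the original questionnaire
% k      : factor by which the number of parallel items is multiplied
% \rho_k : predicted reliability of the lengthened (or shortened) form
\rho_k = \frac{k\rho}{1 + (k - 1)\rho}
```

Predicted reliability rises monotonically with k but with diminishing returns, which is consistent with the strategy of starting from a large item pool and then trimming items while reliability stays high.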
The QUIS (3.0) and a subsequent revised version (4.0) were administered to an introductory computer science class learning to program in CF PASCAL [3]. Participants, 155 males and 58 females, were assigned to either an interactive batch-run IBM mainframe or an interactive syntax-directed editor programming environment on an IBM PC. During class time, they evaluated the environment they had used during the first 6 weeks of the course with QUIS (3.0). A multiple regression of the subcomponent questions of QUIS (3.0) on each main component question was used to eliminate questions with low beta weights, thereby reducing the number of ratings from 103 to 70 in QUIS (4.0) while retaining the same basic organization. Next, after switching to the other environment for six weeks, the participants evaluated it with QUIS (4.0).
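The paper gives no computational details of this pruning step. The Python sketch below shows one plausible reading of it, treating standardized regression coefficients as the beta weights; the function name, cutoff, and simulated data are all hypothetical.

```python
import numpy as np

def low_beta_items(sub_ratings: np.ndarray, main_rating: np.ndarray,
                   cutoff: float = 0.1) -> list[int]:
    """Regress a main component question on its subcomponent questions
    and flag subcomponents whose standardized beta weights are small.

    sub_ratings: (n_respondents, n_subitems) matrix of ratings
    main_rating: (n_respondents,) vector for the main component question
    Returns indices of subitems contributing little to the regression.
    """
    # Standardize so the least-squares coefficients are beta weights
    X = (sub_ratings - sub_ratings.mean(0)) / sub_ratings.std(0, ddof=1)
    y = (main_rating - main_rating.mean()) / main_rating.std(ddof=1)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return [i for i, b in enumerate(beta) if abs(b) < cutoff]

# Hypothetical use: drop weak subitems within one component group
rng = np.random.default_rng(0)
subs = rng.integers(1, 10, size=(100, 5)).astype(float)   # 1..9 ratings
main = subs @ np.array([0.5, 0.3, 0.05, 0.4, 0.02]) + rng.normal(0, 1, 100)
print(low_beta_items(subs, main))   # likely flags the 0.05 and 0.02 columns
```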
The participants’ exam and project grades were used as a reference point for establishing validity, since an effective interface might be associated with better performance. Higher satisfaction ratings and better performance were expected for the interactive syntax-directed editor programming environment. However, subjective ratings did not correspond with the students’ performance in the class. Problems in the syntax-directed editor’s interface had led to higher satisfaction ratings for the mainframe. Although the performance measures (exam and project grades) failed to help establish validity, QUIS diagnosed interface problems in the syntax-editing programming environment.

QUIS’s reliability was high. Cronbach’s alpha, an estimation of reliability based on the average intercorrelation among items, indicated that QUIS (3.0) had an overall reliability of .94, with interitem alpha values varying by .002. QUIS (4.0) had an overall reliability of .89, with the
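Cronbach’s alpha is simple to compute from a respondents-by-items rating matrix; the sketch below implements the standard formula. It is our own illustration, not code or data from the QUIS studies.

```python
import numpy as np

def cronbach_alpha(ratings: np.ndarray) -> float:
    """Cronbach's alpha for a (n_respondents, n_items) rating matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals).
    """
    k = ratings.shape[1]
    item_vars = ratings.var(axis=0, ddof=1)      # variance of each item
    total_var = ratings.sum(axis=1).var(ddof=1)  # variance of total scores
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)

# Hypothetical check on simulated questionnaire data
rng = np.random.default_rng(1)
true_score = rng.normal(5, 2, size=(200, 1))
ratings = np.clip(true_score + rng.normal(0, 1.5, size=(200, 20)), 1, 9)
print(round(cronbach_alpha(ratings), 2))  # high alpha: items share a common score
```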
line system (CLS) vs. menu driven application (MDA). Both the like and MDA groups had consistently higher ratings than the dislike and CLS groups, respectively.

Although there were strong differences between like vs. dislike and CLS vs. MDA in the overall reactions, the specific main component questions showed more significant differences between CLS and MDA than between the like and dislike groups. Evaluating a large number of different software products may account for the lack of significant differences in the like vs. dislike groups, since each product differs in its strengths and weaknesses. Aggregation of the evaluations across different products may have cancelled the rating differences between the like and dislike groups.

MDA was rated higher than CLS for many reasons. Shneiderman (1987) lists five advantages of MDA: 1) shortening of learning time, 2) reduction of keystrokes, 3) structuring of decision-making, 4) permitting the use of dialog management, and 5) support for error handling. MDA’s higher ratings in error messages, help, and feedback are evidence of MDA’s superiority in dialog management and error handling. Moreover, MDA’s higher ratings in organization of screen information and the sequence of screens indicate that good design of menus can avoid the pitfalls of MDA: 1) slowing down frequent users and 2) getting lost while navigating through the menus. Surprisingly, although CLSs are known for their flexibility and appeal to “power” users [10], the overall ratings of MDA suggest that these frequent and sophisticated users rated an MDA as more satisfying, powerful, and flexible than a CLS.

encoding errors. Presently a computerized IBM™ PC version of QUIS has been implemented and distributed.

ACKNOWLEDGEMENTS
We thank Yuri Gawdiak and Steven Versteeg for their help collecting data. Funding was provided by NSF, AT&T, and the University of Maryland Computer Science Center.

REFERENCES
1. Bailey, J. E., & Pearson, S. W. Development of a tool for measuring and analyzing computer user satisfaction. Management Science, 29, 5 (May 1983), 530-545.

2. Coleman, W. D., Williges, R. C., & Wixon, D. R. Collecting detailed user evaluations of software interfaces. Proceedings of the Human Factors Society 29th Annual Meeting, 1985, 240-244.

3. Chin, J. P., Norman, K. L., & Shneiderman, B. Subjective user evaluation of CF PASCAL programming tools. Technical Report CAR-TR-304, Human-Computer Interaction Laboratory, University of Maryland, College Park, MD 20742, 1987.

4. Gallagher, C. A. Perceptions of the value of a management information system. Academy of Management Journal, 17, 1 (1974), 46-55.

5. Ives, B., Olson, M. H., & Baroudi, J. J. The measurement of user information satisfaction. Communications of the ACM, 26 (1983), 785-793.