Development and Validation of Behaviorally-Anchored Rating Scales for Student Evaluation of Pharmacy Instruction1
Paul G. Grussing
College of Pharmacy, M/C 871, The University of Illinois at Chicago, 833 South Wood Street, Chicago IL 60612
Robert J. Valuck
Department of Pharmacy Administration, The University of Illinois at Chicago, Chicago IL
Reed G. Williams
Department of Medical Education, The University of Illinois at Chicago, Chicago IL
The study purpose was to improve pharmacy instruction by identifying dimensions of teaching unique to pharmacy education and developing reliable and valid rating scales for student evaluation of instruction. Error-producing problems in the use of student ratings of instruction, existing rating methods and dimensions of effective teaching are reported. Rationale is provided for development of Behaviorally-Anchored Rating Scales, BARS, and the methods used are described. In a national study, 4,300 descriptions of pharmacy teaching were collected in nine critical incident writing workshops at four types of schools. Ten dimensions of pharmacy teaching were identified and validated for classroom, laboratory and experiential teaching. Scales were developed for each dimension. Measures of scale quality are described including retranslation data, standard deviations of effectiveness ratings, reliability and validity data and data supporting reduction of leniency and central tendency effects. Four outcomes of the project are discussed, emphasizing two: use of the newly-validated dimensions in modification of traditional numerically-anchored scales in local use, and of BARS in providing clear and convincing performance feedback to pharmacy instructors.
INTRODUCTION AND PURPOSE
From among the traditional faculty roles of teaching, research and service, this study investigated only the evaluation of teaching. Teaching performance may be evaluated using multiple data sources: (i) documented self-evaluation and course improvement; (ii) peer review of instructional methods, instructor-written texts or manuals, and other developed media, syllabi and tests; (iii) gains in student learning; (iv) student ratings of instructor performance; (v) observation or videotaping; and (vi) teaching awards (1,2). This study focused on only one data source: student evaluation of faculty performance. Its purpose was to improve the quality of instruction in U.S. colleges of pharmacy by identifying dimensions of pharmacy instruction and developing new, reliable and valid student measures of effective pharmacy teaching2. Such measures of instructional performance, whether utilized in instructor self-assessment, for periodic performance reviews or in the critical promotion and tenure process, are essential for the continued development of effective teachers. If pharmacy students and instructors are to have confidence in instructional rating systems and to eventually benefit from the rating process, clear dimensions of effective teaching should be identified and rating errors minimized. Problems with the content validity of student ratings of instructor performance introduce rating error when instruments are not sensitive to the unique differences in lecture, laboratory and experiential instruction. Moreover, when instructor rating instruments are developed for use across university colleges and departments or disci-
plines, without having been validated for use in rating pharmacy instruction in particular, additional questions of validity and rating error arise.
Error in Instructor Ratings
Reduction of measurement error is imperative in evaluation of faculty teaching performance. Eight kinds of error in the administration and use of instructional performance rating scales prompted this study. The research and development methods chosen were intended to minimize most of these common sources of rating error, especially the first five listed: (i) error in instrument content; (ii) error in the interpretation of the meaning of ratings (3-5); (iii) showmanship(6-8); (iv) common rating error effects such as “halo effect”(9), “reverse halo effect”(10), “leniency effect” and “harshness (or strictness) effect”(11), and “central ten
The research was supported, in part, by a GAPS grant from the SmithKline Beecham Foundation through the American Association of Colleges of Pharmacy. 2 The term “dimension”, as used in this article, refers to an axis, or continuum, along which performance descriptors, varying in quality or intensity, may be ordered. The dimension is identified arid shown to be independent and non-overlapping in meaning with other clusters of similar behaviors. 3 Formative evaluation refers to evaluation of a process or product to provide feedback for the purpose of making possible mid-process refinements or improvements. 4 Summative evaluation is conducted to examine the quality or impact of a final, completed process or product.
American Journal of Pharmaceutical Education Vol. 58, Winter Supplement 1994
(vi) mixed purposes of evaluation3. feedback learning testing exams examinations experience G.
See reference 23. Class climate.13). A concurrent validation study would constitute the eighth step. and that it varies from one pharmacy teaching environment to another. StudentStudentFlexible. h 1. The seventh step was to develop scales by selection of meaningful behavioral anchors based on the retranslation process and high rater agreement on the scale anchors. then retranslated to the original by an independent expert. structuring Teaching style and methods Objective evaluation.
See reference 25. and (viii) errors in data implementation(20. BARS.
Nine study steps were elaborated to achieve the project goals.
First author only. Finally. the project would identify dimensions of instructional behavior unique to pharmacy education and to three teaching environments: classroom.S. structure. would be developed. by showing correlations with a known reliable and valid. traditional numerically-anchored scale of parallel content. motivation stimulates encourages stimulates thought thinking thinking J. the project was designed to demonstrate generalizability of the scales for use in all U. Enthusiasm. The tentative dimensions so identified were later used for preliminary classification of student-and facultygenerated critical incidents of pharmacy teaching. See reference 7. and feedback Sensitivity. the study began with identification of tentative dimensions of pharmacy teaching. responsive
6 Hildebrandh Provides objectives Teaching clarification application
Individual interaction. for each dimension and teaching environment. colleges of pharmacy. Dynamic. The researchers sought to apply a method. instructor faculty attitudes re: warmth. Grading and Testing.21). accessibility
dency effect”(12. The third step was to conduct critical incident workshops for the collection of descriptors of effective and ineffective teaching in U.Table I. content-parallel numerically-anchored scales. as Fairness in Grading and Grading. It is analogous to the procedures used by language translators to ensure that all of the meanings of an original text are preserved.
See reference 8.
Identification of Tentative Dimensions
Tentative dimensions of pharmacy instruction were identified and validated based on a review of the pertinent literature. organization structure clarity A. Style. course difficulty difficulty difficulty I. g See reference 28. Enthusiasm/ Enthusiasm. other than factor analysis. First. The final step was accomplished through the concurrent validation study.S. Knowledge of Competence Knowledge Knowledge subject area of subject
5 Dasg Course outlining. Text material is translated into a foreign language. Workload. laboratory and experiential. the researchers intended to demonstrate concurrent validation of the scales developed. First. (v) error in instrument reliability. to identify and describe dimensions of pharmacy teaching. student interaction interaction students Concern rapport H. Simultaneously. the sixth step of obtaining effectiveness ratings for incidents from study panelists would provide data for establishing scale anchors. The second goal was to develop Behaviorally-Anchored Rating Scales. (vii) inconsistent methods of instrument administration(16-19).15). yielding a useful parallel set of traditional.4(14. Work Workload. parallel in content. course requirements. for which traditional. Lecturing ability methods interpreting skill and ability clarity F. 58. availability. Four goals were set for the study. This initial validation step would be based on the literature.
American Journal of Pharmaceutical Education Vol. The second step was
The Smith and Kendall retranslation process uses an independent group of expert raters who reallocate descriptors of performance to dimensions describing performance qualities. Third. Procedures for minimizing the first five types of rating errors were sought. Editing and selection of collected incidents was the fourth step.
to select the most appropriate scaling method. Winter Supplement 1994
. The fifth was to establish and validate dimensions of pharmacy teaching using the retranslation process to demonstrate independence of the dimensions5 (22). colleges of pharmacy. Dimensions of teaching selected from the education literature Tentative Literature sources and dimensionsb study 1 2 3 4 dimensionsa Dickinsonc Wotrubad ICESe Centraf D Course Subject and Course Organization organization course management. Emphasis was placed on selecting or developing procedures and instruments to rate the most appropriate pharmacy teaching behaviors and to rate them accurately and consistently. numerically-anchored scales. Teaching Teaching Speaking. Tables I and II display dimensions mentioned in studies and review articles outside and within pharmacy education. Interaction. The literature supporting this selection decision is described. enthusiasm
Dimensions listed by final dimension letters and order. Faculty colleagues have reported the belief that effective pharmacy teaching is different from good teaching in other departments and disciplines.
and developed BARS for student evaluation of instruction (28). Pharmacy. 40. and by Brown. 3-7. Based on deficiencies in the use of rating instruments designed for use in faculty performance evaluation generally. First. Some of the authors cited in Tables I and II described additional kinds of dimensions and behaviors not shown in the tables. Education. are summarized in Table II. 446-448 450-552 8-13 165-166 317-325 114-118 428-430 Dimensionsa Carlson Zanowiak Jacoby Sauter Purohit Kotzand Peterson A. bFirst author only. In a Canadian study of teaching in the behavioral sciences. 50. mentioned key features of pharmacy teaching performance(41). Items designed for instructor’s formative self-evaluation are not so normed. Citing Hildebrand. 1975-90 Citations. in an invitational article. numerically-anchored items from a “catalog” of over 400 items classified by teaching dimensions. General. in an ad hoc committee report. the authors discuss “components” of effective teaching as perceived by their colleagues and by students(34). they reported the BARS to be at least as psychometrically sound in terms of reliability. Their work yielded a comprehensive evaluation system available at the researchers’ school(26). and discussed major functions in the supervision of students by clinical pharmacists(39. four of which are reported in Table I. Knowledge of subject area x x x x D. Brandenberg et al. the nine most frequently mentioned were also cited in a text chapter on uses and limitations of student ratings(24). Instructors may select traditional. Seven are shown in Table I. Dickinson and Zellinger identified six teaching dimensions for veterinary medicine instructors(29). of the “best” and “worst” teaching they had experienced(27). Responses were factor-analyzed into five clusters (dimensions) of teaching performance. mentioned characteristics of an effective pharmacy teacher(42. After review of the education and psychology literature. 42. not student. Teaching dimensions in the pharmacy education literature.40). Workload. 41. Seven dimensions of effective instruction were reported often in the education and psychology literature. 58. Wotruba and Wright reviewed 21 published studies of student evaluation of teaching(23). in the retranslation process chosen for this study. in reporting a theoretical model for pharmacy faculty peer evaluation. Of the 40 criteria they listed. Course Organization x x x x F. pp. Jacoby described how modification of an existing instrument for use in student evaluation of pharmacy teaching contributed to improved classroom instruction(32). evidence of criteria for effective pharmacy teaching was sought. Student-instructor interaction x x x x x x x I. Grading and feedback x x x H. Teaching ability x x x x x x G. described development and validation of scales for student evaluation of teaching(25). Three articles. citing Kiker. 44. A second type of dimension not listed in the tables was based on the notion of self-rated student accomplishment. 31). Three of the articles reported research on pharmacy instruction. in observable and behavioral terms. Vol. Martin et al. mentioned qualities of good pharmacy teaching. Finally. Downs and Troutman identified criteria for the evaluation of clinical pharmacy teaching(37). asked faculty and students to provide descriptions. Behaviors associated with this named factor seemed unlikely to be collected and used as scale anchors in this research which would focus on teacher.43). explored issues of student evaluation of instruction(33). Das et al. Specific behaviors might not. Items designed for use in summative evaluation are normed by instructor rank and by required/ elective status. 40. be assigned to the same dimension as suggested by the factor name coined by authors of previous scales and instruments. behaviors. identified seven dimensions of teaching. Peterson. pp. In their article describing the development of a teacher rating instrument. final dimension letters and order. some instruments contained
American Journal of Pharmaceutical Education Vol. inter-rater variability and content validity. Table I summarizes the most frequently mentioned dimensions of teaching in original studies or reviews. Kotzan and Mikael and Kotzan and Entreken developed and imple
mented a factor-analyzed instrument for student evaluation of undergraduate pharmacy instruction(30. mentioned basic components of teaching and learning as requisite elements in such evaluation^). Carlson suggested a comprehensive evaluation program(38). As part of a panel devoted to the evaluation of pharmacy teaching. Two authors reported special needs for evaluation of clinical teaching performance. problems in attaching meaning to labels assigned to factors in earlier studies could introduce bias in the generation of unobservable behaviors in this study. described clinical faculty evaluation and development programs at one college of pharmacy(36). 39. dAlso Vol. Based on this research. Ten articles from the pharmacy education literature. course difficulty x x x
8 9 47. In equivalent forms comparisons using traditional rating instruments.
Education. which described or mentioned dimensions of teaching. The text author also summarized dimensions of teaching behavior as identified in factor-analytic studies. written as reports or invitational articles. an instructional consulting service was initiated to provide feedback to faculty. Enthusiasm/motivation x x x x x J. 102-107 193-195 Martin Downs x x x x x x x x x
Listed by frequency of mentions. 40. Sauter and Walker. Citing articles by Kulick. the author emphasized three major dimensions students use to judge their teachers. Hildebrand et al. American Journal of Pharmaceutical Educationb 4 5 6c 7 1 2 3c 39. Purohit et al. Dimensions based on authors’ original research.Table II. Winter Supplement 1994
the development of BARS scales for rating performance of pharmacy practitioners and student externs. it consists. laboratory and experiential teaching. vs. In stage two. numerically-anchored scales. arranged for representative types of volunteer students and faculty to attend local critical incident writing workshops. combining systematic and random sampling. Two examples occur in the pharmacy literature describing methods of rating pharmacy residents.
American Journal of Pharmaceutical Education Vol. Raters are instructed to read the entire continuum of behaviors and then select the one which most closely describes the actual. and as a criterion measure in prediction studies. public ownership of U. identified performance dimensions associated with teaching behavioral science courses.0 or near -1. Several studies examined the psychometric properties of BARS vs. Examples may be found in adaptations of the goal-attainment scaling process(65). possibly impacting upon the methods. names of all U.1990. High. Eastern schools were defined as those located in AACP-NABP Districts IIV. 58. Sampling.
Critical Incident Collection
Private vs. low graduate-education emphasis (68). Business undergraduate students rated their instructors twice and the study compared the effect of mid-course feedback for both types of scales. The Behaviorally-Anchored Rating Scale. graduate education emphasis was defined as schools above or below the median number(22) of Ph. (ii) Public or private ownership. Seeking improvements over traditional graphic. socalled. Second. of an array of behavioral statements which range from most effective to least effective. A three-stage sampling procedure was used. behavior of the ratee. one early article compared BARS with traditional scales for leniency error and inter-rater agreement and found more favorable scale properties for traditional scales(57). Then one set of the two remaining complete sets was randomly chosen. In order to ensure generalizability of scales for use across all school types. Two sets of four cells were systematically eliminated because they did not contain schools in all cells. styles and quality of teaching. Dickson and Zellinger compared veterinary medicine students’ ratings of their instructors using a BARS scale and a “mixed standard scale” in which items were scrambled so that both the dimensions
Choice of Scaling Method
and the ordinal relationships among scale anchors were disguised(8). following researcher guidelines. and reported comparisons of the scale properties(63. Distinctions are made between BARS and other behaviorally-based rating scales.0 relationships” were found. The local collaborators were requested to secure broad representation of differing educational levels in a group of 30 undergraduate professional students and from all disciplines in a group of 15 faculty representing classroom. for each performance dimension identified. Six cited studies have demonstrated the presence of independent dimensions of teaching performance and the feasibility of generating “scaleable” behaviors in construction of reliable and valid rating instruments.S. Their disadvantages include showing less evidence of reliability and validity.items describing environmental conditions and curricular features which were beyond the control of a single instructor-ratee.67).S. Economy in development has also been described in connection with. unambiguous statements of ratee performance (48-50). compared properties of a BARS scale with a well-established. However. This study was based upon critical incidents of teaching behavior in a variety of pharmacy teaching environments. Each statement is accompanied by a number on the scale. Such behaviors were not expected to result from this study which would use critical incidents describing observable instructor behaviors only. A third reason for selecting the BARS scale type is that the vivid behavioral descriptions used are easy for raters to associate with ratee performance. Each of the four schools then represented each stratum and each stratum was represented by two schools. BARS scales for student evaluation of college instruction have been reported.47). panels of experts may write broad behavioral descriptions for use as scale anchors. and pharmacy employees generally(66. contentparallel numerically-anchored scale(58).45).64). Each of the these variables was believed to contribute to the instructional culture of the colleges. Such instruments enjoy advantages in ease and cost of development. BARS have been used for evaluation of performance in a wide variety of other professions and occupations(46. Second. Winter Supplement 1994
. schools were randomly selected from within each of four cells in the selected stage one set. numericallyanchored scales. was chosen for development in this study because of its unique measurement properties. Finally. or of using a traditional numerically-anchored scale only (51). In stage one. developed BARS scales for use by psychology students in rating of their instructors. (iii) Geographic (East 6-8 West). and are very compelling in pointing out where the ratee may benefit from introspection and performance improvement(58-60). one of which is recorded to indicate the ratee’s performance on that particular dimension. instead of having experts write general descriptors of performance along a continuum. study schools selected as sources for generating critical incidents were classified and selected using four strata: (i) BS-or PharmD-conferring.S. Green and Sauser. Comparable reliability and validity was observed and reported(52-56). developed BARS scales based on the dimensions. This clarity is supported by following the Flanagan critical incident technique in scale item generation. The third stage of sampling occurred when research collaborators at the four selected schools. it relies on critical incidents which may be classified into dimensions of behavior shown to be unique and independent of each other in their meaning. followed by Champion and Green. schools were randomly assigned to one of the appropriate 16 cells.D. Harari and Zedick identified nine dimensions of teaching behavior and developed corresponding BARS scales for evaluating teaching ability of college psychology professors(61). First. May 30. Horn et al. college of pharmacy graduate programs in 1989. BARS. and (iv) or High vs. that “1. pharmacy students. A review of the literature supports the choice. colleges of pharmacy was confirmed in personal correspondence with Mary Bassler. was previously reported(44. students enrolled in U. American Association of Colleges of Pharmacy. low. They are claimed by some researchers to demonstrate more reliability and validity than numerically-anchored scales because the behaviors serving as scale anchors are clear. First. or expected. Western in Districts V-VIII. based on the four strata. They found that when faculty and student ratings for quality of instructor behavior were correlated. Das et al. “Short-cut” BARS(64). and then compared scale properties with parallel versions in a numerically-anchored scale format(62).
yet frequently-occurring incidents. The incidents were retranslated and rated following the process developed by Smith and Kendall(22) and previously reported in the pharmacy literature (72). Only descriptors of instruction in the professional curriculum were included. 4.” (ii) “Teaching ability—laboratory. 40 students from the final professional year at the researchers’ school were added to increase
Editing and Selection of Incidents
New Dimensions. SD = 239). 58. Productivity in generation of incidents by 138 critical incident writers was broadlybased among schools. 5. laboratory or experiential teaching site. Faculty members from all disciplines wrote 1. As an incentive. a uniform format and unidimensional behavioral style. See Tables II and V.
Importance.g. nor were student attitudes or moralistic statements based on students’ belief systems.098 incidents (72 percent. students received a letter including a report providing feedback on their learning style and suggestions for adaptation to differing teaching styles and formats. Frequency of mention was a primary selection factor. distinct clusters of sufficient numbers of scaleable incidents were observed. SD = 78). based on selected good examples of clear. Clusters of incidents describing both laboratory and experiential instruction were observed. Care was taken to ensure that student raters were “upperclassmen” with exposure to instruction at all curricular levels. Behaviors must have been observable in the classroom. The researchers’ school was used as a pilot site for training in the conducting of item writing workshops. Panelists were asked to think about effective and ineffective teaching incidents they had actually experienced or observed in the classroom. The numbers and kinds of incidents were sufficiently rich and varied to enable a useful number of potentially-scaleable items to be used in the retranslation process/
Incident Selection Criteria
Ten criteria were applied in the selection and editing of incidents for the retranslation process.Critical Incident Workshops. The mean numbers of incidents written per student and faculty were 22 (SD = 4. not exclusively pre-pharmacy teaching behaviors. 8. Incidents were selected by the researchers to represent high. student educational levels and all faculty disciplines. Use of forms and presentation of the writing tasks in a clear. They were encouraged to visit with the local research collaborators for additional information about how to apply their own learning styles. After the workshops. student participants were invited to a post-workshop luncheon. School mean = 242. Only behaviors which related to one of the teaching dimensions were included. Space for low scale anchors was reserved for ineffective. incidents were reviewed to ensure brevity. No modifications in forms or procedures were made after the pilot administration. demonstrating importance of behaviors cited by multiple panelists and occurring across several types of colleges of pharmacy. Incidents collected were reviewed for their quality and content relative to the tentative dimensions of instruction.
American Journal of Pharmaceutical Education Vol. During review of the critical incidents. 9.
1. student-participants were promised written feedback on their learning styles and suggestions for adaptation to differing teaching styles and formats. school mean = 620. To enhance this. standardized and reliable manner among researchers. describing the situation and specifically what the instructor said or did. Behaviors describing unusual instructor leniency or lack of rigor were avoided because some students might perceive them as evidence of poor teaching while others might rate them highly because of perceptions that “easy” behaviors are associated with effective teaching. 3. vivid and unidimensional incidents written. they were asked to write brief “stories” about each incident they could recall. a wide variety of incidents reflecting effective. Each incident must have been clear.8) respectively. Winter Supplement 1994
. Opinions or vague general descriptors of teaching “attributes” were not used. Finally. 6. 10. “Teaching Ability — Experiential. was checked during the pilot. laboratory or experiential teaching site. It also became apparent that incidents relating to choice of media might confound ratings based on incidents describing instructor behavior in the classroom. Positive feedback was provided to the groups. mediocre and ineffective media selection and use were observed. 7. not environmental ones beyond the control of the instructor (e. The incidents must have described instructional behaviors. Separate workshops for students and for faculty panelists were then conducted at each study school. it was necessary to divide them proportionally into two booklets to be sent to two separate retranslation groups. The new tentative dimensions of pharmacy teaching were: (i) “Selection and use of media. Retranslation and Effectiveness Ratings
The final newly-identified tentative dimension. Students at the pilot plus study schools wrote 3.8) and 23 (SD = 4. Panelists were reminded that the incidents were to have been personally experienced and described as observable behaviors.” includes behaviors common to several kinds of community and institutional experiential instruction. obviously unethical or criminal activity were also eliminated. Incidents describing unprofessional conduct.
Similar incident generation outcomes occurred for the second and third new tentative dimensions. not clinical instruction alone. not for obviously uncommon and aberrant behaviors. panelists were given a list of seven tentative dimensions of pharmacy teaching to prompt them to recall and write additional incidents. Using forms provided. unambiguous and unidimensional in its meaning.”) 2. and their classification into seven tentative dimensions. Educational jargon or school-specific terminology was avoided.” While the selection and use of effective media was frequently subsumed under the effective teaching dimension in previous studies. medium and low quality instructional behaviors. students were invited to complete a learning styles inventory(69-71). Because of the large (N = 402) number of incidents retained after the editing process. Moreover.. Near the end of the workshops. the media selection and development incidents collected in this study suggested independence from behaviors describing instructor lecture performance. “This instructor’s lecture room is not air-conditioned. As a second incentive. In addition to critical incident writing. three additional.202 incidents (28 percent. Each workbook contained approximately an equal number of incidents representing each dimension.” and (iii) “Teaching ability—experiential.
50 = 0. and then to rate their importance on a seven-point scale.34. The notion of stability of BARS scales is a useful. the importance of each dimension was determined by asking students to study the dimension descriptions in Table V. marking of an effectiveness rating on a 15-point scale with 15 points being the highest (most effective) teaching performance. before describing the products developed: validated dimensions and scales.
The two dimensions rated most important were “Teaching Ability—Lecture” and “Knowledge of Subject Area.80. The mean P value for these tests was 0.22 to P = 0. they might not be selected for inclusion in the final rating scales.10 After incidents were sorted based on these criteria. ten one way analyses of variance were conducted and none yielded significant differences between learning style groups at P = 0. a grand mean of all scale anchor points for all 10 scales was computed. and a correlated means “t” test showed significant changes (lower ratings over time) for two of the three scales. Kolb has labeled these styles as “Converger”. and Riley(70. Students responded by indicating that all scales were “important” or “very important. a possible source of variance in ratings was examined. less productive for generation of “average” incidents of professional behavior. After this review. Incidents were retained as scale anchors in the respective scales if at least 80 percent of the participants agreed on assignment to the dimension and if the standard deviation about each mean scale point was 2. scale point.
The project results are reported first in terms of measures of scale quality. practical review of both of these ratings suggested that all 10 dimensions are generally considered to be important by students and that none should be eliminated from the final set of rating scales. these incidents were added to the scales only if the behavior was different than a behavior at the same. with a fiveweek time interval. This procedure was intended to show the independence of the dimensions. Application of the styles to pharmacy students’ and pharmacists’ learning has been described by Garvey et al. no dimension received lower than a 5 point rating or more than approximately 15 points. Test-retest reliability was based on two administrations of three selected BARS scales in one class.99. This also enabled the critical incidents to be retranslated/rated by students with and without incident-writing experience. Results show significant testretest correlations for two of the three scales. To determine if learning style differences had an impact on overall respondents’ ratings.” “Selection and Use of Media” and “Workload/Course Difficulty” were considered by students to be the least important. Importance ratings were considered in scale construction.77.05. or near to the same. “Assimilator. Ten working days later another letter and retranslation booklet was sent to the remaining non-respondents. 58.0. Critical incident generation is “easier” for descriptions of extremely ideal or unsatisfactory behaviors. Student raters were given four tasks—two to validate the importance of dimensions to be identified. but with respondent assignment of less than 80 percent agreement and assignment divided equally between two dimensions. one to retranslate the incidents into dimensions.” x = 5. no significant rating differences among the four learning style groups were found. four basic learning styles were identified(69). This process yielded an additional eight incidents as useable scale anchors. F3. The second task asked students to divide and assign a total of 100 points to the 10 dimensions. The third task.71).0 or less. First. Although statistically significant differences were shown between dimension importance ratings.
Sixty percent agreement is frequently cited as a selection criterion. See Table IV.the total student retranslator/rater pool to 106. The purpose was to obtain mean ratings with sufficiently low standard deviations to enable their use as scale anchors.0 in order to provide more mid-scale range anchors. Importance ratings could also be used to assign weights to each dimension in the calculation of an overall teaching performance score. After 10 working days a postcard reminder was sent to non-respondents. The researchers elected not to select items with standard deviations greater than 2. To determine if learning style differences related to mean scale ratings for individual dimensions. These first two tasks were designed to validate the dimensions by showing their relative importance to students. A standard of 80 percent agreement on assignment of incidents to dimensions was used to retain the item for scaling. with values ranging from P = 0. condition for demonstration of BARS scale reliability.” and “Accomodator”. a group of 11 critical incidents with standard deviations of less than 2. Guidelines furnished to raters for using the 15-item effectiveness scale have been previously reported(73). P = 0. A panel of five faculty members was asked to review these incidents to determine their suitability as behavioral descriptors for both dimensions. In addition. The group assigned the incidents to the dimension for which they felt the best description was provided.
Measures of BARS Quality Reliability. Using one way analysis of variance.80. Historical effects in the students’ experiencing of instruction are expected during the length of a course. After the scales were constructed using incident ratings with the greatest rater agreement.
American Journal of Pharmaceutical Education Vol. Differences in learning style among respondentraters did not relate to their ratings which established scale anchor points. Using data from the learning style inventories administered to all student incident writers who also participated in the retranslation process. some studies report that incidents with greater variances are selected for mid-scale anchors in scale construction. for each incident. Re translation booklets were mailed with a letter of explanation. Rho = 0. but not necessary. Fifty-seven students responded for a 54 percent response rate. the retranslation step. If dimensions would not be valued as being important. and one to assign an effectiveness rating to each incident.93. The final task was. If respondents would not agree that incidents were descriptive of their respective dimensions. was reviewed. Winter Supplement 1994
. “Diverger”. involved assigning each incident listed to one of the 10 tentative dimensions. based on one lecturer’s performance using a sample of pairs of ratings. the incidents would not be useable as scale anchors. Responses to the two ratings correlated highly. The final scales are typical of BARS scales generally in terms of distribution of anchors at the high and low ranges of the scales. Inter-rater reliability is reported in Table IV. Based on the 100point forced distribution. Measures of inter-rater agreement and of stability were conducted using a limited number of volunteer faculty.
one which rated two lecturers and one which rated clerkship instructors. Almost always 19. Did the instructor make good use of examples and Illustrations? Yes. The instructor promoted an atmosphere conducive to work and learning. They noted that items and scale anchors involving quality of laboratory instruments could cause low ratings which. if not kept confidential. Workload and Teaching. Ratings for the experiential rotations were obtained by asking volunteer students from the final professional year and recent alumni who were new. Was the grading system for the course explained? Yes. Raters included all students in attendance at one third professional year lecture. The instructor summarized material presented in each class. How accessible was the instructor for student conferences about the Available regularly course? 15. very unclear Poorly related Almost never Seldom Never available Strongly disagree Strongly disagree No. very dynamic 18.
Concurrent Validity. Two senior faculty members.13 scale points lower. It was easy to hear and understand the instructor. A numerically-anchored media scale was not constructed because the two lecturers did not use media other than assigned readings and the chalkboard. Because rotations were systematically scheduled. The researchers first identified all catalog items which related to the content described in the ten dimensions. The instructor’s clinical demonstrations were clear and concise. How would you characterize the instructor’s command of the subject Broad and accurate 22. Agree _ _ _√ _ 12345 _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ Disagree
1.” This procedure provided sufficient num-
bers of raters and eliminated ratings of “first-preceptor” student-faculty relationships. Yes. Faculty members responsible for laboratory instruction declined participation. might adversely affect their departmental and college-wide performance reviews. All four BARS means were lower. Strongly agree 7. a statistically significant difference. Although these data may suggest that the BARS produce less leniency in ratings. The instructor was a dynamic teacher. Almost always 6. possibly attributable to their unambiguous scale anchors. EXAMPLE…The Minnesota Twins will win the World Series in 1993. Numerically-anchored scalea
Name of the instructor being rated: PLEASE COMPLETE THE FOLLOWING RATINGS SHOWING YOUR EVALUATION ON THE FIVE POINT SCALES BELOW. very dull Almost never Almost never poor Plainly deficient No. ICES. Yes. How well did examination questions reflect content and emphasis of the course? Well related 12. with courses in lecture format. 58. The instructor attempted to cover too much material. Champaign-Urbana. central tendency and halo effects.
Scale Properties and Error Reduction
American Journal of Pharmaceutical Education Vol. seldom Almost never Almost never Strongly disagree very unfair. Strongly agree 16. After team-taught courses and unwilling volunteer instructors were eliminated. 31 items were selected which most closely matched the behaviors described in the dimensions tested. Were exam questions worded clearly? Yes. BARS and numerically-anchored scales were compared for scale properties contributing to leniency. No. Almost always 13. Selected scales were administered to two groups of students at the researchers’ school. very well 9. The instructor’s knowledge of the subject was: excellent 21. BARS ratings were correlated with corresponding numerically-anchored scales constructed with selected items from the catalog of items available from the university’s Office of Instructional Resources. Winter Supplement 1994
. The instructor stated clearly what was expected of students: almost always 3. seldom
Item source: Instructor Course Evaluation System. The instructor was sensitive to student needs. very clear 11. The instructor stimulated my intellectual curiosity. University of Illinois. Comparison of two scale properties suggest that BARS have produced less central tendency rating effect. Moreover. Always 14. very often 4. the two available participants received very high ratings with low variance and ranges of ratings. Then. The grading procedures for the course were: very fair 8. The instructor listened attentively to what class members had to say. always
very unclear almost never No. The variance in ratings was greater for all BARS scales. The course objectives were: very clear 2. The instructor seemed well prepared for classes. volunteered for the study and signed releases which were included with the written rating instructions provided to students. The instructor motivated me to do my best work. The mean BARS rating for four scales was 1. All but two of the correlations are positive and significant.Table III. Interaction. it also ensured representativeness from the wide variety of required clerkships offered. The amount of graded feedback given to me during the course was: quite adequate 10. it is not clear which scale best represents a “true” rating of instructor performance. Almost always 5. Evidence for less leniency effect in the use of BARS was provided by comparing the means for both sets of four selected scales: Evaluation. The final 22-item numerically-anchored scale appears in Table III and its construction in relationship to the ten dimensions is described in Table IV. Almost always 20. first-year members of the clinical faculty to rate their “second preceptor. Strongly agree 17. not at all not enough No.
Workload/course difficultyj I.71.” Three’ scales are environment-specific: “Teaching Ability—Lecture.31 Testretestg 0.35 0. The mean percentage agreement on assignment of incidents to dimensions. 79. 3 are experiential preceptors (N = 30 raters). Laboratory instruction might also include evaluation of selection and use of media. Ten independent dimensions of pharmacy teaching were identified.33 0. 1 for the Knowledge of Subject Area scale. and 3 BARS scales are non-significant jLow scale reliability. ICES N of Scale Reliaitems items bilityd Rateee 3 3-5 0.05. Teaching Ability-Laboratory. d Cronbach’s “alpha”.82 0. c Ratees 1 and 2 are lecturers (N > 80 raters) and ratees no.” and “Teaching Ability—Experiential.51 0.85 0. Knowledge of Subject Area
1 3 3 22k
16 17-19 20-22
0. By combining the scales as the table suggests. f-h All correlations positive and significant at P<0.24i 0. Instructor and Course Evaluation System.” “Teaching Ability—Laboratory. Halo effect was compared by examining correlations of measures with each other within BARS and within numerical scale types. BARS and ICES.72 0.75 0.” and “Teaching Ability— Experiential.78 2 1 2 5 6 1-2 0.” “Teaching Ability—Labora32
American Journal of Pharmaceutical Education Vol. A total of 134 critical incidents “survived” the retranslation and effectiveness rating process.18ns
H.76 scale points illustrates strong student rater agreement on the level of effectiveness of each scale point. They are described in Table V and include three previously-unreported new dimensions: “Selection and Use of Media. either seven or eight dimensions of teaching may be measured in three pharmacy teaching environments.63 1 0. 0. 8 Correlations between 2 measures at 5-week intervals. h Showing concurrent validity.26ns 0. N = 50 pairs. The mean intercorrelation for all four BARS was 0.
Dimensions.” The other seven scales apply to all three teaching environments. k Nine items deleted from original scale to develop reliable subscales
comparison of the modal ratings for both sets of all four scales shows that the BARS yielded modal ratings which were farther from their respective scale mid-points than their adjusted numerical scale counterparts—total differences of 14.6 percent.28 0.
tory. SD = 0. single item used.63 Correlation statistics rh N 0. The process and results are summarized in Table VI.43 0.50 0. f Based on 60 randomly-selected pairs of ratings.10. University of Illinois. c Two scales not tested: B. Course organization F.45 0. except “i” where P=0. Reliabilities and concurrent validation of BARSa using parallel ICESb scales and items
Dimensionsc A.3 scale points. 8.01.80 1 2 1 2 3
BARS reliabilities Interraterf 0. Evidence for lower halo effect for BARS was not found.39
Behaviorally-anchored rating scales. and lecturer ratee no. nearly met the 80 percent retranslation goal. suggesting that raters are less apt to allow performance in one area to affect their ratings in another. Selection and use of media.83 0.96 2 1 2 3 1 2 3 2 3 0.9 vs.32 0.72 0. Teaching ability-lecture C.14ns 0. The mean standard deviation of 1. 58. Preceptor ratees for the Workload/Course Difficulty item. Student-instructor interaction
0. BARS Scales.Table IV. and for the numerical scales. Enthusiasm/motivation J. Teaching ability-experiential D. with a range of from 10 to 19 incidents used as anchors per scale.43 89 0.90 0.83 7-11 0. SD = 0.64 0. Student performance evaluation
Numerically-anchored scale statistics.47 0.78 0. Winter Supplement 1994
. Three sample scales appear in the Appendix.45 87 0. and E. If scales show a low inter-correlation their independence is demonstrated.19ns 0.49 30 90 86 89 30 82 87 30 80 30 86 89 86 90
0. respectively.66 0.19.77 0.
For colleges of pharmacy which participate in universitywide rating systems. Atmosphere conducive to learning. Winter Supplement 1994
. D-J. Workload/Course Difficultyb Scope of content.. Concise. Knowledge of Subject Areab Well-prepared. The kind and quality of instruments in current use for student rating of pharmacy faculty teaching varies considerably. introduce.
The project yielded four major outcomes: (i) validated dimensions of teaching performance for use in development
American Journal of Pharmaceutical Education Vol. Reliability studies on such expanded scales are recommended. Selection and Use of Media Effective use of slides. Initiatives to help students. Teaching Ability—Laboratory Availability of equipment. For schools not required to participate in university-wide teaching evaluation systems. Use of BARS in Faculty Development. nor is administration by persons untrained in performance assessment. Interpretation and explanation of concepts .C. Concern about student learning. E. to systematically select. Fair. augmented by additional I. Sufficient time and access. Laboratory and Experiential. These BARS and parallel traditional scales are the first to be based on the ten new independent dimensions of teaching performance unique to pharmacy education. Experiential teaching evaluated on dimensions C. use of the traditional scales developed for this study. Use and continued research and development of these scales in multiple pharmacy schools would provide additional positive returns on the research and development investment. Length and difficulty of assignments. Concurrent validation of the BARS using specially-constructed numerical scales of parallel content has an additional useful outcome. Coverage of content. I.
G. Each administration should be man aged by a human resources expert familiar with development and administration of this type of performance rating scale. assignments and student expectations.E. because of the vivid behaviors they portray. and iv) BARS for use in faculty development. Responses to student difficulties.
J. Effective use of chalkboard. D. C & E added on basis of critical incidents surviving the retranslation/rating process. Application. Feedback to students.
F. Teaching Ability—Experiential Demonstration and supervision of learning experiences. One of the many characteristics of BARS is that.S. and monitor the scales. Existing traditional numerically-anchored items of high quality may be combined into scales for the 10 unique pharmacy teaching dimensions. Student-Instructor Interactionb Availability for consultation. Care should be taken.
C. With such revised scales in place. content. Dimensions B. Enthusiasm/Motivationb Dynamic in presentation of subject.S. however.
or revision of traditional scales. Stimulation of student thought and interest. Potential user schools should utilize a designated testing specialist for BARS scale administration.E. For schools using the I. Clear. Course Organization11 Clarity of scheduling. overheads. Explanation of method. texts. (ii) reliable and valid numerically-anchored I. could provide reliable scale scores based on the dimensions. models) Student Performance Evaluationb Lecture. Relationship to course content/objectives. Demonstration before performance. Competent in field. Detail of content outline.E. Utility of the Dimensions in Local Scale Development or Revision. administration. objective grading. b Tentative dimensions identified at onset of study. Emphasis and summary of main points. Reasonable due dates and project deadlines. reagents and ingredients. Teaching Ability—Lectureb Audible and clear speaking.
Classroom teaching evaluated on dimensions A. BARS scales are expensive to develop and maintain. similar possibilities exist for within-school scale modification and improvement. items or other items descriptive of the ten dimensions. system. Safety. Clarity of learning objectives. Such scales may be used to report performance ratings with higher reliability than is possible with a series of individual items. the project offers guidance for the college to work with the central agency responsible for managing the faculty evaluation program. Listening to student questions and concerns. (iii) reliable and valid BARS for administration. Unsupervised scale use by students is not recommended.C. This tends to cause a favorable shift in teaching behaviors and an inflation of ratings based on
B. useful reporting. Knows limits of expertise. Pharmacy instruction dimensions Dimensionsa
A. Use of BARS has been most successful in organizations where persons being rated have had input into the scale development process and where the scales are professionally-administered(74). Use of Equivalent Form ICES Scales. Professional and patient communications. If the central service agency does not offer items to rate performance in all of the new pharmacy teaching dimensions identified. within-pharmacy norms is possible. D-J. videos. H. item-writing and validating activities are called for to complete the locally-developed scales.C. Following the course outline and objectives. Conveying a helpful and supportive attitude. handouts. F-J. development of local.
D. The expected project outcome of reliable and valid scale development was accomplished and the product is available for use in schools of pharmacy. administer. Use of examples and illustrations. Supervision. unambiguous questions and assignments.Table V.S. Administration of BARS Scales. Practice. scales. Laboratory teaching evaluated on dimensions B. not rote memory. Interest in student outcomes. faculty ratees are prone to adopt effective teaching behaviors and to abandon those associated with low scale ratings. Motivation of students to do their best work. Availability for help after class. 58. Sensitivity to students’ needs.
particularly laboratory teaching. Similarly. two methodological and one philosophical. including those with little teaching experience.4 Range/ scale 10-19 Mean 79.76 Range 1. “Independence/Assertiveness” and “Handling/Coping with Detail”. The foundation for scale construction was the commonality of seven factors established and named in previous studies. Numerous items describing substandard professional behavior were eliminated from the scales. for student ratings of “Knowledge.g. accommodations to instructional formats and styles may be made. effectiveness ratings Mean 1. their own learning styles and willingness to expend effort. would broaden the range of
talent being rated. Instead. should be asked to fashion a tentative set of dimensions based on the critical incidents. e.C. especially those which were rejected for scale use because of wide rating variance.1-2. to a much wider variance in ratings than for the two volunteer lecturers. 77). allowing less reliance on senior students from the pilot school.S. served to measure this reliably. Cooperation of additional volunteer instructors. if discovered. It may also be possible to classify additional teaching behaviors based on personal attributes of the instructor. Second. Although sufficiently represented to permit rich incident writing contributions from upperclassmen.” students deal with perceptions rather than facts about the instructor’s knowledge. Perspectives of additional mature students’ writings would have enhanced the pool of incidents. Summary statistics. 58. Significance of item variance differences between learning style groups. Taxonomical Classification of Incidents with Ethical Implications. Such personal dimensions. Because this study stressed observed behaviors. in part.E. Review of scale development ratings of critical incidents depicting “Workload and course difficulty” showed that some students approach ratings for this dimension in terms of relative “ease” of workload. This study demonstrated that the mean BARS ratings for items selected in scale development were not affected by students’ learning styles. First. The emphasis in scale construction and use has been on the advantages of unidimensional observed behaviors as scale anchors. it did not create global descriptors of instructors’ “personality. retranslation and effectiveness ratings Percent agreement on N of useable relevant incidents dimension Dimensions 10 Total items 134 Mean n/scale 13. both types of scales are subject to students’ perceptions of appropriate input and effort vs. This emphasis enabled identification of ten discrete dimensions of teaching performance. Study Constraints Three constraints. Topics for Future Research Reliability.6-100
Standard deviation. Both lecturers received very high ratings on both types of scales. faculty member commitment from study schools for the purpose of concurrent validation of the scales was not sought at the onset. may offer insights to instructors for possible instructional style and performance accommodations based on specific observed teaching behaviors. volunteered and with limitations on their available class time. a larger number of volunteers from the final professional year could have enhanced the study.” Moreover. Such studies should be expanded to include all of the dimensions of teaching in all environments. Heartfelt introspection about these unforgiving “snapshots” of what students think of their instructors’ teaching could result in re-dedicated commitment to improved teaching. Thus. Ongoing reliability studies are planned. Personal Dimensions. Reliability studies will also be conducted on expanded versions of the numericallyanchored I. This desirable side effect of BARS use suggests that their greatest contribution may be in the provision of highly-effective faculty performance feedback. Only vivid examples of lack of preparedness in the classroom. Perhaps students. the dimensions were not created or edited by students. The low concurrent validity correlations for two scales require additional study. both highly-experienced.6 Range 60. to student differences in perceptions. scales. Winter Supplement 1994
. Research is continuing on the effect of learning styles on all 402 critical incidents which were subjected to retranslation and effectiveness ratings. not educational researchers. Third. and not in their reliable and valid performance assessment capabilities alone. the factor-analytic basis for classifying teaching behaviors was not challenged in this study. The higher correlations for experiential courses were due. have been previously reported
American Journal of Pharmaceutical Education Vol. the known disadvantages of using study volunteers is evident. others in a more normative sense in terms of perceived “appropriateness” of the amount of work assigned. as measured by the BARS scale. volunteers were obtained only from the researchers’ school. such clusters may “cut across” many of the ten validated performance dimensions. thus narrowing the range of responses. This required administration of only part of the scales.Table VI. More importantly. Low correlations for the Workload item and the Knowledge scale are attributable. The utility of BARS in providing performance feedback is well-established(75). may have limited the study outcomes. When students are made aware of their personal learning styles.0
improved faculty performance. It is possible that students have a discerning and reliable way of “knowing” qualities of instruction and may be able to organize and describe an of instructional qualities more efficiently than researchers who begin with factor-analyzed groupings of teaching behaviors and who insist on working only with descriptions of observed behavior. More “trait-like” than the observable performance-based dimensions identified. participation by a larger proportion of “seniors” from study schools would have enabled their utilization in larger numbers for the retranslation/rating steps. without prompting of previously-named factors or the dimensions identified in this study. Only two lecturers. Research on Learning Styles. in part. A review of the bank of critical incidents for purposes of classification into available taxonomies of ethical behaviors is planned (76.
Princeton NJ (1976). 28. are gratefully acknowledged. J.) Wiley and Sons. 102-107(1983). Pharm.O.A. West Nyack NJ (1968)..... separate dimensions have been identified and scales have been developed for three pharmacy teaching environments.. 50.. Sage Publications. (8) Dickinson. (39) Kulik.. (12) Smith. M. 165-166(1976). Psychol. (47) Kingstrom.P. p.
Am. Newsletter No. “Behaviorally anchored rating scales: A review of the literature.” A factor-analyzed pharmacy-student evaluation of pharmacy faculty.E. 193-195(1986). (1). 40. Personnel Decisions. and Course Characteristics and Student Ratings of Teacher Effectiveness..C. Jr. and Wright.A. (2).A. 44. providing essential support in selection of representative faculty and student groups and making logistical arrangements for conducting the critical incident writing workshops. it is hoped. Vol. K.C.” ibid.. (7) p. 2(1974). p. “Characteristics of the effective teacher.. 65. P. 25-26. 149-155 (1973).M. The Teacher Evaluation Handbook. (3) Centra. Fox effect: A study of lecture effectiveness and ratings of instruction.. 183.. “Critical issues in teacher and student evaluation. (29) Op. Pharm... 46.. 317-325(1977). and DeCotis. Englewood Cliffs NJ (1988) pp.G. (18) Op.. R. Berkeley CA (1971) pp. New York NY (1983) p. (1992) pp.” ibid.F. Teachers. “A critical analysis of studies comparing behaviorally anchored rating scales (BARS) and other rating formats.F. A. (2). Behav. Appl.” Am.G..R. E. J. J.C. 205.. Homewood IL. Finally..C. 428-430(1980). 50... 115-120(1979) (45) Lipman.. (31) Kotzan.. L. R.. R. the greatest impact may be the “mirror” which these BARS have provided into pharmacy teaching styles and behaviors. blur and fade. Enhanced by broadly-based input.D. Cit. “Behavioral prescriptions for faculty based on student evaluations of teaching. and Barnowe. (edit. 47.. and Troutman. (11) Encyclopedia of Psychology.C. Educ. teaching behaviors: “Student Interaction” and “Enthusiasm/ Motivation. Sci. 51. cit. 3. (33) Purohit.G. D. Education by Appointment. R. (14) Op. 40. (35) Sauter.) John Wiley and Sons. (48) Campbell. J. (38) Carlson.T. (43) Kiker.. J. “Evaluation of teaching. (27) Op. 8-13(1976). 183. 18-19. J.. 3-7(1976).G.. 149-156 (1975). “Faculty evaluation and development issues: Clinical faculty evaluation. Inc. Dunnette.N. (13) Op. Braskamp. (19) Op. L. 53.J. and Kendall... (34) Op. 217222(1979) (46) Schwab. 446-448(1975). 721-723(1973). Jr. (44) Grussing. Sutherland. R. Cit. and Zellinger. (25) Brandenburg. P.. and Hellervik. References (1) Centra. Corsini. J. and Bass. Scale anchors refer to instructional behaviors only. 12. 149. Wilson.” Memo to the Faculty. p. D. and Trinca.E. 39. Psychol. 147-154(1980). Identification of such teaching dimensions could supplement these BARS. approaches to more reliable use of traditional.E. J. R... T. problems with rater errors have been reduced.. Office of Instructional Resources. and Nelson.” ibid. R..E. R. P. 450-452(1975). “Retranslation of expectations: An approach to the construction of unambiguous anchors for rating scales. Newbury Park CA (1984) pp. cit. the scales are generalizable and available for use in all types of colleges of pharmacy. (42) Zanowiak.. 58. (37) Downs.” Am. L. The Relationship between Student.. A.C. T. Brandenburg.” CEDR Quart. (10) Ibid.” ibid. have painted their multi-colored picture of the teaching landscape. 549-562(1975). D.” Third.757. Acknowledgments. “Chair report of the AACP Council of Faculties Ad Hoc Committee on Promotion and Tenure.R. 49-50.. (7) p. Prompted by faculty review of BARS. “Development of program guidelines for community and institutional externships. (2) pp. J. (20) Op. Manasse.G. H. R. and Cyrs. accepted 1/23/94.. “Development of behaviorally-anchored rating scales for pharmacy practice. 31..R. “ICES norms”. A. 4-13.” J. 114-118(1978). Urbana IL (1977-83).” ibid.A. Center for Research and Development in Higher Education.. The assistance of Mikyoung Choi. J. 5th ed. (36) Martin. 263-289(1981).. Parker Publishing. “A theoretical model for faculty ‘peer’ evaluation. (4) Measurement and Research Division.” Can. Evaluating University Teaching..” Am.” ibid. 58. (2). Univ. “Evaluation of teaching: One faculty member’s viewpoint. 2976. “Behaviorally anchored scales for assessing behavioral science teaching.D. H. J.
(9) MacMillan Dictionary of Psychology. Jossey-Bass. Pharm.. G. Pharm. 1. Prentice-Hall. “Development and implementation of a factor-analyzed faculty evaluation instrument for undergraduate pharmacy instruction. Winter Supplement 1994
.. This vivid painting as a rating instrument may. received 9/29/93. B.” ibid. 51-52.. A. “A planned program for evaluation and development of clinical pharmacy faculty. (11). not to extraneous conditions beyond the instructor’s control..49. however. For better or worse. Debra Agard and Trena Magers in data entry.. global student descriptors of instructors’ “personality” have been replaced by measures of two kinds of important. Fourth. (21) Ivancevich. 43. and Creech. 25-37(1994). T. “Behaviors. T. C. First. 43. Project Report 76-1. p. (24) Op. P.. “A comparison of the behaviorally anchored rating and mixed standard scale formats. Appl.” ibid. Champaign-Urbana IL (1977)..A.A. numerically-anchored scales have been suggested. Second.. 4-5. Irwin. and Ory.L.. with input from faculty incident writers.. (16) Ibid.” Personnel Psychol.. “A panel: The evaluation of teaching in schools and colleges of pharmacy. P. and Dienst.M. (8) p. 80. (41) Peterson. 7-11. and Walker. P. in literature review.P.L. (7) Hildebrand. (28) Das. (edit Dunnette.A.W. (32) Jacoby. and Mikeal. 39.
This study has addressed the problem of rater error in several ways. A. 327-327. (5) Op. cit. Cit. p. cit. (17) Op.L. 205.. 41. and McMahon.E. P. J. in terms of scale content.” J. Determining Faculty Effectiveness.D.. Human Resource Management. 47.A. M. 653-63 (1975). J. (2). Educ.V. (22) Cain-Smith. Sevy. 7988(1979). S. (30) Kotzan. J... D. San Francisco CA (1979) pp. F. Cit.R. 42. Educ. 48. 16-22. Pharm.” Am. (edit. (23) Wotruba. Perrier... L. the desired outcome of improved teaching could then demand even more sensitive measures and compelling reminders of how the teaching/learning enterprise might continually be enhanced. H. Silzer. 6-9.G.C “Considerations for an evaluation program of instructional quality..A.. New York NY (1984) p. (3) p..E. M. Arvey.R.
American Journal of Pharmaceutical Education Vol. Educational Testing Service.. and helpful consultation with and comments by Bruce A. (15) Manning. cit.for BARS describing pharmacy practice behaviors(78). W.” J. (2) Braskamp. Cit. and Ory. J.) MacMillan. Educ. Educ. results and organizational effectiveness: The problem of criteria. J. 18-20.... Unpublished Report. P.” ibid. Med. enhancing the ability to more completely and accurately describe the characteristics of effective teaching. Frost. (6) Ware. p. Evaluating Teaching Effectiveness.C. Higher Educ. Heneman III. “How to develop a teacher rating instrument: A research approach” J. “The Dr. and Williams.M.. Office of Instructional Resources. of Illinois. 812(1979)... and observable. P.D. London (1989).. p. students. 21. University of Illinois. M. 40. D. 43.” Nursing Outlook. and Entrekin. p.” in Handbook of Industrial Psychology. cit. 11. R. (40) Brown. J. J. J. p. Several colleagues collaborated in research at four anonymous colleges of pharmacy. (26) ICES Item Catalog. Educ.
cit.. (50). Krausz. 513-515(1974). 116-117.The development and evaluation of behaviorally based rating scales.” This instructor used new scientific and professional terms freely. (68) Penna.” Educ. and Sherman. this instructor looked to the class. Higher Educ. and McGhan. Consider the Typical performance level on this dimension for your ratee. 41. McBer and Co.5
5. T. C... 2. S. 53.. 15-22 (1973). (58).” ibid.” J.. T. 51.” Psychol..” and continue lecturing. 568-576(1982). Winter Supplement 1994
.J. paused. (56) Jacobs. cit.. “Enrollments in schools and colleges of pharmacy. Psychol. H.. Goal Attainment Scaling: Applications. Psychol.M. Seattle WA (1992). (46).1 1211. 3. Carefully read the dimension and supporting examples (in parentheses) Read each performance level on this dimension for your ratee. p.I. p.W.. Bootman. M. 410-417(1976). C... (50) Harari. 261-265(1973).. (54) Op. (44). S.” ibid. or This instructor taught several approaches to solving problems. D. (59) Blood. (51) Flanagan.9
At the beginning of each class period. Doing Right by Students: Professional Ethics for Professors. 249-252(1974). Appl.M. and Measwrcmenf. (60) Zedeck. (8). B. D. this instructor briefly summarized the previous lecture and outlined the present lecture. “Learning styles: A comparison of pharmacy students.I. p. “Effectiveness of performance feedback from behaviorally anchored rating scales.. Imparato... 58. (52) Op.8 2. W. A. Bull.” Org. saying “sorption” and not conveying whether adsorption or absorption was meant. cit.” ibid. M. and Zedeck. R. cit.8
10. (78) Op. A.” ibid.J. (64) Champion. This instructor began each class period by asking students if they had any questions from the last class period. M.B. 59.. P. 1988-1989. cit. cit.. (44)..” Personnel Psych.. 5... 561-565(1975). 134-140(1984). 117. but also rationale supporting them.D. mumbling through lectures. 67.M. Human Perform. Boston MA (1985). and Cardillo. This instructor did not speak clearly... assuming that students already knew them. (49) Borman.3 12. Erlbaum. “An assessment of learning styles among pharmacy students.7
APPENDIX: THREE SAMPLE BARS SCALES FOR STUDENTEVALUATIONOFPHARMACYINSTRUCTION
INSTRUCTIONS TO RATER: 1. and Maddox. Hillsdale N J. (76) Counelis. cit. faculty and practitioners. J.” ibid. M. Behav.S.C. “A recomparison of behavioral expectation scales to summated scales. this instructor would say “You only need to listen to what I am saying. W..J.. 327357(1954). (66) Elenbaas. S. “Development of behaviorally anchored rating scales as a function of organizational level. 61. p. W.. 33-36 (1987). DeNisi. 554-555. This instructor not only described concepts and process. Compare his/her typical performance with each of the performance examples. Kafry.. Learning Style Inventory Interpretation Booklet. cit. and Cranny.” Am. 266-273.F. 84-86(1993). then erased the notes before students completed taking them down. (1993). J. (75) Op. pointing out rationale for each method. (44). and Dunnette.B. “Behavior-based versus trait-oriented performance ratings: An empirical study. 48.. 59..
American Journal of Pharmaceutical Education Vol. cit. 51. “Expectations of behaviorally anchored rating scales. (61) Op. 93-103(1970). D.E. J. (67) Nelson. 58. (55) Landy. “Shortcut methods for deriving behaviorally anchored rating scales. Effective use of chalkboard. and Guion.. or When students would ask this instructor to please repeat a point made in lecture.5 4.. p. Circle the scale number (1-15) nearest to the performance example 2-
2. Psychol. 48.J.J.” J. When new drug products entered the market. graduate students. cit. Educ. p. Fagg.. J.N. A. N.” ibid.” J.” Am. Sauser. (62) Op. J..3 4-
3. 33. and Champion.” ibid. “Spin-offs from behavioral expectation scale procedures.L. 60. PhD Dissertation.D. J.. and Sauser.C. (47).
4.H.R.. R. Appl.6
This instructor frequently said “Aahhh” or “Ummm” between phrases and sentences. University of Washington. R. A.S. W. (77) Fassett. (72) Op. and Oleno. M.P. J. 29-41(1988). “Evaluation of students in the clinical setting. Theory. TEACHING ABILITY -LECTURE (Audible and clear speaking.E.. (28). “Development and evaluation of shortcut-derived behaviorally anchored rating scales..H.A. Alvares. This instructor did not enunciate clearly. “Development of behaviorally anchored scales for the evaluation of faculty teaching.. 354-363(1992). S. (65) Keresuk. p. 150-153. 263. (73) Op. O. “The critical incident Technique. “Toward empirical studies on university ethics.
which best shows his/her typical performance in this dimension. This instructor wrote notes on the chalkboard faster than students could comprehend and record them. p.. 56. Use of examples and illustrations.. (58) Hom. R. this instructor frequently used them in examples illustrating therapeutic aspects of the active ingredient(s).G.. 270-302(1989).
11. (53) Op. (63) Green. (70) Garvey. “An assessment of the mastery of entrylevel practice competencies using a primary care clerkship training model. Interpretation and explanation of concepts. A. F. C. (74) Op. R. K.A.R. (57) Bernadin. 40. Meas. When lecturing from overhead projections. 761-775(1981).S. This instructor lectured “over the heads” of the level of intellect of the students. and Zedeck. 64.. 574. “Development of scales for the measurement of work motivation. 595-610(1980). Follow the same rating procedure for all 10 dimensions.. Emphasis and summary of main points..) Performance Example
Rating EXCELLENT 15141312. 570. and Bannister. When overhead transparencies were removed before students could complete their notes. (71) Riley. S.” ibid.” ibid. Pharm.. 57. Green. W. 116.. (69) Kolb. Kinicki. Pharm. (58). the instructor would say “Get it from your neighbor.. 564-570 (1976). asking if there were any questions.. p.6
9 This clinical preceptor’s exams were patient-oriented in case format. 3.
Rating Performance Example EXCELLENT 1514131211.7 After arriving late for conferences.6 This instructor coordinated a team-taught course in which lecturers had no idea of what other lecturers were teaching.0 This instructor distributed his/her syllabus two weeks before the end of instruction. Clear. this instructor would ask “What are we supposed to lecture about today?” 2. 109. Following the course outline and objectives)
F. content.9 This’ preceptor’s constructive feedback included reasons for needed improvement as well as positive outcomes of things the students did well. 2. saying “No one is perfect.6 This instructor administered multiple-choice exams containing not less than twelve responses per question. 2. plus helpful comments on each question. answers missed. 32. 4.
Rating Performance Example EXCELLENT 15141313.0 This instructor had a policy of not assigning “A” grades. Application.2 This clinical preceptor refused to give students their final rotation evaluation until they turned in their evaluation of the preceptor first.1 This instructor never had sufficient copies of handouts on the first day of class.3 This instructor frequently delayed lecture ten minutes while returning to his/her office for forgotten lecture notes.11. Detail of content outline. Explanation of method. 987655. 3. even if asked.9 This instructor.0 This lab instructor based grades on results and not on explanations of process used to obtain results. this clinical preceptor would spend additional time to collect materials and get organized.7 This instructor’s course included content which was duplicative of previously taught prerequisite “material”. 109. 98765.2 Unknown to college administration and students. assignments and student expectations. 10. term project and grading policies. 58.5 This preceptor did not give student performance feedback. 3.D. 1211. 2. this instructor made examinations available via computer where students could see the correct answers. this instructor arranged for a T.5 During the next lecture after an exam. to teach the entire course.7 After arriving late to class. Winter Supplement 1994
.5 This instructor provided students with written exam.” 1POOR
American Journal of Pharmaceutical Education Vol. not rote memory). administration. but did not make it available until the third week of the term. the content was not standardized between sections.0 This instructor reviewed learning objectives before each examination.0 After exams.A.2 This instructor provided practice quizzes on computer terminals. 2. saying that he would work it up as the term progressed.3 This instructor’s course syllabus contained helpful suggestions on how to take notes.” 3.1 This instructor did not return midterm exams until one day before the final. 2. reasons for the grade assigned. 3.4 This preceptor conducted weekly performance feedback sessions with all externs.6 This instructor encouraged students to submit term papers early so that feedback could be provided enabling revision before the due date. 2. 2. named “trivial pursuit” by the class.5 This instructor did not proofread exams and made corrections on the chalkboard only after students detected errors during the exam. 10. 2. with specific student performance behaviors. 2. 21POOR
STUDENT PERFORMANCE EVALUATION (Lecture. and general expectations for student performance.9 This instructor frequently arrived late to lecture and then would run overtime with lecture. 54. This instructor began the course without a sylla bus.9 This instructor provided only one description of how grades would be computed—”totally bell curve. 3. 1110. objective grading. 43.
COURSE ORGANIZATION (Clarity of scheduling. study for exams.8 This instructor’s exams were so long that it was impossible to complete them in the time allowed. 22.8 When this instructor divided a class into recitation sections. 11.5 This clinical preceptor was unable to document. Feedback to students.4 This clinical preceptor told students “up front” what was expected and followed through with learning situations. Unambiguous questions and assignments. Laboratory. clarity of learning objectives.8. Fair. and Experiential: Relationship to course content/objectives. 12. tested on facts which were least emphasized in class. 11.1 This instructor wrote a special text for the course. this instructor reviewed the questions most frequently missed by students.