CHAPTER 12
Evaluation in Health Science Education
Erlyn A. Sana

Overview

This chapter is an introduction to educational evaluation as it applies to various areas in health science education. Distinctions among commonly used terms that refer to evaluation are initially clarified and placed in their proper perspectives. The different purposes, types, and models of evaluation are discussed with their corresponding examples. The chapter also includes a discussion of the basic processes involved in doing educational evaluation and what a final report should contain.

Objectives

At the end of this chapter, you should be able to:
1. Discuss the nature of educational evaluation according to its different purposes and types,
2. Determine the various areas in health science education where educational evaluation can be applied,
3. Distinguish the different evaluation models and their respective uses,
4. Identify the different requisites for good evaluation instruments, and
5. Explain the different steps in conducting educational evaluations.

A. The nature of educational evaluation

Educational evaluation is most frequently used for the purpose of grading and classifying students. It is designed to determine who among the students fail, succeed, or simply get by in a given course (Bloom, Hastings, and Madaus 1971). Toward this purpose, teachers engage in a series of important activities of testing, measurement, grading, and assessment. The usual practice of teachers is to administer a test at the end of instruction. When teachers construct a set of questions for students to answer, in written or any other form, they are engaging in a testing activity. After the test, the students' answers are reviewed and scores are assigned according to the number of correct responses. This is measurement, the act of determining the degree to which a learner possesses a certain attribute (Popham 1993). Based on this practice, teachers can more precisely measure the learner's status with respect to the number of correct answers that have been set ahead of time. After measurement, each student is assigned a corresponding grade. This process refers to grading, which is usually followed by a decision to pass, promote, remediate, or fail a student. On the other hand, collecting data and organizing them to measure how the learners have achieved the expected levels of competencies as a result of instruction refers to assessment (Best and Khan 1989). This concept is the one closest to educational evaluation in terms of scope and purpose and, in fact, until the present, is being used interchangeably with it. In this chapter, the term "assessment" is applied to the evaluation of student achievement, while the broader term "educational evaluation" refers to a much wider scope in health professions education.

Educators support the modern perspective of educational evaluation to go beyond grading students. After all, education involves several activities in teaching and learning which could and need to be evaluated regularly. The curriculum, its other educational components in the form of short programs and projects, and even the performance of teachers and not just the students must be evaluated.
In this context, Worthen and Sanders (1987) enumerated the following roles of educational evaluation:
1. To provide a basis for decision making and policy formulation,
2. To assess student achievement,
3. To evaluate curricula,
4. To accredit schools,
5. To monitor expenditures of public funds, and
6. To improve educational materials and programs.

These roles imply the comprehensiveness of educational evaluation. It entails appraisal of not just student achievement but also of all other identifiable educational phenomena, e.g., recruitment, retention, and promotion of students and teachers, innovating the curricula or any program, or integration of certain advocacies as part of the school's thrusts. This chapter presents a synthesis of several definitions and roles to define educational evaluation as the determination of the worth of any educational phenomenon or product through a systematic, formal, and scientific collection, organization, and interpretation of data. Its final output is a recommendation of decisions the educational managers have to make regarding the educational phenomenon or product evaluated.

The basic specific purposes of educational evaluation (Worthen and Sanders 1987) are:
1. To determine ways by which to systematically improve an educational product or phenomenon. This can be in terms of identifying needs, selecting the best strategies from among the known ones, monitoring changes as they occur, and measuring the impact of these changes.
2. To establish the cost-benefit analysis of the program being evaluated. Especially in educational programs that require scarce state appropriations, evaluations should be built into these programs to justify continuous appropriations.
3. To test the applicability of known theories on student development. The need for systematic and often subtle information to supplant or confirm casual observations is what generates the need for evaluation.
4. To appraise the quality of school programs and to constantly seek ways of improving that quality. This is a professional responsibility of educators.
5. To satisfy the need of funding agencies for reports and updates to legitimize their decisions and improve their public relations through credible, empirical decision making.

Exercise 12-1
Analyze the following scenarios and identify the appropriate process to be used. Your choices include testing, measurement, grading, assessment, and evaluation. Justify your answers and compare them with the exercise feedback.
1. Toward the end of the semester, a biochemistry teacher grades her students to determine their standing in class.
2. The dean of Health Science College X gathers pre- and post-test scores of students from two classes to measure how the difference between two teaching methods affected instruction and if the school would advocate any of the instructional methods.
3. Faculty member Y constructed a series of questions for the practical examination that would be administered in his class. The items would be later pooled to develop a rating scale on taking a patient's vital signs.

B. Central issues in educational evaluation

Educational evaluation is a disciplined inquiry; it is systematic and formal and, therefore, follows a set of standardized methodologies and processes. Because of these features, educational evaluation is often used interchangeably with educational research.
This is because both research and evaluation use measurement devices; collect, organize, and interpret data systematically; and utilize their outputs later. However, educational research differs from evaluation in terms of focus of inquiry, generalizability, and value emphasis. Educational researchers want to explain educational phenomena, discuss them in relation to other similar cases and variables, and draw conclusions about them. Findings of educational research are matched or compared with comparable situations or populations, and attempts are made to generalize them. This is in marked contrast to educational evaluation, which is interested only in a particular educational program and the decisions to be made regarding it based on the evaluation findings, e.g., termination, modification, or extension of the program. Table 12-1 summarizes the distinctions between educational research and evaluation.

Table 12-1. Comparison between educational evaluation and research (Popham 1993)
Inquiry characteristic | Educational evaluation | Educational research
Focus | Decisions | Conclusions
Generalizability | Low | High
Value emphasis | Worth | Truth

Sana et al. (2001) studied the "attributes of student attrition in two colleges in the University of the Philippines Manila." Respondents included students who failed to finish their academic programs within the prescribed period of time. During the interviews, they were asked for the various reasons for their attrition. They named some built-in features of the curriculum, traits of the learning environment, and personality differences with their teachers as primary reasons. The study mentioned the respondents' perceived strengths and weaknesses of the curricula of the two colleges and of their physical and psychological learning environments, and their perceived problems with the ways their teachers taught. It also attempted to describe the effective environment as determined by the respondents. Such descriptions are typical of educational research: explaining and describing the educational phenomenon of student attrition using the context of two local colleges. The study could not be taken as an educational evaluation, as it was not interested in gathering data to see which features of the curriculum need to be changed or revised, or who among the faculty members should be retained, promoted, or terminated. The paper, however, proposed recommendations that could pave the way for an educational evaluation.

Aside from distinguishing educational evaluation from research, another issue central to evaluation is the determination of when the evaluation should be done. This means deciding at what stage of the educational product or phenomenon the evaluation should be made. The vital concepts on this issue refer to diagnostic, formative, and summative evaluations. Diagnostic evaluation is usually conducted before the implementation of any educational program. Its primary purpose is placement: to classify students according to various characteristics and assign them to the appropriate groups. It also aims to determine the presence or absence of prerequisite competencies so that the appropriate help is given to the learners. Formative evaluation is conducted during the development and operation of a program to provide educational managers with information regarding the progress of the program. It is also called monitoring evaluation.
Chipanah and Miron (1990) wrote that monitoring evaluation is a process that should be undertaken, especially by new or pioneering programs, to guide decision makers on how else these programs could be improved. This type of evaluation can contribute to the gradual and effective refinement of curricular innovations in terms of establishing interim progress and immediate evaluation of products or curricular milestones (Wholey, Hatry, and Newcomer 1994). For instance, during the development of the core curriculum integrating the concepts of tuberculosis (TB) and its control through the Directly Observed Treatment Short Course (DOTS) strategy, the Philippine Tuberculosis Initiatives for the Private Sector (PhilTIPS) presented the draft curriculum to two panels for validation purposes (Atienza, Roa, and Sana 2007). The first panel was composed of medical teachers, private and public physicians, and education specialists. Revisions of the draft core curriculum were made and presented to the second panel, composed of all the deans of medical schools in the Philippines. During both meetings, the draft curriculum was read, reviewed, and analyzed to validate its contents, including the curricular framework, goals and objectives, content, etc. After the first ten months of operation, the core curriculum was then subjected to a formative evaluation. The final output was a core curriculum on TB control for Philippine medical schools that was revised and validated thrice.

Summative evaluation is conducted at the end of a program to provide potential consumers with judgment about that program's worth or merit (Worthen and Sanders 1987). Consumers of summative evaluation also refer to sponsors, clients, participants, and stakeholders. An evaluation's sponsor refers to the person or institution that authorizes the evaluation and provides the fiscal resources for the activity. Client refers to the specific agency or individual who requests the evaluation. In many instances, sponsor and client refer to a single person or institution. However, there are also cases where clients would request a sponsor to finance the evaluation for them. Participants refer to those from whom the evaluator asks data during the conduct of the study. Participants usually include the clients, other stakeholders, and other key informants. Stakeholders refer to those who may be directly affected by the evaluation results. School officials, program staff, parents, students, and companies interested in recruiting the school's graduates are examples of these stakeholders (Worthen and Sanders 1987).

As an example, when the Master in Epidemiology (clinical epidemiology) program turned ten years old in 2002, the Department of Clinical Epidemiology of the College of Medicine, University of the Philippines Manila commissioned a study to do a summative evaluation of the said curriculum (Sana et al. 2002). The department (as the client) thought the program was ripe for an evaluation and requested one of its financial benefactors, the Rockefeller Foundation (the sponsor), to finance the activity. The study evaluated the context, input, process, and product of the curriculum. Key informants, namely, the clients (the department chair and other faculty administrators), were interviewed.
Likewise involved were other stakeholders, such as the program's current students, graduates, former faculty members, potential applicants to the degree program, faculty members who may wish to consider teaching in the program, school administrators from both the department and the college, and institutions identified to be interested in supporting the program in different ways, such as the offering of scholarships and grants.

Table 12-2 presents the general differences between formative and summative evaluations. The distinctions are clear with regard to their purpose, audience, evaluator, and the basic methodologies that must be applied.

Table 12-2. Differences between formative and summative evaluations (Worthen and Sanders 1987)
Basis of comparison | Formative evaluation | Summative evaluation
Purpose | To improve the program | To certify program utility
Audience | Program administrators and staff | Potential consumer and funding agency
Who should do it | Internal evaluator | External evaluator
Major characteristic | Timely | Convincing
Measures | Often informal | Valid/reliable
Frequency of data collection | Frequent | Limited
Sample size | Often small | Usually large
Questions asked | What is working? What needs to be improved? How can it be improved? | What results occur? With whom? Under what condition? With what training? At what cost?
Design constraints | What information is needed? When? | What claims do you wish to make?

The last issue in educational evaluation that confronts health science education refers to who actually conducts the evaluation. This concern is also a major ethical issue in educational evaluation. When sponsors, clients, and participants refer to one individual or institution, their interests may come into conflict with the evaluation activity. Since they participate in the evaluation, they could influence the data gathering. In cases when the evaluator is also a stakeholder, the validity of the evaluation findings becomes questionable. Table 12-2 shows that an internal evaluator is preferred for formative evaluations. This type of evaluator, being a member of the program being studied, definitely knows more about the program than any outsider. However, this person may also be so close to it as to have difficulty making an objective judgment. This difficulty is the exact strength of an external evaluator, who, being an outsider, is not emotionally attached to the program and can make use of objective evaluation parameters and decide accordingly. On the other hand, it is also difficult for the external evaluator to grasp the nature of the program as deeply as an internal one. The major factor, then, in deciding on the best person to conduct the exercise is a clean and honest definition of the terms of reference between the evaluator and the funding agency or the program administrators. Before the actual study is conducted, the latter should examine the interests, competencies, and prejudices, if any, of the evaluator whom they want to commission for their program.

C. Requisites of ensuring quality evaluation instruments

As previously discussed, educational evaluation is a systematic, formal, and scientific collection, organization, and interpretation of data to determine the worth of any educational phenomenon. The quality of data in educational evaluation is of crucial importance since decisions would be made based on them.
Hence, the collection and organization of data must be able to meet the basic requisites of quality evaluation instruments. The three basic requirements for ensuring quality evaluation instruments are validity, reliability, and practicality.

Validity refers to the degree to which correct inferences can be made based on the information obtained from the given data. This reflects the trueness and accuracy of the evaluation data gathered. Validity of any data can be established through four basic means, namely, construct, content, concurrent, and predictive validity. Construct validity is the broadest type because it deals with the degree to which a certain instrument describes the theoretically accepted traits or characteristics of a certain concept under study. For instance, the concept of clinical competence theoretically includes (a) acceptable abilities to elicit a comprehensive history of the client, (b) performance of a thorough physical examination, (c) logical and systematic analysis of data, (d) coming up with a sound diagnosis, and (e) formulation of a relevant and appropriate plan. Across health disciplines, year levels, and health science institutions, clinical competence, essentially, is ascertained through the assessment of these constructs. In developing an instrument that assesses students' clinical competence, such as a rating scale, the tool should contain items that qualify performance in these five constructs.

Content validity, on the other hand, refers to the degree to which the instrument contains an adequate number of items to represent each of the constructs being studied. Content validity should also reflect the relative weights of each construct being measured. For example, if clinical competence can be broken down into five constructs and if these are considered to be equally important, then to establish the content validity of an evaluation tool, the items in this tool should equally represent the said constructs. If a thirty-item practical examination is to be used, then there should be six items for each construct. If the practical examination contains ten items on diagnosis and five items each on the other four constructs, then the said examination is construct valid (since all five major constructs are included in the examination) but not content valid. In essence, this exercise of establishing the construct and content validity of a data collection instrument also means constructing the blueprint of the said data collection tool. Chapter 13 presents an actual example of a test blueprint, which illustrates an attempt to build the content validity of an instrument.
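To make the blueprint arithmetic concrete, here is a minimal Python sketch. It is not from the text; the construct names, weights, and the allocate_items function are illustrative assumptions echoing the five clinical-competence constructs above. It distributes a fixed number of items across weighted constructs, reproducing the thirty-item, six-per-construct example.

```python
# A minimal sketch (not from the text) of laying out a test blueprint programmatically.
# Construct names and weights below are illustrative only.

def allocate_items(weights, total_items):
    """Distribute total_items across constructs in proportion to their weights.

    Returns a dict mapping construct -> number of items. Rounding leftovers are
    given to the constructs with the largest fractional remainders so the
    allocation always sums to total_items.
    """
    total_weight = sum(weights.values())
    raw = {c: total_items * w / total_weight for c, w in weights.items()}
    counts = {c: int(r) for c, r in raw.items()}
    leftover = total_items - sum(counts.values())
    # hand out remaining items by largest fractional part
    for c in sorted(raw, key=lambda c: raw[c] - counts[c], reverse=True)[:leftover]:
        counts[c] += 1
    return counts

if __name__ == "__main__":
    # Five equally weighted constructs of clinical competence, thirty-item practical exam
    constructs = {
        "history taking": 1,
        "physical examination": 1,
        "data analysis": 1,
        "diagnosis": 1,
        "management plan": 1,
    }
    print(allocate_items(constructs, 30))   # 6 items each, as in the example above
    print(allocate_items(constructs, 200))  # 40 items each
```

Changing the weights shows how content validity shifts even while every construct remains represented, which is the construct-valid-but-not-content-valid situation described above.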
The concurrent validity of an instrument is established by collecting data to see if the results obtained with the instrument agree with the results from other instruments administered at approximately the same time to measure the same thing (Henerson, Morris, and Fitz-Gibbon 1978). This is determined by computing the correlation coefficient, r, between data obtained from at least two different instruments. A high correlation coefficient for the concurrent validity of two or more instruments also reflects that the instruments have construct validity since they mean to measure the same traits. Clinical competence of surgical clerks, for example, can be determined using two or more instruments based on the same blueprint, such as a rating scale accomplished by supervisors during their practical examinations and written objective examinations. Scores of clerks measured by the two instruments could be compared using the test of correlation and, if found highly correlated, such instruments could be said to have concurrent validity. This type of validity is synonymous with the equivalent forms method of establishing reliability in the sense that two instruments are used to measure the consistency of the performance of the program or the learner being evaluated.

The working formula for r is

r = \frac{\sum z_X z_Y}{n}

where z_X refers to the z score for the variable X (the first instrument), computed as z_X = (raw score − mean) / standard deviation; z_Y refers to the z score for the variable Y (the second instrument), computed with the same formula using the data from the second tool; and n refers to the total number of respondents or examinees. The value of r always lies between −1.0 and +1.0. The closer the obtained value is to +1.0, the more consistent the scores are. Positive values reflect a direct relationship between the two variables, e.g., scores in the first instrument vary in the same way as those in the second instrument. Negative values mean the opposite.

In developing a data collection instrument, evaluators should also be concerned with the tool's ability to predict future behavior. The predictive validity of an instrument is established by demonstrating that the results from the instrument can be used to predict some future behavior (Henerson, Morris, and Fitz-Gibbon 1978). For example, the Philippine Psychological Association, Inc. developed a questionnaire that determined the personality profile of a typical Filipino. Brion (2002) used this same instrument in obtaining the personality profile of Filipino medical students. The test developed by the professional society was able to predict the personality of another group of respondents.

Aside from validity, a data collection instrument should also be reliable. This means that the data collection tool is able to generate consistent performance among the respondents being evaluated. Specifically, a reliable evaluation tool yields consistent scores for each respondent from one administration of an instrument to another and from one set of items to another (Fraenkel and Wallen 1993). Reliability can be established in several ways. The test-retest method involves administering the same instrument twice to the same group after a certain interval of time has passed. The two sets of scores can be compared by obtaining the reliability coefficient (r), just as in the concurrent validity exercise. In the test-retest exercise, the time interval is an important factor and should be recorded accordingly. A very close interval might just be a function of recall, while too long an interval might just be a function of the respondents' maturation. In the equivalent forms method, two different but equivalent (also called alternate or parallel) forms of an instrument are administered to the same group of respondents during the same time period. The two parallel instruments should ideally be based on the same blueprint. Again, the correlation test can be run to establish the reliability of the two instruments. The inter-rater reliability test, on the other hand, establishes consistency based on different raters who examined a particular examinee or respondent. For example, four consultants in a summative oral examination could examine a resident physician. The scores given by the four raters can be tested for reliability, and the higher the r obtained, the more consistent the raters' ratings are.
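The same correlation computation underlies the concurrent validity, test-retest, equivalent forms, and inter-rater checks described above. Below is a minimal Python sketch of that computation; it is not from the text, and the function name and the sample scores are illustrative assumptions. It implements r as the mean product of the paired z scores.

```python
# A minimal sketch (not from the text) of the correlation coefficient r defined above.
# The paired score lists are invented for illustration: in practice they would be two
# measurements of the same examinees (two instruments, two administrations, or two raters).
from statistics import mean, pstdev

def correlation(x, y):
    """Pearson r between paired score lists x and y, computed as sum(z_x * z_y) / n."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired lists with at least two scores each")
    mx, sx = mean(x), pstdev(x)   # population mean and standard deviation of X
    my, sy = mean(y), pstdev(y)   # population mean and standard deviation of Y
    zx = [(v - mx) / sx for v in x]
    zy = [(v - my) / sy for v in y]
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

if __name__ == "__main__":
    rating_scale = [78, 85, 90, 72, 88, 95]   # e.g., supervisors' practical ratings
    written_exam = [74, 82, 91, 70, 85, 97]   # e.g., written objective examination scores
    print(round(correlation(rating_scale, written_exam), 2))  # close to +1.0 for these data
```

Because population standard deviations are used, the division is by n, matching the formula above; with sample standard deviations the divisor would be n − 1.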
The other requisite of a good data collection instrument is practicality. This requisite means the test should be efficient, economical, and easy to score and interpret. It is a common mistake among evaluators to construct an unusually long questionnaire such that respondents lose interest in completing the said tool. Such a long tool will not be practical in terms of the resources that will be required; it will also be quite tiresome for respondents and raters. As a general rule, researchers should pilot test any instrument before actually using it for data collection. This will establish the validity, reliability, and practicality of the instrument.

Exercise 12-2
Answer the following questions briefly. Please refer to the feedback at the end of the chapter.
1. The diplomates' examination in obstetrics-gynecology consists of twelve areas in the said field. The written examination is composed of 200 multiple-choice questions. If all the twelve areas are equally important, how many items should each of them have in the said examination? What type of validity is ensured in this case?
2. Granting that one of the examiners, a leading personality in gynecologic oncology, insists that her part have more than the number of items computed in question #1, what requisite is violated?
3. If a given academic program's success is asserted by heavily biased clients during interviews and by other participants in the survey questionnaire, what requisite/s is/are ensured/violated?

D. Parts of an evaluation report

After addressing the basic issues in educational evaluation and considering the basic requisites of data collection instruments, the evaluation can now be started. The basic parts of a written evaluation report capture the essential steps in conducting an educational evaluation. Worthen and Sanders (1987) presented these steps as:
1. Executive summary
2. Introduction to the report
   a. Purpose of the evaluation
   b. Audiences for the evaluation
   c. Limitations of the evaluation report and disclaimers (if any)
   d. Overview of report contents
3. Focus of the evaluation
   a. Description of the evaluation object
   b. Evaluative questions or objectives used to focus the study
   c. Information needed to complete the evaluation
4. Evaluation plan and procedures
   a. Data collection plan; design of the study
   b. Overview of evaluation instruments
   c. Overview of data analysis and interpretation
5. Presentation of evaluation results
   a. Summary of evaluation findings
   b. Interpretation of findings
6. Conclusions and recommendations
   a. Criteria and standards used to judge the evaluation object
   b. Judgment about the evaluation object (strengths and weaknesses)
   c. Recommendations
7. Minority reports or rejoinders (if any)
8. Appendices
   a. Detailed tabulations or analysis of data
   b. Instruments and/or detailed procedures used
   c. Other information

Only details of the first four parts are discussed in the succeeding sections. The remaining parts are self-explanatory and depend on the first four parts.

1. The executive summary
The executive summary refers to the capsule presentation of the entire evaluation report. It is usually from three to fifteen pages long and should give readers an overview of the salient points of the evaluation study, from the background up to the recommendations. To facilitate the understanding of the final report, and in cognizance of the extremely full schedules of most sponsors and stakeholders who may not have the time to read a lengthy report, another one-page executive abstract can be presented prior to this executive summary. The abstract is usually not more than 300 words, including the names of the evaluators, the title of the study, and the auspices under which the evaluation project was conducted.

2. Introduction

The introductory part should contain a rationale on the evaluability of the program and a discussion of the purposes, audiences, scope, and limitations of the evaluation study. In this section, the evaluators have to discuss whether the project is for a formative or summative purpose, and whether an internal or external evaluator shall conduct it. The three basic questions that institutions have to ask in determining the evaluability of a program are as follows (Wholey, Hatry, and Newcomer 1994):
a. Can the results of the evaluation influence decisions about the program?
   i. Are there decisions pending about the continuation, modification, or termination of the program?
   ii. Is there considerable support for the program by influential interest groups that would make its termination highly unlikely?
b. Can the evaluation be done in time to be useful?
   i. Are the data available now?
   ii. How long will it take to collect the data needed to answer key evaluation questions?
c. Is the program significant enough to merit evaluation?
   i. Does the program consume large amounts of resources?
   ii. Is program performance marginal?
   iii. Are there problems with program delivery?
   iv. Is program delivery highly inefficient?
   v. Is this a pilot program with presumed potential for expansion?

The first question refers to the utilization criterion of evaluation. This implies that the evaluation is designed to answer specific questions raised by those in charge of the program, and the answers to these would be used to decide on the future of the program. Timeliness, the second criterion, deals with when and how long the evaluation would be done to affect the next decisions about the program. The last criterion deals with significance, referring to the relevance and necessity of the program, especially if it requires a considerable amount of resources. Evaluation can be done at varying stages of an educational program, but satisfying the three criteria of evaluability can serve as the first road check. There are programs, for instance, that provide scholarship grants to selected four-year degree programs in one college. These educational programs cannot be clearly evaluated for impact within their first four to eight years since the recipients of the scholarship have yet to finish their degree programs. If the funding agencies are interested, what can be in order is a formative evaluation to determine the interim progress of the scholars.

Table 12-2, presented earlier, has already discussed the rightful audiences for a specific type of evaluation. A formative one refers to program administrators and staff, and a summative type includes potential consumers and funding agencies of the project. Evaluators have to present a thorough discussion of these stakeholders to assert their credibility and objectivity as far as the evaluation project is concerned.
3. Focus and design of the evaluation

Parts two and three of the basic steps in conducting an evaluation study can be simplified by making an evaluation matrix. A typical matrix enumerates the different evaluative questions in the study, a corresponding discussion of the data required, who can be the appropriate sources of such information, what instruments match the objectives and can generate the data needed, and how such data would be analyzed and interpreted. The basic principle in preparing this matrix, therefore, is establishing congruence among the different components, beginning with the evaluative questions. Formulation of these questions literally means posing the evaluative problems, then focusing on the generation of data that would provide the bases for later decisions on what to do with the program being evaluated. This is why educational evaluation also means the process of delineating, obtaining, and providing useful information for judging decision alternatives (Stufflebeam and Webster 1991). In this chapter, only the formulation of evaluation questions and the framework are discussed. The other components in the matrix, namely, sources of data, type of information needed, instruments to be used, and the methodology to analyze them, are discussed in chapter 19, specifically on data gathering procedures.

In the previously cited monitoring evaluation of the core curriculum on TB and TB control through DOTS (Atienza, Roa, and Sana 2007), the actual evaluation matrix used is presented in table 12-3.

Table 12-3. Sample monitoring evaluation matrix of the integration of TB-DOTS in Philippine medical curricula (Atienza, Roa, and Sana 2007)
Evaluative questions | Source/s of data | Instrument/s | Analysis of data
To what extent did the medical schools integrate the core competencies expressed as learning objectives in the TB-DOTS core curriculum? | Deans, curriculum committee chairs, selected faculty members, master TB educators | Questionnaire, key informant interviews, sample course outlines, syllabi | Summary statistics, frequencies, and qualitative analysis
Did the medical schools integrate the core curriculum according to the suggested year level? | Deans, curriculum committee chairs, selected faculty members, master TB educators | Questionnaire, key informant interviews, sample course outlines, syllabi, lists of references, samples of T-L resources | Summary statistics, frequencies, qualitative analysis, and content analysis

The actual monitoring evaluation report in table 12-3 also included discussions on the specific information required for each of the evaluation questions. For instance, for the two evaluation questions given in the table above, the following information was identified as necessary:
1. General and specific objectives of the TB-DOTS core and the respondents' curricula,
2. Self-rating of medical schools on the degree of their integration,
3. Explanation of the experiences of respondents in teaching according to the competencies set,
4. Description of competencies and topics set according to the different year levels,
5. Self-rating of medical schools on the degree of their integration, and
6. Explanation of the experiences of respondents in teaching according to year levels and competencies set.

Again, consistency among the matrix components should be established to be able to assert a sound evaluation focus and design.
It could be noted that all other components in the matrix depend on the evaluation questions. They serve as the major bases in deciding the remaining evaluation components. On the other hand, in terms of deciding what evaluation questions to ask, a major factor that must be considered is the evaluation framework. The various theories in educational evaluation provide the venue to choose the appropriate model. Operationally, a critical review of these models can help evaluators consider and assess optional frameworks that they can use to plan and conduct their studies (Stufflebeam and Webster 1991). The more common models are discussed briefly in the succeeding sections to help health professions educators decide on the best evaluation model to use for a particular evaluation scenario.

4. Objectives-oriented evaluation model

At this point, you may have observed that this textbook is following an objective-based flow of discussion. All chapters contain sets of objectives that you, as readers, should be able to reach after you have finished reading them. The underlying principle in this format is to guide you on what you can accomplish and, later on, to appraise yourself accordingly on how far you have reached the targets. This is the same principle behind the objectives-oriented evaluation model. Worthen and Sanders (1987), Stufflebeam and Webster (1991), and House (1991) explained that this model took off from the works of Tyler and Taba and was revised later by other educational evaluation experts, such as Hammond and Provus. The model makes use of learning objectives as the standards in determining the success or failure of any educational phenomenon, program, or experience. The pioneer evaluators also asserted that such objectives have to be formulated in strictly behavioral terms for easier evaluation. Furthermore, Provus particularly stressed the inclusion of the discrepancy evaluation model in this framework precisely to identify how far the actual program has progressed in relation to the objectives.

Educators and funding institutions alike appreciate the objectives-oriented evaluation model because the standards in determining the worth of a program are already clearly built into the program. However, critics also wrote that this model lacks the essential elements of a real evaluation since it is generally focused only on the attainment of its objectives on student achievement. There may be other unintended objectives that were also achieved.

5. Management-oriented evaluation model

If the objectives-oriented model is especially useful to teachers as assessors of student learning, the management-oriented evaluation model is especially designed for administrators, hence the title. This model rests on the rationale that evaluative information is an essential part of good decision making, and that the evaluator can best serve education by serving administrators, policy makers, school boards, teachers, and others in the school system (Worthen and Sanders 1987). In this particular framework, only the most popular and comprehensive model is discussed in detail: the context, input, process, and product (CIPP, pronounced as "sip") design developed by Daniel Stufflebeam in 1969 and 1983. Table 12-4 presents the focus of CIPP as a management-oriented evaluation model. Based on the focus, the appropriate administrator's decisions are identified and clarified.
Table 12-4. Focus, management decisions, and objectives of the CIPP evaluation model (Worthen and Sanders 1987)
Focus of evaluation | Educational decisions | Objectives
Context | Planning decisions | To identify the institutional context, the target population, and the opportunities for addressing needs, diagnose problems underlying the needs, and judge whether the proposed objectives are sufficient for those needs
Input | Structuring decisions | To identify and assess system capabilities, alternative program strategies, procedural designs for implementing strategies, budgets, and schedules
Process | Implementing decisions | To identify or predict, in process, defects in the procedural design or its implementation, provide information for the preprogrammed decisions, and record and judge procedural events and activities
Product | Recycling decisions | To collect descriptions and judgments of outcomes, relate them to objectives, and determine the program's worth and merit

Table 12-4 shows that CIPP evaluators can provide managers and administrators all the information they would require throughout the different stages of any program. Such a framework, therefore, suggests that it can be applicable to either a formative or a summative evaluation. It also puts the evaluator in a relatively strong position since they are the ones who provide judgments on the merits or worth of a program. In the earlier cited evaluation of the Master of Science in Epidemiology (clinical epidemiology) (MSCE) program of the University of the Philippines College of Medicine's Department of Clinical Epidemiology, the following evaluative questions were formulated according to a summative CIPP. Table 12-5 presents some excerpts from these questions.

Critics acclaim CIPP as a comprehensive evaluation model. Guba and Lincoln (1981) wrote that it is consistent with the very components of any educational program and that it is well operationalized. To be discussed in chapter 14 are the basic curricular components, which include learning objectives, subject matter, instructional resources, and assessment plans to determine student achievement. CIPP is consistent with these components. Program administrators using CIPP in their curriculum can regularly monitor the progress of their program. On the other hand, CIPP is also described as too data conscious, and it empowers the evaluator with so much information about the program being evaluated. Since the model also makes use of all curricular components, it is also clearly expensive to conduct.

Table 12-5. Excerpts from the matrix of the CIPP evaluation model of the MSCE (Sana et al. 2002)
Key questions | Data needed | Instruments
CONTEXT
What is the overall societal description of health research in the Philippines? How is this reflected in the general health status of the Filipinos? How can the graduates of the MSCE program respond according to the health research need of the country? | Health statistics; organizational capabilities of institutions engaged in health research; professional roles and responsibilities of an MSCE graduate | Records; health statistics; institutional reports
INPUT
What pertinent policies govern the MSCE program? What are the resources available for the MSCE program? | Policies on selection of students, recruitment of teachers, graduation, etc.; facilities, instructional resources, human resources, such as faculty members, personnel, etc. | Review of records; survey questionnaire
PROCESS
How do the program administrators carry out the MSCE program? What problems did teachers and students encounter while going through the program? | Teachers' and students' styles and outputs; problems encountered by both teachers and students; evaluation tools used for student and teacher performance | Curriculum and instructional designs; records and handbooks; survey questionnaire
PRODUCT
What is the impact of the MSCE program in relation to health research? | Research outputs of alumni; health care researches | Records; self-appraisals

6. Expertise-oriented evaluation approach

This approach depends primarily upon one's professional expertise to judge an educational institution, program, product, or activity (Worthen and Sanders 1987). Its variation is Eisner's connoisseurship model (Popham 1993), which relies on expert human judgment to determine the worth of an educational program or experience. This model is a highly political approach, so evaluators usually conduct it using the formal professional review system, the informal professional review system, ad hoc panel reviews, and ad hoc individual reviews. The five categories present in the formal professional review system distinguish it from the other four categories. These are (a) structures or organizations established to conduct periodic reviews of educational endeavors; (b) published standards (and possibly instruments) for use in such reviews; (c) a prespecified schedule, e.g., every five years, on which reviews will be conducted; (d) combined opinions of several experts to reach the overall judgments of value; and (e) an impact on the status of what is being reviewed, depending on the outcome (Worthen and Sanders 1987).

For instance, in the accreditation of training hospitals that can qualify as teaching hospitals to conduct residency training in, say, internal medicine, it is usually the professional society (e.g., the Philippine College of Physicians, the professional society of internists in the Philippines) that conducts the expertise-oriented evaluation. The criteria that these societies set in the accreditation of hospitals include the hospital's compliance with the standards of practice of the specialty society, meeting such standards in terms of available resources and facilities, and proofs that the applicant hospital has the capability of running the minimum standard of the clinical internship curriculum. Expertise-oriented evaluators usually conduct accreditations of clinical internships, residencies, fellowships, on-the-job trainings, or any continuing professional education program offered in the health sciences. Members and especially designated officers of specialty societies conduct these evaluations, and once they find that an institution's program has met their criteria, they issue a certification of accreditation. While this model is appreciated for its attempt to professionalize a program, it is also criticized for being occasionally subjective, expensive, and sometimes unnecessary, as the programs will continue anyway.

Exercise 12-3
Given below are different sample evaluation scenarios. Answer the questions following each scenario and compare your answers with the feedback at the end of the chapter.
1. A tertiary hospital would like to acquire level 3 accreditation for its residency training in pediatrics. Who would be in the best and most politically accepted position to conduct the evaluation? What evaluation framework would be appropriate here? Why?
2. Your professional association of medical technology schools is interested in standardizing the curriculum. Who should conduct the evaluation and what framework would be suitable? Explain.
3. Your institution runs an integrated, community-based, intensive rotation in pharmacy. You would like to know how this program could be made more relevant for both the institution and your students.

E. Assessment of student achievement

As defined earlier, educational evaluation refers to the process of determining the worth of any educational phenomenon or product through a systematic, formal, and scientific collection, organization, and interpretation of data. The preceding discussions show the wide variety of educational phenomena where evaluation can be applicable; they also imply that evaluation should not at all be limited to grading students. This also means that educational evaluation, in the context of student achievement, goes beyond grading and is also concerned with the process of ascertaining objective changes in students' behaviors in terms of knowledge, skills, and attitudes. This last section focuses more on this issue, as the assessment of student achievement remains the most immediate and frequent exercise in educational evaluation.

Table 12-6 presents the generally accepted ways to assess the three outcomes of learning. Knowledge is a function of the cognitive domain of learning and can be assessed through written, oral, and practical examinations. Learning objectives requiring learners to receive, remember, process, retrieve, apply, and evaluate knowledge can be tested in several types of written examinations. The more common of these are the supply tests, such as fill-in-the-blanks, identification, and essay examinations. These tests ask students to provide or explain their answers. There are also alternative types of tests that require learners to choose between or among possible answers. These include true-false, multiple-choice, and matching types of tests. Chapter 13 is about the basic principles and applications of test construction and analysis, grading, and interpretation of scores.

Table 12-6. Learning outcomes and how they can be assessed
Learning outcomes | Bases of assessment
Knowledge | Tests
Skills | Demonstrations
Attitudes | Observations

Psychomotor, language or communication, interpersonal, and other related skills, as discussed in chapter 4, are learning outcomes that must be demonstrated and actually performed. Achievement of these competencies can best be assessed using several types of instruments, notably the practical, process, and product examinations. Furthermore, for more accurate assessment of clinical competence, the recommended way is through the objective structured clinical examination (OSCE) or its variants. OSCE is preferred because it can assess all the clinical competencies, such as history taking and performing a thorough physical examination with actual patients, in addition to observing the communication and social skills of the examinee. Details of how OSCE can be organized are likewise discussed in chapter 13.

Chapter 3 has earlier presented attitudes and values as the learning outcomes of the affective domain. In the discussion, it was asserted that attitudes and values could be taught and, therefore, assessed like the other two domains of learning.
Henerson et al. (1987) presented a matrix on how attitudes and values can best be assessed. The matrix is summarized in table 12-7.

Table 12-7. Common approaches in assessing attitudes as learning outcomes
Approach | When appropriate | Tools
Self-report | When raters understand the questions asked; when they are aware of the information asked; and when they can answer honestly | Interviews, polls, surveys, questionnaires, rating scales, logs, journals, diaries, student portfolios
Others | When learners being assessed would have difficulty rating themselves honestly | Questionnaires, rating scales, logs, journals, diaries, observations
Records | When records are complete and can be accessed | Attendance logs, inventories, charts, logbooks, counselors' files, etc.

The validity and accuracy of attitude assessment depend highly on the capability of the raters. They should be able to observe the learner and note consistencies or inconsistencies in their statements of feelings, beliefs, and actions. They should also have sufficient opportunities to observe the learners in a representative number of class contacts. Gloria-Cruz (2002), for instance, identified the members of the ancillary staff, such as the nurses, as reliable raters in assessing the attitudes of resident physicians in the Department of Otorhinolaryngology of the Philippine General Hospital. The study asserted that the nurses are permanent employees of the department and work closely with the residents in training. The author stressed that these nurses have to be adequately trained to accurately identify the presence or absence of the attitudes and values the department hopes to inculcate in its residents.

Table 12-7 should also be related to the discussions in chapter 3 emphasizing that attitudes and values can best be gauged based on the degree of consistency with which students demonstrate a given behavior. If various evaluators using different types of instruments ascertain the consistency of students' given attitudes, not only is the reliability of the data ensured; the observations also strongly suggest the validity of the observed behaviors. Yu-Maglonzo (1999) studied the students' feedback on their urban community module as part of their innovative medical curriculum using case narratives. Students' written anecdotes and reflections included in the narrative reports revealed significant changes in their attitudes toward community medicine and health care. They also gave valuable suggestions on how the teaching-learning processes could be improved in the said module. The narrative reports of students were regularly checked and served as formative assessment instruments to aid in the development of desired attitudes toward community health. The study proved that the use of student narratives was a significant variation from the traditional, structured written examinations to evaluate attitudes.

Exercise 12-4
Given below are samples of learning scenarios for evaluation. Identify the types of evaluation, appropriate tools, and raters that must be employed to ensure a valid and reliable evaluation:
1. Level 2 Related Learning Experience (RLE) class in year level II nursing of a sectarian school
2. Final year (internship) in occupational therapy after a two-week rotation in an orthopedic center
3. Reaching the twenty-fifth year of offering a fellowship training program in internal medicine

Summary

Educational evaluation is conducted to determine ways to improve an educational product or phenomenon, analyze costs and benefits gained from continuing such programs, and make decisions regarding educational programs. The first purpose is addressed by formative evaluation, while the latter is addressed through summative evaluation. The basic requisites of quality evaluation are validity, reliability, and practicality. Validity refers to the degree to which correct inferences can be made; it is the accuracy of the data obtained. This can be established through construct, content, concurrent, and predictive validity. Reliability is the consistency of the data gathered by the evaluation tool. This is established using the test-retest and the equivalent forms methods. Inter-rater reliability establishes the consistency of the ratings of different raters examining a particular individual. Practicality means that the tool is efficient and economical, and its results are easy to score and interpret.

Before one embarks on an evaluation study, the evaluability of the program should be analyzed. This is determined by looking into the utilization, timeliness, and significance of the evaluation results. Once a program or evaluation object is deemed evaluable, several steps are followed. These include establishing the purpose, audience, and limitations of the evaluation study, which are included in the introduction of the report. The next step is to describe the evaluation object and formulate the evaluative questions to be addressed. The evaluator selects the appropriate evaluation framework or model to use. Models include the objectives-oriented evaluation model, the management-oriented model, and the expertise-oriented approach. The first one makes use of learning objectives as standards in determining the success or failure of any educational phenomenon, program, or experience. The CIPP, a management-oriented model, helps administrators make decisions by laying out a plan for comprehensive evaluation that includes assessment of the context, input, process, and product. The expertise-oriented approach is used by professional societies in accrediting educational programs. Information necessary to answer the evaluative questions is identified and the appropriate data collection plan is designed, including the evaluation tools to be utilized and the method of data analysis and interpretation to be followed. The study is then conducted, and the findings, conclusions, and recommendations are presented to the stakeholders. Appropriate assessment of learning outcomes in the three domains of learning should be performed.

References cited

Atienza, M. A., C. C. Roa Jr., and E. A. Sana. 2007. "Development of core curriculum integrating tuberculosis control and directly observed treatment short course for Philippine medical schools." Annals Academy of Medicine Singapore, November, 36 (11): 930-36.
Best, J. W. and J. V. Khan. 1989. Research in Education. 6th ed. New Jersey: Prentice-Hall.
Bloom, B. S., J. T. Hastings, and G. F. Madaus. 1971. Handbook on Formative and Summative Evaluation of Student Learning. New York: McGraw-Hill.
Brion, T. C. 2002. "Ang pagkatao ng mga mag-aaral sa Kolehiyo ng Medisina ng Pamantasan ng Lungsod ng Maynila sa pamamagitan ng Panukat ng Ugali at Pagkataong Pilipino." Major project for the degree of Master in Health Professions Education, National Teacher Training Center for the Health Professions, University of the Philippines Manila.
Chipanah, V. and G. Miron. 1990. Evaluating Educational Programmes and Projects: Holistic and Practical Considerations. Belgium: UNESCO.
Dressel, P. 1976. Handbook of Academic Evaluation. California: Jossey-Bass.
Fraenkel, J. R. and N. E. Wallen. 1993. How to Design and Evaluate Research in Education. New York: McGraw-Hill.
Gloria-Cruz, T. L. I. 2002. "Development of evaluation scheme for senior residents in Otorhinolaryngology." Major project for the degree of Master in Health Professions Education, National Teacher Training Center for the Health Professions, University of the Philippines Manila.
Guba, E. G. and Y. S. Lincoln. 1981. Effective Evaluation. San Francisco: Jossey-Bass Publishers.
Henerson, M. E., L. L. Morris, and C. T. Fitz-Gibbon. 1987. How to Measure Attitudes. California: SAGE Publications.
House, E. R. 1991. "Assumptions underlying evaluation models." In Evaluation Models: Viewpoints on Educational and Human Service Evaluation, eds. G. F. Madaus, M. Scriven, and D. L. Stufflebeam. Boston: Kluwer.
Popham, W. J. 1993. Educational Evaluation. Boston: Allyn and Bacon.
Sana, E. A., M. A. Atienza, L. F. Abarquer, J. A. P. Mojica, and N. S. Fajutagana. 2009. "Evaluation of the Master of Science in Epidemiology curriculum." Acta Medica Philippina 44 (1): 35-41.
Sana, E. A., J. A. P. Mojica, N. S. Fajutagana, and M. L. Viray. 2001. "Attributes of student attrition in the Colleges of Allied Medical Professions and Dentistry, University of the Philippines Manila." Commissioned research. National Institutes of Health, University of the Philippines Manila.
Shadish, W. R., T. D. Cook, and L. C. Leviton. 1991. Foundations of Program Evaluation. California: SAGE Publications.
Stufflebeam, D. L. and W. J. Webster. 1991. "An analysis of alternative approaches to evaluation." In Evaluation Models: Viewpoints on Educational and Human Service Evaluation, eds. G. F. Madaus, M. Scriven, and D. L. Stufflebeam. Boston: Kluwer.
Tyler, L. E. 1929. Test Measurement. New Jersey: Prentice-Hall.
Wholey, J. S., H. P. Hatry, and K. E. Newcomer, eds. 1994. Handbook of Practical Program Evaluation. San Francisco: Jossey-Bass Publishers.
Worthen, B. R. and J. R. Sanders. 1987. Educational Evaluation. New York: Longman Group.
Yu-Maglonzo, E. I. 1999. "Content analysis of the students' narrative feedback in the urban community module." Major project for the degree of Master in Health Professions Education, National Teacher Training Center for the Health Professions, University of the Philippines Manila.

Feedback to exercises

Feedback to exercise 12-1
1. The first scenario refers to grading. The faculty member here is preparing the students' standing to determine their final grades.
2. This is educational evaluation. The dean here is collecting scientific evidence to determine the efficacy of two instructional methods. Such evidence also justifies the school's use of a specific instructional method. Decisions would also be made based on the evidence gathered from the evaluation.
3. Faculty member Y is constructing a test, so the process involved in this case is testing.
Feedback to exercise 12-2
1. There should be sixteen to seventeen items each for the twelve areas in the diplomates' examination. This ensures the construct validity of the test, as diplomates should be able to show competency in the twelve areas of obstetrics-gynecology.
2. If the gynecologic oncologist on the board of examiners insists that her part have more than sixteen to seventeen items, the test remains construct valid but no longer content valid. The twelve areas are represented but not according to the weight of each construct.
3. The heavily biased clients and other participants would ensure the reliability of their views, as they are consistent with each other. However, since they are heavily biased, the information they give is questionable and, therefore, might not be valid.

Feedback to exercise 12-3
1. The first one calls for an expertise-oriented evaluation model. This is the most consistent with the case presented. It is also presumed that the appropriate specialty society is in place for the job.
2. Either the expertise-oriented model or the CIPP would be appropriate. The professional association of medical technology schools should be able to give a pool of content experts in the field.
3. You may review the objectives of the program and apply the objectives-oriented evaluation. If you want a summative one, the CIPP would likewise be appropriate so that all curricular components could be reviewed and judged.

Feedback to exercise 12-4
1. Level 2 Related Learning Experience (RLE) class in year level II nursing of a sectarian school, and
2. Final year (internship) in occupational therapy after a two-week rotation in an orthopedic center
Both cases are semestral courses that regularly meet at least four hours a week. Teachers can very well make a formative assessment of students' knowledge, skills, and attitudes using written examinations (to test knowledge), practical examinations (to test skills), and regular diaries or journals (to test attitudes). Aside from their clinical instructors, members of the community and the hospital staff who regularly join these students can be involved in their assessment, especially in those pertaining to attitudes.
3. Reaching the twenty-fifth year of offering a fellowship training program in internal medicine
A summative evaluation is in order here, and an external evaluation is suggested to ensure the objectivity of the study.
