
Basic Pedagogical Training for New

Melese Birhanu, Lecturer
Specialty: Educational Research and Development
Education Quality Assurance and Audit Office
Head, Staff Development Unit
Department of Educational Planning and Management
School of Education
University of Gondar
 The general objective of this training session is to:
 Commence the process of evaluating teaching and learning in higher education
Brainstorming professional questions
 1. As university instructors, what terms do we usually recognize when we think of the evaluation of learning as part of the instructional process?

 2. In pedagogical terms, is teaching instruction, or part of instruction?
 3. How about learning? Is it instruction, or part of instruction?
 4. Can we call an academic teaching in a higher education institution a teacher, lecturer, or instructor? Which name is pedagogically accepted?

 5. So how can we define instruction, considering all of the above concepts?
 6. How can we evaluate instruction (the teaching-learning process)?
 7. What tools do we instructors employ to measure students' learning performance?
 Test/Testing
 Measurement
 Assessment
 Evaluation
Test: an instrument (a tool) used to measure, and ultimately assess, whether students have achieved the intended objectives of a lesson.
Testing: the collection of quantitative (numerical) information about the degree to which a competence or ability is present in the test-taker.
 Norm-Referenced Test
 Criterion-Referenced Test
Criterion-Referenced Test
 To determine whether each student has achieved specific/standard skills or concepts.
 Measures specific skills which make up a designated curriculum. These skills are identified by teachers and curriculum experts.
 Each skill is expressed as an instructional objective.
Norm-Referenced Test
 To rank each student with respect to the achievement of others in broad areas of knowledge.
 To discriminate between high and low achievers.
 Measures broad skill areas sampled from a variety of textbooks, syllabi, and the judgments of curriculum experts.
Measurement
 Is the process by which we attach numbers to psychological attributes.
 Is the process of quantifying psychological attributes.
 Is the assignment of numbers to psychological attributes.
 Assessment in higher education is usually described as “the ongoing process aimed at understanding and improving student learning.
 It involves making our expectations explicit and public; setting appropriate criteria and high standards for learning quality; systematically gathering, analyzing, and interpreting evidence to determine how well performance matches those expectations and standards; and using the resulting information to document, explain, and improve performance.”
 Assessment is systematic information gathering without necessarily making judgments of worth.

 It may involve the collection of quantitative or

qualitative (narrative) information.
 Evaluation is the process of making judgments regarding the appropriateness of some person, program, process, or product for a specific purpose.

 Evaluation may or may not involve testing,

measurement, or assessment.
 Validity
 Reliability
 Reliability is the consistency of your measurement, or the degree to which an instrument measures the same way each time it is used under the same conditions with the same subjects.

 A measure is considered reliable if a person's

score on the same test given twice is similar.
 It is important to remember that reliability is not measured; it is estimated.
 Reliability may be estimated through a
variety of methods that fall into two types:
- single-administration and
- multiple-administration.
 Single-administration methods include split-
half and internal consistency.
 The split-half method treats the two halves of a
measure as alternate forms. This "halves
reliability" estimate is then stepped up to the
full test length using the Spearman-Brown
prediction formula.
 Split-Half Reliability: A measure of consistency where a test is split in two and the scores for each half of the test are compared with one another.

 If the test is consistent it leads the

experimenter to believe that it is most likely
measuring the same thing.
 This is not to be confused with validity, where the experimenter is interested in whether the test measures what it is supposed to measure.

 A test that is consistent most likely is

measuring something; the experimenter just
does not know what that "something" is.

 This is why it is said that reliability sets the

ceiling of validity.
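 Spearman rank-order correlation between the two halves of a measure:

$$ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$

Where: d is the difference between a test-taker's ranks on the two halves of a measure, and n is the number of test-takers.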
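To make the rank formula concrete, here is a minimal Python sketch. It assumes the standard Spearman rank-order form, r_s = 1 - 6Σd²/(n(n² - 1)), and no tied scores; the function names and the half-test scores are hypothetical, chosen only for illustration.

```python
def ranks(values):
    """Return the rank of each value (1 = highest). Assumes no tied scores."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(x, y):
    """Spearman rank-order correlation: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    # d is the difference between each test-taker's two ranks.
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical half-test scores for five test-takers (illustrative data only).
odd_half = [9, 7, 5, 8, 3]
even_half = [8, 6, 7, 9, 2]
print(round(spearman_rs(odd_half, even_half), 2))
```

With tied scores this shortcut formula is no longer exact, so a correlation computed directly on the ranks would be the safer choice in that case.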

 Example
 Calculating the reliability from a single
administration of a test
 Commonly reported
◦ Split-half
◦ Cronbach alpha
◦ K-R20
◦ K-R21
 Calculated automatically by many statistical
software packages
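As a rough illustration of two of the coefficients listed above, the sketch below computes KR-20 and Cronbach's alpha from a single administration. It assumes the usual textbook formulas, dichotomous (0/1) item scoring, and population variances; the function names and the score matrix are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(item_scores[0])
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def kr20(item_scores):
    """KR-20 = k/(k-1) * (1 - sum(p*q) / variance of total scores), 0/1 items only."""
    n = len(item_scores)
    k = len(item_scores[0])
    # p is the proportion of test-takers answering each item correctly.
    p = [sum(row[i] for row in item_scores) / n for i in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(pi * (1 - pi) for pi in p) / total_var)


# Hypothetical data: one row per test-taker, one column per item (1 = correct).
scores = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
print(round(kr20(scores), 2), round(cronbach_alpha(scores), 2))
```

For 0/1 items the two functions agree, because the variance of a dichotomous item equals p(1 - p) when population variance is used.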
1. The test is split in half (e.g., odd / even), creating “equivalent forms”
2. The two “forms” are correlated with each other
3. The correlation coefficient is adjusted to reflect the entire test length
◦ Spearman-Brown Prophecy formula
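The three steps above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: it assumes an odd/even split, a Pearson correlation between the half scores, and the standard Spearman-Brown step-up for doubling test length, 2r / (1 + r). The function names and the score matrix are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_brown(r_half):
    """Step a half-test correlation up to full test length."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores: one list of item scores per test-taker."""
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson(odd, even))


# Hypothetical data: one row per test-taker, one column per item (1 = correct).
scores = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
print(round(split_half_reliability(scores), 2))
```

A different split (for example, first half against second half) will generally give a different estimate, which is one reason the result is treated as an estimate rather than a measurement.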
 Multiple-administration methods require that
two assessments are administered.
 In the test-retest method, reliability is
estimated as the Pearson product-moment
correlation coefficient between two
administrations of the same measure.
 The coefficient was developed by Karl Pearson from a related idea introduced by Francis Galton in the 1880s.
 In 1911 Pearson founded the world's first university statistics department at University College London.
$$ r_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} $$

If the correlation coefficient is 1: positive correlation, consistency of scores, reliable.
If 0: weak or nil correlation, weak or no consistency of scores.
If -1: negative correlation, no consistency, not reliable.
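The raw-score Pearson formula for test-retest reliability can be computed directly. The sketch below follows that formula term by term, where x and y are the scores from the first and second administrations; the function name and the scores are hypothetical, for illustration only.

```python
from math import sqrt


def pearson_r(x, y):
    """Raw-score Pearson product-moment correlation between x and y."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# Hypothetical scores of five students on two administrations of the same test.
first_administration = [12, 15, 9, 18, 14]
second_administration = [13, 14, 10, 17, 15]
print(round(pearson_r(first_administration, second_administration), 2))
```

A value close to 1 suggests the students kept roughly the same relative standing across the two administrations, which is what the test-retest method treats as evidence of reliability.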
 Validity is the degree to which a test measures what it is supposed to measure.
 How valid a test is depends on its purpose.
E.g., a ruler may be a valid measuring device for length, but isn’t very valid for measuring
Q1. What points in a lesson must we consider in order to say our test is valid? (2 minutes)
 There are three basic approaches to the validity of
tests and measures.

1. content validity,
2. construct validity, and
3. criterion-related validity.
 Content validity is the degree to which the test items represent
the domain, or universe, of the course being

 In order to establish the content validity of a

measuring instrument, the teacher must
identify the overall content to be represented.

 Items must then be randomly chosen from this

content that will accurately represent the
information in all areas.
 Construct validity shows that the measure relates to a variety of other measures as specified in a theory.
 It asks if there is a relationship between how I
operationalized my concepts in this study to
the actual causal relationship I'm trying to
 We are interested in construct validity when we
want to use an individual's, for example,
Physics test performance as a basis for
inferring his performance of Mathematics.
 Criterion-related validity refers to the extent to which one can infer from an individual's score on a test how well s/he will perform some other external task or activity that is supposedly measured by the test in question.
 That is, is the test score useful in predicting some
future performance (predictive validity) or can the
test score be substituted for some less efficient
way of gathering data (concurrent validity)?
 Examples of criteria for criterion-related validity are:
 success in school, success in class, or success as an employee.
 The planning of an achievement test should follow these steps:
1. Determine the purpose of the test.
2. Identify the learning outcomes to be measured by the test.
3. Define the learning outcomes in terms of specific, observable behavior.
4. Outline the subject matter to be measured by the test.
5. Prepare a table of specifications.
6. Use the table of specifications as a basis for preparing the test.
 Tests can be used in an instructional program

i. to assess entry behavior (placement test),

ii. monitor learning progress (formative test),
iii. diagnose learning difficulties (diagnostic
test), and
iv. measure performance at the end of
instruction (summative test).
Question: When do we use each of them? (2 minutes)
Placement test
 Measures prerequisite entry skills.
A. Include each prerequisite entry behavior.
B. Typically, items are easy and criterion-referenced.
 Determines entry performance on course objectives.
A. Select a representative sample of course objectives.
B. Typically, items have a wide range of difficulty and are norm-referenced.
 Formative test: provides feedback to students and teacher on learning progress.

i. Includes all unit objectives, if possible (or

those most essential)
ii. Items match difficulty of unit objectives and
are criterion referenced.
 Diagnostic test: determines causes of recurring/habitual/frequent learning difficulties.
 Includes sample of tasks based on common

sources of learning error.

 Typically, items are easy and are used to

pinpoint specific causes of error.
 Summative test: assigns grades, or certifies mastery at the end of instruction.
a. Select a representative sample of course objectives.
b. Typically, items have a wide range of difficulty and are norm-referenced.
 The learning outcomes measured by a test should
faithfully reflect the objectives of instruction.

 Identify those instructional objectives that are to be

measured by the test and then make certain that they
are stated in a manner that is useful for testing.

 This is easier said than done.

 One useful guide for approaching this task is the Taxonomy of Educational Objectives (Bloom’s Taxonomy).
 This is a comprehensive system that classifies
objectives within each of three domains:

(1) Cognitive,
(2) Affective, and
(3) Psychomotor
 The cognitive domain of the taxonomy is
concerned with intellectual outcomes,
 The affective domain with interests and
attitudes, and
 The psychomotor domain with motor skills.
 Since our concern here is with achievement
testing, we shall focus primarily on the
cognitive domain.
 Intellectual outcomes in the cognitive
domain are divided into two major classes:
(1) knowledge and
(2) intellectual abilities and skills.
 These are further subdivided into six main
areas as follows:
A. Knowledge
1. KNOWLEDGE (Remembering previously
learned material)
1.1. Knowledge of dates, places
1.2. Knowledge of items
1.3. Knowledge of specific facts
2. Knowledge of ways and means of dealing with specifics
2.1. Knowledge of conventions
2.2. Knowledge of trends and sequences
2.3. Knowledge of classifications and categories
2.4. Knowledge of criteria
2.5. Knowledge of methodology
3. Knowledge of the universals and abstractions in a field
3.1. Knowledge of principles and generalizations
3.2. Knowledge of theories and structures

B. Intellectual Abilities and Skills

2. COMPREHENSION (Grasping the meaning of material)
2.1. Translation (Converting from one form to another)
2.2. Interpretation (Explaining or summarizing material)
2.3. Extrapolation (Extending the meaning beyond the data)
3. APPLICATION (Using information in concrete situations)
4. ANALYSIS (Breaking down material into its parts)
4.1. Analysis of elements (Identifying the parts)
4.2. Analysis of relationships (Identifying the relationships)
4.3. Analysis of organizational principles (Identifying the way the parts are organized)
5. SYNTHESIS (Putting parts together into a whole)
5.1. Production of a unique communication
5.2. Production of a plan or proposed set of operations
5.3. Derivation of a set of abstract relations
6. EVALUATION (Judging the value of a thing for a given purpose using definite criteria)
6.1. Judgments in terms of internal evidence
6.2. Judgments in terms of external criteria
 The learning outcomes to be measured by a test are most useful in test planning when they are stated as terminal behavior that is observable.
 That is, they should indicate clearly the student performance to be demonstrated at the end of the learning experience.
 When a satisfactory list of general learning outcomes has been identified and clearly stated, the next step is to list the specific student behaviors that are to be accepted as evidence that the outcomes have been achieved.
 The content of a course may be outlined in
detail for teaching purposes, but only the
major categories need to be listed in a test
 For Example:-
 Chapter 1. Definition of terms
 1.1. Test
 1.2. Measurement
 1.3. Evaluation
 Chapter 2. Qualities of Good Tests
 2.1. Reliability
 2.1.1. Test-Retest
 2.1.2. Internal Consistency
 2.2. Validity
 2.2.1. Content Validity
 2.2.2. Construct Validity
 2.2.3. Criterion Validity
 A table of specifications is a table that relates outcomes to content and indicates the relative weight to be given to each of the various areas.
 The purpose of the table is to provide
assurance that the test will measure a
representative sample of the learning
outcomes and the subject-matter topics
to be measured.
Sr. No. | Content     | Specific Objectives                                                         | Total
        |             | Knowledge | Comprehension | Application | Analysis | Synthesis | Evaluation |
1       | Test        |           |               |             |          |           |            |
2       | Measurement |           |               |             |          |           |            |
 The table of specifications is like a blueprint to the test maker.
 It specifies the number and the nature of the items in
the test, thereby providing a guide for item writing.
 If the table has been carefully prepared and the learning
outcomes clearly specified, the quality of the test will
depend largely on how closely the test maker can match
the specifications.
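One way to turn relative weights into a table of specifications is sketched below. The weights, the test length, and the function name are all hypothetical; the sketch assumes content and objective weights are given as whole percentages that each sum to 100, and it uses a largest-remainder rule so that the cell counts add up exactly to the planned number of items.

```python
def table_of_specifications(content_weights, level_weights, total_items):
    """Allocate items to (content, objective level) cells by percentage weight."""
    # Exact share of each cell, kept as an integer scaled by 100 * 100.
    scaled = {
        (content, level): cw * lw * total_items
        for content, cw in content_weights.items()
        for level, lw in level_weights.items()
    }
    counts = {cell: value // 10000 for cell, value in scaled.items()}
    leftover = total_items - sum(counts.values())
    # Give the remaining items to the cells with the largest fractional parts.
    by_remainder = sorted(scaled, key=lambda cell: scaled[cell] % 10000, reverse=True)
    for cell in by_remainder[:leftover]:
        counts[cell] += 1
    return counts


# Hypothetical weights (percent) for a 20-item test.
content = {"Test": 40, "Measurement": 60}
levels = {"Knowledge": 50, "Comprehension": 30, "Application": 20}
spec = table_of_specifications(content, levels, total_items=20)

for topic in content:
    row = [spec[(topic, level)] for level in levels]
    print(topic, row, "total:", sum(row))
```

The resulting counts are only a starting point; the test maker would still adjust them to reflect the emphasis actually given during instruction.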
 Basic Principles of Achievement Testing
 1. Achievement tests should measure clearly defined learning outcomes that are in harmony with the instructional objectives.
 2. Achievement tests should measure a
representative sample of the learning outcomes
and subject matter included in the instruction.
 3. Achievement tests should include the types of
test items that are most appropriate for measuring
the desired learning outcomes.
 4. Achievement tests should be designed to fit the particular uses to be made of the results.
 5. Achievement tests should be made as
reliable as possible and should then be
interpreted with caution.
 6. Achievement tests should be used to
improve student learning.
 Q1. Can we put the types of test items commonly used in cognitive tests into a pedagogically accepted logical order?

 There are six types of test items most commonly used in cognitive
tests. These item types are:
 True/False
 Fill-In the blank spaces
 Short Answer
 Matching
 Multiple-Choice
 Essay
 The true/false item presents the test-taker with a
statement that he or she must indicate is either
true or false.
 This type of item is a sensible choice for “naturally
dichotomous” content, that is, content that
presents the learner with only two plausible
 True/false items can assess the knowledge,
comprehension, and application levels.

 However, unfortunately they are most often

used to assess only the knowledge level.
Advantages of true/false items:
 They are typically easier to write than other item types.
 They are easily and reliably scored.
 Test-taker responses can be submitted to statistical item analysis that can be used to improve the quality of the test.
Disadvantage of true/false items:
 Test-takers have a 50–50 chance of getting the items correct simply by guessing.
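To see what a 50–50 chance per item means for a whole test, the following sketch computes the probability of reaching a given score purely by guessing, using the binomial distribution. The function name, the 10-item test, and the comparison with a four-alternative item (a guessing chance of 0.25) are assumptions made for illustration.

```python
from math import comb


def prob_at_least(correct_needed, n_items, p_guess=0.5):
    """Probability of at least `correct_needed` correct answers by blind guessing."""
    return sum(
        comb(n_items, k) * p_guess ** k * (1 - p_guess) ** (n_items - k)
        for k in range(correct_needed, n_items + 1)
    )


# Chance of scoring 5 or more out of 10 by guessing alone.
print(round(prob_at_least(5, 10, p_guess=0.5), 3))   # true/false items
print(round(prob_at_least(5, 10, p_guess=0.25), 3))  # items with four alternatives
```

The gap between the two results is one way to quantify why guessing is a more serious concern for true/false items than for items with more alternatives.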
 Matching items present test-takers with two
lists of words or phrases and ask the test-
taker to match each word or phrase on one
list (the “A” list) to a word or phrase on the
other (the “B” list).

 These items should be used only when

assessing understanding of homogeneous
content (types of wire, types of clouds,
types of switches, etc.).
 The matching item can assess the
knowledge and comprehension levels.
Advantages of matching items:
 They can be scored quickly and objectively.
 They are relatively easy to write.
 Responses to matching questions can be submitted to statistical item analysis.
Limitations of matching items:
 They are limited to the two lowest levels of Bloom’s Taxonomy.
 If they are constructed using heterogeneous content, that is, if the words or phrases appearing on the “A” list are essentially unrelated to one another, matching items become extremely easy.
 Another difficulty with matching items results from test
writers including equal numbers of entries in both lists and
allowing items from the “B” list to be used only once.
 The multiple-choice item presents test-takers with a question (technically called a “stem”) and then asks them to choose from among a series of alternative answers (a single correct or best answer and several distracters).
 Sometimes the question takes the form of
an incomplete sentence followed by a series
of alternative completions among which the
test-taker is to choose.
 Multiple-choice questions can assess all
Bloom levels except the two highest ones,
synthesis and evaluation.
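Advantages of multiple-choice items:
 They are the most flexible item type.
 They are ideal for diagnostic testing.
 They are quickly and reliably scored.
 There is a low chance of getting the correct answer by guessing.
Disadvantage of multiple-choice items:
 They are difficult and time-consuming to write.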
 Fill-in items are open-ended, that is, the answer does not appear before the test-taker.
 A fill-in item is a question or an incomplete statement followed by a blank line upon which the test-taker writes the answer to the question or completes the sentence.
 Therefore, fill-in questions should be used when the
instructional objective requires that the test-taker
recall or create the correct answer rather than simply
recognize it.
 Objectives that require the correct spelling
of terms, for example, require fill-in items.
 Fill-in items are limited to those questions
that can be answered in a word or short
phrase; short answer and essay questions
require much longer responses
 Fill-in items can assess the knowledge,
comprehension, or application levels. They
most often are written, however, at the
knowledge level.
 Fill-in items are typically easy to write.
 Fill-in items are suitable only for questions that can be answered with a word or short phrase.

 fill-in items present scoring problems.

 Short answer questions are open-ended questions requiring responses from test-takers of one page or less in length.
 Short answer questions require responses longer than those for fill-in items and shorter than those for essay questions.
 Short answer questions are recommended when the
objective to be assessed requires that the test-taker
recall information unassisted.
 They can be used to assess all Bloom levels
except possibly the highest one, evaluation;
most responses to evaluation questions
would necessarily be somewhat longer.
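Advantage of short answer questions:
 They are able to elicit original responses from test-takers.
Disadvantages of short answer questions:
 The disadvantages are, unfortunately, extremely serious ones. Most notably, short answer questions are very difficult to score reliably.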
 Essay questions are open-ended test questions requiring a response longer than a page in length.
 They are recommended for objectives that require original, lengthy responses from test-takers.
 They are also recommended for the assessment of writing skills.
 They can be used to assess all levels of Bloom’s Taxonomy.
 Advantage: they assess the highest cognitive levels.
 Disadvantage: they are difficult to score reliably.
 Rules for Constructing Multiple-Choice Items
 1. Design each item to measure an important learning outcome.
 2. Present a single clearly formulated problem in the stem of the item.
 3. State the stem of the item in simple, clear language.
 4. Put as much of the wording as possible in the stem of the item.
 5. State the stem of the item in positive form.
 6. Emphasize negative wording whenever it is used in the stem of an item.
 7. Make certain that the intended answer is correct or clearly best.
 8. Make all alternatives grammatically consistent with the stem of the item and parallel in form.
 9. Avoid verbal clues that might enable students to select the correct answer or to eliminate an incorrect alternative.
 Let's review some of the verbal clues commonly
found in multiple-choice items.
 (a) Similarity of wording in both the stem and the
correct answer
 (b) Stating the correct answer in textbook language
or stereotyped phraseology
 (c) Stating the correct answer in greater detail
 (d) Including absolute terms in the distracters
 (e) Including two responses that are all-inclusive.
 (f) Including two responses that have the same meaning.
 10. Make the distracters plausible and attractive to the uninformed.
 11. Vary the relative length of the correct answer to eliminate length as a clue.
 12. Avoid using “none of the above” and “all of the above”.
 13. Vary the position of the correct answer in a random manner.
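Varying the position of the correct answer at random is easy to automate when items are stored electronically. The sketch below is one possible approach, not a prescribed procedure; the function name and the sample item are hypothetical, and it assumes no distracter repeats the wording of the correct answer.

```python
import random


def shuffle_alternatives(correct, distracters, rng=random):
    """Return the alternatives in random order plus the letter of the key."""
    options = [correct] + list(distracters)
    rng.shuffle(options)
    key_letter = "ABCDEFGH"[options.index(correct)]
    return options, key_letter


# Hypothetical item: which quality refers to the consistency of measurement?
rng = random.Random(42)  # fixed seed so the same test form can be regenerated
options, key = shuffle_alternatives(
    "Reliability", ["Validity", "Objectivity", "Usability"], rng
)
for letter, text in zip("ABCD", options):
    print(f"{letter}. {text}")
print("Key:", key)
```

Recording the key letter together with the shuffled order keeps the scoring key in step with the printed form.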
 Rules for Constructing True/False Items
 1. Include only one central, significant idea in each statement.
 2. Word the statement so precisely that it can
unambiguously be judged true or false.
 3. Keep the statements short, and use simple
language structure.
 4. Use negative statements carefully, and avoid
double negatives.
 5. Statements of opinion should be attributed to
some source.
 6. Avoid extraneous clues to the answer.
 Statements that include such absolutes as "always," "never," "all," "none," and "only" tend to be false;
 statements with qualifiers such as "usually," "may," and "sometimes" tend to be true.
 Rules for Constructing Matching Items
 1. Include only homogeneous material in each matching item.
 2. Keep the lists of items short and place the brief responses on the right.
 3. Use a larger, or smaller, number of responses than premises, and permit the responses to be used more than once.
 4. Specify in the directions the basis for matching and indicate that each response can be used once, or more than once.
 Rules for Constructing Short Answer and Fill-In Items
 1. State the item so that only a brief, single answer is possible.
 2. Start with a direct question, and switch to an incomplete statement only when greater conciseness is possible by doing so.
 3. The words to be supplied should relate to the main point of the statement.
 4. Place the blanks at the end of the statement.
 5. Avoid extraneous clues to the answer.
 6. For numerical answers, indicate the degree of precision expected and the units in which they are to be expressed.