
Assessment of Learning focuses on the development and utilization of assessment
tools to improve the teaching-learning process. It emphasizes the use of testing for
measuring knowledge, comprehension, and other thinking skills.

Lesson 2.1. Measurement


Measurement – is a process of quantifying or assigning numbers to students' intelligence,
personality, attitudes, values, and achievement. To measure
is to apply a standard measuring device to an object, group of objects, events or situations
according to a procedure determined by one who is skilled in using such a device.
For instance, knowledge of the subject matter is often measured through
standardized test results. In this case measurement procedure is testing. A scale of 1 to 5
can also be used by a group of experts to rate a student’s (or a teacher’s) knowledge of the
subject matter. In this case, knowledge of the subject matter is measured through
perceptions.

Types of Measurement
Objective (as in testing) - measurements that do not depend on the person or individual
taking the measurements. Regardless of who is taking the measurement, the same
measurement values should be obtained when using an objective assessment procedure.
Subjective (as in perceptions) – often differ from one assessor to the next even if the
same quantity or quality is being measured. (aesthetic appeal of a product or project,
drama performance, etc.)
Measurement of quantity or quality of interest = true value + random error

Lesson 2.2. ASSESSMENT

Assessment - the process of gathering evidence of students’ performance over a period of time to


determine learning and mastery of skills. Such evidence of learning can take the forms of:
dialogue record, journals, written work, portfolios, tests, and other learning tasks.

-The overall goal of assessment is to improve student learning


and provide students, parents and teachers with reliable information regarding student
progress and the extent of attainment of the expected learning outcomes.
-Assessment results show the more permanent learning and give a
clearer picture of the student’s ability. It is a very powerful tool for educational
improvement.

Role of Assessment in Classroom Instruction


1. Beginning of Instruction
Placement Assessment, according to Gronlund, Linn, and Miller (2009), is
concerned with the entry performance (to determine the prerequisite skills, degree of
mastery of the course objectives and the best mode of learning). It determines the knowledge
and skills that an individual possesses which are necessary at the beginning of instruction.
2. During Instruction
Formative Assessment (assessment FOR learning) is used to monitor learning
progress. It identifies learning errors that need to be corrected and it provides information
to make instruction more effective.
Diagnostic Assessment can be given at the beginning or during instruction to
identify the strengths and weaknesses of the students regarding the topic to be discussed
and to determine the causes of learning problems that cannot be revealed by formative
assessment, and to formulate a plan for remedial action.

3. End of Instruction
Summative Assessment (assessment OF learning) is given
at the end of the course or unit to determine the extent to which the instructional objectives
have been met. The effectiveness of the summative assessment depends on the validity and
reliability of the activity and tools.
Assessment AS learning is associated with self-assessment. Students set their own targets and
actively monitor and evaluate their own learning in relation to those targets. They become
self-directed and independent learners.

Lesson 2.3. EVALUATION

Evaluation - is a process designed to provide information that will help us make a


judgement about a particular situation. The end result of evaluation is to adopt, reject, or
revise what has been evaluated. Objects of evaluation include instructional programs, school
projects, teachers, students, and educational goals. We expect our process to give
information regarding the worth, appropriateness, goodness, validity or legality of
something for which a reliable measurement has been made.
Evaluation is the process of assessing the impact and value of a
series of actions in achieving desired outcomes from start to finish. You can’t evaluate unless you have
a stated business problem, SMART goals, and objectives.
The subject of evaluation is wider than that of assessment, which
focuses specifically on student learning outcomes.

To summarize,
we measure height, distance, weight, knowledge of subject matter through
testing;
we assess learning outcome;
we evaluate results in terms of some criteria or objectives
Measurement refers to the process by which the attributes or dimension of some objects
or subjects of study are determined.
Assessment is a process of selecting, receiving and using data for the purpose of
improvement in current performance. It is one of the primary measurement tools in
education.
Evaluation is an act of passing judgement on the basis of a set of standards. In education,
evaluation is the process of using the measurements gathered in the assessment to judge
the relationship between what was intended by the instruction and what was learned.
Measurements are more objective as they have numerical standards to compare and
record. Evaluation could be seen as more subjective since the evaluator and the measures used
are part of the human sciences and are performance-related.
Lesson 2.4. Assessment FOR, OF and AS Learning: Approaches to Assessment

Assessment of Learning
The predominant kind of assessment in schools is Assessment of Learning. Its purpose is summative,
intended to certify learning and report to parents and students about students’ progress in school,
usually by signalling students’ relative position compared to other students. Assessment of Learning in
classrooms is typically done at the end of something (e.g., a unit, a course, a grade, a Key Stage, a
program) and takes the form of tests or exams that include questions drawn from the material studied
during that time. In Assessment of Learning, the results are expressed symbolically, generally as
marks across several content areas to report to parents.

Assessment for Learning


Assessment for Learning offers an alternative perspective to traditional assessment in
schools. Simply put, Assessment for Learning shifts the emphasis from summative to
formative assessment, from making judgments to creating descriptions that can be used in
the service of the next stage of learning.
Assumption: Classroom assessment can enhance learning

Marking is not designed to make comparative judgments among the students but to
highlight each student’s strengths and weaknesses and provide them with feedback that will
further their learning.
When the cook tastes the soup, that’s formative; when the guests taste the soup, that’s summative. - Robert Stake

In reality, it is through classroom assessment that attitudes, skills, knowledge and thinking are fostered, nurtured and
accelerated – or stifled. -Hynes (1991)

Recordkeeping in this approach may include a grade book, but the records on which
teachers rely are things like checklists of students’ progress against expectations, artefacts,
portfolios of students’ work over time, and worksheets to trace the progression of students
along the learning continuum.

Assessment as Learning
Assessment for Learning can go a long way in enhancing student learning. The notion of
Assessment as Learning reinforces and extends the role of formative assessment for
learning by emphasizing the role of the student, not only as a contributor to the assessment and
learning process, but also as the critical connector between them. This is the regulatory process in
metacognition. It occurs when students personally monitor what they are learning and use
the feedback from this monitoring to make adjustments, adaptations, and even major
changes in what they understand. Assessment as Learning is the ultimate goal, where students
are their own best assessors.
Assumption: Self-assessment is at the heart of the matter.

Recordkeeping in Assessment as Learning is a personal affair. Students and teachers decide


(often together) about the important evidence of learning and how it should be organized and kept.
Students routinely reflect on their work and make judgements about how they can capitalise on what
they have done already. Comparison with others is almost irrelevant. Instead, the critical reference
points are the student’s own prior work and the aspirations and targets for continued learning.

Lesson 4.1. Principles of Good Practice in Assessing Learning Outcomes


9 Principles of Good Practice for Assessing Student Learning

1. The assessment of student learning begins with educational


values. Assessment is not an end in itself but a vehicle for educational improvement.
Educational values should drive not only what we choose to assess but also how we do
so.
2. Assessment is most effective when it reflects an understanding of learning as
multidimensional, integrated, and revealed in performance over time.
Learning is a complex process. It entails not only what students know but what they
can do with what they know; it involves not only knowledge and abilities but values,
attitudes, and habits of mind that affect both academic success and performance
beyond the classroom. Assessment should reflect these understandings by employing
a diverse array of methods, including those that call for actual performance, using
them over time so as to reveal change, growth, and increasing degrees of integration.
3. Assessment works best when the programs it seeks to improve have clear,
explicitly stated purposes.
Assessment is a goal-oriented process. It entails comparing educational performance
with educational purposes and expectations -- those derived from the institution's
mission, from faculty intentions in program and course design, and from knowledge of
students' own goals.
4. Assessment requires attention to outcomes but also and equally to the
experiences that lead to those outcomes.
Information about outcomes is of high importance; where students "end up" matters
greatly. But to improve outcomes, we need to know about student experience along
the way -- about the curricula, teaching, and kind of student effort that lead to
particular outcomes. Assessment can help us understand which students learn best
under what conditions; with such knowledge comes the capacity to improve the whole
of their learning.
5. Assessment works best when it is ongoing, not episodic.
Assessment is a process whose power is cumulative. Though isolated, "one-shot"
assessment can be better than none, improvement is best fostered when assessment
entails a linked series of activities undertaken over time.
6. Assessment fosters wider improvement when representatives from across
the educational community are involved.
Student learning is a campus-wide responsibility, and assessment is a way of enacting
that responsibility. Thus, while assessment efforts may start small, the aim over time
is to involve people from across the educational community. Assessment may also
involve individuals from beyond the campus (alumni/ae, trustees, employers) whose
experience can enrich the sense of appropriate aims and standards for learning.
7. Assessment makes a difference when it begins with issues of use and
illuminates questions that people really care about.
Assessment recognizes the value of information in the process of improvement. But to
be useful, information must be connected to issues or questions that people really
care about. This implies assessment approaches that produce evidence that relevant
parties will find credible, suggestive, and applicable to decisions that need to be
made.
8. Assessment is most likely to lead to improvement when it is part of a larger
set of conditions that promote change.

Assessment alone changes little. Its greatest contribution comes on campuses where
the quality of teaching and learning is visibly valued and worked at. On such
campuses, the push to improve educational performance is a visible and primary goal
of leadership; improving the quality of undergraduate education is central to the
institution's planning, budgeting, and personnel decisions.
9. Through assessment, educators meet responsibilities to students and to the
public.
There is a compelling public stake in education. As educators, we have a responsibility
to the publics that support or depend on us to provide information about the ways in
which our students meet goals and expectations. But that responsibility goes beyond
the reporting of such information; our deeper obligation -- to ourselves, our students,
and society -- is to improve.

Lesson 4.3. Constructive Alignment

What is Constructive Alignment?


Constructive Alignment is a teaching principle that combines constructivism, the idea that
learners construct or create meaning out of learning activities and what they learn,
and alignment, a curriculum design concept that emphasizes the importance of defining and
achieving intended learning outcomes.
The goal of Constructive Alignment then, is to support students in developing as much
meaning and learning as possible from a well designed, coherent, and aligned
course. Courses are congruent and cohere in an explicit way when there is good fit and flow
between a course’s intended learning outcomes, teaching and learning activities, and
assessments of student learning.
Constructive Alignment involves:
 Thoughtfully determining intentions for what students should learn and how they
will demonstrate their achievement of these intended learning outcomes, and
clearly communicating these to students;
 Designing teaching and learning activities so that students are optimally engaged
in achieving these learning outcomes; and
 Creating assessments that will allow students to demonstrate their attainment of
the learning outcomes and allow instructors to discern how well these outcomes
have been achieved.
https://flexforward.pressbooks.com/chapter/constructive-alignment/

Lesson 4.4. VARIETY OF ASSESSMENT METHODS, TOOLS AND TASKS

Authentic assessments refer to assessments (non-paper-and-pencil test) wherein students are asked to
perform real-world tasks that demonstrate meaningful application of what they have learned. It is also
called alternative assessment.
Examples:
Product Output (visual-graph, collage, reflective-journal, reports, papers, research projects)
Performance tasks (experiments, oral presentation, dramatization)
- Examples of performance tests are executing the steps of the tango, delivering a keynote speech,
opening a computer, demonstration teaching, etc.

Traditional assessments refer to conventional methods of testing (pen-and-paper test), usually


standardized, using pen and paper with multiple-choice, true-or-false, or matching-type test items.

Basic examples of paper-and-pencil tests are:


Selected Response            Constructed Response
Alternate response           Completion (Fill-in-the-blanks)
Matching type                Short answer
Multiple Choice              Essay
                             Problem solving

Lesson 4.5. Portfolio


A portfolio falls under non-paper-and-pencil tests. A portfolio is a purposeful collection of
student work or documented performance (e.g., video of a dance) that tells the story of
student achievement or growth. The word purposeful implies that a portfolio is not a
collection of all of a student's work. It is not just a receptacle for all of a student's work. The
student work that is collected depends on the type and purpose of the portfolio you want to
have. It can be a collection of products or recorded performances or photos of
performances.

Types of Portfolio
Portfolios can be classified according to purpose: 1) working portfolios, 2) display
portfolios, or 3) assessment portfolios (Introduction to Using Portfolios in the Classroom
by Charlotte Danielson and Leslye Abrutyn).

Working or Development Portfolio

A working portfolio is so named because it is a project "in the works," containing work in
progress as well as finished samples of work. A growth portfolio demonstrates an
individual's development and growth over time. Development can be focused on academic
or thinking skills, content knowledge, self-knowledge, or any area that is important for your
purposes. For this reason, it is also called development portfolio. Growth or development
portfolio can serve as a holding tank for work that may be selected later for a more
permanent assessment or display portfolio.

Display, Showcase or Best Works Portfolios

It is the display of the students' best work. Students exhibit their best work and interpret its
meaning. Showcase portfolio demonstrates the highest level of achievement attained by the
student.

Assessment or Evaluation Portfolio

As the name implies, the main function of an assessment portfolio is


to document what a student has learned based on standards and competencies expected of
students at each grade level. The standards and competencies of the curriculum, then, will
determine what students select for their portfolios. Their reflective comments will focus on
the extent to which they believe the portfolio entries demonstrate their mastery of the
standards and competencies.

Lesson 4.6. Scoring Rubrics

A rubric is a coherent set of criteria for students’ work that includes descriptions of levels of
performance quality on the criteria. The purpose of rubrics is to assess performance made
evident in processes and products. It can serve as a scoring guide that seeks to evaluate a
student’s performance tasks. An objective type of test can be scored by simply counting the
correct answers, but essay tests, students’ products, and performances cannot be scored
the way objective tests are scored. Products and performances need scoring rubrics
for the score to be reliable.

Two Types of Rubrics


Holistic rubrics
 single-criterion (one-dimensional) rubrics used to assess participants' overall
achievement on an activity or item based on predefined achievement levels;
 performance descriptions are written in paragraphs and usually in full sentences.

Analytic rubrics

 two-dimensional rubrics with levels of achievement as columns and assessment


criteria as rows. They allow you to assess participants' achievements based on multiple
criteria using a single rubric. You can assign different weights (values) to different
criteria and include an overall achievement score by totaling the criteria;
 written in a table form.

Lesson 5.1. Planning a Test and Construction of Table of Specifications

1. Identifying Test Objectives


An objective test, if it is to be comprehensive, must cover the various levels of Bloom’s
taxonomy. Each objective consists of a statement of what is to be achieved, preferably
by the students.

Example. We want to construct a test on the topic: “Subject-Verb Agreement in English” for
a Grade V class. The following are typical objectives:
Knowledge/Remembering. The students must be able to identify the subject and the
verb in a given sentence.
Comprehension/Understanding. The students must be able to determine the appropriate
form of a verb to be used given the subject of a sentence.
Application/Applying. The students must be able to write sentences observing rules on
subject-verb agreement.
Analysis/Analyzing. The students must be able to break down a given sentence into its
subject and predicate.
Evaluation/Evaluating. The students must be able to evaluate whether or not a sentence
observes rules on subject-verb agreement
Synthesis/Creating. The students must be able to formulate rules to be followed
regarding the subject-verb agreement.

2. Deciding on the type of objective test


The test objectives guide the kind of objective tests that will be designed and constructed
by the teacher. For instance, for the first four (4) levels, we may want to construct a
multiple-choice type of test while for application and judgment, we may opt to give an
essay test or a modified essay test. At all times, the test to be formulated must be aligned
with the learning outcome. This is the principle of constructive alignment.
3. Preparing a Table of Specifications (TOS)
A table of specifications or TOS is a test map that guides the teacher in constructing a
test. The TOS ensures that there is balance between items that test lower level thinking
skills and those which test higher order thinking skills ( or alternatively, a balance
between easy and difficult items) in the test. The simplest TOS consists of four (4)
columns: (a) level of objective to be tested, (b) statement of objective, (c) item numbers
where such an objective is being tested, and (d) Number of items and percentage out of
the total for that particular objective.
4. Constructing the test items
The actual construction of the test items follows the TOS. As a general rule, it is advised
that the actual number of items to be constructed in the draft should be double the desired
number of items. For instance, if there are five (5) knowledge level items to be included
in the final test form, then at least ten (10) knowledge level items should be included in
the draft. The subsequent test try-out and item analysis will most likely eliminate many of
the constructed items in the draft (either they are too difficult, too easy or non-
discriminatory), hence, it will be necessary to construct more items than will actually be
included in the final test form.
5. Item analysis and try-out
The test draft is tried out on a group of pupils or students. The purpose of this try-out is to
determine: (a) the item characteristics through item analysis, and (b) the characteristics of
the test itself – validity, reliability, and practicality.
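
As an illustration of the Table of Specifications described in step 3, a minimal Python sketch follows; the levels, objectives, item numbers, and counts shown are hypothetical examples rather than a prescribed layout.

    # A minimal sketch of a simple four-column Table of Specifications (TOS).
    # The levels, objectives, item numbers, and counts below are hypothetical.
    tos = [
        # (level of objective, statement of objective, item numbers, number of items)
        ("Remembering",   "Identify the subject and the verb in a sentence",   "1-5",   5),
        ("Understanding", "Choose the verb form that agrees with the subject", "6-10",  5),
        ("Applying",      "Write sentences observing subject-verb agreement",  "11-15", 5),
        ("Analyzing",     "Break a sentence into its subject and predicate",   "16-20", 5),
    ]
    total_items = sum(n for _, _, _, n in tos)
    for level, objective, items, n in tos:
        share = 100 * n / total_items          # percentage of the total test
        print(f"{level:<14} {objective:<52} {items:<7} {n:>3}  {share:.0f}%")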

Lesson 5.2. Constructing Types of Paper-and-Pencil Tests

A. TRUE-FALSE TEST
Binomial-choice tests are tests that have only two (2) options, such as true or false, right
or wrong, good or bad, and so on.
Rule 1: Do not give a hint (inadvertently) in the body of the question.
Example: The Philippines gained its independence in 1898 and therefore celebrated its
centennial year in 2000. ______
Obviously, the answer is FALSE because 100 years from 1898 is not 2000 but 1998.
Rule 2: Avoid using the words “always”, “never” “often” and other adverbs that
tend to be either always true or always false.

Example: Christmas always falls on a Sunday because it is a Sabbath day.

Statements that use the word “always” are almost always false. A test-wise student can
easily guess his way through a test like this and get a high score even if he does not
know much about the subject matter.
Rule 3: Avoid long sentences as these tend to be “true”. Keep sentences short.

Example: Tests need to be valid, reliable and useful, although, it would require a great
amount of time and effort to ensure that tests possess these test characteristics. _______

Notice that the statement is true. However, we are also not sure which part of the
sentence is deemed true by the student. It is just fortunate that in this case, all parts of the
sentence are true and hence, the entire sentence is true. The following example illustrates
what can go wrong in long sentences:
Example: Tests need to be valid, reliable and useful since it takes very little amount of
time, money and effort to construct tests with these characteristics.
The first part of the sentence is true but the second part is debatable and may, in fact, be
false. Thus, both a “true” and a “false” response could be defended as correct.

Rule 4. Avoid trick statements with some minor misleading word or spelling
anomaly, misplaced phrases, etc.
A wise student who does not know the subject matter may detect this strategy and
thus get the answer correctly.
Example: True or False. The Principle of our school is Mr. Albert P. Panadero.

The Principal’s name may actually be correct but since the word is misspelled and the
entire sentence takes a different meaning, the answer would be false! This is an example
of a tricky but utterly useless item.
Rule 5: Avoid quoting verbatim from reference materials or textbooks. This
practice sends the wrong signal to the students that it is necessary to memorize the
textbook word for word and thus, acquisition of higher level thinking skills is not
given due importance.
Rule 6. Avoid specific determiners or give-away Qualifiers. Students quickly learn
that strongly worded statements are more likely to be false than true, for example,
statements with “never” “no” “all” or “always.” Moderately worded statements are
more likely to be true than false. Statements with “many” “often” “sometimes”
“generally” ‘frequently” or “some” should be avoided.

Rule 7. With true or false questions, avoid a grossly disproportionate number of


either true or false statements or even patterns in the occurrence of true and false
statements.

B. CONSTRUCTING MULTIPLE CHOICE TESTS

A generalization of the true-false test, the multiple-choice type of test offers the student
more than two (2) options per item to choose from. Each item in a multiple-choice
test consists of two parts: (a) the stem, and (b) the options. In the set of options, there is
a “correct” or “best” option while all the others are considered “distracters”. The
distracters are chosen in such a way that they are attractive to those who do not know the
answer or are guessing but, at the same time, have no appeal to those who actually know
the answer. It is this feature of multiple-choice tests that allows the teacher to test
higher-order thinking skills even if the options are clearly stated. As in true-false items,
there are certain rules of thumb to be followed in constructing multiple-choice tests.
Guidelines in constructing Multiple Choice Items

Rule 1: Do not use unfamiliar words, terms and phrases. The ability of the item to
discriminate or its level of difficulty should stem from the subject matter rather than from
the wording of the question.

Example: What would be the system reliability of a computer system whose slave and
peripherals are connected in parallel circuits and each one has a known time to failure
probability of 0.05?

A student completely unfamiliar with the terms “slave” and “peripherals” may not be able
to answer correctly even if he knew the subject matter of reliability.
Rule 2: Do not use modifiers that are vague and whose meanings can differ from
one person to the next such as: much, often, usually, etc.
Example: Much of the process of photosynthesis takes place in the:
a. bark b. leaf c. stem

The qualifier “much” is vague and could have been replaced by a more specific qualifier
like “90% of the photosynthetic process” or some similar phrase that would be more
precise. Be quantitative.
Rule 3: Avoid complex or awkward word arrangements. Also, avoid use of negatives
in the stem as this may add unnecessary comprehension difficulties.
Example:
(Poor) As President of the Republic of the Philippines, Corazon Cojuangco Aquino would
stand next to which President of the Philippine Republic subsequent to the 1986 EDSA
Revolution?
(Better) Who was the President of the Philippines after Corazon C. Aquino?

Rule 4: Do not use negatives or double negatives as such statements tend to be


confusing. It is best to use simpler sentences rather than sentences that would
require expertise in grammatical construction.
Example:
(Poor) Which of the following will not cause inflation in the Philippine economy?
(Better) Which of the following will cause inflation in the Philippine economy?
(Poor) What does the statement “Development patterns acquired during the formative
years are NOT Unchangeable” imply?
A. B. C. D.
(Better) What does the statement “Development patterns acquired during the formative
years are changeable” imply?
A. B. C. D.

Rule 5: Each item stem should be as short as possible; otherwise you risk testing
more for reading and comprehension skills.

Rule 6: Distracters should be equally plausible and attractive.


Example: The short story “May Day Eve” was written by which Filipino author?
a. Jose Garcia Villa b. Nick Joaquin c. Genoveva Edrosa Matute
d. Robert Frost e. Edgar Allan Poe
If the distracters had all been Filipino authors, the value of the item would be greatly
increased. In this particular instance, only the first three carry the burden of the entire
item since the last two can be essentially disregarded by the students.
Rule 7: All multiple choice options should be grammatically consistent with the
stem.

Example:
As compared to the autos of the 1960s, autos in the 1980s ______
a. Traveling slower b. bigger interiors c. to use less fuel
d. contain more safety measures

Options a, b, and c are obviously wrong to the language-smart student because, when added to
the stem, the sentence becomes grammatically wrong. Option d is the only one which, when
connected to the stem, retains the grammatical accuracy of the sentence, and thus is obviously
the correct answer.

Rule 8: The length, explicitness, or degree of technicality of alternatives should not


be the determinants of the correctness of the answer. The following is an example
of this rule:

Example: If the three angles of two triangles are congruent, then the triangles are:
a. congruent whenever one of the sides of the triangles are congruent
b. similar
c. equiangular and therefore must also be congruent
d. equilateral if they are equiangular

The correct choice, “b,” may be obvious from its length and explicitness alone. The other
choices are long and tend to explain why they must be the correct choices, forcing the
students to think that they are, in fact, not the correct answers!
Rule 9: Avoid stems that reveal the answer to another item.
Rule 10: Avoid alternatives that are synonymous with others or those that, include
or overlap others.
Example: What causes ice to transform from solid state to liquid state’?
a. Change in temperature
b. Changes in pressure
c. Change in the chemical composition
d. Change in heat levels
The options a and d are essentially the same. Thus, a student who spots these identical
choices would right away narrow down the field of choices to a, b, and c. The last
distracter would play no significant role in increasing the value of the item.
Rule 11: Avoid presenting sequenced items in the same order as in the text.
Rule 12: Avoid use of assumed qualifiers that many examinees may not be aware of.

Rule 13: Avoid use of unnecessary words or phrases, which are not relevant to the
problem at hand (unless such discriminating ability is the primary intent of the
evaluation). The item’s value is particularly damaged if the unnecessary material is
designed to distract or mislead. Such items test the student’s reading comprehension
rather than knowledge of the subject matter.
Example: The side opposite the thirty degree angle in a right triangle is equal to half the
length of the hypotenuse. If the sine of a 30-degree angle is 0.5 and the hypotenuse is 5, what is
the length of the side opposite the 30-degree angle?
a. 2.5 b. 3.5 c. 5.5 d. 1.5

The sine of a 30-degree angle is really quite unnecessary since the first sentence already
gives the method for finding the length of the side opposite the thirty-degree angle. This
is a case of a teacher who wants to make sure that no student in his class gets the wrong
answer!

Rule 14: Avoid use of non-relevant sources of difficulty such as requiring a complex
calculation when only knowledge of a principle is being tested.

Note in the previous example, knowledge of the sine of the 30-degree angle would have
led some students to use the sine formula for calculation even if a simpler approach
would have sufficed.
Rule 15: Avoid extreme specificity requirements in responses.

Rule 16: Include as much of the item as possible in the stem. This allows for less
repetition and shorter choice options.

Rule 17: Use the “None of the above” option only when the keyed answer is totally
correct.
When choice of the “best” response is intended, “none of the above” is not
appropriate, since the implication has already been made that the correct response
may be partially inaccurate.

Rule 18: Note that the use of “all of the above” may allow credit for partial
knowledge.
In a multiple option item, (allowing only one option choice) if a student only knew
that two (2) options were correct, he could then deduce the correctness of “all of the
above”. This assumes you are allowed only one correct choice.
Rule 19: Having compound response choices may purposefully increase difficulty of
an item.

Rule 20: The difficulty of a multiple choice item may be controlled by varying the
homogeneity or degree of similarity of responses.
The more homogeneous, the more difficult the item.
Example:
(Less Homogeneous) Thailand is located in:
a. Southeast Asia b. Eastern Europe c. South America
d. East Africa e. Central America
(More Homogeneous) Thailand is located next to:
a. Laos and Kampuchea b. India and China
c. China and Malaya d. Laos and China
e. India and Malaya
C. CONSTRUCTING MATCHING TYPE AND SUPPLY TYPE ITEMS
The matching type items may be considered as modified multiple-choice type items
where the choices progressively reduce as one successfully matches the items on the left
with the items on the right.

Example: Match the items in column A with the items in column B.


A B
_________1. Magellan a. First President of the Republic
_________2. Mabini b. National Hero
_________3. Rizal c. Discovered the Philippines
_________4. Lapu-Lapu d. Brain of the Katipunan
_________5. Aguinaldo e. The great painter
f. Defended Limasawa island
Normally, column B will contain more items than column A to prevent guessing on the
part of the students.
Matching type items, unfortunately, often test lower order thinking skills (knowledge
level) and are unable to test higher order thinking skills such as application and
judgement skills.
A variant of the matching type item is the data sufficiency and comparison type of test
illustrated below. This is also an example of a test that can measure higher-order
thinking skills.
Example: Write G if the item on the left is greater than the item on the right; L if the item
on the left is less than the item on the right; E if the item on the left equals the item on the
right and D if the relationship cannot be determined.
A                                B
1. Square root of 9 ______       a. -3
2. Square root of 25 ______      b. 615
3. 36 inches ______              c. 3 meters
4. 4 feet ______                 d. 48 inches
5. 1 kilogram ______             e. 1 pound

The data sufficiency test above can, if properly constructed, test higher-order thinking
skills. Each item goes beyond simple recall of facts and, in fact, requires the students to
make decisions.
Another useful device for testing lower-order thinking skills is the supply type of tests.
Like the multiple-choice test, the items in this kind of test consist of a stem and a blank
where the students would write the correct answer.
Example: The study of life and living organisms is called __________.

Supply type tests depend heavily on the way that the stems are constructed. These tests
allow for one and only one answer and, hence, often test only the students’ knowledge. It
is, however, possible to construct supply type tests that will test higher-order
thinking, as the following example shows:
Example: Write an appropriate synonym for each of the following. Each blank
corresponds to a letter:
Metamorphose: _ _ _ _ _ _
Flourish: _ _ _ _
The appropriate synonym for the first is CHANGE with six (6) letters while the
appropriate synonym for the second is GROW with four (4) letters. Notice that these
questions require not only mere recall of words but also understanding of these words.
D. CONSTRUCTING ESSAY TESTS

Essays, classified as non-objective tests, allow for the assessment of higher-order


thinking skills. Such tests require students to organize their thoughts on a subject matter
in coherent sentences in order to inform an audience. In essay tests, students are required
to write one or more paragraphs on a specific topic.

Essay questions can be used to measure the attainment of a variety of objectives.


Note that all these involve the higher-level skills mentioned in Bloom’s Taxonomy.
The following are rules of thumb which facilitate the scoring of essays:

Rule 1: Phrase the direction in such a way that students are guided on the key
concepts to be included. Specify how the students should respond
Example: Write an essay on the topic: “Plant Photosynthesis” using the following
keywords and phrases: chlorophyll, sunlight, water, carbon dioxide, oxygen, by-product,
stomata.
Note that the students are properly guided in terms of the keywords that the teacher is
looking for in this essay examination. An essay such as the one given below will get a
score of zero (0). Why?
Plant Photosynthesis
Nature has its own way of ensuring the balance between food producers and consumers.
Plants are considered producers of food for animals. Plants produce food for animals
through a process called photosynthesis. It is a complex process that combines various
natural elements on earth into the final product which animals can consume in order to
survive. Naturally, we all need to protect plants so that we will continue to have food on
our table. We should discourage the burning of grasses, cutting trees, and illegal
logging. If the leaves of plants are destroyed, they cannot perform photosynthesis and
animals will also perish.
(The essay above gets a score of zero because it does not use any of the required keywords:
chlorophyll, sunlight, water, carbon dioxide, oxygen, by-product, and stomata.)
Rule 2: Inform the students on the criteria to be used for grading their essays.
This rule allows the students to focus on relevant and substantive materials rather
than on peripheral and unnecessary facts and bits of information.
Example: Write an essay on the topic: “Plant Photosynthesis” using the keywords
indicated. You will be graded according to the following criteria: (a) coherence, (b)
accuracy of statements, (c) use of keywords, (d) clarity and (e) extra points for innovative
presentation of ideas.

Rule 3: Put a time limit on the essay test.


Rule 4: Decide on your essay grading system prior to getting the essays of your
students.
Rule 5: Evaluate all of the students’ answers to one question before proceeding to
the next question.
Scoring or grading essay tests question by question, rather than student by student, makes
it possible to maintain a more uniform standard for judging the answers to each question.
This procedure also helps offset the halo effect in grading. When all of the answers on
one paper are read together, the grader’s impression of the paper as a whole is apt to
influence the grades he assigns to the individual answers. Grading question by question,
of course, prevents the formation of this overall impression of a student’s paper. Each
answer is more apt to be judged on its own merits when it is read and compared with
other answers to the same question than when it is read and compared with other answers
by the same student.
Rule 6: Evaluate answers to essay questions without knowing the identity of the
writer.

This is another attempt to control personal bias during scoring. Answers to essay
questions should be evaluated in terms of what is written, not in terms of what is known
about the writers from other contacts with them. The best way to prevent our prior
knowledge from influencing our judgment is to evaluate each answer without knowing
the identity of the writer. This can be done by having the students write their names on
the back of the paper or by using code numbers in place of names.
Rule 7: Whenever possible, have two or more persons grade each answer. The best
way to check on the reliability of the scoring of essay answers is to obtain two or
more independent judgments.
Although this may not be a feasible practice for routine classroom testing, it might be
done periodically with a fellow teacher (one who is equally competent in the area).
Obtaining two or more independent ratings becomes especially vital where the results are
to be used for important and irreversible decisions, such as in the selection of students for
further training or for special awards. Here the pooled ratings of several competent
persons may be needed to attain a level of reliability that is commensurate with the
significance of the decision being made.
Some teachers use the cumulative criteria, i.e., adding the weights given to each criterion,
as the basis for grading, while others use the reverse. In the latter method, each student begins
with a score of 100. Points are then deducted every time a teacher encounters a mistake
or when a criterion is missed by the student in his essay.

Note: In every test write instructions that are clear, explicit, and
unambiguous

Lesson 6.1. Item Analysis


Item analysis provides statistics on overall performance, test quality, and individual
questions. This data helps you recognize questions that might be poor discriminators of
student performance.
Item Analysis is an important (probably the most important) tool to increase test
effectiveness. Each item’s contribution is analyzed and assessed.
To write effective items, it is necessary to examine whether they are measuring the fact,
idea, or concept for which they were intended. This is done by studying the students’
responses to each item. When formalized, the procedure is called “item analysis”. It is a
scientific way of improving the quality of tests and test items in an item bank.
An item analysis provides three kinds of important information about the quality of test
items.

 Item difficulty: A measure of whether an item was too easy or too hard.

 Item discrimination: A measure of whether an item discriminated between students who


knew the answer and students who did not.

 Effectiveness of alternatives: Determination of whether distractors (incorrect but


plausible answers) tend to be marked by the less able students and not by the more
able students.
(https://www.uwosh.edu/testing/faculty-information/test-scoring/score-report-interpretation/item-analysis-1, n.d.)

Uses for item analysis:

 Improve questions for future test administrations or to adjust credit on current attempts
 Discuss test results with your class
 Provide a basis for remedial work
 Improve classroom instruction

View test statistics:


 Possible Points: The total number of points for the test.
 Possible Questions: The total number of questions in the test.
 In Progress Attempts: The number of students currently taking the test who haven't
submitted it yet.
 Completed Attempts: The number of submitted tests.
 Average Score: Scores denoted with an asterisk indicate that some attempts aren't graded
and that the average score might change after all attempts are graded. The score shown is
the average score reported for the test in the Grade Center.
 Average Time: The average completion time for all submitted attempts.
Item discrimination refers to the ability of an item to differentiate among students on the
basis of how well they know the material being tested.
 Discrimination: Shows the number of questions that fall into the following categories:
Good (greater than 0.3),
Fair (between 0.1 and 0.3), and
Poor (less than 0.1).
A discrimination value is listed as Cannot Calculate when the question's difficulty is
100% or when all students receive the same score on a question. Questions with
discrimination values in the Good and Fair categories are better at differentiating between
students with higher and lower levels of knowledge. Questions in the Poor category are
recommended for review.

The discrimination index reflects the degree to which an item and the test as a whole are
measuring a unitary ability or attribute; values of the coefficient will tend to be lower for tests
measuring a wide range of content areas than for more homogeneous tests.

Item Discrimination I
The single best measure of the effectiveness of an item is its ability to separate students
who vary in their degree of knowledge of the material tested and their ability to use it. If
one group of students has mastered the material and the other group has not, a larger
portion of the former group should be expected to answer a test item correctly. Item
discrimination is the difference between the percentage correct for these two groups.
Item discrimination can be calculated by ranking the students according to total score and
then selecting the top 27 percent and the lowest 27 percent in terms of total score. For each
item, the percentage of students in the upper and lower groups answering correctly is
calculated. The difference is one measure of item discrimination (IDis). The formula is:

IDis = (Upper Group Percent Correct) – (Lower Group Percent Correct)
Item #1 in the attached report would have an IDis of
100% - 62.5% = 37.5% (or .375 as a decimal).
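
The same computation can be scripted. A minimal Python sketch follows, using invented total scores and item responses; the 27 percent grouping mirrors the procedure described above.

    # Minimal sketch: item discrimination (IDis) using upper and lower 27% groups.
    # The total scores and item responses below are invented for illustration only.
    def item_discrimination(total_scores, item_correct):
        # total_scores: one total test score per student
        # item_correct: 1/0 flags for the item, in the same student order
        ranked = sorted(range(len(total_scores)),
                        key=lambda i: total_scores[i], reverse=True)
        k = max(1, round(0.27 * len(ranked)))      # size of each extreme group
        upper, lower = ranked[:k], ranked[-k:]
        p_upper = sum(item_correct[i] for i in upper) / k
        p_lower = sum(item_correct[i] for i in lower) / k
        return p_upper - p_lower                   # IDis as a decimal

    totals = [45, 42, 40, 38, 35, 33, 30, 28, 25, 20]   # hypothetical total scores
    item1  = [ 1,  1,  1,  1,  0,  1,  0,  0,  0,  0]   # hypothetical responses to one item
    print(f"IDis = {item_discrimination(totals, item1):.3f}")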

The following levels may be used as a guideline for acceptable items:

Negative IDis     Unacceptable – check item for error
0% - 24%          Usually unacceptable – might be approved
25% - 39%         Good item
40% - 100%        Excellent item

Item Discrimination II
The point biserial correlation (PBC) measures the correlation between the correct answer
(viewed as 1 = right and 0 = wrong) on an item and the total test score of all students. The
PBC is sometimes preferred because it identifies items that correctly discriminate between
high and low groups, as defined by the test as a whole instead of the upper and lower 27
percent of a group.
Inspection of the attached report shows that the PBC can generate a substantially different
measurement of item discrimination than the simple item discrimination difference
described above. Often, however, the measures are in close agreement.
Generally, the higher the PBC the better the item discrimination, and thus, the effectiveness
of the item. The following criteria may be used to evaluate test items.
PBC               Interpretation
.30 and above     Very good items
.20 to .29        Reasonably good items, but subject to improvement
.10 to .19        Marginal items, usually needing improvement
.00 to .09        Poor items, to be rejected or revised
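
A minimal Python sketch of the point biserial follows, computed as the ordinary Pearson correlation between the 1/0 item column and the total scores; the data are invented, and statistics.correlation requires Python 3.10 or later.

    # Minimal sketch: point biserial correlation between one item (1 = right, 0 = wrong)
    # and the total test scores. The data below are hypothetical.
    import statistics   # statistics.correlation requires Python 3.10+

    item1  = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
    totals = [48, 45, 44, 40, 39, 35, 33, 30, 27, 22]
    # The point biserial is numerically the Pearson correlation between the
    # dichotomous item column and the continuous total-score column.
    print(f"PBC = {statistics.correlation(item1, totals):.2f}")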

“Item difficulty” is the percentage of the total group that got the item correct.
The item difficulty index ranges from 0 to 100; the higher the value, the easier the question.
When an alternative is worth other than a single point, or when there is more than one correct
alternative per question, the item difficulty is the average score on that item divided by the
highest number of points for any one alternative.

Item difficulty is relevant for determining whether students have learned the concept being
tested. It also plays an important role in the ability of an item to discriminate between students
who know the tested material and those who do not. The item will have low discrimination if it is
so difficult that almost everyone gets it wrong or guesses, or so easy that almost everyone gets
it right.
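
A minimal Python sketch of both variants of the item difficulty index described above, using hypothetical responses:

    # Minimal sketch: item difficulty for a one-point item and for a multi-point item.
    # The responses and scores below are hypothetical.
    def item_difficulty(item_correct):
        # Percentage of the group answering the item correctly (1 = right, 0 = wrong).
        return 100 * sum(item_correct) / len(item_correct)

    def item_difficulty_points(item_scores, max_points):
        # For items worth more than one point: average score divided by the maximum.
        return 100 * (sum(item_scores) / len(item_scores)) / max_points

    print(item_difficulty([1, 1, 0, 1, 1, 1, 0, 1, 1, 0]))           # 70.0
    print(item_difficulty_points([3, 5, 4, 2, 5, 4], max_points=5))  # about 76.7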

Item difficulty is important because it reveals whether an item is too easy or too hard. In
either case, the item may add to the unreliability of the test because it does not aid in
differentiating between those students who know the material and those who do not. For
example, an item answered correctly by everyone does nothing to aid in the assignment of
grades. The same is true for items that no one answers correctly.
The optimal item difficulty depends on the question type and on the number of possible
distractors. Many test experts believe that, for maximum discrimination between high and
low achievers, the optimal levels (adjusting for guessing) depend on the number of answer choices.

Items with difficulties less than 30 percent or more than 90 percent definitely need
attention. Such items should either be revised or replaced. An exception might be at the
beginning of a test where easier items (90 percent or higher) may be desirable.

Distractors & Effectiveness


Although Item Discrimination statistics measure important characteristics about test item
effectiveness, they don’t reveal much about the appropriateness of item distractors. By
looking at the pattern of responses to distractors, teachers can often determine how to
improve the test.
The effectiveness of a multiple-choice question is heavily dependent on its distractors. If
two distractors in a four-choice item are implausible, the question becomes, in effect, a
true-false item. It is, therefore, important for teachers to observe how many students select
each distractor and to revise those that draw little or no attention. Use of “all of the above”
and “none of the above” is generally discouraged.
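
One simple way to examine distractor effectiveness is to tally how many students chose each option. A minimal Python sketch with hypothetical responses:

    # Minimal sketch: count how often each option of one multiple-choice item was chosen.
    # Responses are hypothetical; options that draw little or no attention are candidates
    # for revision.
    from collections import Counter

    responses = ["B", "B", "A", "B", "C", "B", "A", "B", "B", "C"]   # keyed answer is "B"
    counts = Counter(responses)
    for option in "ABCD":
        print(f"Option {option}: chosen {counts.get(option, 0)} time(s)")
    # Option D is never chosen here, so it is probably not functioning as a distracter.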
Reliability & Validity
The importance of a test achieving a reasonable level of reliability and validity cannot be
overemphasized. To the extent a test lacks reliability, the meaning of individual scores is
ambiguous. A score of 80, say, may be no different than a score of 70 or 90 in terms of
what a student knows, as measured by the test. If a test is not reliable, it is not valid.
If an instrument is unreliable, it cannot yield valid outcomes.

Lesson 6.2. Reliability


Reliability refers to the consistency of test scores; how consistent a particular student’s
test scores are from one testing to another.
It is the degree to which an assessment tool produces stable and consistent results.

One of the best estimates of reliability of test scores from a single administration of a test is
provided by the Kuder-Richardson Formula 20 (KR20) or KR-21. On the “Standard Item
Analysis Report” attached, it is found in the top center area. For example, in this report the
reliability coefficient is .87. For good classroom tests, the reliability coefficients should
be .70 or higher.
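
A minimal Python sketch of KR-20 computed from a 0/1 response matrix, using the standard formula KR20 = (k / (k - 1)) × (1 - Σ p_i q_i / σ²), where k is the number of items, p_i the proportion answering item i correctly, q_i = 1 - p_i, and σ² the variance of the total scores; the response matrix below is invented.

    # Minimal sketch: KR-20 reliability for dichotomously scored items.
    # The 0/1 response matrix below is invented for illustration.
    import statistics

    def kr20(responses):                       # responses[student][item] = 1 or 0
        k = len(responses[0])                  # number of items
        totals = [sum(row) for row in responses]
        var_total = statistics.pvariance(totals)
        pq = sum((p := sum(col) / len(col)) * (1 - p)   # p_i * q_i for each item
                 for col in zip(*responses))
        return (k / (k - 1)) * (1 - pq / var_total)

    data = [
        [1, 1, 1, 0, 1],
        [1, 0, 1, 1, 1],
        [0, 1, 0, 0, 1],
        [1, 1, 1, 1, 1],
        [0, 0, 1, 0, 0],
    ]
    print(f"KR-20 = {kr20(data):.2f}")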
To increase the likelihood of obtaining higher reliability, a teacher can:

1. increase the length of the test;

2. include questions that measure higher, more complex levels of learning, and include
questions with a range of difficulty with most questions in the middle range; and

3. if one or more essay questions are included on the test, grade them as objectively as
possible.

Types of Reliability

Test-retest reliability is a measure of reliability obtained by administering the


same test twice over a period of time to a group of individuals. The scores from Time 1 and
Time 2 can then be correlated in order to evaluate the test for stability over time.

Example: A test designed to assess student learning in psychology could be given to a


group of students twice, with the second administration perhaps coming a week after the
first. The obtained correlation coefficient would indicate the stability of the scores.
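
A minimal Python sketch of this idea, correlating hypothetical Time 1 and Time 2 scores (statistics.correlation requires Python 3.10 or later):

    # Minimal sketch: test-retest reliability as the correlation between two administrations.
    # The score pairs below are hypothetical.
    import statistics   # statistics.correlation requires Python 3.10+

    time1 = [78, 85, 90, 66, 72, 88, 95, 70]
    time2 = [80, 83, 92, 70, 71, 85, 93, 74]
    print(f"Test-retest r = {statistics.correlation(time1, time2):.2f}")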

Parallel forms reliability is a measure of reliability obtained by administering


different versions of an assessment tool (both versions must contain items that probe the
same construct, skill, knowledge base, etc.) to the same group of individuals. The scores
from the two versions can then be correlated in order to evaluate the consistency of results
across alternate versions.
Example: If you wanted to evaluate the reliability of a critical thinking assessment, you
might create a large set of items that all pertain to critical thinking and then randomly split
the questions up into two sets, which would represent the parallel forms.

Inter-rater reliability is a measure of reliability used to assess the degree to which


different judges or raters agree in their assessment decisions. Inter-rater reliability is
useful because human observers will not necessarily interpret answers the same way; raters
may disagree as to how well certain responses or material demonstrate knowledge of the
construct or skill being assessed.

Example: Inter-rater reliability might be employed when different judges are evaluating the
degree to which art portfolios meet certain standards. Inter-rater reliability is especially
useful when judgments can be considered relatively subjective. Thus, the use of this type
of reliability would probably be more likely when evaluating artwork as opposed to math
problems.

Internal consistency reliability is a measure of reliability used to evaluate the


degree to which different test items that probe the same construct produce similar results.

Average inter-item correlation is a subtype of internal consistency reliability. It is


obtained by taking all of the items on a test that probe the same construct (e.g., reading
comprehension), determining the correlation coefficient for each pair of items, and finally
taking the average of all of these correlation coefficients. This final step yields the average
inter-item correlation.

Split-half reliability is another subtype of internal consistency reliability. The


process of obtaining split-half reliability is begun by “splitting in half” all items of a test that
are intended to probe the same area of knowledge (e.g., World War II) in order to form two
“sets” of items. The entire test is administered to a group of individuals, the total score for
each “set” is computed, and finally the split-half reliability is obtained by determining the
correlation between the two total “set” scores.
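
A minimal Python sketch with a hypothetical 0/1 response matrix; the odd-numbered and even-numbered items serve as the two "sets". The final Spearman-Brown step-up is a common adjustment for full test length, not part of the description above.

    # Minimal sketch: split-half reliability from a hypothetical 0/1 response matrix.
    import statistics   # statistics.correlation requires Python 3.10+

    responses = [                    # responses[student][item] = 1 or 0
        [1, 1, 0, 1, 1, 0, 1, 1],
        [0, 1, 0, 0, 1, 0, 1, 0],
        [1, 1, 1, 1, 1, 1, 1, 1],
        [0, 0, 1, 0, 0, 1, 0, 0],
        [1, 0, 1, 1, 0, 1, 1, 0],
    ]
    half_a = [sum(row[0::2]) for row in responses]   # odd-numbered items
    half_b = [sum(row[1::2]) for row in responses]   # even-numbered items
    r_half = statistics.correlation(half_a, half_b)
    full_test = 2 * r_half / (1 + r_half)            # Spearman-Brown step-up
    print(f"split-half r = {r_half:.2f}, full-test estimate = {full_test:.2f}")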

Lesson 6.3. Validity of a Test

Validity refers to how well a test measures what it is purported to measure.

Content or curricular validity is generally used to assess whether a classroom test is


measuring what it is supposed to measure.
A quantitative method of assessing test validity is to examine each test item. This is
accomplished by reviewing the discrimination (IDis) of each item. If an item has a
discrimination measure of 25 percent or higher, it is said to have validity; that is, it is doing
what it is supposed to be doing – discriminating between those who are knowledgeable and
those who are not.
(https://www.uwosh.edu/testing/faculty-information/test-scoring/score-report-interpretation/item-analysis-1, n.d.)
Types of Validity

1. Face Validity ascertains that the measure appears to be assessing the intended
construct under study. The stakeholders can easily assess face validity. Although this is not
a very “scientific” type of validity, it may be an essential component in enlisting motivation
of stakeholders. If the stakeholders do not believe the measure is an accurate assessment
of the ability, they may become disengaged with the task.

Example: If a measure of art appreciation is created, all of the items should be related to the
different components and types of art. If the questions are regarding historical time
periods, with no reference to any artistic movement, stakeholders may not be motivated to
give their best effort or invest in this measure because they do not believe it is a true
assessment of art appreciation.

2. Construct Validity is used to ensure that the measure actually measures what it is
intended to measure (i.e., the construct), and not other variables. Using a panel of “experts”
familiar with the construct is a way in which this type of validity can be assessed. The
experts can examine the items and decide what that specific item is intended to measure.
Students can be involved in this process to obtain their feedback.

Example: A women’s studies program may design a cumulative assessment of learning


throughout the major. The questions are written with complicated wording and phrasing.
This can cause the test to inadvertently become a test of reading comprehension, rather
than a test of women’s studies. It is important that the measure is actually assessing the
intended construct, rather than an extraneous factor.

3. Criterion-Related Validity is used to predict future or current performance - it


correlates test results with another criterion of interest.

Example: Suppose a physics program designed a measure to assess cumulative student learning


throughout the major. The new measure could be correlated with a standardized measure
of ability in this discipline, such as an ETS field test or the GRE subject test. The higher the
correlation between the established measure and new measure, the more faith stakeholders
can have in the new assessment tool.

Lesson 7.1: Measures of Central Tendency

To further describe a set of data, a single summarizing value may be computed, known as a
Measure of Central Tendency. Though tabular and graphical summaries convey general impressions
of the data, measures of central tendency or location give information about a single value around which
the data set of observations tends to cluster. Some of the popular and commonly used measures of
central tendency are the mean, the median, and the mode.

1.1 MEAN

Arithmetic mean
- Is the average value of the data set; it is denoted by μ and defined as the sum of all the
observations divided by the total number of observations. In symbols, if we let X_i = the
value of the i-th observation and N = the number of observations, then the mean is given by:

μ = (∑_{i=1}^{N} X_i) / N,   i = 1, 2, …, N
1. Find the average of the scores: 80, 82, 76, 78, 82, and 91.
Solution:

\[ \mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{80+82+76+78+82+91}{6} = \frac{489}{6} = 81.5 \]
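As a quick computational check (a minimal sketch, not part of the original text), the same mean can be obtained directly from the formula sum of the scores divided by N:

# Arithmetic mean of the six scores in Example 1.
scores = [80, 82, 76, 78, 82, 91]
mean = sum(scores) / len(scores)
print(mean)  # 81.5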
Weighted mean
- The value obtained by summing up the products of each score and its corresponding weight, divided by the sum of the weights. In formula,

\[ \bar{X}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} \]
2. Angel has the following grades and the equivalent credit units for each grade. Determine her GWA (General Weighted Average).

Subject         Units (wi)   Grade (xi)   wi(xi)
Filipino 2           3           87         261
English              3           84         252
Math 1               3           85         255
P.E.                 1           95          95
Chem 1 (Lec)         3           82         246
Chem 1 (Lab)         1           82          82
Philo 1              3           85         255
Total               17                     1446
Solution:

\[ \bar{X}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{1446}{17} = 85.06 \]

Thus, the weighted mean (GWA) is 85.06.
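The GWA computation can also be sketched in Python (an illustration, not part of the original text), using the units as weights:

# Weighted mean: sum(w_i * x_i) / sum(w_i), with credit units as weights.
units  = [3, 3, 3, 1, 3, 1, 3]
grades = [87, 84, 85, 95, 82, 82, 85]
gwa = sum(w * x for w, x in zip(units, grades)) / sum(units)
print(round(gwa, 2))  # 85.06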

Grand Mean
- This value is obtained by summing up the mean of each group multiplied by the number of scores in that group, divided by the sum of the numbers of scores in the groups. In symbols,

\[ \bar{x}_G = \frac{\sum_{k=1}^{K} n_k \bar{x}_k}{\sum_{k=1}^{K} n_k} \]

where \(\bar{x}_k\) = mean of each group and \(n_k\) = number of observations in the group.
3. In a P.E. class, there are 25 freshmen, 30 sophomores, and 10 juniors. If the freshmen averaged 89 in a practical test, the sophomores averaged 85, and the juniors averaged 82, find the mean grade of the entire class.

Solution:

\[ \bar{x}_G = \frac{25(89) + 30(85) + 10(82)}{65} = \frac{5595}{65} = 86.08 \]

Answer: The entire class has a mean grade of approximately 86.08.
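The grand mean is just a weighted mean of the group means, with the group sizes as weights; a minimal Python sketch (not part of the original text) of Example 3:

# Grand mean: sum(n_k * xbar_k) / sum(n_k).
sizes = [25, 30, 10]   # freshmen, sophomores, juniors
means = [89, 85, 82]
grand_mean = sum(n * x for n, x in zip(sizes, means)) / sum(sizes)
print(round(grand_mean, 2))  # 86.08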

4. A researcher conducted a survey involving three groups of respondents. The mean of the first group
with 24 respondents was 4.32.

The Median

The median, denoted by Md, is a single value which divides an array of observations into two equal parts such that half of the observations fall below it and the other half fall above it. It is the middlemost value in the array.

1. Find the median of the scores 7, 2, 3, 7, 6, 9, 10, 8, 9, 9, 10.
Solution: Arrange the scores in increasing magnitude (ascending order):
2, 3, 6, 7, 7, 8, 9, 9, 9, 10, 10
With these eleven scores, the number 8 is located in the exact middle, so 8 is the median.
2. Find the median of the scores 7, 2, 3, 7, 6, 9, 10, 8, 9, 9.
Solution: Again, arrange the scores:
2, 3, 6, 7, 7, 8, 9, 9, 9, 10
The two centermost scores are 7 and 8, so we find the mean of these two scores:

\[ \frac{7+8}{2} = 7.5 \]

Thus, 7.5 is the median of the given scores.

The Mode

The mode, denoted by Mo, is the value which occurs most frequently in the given data set.

1. Find the mode


a. The scores 1, 2, 3, 2, 4, 7, 9, 2 have a mode of 2.
b. The scores 2, 3, 6, 7, 8, 9 have no mode since no score is repeated
c. The scores 1, 2, 2, 3, 4, 5, 2, 5, 6, 6, 7, 9, 6 have the modes 2 and 6 since they both occur
with the same highest frequency (we refer to such data as bimodal)
d. The scores 3, 4, 5, 1, 3, 2, 4, 5, 7, 10 have the modes 3, 4, and 5 (multimodal).
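These examples can be checked with Python's standard statistics module (a minimal sketch, not part of the original text; multimode requires Python 3.8 or later):

import statistics

print(statistics.median([7, 2, 3, 7, 6, 9, 10, 8, 9, 9, 10]))         # 8
print(statistics.median([7, 2, 3, 7, 6, 9, 10, 8, 9, 9]))             # 7.5
print(statistics.multimode([1, 2, 2, 3, 4, 5, 2, 5, 6, 6, 7, 9, 6]))  # [2, 6]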

Lesson 7.2. Standard Normal Distribution

A score distribution has a normal distribution when most of the values are aggregated around the mean and the number of values decreases as you move below or above the mean; the bar graph of frequencies of a normally distributed sample will look like a bell curve. The mean, median, and mode are equal.

A normal curve.

Skewed Distribution:

If one tail is longer than another, the distribution is skewed. These distributions are
sometimes called asymmetric or asymmetrical distributions as they don’t show any kind of
symmetry. Symmetry means that one half of the distribution is a mirror image of the other
half. For example, the normal distribution is a symmetric distribution with no skew. The tails
are exactly the same.
A left-skewed distribution has a long left tail. Left-skewed distributions are also
called negatively-skewed distributions. That’s because there is a long tail in the negative
direction on the number line. The mean is also to the left of the peak.
A right-skewed distribution has a long right tail. Right-skewed distributions are also called positively-skewed distributions. That’s because there is a long tail in the positive direction on the number line. The mean is also to the right of the peak. The scores tend to congregate at the lower end of the score distribution.
https://www.statisticshowto.com/probability-and-statistics/skewed-distribution
NORMAL DISTRIBUTIONS AND THE EMPIRICAL RULE

One of the most important statistical distributions of data is known as the normal distribution. A normal distribution forms a bell-shaped curve that is symmetric about a vertical line through the mean of the data.

Properties of a Normal Distribution

Every normal distribution has the following properties:
*The graph is symmetric about a vertical line through the mean of the distribution.
*The mean, median, and mode are equal.
*The y-value of each point on the curve is the percent (expressed as a decimal) of the data at the corresponding x-value.
*Areas under the curve that are symmetric about the mean are equal.
*The total area under the curve is 1.

Consider, for example, a normal distribution in which the area of the region to the right of the value 10 is 0.159. This region represents the fact that 15.9% of the data values are greater than or equal to 10. Because the total area under the curve is 1, the remaining region under the curve has area 1 - 0.159, or 0.841, representing the fact that 84.1% of the data are less than 10.

The following rule, called the Empirical Rule, describes the percent of data that lie within 1, 2, and 3 standard deviations of the mean in a normal distribution.

Empirical Rule for a Normal Distribution

In a normal distribution, approximately
68% of the data lie within 1 standard deviation of the mean,
95% of the data lie within 2 standard deviations of the mean, and
99.7% of the data lie within 3 standard deviations of the mean.
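These percentages can be recovered from the standard normal cumulative distribution function; a minimal Python sketch (not part of the original text), built on math.erf rather than a printed table:

import math

def std_normal_cdf(z):
    # Area under the standard normal curve to the left of z.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

for k in (1, 2, 3):
    within = std_normal_cdf(k) - std_normal_cdf(-k)
    print(f"within {k} sd: {within:.1%}")   # 68.3%, 95.4%, 99.7%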

Question: What is the area under the curve to the right of the mean for a normal distribution?

ANSWER: Because a normal distribution is symmetric about the mean, the area under the curve to the right of the mean is one-half the total area. The total area under a normal distribution is 1, so the area under the curve to the right of the mean is 0.5.
The Standard Normal Distribution
It is often helpful to convert data values x to z-scores, as we did in the previous section, by using the z-score formulas:

\[ z = \frac{x - \mu}{\sigma} \quad \text{for a population} \]
\[ z = \frac{x - \bar{x}}{s} \quad \text{for a sample} \]

If the original distribution of x values is a normal distribution, then the corresponding distribution of z-scores will also be a normal distribution. This normal distribution of z-scores is called the standard normal distribution. It has a mean of 0 and a standard deviation of 1.
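The conversion is a one-line computation; a minimal Python sketch (not part of the original text), applied to the soda-machine data used in Example 6 below (mean 11.5 oz, standard deviation 0.2 oz):

# z-score with the population formula z = (x - mu) / sigma.
def z_score(x, mu, sigma):
    return (x - mu) / sigma

print(z_score(11.25, 11.5, 0.2))  # -1.25
print(z_score(11.55, 11.5, 0.2))  #  0.25 (approximately, due to floating point)
print(z_score(12.00, 11.5, 0.2))  #  2.5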

Solve an Application
6. A soda machine dispenses soda into 12-ounce cups. Tests show that the actual amount of soda dispensed is normally distributed, with a mean of 11.5 oz and a standard deviation of 0.2 oz.

a. What percent of cups will receive less than 11.25 oz of soda?
b. What percent of cups will receive between 11.2 oz and 11.55 oz of soda?
c. If a cup is filled at random, what is the probability that the machine will overflow the cup?
Solution

a. Recall that the formula for the z-score for a data value x is

\[ z_x = \frac{x - \bar{x}}{s} \]

The z-score for 11.25 oz is

\[ z_{11.25} = \frac{11.25 - 11.5}{0.2} = -1.25 \]

The table for areas of a normal curve indicates that 0.394 (39.4%) of the data in a normal distribution are between z = 0 and z = 1.25. Because the data are normally distributed, 39.4% of the data are also between z = 0 and z = -1.25. The percent of data to the left of z = -1.25 is 50% - 39.4% = 10.6%.

Thus 10.6% of the cups filled by the soda machine will receive less than 11.25 oz of soda.

b. The z-score for 11.55 oz is

\[ z_{11.55} = \frac{11.55 - 11.5}{0.2} = 0.25 \]

and the z-score for 11.2 oz is

\[ z_{11.2} = \frac{11.2 - 11.5}{0.2} = -1.5 \]

The table for areas of a normal curve indicates that 0.433 (43.3%) of the data in a normal distribution are between z = 0 and z = 1.5; because the data are normally distributed, 43.3% of the data are also between z = 0 and z = -1.5. The table also indicates that 0.099 (9.9%) of the data are between z = 0 and z = 0.25. Thus the percent of cups that the vending machine will fill with between 11.2 oz and 11.55 oz of soda is 43.3% + 9.9% = 53.2%.

c. A cup will overflow if it receives more than 12 oz of soda. The z-score for 12 oz is

\[ z_{12} = \frac{12 - 11.5}{0.2} = 2.5 \]

The table for areas of a normal curve indicates that 0.494 (49.4%) of the data in the standard normal distribution are between z = 0 and z = 2.5. The percent of data to the right of z = 2.5 is determined by subtracting 49.4% from 50%.
Thus 0.6% of the time the machine produces an overflow, and the probability that a cup filled at random will overflow is 0.006.

Note: Table for areas of a normal curve is found in any statistics book
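The three answers can be verified without a table; a minimal Python sketch (not part of the original text) using the normal cumulative distribution function built from math.erf:

import math

def normal_cdf(x, mu=11.5, sigma=0.2):
    # Probability that a normally distributed value is less than x.
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

print(round(normal_cdf(11.25), 3))                     # a. 0.106  (10.6%)
print(round(normal_cdf(11.55) - normal_cdf(11.2), 3))  # b. 0.532  (53.2%)
print(round(1 - normal_cdf(12.0), 3))                  # c. 0.006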

Lesson 7.3. Outcome-based Teaching-Learning and Score Distribution

If teachers teach in accordance with the principles of outcome-based teaching-learning, aligning content and assessment with the intended learning outcomes and re-teaching to mastery whatever the formative assessment process reveals has not been understood, then student scores in the assessment phase of the lesson will tend to congregate at the higher end of the score distribution. The score distribution will be negatively skewed, with the long tail toward the low scores. On the other hand, if what teachers teach and assess is not aligned with the intended learning outcomes, the opposite will be true: the score distribution will be positively skewed, which means that scores tend to congregate at the lower end of the score distribution.

Lesson 7.4: MEASURES OF VARIABILITY/DISPERSION

The measures of central tendency tell us only the typical or average value of a set of measurements in the given data set, but they fail to describe how spread out these measurements are about their average value.

For instance, given the three sets of data, the mean, the median, and the mode are :

Set A: 3, 3, 3, 3, 3=> μ=3; Md= 3; Mo=3


Set B: 2, 3, 3, 3, 4=> μ=3; Md= 3; Mo=3
Set C: 1, 3, 3, 3, 5=> μ=3; Md= 3; Mo=3
Definitely, the three sets of data differ, particularly in the spread or variability of the observations in each data set. But such a difference cannot be shown by using the three measures of central tendency.

A Measure of Variability or Dispersion is a value that measures the spread or variability of the observations in the data set. Some measures of variability are the range, the variance, and the standard deviation.
RANGE - the range is the difference between the highest value and the lowest value in an ungrouped data set. In a grouped data set, the range is the difference between the upper limit of the highest class interval and the lower limit of the lowest class interval. In symbols,
Ungrouped data: R = HV – LV

VARIANCE - the variance, denoted by \(\sigma^2\), is the mean of the squared deviations of the observations from their arithmetic mean. In symbols, the variance is given by

\[ \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \quad \text{for ungrouped data} \]

\[ \sigma^2 = \frac{\sum_{i=1}^{k} f_i (X_i - \mu)^2}{\sum_{i=1}^{k} f_i} \quad \text{for grouped data} \]

STANDARD DEVIATION - the standard deviation, denoted by \(\sigma\), is the positive square root of the variance. In symbols,

\[ \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} \quad \text{for ungrouped data} \]

\[ \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{k} f_i (X_i - \mu)^2}{\sum_{i=1}^{k} f_i}} \quad \text{for grouped data} \]
1. For Ungrouped Data:
Consider the weights of 4 female college students (in kg): 49, 57, 49, 50.

Range: R = HV – LV = 57 – 49 = 8

Mean: \( \mu = \frac{49 + 57 + 49 + 50}{4} = 51.25 \)

Variance:

\[ \sigma^2 = \frac{(49-51.25)^2 + (57-51.25)^2 + (49-51.25)^2 + (50-51.25)^2}{4} = \frac{5.0625 + 33.0625 + 5.0625 + 1.5625}{4} = 11.1875 \approx 11.19 \]

Standard Deviation:

\[ \sigma = \sqrt{\sigma^2} = \sqrt{11.19} = 3.3451 \approx 3.35 \]
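The same computation in Python (a minimal sketch, not part of the original text), following the population formulas above:

import math

weights = [49, 57, 49, 50]
mu = sum(weights) / len(weights)                              # 51.25
variance = sum((x - mu) ** 2 for x in weights) / len(weights)
sigma = math.sqrt(variance)
print(round(variance, 4), round(sigma, 4))                    # 11.1875 3.3448
# The text rounds the variance to 11.19 before taking the square root,
# which gives 3.3451; computing from the unrounded variance gives 3.3448.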

Lesson 8.1. Norm and Criterion Referenced Grading


Norm-referenced Grading - The most commonly used grading system, which refers to a grading system where a student's grade is placed in relation to the performance of a group. Thus, in this system, a grade of 80 means that the student performed better than or the same as 80% of the class or group.
Students are actually in competition to achieve a standard of performance that will classify them into the desired grade range. The objective is to find out the best performers in the group.
For example, a teacher may establish a grading policy whereby the top 15 percent of students will receive a mark of excellent or outstanding, which in a class of 100 enrolled students will be 15 persons. Such a grading policy is illustrated below:
1.00 (excellent) = top 15 % of class
1.50 (Good) = Next 15 % of class
2.0 (Average, Fair) = Next 45% of class
3.0 (Poor, Pass) = Next 15% of class
5.0 (Failure) = Next 10% of class
An underlying assumption in norm-referenced grading is that the students have abilities (as reflected in their raw scores) that follow the normal distribution. Norm-referenced systems are most often used for screening selected student populations in conditions where it is known that not all students can advance due to limitations such as available places, jobs, or other controlling factors. For example, in the Philippine setting, since not all high school students can actually advance to the college or university level because of financial constraints, the norm-referenced grading system can be applied.
Example: In a class of 100 students, the mean score on a test is 70 with a standard deviation of 5. Construct a norm-referenced grading table with seven grade scales such that students scoring within plus or minus one standard deviation of the mean receive an average grade.
Solution: The following intervals of raw scores to grade equivalents are computed:

Raw Score     Grade Equivalent    Percentage
Below 55      Fail                    1%
55-60         Marginal Pass           4%
61-65         Pass                   11%
66-75         Average                68%
76-80         Above Average          11%
81-85         Very Good               4%
Above 85      Excellent               1%
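Assigning grade equivalents from this table can be sketched in Python (an illustration, not part of the original text; it assumes whole-number raw scores and the cutoffs shown above):

# Norm-referenced grade equivalents for the example with mean 70 and sd 5.
def grade_equivalent(score):
    cutoffs = [
        (55, "Fail"), (61, "Marginal Pass"), (66, "Pass"), (76, "Average"),
        (81, "Above Average"), (86, "Very Good"),
    ]
    for upper, label in cutoffs:
        if score < upper:
            return label
    return "Excellent"

print(grade_equivalent(58))  # Marginal Pass
print(grade_equivalent(70))  # Average
print(grade_equivalent(92))  # Excellent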

Criterion-Referenced Grading – is based on a fixed criterion measure. There is a fixed target, and the student must achieve that target in order to obtain a passing grade in a course regardless of how the other students in the class perform. The scale does not change regardless of the quality, or lack thereof, of the students.
This is used where teachers agree on the meaning of a standard of performance in a subject but the quality of the students is unknown or uneven, where the work involves student collaboration or teamwork, and where there is no external driving factor such as needing to systematically reduce a pool of eligible students.
For example, in a class of 100 students using the table below, no one might get a grade of excellent if no one scores 98 or above (or 85 or above, depending on the criterion used). There is no fixed percentage of students who are expected to get the various grades in the criterion-referenced grading system.
1.0 (excellent) = 98-100 or 85-100
1.5 (Good) = 88-97 or 80-84
2.0 (Fair) = 75-87 or 70-79
3.0 (Poor/Pass) = 65-74 or 60-69
5.0 (Failure) = below 65 or below 60
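The fixed-cutoff idea can be sketched in Python (an illustration, not part of the original text), using the first scale above (1.0 for 98-100, 1.5 for 88-97, and so on):

# Criterion-referenced grading: the scale never changes with class performance.
def criterion_grade(score):
    if score >= 98: return 1.0   # Excellent
    if score >= 88: return 1.5   # Good
    if score >= 75: return 2.0   # Fair
    if score >= 65: return 3.0   # Poor/Pass
    return 5.0                   # Failure

# Every student who reaches a cutoff gets that grade; possibly no one
# reaches "Excellent" at all.
print([criterion_grade(s) for s in [99, 90, 80, 70, 50]])  # [1.0, 1.5, 2.0, 3.0, 5.0]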

Lesson 8.2. Cumulative and Averaging Systems of Grading

In the Philippines, two types of grading systems are used: the averaging and the cumulative grading systems.
In the averaging system, the grade of a student in a particular grading period is the average of the grades obtained in the prior grading periods and the current grading period.
In the cumulative grading system, the grade of a student in a grading period equals his or her current grading period grade, which is assumed to carry the cumulative effects of the previous grading periods.
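The contrast between the two systems can be sketched in Python (an illustration, not part of the original text; the grading-period grades are hypothetical):

# A student with grading-period grades of 80, 84, and 88.
period_grades = [80, 84, 88]

def averaging_grade(grades, current_period):
    # Grade for a period = average of all grades up to and including that period.
    return sum(grades[:current_period]) / current_period

def cumulative_grade(grades, current_period):
    # Grade for a period = the current period's grade itself, taken as already
    # carrying the cumulative effect of the earlier periods.
    return grades[current_period - 1]

print(averaging_grade(period_grades, 3))   # 84.0
print(cumulative_grade(period_grades, 3))  # 88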
